Commit Graph

1852 Commits

Author SHA1 Message Date
Brian Behlendorf
325f023544 Add linux kernel device support
This branch contains the majority of the changes required to cleanly
intergrate with Linux style special devices (/dev/zfs).  Mainly this
means dropping all the Solaris style callbacks and replacing them
with the Linux equivilants.

This patch also adds the onexit infrastructure needed to track
some minimal state between ioctls.  Under Linux it would be easy
to do this simply using the file->private_data.  But under Solaris
they apparent need to pass the file descriptor as part of the ioctl
data and then perform a lookup in the kernel.  Once again to keep
code change to a minimum I've implemented the Solaris solution.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 13:41:50 -07:00
Brian Behlendorf
47d0ed1e6f Add linux spl debug support
Use spl debug if HAVE_SPL defined

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 13:41:50 -07:00
Brian Behlendorf
d2c15e84e9 Add linux mlslabel support
The ZFS update to onnv_141 brought with it support for a
security label attribute called mlslabel.  This feature
depends on zones to work correctly and thus I am disabling
it under Linux.  Equivilant functionality could be added
at some point in the future.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 13:41:49 -07:00
Brian Behlendorf
266852767f Add linux events
This topic branch leverages the Solaris style FMA call points
in ZFS to create a user space visible event notification system
under Linux.  This new system is called zevent and it unifies
all previous Solaris style ereports and sysevent notifications.

Under this Linux specific scheme when a sysevent or ereport event
occurs an nvlist describing the event is created which looks almost
exactly like a Solaris ereport.  These events are queued up in the
kernel when they occur and conditionally logged to the console.
It is then up to a user space application to consume the events
and do whatever it likes with them.

To make this possible the existing /dev/zfs ABI has been extended
with two new ioctls which behave as follows.

* ZFS_IOC_EVENTS_NEXT
Get the next pending event.  The kernel will keep track of the last
event consumed by the file descriptor and provide the next one if
available.  If no new events are available the ioctl() will block
waiting for the next event.  This ioctl may also be called in a
non-blocking mode by setting zc.zc_guid = ZEVENT_NONBLOCK.  In the
non-blocking case if no events are available ENOENT will be returned.
It is possible that ESHUTDOWN will be returned if the ioctl() is
called while module unloading is in progress.  And finally ENOMEM
may occur if the provided nvlist buffer is not large enough to
contain the entire event.

* ZFS_IOC_EVENTS_CLEAR
Clear are events queued by the kernel.  The kernel will keep a fairly
large number of recent events queued, use this ioctl to clear the
in kernel list.  This will effect all user space processes consuming
events.

The zpool command has been extended to use this events ABI with the
'events' subcommand.  You may run 'zpool events -v' to output a
verbose log of all recent events.  This is very similar to the
Solaris 'fmdump -ev' command with the key difference being it also
includes what would be considered sysevents under Solaris.  You
may also run in follow mode with the '-f' option.  To clear the
in kernel event queue use the '-c' option.

$ sudo cmd/zpool/zpool events -fv
TIME                        CLASS
May 13 2010 16:31:15.777711000 ereport.fs.zfs.config.sync
        class = "ereport.fs.zfs.config.sync"
        ena = 0x40982b7897700001
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xed976600de75dfa6
        (end detector)

        time = 0x4bec8bc3 0x2e5aed98
        pool = "zpios"
        pool_guid = 0xed976600de75dfa6
        pool_context = 0x0

While the 'zpool events' command is handy for interactive debugging
it is not expected to be the primary consumer of zevents.  This ABI
was primarily added to facilitate the addition of a user space
monitoring daemon.  This daemon would consume all events posted by
the kernel and based on the type of event perform an action.  For
most events simply forwarding them on to syslog is likely enough.
But this interface also cleanly allows for more sophisticated
actions to be taken such as generating an email for a failed drive.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 13:41:36 -07:00
Brian Behlendorf
c9c0d073da Add build system
Add autoconf style build infrastructure to the ZFS tree.  This
includes autogen.sh, configure.ac, m4 macros, some scripts/*,
and makefiles for all the core ZFS components.
2010-08-31 13:41:27 -07:00
Brian Behlendorf
6656bf5621 Fix stack traverse_visitbp()
Due to  limited stack space recursive functions are frowned upon in
the Linux kernel.  However, they often are the most elegant solution
to a problem.  The following code preserves the recursive function
traverse_visitbp() but moves the local variables AND function
arguments to the heap to minimize the stack frame size.  Enough
space is initially allocated on the stack for 20 levels of recursion.
This change does ugly-up-the-code but it reduces the worst case
usage from roughly 4160 bytes to 960 bytes on x86_64 archs.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:50 -07:00
Ned Bass
da6b4005c9 Fix stack zio_execute()
Implement zio_execute() as a wrapper around the static function
__zio_execute() so that we can force  __zio_execute() to be inlined.
This reduces stack overhead which is important because __zio_execute()
is called recursively in several zio code paths.  zio_execute() itself
cannot be inlined because it is externally visible.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:50 -07:00
Brian Behlendorf
c776b317e4 Fix stack zio_done()
Eliminated local variables pointing to members of the zio struct.
Just refer to the struct members directly.  This saved about 32 bytes per
call, but this function can be called recurisvely up to 19 levels deep,
so we potentially save up to 608 bytes.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:50 -07:00
Brian Behlendorf
5fed499def Fix stack vdev_cache_read()
Moving the vdev_cache_entry_t struct ve_search from the stack to
the heap saves ~100 bytes.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:49 -07:00
Brian Behlendorf
47050a88ac Fix stack traverse_impl()
Stack use reduced from 560 bytes to 128 bytes.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:49 -07:00
Brian Behlendorf
60948de1ef Fix stack noinline
Certain function must never be automatically inlined by gcc because
they are stack heavy or called recursively.  This patch flags all
such functions I've found as 'noinline' to prevent gcc from making
the optimization.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:49 -07:00
Brian Behlendorf
18a89ba43d Fix stack lzjb
Reduce kernel stack usage by lzjb_compress() by moving uint16 array
off the stack and on to the heap.  The exact performance implications
of this I have not measured but we absolutely need to keep stack
usage to a minimum.  If/when this becomes and issue we optimize.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:49 -07:00
Brian Behlendorf
bf701a83c5 Fix stack inline
Decrease stack usage for various call paths by forcing certain
functions to be inlined.  By inlining the functions the overhead
of a new stack frame is removed at the cost of increased code size.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:48 -07:00
Brian Behlendorf
161ce7ce3c Fix stack dsl_scan_visitbp()
To reduce stack overhead this topic branch moves the 128 byte
blkptr_t data strucutre in dsl_scan_visitbp() to the heap.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:48 -07:00
Brian Behlendorf
fcf37ec6c2 Fix stack dsl_dir_open_spa()
Reduce stack usage by 256 bytes by moving buf char array from
the stack to the heap.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:48 -07:00
Brian Behlendorf
48c67dc8f8 Fix stack dsl_deleg_get()
Reduce stack usage in dsl_deleg_get, gcc flagged it as consuming a
whopping 1040 bytes or potentially 1/4 of a 4K stack.  This patch
moves all the large structures and buffer off the stack and on to
the heap.  This includes 2 zap_cursor_t structs each 52 bytes in
size, 2 zap_attribute_t structs each 280 bytes in size, and 1
256 byte char array.  The total saves on the stack is 880 bytes
after you account for the 5 new pointers added.

Also the source buffer length has been increased from MAXNAMELEN
to MAXNAMELEN+strlen(MOS_DIR_NAME)+1 as described by the comment in
dsl_dir_name().  A buffer overrun may have been possible with the
slightly smaller buffer.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:48 -07:00
Brian Behlendorf
81a4966389 Fix stack dsl_dataset_destroy()
Move dsl_dataset_t local variable from the stack to the heap.
This reduces the stack usage of this function from 2048 bytes
to 176 bytes for x84_64 arches.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:48 -07:00
Brian Behlendorf
a8ac8e715e Fix stack dmu_objset_snapshot()
Reduce stack usage by 276 bytes by moving the snaparg struct from the
stack to the heap.  We have limited stack space we must not waste.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:47 -07:00
Brian Behlendorf
fc5bb51f08 Fix stack dbuf_hold_impl()
This commit preserves the recursive function dbuf_hold_impl() but moves
the local variables and function arguments to the heap to minimize
the stack frame size.  Enough space is initially allocated on the
stack for 20 levels of recursion.  This technique was based on commit
34229a2f2ac07363f64ddd63e014964fff2f0671 which reduced stack usage of
traverse_visitbp().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:47 -07:00
Brian Behlendorf
5ac1241a95 Fix dnode_move() scope
The dnode_move() functionality is only used in the kernel build.
As such we should be careful to wrap all of the related code
with '#ifdef _KERNEL' to avoid gcc warnings about unused code.
2010-08-31 08:38:47 -07:00
Brian Behlendorf
8a8f5c6b3c Fix zfs_ioc_objset_stats
Interestingly this looks like an upstream bug as well.  If for some
reason we are unable to get a zvols statistics, because perhaps the
zpool is hopelessly corrupt, we would trigger the VERIFY.  This
commit adds the proper error handling just to propagate the error
back to user space.  Now the user space tools still must handle this
properly but in the worst case the tool will crash or perhaps have
some missing output.  That's far far better than crashing the host.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:47 -07:00
Brian Behlendorf
5cc556b447 Fix zio_taskq_dispatch to use TQ_NOSLEEP
The zio_taskq_dispatch() function may be called at interrupt time
and it is critical that we never sleep.

Additionally, wrap taskq_dispatch() in a while loop because it may
fail.  This is non optimal but is OK for now.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:46 -07:00
Brian Behlendorf
ef5319df8e Fix rw_init() usage
Properly initialize rwlock primitives.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:46 -07:00
Brian Behlendorf
eaa8687be3 Fix zmod.h usage in userspace
Do not use zmod.h in userspace.

This has also been filed with the ZFS team. It makes the userspace
libzpool code use the zlib API, instead of the Solaris-only and
non-standard zmod.h.  The zlib API is almost identical and is a de
facto standard, so this is a no-brainer.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:46 -07:00
Brian Behlendorf
3f50448292 Fix missing newlines
Add missing \n's to dprintf()s

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:46 -07:00
Brian Behlendorf
22c81dd8a9 Fix metaslab
If your only going to allow one allocator to be used and it is defined
at compile time there is no point including the others in the build.
This patch could/should be refined for Linux to make the metaslab
configurable at run time.  That might be a bit tricky however since
you would need to quiese all IO.  Short of that making it configurable
as a module load option would be a reasonable compromise.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:45 -07:00
Brian Behlendorf
98f72a539c Fix list handling to only use the API
Remove all instances of list handling where the API is not used
and instead list data members are directly accessed.  Doing this
sort of thing is bad for portability.

Additionally, ensure that list_link_init() is called on newly
created list nodes.  This ensures the node is properly initialized
and does not rely on the assumption that zero'ing the list_node_t
via kmem_zalloc() is the same as proper initialization.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:45 -07:00
Brian Behlendorf
59e6e7ca85 Fix kstat xuio
Move xiou stat structures from a header to the dmu.c source as is
done with all the other kstat interfaces.  This information is local
to dmu.c registered the xuio kstat and should stay that way.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:45 -07:00
Brian Behlendorf
754c6663a3 Fix dbuf eviction assertion
Replace non-fatal assertion with warning.  This was being observed
during testing and it should not be fatal.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:45 -07:00
Brian Behlendorf
753972fccf Fix dbuf_dirty_record_t leaks
Fix two leaks with dbuf_dirty_record_t

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:44 -07:00
Brian Behlendorf
5631c03889 Fix variables named current
In the linux kernel 'current' is defined to mean the current process
and can never be used as a local variable in a function.  Simply
replace all usage of 'current' with 'curr' in this function.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:44 -07:00
Ricardo M. Correia
090ff0929e Fix commit callbacks
The upstream commit cb code had a few bugs:

1) The arguments of the list_move_tail() call in txg_dispatch_callbacks()
were reversed by mistake. This caused the commit callbacks to not be
called at all.

2) ztest had a bug in ztest_dmu_commit_callbacks() where "error" was not
initialized correctly. This seems to have caused the test to always take
the simulated error code path, which made ztest unable to detect whether
commit cbs were being called for transactions that successfuly complete.

3) ztest had another bug in ztest_dmu_commit_callbacks() where the commit
cb threshold was not being compared correctly.

4) The commit cb taskq was using 'max_ncpus * 2' as the maxalloc argument
of taskq_create(), which could have caused unnecessary delays in the txg
sync thread.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:44 -07:00
Brian Behlendorf
d4ed667343 Fix gcc uninitialized variable warnings
Gcc -Wall warn: 'uninitialized variable'

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:43 -07:00
Brian Behlendorf
1fde1e3720 Fix gcc unused variable warnings
Gcc -Wall warn: 'unused variable'

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:43 -07:00
Brian Behlendorf
c65aa5b2b9 Fix gcc missing parenthesis warnings
Gcc -Wall warn: 'missing parenthesis'

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:35 -07:00
Brian Behlendorf
e75c13c353 Fix gcc missing case warnings
Gcc ASSERT() missing cases are impossible

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-27 15:34:03 -07:00
Brian Behlendorf
2598c0012d Fix gcc missing braces warnings
Resolve compiler warnings concerning missing braces.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-27 15:34:03 -07:00
Brian Behlendorf
0bc8fd7884 Fix gcc invalid prototype warnings
Gcc -Wall warn: 'invalid prototype'

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-27 15:34:03 -07:00
Ricardo M. Correia
e5dc681a50 Fix gcc ident pragma warnings
Remove all ident pragmas which are unknown to gcc.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-27 15:34:02 -07:00
Brian Behlendorf
f709a82dc1 Fix gcc useless debug warnings
Gcc useless debugging.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-27 15:34:01 -07:00
Brian Behlendorf
b8864a233c Fix gcc cast warnings
Gcc -Wall warn: 'lacks a cast'
Gcc -Wall warn: 'comparison between pointer and integer'

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-27 15:33:32 -07:00
Brian Behlendorf
d6320ddb78 Fix gcc c90 compliance warnings
Fix non-c90 compliant code, for the most part these changes
simply deal with where a particular variable is declared.
Under c90 it must alway be done at the very start of a block.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-27 15:28:32 -07:00
Ricardo M. Correia
c5b3a7bbcc Fix gcc 64-bit constant warnings
Add 'ull' suffix to 64-bit constants.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-26 15:18:01 -07:00
Brian Behlendorf
572e285762 Update to onnv_147
This is the last official OpenSolaris tag before the public
development tree was closed.
2010-08-26 14:24:34 -07:00
Brian Behlendorf
428870ff73 Update core ZFS code from build 121 to build 141. 2010-05-28 13:45:14 -07:00
Brian Behlendorf
fa42225a3d Add Solaris FMA style support 2010-04-29 10:37:15 -07:00
Brian Behlendorf
0aa61e8427 Remove zvol.c when updating in update-zfs.sh Linux version available. 2009-11-15 16:20:01 -08:00
Brian Behlendorf
45d1cae3b8 Rebase master to b121 2009-08-18 11:43:27 -07:00
Brian Behlendorf
9babb37438 Rebase master to b117 2009-07-02 15:44:48 -07:00
Brian Behlendorf
d164b20935 Rebase master to b108 2009-02-18 12:51:31 -08:00
Brian Behlendorf
fb5f0bc833 Rebase master to b105 2009-01-15 13:59:39 -08:00
Brian Behlendorf
172bb4bd5e Move the world out of /zfs/ and seperate out module build tree 2008-12-11 11:08:09 -08:00