Commit Graph

161 Commits

Author SHA1 Message Date
Darik Horn
c95b308d12 Correct typos in the spl proc handler.
Correct a format typo that causes /proc/sys/kernel/spl/hostid
to return a decimal number instead of a hexadecimal number.
2011-04-24 20:56:07 -05:00
Darik Horn
5b8f76ea16 Make the SPL kernel messages consistent with ZFS.
Change the SPL kernel messages for module loading and module
unloading so that they are similar to the ZFS kernel messages.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-04-21 09:41:13 -07:00
Darik Horn
ad35b6a6e9 Remove the gawk dependency.
This reverts commit 1814251453.

Demote the gawk call back to awk and ensure that stderr is attached.  GNU gawk
tolerates a missing stderr handle, but many utilities do not, which could be
why a regular awk call was unexplainably failing on some systems.

Use argv[0] instead of sh_path for consistency internally and with other Linux
drivers.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-04-21 09:41:09 -07:00
Darik Horn
fa6f7d8f9d Import spl_hostid as a module parameter.
Provide a call_usermodehelper() alternative by letting the hostid be passed as
a module parameter like this:

  $ modprobe spl spl_hostid=0x12345678

Internally change the spl_hostid variable to unsigned long because that is the
type that the coreutils /usr/bin/hostid returns.

Move the hostid command into GET_HOSTID_CMD for consistency with the similar
GET_KALLSYMS_ADDR_CMD invocation.

Use argv[0] instead of sh_path for consistency internally and with other Linux
drivers.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-04-21 09:41:01 -07:00
Brian Behlendorf
3dfc591ac4 Linux 2.6.39 compat, zlib_deflate_workspacesize()
The function zlib_deflate_workspacesize() now take 2 arguments.
This was done to avoid always having to allocate the maximum size
workspace (268K).  The caller can now specific the windowBits and
memLevel compression parameters to get a smaller workspace.

For our purposes we introduce a spl_zlib_deflate_workspacesize()
wrapper which accepts both arguments.  When the two argument
version of zlib_deflate_workspacesize() is available the arguments
are passed through.  When it's not we assume the worst case and
a maximally sized workspace is used.
2011-04-20 14:39:15 -07:00
Brian Behlendorf
b1cbc4610c Linux 2.6.39 compat, kern_path_parent()
The path_lookup() function has been renamed to kern_path_parent()
and the flags argument has been removed.  The only behavior now
offered is that of LOOKUP_PARENT.  The spl already always passed
this flag so dropping the flag does not impact us.
2011-04-20 12:30:17 -07:00
Brian Behlendorf
83c623aa1a Linux 2.6.39 compat, DEFINE_SPINLOCK()
This is a long over due compatibility change.  Way, way, way back
in 2007 there was a push to remove all consumers of SPIN_LOCK_UNLOCKED.
Finally, in 2011 with 2.6.39 all the consumers have been updated
and SPIN_LOCK_UNLOCKED was removed.  It's about time we use the
new API as well, this change does exactly that.  DEFINE_SPINLOCK()
was available as far back as 2.6.12 so there doesn't need to be
any additional autoconf-foo for this change.
2011-04-20 12:01:11 -07:00
Brian Behlendorf
98e2afd1c5 Fix unused variable
Flagged by the default -Wunused-but-set-variable gcc option when
running under Fedora 15.  Since it's correct this variable is
entirely unused this commit removes it.
2011-04-19 09:45:36 -07:00
Brian Behlendorf
9b0f9079d2 Linux 2.6.39 compat, invalidate_inodes()
To resolve a potiential filesystem corruption issue a second
argument was added to invalidate_inodes().  This argument controls
whether dirty inodes are dropped or treated as busy when invalidating
a super block.  When only the legacy API is available the second
argument will be dropped for compatibility.
2011-04-19 09:08:08 -07:00
Brian Behlendorf
e76f4bf11d Add dnlc_reduce_cache() support
Provide the dnlc_reduce_cache() function which attempts to prune
cached entries from the dcache and icache.  After the entries are
pruned any slabs which they may have been using are reaped.

Note the API takes a reclaim percentage but we don't have easy
access to the total number of cache entries to calculate the
reclaim count.  However, in practice this doesn't need to be
exactly correct.  We simply need to reclaim some useful fraction
(but not all) of the cache.  The caller can determine if more
needs to be done.
2011-04-06 20:06:03 -07:00
Brian Behlendorf
3336e29cc2 Add slab usage summeries to /proc
One of the most common things you want to know when looking at
the slab is how much memory is being used.  This information was
available in /proc/spl/kmem/slab but only on a per-slab basis.
This commit adds the following /proc/sys/kernel/spl/kmem/slab*
entries to make total slab usage easily available at a glance.

  slab_kmem_total - Total kmem slab size
  slab_kmem_avail - Alloc'd kmem slab size
  slab_kmem_max   - Max observed kmem slab size
  slab_vmem_total - Total vmem slab size
  slab_vmem_avail - Alloc'd vmem slab size
  slab_vmem_max   - Max observed vmem slab size

NOTE: The slab_*_max values are expected to over report because
they show maximum values since boot, not current values.
2011-04-06 20:06:03 -07:00
Brian Behlendorf
d0a1038ff3 Update /proc/spl/kmem/slab output
The 'slab_fail', 'slab_create', and 'slab_destroy' columns in the slab
output have been removed because they are virtually always zero and
not very useful.

The much more useful 'size' and 'alloc' columns have been added which
show the total slab size and how much of the total size has been
allocated to objects.

Finally, the formatting has been updated to be much more human
readable while still being friendly for tool like awk to parse.
2011-04-06 20:06:03 -07:00
Brian Behlendorf
495bd532ab Linux shrinker compat
The Linux shrinker has gone through three API changes since 2.6.22.
Rather than force every caller to understand all three APIs this
change consolidates the compatibility code in to the mm-compat.h
header.  The caller then can then use a single spl provided
shrinker API which does the right thing for your kernel.

SPL_SHRINKER_CALLBACK_PROTO(shrinker_callback, cb, nr_to_scan, gfp_mask);
SPL_SHRINKER_DECLARE(shrinker_struct, shrinker_callback, seeks);
spl_register_shrinker(&shrinker_struct);
spl_unregister_shrinker(&&shrinker_struct);
spl_exec_shrinker(&shrinker_struct, nr_to_scan, gfp_mask);
2011-04-06 20:06:03 -07:00
Brian Behlendorf
734fcac78d Add crgetfsuid()/crgetfsgid() helpers
Solaris credentials don't have an fsuid/fsguid field but Linux
credentials do.  To handle this case the Solaris API is being
modestly extended to include the crgetfsuid()/crgetfsgid()
helper functions.

Addititionally, because the crget*() helpers are implemented
identically regardless of HAVE_CRED_STRUCT they have been
moved outside the #ifdef to common code.  This simplification
means we only have one version of the helper to keep to to date.
2011-03-22 12:18:44 -07:00
Brian Behlendorf
2092cf68d8 Disable vmalloc() direct reclaim
As part of vmalloc() a __pte_alloc_kernel() allocation may occur.  This
internal allocation does not honor the gfp flags passed to vmalloc().
This means even when vmalloc(GFP_NOFS) is called it is possible that a
synchronous reclaim will occur.  This reclaim can trigger file IO which
can result in a deadlock.  This issue can be avoided by explicitly
setting PF_MEMALLOC on the process to subvert synchronous reclaim when
vmalloc() is called with !__GFP_FS.

An example stack of the deadlock can be found here (1), along with the
upstream kernel bug (2), and the original bug discussion on the
linux-mm mailing list (3).  This code can be properly autoconf'ed
when the upstream bug is fixed.

1) http://github.com/behlendorf/zfs/issues/labels/Vmalloc#issue/133
2) http://bugzilla.kernel.org/show_bug.cgi?id=30702
3) http://marc.info/?l=linux-mm&m=128942194520631&w=4
2011-03-20 15:12:08 -07:00
Brian Behlendorf
47995fa691 Remove xvattr support
The xvattr support in the spl has always simply consisted of
defining a couple structures and a few #defines.  This was enough
to enable compilation of code which just passed xvattr types
around but not enough to effectively manipulate them.

This change removes even this minimal support leaving it up
to packages which leverage the spl to prove the full xvattr
support.  By removing it from the spl we ensure not conflict
with the higher level packages.

This just leaves minimal vnode support for basical manipulation
of files.  This code is does have the proper support functions
in the spl and a set of regression tests.

Additionally, this change removed the unused 'caller_context_t *'
type and replaces it with a 'void *'.
2011-03-02 11:34:46 -08:00
Brian Behlendorf
19c1eb829d Add zlib regression test
A zlib regression test has been added to verify the correct behavior
of z_compress_level() and z_uncompress.  The test case simply takes
a 128k buffer, it compresses the buffer, it them uncompresses the
buffer, and finally it compares the buffers after the transform.
If the buffers match then everything is fine and no data was lost.
It performs this test for all 9 zlib compression levels.
2011-02-25 16:56:46 -08:00
Brian Behlendorf
5c1967ebe2 Fix zlib compression
While portions of the code needed to support z_compress_level() and
z_uncompress() where in place.  In reality the current implementation
was non-functional, it just was compilable.

The critical missing component was to setup a workspace for the
compress/uncompress stream structures to use.  A kmem_cache was
added for the workspace area because we require a large chunk
of memory.  This avoids to need to continually alloc/free this
memory and vmap() the pages which is very slow.  Several objects
will reside in the per-cpu kmem_cache making them quick to acquire
and release.  A further optimization would be to adjust the
implementation to additional ensure the memory is local to the cpu.
Currently that may not be the case.
2011-02-25 16:56:22 -08:00
Brian Behlendorf
914b063133 Linux compat 2.6.37, invalidate_inodes()
In the 2.6.37 kernel the function invalidate_inodes() is no longer
exported for use by modules.  This memory management functionality
is needed to invalidate the inodes attached to a super block without
unmounting the filesystem.

Because this function still exists in the kernel and the prototype
is available is a common header all we strictly need is the symbol
address.  The address is obtained using spl_kallsyms_lookup_name()
and assigned to the variable invalidate_inodes_fn.  Then a #define
is used to replace all instances of invalidate_inodes() with a
call to the acquired address.  All the complexity is hidden behind
HAVE_INVALIDATE_INODES and invalidate_inodes() can be used as usual.

Long term we should try to get this, or another, interface made
available to modules again.
2011-02-23 12:44:32 -08:00
Brian Behlendorf
d599e4fa79 Block in cv_destroy() on all waiters
Previously we would ASSERT in cv_destroy() if it was ever called
with active waiters.  However, I've now seen several instances in
OpenSolaris code where they do the following:

  cv_broadcast();
  cv_destroy();

This leaves no time for active waiters to be woken up and scheduled
and we trip the ASSERT.  This has not been observed to be an issue
on OpenSolaris because their cv_destroy() basically does nothing.
They still do run the risk of the memory being free'd after the
cv_destroy() and hitting a bad paging request.  But in practice
this race is so small and unlikely it either doesn't happen, or
is so unlikely when it does happen the root cause has not yet been
identified.

Rather than risk the same issue in our code this change updates
cv_destroy() to block until all waiters have been woken and
scheduled.  This may take some time because each waiter must
acquire the mutex.

This change may have an impact on performance for frequently
created and destroyed condition variables.  That however is a price
worth paying it avoid crashing your system.  If performance issues
are observed they can be addressed by the caller.
2011-02-04 14:09:08 -08:00
Brian Behlendorf
647fa73cf3 Remove VN_HOLD/VN_RELE/VOP_PUTPAGE
Previously these were defined to noops but rather than give
the misleading impression that these are actually implemented
I'm removing the type entirely for clarity.
2011-01-12 11:38:05 -08:00
Brian Behlendorf
a5b40eed17 Make vn_cache|vn_file_cache kmem caches
Both of these caches were previously allowed to be either a
vmem or kmem cache based on the size of the object involved.
Since we know the object won't be to large and performce is
much better for a kmem cache for them to be kmem backed.
2011-01-12 11:38:05 -08:00
Brian Behlendorf
dcd9cb5a17 Clean vattr_t and vsecattr_t types
Minor cleanup for the vattr_t and vsecattr_t types.
2011-01-12 11:38:04 -08:00
Brian Behlendorf
4295b530ee Add vn_mode_to_vtype/vn_vtype to_mode helpers
Add simple helpers to convert a vnode->v_type to a inode->i_mode.
These should be used sparingly but they are handy to have.
2011-01-12 11:38:04 -08:00
Neependra Khare
3f688a8c38 Add cv_timedwait_interruptible() function
The cv_timedwait() function by definition must wait unconditionally
for cv_signal()/cv_broadcast() before waking.  This causes processes
to go in the D state which increases the load average.  The load
average is the summation of processes in D state and run queue.

To avoid this it can be desirable to sleep interruptibly.  These
processes do not count against the load average but may be woken by
a signal.  It is up to the caller to determine why the process
was woken it may be for one of three reasons.

  1) cv_signal()/cv_broadcast()
  2) the timeout expired
  3) a signal was received

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-01-11 12:14:48 -08:00
Brian Behlendorf
6bf4d76f47 Linux Compat: inode->i_mutex/i_sem
Create spl_inode_lock/spl_inode_unlock compability macros to simply
access to the inode mutex/sem.  This avoids the need to have to ugly
up the code with the required #define's at every call site.  At the
moment the SPL only uses this in one place but higher layers can
benefit from the macro.
2011-01-11 12:14:48 -08:00
Brian Behlendorf
b7dc313837 Add Thread Specific Data (TSD) Regression Test
To validate the correct behavior of the TSD interfaces it's
important that we add a regression test.  This test is designed
to minimally exercise the fundamental TSD behavior, it does not
attempt to validate all potential corner cases.

The test will first create 32 keys via tsd_create() and register
a common destructor.  Next 16 wait threads will be created each
of which set/verify a random value for all 32 keys, then block
waiting to be released by the control thread.  Meanwhile the
control thread verifies that none of the destructors have been
run prematurely.

The next phase of the test is to create 16 exit threads which
set/verify a random value for all 32 keys.  They then immediately
exit.  This is is designed to verify tsd_exit() which will be
called via thread_exit().  This must result in all registered
destructors being run and the memory for the tsd being free'd.

After this tsd_destroy() is verified by destroying all 32 keys.
Once again we must see the expected number of destructors run
and the tsd memory free'd.  At this point the blocked threads
are released and they exit calling tsd_exit() which should do
very little since all the tsd has already been destroyed.

If this all goes off without a hitch the test passes.  To ensure
no memory has been leaked, I have manually verified that after
spl module unload no memory is reported leaked.
2010-12-07 10:02:44 -08:00
Brian Behlendorf
9fe45dc1ac Add Thread Specific Data (TSD) Implementation
Thread specific data has implemented using a hash table, this avoids
the need to add a member to the task structure and allows maximum
portability between kernels.  This implementation has been optimized
to keep the tsd_set() and tsd_get() times as small as possible.

The majority of the entries in the hash table are for specific tsd
entries.  These entries are hashed by the product of their key and
pid because by design the key and pid are guaranteed to be unique.
Their product also has the desirable properly that it will be uniformly
distributed over the hash bins providing neither the pid nor key is zero.
Under linux the zero pid is always the init process and thus won't be
used, and this implementation is careful to never to assign a zero key.
By default the hash table is sized to 512 bins which is expected to
be sufficient for light to moderate usage of thread specific data.

The hash table contains two additional type of entries.  They first
type is entry is called a 'key' entry and it is added to the hash during
tsd_create().  It is used to store the address of the destructor function
and it is used as an anchor point.  All tsd entries which use the same
key will be linked to this entry.  This is used during tsd_destory() to
quickly call the destructor function for all tsd associated with the key.
The 'key' entry may be looked up with tsd_hash_search() by passing the
key you wish to lookup and DTOR_PID constant as the pid.

The second type of entry is called a 'pid' entry and it is added to the
hash the first time a process set a key.  The 'pid' entry is also used
as an anchor and all tsd for the process will be linked to it.  This
list is using during tsd_exit() to ensure all registered destructors
are run for the process.  The 'pid' entry may be looked up with
tsd_hash_search() by passing the PID_KEY constant as the key, and
the process pid.  Note that tsd_exit() is called by thread_exit()
so if your using the Solaris thread API you should not need to call
tsd_exit() directly.
2010-12-07 10:02:32 -08:00
Brian Behlendorf
058de03caa Clear cv->cv_mutex when not in use
For debugging purposes the condition varaibles keep track of the
mutex used during a wait.  The idea is to validate that all callers
always use the same mutex.  Unfortunately, we have seen cases where
the caller reuses the condition variable with a different mutex but
in a way which is known to be safe.  My reading of the man pages
suggests you should not do this and always cv_destroy()/cv_init()
a new mutex.  However, there is overhead in doing this and it does
appear to be allowed under Solaris.

To accomidate this behavior cv_wait_common() and __cv_timedwait()
have been modified to clear the associated mutex when the last
waiter is dropped.  This ensures that while the condition variable
is in use the incorrect mutex case is detected.  It also allows the
condition variable to be safely recycled without requiring the
overhead of a cv_destroy()/cv_init() as long as it isn't currently
in use.

Finally, spin lock cv->cv_lock was removed because it is not required.
When the condition variable is used properly the caller will always
be holding the mutex so the spin lock is redundant.  The lock was
originally added because I expected to need to protect more than
just the cv->cv_mutex.  It turns out that was not the case.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-11-29 11:02:34 -08:00
Brian Behlendorf
8655ce492f Linux 2.6.36 compat, use fops->unlocked_ioctl()
As of linux-2.6.36 the last in-tree consumer of fops->ioctl() has
been removed and thus fops()->ioctl() has also been removed.  The
replacement hook is fops->unlocked_ioctl() which has existed in
kernel since 2.6.12.  Since the SPL only contains support back
to 2.6.18 vintage kernels, I'm not adding an autoconf check for
this and simply moving everything to use fops->unlocked_ioctl().
2010-11-10 13:16:12 -08:00
Brian Behlendorf
9b2048c26b Linux 2.6.36 compat, fs_struct->lock type change
In the linux-2.6.36 kernel the fs_struct lock was changed from a
rwlock_t to a spinlock_t.  If the kernel would export the set_fs_pwd()
symbol by default this would not have caused us any issues, but they
don't.  So we're forced to add a new autoconf check which sets the
HAVE_FS_STRUCT_SPINLOCK define when a spinlock_t is used.  We can
then correctly use either spin_lock or write_lock in our custom
set_fs_pwd() implementation.
2010-11-09 13:29:47 -08:00
Brian Behlendorf
1e18307b61 Fix incorrect krw_type_t type
Flagged by the default compile options on archlinux 2010.05, we should
be using the krw_t type not the krw_type_t type in the private data.

  module/splat/splat-rwlock.c: In function ‘splat_rwlock_test4_func’:
  module/splat/splat-rwlock.c:432:6: warning: case value ‘1’ not in
  enumerated type ‘krw_type_t’
2010-11-09 10:18:01 -08:00
Brian Behlendorf
23aa63cbf5 Fix 2.6.35 shrinker callback API change
As of linux-2.6.35 the shrinker callback API now takes an additional
argument.  The shrinker struct is passed to the callback so that users
can embed the shrinker structure in private data and use container_of()
to access it.  This removes the need to always use global state for the
shrinker.

To handle this we add the SPL_AC_3ARGS_SHRINKER_CALLBACK autoconf
check to properly detect the API.  Then we simply setup a callback
function with the correct number of arguments.  For now we do not make
use of the new 3rd argument.
2010-10-22 14:51:26 -07:00
Brian Behlendorf
a7958f7eef Support custom build directories
One of the neat tricks an autoconf style project is capable of
is allow configurion/building in a directory other than the
source directory.  The major advantage to this is that you can
build the project various different ways while making changes
in a single source tree.

For example, this project is designed to work on various different
Linux distributions each of which work slightly differently.  This
means that changes need to verified on each of those supported
distributions perferably before the change is committed to the
public git repo.

Using nfs and custom build directories makes this much easier.
I now have a single source tree in nfs mounted on several different
systems each running a supported distribution.  When I make a
change to the source base I suspect may break things I can
concurrently build from the same source on all the systems each
in their own subdirectory.

wget -c http://github.com/downloads/behlendorf/spl/spl-x.y.z.tar.gz
tar -xzf spl-x.y.z.tar.gz
cd spl-x-y-z

------------------------- run concurrently ----------------------
<ubuntu system>  <fedora system>  <debian system>  <rhel6 system>
mkdir ubuntu     mkdir fedora     mkdir debian     mkdir rhel6
cd ubuntu        cd fedora        cd debian        cd rhel6
../configure     ../configure     ../configure     ../configure
make             make             make             make
make check       make check       make check       make check

This is something the project has almost supported for a long time
but finishing this support should save me lots of time.
2010-09-05 21:49:05 -07:00
Brian Behlendorf
2b3543025c Stub out kmem cache defrag API
At some point we are going to need to implement the kmem cache
move callbacks to allow for kmem cache defragmentation.  This
commit simply lays a small part of the API ground work, it does
not actually implement any of this feature.  This is safe for
now because the move callbacks are just an optimization.  Even
if they are registered we don't ever really have to call them.
2010-08-27 14:23:42 -07:00
Li Wei
4be55565fe Fix stack overflow in vn_rdwr() due to memory reclaim
Unless __GFP_IO and __GFP_FS are removed from the file mapping gfp
mask we may enter memory reclaim during IO.  In this case shrink_slab()
entered another file system which is notoriously hungry for stack.
This additional stack usage may cause a stack overflow.  This patch
removes __GFP_IO and __GFP_FS from the mapping gfp mask of each file
during vn_open() to avoid any reclaim in the vn_rdwr() IO path.  The
original mask is then restored at vn_close() time.  Hats off to the
loop driver which does something similiar for the same reason.

  [...]
  shrink_slab+0xdc/0x153
  try_to_free_pages+0x1da/0x2d7
  __alloc_pages+0x1d7/0x2da
  do_generic_mapping_read+0x2c9/0x36f
  file_read_actor+0x0/0x145
  __generic_file_aio_read+0x14f/0x19b
  generic_file_aio_read+0x34/0x39
  do_sync_read+0xc7/0x104
  vfs_read+0xcb/0x171
  :spl:vn_rdwr+0x2b8/0x402
  :zfs:vdev_file_io_start+0xad/0xe1
  [...]

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-12 09:34:33 -07:00
Ricardo M. Correia
26f7245c7c Fix taskq code to not drop tasks when TQ_SLEEP is used.
When TQ_SLEEP is used, taskq_dispatch() should always succeed even if the
number of pending tasks is above tq->tq_maxalloc. This semantic is similar
to KM_SLEEP in kmem allocations, which also always succeed.

However, we cannot block forever otherwise there is a risk of deadlock.
Therefore, we still allow the number of pending tasks to go above
tq->tq_maxalloc with TQ_SLEEP, but we may sleep up to 1 second per task
dispatch, thereby throttling the task dispatch rate.

One of the existing splat tests was also augmented to test for this scenario.
The test would fail with the previous implementation but now it succeeds.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-02 11:20:31 -07:00
Brian Behlendorf
41f84a8d56 Strfree() should call kfree() not kmem_free()
Using kmem_free() results in deducting X bytes from the memory
accounting when --enable-debug is set.  Unfortunately, currently
the counterpart kmem_asprintf() and friends do not properly
account for memory allocated, so we must do the same on free.
If we don't then we end up with a negative number of lost bytes
reported when the module is unloaded.

A better long term fix would be to add the accounting in to the
allocation side but that's a project for another day.
2010-07-30 22:20:58 -07:00
Brian Behlendorf
099dc9c2d2 Add uninstall Makefile targets
Extend the Makefiles with an uninstall target to cleanly
remove a package which was installed with 'make install'.

Additionally, ensure a 'depmod -a' is run as part of the
install to update the module dependency information.
2010-07-28 14:55:32 -07:00
Brian Behlendorf
10129680f8 Ensure kmem_alloc() and vmem_alloc() never fail
The Solaris semantics for kmem_alloc() and vmem_alloc() are that they
must never fail when called with KM_SLEEP.  They may only fail if
called with KM_NOSLEEP otherwise they must block until memory is
available.  This is quite different from how the Linux memory
allocators work, under Linux a memory allocation failure is always
possible and must be dealt with.

At one point in the past the kmem code did properly implement this
behavior, however as the code evolved this behavior was overlooked
in places.  This patch goes through all three implementations of
the kmem/vmem allocation functions and ensures that they will all
block in the KM_SLEEP case when memory is not available.  They
may still fail in the KM_NOSLEEP case in which case the caller
is responsible for handling the failure.

Special care is taken in vmalloc_nofail() to avoid thrashing the
system on the virtual address space spin lock.  The down side of
course is if you do see a failure here, which is unlikely for
64-bit systems, your allocation will delay for an entire second.
Still this is preferable to locking up your system and it is the
best we can do given the constraints.

Additionally, the code was cleaned up to be much more readable
and comments were added to describe the various kmem-debug-*
configure options.  The default configure options remain:
"--enable-debug-kmem --disable-debug-kmem-tracking"
2010-07-26 15:47:55 -07:00
Brian Behlendorf
849c50e7f2 Fix two minor compiler warnings
In cmd/splat.c there was a comparison between an __u32 and an int.  To
resolve the issue simply use a __u32 and strtoul() when converting the
provided user string.

In module/spl/spl-vnode.c we should explicitly cast nd->last.name to
a const char * which is what is expected by the prototype.
2010-07-26 10:24:26 -07:00
Brian Behlendorf
8b0eb3f0dc Remove deadcode caused by removal of format1 arg
Commit 55abb0929e removed the never
used format1 argument of spl_debug_msg().  That in turn resulted
in some deadcode which should be removed since it's now useless.
2010-07-21 16:31:42 -07:00
Ricardo M. Correia
81672c0122 Display DEBUG keyword during module load when --enable-debug is used.
Signed-off-by: Ricardo M. Correia <ricardo.correia@oracle.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-07-20 15:31:03 -07:00
Ricardo M. Correia
2c762de830 Fix buggy kmem_{v}asprintf() functions
When the kvasprintf() call fails they should reset the arguments
by calling va_start()/va_copy() and va_end() inside the loop,
otherwise they'll try to read more arguments rather than starting
over and reading them from the beginning.

Signed-off-by: Ricardo M. Correia <ricardo.correia@oracle.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-07-20 13:51:46 -07:00
Brian Behlendorf
b17edc10a9 Prefix all SPL debug macros with 'S'
To avoid conflicts with symbols defined by dependent packages
all debugging symbols have been prefixed with a 'S' for SPL.
Any dependent package needing to integrate with the SPL debug
should include the spl-debug.h header and use the 'S' prefixed
macros.  They must also build with DEBUG defined.
2010-07-20 13:30:40 -07:00
Brian Behlendorf
55abb0929e Split <sys/debug.h> header
To avoid symbol conflicts with dependent packages the debug
header must be split in to several parts.  The <sys/debug.h>
header now only contains the Solaris macro's such as ASSERT
and VERIFY.  The spl-debug.h header contain the spl specific
debugging infrastructure and should be included by any package
which needs to use the spl logging.  Finally the spl-trace.h
header contains internal data structures only used for the log
facility and should not be included by anythign by spl-debug.c.

This way dependent packages can include the standard Solaris
headers without picking up any SPL debug macros.  However, if
the dependant package want to integrate with the SPL debugging
subsystem they can then explicitly include spl-debug.h.

Along with this change I have dropped the CHECK_STACK macros
because the upstream Linux kernel now has much better stack
depth checking built in and we don't need this complexity.

Additionally SBUG has been replaced with PANIC and provided as
part of the Solaris macro set.  While the Solaris version is
really panic() that conflicts with the Linux kernel so we'll
just have to make due to PANIC.  It should rarely be called
directly, the prefered usage would be an ASSERT or VERIFY.

There's lots of change here but this cleanup was overdue.
2010-07-20 13:29:35 -07:00
Ned Bass
8f813bb168 Proposed fix for oops on SIGINT in splat atomic:64-bit test.
The threads in the splat atomic:64-bit test share the data structure
atomic_priv_t ap, which lives on the kernel stack of the splat user-space
utility.  If splat terminates before the threads, accesses to that memory
location by the other threads become invalid.  Splat synchronizes with
the threads with the call:

wait_event_interruptible(ap.ap_waitq, splat_atomic_test1_cond(&ap, i));

Apparently, the SIGINT wakes and terminates splat prematurely, so that
GPFs or other bad things happen when the threads subsequently access ap.
This commit prevents this by using the uninterruptible form:

wait_event(ap.ap_waitq, splat_atomic_test1_cond(&ap, i));
2010-07-15 12:50:15 -07:00
Brian Behlendorf
d0bd694ca9 Fix -Werror=format-security compiler option
Noticed under Ubuntu kernel builds we should be passing a
format specifier and the string, not just the string.
2010-07-14 11:53:57 -07:00
Brian Behlendorf
f0ff89fc86 Linux 2.6.35 compat: filp_fsync() dropped 'stuct dentry *'
The prototype for filp_fsync() drop the unused argument 'stuct dentry *'.
I've fixed this by adding the needed autoconf check and moving all of
those filp related functions to file_compat.h.  This will simplify
handling any further API changes in the future.
2010-07-14 11:40:55 -07:00
Brian Behlendorf
a4bfd8ea1b Add __divdi3(), remove __udivdi3() kernel dependency
Up until now no SPL consumer attempted to perform signed 64-bit
division so there was no need to support this.  That has now
changed so I adding 64-bit division support for 32-bit platforms.
The signed implementation is based on the unsigned version.

Since the have been several bug reports in the past concerning
correct 64-bit division on 32-bit platforms I added some long
over due regression tests.  Much to my surprise the unsigned
64-bit division regression tests failed.

This was surprising because __udivdi3() was implemented by simply
calling div64_u64() which is provided by the kernel.  This meant
that the linux kernels 64-bit division algorithm on 32-bit platforms
was flawed.  After some investigation this turned out to be exactly
the case.

Because of this I was forced to abandon the kernel helper and
instead to fully implement 64-bit division in the spl.  There are
several published implementation out there on how to do this
properly and I settled on one proposed in the book Hacker's Delight.
Their proposed algoritm is freely available without restriction
and I have just modified it to be linux kernel friendly.

The update implementation now passed all the unsigned and signed
regression tests.  This should be functional, but not fast, which is
good enough for out purposes.  If you want fast too I'd strongly
suggest you upgrade to a 64-bit platform.  I have also reported the
kernel bug and we'll see if we can't get it fixed up stream.
2010-07-13 16:44:02 -07:00