A race condition in rwsem_is_locked() was fixed in Linux 2.6.33 and the fix was
backported to RHEL5 as of kernel 2.6.18-190.el5. Details can be found here:
https://bugzilla.redhat.com/show_bug.cgi?id=526092
The race condition was fixed in the kernel by acquiring the semaphore's
wait_lock inside rwsem_is_locked(). The SPL worked around the race condition
by acquiring the wait_lock before calling that function, but with the fix in
place it must not do that.
This commit implements an autoconf test to detect whether the fixed version of
rwsem_is_locked() is present. The previous version of rwsem_is_locked() was an
inline static function while the new version is exported as a symbol which we
can check for in Module.symvers. Depending on the result we correctly
implement the needed compatibility macros for proper spinlock handling.
Finally, we do the right thing with spin locks in RW_*_HELD() by using the
new compatibility macros. We only acquire the semaphore's wait_lock if we
are calling a rwsem_is_locked() that does not itself try to acquire the lock.
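The conditional locking described above can be sketched as a small
userspace model. This is only an illustration under assumptions: the
RWSEM_IS_LOCKED_TAKES_WAIT_LOCK macro, the spl_rwsem_lock()/unlock()
helpers, and every model_* name are hypothetical, and a counter stands in
for the real wait_lock spinlock so the behavior can be observed.

```c
#include <assert.h>

/* Stand-in for the kernel's struct rw_semaphore (illustrative only). */
struct model_rwsem {
	int count;			/* non-zero when the semaphore is held */
	int wait_lock_held;		/* models the wait_lock spinlock */
	int wait_lock_acquires;		/* how many times wait_lock was taken */
};

static void model_wait_lock(struct model_rwsem *s)
{
	assert(!s->wait_lock_held);	/* spinlocks are not recursive */
	s->wait_lock_held = 1;
	s->wait_lock_acquires++;
}

static void model_wait_unlock(struct model_rwsem *s)
{
	assert(s->wait_lock_held);
	s->wait_lock_held = 0;
}

#ifdef RWSEM_IS_LOCKED_TAKES_WAIT_LOCK	/* hypothetical autoconf result */
/* Fixed kernels: rwsem_is_locked() takes the wait_lock itself. */
static int model_rwsem_is_locked(struct model_rwsem *s)
{
	int locked;

	model_wait_lock(s);
	locked = (s->count != 0);
	model_wait_unlock(s);
	return locked;
}
/* The caller must not take the wait_lock, so these compile away. */
#define spl_rwsem_lock(s)	((void)0)
#define spl_rwsem_unlock(s)	((void)0)
#else
/* Legacy kernels: no locking inside, the caller holds the wait_lock. */
static int model_rwsem_is_locked(struct model_rwsem *s)
{
	return s->count != 0;
}
#define spl_rwsem_lock(s)	model_wait_lock(s)
#define spl_rwsem_unlock(s)	model_wait_unlock(s)
#endif

/* Models RW_LOCK_HELD(): the wait_lock is taken exactly once either way. */
static int model_rw_lock_held(struct model_rwsem *s)
{
	int held;

	spl_rwsem_lock(s);
	held = model_rwsem_is_locked(s);
	spl_rwsem_unlock(s);
	return held;
}
```

Building the model with or without -DRWSEM_IS_LOCKED_TAKES_WAIT_LOCK
exercises both branches. In either case the modeled wait_lock is never
taken recursively.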
Some new overhead and a small harmless race are introduced by this change.
This is because RW_READ_HELD() and RW_WRITE_HELD() now acquire and release
the wait_lock twice: once for the call to rwsem_is_locked() and once for
the call to rw_owner(). This can't be avoided if calling a rwsem_is_locked()
that takes the wait_lock, as it will in more recent kernels.
The other case which only occurs in legacy kernels could be optimized by
taking the lock only once, as was done prior to this commit. However, I
decided that the performance gain probably wasn't significant enough to
justify the messy special cases required.
The function spl_rw_get_owner() was only used to enable the aforementioned
optimization. Since it is no longer used, I removed it.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
To avoid symbol conflicts with dependent packages the debug
header must be split into several parts. The <sys/debug.h>
header now only contains the Solaris macros such as ASSERT
and VERIFY. The spl-debug.h header contains the SPL-specific
debugging infrastructure and should be included by any package
which needs to use the SPL logging. Finally the spl-trace.h
header contains internal data structures only used for the log
facility and should not be included by anything but spl-debug.c.
This way dependent packages can include the standard Solaris
headers without picking up any SPL debug macros. However, if
the dependent package wants to integrate with the SPL debugging
subsystem it can then explicitly include spl-debug.h.
Along with this change I have dropped the CHECK_STACK macros
because the upstream Linux kernel now has much better stack
depth checking built in and we don't need this complexity.
Additionally SBUG has been replaced with PANIC and provided as
part of the Solaris macro set. While the Solaris version is
really panic(), that conflicts with the Linux kernel so we'll
just have to make do with PANIC. It should rarely be called
directly; the preferred usage would be an ASSERT or VERIFY.
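How the three macros relate can be sketched in userspace as follows.
This is a model under assumptions, not the SPL definitions: here PANIC
only logs and bumps a hypothetical model_panic_count counter so the
behavior can be observed, whereas a real PANIC would halt the system.
DEBUG is assumed to be the build switch that enables ASSERT.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counter: lets the model observe PANIC without halting. */
static int model_panic_count;

/* Model of PANIC: report the call site and message, then record it. */
#define PANIC(...)							\
do {									\
	fprintf(stderr, "PANIC at %s:%d: ", __FILE__, __LINE__);	\
	fprintf(stderr, __VA_ARGS__);					\
	fputc('\n', stderr);						\
	model_panic_count++;						\
} while (0)

/* VERIFY is evaluated in every build. */
#define VERIFY(cond)							\
do {									\
	if (!(cond))							\
		PANIC("VERIFY(%s) failed", #cond);			\
} while (0)

/* ASSERT is only evaluated when DEBUG is defined (assumed switch). */
#ifdef DEBUG
#define ASSERT(cond)	VERIFY(cond)
#else
#define ASSERT(cond)	((void)0)
#endif

/* Example caller: prefers VERIFY/ASSERT over calling PANIC directly. */
static int model_checked_len(int len)
{
	VERIFY(len >= 0);	/* checked in every build */
	ASSERT(len < 4096);	/* checked only in DEBUG builds */
	return len;
}
```

Callers such as model_checked_len() never name PANIC themselves, which
matches the preferred usage described above.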
There's lots of change here but this cleanup was overdue.
Remove RW_COUNT() from the rwlock implementation. The idea was that it
could be used as a generic wrapper for getting at the internal state
of a rwlock. While a good idea it's proven problematic to keep it
correct for multiple archs and internal implementation changes. In
short it hasn't been worth the trouble.
With that and simplicity in mind things have been updated to use the
rwsem_is_locked() function instead of RW_COUNT for the RW_*_HELD()
functions. As for rw_upgrade() it remains only implemented for
the generic rwsem implementation. It remains to be determined if it's
worth the effort of adding a custom implementation for each arch.
Updated AUTHORS, COPYING, DISCLAIMER, and INSTALL files. Added
standardized headers to all source files to clearly indicate the
copyright, license, and to give credit where credit is due.
For kernels using the CONFIG_RWSEM_GENERIC_SPINLOCK implementation
nothing has changed. But if your kernel is building with arch
specific rwsems, rw_tryupgrade() has been disabled until it can
be implemented correctly. In particular, the x86 implementation
now leverages atomic primitives for serialization rather than
spinlocks. So to get this working again it will need to be
implemented as a cmpxchg for x86 and likely something similar
for other arches we are interested in. For now it's safest
to simply disable it.
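A cmpxchg-based upgrade could look roughly like the sketch below. It is
a userspace model using C11 atomics, not kernel code. The count encoding
(0 unlocked, N > 0 for N readers, MODEL_WRITER for one writer) and all
model_* names are assumptions made for illustration; the real x86 rwsem
uses different bias values.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical encoding: 0 unlocked, N > 0 readers, MODEL_WRITER writer. */
#define MODEL_WRITER	(-1L)

struct model_rwlock {
	atomic_long count;
};

static void model_init(struct model_rwlock *l)
{
	atomic_init(&l->count, 0);
}

/* Take a read hold unless a writer owns the lock. */
static int model_tryenter_read(struct model_rwlock *l)
{
	long old = atomic_load(&l->count);

	while (old >= 0) {
		/* On failure 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&l->count, &old, old + 1))
			return 1;
	}
	return 0;
}

/*
 * Upgrade succeeds only if the caller is the sole reader: a single
 * compare-and-swap moves the count from 1 reader to the writer value.
 */
static int model_tryupgrade(struct model_rwlock *l)
{
	long expected = 1;

	return atomic_compare_exchange_strong(&l->count, &expected,
	    MODEL_WRITER);
}

/* Drop whichever kind of hold is present. */
static void model_exit(struct model_rwlock *l)
{
	long old = atomic_load(&l->count);

	assert(old != 0);		/* must be held */
	if (old == MODEL_WRITER)
		atomic_store(&l->count, 0);
	else
		atomic_fetch_sub(&l->count, 1);
}
```

Because the check and the transition happen in one atomic operation, no
second reader can slip in between them, which is what the spinlock
provided in the generic implementation.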
As part of the 2.6.28 cleanup which moved all the linux/include/asm/
headers into linux/arch, the guard macros for many header files
changed. The i386 rwsem implementation keys off this guard macro to
ensure the internal members of the rwsem structure are interpreted
correctly. This change checks for the new guard macro in addition
to the old one. The implementation of the rwsem has not changed
for i386 so this is safe and correct.
We need to directly call __init_rwsem() or the name gets expanded
to SEM(lock-name). This is safe and correct for the supported arches
x86/x86_64/ppc/ppc64.
The behavior of RW_*_HELD was updated because it was not quite right.
It is not sufficient to return non-zero when the lock is held; we must
only do this when the current task is the holder.
This means we need to track the lock owner which is not something
tracked in a Linux semaphore. After some experimentation the
solution I settled on was to embed the Linux semaphore at the start
of a larger krwlock_t structure which includes the owner field.
This maintains good performance and allows us to cleanly integrate
with the kernel lock analysis tools. My reasons:
1) By placing the Linux semaphore at the start of krwlock_t we can
then simply cast krwlock_t to a rw_semaphore and pass that on to
the Linux kernel. This allows us to use #defines so the preprocessor
can do direct replacement of the Solaris primitive with the Linux
equivalent. This is important because it then maintains the location
information for each rw_* call point.
2) Additionally, by adding the owner to krwlock_t we can keep this
needed extra information adjacent to the lock itself. This removes
the need for a fancy lookup to get the owner which is optimal for
performance. We can also leverage the existing spin lock in the
semaphore to ensure owner is updated correctly.
3) All helper functions which do not need to strictly be implemented
as a define to preserve location information can be done as a static
inline function.
4) Adding the owner to krwlock_t allows us to remove all memory
allocations done during lock initialization. This is good for all
the obvious reasons, but we do give up the ability to specify the lock
name. The Linux profiling tools will stringify the lock name used
in the code via the preprocessor and use that.
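The layout described in points 1) through 4) can be sketched with
userspace stand-ins. Only krwlock_t, rw_owner, and SEM() echo names used
in these messages; the model_* types and helpers are hypothetical, and
the current task is passed explicitly because there is no kernel
'current' in userspace.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for kernel types (illustrative only). */
struct model_task { int pid; };
struct model_rw_semaphore { int writer; };

typedef struct {
	struct model_rw_semaphore rw_rwlock;	/* must be the first member */
	struct model_task *rw_owner;		/* write holder, or NULL */
} krwlock_t;

/* The cast below is only valid while the semaphore sits at offset 0. */
_Static_assert(offsetof(krwlock_t, rw_rwlock) == 0,
    "semaphore must be the first member of krwlock_t");

#define SEM(rwp)	((struct model_rw_semaphore *)(rwp))

static void model_down_write(struct model_rw_semaphore *sem)
{
	assert(!sem->writer);
	sem->writer = 1;
}

static void model_up_write(struct model_rw_semaphore *sem)
{
	assert(sem->writer);
	sem->writer = 0;
}

/* Macros, so each expansion keeps the caller's file and line. */
#define model_rw_enter_write(rwp, task)		\
do {						\
	model_down_write(SEM(rwp));		\
	(rwp)->rw_owner = (task);		\
} while (0)

#define model_rw_exit_write(rwp)		\
do {						\
	(rwp)->rw_owner = NULL;			\
	model_up_write(SEM(rwp));		\
} while (0)

/* Helpers that need no location information can be static inline. */
static inline int model_rw_write_held(krwlock_t *rwp, struct model_task *task)
{
	/* Held is not enough: the asking task must be the holder. */
	return SEM(rwp)->writer && rwp->rw_owner == task;
}
```

No allocation is needed to initialize such a lock, and a second task
asking model_rw_write_held() gets zero even while the lock is held.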
Updated rwlocks validated on:
- SLES10 (ppc64)
- SLES11 (x86_64)
- CHAOS4.2 (x86_64)
- RHEL5.3 (x86_64)
- RHEL6 (x86_64)
- FC11 (x86_64)
It turns out that the previous rwlock implementation worked well but
did not integrate properly with the upstream kernel lock profiling/
analysis tools. This is a major problem since it would be awfully
nice to be able to use the automatic lock checker and profiler.
The problem is that the upstream lock tools use the pre-processor
to create a lock class for each uniquely named lock. Since the
rwsem was embedded in a wrapper structure the name was always the
same. The effect was that we only ended up with one lock class for
the entire SPL which caused the lock dependency checker to flag
nearly everything as a possible deadlock.
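The single lock class effect can be reproduced with a userspace model
patterned after the kernel's init_rwsem() macro, which declares a static
key and stringifies its argument at each expansion site. All model_*
names are hypothetical, and a static char stands in for the lock class
key.

```c
#include <assert.h>
#include <string.h>

struct model_rwsem {
	const char *name;	/* stringified lock expression */
	const void *key;	/* stands in for the lock class key */
};

static void model_init_rwsem_named(struct model_rwsem *sem,
    const char *name, const void *key)
{
	assert(name != NULL && key != NULL);
	sem->name = name;
	sem->key = key;
}

/* One static key and one stringified name per expansion site. */
#define model_init_rwsem(sem)					\
do {								\
	static char key_;					\
	model_init_rwsem_named((sem), #sem, &key_);		\
} while (0)

/* Old approach: the rwsem is hidden inside a wrapper structure. */
struct model_wrapper {
	struct model_rwsem rw_sem;
};

/*
 * Every wrapped lock is initialized through this one expansion site,
 * so all of them share a single name and a single key.
 */
static void model_wrapped_init(struct model_wrapper *rwp)
{
	model_init_rwsem(&rwp->rw_sem);
}
```

Two locks initialized through model_wrapped_init() end up with the same
key and the name "&rwp->rw_sem", while two locks initialized directly
with model_init_rwsem() get distinct keys and their own names.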
The solution was to directly map a krwlock to a Linux rwsem using
a typedef, thereby eliminating the wrapper structure. This was not
done initially because the rwsem implementation is specific to the arch.
It is not possible to fully implement the Solaris krwlock API using only
the provided rwsem API. It can only be done by directly accessing some of
the internal data members of the rwsem structure.
For example, the Linux API provides a different function for dropping
a reader vs writer lock. Whereas the Solaris API uses the same function
and the caller does not pass in what type of lock it is. This means to
properly drop the lock we need to determine if the lock is currently a
reader or writer lock. Then we need to call the proper Linux API function.
Unfortunately, there is no provided API for this so we must extract this
information directly from the arch specific lock implementation. This is
all doable, and what I did, but it does complicate things considerably.
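The dispatch needed for a Solaris-style rw_exit() can be sketched in
userspace as follows. The model_* names and the explicit readers/writer
fields are assumptions for illustration. In the real code the state has
to be decoded from the arch specific rwsem internals, which is the
complicated part.

```c
#include <assert.h>

/* Stand-in rwsem with its state exposed for the model. */
struct model_rwsem {
	int readers;		/* number of read holds */
	int writer;		/* non-zero when write held */
	int up_read_calls;	/* bookkeeping for the model only */
	int up_write_calls;
};

/* Models the Linux up_read(): valid only for a read hold. */
static void model_up_read(struct model_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
	s->up_read_calls++;
}

/* Models the Linux up_write(): valid only for a write hold. */
static void model_up_write(struct model_rwsem *s)
{
	assert(s->writer);
	s->writer = 0;
	s->up_write_calls++;
}

/* Arch specific in real code; trivial in this model. */
static int model_write_locked(const struct model_rwsem *s)
{
	return s->writer != 0;
}

/* One exit entry point, as in the Solaris API: pick the right release. */
static void model_rw_exit(struct model_rwsem *s)
{
	if (model_write_locked(s))
		model_up_write(s);
	else
		model_up_read(s);
}
```

The caller never states which kind of hold it has, so the only thing
separating a correct release from a corrupted lock is the state check in
model_rw_exit().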
The good news is that, in addition to the profiling benefits of this
change, we may see performance improvements due to slightly reduced
overhead when creating rwlocks and manipulating them.
The only function I was forced to sacrifice was rw_owner() because this
information is simply not stored anywhere in the rwsem. Luckily this
appears not to be a commonly used function on Solaris, and it is my
understanding it is mainly used for debugging anyway.
In addition to the core rwlock changes, extensive updates were made to
the rwlock regression tests. Each class of test was extended to provide
more API coverage and to be more rigorous in checking for misbehavior.
This is a pretty significant change and with that in mind I have been
careful to validate it on several platforms before committing. The full
SPLAT regression test suite was run numerous times on all of the following
platforms. This includes various kernels ranging from 2.6.16 to 2.6.29.
- SLES10 (ppc64)
- SLES11 (x86_64)
- CHAOS4.2 (x86_64)
- RHEL5.3 (x86_64)
- RHEL6 (x86_64)
- FC11 (x86_64)
- Replacing all BUG_ON()'s with proper ASSERT()'s
- Using ENTRY, EXIT, GOTO, and RETURN macros to instrument call paths
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@78 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
your task is rescheduled to a different cpu after you've
taken the lock but before RW_LOCK_HELD is called.
We need the spinlock to ensure there is a wmb() there.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@68 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
Update check.sh script to take V=1 env var so you can run it verbosely as
follows if you're chasing something: sudo make check V=1
Add new kobj api and needed regression tests to allow reading of files from
within the kernel. Normally that's not something I support but the spa layer
needs the support for its config file.
Add some more missing stub headers
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@38 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
muck with #includes in existing Solaris style source to get it
to find the right stuff.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@18 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c