It turns out that the previous rwlock implementation worked well but
did not integrate properly with the upstream kernel lock profiling/
analysis tools. This is a major problem since it would be awfully
nice to be able to use the automatic lock checker and profiler.
The problem is that the upstream lock tools use the pre-processor
to create a lock class for each uniquely named locked. Since the
rwsem was embedded in a wrapper structure the name was always the
same. The effect was that we only ended up with one lock class for
the entire SPL which caused the lock dependency checker to flag
nearly everything as a possible deadlock.
The solution was to directly map a krwlock to a Linux rwsem using
a typedef there by eliminating the wrapper structure. This was not
done initially because the rwsem implementation is specific to the arch.
To fully implement the Solaris krwlock API using only the provided rwsem
API is not possible. It can only be done by directly accessing some of
the internal data member of the rwsem structure.
For example, the Linux API provides a different function for dropping
a reader vs writer lock. Whereas the Solaris API uses the same function
and the caller does not pass in what type of lock it is. This means to
properly drop the lock we need to determine if the lock is currently a
reader or writer lock. Then we need to call the proper Linux API function.
Unfortunately, there is no provided API for this so we must extracted this
information directly from arch specific lock implementation. This is
all do able, and what I did, but it does complicate things considerably.
The good news is that in addition to the profiling benefits of this
change. We may see performance improvements due to slightly reduced
overhead when creating rwlocks and manipulating them.
The only function I was forced to sacrafice was rw_owner() because this
information is simply not stored anywhere in the rwsem. Luckily this
appears not to be a commonly used function on Solaris, and it is my
understanding it is mainly used for debugging anyway.
In addition to the core rwlock changes, extensive updates were made to
the rwlock regression tests. Each class of test was extended to provide
more API coverage and to be more rigerous in checking for misbehavior.
This is a pretty significant change and with that in mind I have been
careful to validate it on several platforms before committing. The full
SPLAT regression test suite was run numberous times on all of the following
platforms. This includes various kernels ranging from 2.6.16 to 2.6.29.
- SLES10 (ppc64)
- SLES11 (x86_64)
- CHAOS4.2 (x86_64)
- RHEL5.3 (x86_64)
- RHEL6 (x86_64)
- FC11 (x86_64)
- Configure check, the div64_64() function was renamed to
div64_u64() as of 2.6.26.
- Configure check, the global_page_state() fuction was introduced
in 2.6.18 kernels. The earlier 2.6.16 based SLES10 must not try
and use it, thankfully get_zone_counts() is still available.
- To simplify debugging poison all symbols aquired dynamically
using spl_kallsyms_lookup_name() with SYMBOL_POISON.
- Add console messages when the user mode helpers fail.
- spl_kmem_init_globals() use bit shifts instead of division.
- When the monotonic clock is unavailable __gethrtime() must perform
the HZ division as an 'unsigned long long' because the SPL only
implements __udivdi3(), and not __divdi3() for 'long long' division
on 32-bit arches.
We need dependent packages to be able to include spl_config.h so they
can leverage the configure checks the SPL has done. This is important
because several of the spl headers need the results of these checks to
work properly. Unfortunately, the autoheader build product is always
private to a particular build and defined certain common things.
(PACKAGE, VERSION, etc). This prevents other packages which also use
autoheader from being include because the definitions conflict. To
avoid this problem the SPL build system leverage AH_BOTTOM to include
a spl_unconfig.h at the botton of the autoheader build product. This
custom include undefs all known shared symbols to prevent the confict.
This does however mean that those definition are also not availble
to the SPL package either. The SPL package therefore uses the
equivilant SPL_META_* definitions.
Remove all instances of functions being reimplemented in the SPL.
When the prototypes are available in the linux headers but the
function address itself is not exported use kallsyms_lookup_name()
to find the address. The function name itself can them become a
define which calls a function pointer. This is preferable to
reimplementing the function in the SPL because it ensures we get
the correct version of the function for the running kernel. This
is actually pretty safe because the prototype is defined in the
headers so we know we are calling the function properly.
This patch also includes a rhel5 kernel patch we exports the needed
symbols so we don't need to use kallsyms_lookup_name(). There are
autoconf checks to detect if the symbol is exported and if so to
use it directly. We should add patches for stock upstream kernels
as needed if for no other reason than so we can easily track which
additional symbols we needed exported. Those patches can also be
used by anyone willing to rebuild their kernel, but this should
not be a requirement. The rhel5 version of the export-symbols
patch has been applied to the chaos kernel.
Additional fixes:
1) Implement vmem_size() function using get_vmalloc_info()
2) SPL_CHECK_SYMBOL_EXPORT macro updated to use $LINUX_OBJ instead
of $LINUX because Module.symvers is a build product. When
$LINUX_OBJ != $LINUX we will not properly detect exported symbols.
3) SPL_LINUX_COMPILE_IFELSE macro updated to add include2 and
$LINUX/include search paths to allow proper compilation when
the kernel target build directory is not the source directory.
Minimal support added for the zone_get_hostid() function. Only
global zones are supported therefore this function must be called
with a NULL argumment. Additionally, I've added the HW_HOSTID_LEN
define and updated all instances where a hard coded magic value
of 11 was used; "A good riddance of bad rubbish!"
Support added to provide reasonable values for the global Solaris
VM variables: minfree, desfree, lotsfree, needfree. These values
are set to the sum of their per-zone linux counterparts which
should be close enough for Solaris consumers.
When a non-GPL app links against the SPL we cannot use the udev
interfaces, which means non of the device special files are created.
Because of this I had added a poor mans udev which cause the SPL
to invoke an upcall and create the basic devices when a minor
is registered. When a minor is unregistered we use the vnode
interface to unlink the special file.