Commit Graph

3536 Commits

Author SHA1 Message Date
Brian Behlendorf 5f6d0b6f5a Handle block pointers with a corrupt logical size
The general strategy used by ZFS to verify that blocks are valid is
to checksum everything.  This has the advantage of being extremely
robust and generically applicable regardless of the contents of
the block.  If a blocks checksum is valid then its contents are
trusted by the higher layers.

This system works exceptionally well as long as bad data is never
written with a valid checksum.  If this does somehow occur due to
a software bug or a memory bit-flip on a non-ECC system it may
result in kernel panic.

One such place where this could occur is if somehow the logical
size stored in a block pointer exceeds the maximum block size.
This will result in an attempt to allocate a buffer greater than
the maximum block size causing a system panic.

To prevent this from happening the arc_read() function has been
updated to detect this specific case.  If a block pointer with an
invalid logical size is passed it will treat the block as if it
contained a checksum error.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2678
2014-10-23 09:20:52 -07:00
Ned Bass bc151f7b31 Remove checks for mandatory locks
The Linux VFS handles mandatory locks generically so we shouldn't
need to check for conflicting locks in zfs_read(), zfs_write(), or
zfs_freesp().  Linux 3.18 removed the lock_may_read() and
lock_may_write() interfaces which we were relying on for this
purpose.  Rather than emulating those interfaces we remove the
redundant checks.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2804
2014-10-22 11:06:53 -07:00
Matthew Ahrens 88904bb3e3 Illumos 5162 - zfs recv should use loaned arc buffer to avoid copy
5162 zfs recv should use loaned arc buffer to avoid copy
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Bayard Bell <Bayard.Bell@nexenta.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  https://www.illumos.org/issues/5162
  https://github.com/illumos/illumos-gate/commit/8a90470

Porting notes:
  Fix spelling error 's/arena/area/' in dmu.c.
  In restore_write() declare bonus and abuf at the top of the function.

Ported by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2696
2014-10-21 16:32:11 -07:00
Matthew Ahrens 4b20a6f509 Illumos 5150 - zfs clone of a defer_destroy snapshot causes strangeness
When a clone is created of a snapshot that has been marked for
deferred destroy (with "zfs destroy -d"), the clone "inherits" the
defer_destroy flag from the origin, and any snapshots of the clone
"inherit" the defer_destroy flag from the clone. This causes a strange
situation where the clone's snapshots are marked for defer_destroy but
they have no holds or clones. If the clone's snapshot gets a hold or
clone, which is then deleted, we will honor the incorrectly-set
defer_destroy flag and delete the snapshot!

Steps to reproduce:

  * zpool create test c1t1d0
  * zfs create test/fs
  * zfs snapshot test/fs@a
  * zfs clone test/fs@a test/clone
  * zfs destroy -d test/fs@a
  * zfs clone test/fs@a test/clone2
  * zfs snapshot test/clone2@a
  * zfs hold hld test/clone2@a
  * zfs release hld test/clone2@a
  * zfs list -r -t all test

  <test/clone2@a has been destroyed>

We noticed that this causes dcenter to get very confused, because it
treats snapshots that are marked defer_destroy as not existing. So it
won't see any snapshots of the clone that's marked defer_destroy.

5150 - zfs clone of a defer_destroy snapshot causes strangeness
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Max Grossman <max.grossman@delphix.com>
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/projects/illumos-gate//issues/5150
  https://github.com/illumos/illumos-gate/commit/42fcb65

Ported by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2690
2014-10-21 15:26:58 -07:00
Matthew Ahrens 6c59307a3c Illumos 3693 - restore_object uses at least two transactions to restore an object
Restore_object should not use two transactions to restore an object:
  * one transaction is used for dmu_object_claim
  * another transaction is used to set compression, checksum and most
    importantly bonus data
  * furthermore dmu_object_reclaim internally uses multiple transactions
  * dmu_free_long_range frees chunks in separate transactions
  * dnode_reallocate is executed in a distinct transaction

The fact the dnode_allocate/dnode_reallocate are executed in one
transaction and bonus (re-)population is executed in a different
transaction may lead to violation of ZFS consistency assertions if the
transactions are assigned to different transaction groups.  Also, if
the first transaction group is successfully written to a permanent
storage, but the second transaction is lost, then an invalid dnode may
be created on the stable storage.

3693 restore_object uses at least two transactions to restore an object
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Andriy Gapon <andriy.gapon@hybridcluster.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Original authors: Matthew Ahrens and Andriy Gapon

References:
  https://www.illumos.org/issues/3693
  https://github.com/illumos/illumos-gate/commit/e77d42e

Ported by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2689
2014-10-21 15:26:50 -07:00
Tim Chase 356d9ed4c8 Don't perform ACL-to-mode translation on empty ACL
In zfs_acl_chown_setattr(), the zfs_mode_comput() function is used to
create a traditional mode value based on an ACL.  If no ACL exists, this
processing shouldn't be done.  Problems caused by this were most evident
on version 4 filesystems which not only don't have system attributes,
and also frequently have empty ACLs. On such filesystems, performing a
chown() operation could have the effect of dirtying the mode bits in
memory but not on the file system as follows:

	# create a file with typical mode of 664
	echo test > test
	chown anyuser test
	ls -l test

and the mode will show up as all zeroes.  Unmounting/mounting and/or
exporting/importing the filesystem will reveal the proper mode again.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1264
2014-10-21 09:23:27 -07:00
Daniil Lunev 62bdd5eb7a Illumos 4924 - LZ4 Compression for metadata
Reviewed by Matthew Ahrens <mahrens@delphix.com>
Reviewed by Saso Kiselkov <skiselkov.ml@gmail.com>
Approved by: Christopher Siden <christopher.siden@delphix.com>

References:
  https://github.com/illumos/illumos-gate/commit/b8289d2
  https://www.illumos.org/issues/3756

Porting notes:

The static function zfs_prop_activate_feature() was removed because
this change removes the only caller.  The function was not removed
from Illumos but instead left as dead code.  However, to keep gcc
happy it was removed from Linux and may be easily restored if needed.

Ported by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1540
2014-10-20 16:17:49 -07:00
Brian Behlendorf ba232d8aea Suppress AIO kmem warnings
The new zpl_aio_write() and zpl_aio_read() functions use kmem_alloc()
to allocate enough memory to hold the vectorized IO.  While this
allocation will be small it's been observed in practice to sometimes
slightly exceed the 8K warning threshold by a few kilobytes.
Therefore, the KM_NODEBUG flag has been added to suppress warning.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #2774
2014-10-20 16:10:25 -07:00
Brian Behlendorf 599662c538 Remove kern_path() wrapper
The kern_path() function has been available since Linux 2.6.28.
There is no longer a need to maintain this compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:52 -07:00
Brian Behlendorf 3d5392cefa Remove kvasprintf() wrapper
The kvasprintf() function has been available since Linux 2.6.22.
There is no longer a need to maintain this compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:52 -07:00
Brian Behlendorf 0fac9c9e6d Remove proc_handler() wrapper
As of Linux 2.6.32 the proc handlers where updated to expect only
five arguments.  Therefore there is no longer a need to maintain
this compatibility code and this infrastructure can be simplified.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:52 -07:00
Brian Behlendorf 68a829b29d Remove credential configure checks.
The groups_search() function was never exported by a mainline kernel
therefore we drop this compatibility code and always provide our own
implementation.

Additionally, the cred_t structure has been available since 2.6.29
so there is no longer a need to maintain compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf 137af025f6 Remove set_fs_pwd() configure check
This function has never been exported by any mainline and was only
briefly available under RHEL5.  Therefore this check is being removed
and the code update to always use the wrapper function.

The next step will be to eliminate all this code.  If ZFS were updated
not to assume that it's pwd was / there would be no need for this.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf 3c49a16989 Remove user_path_dir() wrapper
The user_path_dir() function has been available since Linux 2.6.27.
There is no longer a need to maintain this compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf 44778f4110 Remove kallsyms_lookup_name() wrapper
After the removable of get_vmalloc_info(), the unused global memory
variables, and the optional dcache/icache shrinkers there is no
longer a need for the kallsyms compatibility code.  This allows
us to eliminate another brittle area of the code by removing the
kernel upcall this functionality depended on for older kernels.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf 89a461e70c Remove shrink_{i,d}node_cache() wrappers
This is optional functionality which may or may not be useful to
ZFS when using older kernels.  It is never a hard requirement.
Therefore this functionality is being removed from the SPL and
a simpler slimmed down version will be added to ZFS.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf 8bbbe46f86 Remove global memory variables
Platforms such as Illumos and FreeBSD have historically provided
global variables which summerize the memory state of a system.
Linux on the otherhand doesn't expose any of this information
to kernel modules and uses entirely different mechanisms for
memory management.

In order to simplify the original ZFS port to Linux these global
variables were emulated by the SPL for the benefit of ZFS.  As ZoL
has matured over the years it has moved steadily away from these
interfaces and now no longer depends on them at all.

Therefore, this patch completely removes the global variables
availrmem, minfree, desfree, lotsfree, needfree, swapfs_minfree,
and swapfs_reserve.  This greatly simplifies the memory management
code and eliminates a common area of confusion.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf e1310afae3 Remove get_vmalloc_info() wrapper
The get_vmalloc_info() function was used to back the vmem_size()
function.  This was always problematic and resulted in brittle
code because the kernel never provided a clean interface for
modules.

However, it turns out that the only caller of this function in
ZFS uses it to determine the total virtual address space size.
This can be determined easily without get_vmalloc_info() so
vmem_size() has been updated to take this approach which allows
us to shed the get_vmalloc_info() dependency.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf 50e41ab1e1 Remove on_each_cpu() wrapper
The on_each_cpu() function has been available since Linux 2.6.27.
There is no longer a need to maintain this compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf 2bc5666f53 Remove i_mutex() configure check
The inode structure has used i_mutex as its internal locking
primitive since 2.6.16.  The compatibility code to check for
the previous semaphore primitive has been removed.  However,
the wrapper function itself is being kept because it's entirely
possible this primitive will change again to allow finer grained
locking.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf 82f2f1a3af Simplify the time compatibility wrappers
Many of the time functions had grown overly complex in order to
handle kernel compatibility issues.  However, as of Linux 2.6.26
all the required functionality is available.  This allows us to
retire numerous configure checks and greatly simplify the time
compatibility wrappers.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:50 -07:00
Brian Behlendorf 87f8055a91 Map highbit64() to fls64()
The fls64() function has been available since Linux 2.6.16 and
it should be used to implemented highbit64().  This allows us
to provide an optimized implementation and simplify the code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:50 -07:00
Brian Behlendorf 9c91800d19 Remove CTL_UNNUMBERED sysctl interface
Support for the CTL_UNNUMBERED sysctl interface was removed in
Linux 2.6.19.  There is no longer any reason to maintain this
compatibility code.  There also issue any reason to keep around
the CTL_NAME macro and helpers so they have been retired.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:50 -07:00
Brian Behlendorf b38bf6a4e3 Remove register_sysctl() compatibility code
The register_sysctl() interface has been stable since Linux 2.6.21.
There is no longer a need to maintain compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:50 -07:00
Brian Behlendorf bb4dee3df2 Remove utsname() wrapper
There is no longer a need to wrap this because utsname() is provided
by the kernel and can be called directly.  This will require a small
change in the ZFS code because utsname is expected to be a global
structure and not a function.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:41 -07:00
Brian Behlendorf aa363c5c05 Remove sysctl_vfs_cache_pressure assumption
The generic SPL cache shrinkers make the assumption that the
caches only contain VFS cache data and therefore should be scaled
based on vfs_cache_pressure.  This is not strictly true and it
should not be assumed.

Removing this tuning should not have any impact on the stock
behavior because vfs_cache_pressure=100 by default.  This means
that no scaling will take place.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:07:28 -07:00
Brian Behlendorf a80d69caf0 Remove adaptive mutex implementation
Since the Linux 2.6.29 kernel all mutexes have been adaptive mutexs.
There is no longer any point in keeping this code so it is being
removed to simplify the code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:07:28 -07:00
Brian Behlendorf 3a92530563 Update code to use misc_register()/misc_deregister()
When the SPL was originally written it was designed to use the
device_create() and device_destroy() functions.  Unfortunately,
these functions changed considerably over the years making them
difficult to rely on.

As it turns out a better choice would have been to use the
misc_register()/misc_deregister() functions.  This interface
for registering character devices has remained stable, is simple,
and provides everything we need.

Therefore the code has been reworked to use this interface.  The
higher level ZFS code has always depended on these same interfaces
so this is also as a step towards minimizing our kernel dependencies.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:07:28 -07:00
Brian Behlendorf 0cb3dafccd Update SPLAT to use kmutex_t for portability
For consistency throughout the code update the SPLAT infrastructure
to use the wrapped mutex interfaces.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:07:28 -07:00
Brian Behlendorf 6203295438 Make license compatibility checks consistent
Apply the license specified in the META file to ensure the
compatibility checks are all performed consistently.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:07:28 -07:00
Brian Behlendorf 33074f2254 Handle NULL mirror child vdev
When selecting a mirror child it's possible that map allocated by
vdev_mirror_map_allc() contains a NULL for the child vdev.  In
this case the child should be skipped and the read issues to
another member of the mirror.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #1744
2014-10-17 14:59:05 -07:00
Brian Behlendorf f0e324f25d Update utsname support
Modify the code to use the utsname() kernel function rather than
a global variable.  This results is cleaner more portable code
because utsname() is already provided by the kernel and can be
easily emulated in user space via uname(2).  This means that it
will behave consistently in both contexts.

This is also has the benefit that it allows the removal of a few
_KERNEL pre-processor conditions.  And it also is a pre-requisite
for a proper FUSE port because we need to provide a valid utsname.

Finally, it allows us to remove this functionality from the SPL
and all the related compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2757
2014-10-17 14:58:57 -07:00
Brian Behlendorf 050d22b068 Remove shrink_dcache_memory() and shrink_icache_memory()
This functionality is optional and until Linux 3.0, which
provided per-filesystem shinkers, they was never a reasonable
interface.  Therefore, this functionality is being dropped
for earlier kernels.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2757
2014-10-17 14:58:50 -07:00
Brian Behlendorf 60bba62814 Update code to use misc_register()/misc_deregister()
When ZPIOS was originally written it was designed to use the
device_create() and device_destroy() functions.  Unfortunately,
these functions changed considerably over the years making them
difficult to rely on.

As it turns out a better choice would have been to use the
misc_register()/misc_deregister() functions.  This interface
for registering character devices has remained stable, is simple,
and provides everything we need.

Therefore the code has been reworked to use this interface.  The
higher level ZFS code has always depended on these same interfaces
so this is also as a step towards minimizing our kernel dependencies.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2757
2014-10-17 14:58:44 -07:00
Brian Behlendorf e6659763c6 Improve VERIFY() error in dmu_write()
This is a debug patch designed to ensure an error code is logged
to the console when this VERIFY() is hit.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Issue #1440
2014-10-08 09:18:14 -07:00
Brian Behlendorf 8878261fc9 Fix CPU_SEQID use in preemptible context
Commit e022864 introduced a regression for kernels which are built
with CONFIG_DEBUG_PREEMPT.  The use of CPU_SEQID in a preemptible
context causes zio_nowait() to trigger the BUG.  Since CPU_SEQID
is simply being used as a random index the usage here is safe. To
resolve the issue preempt is disable while calling CPU_SEQID.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #2769
2014-10-07 16:40:29 -07:00
Matthew Ahrens e022864d19 Illumos 5176 - lock contention on godfather zio
5176 lock contention on godfather zio
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Reviewed by: Bayard Bell <Bayard.Bell@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  https://www.illumos.org/issues/5176
  https://github.com/illumos/illumos-gate/commit/6f834bc

Porting notes:

Under Linux max_ncpus is defined as num_possible_cpus().  This is
largest number of cpu ids which might be available during the life
time of the system boot.  This value can be larger than the number
of present cpus if CONFIG_HOTPLUG_CPU is defined.

Ported by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2711
2014-10-07 11:24:24 -07:00
Brian Behlendorf 81857a34d1 Fix bug in SPLAT taskq:front
While running SPLAT on a kernel with CONFIG_DEBUG_ATOMIC_SLEEP
enabled the taskq:front was flagged as a test which might sleep
which in an unsafe context.  Specifically, the splat_vprint()
function which internally takes a mutex was being called under
a spin lock.  Moving the log function outside the spin lock
cleanly solves this issue.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-03 10:42:20 -07:00
Richard Yao 83e9986f6e Implement -t option to zpool create for temporary pool names
Creating virtual machines that have their rootfs on ZFS on hosts that
have their rootfs on ZFS causes SPA namespace collisions when the
standard name rpool is used. The solution is either to give each guest
pool a name unique to the host, which is not always desireable, or boot
a VM environment containing an ISO image to install it, which is
cumbersome.

26b42f3f9d introduced `zpool import -t
...` to simplify situations where a host must access a guest's pool when
there is a SPA namespace conflict. We build upon that to introduce
`zpool import -t tname ...`. That allows us to create a pool whose
in-core name is tname, but whose on-disk name is the normal name
specified.

This simplifies the creation of machine images that use a rootfs on ZFS.
That benefits not only real world deployments, but also ZFSOnLinux
development by decreasing the time needed to perform rootfs on ZFS
experiments.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2417
2014-09-30 10:46:59 -07:00
Tim Chase cb08f06307 Perform whole-page page truncation for hole-punching under a range lock
As an attempt to perform the page truncation more optimally, the
hole-punching support added in 223df0161f
truncated performed the operation in two steps: first, sub-page "stubs"
were zeroed under the range lock in zfs_free_range() using the new
zfs_zero_partial_page() function and then the whole pages were truncated
within zfs_freesp().  This left a window of opportunity during which
the full pages could be touched.

This patch closes the window by moving the whole-page truncation into
zfs_free_range() under the range lock.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2733
2014-09-29 09:22:03 -07:00
Max Grossman 36283ca233 Illumos 5138 - add tunable for maximum number of blocks freed in one txg
Reviewed by: Adam Leventhal <adam.leventhal@delphix.com>
Reviewed by: Mattew Ahrens <mahrens@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5138
  https://github.com/illumos/illumos-gate/commit/af3465d

Porting notes:

Because support for exposing a uint64_t parameter wasn't added
until v3.17-rc1 the zfs_free_max_blocks variable has been declared
as a unsigned long.  This is already far larger than required and
it allows us to avoid additional autoconf compatibility code.

The default value has been set to 100,000 on Linux instead of
ULONG_MAX which is used on Illumos.  This was done to limit the
number of outstanding IOs in the system when snapshots are destroyed.
This helps ensure individual TXG sync times are kept reasonable and
memory isn't wasted managing a huge backlog of outstanding IOs.

Ported by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2675
Closes #2581
2014-09-23 14:26:34 -07:00
Alex Reece acbad6ff67 Illumos 4753 - increase number of outstanding async writes when sync task is waiting
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
    https://www.illumos.org/issues/4753
    https://github.com/illumos/illumos-gate/commit/73527f4

Comments by Matt Ahrens from the issue tracker:
    When a sync task is waiting for a txg to complete, we should hurry
    it along by increasing the number of outstanding async writes
    (i.e. make vdev_queue_max_async_writes() return a larger number).
    Initially we might just have a tunable for "minimum async writes
    while a synctask is waiting" and set it to 3.

Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2716
2014-09-23 13:50:55 -07:00
Matthew Ahrens d97aa48f7c Illumos 5139 - SEEK_HOLE failed to report a hole at end of file
5139 SEEK_HOLE failed to report a hole at end of file
Reviewed by: Adam Leventhal <adam.leventhal@delphix.com>
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Max Grossman <max.grossman@delphix.com>
Reviewed by: Peng Dai <peng.dai@delphix.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5139
  https://github.com/illumos/illumos-gate/commit/0fbc0cd

Ported by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2714
2014-09-23 10:38:45 -07:00
Richard Yao 485c581c41 Fix function call with uninitialized value in vdev_inuse
LLVM's static analyzer reported that we could pass an uninitialized
pool_guid to spa_by_guid() in vdev_inuse(). Upon review, it is correct.
An attempt to repurpose a spare or L2ARC drive from an exported pool
will cause the pool_guid passed to spa_by_guid() to be unintialized
information from the stack. This will cause non-deterministic behavior.
Since there is no reason why we cannot repurpose such disks, we modify
vdev_inuse() to avoid calling spa_by_guid() when they are detected.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2330
2014-09-23 10:32:45 -07:00
Matthew Ahrens b8bcca18f7 Illumos 5161 - add tunable for number of metaslabs per vdev
5161 add tunable for number of metaslabs per vdev
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>

References:
  https://www.illumos.org/issues/5161
  https://github.com/illumos/illumos-gate/commit/bf3e216

Ported by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2698
2014-09-23 10:00:02 -07:00
Turbo Fredriksson e3020723dc Linux 3.16 compat: smp_mb__after_clear_bit()
The smp_mb__{before,after}_clear_bit functions have been renamed
smp_mb__{before,after}_atomic.  Rather than adding a compatibility
function to handle this the code has been updated to use smp_wmb().

This has the advantage of being a stable functionally equivalent
interface.  On many architectures smp_mb__after_clear_bit() expands
to smp_wmb().  Others might be able to do something slightly more
efficient but this will be safe and correct on all of them.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #386
2014-09-22 16:24:55 -07:00
Matthew Ahrens ebcf49365a Illumos 5177 - remove dead code from dsl_scan.c
5177 remove dead code from dsl_scan.c
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/5177
  https://github.com/illumos/illumos-gate/commit/5f37736

Porting notes:

The local variable 'buf' was removed from dsl_scan_visitbp().
This wasn't part of the original patch but it should have been.

Ported by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2712
2014-09-22 15:52:58 -07:00
Adam Leventhal 64dbba3679 Illumos 5174 - add sdt probe for blocked read in dbuf_read()
5174 add sdt probe for blocked read in dbuf_read()
Reviewed by: Basil Crow <basil.crow@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/5174
  https://github.com/illumos/illumos-gate/commit/f6164ad

Ported by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2710
2014-09-22 14:20:25 -07:00
Matthew Ahrens 6d9036f350 Illumos 5140 - message about "%recv could not be opened" is printed when booting after crash
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Max Grossman <max.grossman@delphix.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/projects/illumos-gate//issues/5140
  https://github.com/illumos/illumos-gate/commit/2243853

Ported by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2676
2014-09-18 15:04:59 -07:00
Brian Behlendorf 2d50158343 Fix z_teardown_inactive_lock deadlock
When rolling back a mounted filesystem zfs_suspend() is called
which acquires the z_teardown_inactive_lock.  This lock can not
be dropped until the filesystem has been rolled back and resumed
in zfs_resume_fs().

Therefore, we must not call iput() under this lock because it
may result in the inode->evict() handler being called which also
takes this lock.  Instead use zfs_iput_async() to ensure dropping
the last reference is deferred and runs in a safe context.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2670
2014-09-11 09:11:45 -07:00