mirror of
				https://git.proxmox.com/git/mirror_zfs.git
				synced 2025-10-26 01:45:00 +03:00 
	 07a3312f17
			Make use of Dracut's ability to restore the initramfs on shutdown and pivot to it, allowing for a clean unmount and export of the ZFS root. No need to force-import on every reboot anymore. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2195 Issue #2476 Issue #2498 Issue #2556 Issue #2563 Issue #2575 Issue #2600 Issue #2755 Issue #2766
			192 lines
			9.2 KiB
			Markdown
How to set up a zfs root filesystem using dracut
------------------------------------------------

1) Install the zfs-dracut package.  This package adds a zfs dracut module
to the /usr/share/dracut/modules.d/ directory that allows dracut to
create a zfs-aware initramfs.

2) Set the bootfs property for the bootable dataset in the pool.  Then set
the dataset mountpoint property to '/'.

    $ zpool set bootfs=pool/dataset pool
    $ zfs set mountpoint=/ pool/dataset

Alternatively, legacy mountpoints can be used by setting the 'root=' option
on the kernel line of your grub.conf/menu.lst configuration file.  Then
set the dataset mountpoint property to 'legacy'.

    grub.conf/menu.lst: kernel ... root=ZFS=pool/dataset
    $ zfs set mountpoint=legacy pool/dataset

3) To set zfs module options, put them in the /etc/modprobe.d/zfs.conf file.
The complete list of zfs module options is available by running the
_modinfo zfs_ command.  Commonly set options include: zfs_arc_min,
zfs_arc_max, zfs_prefetch_disable, and zfs_vdev_max_pending.

4) Finally, create your new initramfs by running dracut.

    $ dracut --force /path/to/initramfs kernel_version
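
Step 3's `/etc/modprobe.d/zfs.conf` uses the standard modprobe.d line format, `options <module> <param>=<value> ...`. The `write_zfs_conf` helper below is a hypothetical sketch that writes such a line to a file you name. The values in the usage comment are placeholders, not tuning advice. They are a 4 GiB `zfs_arc_max`, which is given in bytes, and `zfs_prefetch_disable=1`.

```shell
#!/bin/sh
# Hypothetical helper: write ZFS module options in modprobe.d syntax.
# Usage: write_zfs_conf <file> <param=value>...
write_zfs_conf() {
    if [ "$#" -lt 2 ]; then
        echo "usage: write_zfs_conf <file> <param=value>..." >&2
        return 1
    fi
    conf=$1
    shift
    # A single "options" line; modprobe applies it whenever zfs is loaded.
    printf 'options zfs %s\n' "$*" > "$conf"
}

# Placeholder values (zfs_arc_max is in bytes, here 4 GiB):
# write_zfs_conf /etc/modprobe.d/zfs.conf zfs_arc_max=4294967296 zfs_prefetch_disable=1
```

`modinfo -p zfs` prints only the parameter names and descriptions. Once the module is loaded, you can read back the values in effect from `/sys/module/zfs/parameters/`.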

Kernel Command Line
-------------------

The initramfs' behavior is influenced by the following kernel command line
parameters passed in from the boot loader:

* `root=...`: If not set, importable pools are searched for a bootfs
attribute.  If an explicitly set root is desired, you may use
`root=ZFS:pool/dataset`.

* `zfs_force=0`: If set to 1, the initramfs will run `zpool import -f` when
attempting to import pools if the required pool isn't automatically imported
by the zfs module.  This can save you a trip to a boot CD if the hostid has
changed, but it is dangerous and can lead to zpool corruption, particularly
where storage is on a shared fabric such as iSCSI and multiple hosts
can access storage devices concurrently.  _Please understand the implications
of force-importing a pool before enabling this option!_

* `spl_hostid`: By default, the hostid used by the SPL module is read from
/etc/hostid inside the initramfs.  This file is copied there from the host
system when the initramfs is built, which effectively ties the ramdisk to the
host that builds it.  If a different hostid is desired, it may be set with
this parameter, and it will override any file present in the ramdisk.  The
value should be a hexadecimal number, e.g. `spl_hostid=0x00bab10c`.

Note that changing the hostid between boots will most likely lead to an
un-importable pool, since the last importing hostid won't match.  To recover
from this, you may use the `zfs_force` option. Alternatively, boot from a
different filesystem, run `zpool import -f` and then `zpool export` on the
pool, and reboot with the new hostid.
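
The `0x00bab10c` form is a textual rendering of a 32-bit number. On glibc systems, `/etc/hostid` holds that same number as four raw bytes in native byte order, so on little-endian machines such as x86 the bytes appear reversed. The sketch below assumes a little-endian host and uses a hypothetical `hostid_to_bytes` helper to show how a `spl_hostid=` value maps onto those bytes. It builds octal escapes because `\x` escapes are not portable to every `/bin/sh`.

```shell
#!/bin/sh
# Hypothetical helper: print the four raw bytes of a hex hostid in
# little-endian order, the layout glibc uses for /etc/hostid on x86.
# Accepts "0x00bab10c" or "00bab10c".
hostid_to_bytes() {
    hex=${1#0x}
    case $hex in
        [0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]) ;;
        *) echo "hostid_to_bytes: expected 8 hex digits" >&2; return 1 ;;
    esac
    fmt=
    # Take the byte pairs from least to most significant and turn each
    # into a 3-digit octal escape (portable, unlike \x escapes).
    for pos in 7 5 3 1; do
        pair=$(printf '%s' "$hex" | cut -c "$pos-$((pos + 1))")
        fmt="$fmt\\$(printf '%03o' "0x$pair")"
    done
    # fmt holds only octal escapes, so using it as the format is safe.
    printf "$fmt"
}

# Illustrative only: hostid_to_bytes 0x00bab10c > "$initdir/etc/hostid"
```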

How it Works
============

The Dracut module consists of the following files (excluding Makefiles):

* `module-setup.sh`: Script run by the initramfs builder to create the
ramdisk.  Contains instructions on which files are required by the modules
and z* programs.  Also triggers inclusion of `/etc/hostid` and the zpool
cache.  This file is not included in the initramfs.

* `90-zfs.rules`: udev rules that trigger loading of the ZFS modules at boot.

* `parse-zfs.sh`: Run early in the initramfs boot process to parse the kernel
command line and determine if ZFS is the active root filesystem.

* `mount-zfs.sh`: Run later in the initramfs boot process, after udev has
settled, to mount the root dataset.

* `export-zfs.sh`: Run on shutdown after dracut has restored the initramfs
and pivoted to it, allowing for a clean unmount and export of the ZFS root.

`module-setup.sh`
-----------------

This file is run by the Dracut script within the live system, not at boot
time.  It's not included in the final initramfs.  Functions in this script
describe which files are needed by ZFS at boot time.

Currently all the various z* and spl kernel modules are included, a
dependency on Dracut's udev-rules module is declared, and the zfs, zpool,
and related helper programs are included.  Dracut provides library functions
that automatically gather the shared libraries needed to run each of these
binaries, so statically built binaries are not required.

The zpool and zvol udev rules files are copied from where they are
installed by the ZFS build.  __PACKAGERS TAKE NOTE__: If you move
`/etc/udev/rules/60-z*.rules`, you'll need to update this file to match.

Currently this file also includes `/etc/hostid` and `/etc/zfs/zpool.cache`,
which means the generated ramdisk is specific to the host system that built
it.  If a generic initramfs is required, it may be preferable to omit these
files and specify the `spl_hostid` from the boot loader instead.
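
A dracut `module-setup.sh` is a set of shell functions (`check`, `depends`, `installkernel`, `install`) that dracut sources and calls while it builds the image. The sketch below is not the shipped file. The module directory, binary paths, udev rule file names (inferred from the `60-z*.rules` pattern above) and hook priorities are all illustrative assumptions. The dracut helpers (`instmods`, `inst_rules`, `inst_hook`, `inst`) are stubbed with `echo`, so the sketch can run outside dracut and simply prints what would be installed.

```shell
#!/bin/sh
# Sketch of a dracut module-setup.sh for ZFS; NOT the shipped file.
# Paths and hook priorities are illustrative assumptions.  The dracut
# helpers are stubbed with echo so the sketch runs outside dracut.
instmods()   { echo "instmods $*"; }
inst_rules() { echo "inst_rules $*"; }
inst_hook()  { echo "inst_hook $*"; }
inst()       { echo "inst $*"; }
moddir=${moddir:-/usr/share/dracut/modules.d/90zfs}   # assumed location

check() {
    # Only offer this module when the userland tools are installed.
    command -v zpool >/dev/null 2>&1 || return 1
    command -v zfs >/dev/null 2>&1 || return 1
    return 0
}

depends() {
    # The ZFS udev rules rely on dracut's udev-rules module.
    echo udev-rules
}

installkernel() {
    instmods zfs zcommon znvpair zavl zunicode spl
}

install() {
    # Rule names assumed from the 60-z*.rules pattern; 90-zfs.rules
    # ships with the dracut module itself.
    inst_rules 60-zpool.rules 60-zvol.rules "$moddir/90-zfs.rules"
    # Per the text above, dracut's helpers also gather shared libraries.
    inst /sbin/zpool            # assumed install path
    inst /sbin/zfs              # assumed install path
    inst /sbin/mount.zfs        # assumed install path
    inst_hook cmdline 95 "$moddir/parse-zfs.sh"     # priority illustrative
    inst_hook mount 98 "$moddir/mount-zfs.sh"       # priority illustrative
    inst_hook shutdown 30 "$moddir/export-zfs.sh"   # priority illustrative
    # Host-specific files: these tie the image to the build host.
    if [ -f /etc/hostid ]; then inst /etc/hostid; fi
    if [ -f /etc/zfs/zpool.cache ]; then inst /etc/zfs/zpool.cache; fi
    return 0
}
```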

`parse-zfs.sh`
--------------

Run during the cmdline phase of the initramfs boot process, this script
performs some basic sanity checks on kernel command line parameters to
determine if booting from ZFS is likely to be what is desired.  Dracut
requires this script to adjust the `root` variable if required and to set
`rootok=1` if a mountable root filesystem is available.  Unfortunately, this
script must run before udev has settled and before the kernel modules are
known to be loaded, so calling the zpool and zfs commands is unsafe.

If a `root=ZFS...` parameter is set on the command line, it is at least
certain that ZFS is what is desired, though this script is unable to
determine if ZFS is in fact available.  This script will alter the `root`
parameter to replace several historical forms of specifying the pool and
dataset name with the canonical form of `zfs:pool/dataset`.

If no `root=` parameter is set, the best this script can do is guess that
ZFS is desired.  At present, no other known filesystem works without a
`root=` parameter, though this could interfere with using the default root
compiled into the kernel image.  That is considered unlikely when an
initramfs is in use, so this script sets `root=zfs:AUTO` and hopes for the
best.
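
Here is a minimal sketch of the normalization described above, using a hypothetical `canonical_zfs_root` function. The set of accepted spellings is an assumption. The function strips `zfs:`, `ZFS:`, `FILESYSTEM=` and `ZFS=` prefixes, which covers both forms used earlier in this document. It prints the canonical value and sets `rootok=1` on success.

```shell
#!/bin/sh
# Hypothetical sketch of parse-zfs.sh's root= normalization.
# Prints the canonical value and sets rootok=1, or returns 1 when the
# root= value does not look like ZFS.  Accepted spellings are assumptions.
canonical_zfs_root() {
    case $1 in
        ""|zfs|zfs:|zfs:AUTO)
            # No explicit dataset: choose the bootfs after udev settles.
            root=zfs:AUTO ;;
        zfs:*|ZFS:*|ZFS=*|FILESYSTEM=*)
            ds=${1#zfs:}
            ds=${ds#ZFS:}
            ds=${ds#FILESYSTEM=}
            ds=${ds#ZFS=}
            root="zfs:$ds" ;;
        *)
            # Some other root (e.g. /dev/sda1): not ours to handle.
            return 1 ;;
    esac
    rootok=1
    echo "$root"
}
```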

Once the `root=...` parameter (or lack thereof) is parsed, a dummy symlink
is created from `/dev/root` -> `/dev/null` to satisfy parts of the Dracut
process that check for the presence of a single root device node.

Finally, an initqueue/finished hook is registered, which causes the initqueue
phase of Dracut to wait for `/dev/zfs` to become available before attempting
to mount anything.
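
Both steps are plain filesystem operations. The sketch below assumes Dracut's convention that each script in the `initqueue/finished` hook directory is re-run until it exits 0 (or Dracut's timeout expires). `register_zfs_wait` and its two directory parameters are hypothetical. They let the sketch run against a scratch tree instead of the live initramfs.

```shell
#!/bin/sh
# Sketch of the tail of parse-zfs.sh.  The two directories are
# parameters (hypothetical) so this can run against a scratch tree;
# inside the initramfs they would be Dracut's $hookdir and /dev.
register_zfs_wait() {
    hooks=$1
    devs=$2
    # Dummy root device node for Dracut's "is there a root device?" checks.
    ln -sf /dev/null "$devs/root"
    # Dracut re-runs every initqueue/finished script until each exits 0,
    # so this keeps the initqueue waiting until the zfs device exists.
    mkdir -p "$hooks/initqueue/finished"
    echo "[ -e $devs/zfs ]" > "$hooks/initqueue/finished/zfs.sh"
}
```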

`mount-zfs.sh`
--------------

This script is run after udev has settled and all tasks in the initqueue
have succeeded.  This ensures that `/dev/zfs` is available and that the
various ZFS modules are successfully loaded.  As it is now safe to call
zpool and friends, we can proceed to find the bootfs attribute if necessary.

If the root parameter was explicitly set on the command line, no parsing is
necessary.  The list of imported pools is checked to see if the desired pool
is already imported.  If it's not, an attempt is made to import the pool
explicitly, though without forcing.  Finally, the specified dataset is
mounted on `$NEWROOT`, first using the `-o zfsutil` option to handle
non-legacy mounts, then, if that fails, without zfsutil to handle legacy
mount points.
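
The explicit-root sequence can be sketched as follows. It assumes `$NEWROOT` is set by Dracut and uses only standard `zpool list`, `zpool import -N` and `mount -t zfs` invocations. `mount_explicit_root` is a hypothetical name, and error reporting is reduced to return codes.

```shell
#!/bin/sh
# Sketch of mount-zfs.sh's explicit-root path (hypothetical function).
# $1 is "pool/dataset" with the "zfs:" prefix already removed;
# NEWROOT is the mount point Dracut provides (normally /sysroot).
mount_explicit_root() {
    dataset=$1
    pool=${dataset%%/*}
    # Import only if the pool is not already imported, and never force.
    if ! zpool list -H "$pool" >/dev/null 2>&1; then
        zpool import -N "$pool" || return 1
    fi
    # zfsutil handles non-legacy mountpoints; plain mount handles legacy.
    mount -o zfsutil -t zfs "$dataset" "$NEWROOT" ||
        mount -t zfs "$dataset" "$NEWROOT"
}
```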

If no root parameter was specified, this script attempts to find a pool with
its bootfs attribute set.  First, already-imported pools are scanned; if an
appropriate pool is found, no additional pools are imported.  If no pool
with bootfs is found, any additional pools in the system are imported with
`zpool import -N -a`, and the scan for bootfs is tried again.  If no bootfs
is found with all pools imported, all pools are re-exported, and boot fails.
Assuming a bootfs is found, an attempt is made to mount it to `$NEWROOT`,
first with, then without the zfsutil option as above.
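
The autodetection steps can be sketched as below. The sketch assumes that `zpool list -H -o bootfs` prints one line per imported pool, with `-` where the property is unset. `first_bootfs` and `find_bootfs` are hypothetical names. When several pools have a bootfs, this sketch simply takes the first one listed. Pool names may contain spaces, so the export loop reads whole lines.

```shell
#!/bin/sh
# Sketch of bootfs autodetection (function names are hypothetical).
# Prints the bootfs dataset, or re-exports all pools and returns 1.
first_bootfs() {
    # One value per imported pool; "-" means bootfs is not set.
    zpool list -H -o bootfs 2>/dev/null | grep -v '^-$' | head -n 1
}

find_bootfs() {
    fs=$(first_bootfs)
    if [ -z "$fs" ]; then
        # Nothing among already-imported pools: import the rest unmounted.
        zpool import -N -a
        fs=$(first_bootfs)
    fi
    if [ -z "$fs" ]; then
        # Still nothing: put every pool back and let the boot fail.
        while IFS= read -r pool; do
            [ -n "$pool" ] && zpool export "$pool" </dev/null
        done <<EOF
$(zpool list -H -o name)
EOF
        return 1
    fi
    echo "$fs"
}
```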

Ordinarily, pools are imported _without_ the force option, which may cause
boot to fail if the hostid has changed or a pool has been physically moved
between servers.  The `zfs_force` kernel parameter is provided which, when
set to `1`, causes `zpool import` to be run with the `-f` flag.  Forcing pool
import can lead to serious data corruption and loss of pools, so this option
should be used with extreme caution.  Note that even with this flag set, if
the required zpool was auto-imported by the kernel module, no additional
`zpool import` commands are run, so nothing is forced.
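
The sketch below shows how the `zfs_force` flag could feed into the import command. It reads the value by matching a command-line string, which is a simplification standing in for Dracut's own argument helpers. `zpool_import_flags` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical sketch: choose zpool import flags from a kernel command
# line string such as "$(cat /proc/cmdline)".
zpool_import_flags() {
    case " $1 " in
        *" zfs_force=1 "*)
            # Explicitly requested: allow importing a pool that appears
            # to be in use by another host.  Dangerous; see above.
            printf '%s\n' "-N -f" ;;
        *)
            # Default: import without mounting and without forcing.
            printf '%s\n' "-N" ;;
    esac
}

# Only called when the wanted pool was not already auto-imported:
#   zpool import $(zpool_import_flags "$(cat /proc/cmdline)") rpool
```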

`export-zfs.sh`
---------------

Normally the zpool containing the root dataset cannot be exported on
shutdown, as it is still in use by the init process. To work around this,
Dracut is able to restore the initramfs on shutdown and pivot to it.
All remaining processes then run from a ramdisk, allowing for a
clean unmount and export of the ZFS root. The theory of operation is
described in detail in the [Dracut manual](https://www.kernel.org/pub/linux/utils/boot/dracut/dracut.html#_dracut_on_shutdown).

This script will try to export all remaining zpools after Dracut has
pivoted to the initramfs. If an initial regular export is not successful,
Dracut will call this script once more with the `final` option,
in which case a forceful export is attempted.
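
Here is a minimal sketch of that behaviour, not the shipped script. It assumes Dracut passes `final` as the first argument on the last attempt and treats a non-zero exit status as "not done yet". `export_all_pools` is a hypothetical name.

```shell
#!/bin/sh
# Sketch of an export-zfs.sh-style hook: export every imported pool,
# forcing only on Dracut's final pass.  Non-zero means "not done yet".
export_all_pools() {
    force=
    if [ "$1" = final ]; then
        force=-f
    fi
    status=0
    # One pool name per line; names may contain spaces, so read whole lines.
    # The here-document keeps the loop in the current shell.
    while IFS= read -r pool; do
        [ -n "$pool" ] || continue
        zpool export $force "$pool" </dev/null || status=1
    done <<EOF
$(zpool list -H -o name)
EOF
    return $status
}

# As a hook script this would simply be:  export_all_pools "$1"
```

The here-document is used instead of `zpool list -H | while read ...` because most shells run the right-hand side of a pipeline in a subshell. A failure status recorded inside such a loop would therefore be lost.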

Other Dracut modules include similar shutdown scripts, and Dracut
invokes these scripts round-robin until they succeed. In particular,
the `90dm` module installs a script that tries to close and remove
all device mapper targets. Thus, if there are ZVOLs containing
dm-crypt volumes or if the zpool itself is backed by a dm-crypt
volume, the shutdown scripts will try to untangle this.
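
A small driver can illustrate the round-robin idea. This is a hypothetical model, not Dracut's shutdown code. Every hook is retried as long as each pass lets at least one more hook succeed. Hooks that still fail then get one last call with `final`, matching the `final` option described for `export-zfs.sh`.

```shell
#!/bin/sh
# Hypothetical model of round-robin shutdown hooks (not Dracut's code).
# Each argument names a command or shell function that returns 0 once
# its work is done; it is passed "final" on the last-chance call.
run_round_robin() {
    _rr_pending="$*"
    while [ -n "$_rr_pending" ]; do
        _rr_failed=
        for _rr_hook in $_rr_pending; do
            "$_rr_hook" || _rr_failed="$_rr_failed $_rr_hook"
        done
        # A pass in which nothing succeeded will not help; stop retrying.
        [ "$_rr_failed" = " $_rr_pending" ] && break
        _rr_pending=${_rr_failed# }
    done
    # Last chance: hooks still failing may now use forceful methods.
    _rr_status=0
    for _rr_hook in $_rr_pending; do
        "$_rr_hook" final || _rr_status=1
    done
    return $_rr_status
}
```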