====== Software RAID and UEFI boot ======
//Nikita Kipriyanov, last updated 2023/09/20//
TL;DR: don't fall for this temptation. The operational gains promised by partitionable RAID are a hoax, and the promised reduction in maintenance burden is bogus for most practical cases.
| + | |||
| + | ===== The problem ===== | ||
Problems arise when one desires to also boot from this array. Now not only Linux must understand the structure on the disks; the system firmware must too. The firmware, however, will not interpret the MD superblock (metadata) which describes the shape of the array. Often GRUB is used as the bootloader for Linux; it employs some unused areas on the disk to install itself. The disks must //look normal// to the firmware even without metadata interpretation, and //be useful// to GRUB or another bootloader.
The obvious first consequence is that we are bound to RAID1 (mirror), because it is the only level which stores data on all component devices as is. But there is more to consider.
| + | |||
| + | ===== MD RAID1 superblocks ===== | ||
{{ md-raid-superblocks.svg |On-disk placement of various MD RAID superblock variants}}
There are two types of MD superblocks (blue), v0.9 and v1.x. The two types have different structures, while the v1.x variants differ from each other only in their placement:
  * version 0.9 (deprecated) and version 1.0 are placed near the end of the component device (regardless of the array size!), no farther than 128KiB from the end.
  * version 1.1 is placed at the very beginning of the component device.
  * version 1.2 is placed 4KiB past the beginning of the component device.
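The placements above can be sketched numerically. Below is a minimal calculator; the exact v0.9 (64KiB alignment) and v1.0 (at least 8KiB from the end, 8-sector aligned) formulas are my assumptions about the kernel's rules, and ''mdadm --examine'' remains the authoritative source for real devices:

```shell
#!/bin/sh
# Approximate byte offset of the MD superblock for a given metadata
# version and component device size. v0.9/v1.0 formulas follow the
# kernel placement rules as I understand them (assumption); v1.1 and
# v1.2 locations are fixed by definition.
sb_offset() {
    version=$1; dev_bytes=$2
    case $version in
        0.9) # 64 KiB-aligned block, 64 KiB before the aligned end
             echo $(( dev_bytes / 65536 * 65536 - 65536 )) ;;
        1.0) # in sectors: (size - 16) rounded down to a multiple of 8
             echo $(( (dev_bytes / 512 - 16) / 8 * 8 * 512 )) ;;
        1.1) echo 0 ;;
        1.2) echo 4096 ;;
        *)   echo "unknown metadata version: $version" >&2; return 1 ;;
    esac
}

# Example: a component device of exactly 1000000000 bytes (1 GB)
for v in 0.9 1.0 1.1 1.2; do
    printf '%s -> %s\n' "$v" "$(sb_offset "$v" 1000000000)"
done
```

Note that for both end-of-device variants the computed offset lands within 128KiB of the end, as stated above.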
If the full device capacity is used (e.g. when all component devices are the same size), the data area (green) is slightly smaller than the device: between 64KiB and around 1MiB less than the device size. In the case of RAID1, this area is what appears as the virtual disk.
===== Partition tables =====
Now let's observe how partition tables look from the point of view of firmware which doesn't interpret MD metadata.
==== MBR ====
{{ mbr-boot-structure.svg |MBR disk layout}}
The MBR contains both the partition table and the initial bootloader code for "legacy" (BIOS) boot; the firmware loads it from the very first sector of the disk.
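The firmware recognizes a valid MBR by the 0x55 0xAA signature in the last two bytes of the first sector. A small sketch that checks for it, demonstrated on a throwaway image file rather than a real disk:

```shell
#!/bin/sh
# Check whether a device or image carries the 0x55AA MBR boot
# signature in bytes 510-511 of its first sector.
has_mbr_signature() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null \
          | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Demo on a temporary 1 MiB "disk" image (no real disk is touched):
truncate -s 1M demo.img
printf '\125\252' | dd of=demo.img bs=1 seek=510 conv=notrunc 2>/dev/null
if has_mbr_signature demo.img; then echo "signature present"; fi
rm -f demo.img
```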
  * v1.2: normally there will be no MBR in the place where the firmware expects it, but it is possible to carefully craft a special MBR with the bootloader in such a way that it properly points to the really existing partitions needed to boot, and place it into the free-use area (orange).
==== GPT ====
{{ gpt-grub-legacy-boot.svg |GPT disk layout when used for legacy boot}}
{{ gpt-uefi-boot.svg |GPT disk layout when used for UEFI boot}}
In addition to crafting the MBR where that is possible, one will need to maintain it: update the bootloader code along with system updates, manually clone it in case of a disk replacement, and so on.
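The manual cloning step amounts to copying only the 446-byte bootstrap code area, leaving the target's own partition table intact. A sketch (run it against real devices only after careful review; the demo below operates on image files):

```shell
#!/bin/sh
# Clone only the 446-byte MBR bootstrap code area from one disk to
# another. Bytes 446-511 (partition table and signature) of the
# target are left untouched.
clone_mbr_code() {
    dd if="$1" of="$2" bs=446 count=1 conv=notrunc 2>/dev/null
}
```

Usage would be e.g. ''clone_mbr_code /dev/sda /dev/sdb'' after a disk replacement (device names are placeholders).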
==== GPT primer ====
As an example, the following table is the complete on-disk structure of a RAID1 MD array with superblock version 1.2, partitioned with GPT, created out of devices of exactly 1000000000 bytes = 1GB with a sector size of 512. Notice how MD pads the beginning of the data to 1MiB and rounds its size to 64KiB, while GPT pads partitions to 1MiB:
^ Address ^
| ::: | 0x3b9aca00 = 1000000000 |
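An array like the one examined above can be recreated for experimentation on loop devices. The following is a sketch, not a recipe: it needs root, the ''mdadm''/''sgdisk'' tools, and the device paths and file names are placeholders; nothing here runs automatically.

```shell
#!/bin/sh
# Sketch: recreate the example -- a v1.2 MD RAID1 from two 1 GB
# backing files, then partition the resulting array with GPT.
# Call make_example_array manually, as root, to try it.
make_example_array() {
    truncate -s 1000000000 disk0.img disk1.img   # two 1 GB backing files
    d0=$(losetup --find --show disk0.img)        # attach as loop devices
    d1=$(losetup --find --show disk1.img)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
          --metadata=1.2 "$d0" "$d1"
    sgdisk --new=1:0:0 /dev/md0                  # one big GPT partition
    mdadm --examine "$d0"                        # superblock/data offsets
}
```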
===== Conclusion =====
Here's a summary table:
^ Configuration ^
In addition to these complications, MBR itself is obsolete: it only allows partitioning up to 2TiB of space per device, and some newer systems don't support this kind of boot sequence anymore.
So partitionable MD RAID actually provides easier maintenance only for legacy-boot systems with a boot disk smaller than 2TiB, and only with RAID1; it is impossible to boot from with UEFI. This kind of setup is obsolete and very limiting.
===== Resolution =====
What to do? Do not use partitionable RAID.

| + | * On each disk create a separate partition table; | ||
| + | * Create | ||
| + | * ESP for UEFI boot | ||
| + | * BIOS boot for legacy boot | ||
| + | * A separate partition for '' | ||
| + | * The rest of the space will be one big partition to hold software RAID of whatever level you want; you're not restricted to RAID1 anymore. | ||
| + | * Use LVM to partition this big RAID space into volumes. LVM is far better than whatever partition table we were considering. Since we're using Linux' | ||
| + | * You can also add additional layers | ||
| + | * bcache layer for SSD caching | ||
| + | * then LUKS goes for crypto | ||
| + | * then there might be VDO for deduplication and compression | ||
| + | * Provide for redundant bootloading: | ||
| + | * install bootloader onto each device in case of legacy boot | ||
| + | * create additional ESPs on all devices | ||
| + | * install GRUB and/or initramfs hook that does this cloning every time the contents may update | ||
| + | * create additional firmware boot entries to permit booting from each device using its ESP | ||
| + | |||
| + | Disk replacement | ||