Software RAID and UEFI boot
When installing Linux with software MD RAID, there is a temptation to use a partitionable software RAID to reduce the maintenance burden of replacing a failed drive. In this case one creates an array out of raw, unpartitioned devices and then partitions the array itself.
Problems arise when one also wants to boot from this array, because the firmware does not interpret the MD superblock (metadata) which describes the shape of the array. So we are required to make the on-disk structure compatible with whatever requirements the firmware imposes.
The obvious first consequence is that we are bound to RAID1 (mirror), because it is the only level that presents the same data on every component device. But there is more.
MD superblocks come in several versions, which differ in where they are placed on the component device:
- version 0.9 (deprecated) and version 1.0 are placed near the end of the component device (regardless of the array size!), no more than 128 KiB from the end;
- version 1.1 is placed at the very beginning of the component device;
- version 1.2 is placed 4 KiB past the beginning of the component device.
If the full device capacity is used (e.g. when all component devices are the same size), the data area ends up slightly smaller than the device itself, by between 64 KiB and 128 KiB.
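A quick way to see which layout a component uses is to look for the v1.x superblock magic (0xa92b4efc) at each of the offsets listed above. The sketch below is a minimal Python illustration, not a replacement for mdadm --examine; the device path is a placeholder, and the exact near-end offset for version 1.0 (device size minus 8 KiB, rounded down to a 4 KiB boundary) is taken as an assumption about the kernel's placement rule.

```python
#!/usr/bin/env python3
"""Probe a component device for an MD v1.x superblock at the offsets
described above.  Run as root; the device path is a placeholder."""
import struct
import sys

MD_MAGIC = 0xa92b4efc   # little-endian magic at the start of a v1.x superblock

def probe(path):
    with open(path, "rb") as dev:
        dev.seek(0, 2)
        size = dev.tell()          # device size in bytes
        # Candidate offsets keyed by metadata version:
        #   1.1 -> offset 0, 1.2 -> 4 KiB past the start, 1.0 -> near the end.
        #   The 1.0 offset follows the assumed kernel rule: size minus 8 KiB,
        #   rounded down to a 4 KiB boundary.
        candidates = {
            "1.1": 0,
            "1.2": 4096,
            "1.0": ((size // 512 - 16) & ~7) * 512,
        }
        for version, offset in candidates.items():
            dev.seek(offset)
            magic = struct.unpack("<I", dev.read(4))[0]
            if magic == MD_MAGIC:
                print(f"{path}: superblock version {version} at offset {offset}")
                return
        print(f"{path}: no v1.x superblock found at the expected offsets")

if __name__ == "__main__":
    probe(sys.argv[1])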
With the DOS a.k.a. MBR partitioning scheme, which is only supported for “legacy” BIOS boot, this worked pretty well (with RAID1, a.k.a. mirror, only). Version 1.0 of the MD RAID superblock places itself near the end of the device, so sector 0 of the array, which normally contains the partition table, is also sector 0 of each component. When the BIOS boots, it performs no MD superblock interpretation; it simply sees two or more disks which have an MBR in sector 0, and it can execute boot code from any of them as usual.
That boot code normally contains the GRUB stage 1 loader, which loads a few sectors (around 23) from the unused space immediately following the MBR; these contain the stage 1.5 loader, which is capable of interpreting a file system containing all the rest: the GRUB stage 2 loader, various drivers, and so on. One of the GRUB drivers can actually interpret the MD superblock, but that is superficial: the Linux kernel and initramfs are usually found alongside the GRUB code, so GRUB can simply load them and Linux will do the rest. So the boot happens normally. If one of the drives dies, the BIOS can be configured to try all of them one after another, so it will eventually start from the one which boots.
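This can be verified directly: with version 1.0 metadata, sector 0 of the assembled array and sector 0 of every raw component are byte-for-byte identical, boot signature included. A small Python sketch (device names are placeholders):

```python
#!/usr/bin/env python3
"""Show that with v1.0 (end-of-device) metadata, sector 0 of the array is
also sector 0 of every component, MBR boot signature included.
Device names below are placeholders -- adjust to your setup."""

ARRAY = "/dev/md0"
COMPONENTS = ["/dev/sda", "/dev/sdb"]

def sector0(path):
    with open(path, "rb") as dev:
        return dev.read(512)

array_mbr = sector0(ARRAY)
# A bootable MBR ends with the 0x55AA signature the BIOS looks for.
print("array has MBR boot signature:", array_mbr[510:512] == b"\x55\xaa")

for comp in COMPONENTS:
    print(f"{comp}: sector 0 identical to array:", sector0(comp) == array_mbr)
```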
However, this does not work with the GPT partitioning scheme, which is required for UEFI boot. Here's why.
First of all, the GPT is written to the disk in two copies: one at the beginning and one at the very end of the disk, and the firmware expects to find both copies in place. If one creates a partitionable RAID out of raw drives and partitions it using GPT, the backup copy is written to the end of the array's data area, which (as noted above) falls 64-128 KiB short of the end of the physical disk, so the firmware, looking at the raw disk, does not find a valid backup GPT where it expects one.
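The effect is easy to observe: a GPT header begins with the ASCII signature "EFI PART", and the firmware expects the backup header in the very last sector of the physical disk. A small Python sketch (paths are placeholders, 512-byte logical sectors assumed) that checks the last sector of a device for that signature:

```python
#!/usr/bin/env python3
"""Check whether a backup GPT header sits where the firmware expects it:
in the last 512-byte sector of the device.  Paths are placeholders."""

GPT_SIGNATURE = b"EFI PART"   # first 8 bytes of every GPT header

def backup_gpt_present(path):
    with open(path, "rb") as dev:
        dev.seek(-512, 2)     # last sector (assuming 512-byte logical sectors)
        return dev.read(8) == GPT_SIGNATURE

# Partitioned on the array: the backup header sits at the end of the *array* ...
print("/dev/md0 :", backup_gpt_present("/dev/md0"))
# ... which is 64-128 KiB before the end of the raw disk, so the firmware,
# reading the raw component, finds nothing in the disk's last sector.
print("/dev/sda :", backup_gpt_present("/dev/sda"))
```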
Secondly, the UEFI specification has no provision for OS-managed RAID. To boot, UEFI uses an ESP, the EFI System Partition, which cannot be a part of a RAID; rather, it must be visible to the firmware as a plain GPT partition.
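Put differently, the firmware finds the ESP by reading the GPT straight off the disk and scanning the partition entries for the ESP type GUID (C12A7328-F81F-11D2-BA4B-00A0C93EC93B); if that GPT is buried inside an MD data area, the lookup fails. A minimal Python sketch of such a lookup, assuming 512-byte sectors and a placeholder device path:

```python
#!/usr/bin/env python3
"""Find the EFI System Partition the way firmware does: read the GPT off the
raw disk and scan the partition entries for the ESP type GUID.
Assumes 512-byte logical sectors; the device path is a placeholder."""
import struct
import uuid

ESP_TYPE = uuid.UUID("C12A7328-F81F-11D2-BA4B-00A0C93EC93B")

def find_esp(path):
    with open(path, "rb") as dev:
        dev.seek(512)                          # primary GPT header at LBA 1
        header = dev.read(92)
        if header[0:8] != b"EFI PART":
            return None                        # no GPT visible on this device
        entries_lba, = struct.unpack_from("<Q", header, 72)
        num_entries, = struct.unpack_from("<I", header, 80)
        entry_size,  = struct.unpack_from("<I", header, 84)
        dev.seek(entries_lba * 512)
        table = dev.read(num_entries * entry_size)
        for i in range(num_entries):
            entry = table[i * entry_size:(i + 1) * entry_size]
            # Type GUID is stored in the mixed-endian on-disk format (bytes_le).
            if entry[0:16] == ESP_TYPE.bytes_le:
                first_lba, = struct.unpack_from("<Q", entry, 32)
                return i + 1, first_lba
    return None

print(find_esp("/dev/sda"))    # e.g. (1, 2048) if partition 1 is the ESP
```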
