NixOS: Installation Guide with RAID 1, encryption, and TPM Unlock (part 6 - Mitigating the volume swap attack)

The NixOS disk is encrypted, but a careful LUKS volume swap attack can still be used to obtain the encryption master key.
In this post, I show how to mitigate it!
This is the sixth post in the series:
- Preparing the virtual machine and partitioning the disks
- Disko, LUKS, and btrfs
- Installing the OS
- Enabling Secure Boot
- Unlocking the disk with TPM
- Mitigating the volume swap attack (this post)
As I said in the previous post, this kind of attack would not be carried out by an average person. The computer shop technician who is repairing your laptop probably will not be able to read your LUKS-encrypted disk while its key is sealed against PCR 7. But someone who knows what they are doing will find the weakness. This post and the next one are for those who want to be fully protected, both from the technician at the shop around the corner and from a hostile government or corporate agent. By the end of these two posts, the attacks described here will be mitigated, significantly increasing security (keeping in mind that it is always prudent to consider other attack surfaces outside these scenarios).
The problem
The attack was described by Oddlama, and it basically consists of removing the disk from the original computer, creating a fake encrypted partition with the same identification data as the original partition, and then using that to obtain the key. He even describes how to do this against NixOS.
This is possible because the UKI (Unified Kernel Image — the binary loaded by UEFI that contains the kernel and the initrd) is not encrypted — if it were, UEFI would not be able to load it. The LUKS volume header is also not encrypted, since it must be read by the system in initrd so that the LUKS volume can be decrypted. It is possible to inspect what will be loaded during boot and replace the LUKS volume with another one carefully prepared to steal the master key from the original LUKS volume.
I will not go into more detail because the explanation is long. You can read Oddlama’s post; it is quite interesting. The important point is that this attack is only possible because the UKI does not validate the identity of the LUKS volume, which allows it to be replaced.
Closing the breach
To solve the problem, we just need to start validating the decrypted volume before proceeding, something that is not
being done yet and will be addressed using PCR 15 (system-identity). In the previous post, I showed how to list the
hashes of all PCRs with systemd-analyze pcrs. Notice that PCR 15 was zeroed out, meaning it had not been extended.
systemd can start writing to it by simply enabling tpm2-measure-pcr=yes for systemd-cryptsetup during the boot
process.
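To make "extended" concrete: a PCR cannot be set directly. It can only be updated by hashing its old value together with a new measurement. The sketch below models that arithmetic for a SHA-256 bank in plain shell. The helper names (hex_to_bin, pcr_extend, ZERO_PCR) are made up for illustration, and the measurement passed in is just a placeholder, not the value systemd actually derives from the volume key.

```shell
#!/bin/sh
# Model of the TPM "extend" operation for a SHA-256 PCR bank:
#   new_value = SHA256(old_value || measurement_digest)
# This only reproduces the arithmetic; a real TPM does it in hardware.

# Convert a hex string to raw bytes on stdout (hypothetical helper).
hex_to_bin() {
  h=$1
  while [ -n "$h" ]; do
    byte=${h%"${h#??}"}   # first two hex digits
    h=${h#??}             # remaining digits
    # Turn the byte into an octal escape and let printf emit it.
    printf "$(printf '\\%03o' "0x$byte")"
  done
}

# pcr_extend OLD_HEX DIGEST_HEX -> prints the new PCR value as hex.
pcr_extend() {
  { hex_to_bin "$1"; hex_to_bin "$2"; } | sha256sum | cut -d' ' -f1
}

# A PCR that has never been extended is all zeros (64 hex digits).
ZERO_PCR=$(printf '%064d' 0)
```

For example, pcr_extend "$ZERO_PCR" "$(printf 'placeholder' | sha256sum | cut -d' ' -f1)" prints a value that is no longer all zeros. Because the old value is part of the input, the same final value can only be reached by replaying the same measurements in the same order.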
To start writing to PCR 15 in the NixOS way, we need a Nix configuration. Fortunately, Oddlama himself provided the
solution, pointing to a Forgejo repository by PatrickDaG, with the file ensure-pcr.nix, which has already been adapted
in the repository cloned into /etc/nixos, and which makes it possible to read this PCR through a NixOS module.
I incorporated this module into my solution, which made it very easy to apply. The example commit is f690b88,
described as “Enable PCR15”, but we do not need to check it out; it is enough to update the configuration in
/etc/nixos/configuration.nix (it is already in the code, just remove the comment):
systemIdentity = {
  enable = true;
};
Then apply it:
nixos-rebuild switch
Restart the virtual machine, and after the reboot, confirm that the value of PCR 15 is present (the value below is just an example — replaced by a sequence from 0 to 9):
$ systemd-analyze pcrs 15
NR NAME SHA256
15 system-identity 0123456789012345678901234567890123456789012345678901234567890123
That is the hash of the system identity (derived from the activated LUKS volume key); it is what will ensure that the unlocked volume is the expected one, mitigating the volume swap attack.
The value found must be copied into the pcr15 field in the configuration. For example, using the fictitious sequence
above from 0 to 9:
systemIdentity = {
  enable = true;
  pcr15 = "0123456789012345678901234567890123456789012345678901234567890123";
};
Replace this value with the one from your own machine, run nixos-rebuild switch again, reboot once more, and everything should continue working.
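Copying a 64-digit hash by hand is error-prone, so here is a small sketch that validates the value and prints the line ready to paste into configuration.nix. The function names (is_sha256_hex, pcr15_nix_line) are made up for this example; the systemd-analyze and jq invocation in the comment is the same one the check-pcrs service uses.

```shell
#!/bin/sh
# Return success only if $1 looks like a SHA-256 value (64 hex digits).
is_sha256_hex() {
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;   # contains a non-hex character
  esac
  [ "${#1}" -eq 64 ]
}

# Print the Nix attribute line for a given PCR 15 value.
pcr15_nix_line() {
  is_sha256_hex "$1" || { echo "not a 64-digit hex value: $1" >&2; return 1; }
  printf 'pcr15 = "%s";\n' "$1"
}

# On the installed system, the value could be read and formatted in one step:
#   pcr15_nix_line "$(systemd-analyze pcrs 15 --json=short | jq -r '.[0].sha256')"
```

The validation is there only to catch a truncated paste or an empty result (for example, if jq is not installed) before the value ends up in the configuration.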
How this solution prevents volume swapping
The explanation is in the Nix module defined in the file ensure-pcr.nix.
It creates a systemd service that starts extending PCR 15 when systemIdentity.enable is set to true, by running:
systemd-cryptsetup attach crypt_disk1 /dev/disk/by-partlabel/NIXLUKS1 - 'tpm2-device=auto,tpm2-measure-pcr=yes,discard';
And when the hash of PCR 15 has been defined in systemIdentity.pcr15, it creates a check-pcrs service that runs in
initrd and, if it fails, prevents boot from continuing. This service is very simple: it just compares the hash stored in
the configuration with the value found in PCR 15:
if [[ $(systemd-analyze pcrs 15 --json=short | jq -r ".[0].sha256") != "${config.systemIdentity.pcr15}" ]] ; then
  echo "PCR 15 check failed"
  exit 1
else
  echo "PCR 15 check succeed"
fi
As this service is required by sysroot.mount, if check-pcrs fails the root disk cannot be mounted and the entire
boot fails, entering emergency mode.
What if the boot fails?
If the value of PCR 15 changes for any reason, the boot will fail and enter emergency mode. In that case, you will need to log in with the emergency password configured in initrd and run:
systemctl disable check-pcrs
systemctl default
Note: see the boot.initrd.systemd.emergencyAccess attribute in configuration.nix; it defines the emergency initrd
password (as a hash of the password). To generate the hash for the password test with the salt xyz, run the command
below (replace both the salt and the password with your own):
openssl passwd -6 -salt xyz test
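Passing the password as an argument leaves it in the shell history. As a variation, openssl passwd can read the password from standard input instead. The wrapper name hash_initrd_password below is made up for this example; the salt is still passed as an argument.

```shell
#!/bin/sh
# Hash a password read from stdin with SHA-512 crypt (-6), so that the
# password itself never appears on the command line.
# $1: salt to use (choose your own)
hash_initrd_password() {
  openssl passwd -6 -salt "$1" -stdin
}

# Example (type the password, then press Enter and Ctrl-D):
#   hash_initrd_password xyz
```

With the same salt and password, the result is identical to the one produced by the command above, so either form can be used to fill in the emergencyAccess attribute.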
Tip: I recommend that you test this by changing the configuration value to the wrong hash, running switch, and
confirming that the boot failed. Then test the commands above: you will need to provide the initrd emergency password,
disable the check-pcrs service, and continue the boot. In the worst-case scenario, if you cannot do that, you can
choose the previous configuration from the boot menu. After logging in again, set the PCR 15 value in the configuration
back to the correct one, run a new switch and reboot, making sure that boot works again without issues.
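Before rebooting with a new pcr15 value, it can be worth comparing it against what the TPM currently reports, since a mismatch means the next boot will stop in emergency mode. A minimal sketch, assuming the expected value is passed in by hand (the function name pcr_matches is made up for this example):

```shell
#!/bin/sh
# Succeed only if the expected and the reported PCR values are equal.
# The comparison ignores the case of the hex digits; an empty expected
# value never matches.
pcr_matches() {
  expected=$(printf '%s' "$1" | tr 'A-F' 'a-f')
  actual=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# On the installed system:
#   current=$(systemd-analyze pcrs 15 --json=short | jq -r '.[0].sha256')
#   pcr_matches "<value from configuration.nix>" "$current" \
#     || echo "PCR 15 mismatch: the next boot would enter emergency mode"
```

For the deliberate failure test described above, the function should report a mismatch before you reboot, which confirms that the wrong hash really is in place.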
Is it secure?
Again: not yet.
The attack described by Oddlama no longer works, but there is still another one: since we only bind the disk encryption to PCRs 7 and 15, it is still possible to replace the operating system with another one and use the TPM to decrypt the disk. We will solve that in the next post.