blog: flash fse

2025-12-23 18:02:53 +01:00
commit 747c1b6c3b
parent 1223888e36
4 changed files with 52 additions and 16 deletions
--- a/content/blog/flash-filesystem-encryption/diagram.py
+++ b/content/blog/flash-filesystem-encryption/diagram.py
@ -14,7 +14,7 @@ ARGS = {
 }
 SVG = """\
 <svg version="1.1" xmlns="http://www.w3.org/2000/svg" width="{w}" height="{h}" viewBox="0 0 {w} {h}">
-	<!-- Generated by https://txmn.tk/blog/flash-filesystem-encryption/graph.py -->
+	<!-- Generated by https://txmn.tk/blog/flash-filesystem-encryption/diagram.py -->
 	<!-- Image released under license CC0 (public domain) -->
 	<title>{title}</title>
 	<style>
--- a/content/blog/flash-filesystem-encryption/fse.pdf
+++ b/content/blog/flash-filesystem-encryption/fse.pdf
--- a/content/blog/flash-filesystem-encryption/index.md
+++ b/content/blog/flash-filesystem-encryption/index.md
@ -1,18 +1,26 @@
 +++
 title = "Embedded filesystem encryption on flash memory"
-date = 2025-12-13
+date = 2025-12-23
 description = "My journey in the world of filesystem encryption and flash memories."
 insert_anchor_links = "left"
-draft = true
 [taxonomies]
-tags = ["cryptography", "embedded"]
+tags = ["cryptography", "ESP32"]
 [extra]
 katex = true
 +++

 One of my long-term projects is an ESP32-based phone, using an SD card for storage. So why not encrypt the SD card?

-_In this post, we first explain the basics of filesystem encryption, then explore ways to apply it to the case of an embedded device and flash memory. This last part is quite rarely analyzed in the litterature._
+_In this post, we first explain the basics of filesystem encryption, then explore ways to apply it to the case of an embedded device and flash memory. This last part is quite rarely analyzed in the literature._
+
+## Threat model
+
+The threat model is often left vague when it comes to filesystem encryption, and expectations vary with the context. Here is what we assume and want:
+
+* If an adversary steals the device, they should not be able to obtain information about its content, except maybe the total size.
+* If an adversary steals the device, they should not be able to make us obtain plaintexts of their choice.
+* The adversary can choose plaintexts and make the defender encrypt and write them. (That happens naturally when you receive a message.)
+* The adversary can choose ciphertexts and write them. (Imagine you leave the SD card on your desk when going to lunch.)

 ## Choosing a cipher

@ -34,7 +42,7 @@ Block ciphers like AES process data by blocks of fixed size, for instance 16 byt

 $$C = E(K, P)$$

-(I love those diagrams so I made [a simple Python script](graph.py) to generate them in SVG.)
+(I love those diagrams so I made [a simple Python script](diagram.py) to generate them in SVG. I could have used existing ones from Wikimedia Commons or drawn them with TikZ, but I wanted clean SVG that respects light/dark mode.)

 This mode of operation is called ECB for Electronic Code Book. It has, however, fatal flaws:

@ -84,34 +92,37 @@ Now that we've highlighted an important property of flash memories, it appears F

 [LittleFS](https://github.com/littlefs-project/littlefs) is made exactly for this purpose. Moreover, it provides atomic operations, meaning it never leaves the filesystem in an inconsistent state if there is a power loss or a storage failure during a write operation.

-If we're going down at the filesystem lever, why not going further? Instead of encrypting files, we can directly encrypt the filesystem's blocks, by placing the cryptographic module between LittleFS and the IO. LittleFS's write length can be customized so we can set it to our block length and avoid dealing with partial blocks, as we would have to do when encrypting files. Another benefit is that we're hiding the file tree as well: directories, names and metadata are encrypted as well, with no additional complexity.
+If we're going down to the filesystem level, why not go further? Instead of encrypting files, we can directly encrypt the filesystem's blocks, by placing the cryptographic module between LittleFS and the IO. LittleFS's write length can be customized so we can set it to our block length and avoid dealing with partial blocks, as we would have to do when encrypting files. Another benefit is that we're hiding the file tree too: directories, names and metadata are encrypted as well, with no additional complexity.
+
+Such a niche filesystem has the disadvantage that it's not natively supported by Linux, making development, debugging or even file transfer between the device and a computer more difficult. A LittleFS kernel module exists, and adding our encryption layer should be feasible.

 #### XTS

-We need something looking more like ECB or CTR in that it allows small random writes. XTS is a popular for filesystem encryption and satisfies this criterion.
+We need something looking more like ECB or CTR in that it allows small random writes. XTS is a popular mode for filesystem encryption and satisfies this criterion.

 <div style="text-align:center"><img alt="XTS" src="xts.svg"/></div>

 $$C = E(K_1, P \oplus \Delta) \oplus \Delta$$
 $$\Delta = E(K_2, i) \times \alpha^j$$
+$$X \times \alpha = (X \ll 1) \oplus (\mathrm{MSB}(X) \cdot 135)$$

-Here, the storage is divided into sectors and sectors into blocks. In the diagram, i is the sector number and j is the block number.
+Here, the storage is divided into sectors and sectors into blocks. In the diagram, $i$ is the sector number and $j$ is the block number. $\ll$ is a left bit shift on 128 bits (the bit shifted out is dropped) and MSB is the most significant bit.

-Why so complicated? First, $E(K_2, i)$ looks like CTR. To make it faster, it remains constant through the entire sector (which is useful because LittleFS prefers to read or write contiguous blocks when possible). Multiplication by $\alpha$ (as defined later) is faster than a block encryption and can be computed incrementally with $x \times \alpha^j = (x \times \alpha^{j-1}) \times \alpha$. The double XOR prevents attacks on chosen ciphertext or known plaintext as described before.
+Why so complicated? First, $E(K_2, i)$ looks like CTR. To make it faster, it remains constant through the entire sector (which is useful because LittleFS prefers to read or write contiguous blocks when possible). Multiplication by $\alpha$ is faster than a block encryption and can be computed iteratively with $x \times \alpha^j = (x \times \alpha^{j-1}) \times \alpha$. The double XOR prevents the chosen-ciphertext and known-plaintext attacks described before.

 XTS has a way to deal with final partial blocks (when data length is not a multiple of block size), but as we're encrypting full blocks of 16 bytes only, we don't need that mechanism.
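
The formulas above can be sketched in a few lines of Rust. This is only an illustration, not the code used in the benchmark: blocks are represented as `u128` values, and `toy_encrypt`/`toy_decrypt` are hypothetical stand-ins for AES (a keyed permutation with no security at all) so that the sketch has no dependencies. Only full blocks are handled.

```rust
/// Multiply a 128-bit value by alpha: shift left by one bit and,
/// if the most significant bit was set, XOR with 135 (0x87).
fn mul_alpha(x: u128) -> u128 {
    let msb = x >> 127; // 0 or 1
    (x << 1) ^ (msb * 0x87)
}

/// Delta for block `j`, given the encrypted sector number E(K2, i).
/// Random access to block j costs j multiplications.
fn delta(encrypted_sector: u128, j: u32) -> u128 {
    (0..j).fold(encrypted_sector, |d, _| mul_alpha(d))
}

/// Encrypt all blocks of sector `i` in place.
/// `e1` stands for E(K1, .) and `e2` for E(K2, .).
fn xts_encrypt_sector(
    e1: impl Fn(u128) -> u128,
    e2: impl Fn(u128) -> u128,
    i: u128,
    blocks: &mut [u128],
) {
    let mut d = e2(i); // one block encryption per sector
    for b in blocks.iter_mut() {
        *b = e1(*b ^ d) ^ d;
        d = mul_alpha(d); // next power of alpha, computed iteratively
    }
}

/// Decrypt all blocks of sector `i` in place.
/// `d1` stands for the inverse of E(K1, .); the tweak still uses E(K2, .).
fn xts_decrypt_sector(
    d1: impl Fn(u128) -> u128,
    e2: impl Fn(u128) -> u128,
    i: u128,
    blocks: &mut [u128],
) {
    let mut d = e2(i);
    for b in blocks.iter_mut() {
        *b = d1(*b ^ d) ^ d;
        d = mul_alpha(d);
    }
}

/// Hypothetical stand-in for AES: an invertible keyed function, NOT secure.
fn toy_encrypt(key: u128, x: u128) -> u128 {
    (x ^ key).rotate_left(29).wrapping_add(key)
}

/// Inverse of `toy_encrypt`.
fn toy_decrypt(key: u128, y: u128) -> u128 {
    y.wrapping_sub(key).rotate_right(29) ^ key
}
```

With a real AES implementation, `e1`, `d1` and `e2` would be closures around the block cipher keyed with $K_1$ and $K_2$.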

 [Rogaway 2011](https://www.cs.ucdavis.edu/~rogaway/papers/modes.pdf) criticized XTS on multiple points.

-* XTS is based on a modified version of Rogaway's XEX mode (XOR-Encrypt-Xor) which has well understood security properties.
+* XTS is based on a modified version of Rogaway's XEX mode (XOR-Encrypt-XOR) which has well understood security properties.
 * Ciphertext stealing, the way to deal with final partial blocks, is poorly designed or at least not proven secure under well-defined security goals. Again, we are not concerned.
 * The use of two different keys is unjustified, except it makes proofs easier. If the sector number i is xored with a secret random salt, there is no risk of collision between the inputs of the two cipher blocks, as long as we do not store ciphertexts of the secret key or the salt (they should be user inputs stored in volatile memory only).
 * It is approved by NIST (SP 800-38E) but only specified in an IEEE standard (IEEE 1619) that is seemingly not available publicly (unless using Sci-Hub of course).
-* $\Delta$ is byte-swapped to make implementation easier on little-endian machines, but this has no security implications.
+* In the original definition, $\Delta$ is byte-swapped to make implementation easier on little-endian machines, but this has no security implications.

 ## Benchmarking ciphers

-I implemented the simplified XEX in Rust and ran a benchmark on the ESP32. As the multiplication by powers of alpha can be implemented in many ways, I also tried different versions.
+I implemented XTS in Rust and ran a benchmark on the ESP32. As the multiplication by powers of alpha can be implemented in many ways, I also tried different versions.

 First version, delta is an unaligned array of bytes, cast to u128 to do the maths:

@ -177,9 +188,11 @@ Here are the benchmark results (encrypting 100 times 128kB):

 The fastest is XTS with one key (and salted sector number) and long sectors.

-Sectors must not be too long, however, as random access needs computing all 
+Sectors must not be too long, however, as random access to block j requires computing all j successive powers of $\alpha$. 32 blocks (512 bytes) may be a good value, as it matches flash erase size.

-## Storing the key
+## The key
+
+### Deriving the key from a password

 AES128 needs 128 bits of key, however the user will only remember ASCII words, not fully random bytes. We need something to derive a key from a variable-length password. We could just compute a hash of the password, as the ESP32 provides a hardware implementation of SHA2, but for storing passwords it is better to use a dedicated function that is fast enough to run once but hard to bruteforce efficiently on optimized systems.

@ -187,6 +200,28 @@ AES128 needs 128 bits of key, however the user will only remember ASCII words, n

 A popular choice as of today is [Argon2](https://en.wikipedia.org/wiki/Argon2), which is memory-hard: one instance requires efficient access to a large amount of memory, potentially megabytes or even gigabytes, so it is difficult to optimize even on dedicated hardware. Problems are that its implementation is quite complicated (it will take too much ROM) and its specs are not even complete.

-[Catena](https://www.researchgate.net/publication/261548591_The_Catena_Password_Scrambler) is a scheme with similar properties but with a very simple description. It takes less than 50 lines of Rust. To run on the ESP32, I used SHA256 and set its memory usage to 128kB and 1024 iterations. In comparison, recommended parameters are between 67MB and 1GB with 3 or 4 iterations. It runs in 911ms. We can expect a speedup of more than 10 on a good CPU, and it still can be parallelized easily on an old GPU: if your GPU has 1GB of RAM, it can hold at most 8192 parallel instances.
+[Catena](https://www.researchgate.net/publication/261548591_The_Catena_Password_Scrambler) is a scheme with similar properties but with a very simple description. It takes fewer than 50 lines of Rust. To run on the ESP32 (and its 256kB RAM), I used SHA256 and set its memory usage to 128kB and 1024 iterations. In comparison, recommended parameters are between 67MB and 1GB with 3 or 4 iterations. It runs in 911ms. We can expect a speedup of more than 10× on a good CPU, and it can still be parallelized easily on an old GPU: if your GPU has 1GB of RAM, it can hold at most 8192 parallel instances.

+The benefit of password hashing functions on the ESP32 is a bit disappointing: we only slow down attacks by a small factor. It seems easier to enforce strong passwords. Picking 10 random words from a [BIP39](https://github.com/bitcoin/bips/blob/04b448b599cb16beae40ba9a98df9f262da522f7/bip-0039/english.txt) wordlist gives $\log_2(2048^{10})=110$ bits of entropy. To make it faster to type, each word can be shortened to its first 4 letters without losing entropy.
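
The figures in this section are simple arithmetic, which can be reproduced with a small Rust sketch. The function names are made up for the illustration. `prefixes_are_unique` is a helper one could run on the full wordlist to check that truncating words to their first letters keeps them distinct; here it is only exercised on a tiny sample.

```rust
use std::collections::HashSet;

/// Entropy in bits of a passphrase made of `words` words drawn
/// uniformly at random from a list of `list_len` words.
fn passphrase_entropy_bits(list_len: u32, words: u32) -> f64 {
    f64::from(words) * f64::from(list_len).log2()
}

/// Upper bound on the number of password hashing instances that fit
/// in the attacker's memory (both sizes in bytes).
fn max_parallel_instances(attacker_mem: u64, mem_per_instance: u64) -> u64 {
    attacker_mem / mem_per_instance
}

/// True if all words are still distinct once truncated to their first
/// `n` characters (words shorter than `n` are kept whole).
fn prefixes_are_unique(words: &[&str], n: usize) -> bool {
    let mut seen = HashSet::new();
    words
        .iter()
        .all(|w| seen.insert(w.chars().take(n).collect::<String>()))
}
```

For instance, `passphrase_entropy_bits(2048, 10)` gives the 110 bits mentioned above, and `max_parallel_instances(1 << 30, 128 << 10)` gives the 8192 instances for 1GB of GPU memory and 128kB per instance.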
+
+### Storing the key
+
+It can be useful to use two keys: the first one, derived from the password, is used to encrypt the second key, which is written to the storage. The second key is used to encrypt the filesystem. This way, the password can be changed, as the second key does not depend on it. If you have to destroy the data in a hurry and you have a reason to think someone with a gun may force you to hand over the password, you just have to erase the stored key.
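
Here is a minimal sketch of this two-key scheme in Rust. All names (`KeySlot`, `toy_wrap`) are hypothetical, and `toy_wrap` is a plain XOR used only to keep the example dependency-free: it is not a secure way to wrap a key, and a real implementation would encrypt the second key with a proper cipher such as AES. Note that nothing here detects a wrong password, which simply yields a wrong key.

```rust
type Key = [u8; 16];

/// Toy stand-in for encrypting `data` under `kek` (XOR, NOT secure).
/// XOR is its own inverse, so the same function also unwraps.
fn toy_wrap(kek: &Key, data: &Key) -> Key {
    let mut out = [0u8; 16];
    for (o, (k, d)) in out.iter_mut().zip(kek.iter().zip(data.iter())) {
        *o = k ^ d;
    }
    out
}

/// What is written to the storage: the filesystem key, encrypted
/// under the key derived from the password.
struct KeySlot {
    wrapped: Option<Key>,
}

impl KeySlot {
    /// Store `fs_key` encrypted under the password-derived key.
    fn create(password_key: &Key, fs_key: &Key) -> Self {
        KeySlot { wrapped: Some(toy_wrap(password_key, fs_key)) }
    }

    /// Recover the filesystem key, or None if the slot was erased.
    fn unlock(&self, password_key: &Key) -> Option<Key> {
        self.wrapped.as_ref().map(|w| toy_wrap(password_key, w))
    }

    /// Re-encrypt the same filesystem key under a new password-derived
    /// key; the filesystem itself does not need to be re-encrypted.
    fn change_password(&mut self, old: &Key, new: &Key) -> bool {
        match self.unlock(old) {
            Some(fs_key) => {
                self.wrapped = Some(toy_wrap(new, &fs_key));
                true
            }
            None => false,
        }
    }

    /// Destroy the stored key: the data can no longer be decrypted,
    /// even by someone who knows the password.
    fn erase(&mut self) {
        self.wrapped = None;
    }
}
```

On real flash, `erase` would also have to make sure the old wrapped key is physically overwritten or erased, not just unlinked.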
+
+## Active attacks and authentication
+
+Assumptions and security goals about malleability are debatable. Lack of authentication allows many attacks which are inherently hard to counter when encrypting a filesystem.
+
+If an adversary **steals your device**, they may copy your encrypted data before handing it back to you. They may as well install a keylogger in the program memory. In this case, you should ideally copy your data, destroy the potentially compromised device and install a fresh one. One motivation to still consider defending against this attack is that in our context, the executable code is stored in the ESP32 while the data is on the SD card, so it is possible that the SD card gets compromised while the ESP32 stays in your pocket.
+
+**Replay attacks** are trivial. XTS prevents copying a block from one place to another without scrambling its content, but nothing prevents it from being copied through time: the adversary makes a copy of block N one day, you write newer data to block N, the adversary rewrites the old data to block N, and you have no way to detect the attack because the block is valid. LittleFS coincidentally mitigates this problem, because when modifying a block, it writes the new data to an unused block and modifies the link that points to it, so the old one is now unused. The old block will only be used again after some time, to equalize wear through the entire storage. This requires replay attacks to be more subtle but doesn't make them impossible.
+
+**Data can be scrambled.** Altering encrypted blocks will produce valid garbage plaintexts, which may or may not be detected, depending on what files or filesystem structures are affected. Again LittleFS partly mitigates this issue, because every bit of data is covered by a checksum. A checksum is not a cryptographic tool as it has low entropy and is malleable, and its goal is to detect hardware faults, not attacks. However, as XTS is not bitwise malleable, the checksum may help make active attacks harder, as a scrambled block can be marked as faulty.
+
+**Why not authenticate?** We could write authentication tags along the data (e.g. AES-GCM, HMAC), but that would be very expensive to compute. It would also break the 1:1 correspondence between ciphertext blocks and plaintext blocks, which is vital to performance. We would need either to write all authentication tags to a different partition (out of the filesystem, hence causing performance issues), or to make encryption part of the filesystem itself, which is a lot of work.
+
+## Conclusion
+
+For my project, I will go on with LittleFS over AES128-XTS. Deciding between the one-key and two-key variants will need benchmarking on a more realistic setup. I would also like to measure energy consumption to complement the running time benchmarks, and to decide whether Catena or PBKDF2 is worth it.
+
+If you want to know more about filesystem encryption in general, here is [a quick presentation](fse.pdf) I made. [CryptSetup's FAQ](https://gitlab.com/cryptsetup/cryptsetup/-/wikis/FrequentlyAskedQuestions) is also a great source of information for non-cryptographers.