diff --git a/config.toml b/config.toml
index 891cff7..1594fe9 100644
--- a/config.toml
+++ b/config.toml
@@ -8,6 +8,9 @@ default_language = "en"
 minify_html = false
+generate_feeds = true
+author = "tuxmain"
+
 [slugify]
 paths = "off"
 taxonomies = "off"
diff --git a/content/blog/_index.eo.md b/content/blog/_index.eo.md
index cd910c8..8f0f455 100644
--- a/content/blog/_index.eo.md
+++ b/content/blog/_index.eo.md
@@ -3,4 +3,5 @@ page_template = "blog.html"
 title = "Blogo"
 sort_by = "date"
 insert_anchor_links = "left"
+generate_feeds = true
 +++
diff --git a/content/blog/_index.fr.md b/content/blog/_index.fr.md
index 5e6f5f5..bb1c6c3 100644
--- a/content/blog/_index.fr.md
+++ b/content/blog/_index.fr.md
@@ -3,4 +3,5 @@ page_template = "blog.html"
 title = "Blog"
 sort_by = "date"
 insert_anchor_links = "left"
+generate_feeds = true
 +++
diff --git a/content/blog/_index.md b/content/blog/_index.md
index 5e6f5f5..bb1c6c3 100644
--- a/content/blog/_index.md
+++ b/content/blog/_index.md
@@ -3,4 +3,5 @@ page_template = "blog.html"
 title = "Blog"
 sort_by = "date"
 insert_anchor_links = "left"
+generate_feeds = true
 +++
diff --git a/content/blog/flash-filesystem-encryption/graph.py b/content/blog/flash-filesystem-encryption/diagram.py
similarity index 99%
rename from content/blog/flash-filesystem-encryption/graph.py
rename to content/blog/flash-filesystem-encryption/diagram.py
index b4edd00..ff5cb58 100644
--- a/content/blog/flash-filesystem-encryption/graph.py
+++ b/content/blog/flash-filesystem-encryption/diagram.py
@@ -14,7 +14,7 @@ ARGS = {
 }
 SVG = """\
-
+
 {title}
XTS
 $$C = E(K_1, P \oplus \Delta) \oplus \Delta$$
 $$\Delta = E(K_2, i) \times \alpha^j$$
+$$X \times \alpha = (X \ll 1) \oplus (MSB(X) \cdot 135)$$
-Here, the storage is divided into sectors and sectors into blocks. In the diagram, i is the sector number and j is the block number.
+Here, the storage is divided into sectors and sectors into blocks. In the diagram, i is the sector number and j is the block number. $\ll$ is left bitshift and MSB is the most significant bit.
-Why so complicated? First, $E(K_2, i)$ looks like CTR. To make it faster, it remains constant through the entire sector (which is useful because LittleFS prefers to read or write contiguous blocks when possible). Multiplication by $\alpha$ (as defined later) is faster than a block encryption and can be computed incrementally with $x \times \alpha^j = (x \times \alpha^{j-1}) \times \alpha$. The double XOR prevents attacks on chosen ciphertext or known plaintext as described before.
+Why so complicated? First, $E(K_2, i)$ looks like CTR. To make it faster, it remains constant through the entire sector (which is useful because LittleFS prefers to read or write contiguous blocks when possible). Multiplication by $\alpha$ is faster than a block encryption and can be computed iteratively with $x \times \alpha^j = (x \times \alpha^{j-1}) \times \alpha$. The double XOR prevents attacks on chosen ciphertext or known plaintext as described before.
 XTS has a way to deal with final partial blocks (when data length is not a multiple of block size), but as we're encrypting full blocks of 16 bytes only, we don't need that mechanism.
 [Rogaway 2011](https://www.cs.ucdavis.edu/~rogaway/papers/modes.pdf) criticized XTS on multiple points.
-* XTS is based on a modified version of Rogaway's XEX mode (XOR-Encrypt-Xor) which has well understood security properties.
+* XTS is based on a modified version of Rogaway's XEX mode (XOR-Encrypt-XOR) which has well understood security properties.
 * Ciphertext stealing, the way to deal with final partial blocks, is poorly designed or at least not proven secure under well-defined security goals. Again, we are not concerned.
 * The use of two different keys is unjustified, except it makes proofs easier. If the sector number i is xored with a secret random salt, there is no risk of collision between the inputs of the two cipher blocks, as long as we do not store ciphertexts of the secret key or the salt (they should be user inputs stored in volatile memory only).
 * It is a FIPS (NIST standard) but only specified in an IEEE spec that is seemingly not available publicly (unless using Sci-Hub of course).
-* $\Delta$ is byte-swapped to make implementation easier on little-endian machines, but this has no security implications.
+* In the original definition, $\Delta$ is byte-swapped to make implementation easier on little-endian machines, but this has no security implications.
 ## Benchmarking ciphers
-I implemented the simplified XEX in Rust and ran a benchmark on the ESP32. As the multiplication by powers of alpha can be implemented in many ways, I also tried different versions.
+I implemented XTS in Rust and ran a benchmark on the ESP32. As the multiplication by powers of alpha can be implemented in many ways, I also tried different versions.
 First version, delta is an unaligned array of bytes, cast to u128 to do the maths:
@@ -177,9 +188,11 @@ Here are the benchmark results (encrypting 100 times 128kB):
 The fastest is XTS with one key (and salted sector number) and long sectors.
-Sectors must not be too long, however, as random access needs computing all
+Sectors must not be too long, however, as random access to block j needs computing all j successive powers of $\alpha$. 32 blocks may be a good value, as it matches flash erase size.
-## Storing the key
+## The key
+
+### Deriving the key from a password
 AES128 needs 128 bits of key, however the user will only remember ASCII words, not fully random bytes.
 We need something to derive a key from a variable-length password. We could just compute a hash of the password, as the ESP32 provides a hardware implementation of SHA2, but for storing passwords it is better to use a dedicated function that is fast enough to run once but hard to bruteforce efficiently on optimized systems.
@@ -187,6 +200,28 @@ AES128 needs 128 bits of key, however the user will only remember ASCII words, n
 A popular choice as of today is [Argon2](https://en.wikipedia.org/wiki/Argon2), which is memory-hard: one instance requires efficient access to a big amount of memory, potentially megabytes or even gigabytes, so it is difficult to optimize even on dedicated hardware. Problems are that its implementation is quite complicated (it will take too much ROM) and its specs are not even complete.
-[Catena](https://www.researchgate.net/publication/261548591_The_Catena_Password_Scrambler) is a scheme with similar properties but with a very simple description. It takes less than 50 lines of Rust. To run on the ESP32, I used SHA256 and set its memory usage to 128kB and 1024 iterations. In comparison, recommended parameters are between 67MB and 1GB with 3 or 4 iterations. It runs in 911ms. We can expect a speedup of more than 10 on a good CPU, and it still can be parallelized easily on an old GPU: if your GPU has 1GB of RAM, it can hold at most 8192 parallel instances.
+[Catena](https://www.researchgate.net/publication/261548591_The_Catena_Password_Scrambler) is a scheme with similar properties but with a very simple description. It takes less than 50 lines of Rust. To run on the ESP32 (and its 256kB RAM), I used SHA256 and set its memory usage to 128kB and 1024 iterations. In comparison, recommended parameters are between 67MB and 1GB with 3 or 4 iterations. It runs in 911ms. We can expect a speedup of more than 10 on a good CPU, and it still can be parallelized easily on an old GPU: if your GPU has 1GB of RAM, it can hold at most 8192 parallel instances.
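The figure of 8192 parallel instances can be checked with a short calculation. This is only a sketch: it assumes "1GB" and "128kB" are meant as binary units (2^30 and 128 × 2^10 bytes), and `max_parallel_instances` is a hypothetical helper, not part of the post's code.

```rust
/// Upper bound on how many memory-hard hashing instances fit in a device's
/// memory, ignoring any per-instance overhead.
/// Hypothetical helper written for illustration only.
fn max_parallel_instances(device_mem_bytes: u64, per_instance_bytes: u64) -> u64 {
    device_mem_bytes / per_instance_bytes
}

fn main() {
    // Assumption: binary units (1 GiB of GPU memory, 128 KiB per instance).
    const KIB: u64 = 1024;
    const GIB: u64 = 1024 * 1024 * 1024;

    let instances = max_parallel_instances(GIB, 128 * KIB);
    println!("parallel instances: {instances}");
}
```

With decimal units the bound would be slightly lower, so the number is best read as an order of magnitude.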
 The benefit of password hashing functions on the ESP32 is a bit disappointing: we only slow down attacks by a small factor. It seems easier to enforce strong passwords. Picking 10 random words from a [BIP39](https://github.com/bitcoin/bips/blob/04b448b599cb16beae40ba9a98df9f262da522f7/bip-0039/english.txt) wordlist gives $\log_2(2048^{10})=110$ bits of entropy. To make it faster to type, each word can be shortened to its first 4 letters without losing entropy.
+
+### Storing the key
+
+It can be useful to use two keys: the first one, derived from the password, is used to encrypt the second key, which is written to the storage. The second key is used to encrypt the filesystem. This way, the password can be changed, as the second key does not depend on it. If you have to destroy the data in a hurry and you have a reason to think someone with a gun may force you to hand over the password, you just have to erase the stored key.
+
+## Active attacks and authentication
+
+Assumptions and security goals about malleability are debatable. Lack of authentication allows many attacks which are inherently hard to counter when encrypting a filesystem.
+
+If an adversary **steals your device**, they may copy your encrypted data before handing it back to you. They may as well install a keylogger in the program memory. In this case, you should ideally copy your data, destroy the potentially compromised device and install a fresh one. One motivation to still consider defending against this attack is that in our context, the executable code is stored in the ESP32 while the data is on the SD card, so it is possible that the SD card gets compromised while the ESP32 stays in your pocket.
+
+**Replay attacks** are trivial.
+XTS prevents copying a block from one place to another without scrambling its content, but nothing prevents it from being copied through time: the adversary makes a copy of block N one day, you write newer data to block N, the adversary rewrites the old data to block N, and you have no way to detect the attack because the block is valid. LittleFS coincidentally mitigates this problem, because when modifying a block, it writes the new data to an unused block and modifies the link that points to it, so the old one is now unused. The old block will only be used again after some time, to equalize wear through the entire storage. This requires replay attacks to be more subtle but doesn't make them impossible.
+
+**Data can be scrambled.** Altering encrypted blocks will produce valid garbage plaintexts, which may or may not be detected, depending on what files or filesystem structures are affected. Again, LittleFS partly mitigates this issue, because every bit of data is covered by a checksum. A checksum is not a cryptographic tool as it has low entropy and is malleable, and its goal is to detect hardware faults, not attacks. However, as XTS is not bitwise malleable, it may help make active attacks harder, as a scrambled block can be marked as faulty.
+
+**Why not authenticate?** We could write authentication tags along the data (e.g. AES-GCM, HMAC), but that would be very expensive to compute. It would also break the 1:1 correspondence between ciphertext blocks and plaintext blocks, which is vital to its performance. We would need either to write all authentication tags to a different partition (outside the filesystem, hence causing performance issues), or to make encryption part of the filesystem itself, which is a lot of work.
+
+## Conclusion
+
+For my project, I will go on with LittleFS over AES128-XTS. Deciding between the one-key or two-key variants will need benchmarking on a more realistic setup.
+I would also like to make energy consumption measurements to complete the running time benchmarks, and to decide whether Catena or PBKDF2 are worth it.
+
+If you want to know more about filesystem encryption in general, here is [a quick presentation](fse.pdf) I made. [CryptSetup's FAQ](https://gitlab.com/cryptsetup/cryptsetup/-/wikis/FrequentlyAskedQuestions) is also a great source of information for non-cryptographers.
diff --git a/sass/css/_content.scss b/sass/css/_content.scss
index db3f9ee..affe467 100644
--- a/sass/css/_content.scss
+++ b/sass/css/_content.scss
@@ -49,6 +49,7 @@ code, pre {
     font-family: "Fira Code", monospace, monospace;
     font-variant-ligatures: none;
     font-size: 14px;
+    tab-size: 4;
 }
 code {