DI Management Home > Cryptography > Breaking XML Encryption

Breaking XML Encryption


This page looks at the attack on XML Encryption in the paper How to break XML encryption [XMLENC-CBC-ATTACK].

In their paper, the authors give a "Toy Example" as a simplified demonstration of how their main attack works. Out of curiousity, we wrote a Python program to implement the attack in the Toy Example. It is surprising how easy it is to attack ciphertexts encrypted in CBC mode if we have an oracle that responds in a certain way to errors.

2020-12-30. In Part 2, we give a Python implementation that carries out a slightly simplified version of the full attack.

The Attack | The Code | The Toy Example | The Real Example | References | Contact us

The Attack

Back in the day (early 2000s) the only encryption algorithms specified in the W3C recommendation XML Encryption Syntax and Processing [XMLENC] were Triple DES and AES both using cipher-block chaining (CBC) confidentiality mode. The paper by Tibor Jager and Juraj Somorovsky demonstrated a very practical attack on XML block encryption when used with CBC mode.

The end result was that the XMLENC recommendation was updated to add the AES-GCM encryption algorithm and to strongly recommend against using block encryption in CBC mode. The GCM mode of encryption includes in-built authentication which protects against the described attack.

How it works

CBC-mode is malleable - flipping a bit in the IV flips the same bit in the first block of plaintext. The attack exploits a subtle correlation between the CBC mode of operation and the character encoding of encrypted text, and the response behaviour of a Web service if an XML message cannot be parsed correctly. The W3C recommended padding for block cipher mode adds to the vulnerability by only using the last byte. The Web service acts as an Oracle when sent related ciphertexts and evaluating its response. The attack relies on having a Web server that will happily respond to unlimited queries.

For full details of the attack, see the paper [XMLENC-CBC-ATTACK]. Note there is a typo in line 5 of Algorithm 1: change O((IV",C(1)))=1 to O((IV",C(1)))=0.

The Code

Here is our code in Python 3 that implements the Toy Example: break-xmlenc.py.

Part 2. And here is our code that carries out the slightly-simplified real attack: break-xmlenc-2.py.

The code uses our CryptoSys PKI Toolkit and Python for CryptoSys PKI. To install, download CryptoSys PKI from CryptoSys PKI download and install the Python module
> pip install cryptosyspki
You can, of course, substitute your own preferred package to carry out the cryptographic operations in the code. The result should be the same.

The Toy Example

In the example code, we use the ciphertext produced by encrypting the message "Now is the time for all good men to come to the aid of their par" using AES-128 in CBC mode with the key 0x0123456789ABCDEFF0E1D2C3B4A59687 and IV 0xFEDCBA9876543210FEDCBA9876543210. By design, the message has a length which is an exact multiple of the AES block size (16 bytes) and consists only of 7-bit ASCII text. The ciphertext in this example is
C3153108A8DD340C0BCB1DFE8D25D2320EE0E66BD2BB4A313FB75C5638E9E177211FC26A1FF51CE35741B76A77DB6DE27435A4C79E56F29BB12B595404C222F1

The "Toy Example" in part 3 of the paper describes a simplified attack which assumes a certain oracle is available. The real attack is based on the same idea. The oracle accepts an IV' (a specially chosen one, not necessarily the original) and the first 16-byte block of ciphertext, C. It decrypts the block using the given IV' and the known key to produce a 16-byte plaintext m, and replies as follows:

The specially-chosen IV' is chosen so that the result of decrypting the block C using it does not contain a zero byte (the authors call this "well-formed"). If the original IV does this then use it, else keep trying random 16-byte IV' values until you find one.

The algorithm proceeds by tweaking each byte in turn of IV' and testing the result with the oracle. We make a new IV" by XOR-ing a byte mask with the j-th byte of IV' and querying the oracle with this adjusted IV". If the oracle returns zero, we know we have tweaked that byte to be a zero value, and we can then recover the j-th byte of the plaintext by a simple XOR calculation. We only need to try byte masks from value 1 to 127, since we know the plaintext consists of ASCII characters. One of them will succeed.

In this manner, we can recover each byte of plaintext from the first 16-byte block C. Using the properties of CBC mode, we can repeat this analysis for the second ciphertext block C2 by setting IV' = C1.

The Real Example

Part 2. The slightly-simplified real example gives Python code to implement the attack described in section 4 of the paper. In the paper, the authors describe slightly-simplified algorithms and make some restrictions on the input plaintext (e.g. no "&" characters). We make another simplification in how our XML "parser" works, which does not take away from showing how the overall algorithm works (it just avoids having to set up a full XML parser that works like the old Apache Axis2).

It is assumed that the input only consists of 7-bit ASCII characters. The ASCII character set is partitioned into Type-A and Type-B characters. Type-A characters are all the illegal XML characters (0x00 to 0x1F excluding 0x09, 0x0A and 0x0D) and the characters 0x3C ">" and 0x26 "&" (these are equivalent to the single Type-A 0x00 character in the Toy Example). Type-B characters are all the 128 ASCII characters (0x00 to 0x7F) excluding the Type-A characters.

References

Contact us

To contact us or comment on this page, please send us a message.

This page first published 10 December 2020. Last updated 4 January 2021