Signing an XML document using XMLDSIG (Part 1)

This page demonstrates how to create a digital signature in XML. This is a simple [sic] example of an enveloping signature where we sign a straightforward text string inside an XML document. If you want information on encryption in XML documents, see Encryption in XML documents using XMLENC.

2022-02-24: For a more general treatment of XML-DSIG with examples, see Part 3.
2012-05-09: For an example of an enveloped signature, see Part 2.
2017-06-28: See Canonicalization of an XML document for a more detailed how-to guide for canonicalization (C14N) of an XML document prior to signing, and
2017-07-11: SC14N, a straightforward XML canonicalization utility.
See Using SC14N to compute the digest of the input text string directly and Using SC14N to compute the digest of the SignedInfo directly.

To make a digital signature, you need a private key. Our example uses the 1024-bit RSA private key for Alice from RFC 4134 [SMIME-EX]. We use our CryptoSys PKI Toolkit to carry out the necessary computations. We treat an XML document as a simple text file and avoid using any of those frightful, unwieldy XML "DOM" packages.

We give full details of the exact data to be processed at each stage in order to produce the final signed XML document. We hope this is in sufficient detail to help you implement your own version.

For advanced users: If this is too simple for you, see our page on XML-Dsig and the Chile SII where we look in detail at creating digital signatures in XML documents using the standards for electronic invoices set by the Servicio de Impuestos Internos (SII) of Chile. There are some useful hints and generic functions in VB6 and C# to create <SignedInfo> elements for XML-Dsig.

2012-10-01: See How to create a SAT Cancelacion document, an enveloped XML-DSIG document with the namespace http://cancelacfd.sat.gob.mx issued by the Servicio de Administración Tributaria (SAT) in Mexico.
Updated 2022-01-29.

See also Accented characters and UTF-8 in XML-DSIG signatures where we look at a simple example to create an XML-DSIG signature of an XML document containing accented characters like áéíóúñ




  >>I have some questions related to XML-Dsig:
  >Argghh!! Run away!

  A near-universal reaction.

- from Why XML Security is Broken by Peter Gutmann. For another rant, see our page XML is xhite.


Here is the VB6 code, the output XML file, Alice's PKCS#8 encrypted private key (password: "password"), her corresponding X.509 certificate, and all these files collected in a zip file.


In this example we create the digital signature for the text

some text
  with spaces and CR-LF.

That is, the 35 bytes beginning with 's', 'o', 'm',... and ending with ...,'L', 'F', '.'. There is exactly one CR-LF newline (the two-byte sequence (0x)0D 0A) in the text, between the two lines. There are two spaces before the word "with". There is no newline at the end. In hexadecimal format, the text is

73 6F 6D 65 20 74 65 78 74 0D 0A 
20 20 77 69 74 68 20 73 70 61 63 65 73 20 61 6E 64 20 43 52 2D 4C 46 2E 
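
As a quick check, the byte count and hex dump above can be reproduced in a few lines of Python. This is only an illustrative sketch; the variable name T simply follows the notation used in the procedure on this page.

```python
# The text-to-be-signed as an exact byte string: one CR-LF between the lines,
# two spaces before "with", and no trailing newline.
T = b"some text\r\n  with spaces and CR-LF."

print(len(T))              # 35
print(T.hex(" ").upper())  # space-separated hex, as in the dump above
```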


Output XML file (1 kB).

<?xml version="1.0" encoding="UTF-8"?>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
  <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315" />
  <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1" />
  <Reference URI="#object">
    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1" />
<Object Id="object">some text
  with spaces and CR-LF.</Object>

Note that the whitespace inside the <SignedInfo> and <Object> elements is important and should not be changed.


Test with the Online XML Digital Signature Verifier.

2022-03-20: See Troubleshooting problems on the 'Online XML Digital Signature Verifier' site.


Algorithm: XMLDSIG of simple text string.
INPUT: T, text-to-be-signed, a byte string;
  Ks, RSA private key
OUTPUT: XML file, xml
  1. Canonicalize* the text-to-be-signed, C = C14n(T).
  2. Compute the message digest of the canonicalized text, m = Hash(C).
  3. Encapsulate the message digest in an XML <SignedInfo> element, SI, in canonicalized form.
  4. Compute the RSA signatureValue of the canonicalized <SignedInfo> element, SV = RsaSign(Ks, SI).
  5. Compose the final XML document including the signatureValue, this time in non-canonicalized form.

* Strictly, what we are doing here is encapsulating the text string T inside an <Object> element, then canonicalizing that element.
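
Steps 1 to 4 above can be sketched in Python as follows. This is a minimal outline under stated assumptions, not the VB6 code offered for download: the function names are hypothetical, the line layout of the <SignedInfo> element (no indentation, one LF after each line, with <DigestValue> inside <Reference> as the XML-DSIG schema requires) is an assumption, and the RSA operation is left to a caller-supplied function because it needs the private key.

```python
import base64
import hashlib
from typing import Callable

DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"
C14N_ALG = "http://www.w3.org/TR/2001/REC-xml-c14n-20010315"


def c14n_object(text: bytes, obj_id: str = "object") -> bytes:
    """Step 1: wrap T in <Object> (with the inherited xmlns) and turn CR-LF into LF.

    Assumes T is already UTF-8 and contains no '&', '<' or '>' characters.
    """
    body = text.replace(b"\r\n", b"\n")
    start = f'<Object xmlns="{DSIG_NS}" Id="{obj_id}">'.encode("utf-8")
    return start + body + b"</Object>"


def make_signed_info(digest_b64: str, obj_id: str = "object") -> bytes:
    """Step 3: <SignedInfo> in canonical form (the line layout is an assumption)."""
    lines = [
        f'<SignedInfo xmlns="{DSIG_NS}">',
        f'<CanonicalizationMethod Algorithm="{C14N_ALG}"></CanonicalizationMethod>',
        f'<SignatureMethod Algorithm="{DSIG_NS}rsa-sha1"></SignatureMethod>',
        f'<Reference URI="#{obj_id}">',
        f'<DigestMethod Algorithm="{DSIG_NS}sha1"></DigestMethod>',
        f"<DigestValue>{digest_b64}</DigestValue>",
        "</Reference>",
        "</SignedInfo>",
    ]
    return "\n".join(lines).encode("utf-8")


def sign_text(text: bytes, rsa_sign: Callable[[bytes], bytes]) -> dict:
    """Run steps 1-4; rsa_sign is a hypothetical callable doing RsaSign(Ks, SI)."""
    c = c14n_object(text)                                    # step 1
    m = base64.b64encode(hashlib.sha1(c).digest()).decode()  # step 2
    si = make_signed_info(m)                                 # step 3
    sv = base64.b64encode(rsa_sign(si)).decode()             # step 4
    return {"object": c, "digest_value": m, "signed_info": si, "signature_value": sv}
```

Step 5 then copies the digest value and signature value into the final document, written without the inherited xmlns attributes on <SignedInfo> and <Object>. Because the layout here differs from the indented example on this page, the signature value it leads to will differ too.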

Message Digests

There are two message digests to compute. The input to these two computations has to be exactly correct or you will get the wrong result. We use the SHA-1 message digest function, which outputs a hash value 20 bytes long.

Digest of the input text string

Form the canonicalized <Object> element with all CR-LF pairs ((0x)0D 0A) in the text converted to single LF characters (0x0A). In this case there is no newline after the text, so the closing tag comes directly after the '.' character in the text string. Note we have added the xmlns attribute, which exists here but not in the original or final document. This attribute is propagated from the parent <Signature> element.

<Object xmlns="http://www.w3.org/2000/09/xmldsig#" Id="object">some text
  with spaces and CR-LF.</Object>

and compute the message digest of the byte string beginning '<', 'O', 'b',... and ending ...,'e','c', 't', '>'

000000  3c 4f 62 6a 65 63 74 20 78 6d 6c 6e 73 3d 22 68  <Object xmlns="h
000010  74 74 70 3a 2f 2f 77 77 77 2e 77 33 2e 6f 72 67  ttp://www.w3.org
000020  2f 32 30 30 30 2f 30 39 2f 78 6d 6c 64 73 69 67  /2000/09/xmldsig
000030  23 22 20 49 64 3d 22 6f 62 6a 65 63 74 22 3e 73  #" Id="object">s
000040  6f 6d 65 20 74 65 78 74 0a 20 20 77 69 74 68 20  ome text.  with
000050  73 70 61 63 65 73 20 61 6e 64 20 43 52 2d 4c 46  spaces and CR-LF
000060  2e 3c 2f 4f 62 6a 65 63 74 3e                    .</Object>

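The exact byte string in this case to be digested is (in hex format)

3C 4F 62 6A 65 63 74 20 78 6D 6C 6E 73 3D 22 68
74 74 70 3A 2F 2F 77 77 77 2E 77 33 2E 6F 72 67
2F 32 30 30 30 2F 30 39 2F 78 6D 6C 64 73 69 67
23 22 20 49 64 3D 22 6F 62 6A 65 63 74 22 3E 73
6F 6D 65 20 74 65 78 74 0A 20 20 77 69 74 68 20
73 70 61 63 65 73 20 61 6E 64 20 43 52 2D 4C 46
2E 3C 2F 4F 62 6A 65 63 74 3E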
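
The same byte string, and the SHA-1 digest computed from it, can be reproduced with a short Python script. This is a sketch using only the standard library; it hardcodes the <Object> tags exactly as shown above rather than performing a general canonicalization.

```python
import base64
import hashlib

T = b"some text\r\n  with spaces and CR-LF."

# Canonicalized <Object>: inherited xmlns added, CR-LF replaced by a single LF.
obj = (
    b'<Object xmlns="http://www.w3.org/2000/09/xmldsig#" Id="object">'
    + T.replace(b"\r\n", b"\n")
    + b"</Object>"
)

print(len(obj))           # 106 bytes, i.e. offsets 0x00 to 0x69 in the dump
print(obj.hex().upper())  # the exact bytes to be digested

digest = hashlib.sha1(obj).digest()
print(digest.hex().upper())               # digest in hex
print(base64.b64encode(digest).decode())  # digest in base64, for <DigestValue>
```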



Using SC14N to compute the digest of the input text string directly

Using SC14N on the XML file: Transform the subset for element with Id="object" and compute digest value of this using default SHA-1.

> sc14n -d -S object XmlAliceSig-base.xml

In C#:

string digval = Sc14n.C14n.ToDigest("XmlAliceSig-base.xml", "object", Tran.SubsetById, DigAlg.Sha1);

Digest of the SignedInfo

Form the canonicalized <SignedInfo> element. Note the xmlns attribute which we include here, but not in the final document. This is propagated down from the parent <Signature> element.

<SignedInfo xmlns="http://www.w3.org/2000/09/xmldsig#">
  <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"></CanonicalizationMethod>
  <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"></SignatureMethod>
  <Reference URI="#object">
    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"></DigestMethod>

In hex format, the byte string is


The message digest of this in hex is 5AC8EFAB045A9A46FE001AC58C253646FF88DC6A, or WsjvqwRamkb+ABrFjCU2Rv+I3Go= in base64.
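
The hex and base64 forms are two encodings of the same 20 bytes. A short Python check of the conversion, using only the values quoted above:

```python
import base64

digest_hex = "5AC8EFAB045A9A46FE001AC58C253646FF88DC6A"
digest = bytes.fromhex(digest_hex)

print(len(digest))                        # 20
print(base64.b64encode(digest).decode())  # WsjvqwRamkb+ABrFjCU2Rv+I3Go=
```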

Using SC14N to compute the digest of the SignedInfo directly

Using SC14N on the XML file: Transform the subset for element with tag name SignedInfo and compute digest value of this using default SHA-1.

> sc14n -d -s SignedInfo XmlAliceSig.xml

In C#:

string digval = Sc14n.C14n.ToDigest("XmlAliceSig.xml", "SignedInfo", Tran.SubsetByTag, DigAlg.Sha1);

Actually, this digest value is not output directly. It is computed and then encrypted as part of the signature value calculation. But to verify the signature you need to be able to re-create it. (Thanks to Marcos Paulo Pereira Brito Garcia for pointing out an error in an early version of this.)

The byte string of the <SignedInfo> element is input to the sha1WithRSAEncryption signature algorithm and signed with Alice's private RSA key to produce the 1024-bit RSA signatureValue in hex format


In base64 this is


Update 2017-08-13: See some code to compute this signature value.
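
To show how the SignedInfo digest ends up inside the signature value: sha1WithRSAEncryption is the RSASSA-PKCS1-v1_5 scheme with SHA-1, in which the digest is wrapped in a DigestInfo structure and padded out to the length of the key before the private-key operation is applied. The sketch below builds that padded block for a 1024-bit (128-byte) key using the digest quoted above. The function name is hypothetical, and the RSA exponentiation itself is not shown because it needs Alice's private key.

```python
# DER prefix of the DigestInfo structure for SHA-1 (see RFC 8017, section 9.2).
SHA1_DIGESTINFO_PREFIX = bytes.fromhex("3021300906052b0e03021a05000414")


def emsa_pkcs1_v15_sha1(digest: bytes, key_bytes: int = 128) -> bytes:
    """Return 00 01 FF..FF 00 || DigestInfo(SHA-1, digest), key_bytes long."""
    if len(digest) != 20:
        raise ValueError("SHA-1 digest must be 20 bytes")
    t = SHA1_DIGESTINFO_PREFIX + digest
    pad_len = key_bytes - len(t) - 3
    if pad_len < 8:
        raise ValueError("key too short for this encoding")
    return b"\x00\x01" + b"\xff" * pad_len + b"\x00" + t


em = emsa_pkcs1_v15_sha1(bytes.fromhex("5AC8EFAB045A9A46FE001AC58C253646FF88DC6A"))
print(len(em))  # 128
print(em.hex().upper())
# The signature value is then (em as a big integer) ** d mod n, using the private key.
```

A verifier reverses this: it applies the public key to the signature value, recovers the block, and compares the last 20 bytes with its own digest of the canonicalized <SignedInfo>.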

Comment on SignedInfo

In the composition of the <SignedInfo> element above, we added some space characters before the lines to aid readability. These space characters must be preserved in both the canonicalized version and the final XML document. It gets even messier if you use tab characters (0x09) because, if they later get changed into space characters, you will fail to get the correct signature value.

It is better practice to form the <SignedInfo> element with no whitespace before the elements and just a single newline after each line, as follows:

<SignedInfo xmlns="http://www.w3.org/2000/09/xmldsig#">
<CanonicalizationMethod Algorithm="..."></CanonicalizationMethod>
<SignatureMethod Algorithm="..."></SignatureMethod>
<Reference URI="...">
<DigestMethod Algorithm="..."></DigestMethod>

Note, though, that this will give a different signature value than our example above. If, at this stage, you are thinking, "But isn't that a rather stupid procedure if it can be messed up so easily?", you would not be wrong...
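
The sensitivity to whitespace is easy to demonstrate. The Python sketch below hashes the same two-line fragment three ways: indented with spaces, indented with a tab, and with no indentation. The fragment is illustrative only and keeps the "..." placeholder from the listing above; it is not a complete <SignedInfo>.

```python
import hashlib

indented = b'<Reference URI="#object">\n    <DigestMethod Algorithm="..."></DigestMethod>'
with_tab = indented.replace(b"    ", b"\t")  # same text, tab instead of four spaces
flush = indented.replace(b"    ", b"")       # same text, no indentation

for label, data in (("spaces", indented), ("tab", with_tab), ("flush", flush)):
    # Each variant gives a completely different SHA-1 digest.
    print(label, hashlib.sha1(data).hexdigest())
```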

Canonicalization (c14n)

Canonicalization is a method for generating a physical representation, the canonical form, of an XML document that accounts for permissible syntactic changes.

In other words, no matter what (legal) changes you could make to a given XML document, the canonical form will always be identical, byte-for-byte.

The cute abbreviation for canonicalization is c14n denoting that there are 14 characters between the "c" and the "n" in a word that is obviously too long to begin with.

Note that the canonicalized data does not appear in the original or final XML document. It is composed in memory and a message digest or RSA signature value is computed from it.

This is the official (2001) outline of the procedure for c14n, taken from [XML-C14N]:

  1. The document is encoded in UTF-8
  2. Line breaks normalized to #xA on input, before parsing
  3. Attribute values are normalized, as if by a validating processor
  4. Character and parsed entity references are replaced
  5. CDATA sections are replaced with their character content
  6. The XML declaration and document type declaration (DTD) are removed
  7. Empty elements are converted to start-end tag pairs
  8. Whitespace outside of the document element and within start and end tags is normalized
  9. All whitespace in character content is retained (excluding characters removed during line feed normalization)
  10. Attribute value delimiters are set to quotation marks (double quotes)
  11. Special characters in attribute values and character content are replaced by character references
  12. Superfluous namespace declarations are removed from each element
  13. Default attributes are added to each element
  14. Lexicographic order is imposed on the namespace declarations and attributes of each element

Simple, eh?
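
Several of these rules can be seen in action with Python's standard library. One important caveat: xml.etree.ElementTree.canonicalize (Python 3.8 or later) implements the newer C14N 2.0, not the 2001 version used on this page, so treat this purely as an illustration of rules 4 to 7, 10 and 14 on a whole document and not as a tool for producing XML-DSIG digests. The sample document is made up for the purpose.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+

doc = '<?xml version="1.0"?><doc b=\'2\' a="1"><empty/><![CDATA[x < y]]> &amp; more</doc>'

# XML declaration removed, attributes sorted and double-quoted, the empty
# element expanded to a start-end pair, CDATA replaced by escaped text.
print(canonicalize(doc))  # <doc a="1" b="2"><empty></empty>x &lt; y &amp; more</doc>
```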

To make it even worse, the rules above are for a complete XML document. When you are canonicalizing a Subset of a document, like we are doing here, you have to propagate the xml namespaces from the parent elements that have been omitted (unless you are using Exclusive XML Canonicalization (xml-exc-c14n), which we are not!). The merged xmlns attributes then have to be sorted in a certain order. In this example, the <Object> and <SignedInfo> elements inherit the xmlns attribute from their omitted parent <Signature>.

In our example here, it was sufficient just to replace any CR-LF (0x0D 0A) line break with a single LF (0x0A) character (point 2 above). All other issues were dealt with by simply hardcoding the necessary XML tags and attributes in our variable strings.

Other c14n issues

Given a simple text string input, and the fact that we are composing our own XML document instead of dealing with an existing one, the two other issues that we are most likely to have to deal with are UTF-8 encoding (point 1 above) and entity references (point 4):

UTF-8 encoding
If our text-to-be-signed string, T, contains any non-ASCII characters, make sure these are converted to UTF-8 encoding.

For example, the character á (small letter a with acute accent) is encoded in the ISO-8859-1 character set (Latin-1) as the single byte value 225 (0xE1). This is not an ASCII character, as it has a value greater than 127. Such characters need to be converted to UTF-8 encoding. In this case, the byte 0xE1 must be represented as the two-byte UTF-8 sequence (0x)C3 A1.

In CryptoSys PKI, use the CNV_UTF8BytesFromLatin1 function to convert a string containing Latin-1 characters to proper UTF-8.

With Notepad++, use menu options
  • Edit > EOL Conversion > Unix (LF)
  • Encoding > Convert to UTF-8
then save.
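
The Latin-1 to UTF-8 conversion described above looks like this in Python. This is a small sketch using only built-in codecs, shown as an alternative for readers not using CryptoSys PKI.

```python
latin1_bytes = b"\xe1"                   # 'á' as a single ISO-8859-1 byte
text = latin1_bytes.decode("latin-1")    # the character á
utf8_bytes = text.encode("utf-8")        # the same character in UTF-8

print(utf8_bytes.hex(" ").upper())       # C3 A1
```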

Entity references
All occurrences of the following characters in element content:
  • the ampersand (&),
  • the less than symbol (<), and
  • the greater than symbol (>)
must be escaped in the form &amp; &lt; &gt; respectively. The quotation mark or double quote (") and the apostrophe or single quote (') are left as literal characters in canonicalized element content, even if they were written as &quot; or &apos; in the source document. This only applies to characters inside an element's content, not the tags themselves.

So, for example, the 8-byte string <x>&</x> ((0x)3C783E263C2F783E) is transformed to the 12-byte string <x>&amp;</x> ((0x)3C783E26616D703B3C2F783E).
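
The same transformation in Python, as a sketch using escape() from the standard xml.sax.saxutils module, which by default replaces only the characters &, < and >:

```python
from xml.sax.saxutils import escape

raw = "&"
element = f"<x>{escape(raw)}</x>".encode("utf-8")

print(element.decode())       # <x>&amp;</x>
print(len(element))           # 12
print(element.hex().upper())  # 3C783E26616D703B3C2F783E
```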

These two issues should cover almost all instances for a simple text string.

2017-06-28: See Canonicalization of an XML document for a more detailed how-to guide for canonicalization (C14N) of an XML document prior to signing.

Re-released 2018-08-09: Our new program SC14N, a straightforward XML canonicalization utility, performs the canonicalization (C14N) transformation you need to do when creating signed XML documents using XML-DSIG.

For some more examples, see the section Canonicalizing the SII elements on our XML-Dsig and the Chile SII page.



For more information, or to comment on this page, please send us a message.

This page last updated 15 November 2022