XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (599 page)

As a general rule, most software that produces Unicode text (for example, text editors) will produce NFC output most of the time. This is useful, and explains why you don't hear of many people having real-world XPath expressions that fail because of normalization issues. But it's certainly a possibility, and one of the concerns is that it is also a security risk—using the “wrong” representation of characters could be a way of getting round validation software.
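The kind of failure described above can be sketched in a few lines. The example below uses Python's standard `unicodedata` module rather than XPath, purely for illustration; in XPath 2.0 the corresponding facility is the `normalize-unicode()` function. The sample strings and the helper name `nfc_equal` are assumptions made for this sketch and do not come from the text.

```python
import unicodedata

# Two spellings of the same visible word "café" (illustrative sample data).
composed = "caf\u00e9"      # 'é' as one precomposed code point, U+00E9
decomposed = "cafe\u0301"   # 'e' followed by COMBINING ACUTE ACCENT, U+0301


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after converting both to normalization form NFC.

    Hypothetical helper for this sketch, not a library function.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain codepoint-by-codepoint comparison treats the strings as different,
# because they contain different sequences of code points.
raw_equal = composed == decomposed

# After normalizing both sides to NFC the difference disappears.
normalized_equal = nfc_equal(composed, decomposed)

print(raw_equal, normalized_equal)
```

A validation step that compares raw strings would let the decomposed spelling slip past a check written against the composed one, which is the security concern mentioned above. Normalizing both sides before comparing is one way to close that gap.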

The K variants (NFKC and NFKD) differ from NFC and NFD in that they normalize further; specifically, they normalize away distinctions between “compatibility variants” of characters. These compatibility variants exist because Unicode was created as the union of many different preexisting character sets. The designers had to make the decision whether two characters in different character sets were really representations of the same character. The problem in merging two characters into one is that it would lose information when data is converted into Unicode and then back again: the original data stream could not necessarily be reconstituted. So Unicode adopted the approach of allowing multiple representations of a character as compatibility variants. The distinction between the letter å and the ångstrom symbol is an example of this phenomenon; normalization forms NFKC and NFKD eliminate the distinction between these two characters. Another example is the distinction between the two characters f and i and the single character ﬁ (really just a graphical visualization of the two separate characters, but recognized as a single character for the benefit of typesetting applications). Another one (and here the “loss of information” argument starts to become significant) is the distinction between the superscript digits ² and ³ and the ordinary digits 2
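As a rough illustration of the ligature and superscript examples, the sketch below again uses Python's `unicodedata` module as a stand-in for XPath's `normalize-unicode()`. The sample strings and the helper name `fold` are assumptions made for this example. It shows that NFC leaves these compatibility characters alone, while NFKC replaces them with their ordinary equivalents.

```python
import unicodedata

# Illustrative sample data: two compatibility characters.
ligature = "\ufb01"       # LATIN SMALL LIGATURE FI, a single character
superscript = "x\u00b2"   # 'x' followed by SUPERSCRIPT TWO


def fold(text: str, form: str) -> str:
    """Return text in the given normalization form (NFC, NFD, NFKC or NFKD).

    Hypothetical convenience wrapper for this sketch.
    """
    return unicodedata.normalize(form, text)


# NFC keeps compatibility characters as they are.
nfc_ligature = fold(ligature, "NFC")
nfc_superscript = fold(superscript, "NFC")

# NFKC replaces them: the ligature becomes the two letters "f" and "i",
# and the superscript digit becomes an ordinary "2".
nfkc_ligature = fold(ligature, "NFKC")
nfkc_superscript = fold(superscript, "NFKC")

print(len(nfc_ligature), nfkc_ligature, nfkc_superscript)
```

The superscript case shows why the K forms lose information: once "x²" has been folded to "x2", nothing in the string records that the digit was ever a superscript.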
