XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition
Authors: Michael Kay
Signature
Argument | Type | Meaning |
value | xs:string | The input string, to which percent-encoding is to be applied |
Result | xs:string | The percent-encoded string |
Effect
The result string is formed from the input string by escaping special characters according to the rules defined in RFC 3986 (http://www.ietf.org/rfc/rfc3986.txt). Special characters are escaped by first encoding them in UTF-8, then representing each byte of the UTF-8 encoding in the form %HH, where HH represents the byte as two hexadecimal digits. The digits A–F are always in upper case.
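All characters are escaped except the following: the letters A-Z and a-z, the digits 0-9, and the characters hyphen (-), underscore (_), period (.), and tilde (~).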
Examples
Expression | Result |
encode-for-uri("simple.xml") | "simple.xml" |
encode-for-uri("my doc.xml") | "my%20doc.xml" |
encode-for-uri("f+o.pdf") | "f%2Bo.pdf" |
encode-for-uri("Grüße.html") | "Gr%C3%BC%C3%9Fe.html" |
Usage
This function is designed for use by applications that need to construct URIs.
The rules for URIs (given in RFC 3986, http://www.ietf.org/rfc/rfc3986.txt) make it clear that a string in which special characters have not been escaped is not a valid URI. In many contexts where URIs are required, both in XPath functions such as the doc() function and in places such as the href attribute of an HTML element, the URI should in theory be fully escaped according to these rules. In practice, software is very often tolerant and accepts unescaped URIs, but applications shouldn't rely on this.
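As a rough illustration of constructing a URI from an unescaped file name, the following Python sketch uses the standard library function urllib.parse.quote() as a stand-in for encode-for-uri(). The base URI and the helper name build_uri are hypothetical, chosen only for this example, and the sketch assumes that calling quote() with safe="" is a close enough analogue of the XPath function for these inputs.

```python
from urllib.parse import quote

# Hypothetical base URI, used only for illustration.
BASE = "http://example.com/docs/"


def build_uri(file_name: str) -> str:
    """Append a percent-encoded file name to the base URI."""
    # safe="" tells quote() to escape every character except
    # letters, digits, and the characters "_", ".", "-" and "~".
    return BASE + quote(file_name, safe="")


print(build_uri("my doc.xml"))  # http://example.com/docs/my%20doc.xml
```

Only the file name is escaped here. The base URI is left alone, because escaping it as well would turn its ":" and "/" delimiters into %3A and %2F.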
The rules for escaping special characters (officially called percent-encoding) are rather peculiar. To escape a character, it is first encoded in UTF-8, which in general represents a character as one or more octets (bytes). Each of these bytes is then substituted into the string using the notation %HH, where HH is the value of the byte in hexadecimal. For example, the space character is represented as %20, and the euro symbol as %E2%82%AC. Although RFC 3986 allows the hexadecimal digits A-F to be in either upper or lower case, the encode-for-uri() function mandates upper case, to ensure that escaped URIs can be compared as strings.
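The escaping procedure described above can be sketched in a few lines of Python. This is an illustrative analogue, not the XPath function itself: the name percent_encode is hypothetical, and the sketch assumes that the characters left unescaped are exactly the RFC 3986 "unreserved" set (letters, digits, hyphen, underscore, period, and tilde).

```python
# Assumption: the characters left unescaped are the RFC 3986 "unreserved" set.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)


def percent_encode(value: str) -> str:
    """Illustrative analogue of encode-for-uri(); the name is hypothetical."""
    parts = []
    for ch in value:
        if ch in UNRESERVED:
            parts.append(ch)
        else:
            # Encode the character in UTF-8, then write each resulting byte
            # as %HH using upper-case hexadecimal digits.
            parts.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(parts)


print(percent_encode(" "))           # %20
print(percent_encode("€"))           # %E2%82%AC
print(percent_encode("Grüße.html"))  # Gr%C3%BC%C3%9Fe.html
```

The euro symbol shows why one character can expand to several escapes: its UTF-8 encoding occupies three bytes, and each byte is escaped separately.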