Where Is XML Used in Practice? – Blog Series – Part #02
To find out which systems may be vulnerable to XML-based attacks, it is essential to investigate in which scenarios XML is used. In the last post in this series, we explained the basic concepts of XML and attacks such as XXE. In this second part, we discuss the most common uses of XML so that you can effectively identify and fix potential vulnerabilities in your systems.
Web APIs and Communication
AJAX and Fetch API
Similar to “REST” [3], Asynchronous JavaScript and XML (AJAX) is not a technology per se, but a general concept. AJAX describes how different technologies can be used together to make websites interactive. Although the name implies the use of XML, nowadays JSON [2] is mostly used for data exchange. JSON offers the advantage that the format can be more easily (de)serialized into JavaScript objects [1].
Since 2015, the Fetch API [4] has been available as a successor to XMLHttpRequest, the browser interface traditionally used for AJAX. It offers very similar functionality but builds on more modern, promise-based JavaScript syntax and is better adapted to today's usage. Moreover, the Fetch API favors JSON as the data exchange format and provides a dedicated method, Response.json(), for parsing JSON responses [5]. Nevertheless, XML can also be used as the data exchange format with the Fetch API.
When such applications exchange XML, the server must provide endpoints that accept and process data in XML format. As noted in the last post in this series, Shodan, a search engine for Internet-connected devices, currently lists more than three million devices whose HTTP responses contain the header Content-Type: text/xml¹. An attacker can send modified XML requests to such endpoints that contain a payload for one of the many XML-based attacks.
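At the HTTP level, an XML request sent via the Fetch API is an ordinary POST request whose body is XML and whose Content-Type header declares it as such. The following Python sketch prepares such a request with the standard library but does not send it; the endpoint URL and the `<order>` payload are hypothetical placeholders.

```python
import urllib.request

# Hypothetical endpoint and payload, for illustration only.
ENDPOINT = "https://api.example.com/orders"
PAYLOAD = '<?xml version="1.0" encoding="UTF-8"?><order><item>book</item><quantity>2</quantity></order>'


def build_xml_request(url: str, xml_body: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request whose body is XML."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        # The Content-Type header tells the server to treat the body as XML.
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


request = build_xml_request(ENDPOINT, PAYLOAD)
# urllib normalizes stored header names, hence "Content-type".
print(request.get_method(), request.get_header("Content-type"))
# urllib.request.urlopen(request) would actually send it.
```

When testing an application, requests of this shape (visible, for instance, in the browser's developer tools or an intercepting proxy) point to endpoints that parse XML.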
SOAP Protocol
SOAP [15] is a protocol for exchanging structured data that is often used for web services. SOAP defines the structure of the transmitted messages in XML format: the so-called envelope can carry arbitrary requests or data to the receiver. An example of a SOAP envelope (here, a response message) can be seen in Listing 1.
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <m:GetLastTradePriceResponse xmlns:m="Some-URI">
      <Price>34.5</Price>
    </m:GetLastTradePriceResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
SOAP Envelope – An Example [15]. (Listing 1)
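Processing a SOAP message essentially means parsing the envelope and navigating to the payload inside the Body element while respecting namespaces. A minimal sketch with Python's standard xml.etree.ElementTree that extracts the price from Listing 1 could look like this. The prefixes `soap` and `m` in the lookup are arbitrary choices, since only the namespace URIs they map to must match the document (`Some-URI` is the placeholder URI from the example).

```python
import xml.etree.ElementTree as ET

# Listing 1 as a Python string.
ENVELOPE = """<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <m:GetLastTradePriceResponse xmlns:m="Some-URI">
      <Price>34.5</Price>
    </m:GetLastTradePriceResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

# Prefixes used in the lookup path; only the URIs must match the document.
NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "m": "Some-URI",
}


def last_trade_price(envelope: str) -> float:
    """Return the <Price> value from a GetLastTradePriceResponse envelope."""
    root = ET.fromstring(envelope)
    # <Price> has no namespace, so it is matched by its plain name.
    price = root.find("soap:Body/m:GetLastTradePriceResponse/Price", NS)
    if price is None or price.text is None:
        raise ValueError("no Price element in SOAP body")
    return float(price.text)


print(last_trade_price(ENVELOPE))  # 34.5
```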
Furthermore, SOAP is used by other protocols, among them SAML (see next section) and AS4 [16], a protocol for exchanging business documents between companies (business-to-business). In general, SOAP nowadays operates more in the background, as many client-side APIs rely on JSON for data exchange. Shodan lists nearly seven million servers that identify themselves as gSOAP in their Server header².
SAML Protocol
The Security Assertion Markup Language (SAML) [12] defines an XML-based protocol for the authentication and authorization of users. SAML messages are XML documents that can be transported via SOAP, among other bindings (browser-based single sign-on typically uses HTTP redirects or HTTP POST forms). SAML is widespread and supported by many commercial products as well as many open source solutions in the single sign-on area [13].
Listing 2 shows an example of an AuthnRequest, a request from a service provider to authenticate a user via SAML.
<samlp:AuthnRequest xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol"
    xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion"
    ID="ONELOGIN_809707f0030a5d00620c9d9df97f627afe9dcc24"
    Version="2.0"
    ProviderName="SP test"
    IssueInstant="2014-07-16T23:52:45Z"
    Destination="http://idp.example.com/SSOService.php"
    ProtocolBinding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST"
    AssertionConsumerServiceURL="http://sp.example.com/demo1/index.php?acs">
  <saml:Issuer>http://sp.example.com/demo1/metadata.php</saml:Issuer>
  <samlp:NameIDPolicy Format="urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress" AllowCreate="true" />
  <samlp:RequestedAuthnContext Comparison="exact">
    <saml:AuthnContextClassRef>
      urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport
    </saml:AuthnContextClassRef>
  </samlp:RequestedAuthnContext>
</samlp:AuthnRequest>
SAML AuthnRequest – An Example [14]. (Listing 2)
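In browser-based single sign-on, an AuthnRequest like the one in Listing 2 often does not appear as visible XML at all. With the SAML 2.0 HTTP-Redirect binding, the XML is compressed with raw DEFLATE, base64-encoded, and placed URL-encoded into the `SAMLRequest` query parameter. (The `ProtocolBinding` attribute in Listing 2 refers to how the response should be returned, not to how the request itself is sent.) The following Python sketch shows this encoding and its reversal for a shortened, hypothetical version of the request; it omits the binding's optional signature parameters (`SigAlg`, `Signature`).

```python
import base64
import zlib
from urllib.parse import urlencode

# Shortened, hypothetical variant of Listing 2 (most attributes and children omitted).
AUTHN_REQUEST = (
    '<samlp:AuthnRequest xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" '
    'ID="ONELOGIN_809707f0030a5d00620c9d9df97f627afe9dcc24" Version="2.0" '
    'IssueInstant="2014-07-16T23:52:45Z" '
    'Destination="http://idp.example.com/SSOService.php"/>'
)


def encode_redirect(saml_xml: str) -> str:
    """Raw-DEFLATE and base64-encode a SAML message for the HTTP-Redirect binding."""
    compressor = zlib.compressobj(wbits=-15)  # negative wbits: raw DEFLATE, no zlib header
    deflated = compressor.compress(saml_xml.encode("utf-8")) + compressor.flush()
    return base64.b64encode(deflated).decode("ascii")


def decode_redirect(value: str) -> str:
    """Reverse encode_redirect, e.g. to inspect an intercepted SAMLRequest value."""
    return zlib.decompress(base64.b64decode(value), -15).decode("utf-8")


saml_request = encode_redirect(AUTHN_REQUEST)
# urlencode() percent-encodes the base64 characters '+', '/' and '='.
redirect_url = "http://idp.example.com/SSOService.php?" + urlencode({"SAMLRequest": saml_request})
print(redirect_url)
print(decode_redirect(saml_request) == AUTHN_REQUEST)  # True
```

Decoding such parameters is a useful first step when checking whether an application processes SAML, and therefore XML, behind the scenes.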
XML File Formats
Microsoft Office and OpenDocument
The file formats used by Microsoft's Office suite (for example .docx, .xlsx, or .pptx) are based on the Open Packaging Conventions (OPC) [7] specification. OPC defines a ZIP archive whose contents and structure are described by several XML files. An example of one of these XML files is shown in Listing 3. Many other file types based on OPC³ also define most of their content in XML format.
<?xml version='1.0' encoding='utf-8'?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:pPr>
        <w:pStyle w:val="Heading 1"/>
        <w:pageBreakBefore w:val="off"/>
      </w:pPr>
      <w:r>
        <w:bookmarkStart w:id="1" w:name="BookTitle"/>
        <w:t></w:t>
        <w:bookmarkEnd w:id="1"/>
        <w:t>Inet Magazine No. 2</w:t>
      </w:r>
    </w:p>
    ...
  </w:body>
</w:document>
.docx File Example – Beginning of a document.xml file, which defines the main document within a .docx file [8]. (Listing 3)
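Because a .docx file is a ZIP archive, its main text can be read by unzipping the archive and parsing word/document.xml, as the following Python sketch shows. It builds a hypothetical, stripped-down package in memory that only contains word/document.xml; a real .docx additionally contains parts such as [Content_Types].xml and relationship files. For security testing, the takeaway is that any application that opens, converts, or previews Office documents runs an XML parser on content that may come from an attacker.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_demo_docx() -> bytes:
    """Build a hypothetical, stripped-down .docx-like ZIP in memory.

    It only mirrors the location of the main part; Word would need more parts to open it.
    """
    document_xml = (
        '<?xml version="1.0" encoding="utf-8"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
        "<w:t>Inet Magazine No. 2</w:t>"
        "</w:r></w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


def docx_text_runs(data: bytes) -> list[str]:
    """Return the text of all <w:t> elements in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    # Namespaced tags are matched in "{namespace-uri}local-name" form.
    return [node.text or "" for node in root.iter(f"{{{W_NS}}}t")]


print(docx_text_runs(make_demo_docx()))  # ['Inet Magazine No. 2']
```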
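The older Microsoft Office file formats (.doc, .xls, or .ppt) are not based on OPC but on a binary container format, the Compound File Binary Format, which likewise bundles several internal streams in a single file. Since the main document is stored as binary data in these formats, XML plays a much smaller role there; however, any XML data embedded in such files may still be a target for XML-based attacks.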
The OpenDocument file formats (e.g., .odt, .ods, or .odp) are also based on XML. Similar to the Microsoft Office file formats, they are based on a ZIP archive that contains XML files.
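OpenDocument packages follow the same idea. By convention, the first entry of the ZIP archive is an uncompressed file named mimetype that contains the media type, while the actual content lives in XML files such as content.xml and META-INF/manifest.xml. The sketch below builds a hypothetical, minimal package in memory (not one an office suite would open) and reports its media type and XML parts.

```python
import io
import zipfile


def make_demo_odt() -> bytes:
    """Build a hypothetical, minimal ODF-style package in memory (not a complete .odt)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # First entry: "mimetype", stored uncompressed (ZipInfo defaults to ZIP_STORED).
        package.writestr(zipfile.ZipInfo("mimetype"), "application/vnd.oasis.opendocument.text")
        package.writestr(
            "content.xml",
            '<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"/>',
        )
        package.writestr(
            "META-INF/manifest.xml",
            '<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"/>',
        )
    return buffer.getvalue()


def odf_summary(data: bytes) -> tuple[str, list[str]]:
    """Return the declared media type and the names of all .xml parts."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        media_type = package.read("mimetype").decode("ascii")
        xml_parts = [name for name in package.namelist() if name.endswith(".xml")]
    return media_type, xml_parts


print(odf_summary(make_demo_odt()))
```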
RSS and SVG
RSS [9] and SVG [10] are both file formats that are directly based on XML. This means that technically both file formats are XML files with a specified structure. Both formats are widely used on the Internet and are processed by many different applications. Examples of both file types can be seen in Listings 4 and 5.
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Hackmanit Cyber Security Blog</title>
    <link>https://www.hackmanit.de/en/blog-en</link>
    <description>Articles on various IT security related topics.</description>
    ...
    <item>
      <title>How to Secure APIs?</title>
      <link>https://www.hackmanit.de/en/blog-en/155-how-to-secure-apis</link>
      <description>APIs can provide critical functionalities and information; hence their security is a crucial aspect...</description>
      <pubDate>Mon, 11 Jul 2022</pubDate>
    </item>
    ...
  </channel>
</rss>
RSS File – An Example. (Listing 4)
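Reading an RSS feed amounts to parsing an XML document and walking the item elements of its channel. A minimal sketch using Python's standard library on a shortened version of Listing 4 (placeholders and the item description removed); the function name is a hypothetical choice for this example:

```python
import xml.etree.ElementTree as ET

# Shortened version of Listing 4.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Hackmanit Cyber Security Blog</title>
    <link>https://www.hackmanit.de/en/blog-en</link>
    <description>Articles on various IT security related topics.</description>
    <item>
      <title>How to Secure APIs?</title>
      <link>https://www.hackmanit.de/en/blog-en/155-how-to-secure-apis</link>
      <pubDate>Mon, 11 Jul 2022</pubDate>
    </item>
  </channel>
</rss>"""


def feed_items(rss_xml: str) -> list[dict[str, str]]:
    """Map each <item> to a dict of its child elements (tag -> text)."""
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        return []
    return [{child.tag: (child.text or "") for child in item} for item in channel.findall("item")]


for entry in feed_items(FEED):
    print(entry["pubDate"], entry["title"])
```

Feed readers and aggregators that fetch feeds from arbitrary URLs therefore parse XML controlled by third parties.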
<?xml version="1.0" standalone="yes"?>
<parent xmlns="http://example.org"
        xmlns:svg="http://www.w3.org/2000/svg">
   <!-- parent contents here -->
   <svg:svg width="4cm" height="8cm">
      <svg:ellipse cx="2cm" cy="4cm" rx="2cm" ry="1cm" />
   </svg:svg>
   ...
</parent>
SVG File – An Example [11]. (Listing 5)
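As Listing 5 shows, SVG content is identified by its namespace (http://www.w3.org/2000/svg) and can be embedded in other XML documents, so the file extension alone is not a reliable indicator. The following sketch finds SVG root elements by namespace wherever they occur and lists the shapes they contain; the function name is a hypothetical choice for this example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Listing 5 without the "..." placeholder.
DOCUMENT = """<?xml version="1.0" standalone="yes"?>
<parent xmlns="http://example.org" xmlns:svg="http://www.w3.org/2000/svg">
  <!-- parent contents here -->
  <svg:svg width="4cm" height="8cm">
    <svg:ellipse cx="2cm" cy="4cm" rx="2cm" ry="1cm" />
  </svg:svg>
</parent>"""


def svg_shapes(xml_text: str) -> list[tuple[str, dict[str, str]]]:
    """List (local name, attributes) for the children of every embedded <svg> element."""
    root = ET.fromstring(xml_text)  # comments are dropped by the default parser
    shapes = []
    for svg_root in root.iter(f"{{{SVG_NS}}}svg"):
        for child in svg_root:
            local_name = child.tag.split("}", 1)[-1]  # strip the "{namespace}" prefix
            shapes.append((local_name, dict(child.attrib)))
    return shapes


print(svg_shapes(DOCUMENT))
```

For security reviews, this also means that an image upload accepting SVG files may feed user-supplied XML to whatever component processes the image.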
... and Many More
There are many other file formats that use XML internally [6]. If you want to find out whether a file format you are using contains XML, it is a good idea to look in a file format database such as https://fileinfo.com/. These databases often contain information about how a document is structured and what types of data it contains. If this does not provide the desired results, it is often necessary to do your own research, for example by reading the specification of the respective file format.
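In addition to consulting such databases, a quick empirical check can show whether a given sample file is XML itself or a ZIP container with XML parts. The following Python sketch is a rough heuristic under stated assumptions: it only looks for an XML declaration at the very beginning of the data (after an optional UTF-8 byte order mark) and deliberately parses nothing, so untrusted samples are never handed to an XML parser. As a consequence, it misses XML without a declaration (such as Listing 1), UTF-16-encoded XML, and XML nested in other container formats. The function names are hypothetical.

```python
import codecs
import io
import zipfile


def _starts_like_xml(head: bytes) -> bool:
    """Heuristic: XML declaration at the start, after an optional UTF-8 BOM."""
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]
    return head.startswith(b"<?xml")


def xml_locations(data: bytes) -> list[str]:
    """Return ["<file>"] if data looks like XML, or the names of ZIP members that do."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        found = []
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for name in archive.namelist():
                if name.endswith("/"):  # skip directory entries
                    continue
                with archive.open(name) as member:
                    if _starts_like_xml(member.read(64)):  # only inspect the first bytes
                        found.append(name)
        return found
    return ["<file>"] if _starts_like_xml(data[:64]) else []


# Usage with an in-memory sample archive.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as sample:
    sample.writestr("content.xml", '<?xml version="1.0"?><doc/>')
    sample.writestr("image.png", b"\x89PNG\r\n\x1a\n")
print(xml_locations(demo.getvalue()))  # ['content.xml']
print(xml_locations(b'<?xml version="1.0"?><rss/>'))  # ['<file>']
```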
Part #03 – Finding XXE Attacks in 3 Steps
With this overview of where—and in which form—you can find XML, you have a solid foundation for recognizing XML in practice. The next part of this blog series on the XML format will be about detecting actual XML security vulnerabilities. We will explain this with a practical example of XXE attacks.
Blog Series – What Are the Risks of XML? – All parts at a glance
Part #01 – XML – An Overview
Part #02 – Where Is XML Used in Practice?
Part #03 – Finding XXE Attacks in 3 Steps (coming soon)
Footnotes
¹ https://www.shodan.io/search?query=%22Content-Type%3A+text%2Fxml%22 searches for all entries in the database which have Content-Type: text/xml in the HTTP header.
² https://www.shodan.io/search?query=%22Server%3A+gSOAP%22.
³ A list of file formats can be found, for example, at https://en.wikipedia.org/wiki/Open_Packaging_Conventions.
Sources
[1] https://developer.mozilla.org/en-US/docs/Web/Guide/AJAX
[2] https://www.json.org/json-en.html
[3] https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm
[4] https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API
[5] https://developer.mozilla.org/en-US/docs/Web/API/Response#methods
[6] http://fileformats.archiveteam.org/wiki/Category:XML_based_file_formats
[7] ISO/IEC 29500-2:2021 – Document description and processing languages - Office Open XML file formats - Part 2: Open packaging conventions https://www.iso.org/standard/77818.html
[8] https://telparia.com/fileFormatSamples/document/docx/
[9] https://www.rssboard.org/rss-specification
[10] https://www.w3.org/TR/SVG/
[11] https://www.w3.org/TR/SVG/struct.html
[12] http://docs.oasis-open.org/security/saml/v2.0/sstc-saml-approved-errata-2.0.pdf
[13] https://en.wikipedia.org/wiki/SAML-based_products_and_services
[14] https://www.samltool.com/generic_sso_req.php
[15] http://www.w3.org/TR/SOAP/
[16] http://docs.oasis-open.org/ebxml-msg/ebms/v3.0/profiles/AS4-profile/v1.0/os/AS4-profile-v1.0-os.pdf
Our Experts Develop the Optimal Solution for You
XML Parsing – XML Security – SOAP
Are you faced with the decision of how to securely process XML and optimally protect your customer data? Or are you already using XML and wondering if your implementation is secure?
We will be glad to advise you; contact us for a no-obligation initial consultation. We support you with the following services and solutions:
IT Security Consulting | Training | Penetration Tests
Don't hesitate and find your way to secure APIs with us. We look forward to supporting you with your projects.
Your Contact for XML Security and SOAP
Prof. Dr. Juraj Somorovsky
juraj.somorovsky@hackmanit.de