Today we will learn how to convert XML to JSON and XML to Dict in python. We can use python xmltodict
module to read XML file and convert it to Dict or JSON data. We can also stream over large XML files and convert them to Dictionary. Before stepping into the coding part, let’s first understand why XML conversion is necessary.
XML files have slowly become obsolete but there are pretty large systems on the web that still uses this format. XML is heavier than JSON and so, most developers prefer the latter in their applications. When applications need to understand the XML provided by any source, it can be a tedious task to convert it to JSON. The xmltodict
module in Python makes this task extremely easy and straightforward to perform.
We can get started with xmltodict
module but we need to install it first. We will mainly use pip to perform the installation.
Here is how we can install the xmltodict module using Python Package Index (pip):
pip install xmltodict
This will be done quickly as xmltodict
is a very light weight module. Here is the output for this installation: The best thing about this installation was that this module is not dependent on any other external module and so, it is light-weight and avoids any version conflicts. Just to demonstrate, on Debian based systems, this module can be easily installed using the apt
tool:
sudo apt install python-xmltodict
Another plus point is that this module has an official Debian package.
The best place to start trying this module will be to perform an operation it was made to perform primarily, to perform XML to JSON conversions. Let’s look at a code snippet on how this can be done:
import xmltodict
import pprint
import json
my_xml = """
<audience>
<id what="attribute">123</id>
<name>Shubham</name>
</audience>
"""
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(json.dumps(xmltodict.parse(my_xml)))
Let’s see the output for this program: Here, we simply use the parse(...)
function to convert XML data to JSON and then we use the json
module to print JSON in a better format.
Keeping XML data in the code itself is neither always possible nor it is realistic. Usually, we keep our data in either database or some files. We can directly pick files and convert them to JSON as well. Let’s look at a code snippet how we can perform the conversion with an XML file:
import xmltodict
import pprint
import json
with open('person.xml') as fd:
doc = xmltodict.parse(fd.read())
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(json.dumps(doc))
Let’s see the output for this program: Here, we used another module pprint to print the output in a formatted manner. Apart from that, using the open(...)
function was straightforward, we used it get a File descriptor and then parsed the file into a JSON object.
As the module name suggest itself, xmltodict actually converts the XML data we provide to just a simply Python dictionary. So, we can simply access the data with the dictionary keys as well. Here is a sample program:
import xmltodict
import pprint
import json
my_xml = """
<audience>
<id what="attribute">123</id>
<name>Shubham</name>
</audience>
"""
my_dict = xmltodict.parse(my_xml)
print(my_dict['audience']['id'])
print(my_dict['audience']['id']['@what'])
Let’s see the output for this program: So, the tags can be used as the keys along with the attribute keys as well. The attribute keys just need to be prefixed with the @
symbol.
In XML data, we usually have a set of namespaces which defines the scope of the data provided by the XML file. While converting to the JSON format, it is then necessary that these namespaces persist in the JSON format as well. Let us consider this sample XML file:
<root xmlns="https://defaultns.com/"
xmlns:a="https://a.com/">
<audience>
<id what="attribute">123</id>
<name>Shubham</name>
</audience>
</root>
Here is a sample program on how we can include XML namespaces in the JSON format as well:
import xmltodict
import pprint
import json
with open('person.xml') as fd:
doc = xmltodict.parse(fd.read(), process_namespaces=True)
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(json.dumps(doc))
Let’s see the output for this program:
ALthough converting from XML to JSON is the prime objective of this module, xmltodict also supports doing the reverse operation, converting JSON to XML form. We will provide the JSON data in program itself. Here is a sample program:
import xmltodict
student = {
"data" : {
"name" : "Shubham",
"marks" : {
"math" : 92,
"english" : 99
},
"id" : "s387hs3"
}
}
print(xmltodict.unparse(student, pretty=True))
Let’s see the output for this program: Please note that giving a single JSON key is necessary for this to work correctly. If we consider that we modify our program to contain multiple JSON keys at the very first level of data like:
import xmltodict
student = {
"name" : "Shubham",
"marks" : {
"math" : 92,
"english" : 99
},
"id" : "s387hs3"
}
print(xmltodict.unparse(student, pretty=True))
In this case, we have three keys at the root level. If we try to unparse this form of JSON, we will face this error: This happens because xmltodict needs to construct the JSON with the very first key as the root XML tag. This means that there should only be a single JSON key at the root level of data.
In this lesson, we studied an excellent Python module which can be used to parse and convert XML to JSON and vice-versa. We also learned how to convert XML to Dict using xmltodict module. Reference: API Doc
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.
While we believe that this content benefits our community, we have not yet thoroughly reviewed it. If you have any suggestions for improvements, please let us know by clicking the “report an issue“ button at the bottom of the tutorial.
Como puedo guardar el documento xml después de leerlo json
- Jhonatan
Thanks for your tutorial. This worked for me: pp.pprint(json.dumps(xmltodict.parse(results))) However as your tutorial ends early I don’t know what to do next if I want to store this result in a variable. I tried json_data = pp.pprint(json.dumps(xmltodict.parse(results))) but I don’t think this is right. How do I then store this result in a variable to use next in my code without storing it in a file?
- Ben