Serialization is the process of converting an object into a transferable format (usually a sequence of bytes) so that it can be sent over a network or written to persistent storage. Deserialization is the reverse process: it converts the serialized data back into an object with the same state. JSON and XML are popular formats for this purpose.

Serialization and Deserialization

JSON (JavaScript Object Notation) is a language-neutral data interchange format. It is commonly used by web applications to transfer data between client and server. In Python, serialization converts a Python object into a JSON string, and deserialization rebuilds the Python object from its JSON string representation.
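As a quick illustration of that round trip, here is a minimal sketch using the standard json module. The `user` record is a made-up sample, not data from any real service.

```python
import json

# Hypothetical sample record; any JSON-compatible Python object would do.
user = {"name": "User", "active": True, "titles": ["Learning Python", "Learning JSON"]}

# Serialize: Python object -> JSON string
encoded = json.dumps(user)
print(encoded)
# {"name": "User", "active": true, "titles": ["Learning Python", "Learning JSON"]}

# Deserialize: JSON string -> Python object with the same state
decoded = json.loads(encoded)
print(decoded == user)
# True
```

Note that the Python value True appears as the JSON literal true in the encoded text.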

Serializing JSON

The dump() function of the json module serializes data. It takes a Python object, serializes it, and writes the output (JSON text) to a file-like object. Similarly, the dumps() function returns the JSON as a Python string. Python types are converted to JSON types according to the table below.

Python          JSON
dict            object
list, tuple     array
str             string
int, float      number
True            true
False           false
None            null
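The mapping in the table above can be checked directly. The following sketch passes one hypothetical sample value per row through json.dumps() and prints the resulting JSON text.

```python
import json

# Hypothetical sample values, one per Python type in the table.
samples = {
    "dict": {"a": 1},
    "list": [1, 2],
    "tuple": (1, 2),
    "str": "text",
    "int": 7,
    "float": 1.5,
    "true": True,
    "false": False,
    "none": None,
}

# Serialize each value on its own so the JSON form is easy to compare.
converted = {name: json.dumps(value) for name, value in samples.items()}
for name, text in converted.items():
    print(f"{name:>5} -> {text}")

# Output
#  dict -> {"a": 1}
#  list -> [1, 2]
# tuple -> [1, 2]
#   str -> "text"
#   int -> 7
# float -> 1.5
#  true -> true
# false -> false
#  none -> null
```

Both list and tuple become a JSON array, so a tuple comes back as a list after a round trip.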

Syntax of the dump() and dumps() functions

  • json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw) : Serialize obj as a JSON formatted stream to fp (fp.write()-supporting file-like object)
  • json.dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw) : Serialize obj to a JSON formatted str using above table.
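To make the difference between the two functions concrete, the sketch below serializes the same object both ways. An in-memory io.StringIO buffer stands in for a real file here, which is an assumption made to keep the example self-contained.

```python
import io
import json

data = {"name": "User", "titles": ["Learning Python", "Learning JSON"]}

# dumps() returns the JSON text directly as a str.
as_string = json.dumps(data)

# dump() writes the same text to any object that has a write() method.
buffer = io.StringIO()
json.dump(data, buffer)

print(as_string)
# {"name": "User", "titles": ["Learning Python", "Learning JSON"]}
print(buffer.getvalue() == as_string)
# True
```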

Where

  • skipkeys : If true, then dict keys that are not of a basic type (str, int, float, bool, None) will be skipped instead of raising a TypeError.
  • ensure_ascii : If true, the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is.
  • check_circular : If false, then the circular reference check for container types will be skipped and a circular reference will result in a RecursionError (or worse).
  • allow_nan : If false, then it will be a ValueError to serialize out of range float values (nan, inf, -inf) in strict compliance of the JSON specification. If allow_nan is true, their JavaScript equivalents (NaN, Infinity, -Infinity) will be used.
  • indent : If indent is a non-negative integer or a string, JSON array elements and object members are pretty-printed with that indent level. An indent level of 0, a negative number, or "" only inserts newlines. A positive integer indents that many spaces per level. If indent is a string (such as "\t"), that string is used to indent each level. None (the default) selects the most compact representation.
  • separators : It should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise.
  • default : If specified, it should be a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError.
  • sort_keys : If true, the output of dictionaries will be sorted by key.
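The options listed above are easier to compare side by side. The sketch below tries several of them on small made-up inputs. The `record` data and the `encode_extra` helper are hypothetical and only illustrate one way a default function might look.

```python
import json
from datetime import date

record = {"title": "café", "tags": ["b", "a"], "count": 2}

# indent and sort_keys: pretty-print with keys in alphabetical order.
pretty = json.dumps(record, indent=2, sort_keys=True)
print(pretty)

# separators: drop the whitespace after "," and ":" for compact output.
compact = json.dumps(record, separators=(",", ":"))
print(compact)
# {"title":"caf\u00e9","tags":["b","a"],"count":2}

# ensure_ascii: escaped by default, kept as-is when set to False.
escaped = json.dumps("café")
raw = json.dumps("café", ensure_ascii=False)
print(escaped)
# "caf\u00e9"
print(raw == '"café"')
# True

# skipkeys: silently drop keys that are not str, int, float, bool or None.
skipped = json.dumps({"ok": 1, (1, 2): "tuple key"}, skipkeys=True)
print(skipped)
# {"ok": 1}

# allow_nan=False: refuse nan, inf and -inf instead of emitting NaN/Infinity.
try:
    json.dumps(float("nan"), allow_nan=False)
    nan_rejected = False
except ValueError:
    nan_rejected = True
print(nan_rejected)
# True

# default: hypothetical fallback for objects json cannot serialize itself.
def encode_extra(obj):
    if isinstance(obj, date):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

dated = json.dumps({"when": date(2020, 1, 31)}, default=encode_extra)
print(dated)
# {"when": "2020-01-31"}
```

Raising TypeError at the end of the default function keeps the usual error for any type the helper does not know how to convert.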

The json module always produces str objects, not bytes objects. Keys in JSON key/value pairs are always strings, so when a dictionary is converted into JSON, all of its keys are coerced to strings. As a result, if a dictionary is converted into JSON and then back into a dictionary, the result may not equal the original.
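The key coercion described above is easy to reproduce. In this small sketch the integer keys come back as strings. The final conversion step is just one possible workaround and assumes you already know every key was originally an int.

```python
import json

original = {1: "one", 2: "two"}

encoded = json.dumps(original)
restored = json.loads(encoded)

print(encoded)
# {"1": "one", "2": "two"}
print(restored)
# {'1': 'one', '2': 'two'}
print(restored == original)
# False

# Possible workaround, assuming all keys were ints before serialization.
fixed = {int(key): value for key, value in restored.items()}
print(fixed == original)
# True
```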

The example below demonstrates encoding a Python object to JSON.

import urllib.request
import json


# Retrieve some sample JSON data
req = urllib.request.urlopen("http://httpbin.org/json")
data = req.read().decode('utf-8')
print(data)

# Output
# {
#   "slideshow": {
#     "author": "Yours Truly", 
#     "date": "date of publication", 
#     "slides": [
#       {
#         "title": "Wake up to WonderWidgets!", 
#         "type": "all"
#       }, 
#       {
#         "items": [
#           "Why <em>WonderWidgets</em> are great", 
#           "Who <em>buys</em> WonderWidgets"
#         ], 
#         "title": "Overview", 
#         "type": "all"
#       }
#     ], 
#     "title": "Sample Slide Show"
#   }
# }

# Parse the returned data
obj = json.loads(data)

# Access parsed data
print(obj["slideshow"]["author"])

# Output
# Yours Truly

for slide in obj["slideshow"]["slides"]:
    print(slide["title"])

# Output
# Wake up to WonderWidgets!
# Overview
# Write python objects as JSON
objdata = {
    "name": "User",
    "titles": [
        "Learning Python", "Learning JSON"
    ]
}

with open("jsonoutput.json", "w") as fp:
    json.dump(objdata, fp, indent=4)

# Output
# Content of jsonoutput.json
# {
#     "name": "User",
#     "titles": [
#         "Learning Python",
#         "Learning JSON"
#     ]
# }

Deserializing JSON

Python’s json.load() and json.loads() functions read JSON data from a file and from a string, respectively. They convert JSON-encoded data into Python types according to the table below.

JSON            Python
object          dict
array           list
string          str
number (int)    int
number (real)   float
true            True
false           False
null            None
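To see the table above in action, the sketch below parses a made-up JSON document that contains one value of each JSON type and reports the Python type that json.loads() produced for it.

```python
import json

# Hypothetical document with one member per JSON type.
text = (
    '{"obj": {"k": "v"}, "arr": [1, 2], "s": "text", '
    '"i": 7, "r": 1.5, "t": true, "f": false, "n": null}'
)

parsed = json.loads(text)
types = {key: type(value).__name__ for key, value in parsed.items()}
print(types)
# {'obj': 'dict', 'arr': 'list', 's': 'str', 'i': 'int', 'r': 'float', 't': 'bool', 'f': 'bool', 'n': 'NoneType'}
```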

The syntax of these functions is

  • json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw) : Deserialize fp (a .read()-supporting text file or binary file containing a JSON document) to a Python object using this conversion table.
  • json.loads(s, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw) : Deserialize s (a str, bytes or bytearray instance containing a JSON document) to a Python object using this conversion table.
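As a small sketch of the input types mentioned above, the same document can be decoded from a str, from bytes, and from text or binary file-like objects. The in-memory io.StringIO and io.BytesIO objects stand in for real files here.

```python
import io
import json

document = '{"name": "User", "titles": ["Learning Python", "Learning JSON"]}'

# loads() accepts str as well as bytes or bytearray.
from_str = json.loads(document)
from_bytes = json.loads(document.encode("utf-8"))

# load() reads from any object with a read() method, text or binary.
from_text_file = json.load(io.StringIO(document))
from_binary_file = json.load(io.BytesIO(document.encode("utf-8")))

print(from_str == from_bytes == from_text_file == from_binary_file)
# True
```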

where

  • object_hook : Optional function that will be called with the result of any object literal decoded (a dict). The return value of object_hook will be used instead of the dict. It can be used to implement custom decoders.
  • parse_float : If specified, will be called with the string of every JSON float to be decoded. By default, this is equivalent to float(num_str).
  • parse_int : If specified, will be called with the string of every JSON int to be decoded. By default, this is equivalent to int(num_str).
  • parse_constant : If specified, will be called with one of the following strings: ‘-Infinity’, ‘Infinity’, ‘NaN’.
  • object_pairs_hook : It will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders. If object_hook is also defined, the object_pairs_hook takes priority.
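The hooks described above can be sketched as follows. The `Product` type, the `as_product` and `reject_constant` helpers, and the sample documents are all hypothetical. They only show one way each hook might be used.

```python
import json
from collections import namedtuple
from decimal import Decimal

text = '{"name": "Widget", "price": 19.99, "stock": 3}'

# parse_float: decode JSON floats as Decimal instead of float.
exact = json.loads(text, parse_float=Decimal)
print(repr(exact["price"]))
# Decimal('19.99')

# object_hook: turn matching dicts into a custom type (hypothetical Product).
Product = namedtuple("Product", ["name", "price", "stock"])

def as_product(obj):
    if set(obj) == {"name", "price", "stock"}:
        return Product(**obj)
    return obj  # leave other objects as plain dicts

product = json.loads(text, object_hook=as_product)
print(product)
# Product(name='Widget', price=19.99, stock=3)

# object_pairs_hook: receives (key, value) pairs, so duplicate keys stay visible.
pairs = json.loads('{"a": 1, "a": 2}', object_pairs_hook=list)
print(pairs)
# [('a', 1), ('a', 2)]

# parse_constant: reject the non-standard NaN/Infinity constants.
def reject_constant(name):
    raise ValueError(f"Non-standard JSON constant: {name}")

try:
    json.loads("[NaN]", parse_constant=reject_constant)
    constant_rejected = False
except ValueError as exc:
    constant_rejected = True
    print(exc)
    # Non-standard JSON constant: NaN
```

Without object_pairs_hook, the duplicate key "a" would simply keep its last value in the resulting dict.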

The following example demonstrates decoding JSON into a Python object.

import urllib.request
import json

# Retrieve some sample JSON data
req = urllib.request.urlopen("http://httpbin.org/json")
data = req.read().decode('utf-8')
print(data)

# Output
# {
#   "slideshow": {
#     "author": "Yours Truly", 
#     "date": "date of publication", 
#     "slides": [
#       {
#         "title": "Wake up to WonderWidgets!", 
#         "type": "all"
#       }, 
#       {
#         "items": [
#           "Why <em>WonderWidgets</em> are great", 
#           "Who <em>buys</em> WonderWidgets"
#         ], 
#         "title": "Overview", 
#         "type": "all"
#       }
#     ], 
#     "title": "Sample Slide Show"
#   }
# }

# Parse the returned data
obj = json.loads(data)

print(obj["slideshow"]["author"])
# Output
# Yours Truly

with open("jsonoutput.json", "r") as fp:
    print(json.load(fp))
    # Output
    # {'name': 'User', 'titles': ['Learning Python', 'Learning JSON']}