
🦉 Modern high-performance serialization utilities for Python (JSON, MessagePack, Pickle)


srsly: Modern high-performance serialization utilities for Python

This package bundles some of the best Python serialization libraries into one standalone package, with a high-level API that makes it easy to write code that's correct across platforms and Python versions. This allows us to provide all the serialization utilities we need in a single binary wheel. It currently supports JSON, JSONL, MessagePack, Pickle and YAML.


Motivation

Serialization is hard, especially across Python versions and multiple platforms. After dealing with many subtle bugs over the years (encodings, locales, large files) our libraries like spaCy and Prodigy had steadily grown a number of utility functions to wrap the multiple serialization formats we need to support (especially json, msgpack and pickle). These wrapping functions ended up duplicated across our codebases, so we wanted to put them in one place.

At the same time, we noticed that having a lot of small dependencies was making maintenance harder, and making installation slower. To solve this, we've made srsly standalone, by including the component packages directly within it. This way we can provide all the serialization utilities we need in a single binary wheel.

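srsly currently includes forks of several packages, among them ujson, msgpack, cloudpickle and ruamel.yaml (exposed as srsly.ujson, srsly.msgpack, srsly.cloudpickle and srsly.ruamel_yaml).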

Installation

⚠️ Note that v2.x is only compatible with Python 3.6+. For Python 2.7+ compatibility, use v1.x.

srsly can be installed from pip. Before installing, make sure that your pip, setuptools and wheel are up to date.

python -m pip install -U pip setuptools wheel
python -m pip install srsly

Or from conda via conda-forge:

conda install -c conda-forge srsly

Alternatively, you can also compile the library from source. You'll need to make sure that you have a development environment with a Python distribution including header files, a compiler (Xcode command-line tools on macOS / OS X or Visual C++ build tools on Windows), pip and git installed.

Install from source:

# clone the repo
git clone https://github.com/explosion/srsly
cd srsly

# create a virtual environment
python -m venv .env
source .env/bin/activate

# update pip
python -m pip install -U pip setuptools wheel

# compile and install from source
python -m pip install .

For developers, install the requirements separately and then install in editable mode without build isolation:

# install in editable mode
python -m pip install -r requirements.txt
python -m pip install --no-build-isolation --editable .

# run test suite
python -m pytest --pyargs srsly

API

JSON

📦 The underlying module is exposed via srsly.ujson. However, we normally interact with it via the utility functions only.

function srsly.json_dumps

Serialize an object to a JSON string. Falls back to the standard library's json module if sort_keys=True is used (until this is fixed in ujson).

data = {"foo": "bar", "baz": 123}
json_string = srsly.json_dumps(data)

| Argument | Type | Description |
| --- | --- | --- |
| data | - | The JSON-serializable data to output. |
| indent | int | Number of spaces used to indent JSON. Defaults to 0. |
| sort_keys | bool | Sort dictionary keys. Defaults to False. |
| RETURNS | str | The serialized string. |
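A slightly fuller sketch of the two formatting options, assuming srsly is installed. The nested dictionary is illustrative; it shows that sort_keys=True sorts keys at every level and that the formatted output still decodes to the original object.

```python
import srsly

data = {"foo": "bar", "baz": 123, "nested": {"b": 2, "a": 1}}

# Compact output (indent defaults to 0)
compact = srsly.json_dumps(data)

# Indented output with keys sorted at every nesting level;
# per the note above, this path uses the standard-library json module
pretty = srsly.json_dumps(data, indent=2, sort_keys=True)
print(pretty)

# Formatting differs, but both strings decode to the same object
assert srsly.json_loads(compact) == srsly.json_loads(pretty) == data
```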

function srsly.json_loads

Deserialize a str or bytes object to a Python object.

data = '{"foo": "bar", "baz": 123}'
obj = srsly.json_loads(data)

| Argument | Type | Description |
| --- | --- | --- |
| data | str / bytes | The data to deserialize. |
| RETURNS | - | The deserialized Python object. |
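json_loads accepts both str and bytes. The sketch below also wraps it in try_loads, a hypothetical helper (not part of srsly) that returns None for malformed input. It assumes the bundled ujson reports parse errors as ValueError, which is also the exception read_jsonl documents for broken lines.

```python
import srsly


def try_loads(text):
    """Hypothetical helper: parse JSON, returning None for malformed input."""
    try:
        return srsly.json_loads(text)
    except ValueError:
        return None


# str and bytes input decode to the same object
assert srsly.json_loads('{"foo": "bar"}') == srsly.json_loads(b'{"foo": "bar"}')

print(try_loads(b"[1, 2, 3]"))  # valid bytes input
print(try_loads('{"foo": '))    # truncated input
```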

function srsly.write_json

Create a JSON file and dump contents or write to standard output.

data = {"foo": "bar", "baz": 123}
srsly.write_json("/path/to/file.json", data)

| Argument | Type | Description |
| --- | --- | --- |
| path | str / Path | The file path or "-" to write to stdout. |
| data | - | The JSON-serializable data to output. |
| indent | int | Number of spaces used to indent JSON. Defaults to 2. |

function srsly.read_json

Load JSON from a file or standard input.

data = srsly.read_json("/path/to/file.json")

| Argument | Type | Description |
| --- | --- | --- |
| path | str / Path | The file path or "-" to read from stdin. |
| RETURNS | dict / list | The loaded JSON content. |
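A round-trip sketch for write_json and read_json, assuming a throwaway file created with the standard library's tempfile module. It passes a pathlib.Path rather than a string, since the path argument accepts both.

```python
import tempfile
from pathlib import Path

import srsly

data = {"foo": "bar", "baz": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "data.json"
    # A pathlib.Path works as well as a plain string path
    srsly.write_json(path, data)  # indented with 2 spaces by default
    text = path.read_text(encoding="utf8")
    loaded = srsly.read_json(path)

print(text)
assert loaded == data
```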

function srsly.write_gzip_json

Create a gzipped JSON file and dump contents.

data = {"foo": "bar", "baz": 123}
srsly.write_gzip_json("/path/to/file.json.gz", data)

| Argument | Type | Description |
| --- | --- | --- |
| path | str / Path | The file path. |
| data | - | The JSON-serializable data to output. |
| indent | int | Number of spaces used to indent JSON. Defaults to 2. |

function srsly.write_gzip_jsonl

Create a gzipped JSONL file and dump contents.

data = [{"foo": "bar"}, {"baz": 123}]
srsly.write_gzip_jsonl("/path/to/file.jsonl.gz", data)

| Argument | Type | Description |
| --- | --- | --- |
| path | str / Path | The file path. |
| lines | - | The JSON-serializable contents of each line. |
| append | bool | Whether to append to the file. Appending to .gz files is generally not recommended, as it doesn't allow the algorithm to take advantage of all data when compressing, so files may be poorly compressed. |
| append_new_line | bool | Whether to write a new line before appending to the file. |

function srsly.read_gzip_json

Load gzipped JSON from a file.

data = srsly.read_gzip_json("/path/to/file.json.gz")

| Argument | Type | Description |
| --- | --- | --- |
| path | str / Path | The file path. |
| RETURNS | dict / list | The loaded JSON content. |
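To see what gzip buys you, here is a rough sketch comparing file sizes. The records are illustrative and deliberately repetitive; actual savings depend entirely on the data. It writes the same content with write_json and write_gzip_json, then reads the compressed copy back.

```python
import tempfile
from pathlib import Path

import srsly

# Illustrative, highly repetitive records compress well
records = [{"text": "hello world", "label": "GREETING"} for _ in range(1000)]

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = Path(tmp_dir) / "records.json"
    gz_path = Path(tmp_dir) / "records.json.gz"
    srsly.write_json(plain_path, records)
    srsly.write_gzip_json(gz_path, records)
    plain_size = plain_path.stat().st_size
    gz_size = gz_path.stat().st_size
    restored = srsly.read_gzip_json(gz_path)

print(f"plain: {plain_size} bytes, gzipped: {gz_size} bytes")
```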

function srsly.read_gzip_jsonl

Load gzipped JSONL from a file and yield its contents line by line.

data = srsly.read_gzip_jsonl("/path/to/file.jsonl.gz")

| Argument | Type | Description |
| --- | --- | --- |
| path | str / Path | The file path. |
| YIELDS | - | The loaded JSON contents of each line. |

function srsly.write_jsonl

Create a JSONL file (newline-delimited JSON) and dump contents line by line, or write to standard output.

data = [{"foo": "bar"}, {"baz": 123}]
srsly.write_jsonl("/path/to/file.jsonl", data)

| Argument | Type | Description |
| --- | --- | --- |
| path | str / Path | The file path or "-" to write to stdout. |
| lines | iterable | The JSON-serializable lines. |
| append | bool | Append to an existing file. Will open it in "a" mode and insert a newline before writing lines. Defaults to False. |
| append_new_line | bool | Defines whether a new line should first be written when appending to an existing file. Defaults to True. |

function srsly.read_jsonl

Read a JSONL file (newline-delimited JSON), or JSONL data from standard input, and yield its contents line by line. Blank lines are always skipped.

data = srsly.read_jsonl("/path/to/file.jsonl")

| Argument | Type | Description |
| --- | --- | --- |
| path | str / Path | The file path or "-" to read from stdin. |
| skip | bool | Skip broken lines and don't raise ValueError. Defaults to False. |
| YIELDS | - | The loaded JSON contents of each line. |
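Streaming JSONL with a generator and tolerating damaged lines. A minimal sketch, assuming a temporary file: make_records is a hypothetical generator (write_jsonl accepts any iterable of lines), and the broken last line is written by hand to simulate an interrupted write.

```python
import tempfile
from pathlib import Path

import srsly


def make_records(n):
    """Hypothetical generator producing illustrative records."""
    for i in range(n):
        yield {"id": i, "even": i % 2 == 0}


with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "records.jsonl"
    srsly.write_jsonl(path, make_records(3))
    # Simulate a truncated last line
    with path.open("a", encoding="utf8") as f:
        f.write('{"id": 3, "even": \n')
    # skip=True drops the broken line instead of raising ValueError
    records = list(srsly.read_jsonl(path, skip=True))

print(records)
```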

function srsly.is_json_serializable

Check if a Python object is JSON-serializable.

assert srsly.is_json_serializable({"hello": "world"}) is True
assert srsly.is_json_serializable(lambda x: x) is False

| Argument | Type | Description |
| --- | --- | --- |
| obj | - | The object to check. |
| RETURNS | bool | Whether the object is JSON-serializable. |
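One way to use is_json_serializable is to drop values that can't be written before serializing a mixed object. json_safe_subset below is a hypothetical helper, not part of srsly, and the config dictionary is illustrative.

```python
import srsly


def json_safe_subset(obj):
    """Hypothetical helper: keep only the values srsly can serialize to JSON."""
    return {key: value for key, value in obj.items() if srsly.is_json_serializable(value)}


config = {"name": "model", "dropout": 0.2, "callback": lambda x: x, "tags": ["a", "b"]}
safe = json_safe_subset(config)

# The callable is dropped, so the remaining dict can be dumped safely
print(srsly.json_dumps(safe))
```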

msgpack

📦 The underlying module is exposed via srsly.msgpack. However, we normally interact with it via the utility functions only.

function srsly.msgpack_dumps

Serialize an object to a msgpack byte string.

data = {"foo": "bar", "baz": 123}
msg = srsly.msgpack_dumps(data)

| Argument | Type | Description |
| --- | --- | --- |
| data | - | The data to serialize. |
| RETURNS | bytes | The serialized bytes. |

function srsly.msgpack_loads

Deserialize msgpack bytes to a Python object.

msg = b"\x82\xa3foo\xa3bar\xa3baz{"
data = srsly.msgpack_loads(msg)

| Argument | Type | Description |
| --- | --- | --- |
| data | bytes | The data to deserialize. |
| use_list | bool | Return msgpack arrays as lists instead of tuples. Can make deserialization slower. Defaults to True. |
| RETURNS | - | The deserialized Python object. |
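The effect of use_list, sketched with an illustrative payload: by default msgpack arrays come back as lists, and with use_list=False they come back as tuples.

```python
import srsly

data = {"coords": [1, 2, 3]}
msg = srsly.msgpack_dumps(data)

as_lists = srsly.msgpack_loads(msg)                   # default: use_list=True
as_tuples = srsly.msgpack_loads(msg, use_list=False)  # arrays become tuples

print(as_lists, as_tuples)
```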

function srsly.write_msgpack

Create a msgpack file and dump contents.

data = {"foo": "bar", "baz": 123}
srsly.write_msgpack("/path/to/file.msg", data)

| Argument | Type | Description |
| --- | --- | --- |
| path | str / Path | The file path. |
| data | - | The data to serialize. |

function srsly.read_msgpack

Load a msgpack file.

data = srsly.read_msgpack("/path/to/file.msg")

| Argument | Type | Description |
| --- | --- | --- |
| path | str / Path | The file path. |
| use_list | bool | Return msgpack arrays as lists instead of tuples. Can make deserialization slower. Defaults to True. |
| RETURNS | - | The loaded and deserialized content. |
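Unlike JSON, msgpack has a native binary type, so bytes values survive a file round trip unchanged. A minimal sketch, assuming a temporary file and an illustrative bytes payload.

```python
import tempfile
from pathlib import Path

import srsly

# Illustrative payload mixing text and raw bytes
data = {"name": "weights", "blob": b"\x00\x01\x02\xff"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "data.msg"
    srsly.write_msgpack(path, data)
    restored = srsly.read_msgpack(path)

print(restored)
```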

pickle

📦 The underlying module is exposed via srsly.cloudpickle. However, we normally interact with it via the utility functions only.

function srsly.pickle_dumps

Serialize a Python object with pickle.

data = {"foo": "bar", "baz": 123}
pickled_data = srsly.pickle_dumps(data)

| Argument | Type | Description |
| --- | --- | --- |
| data | - | The object to serialize. |
| protocol | int | Protocol to use. -1 for highest. Defaults to None. |
| RETURNS | bytes | The serialized object. |
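Because the underlying module is cloudpickle, pickle_dumps can also serialize objects the standard library's pickle rejects, such as lambdas, by storing them by value. A minimal sketch (the double function is illustrative) that also passes protocol=-1 to select the highest protocol:

```python
import srsly

# The standard library's pickle refuses lambdas like this one;
# cloudpickle serializes the function's code by value instead
double = lambda x: x * 2

payload = srsly.pickle_dumps(double, protocol=-1)  # -1 selects the highest protocol
restored = srsly.pickle_loads(payload)

print(restored(21))
```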

function srsly.pickle_loads

Deserialize bytes with pickle.

pickled_data = b"\x80\x04\x95\x19\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x03foo\x94\x8c\x03bar\x94\x8c\x03baz\x94K{u."
data = srsly.pickle_loads(pickled_data)

| Argument | Type | Description |
| --- | --- | --- |
| data | bytes | The data to deserialize. |
| RETURNS | - | The deserialized Python object. |
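For plain built-in objects like the dictionary above, the bytes srsly produces are ordinary pickle data. A quick interoperability check, assuming nothing beyond srsly and the standard library's pickle module:

```python
import pickle

import srsly

data = {"foo": "bar", "baz": 123}

# srsly output can be read by the standard library...
from_srsly = pickle.loads(srsly.pickle_dumps(data))
# ...and standard-library pickles can be read by srsly
from_stdlib = srsly.pickle_loads(pickle.dumps(data))

print(from_srsly, from_stdlib)
```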

YAML

📦 The underlying module is exposed via srsly.ruamel_yaml. However, we normally interact with it via the utility functions only.

function srsly.yaml_dumps

Serialize an object to a YAML string. See the ruamel.yaml docs for details on the indentation format.

data = {"foo": "bar", "baz": 123}
yaml_string = srsly.yaml_dumps(data)

| Argument | Type | Description |
| --- | --- | --- |
| data | - | The YAML-serializable data to output. |
| indent_mapping | int | Mapping indentation. Defaults to 2. |
| indent_sequence | int | Sequence indentation. Defaults to 4. |
| indent_offset | int | Indentation offset. Defaults to 2. |
| sort_keys | bool | Sort dictionary keys. Defaults to False. |
| RETURNS | str | The serialized string. |
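A short round-trip sketch for yaml_dumps and yaml_loads, using an illustrative config dictionary. Only the default indentation settings are used here; the ruamel.yaml docs describe how the indent_* values interact.

```python
import srsly

config = {"name": "pipeline", "components": ["tagger", "parser"], "batch_size": 128}

# Default (sort_keys=False): keys are not sorted
yaml_text = srsly.yaml_dumps(config)
# sort_keys=True writes keys alphabetically instead
sorted_text = srsly.yaml_dumps(config, sort_keys=True)
print(sorted_text)

# Either way, the text parses back into an equal dictionary
assert srsly.yaml_loads(yaml_text) == config
assert srsly.yaml_loads(sorted_text) == config
```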

function srsly.yaml_loads

Deserialize a string or a file object to a Python object.

data = "foo: bar\nbaz: 123"
obj = srsly.yaml_loads(data)

| Argument | Type | Description |
| --- | --- | --- |
| data | str / file | The data to deserialize. |
| RETURNS | - | The deserialized Python object. |

function srsly.write_yaml

Create a YAML file and dump contents or write to standard output.

data = {"foo": "bar", "baz": 123}
srsly.write_yaml("/path/to/file.yml", data)

| Argument | Type | Description |
| --- | --- | --- |
| path | str / Path | The file path or "-" to write to stdout. |
| data | - | The YAML-serializable data to output. |
| indent_mapping | int | Mapping indentation. Defaults to 2. |
| indent_sequence | int | Sequence indentation. Defaults to 4. |
| indent_offset | int | Indentation offset. Defaults to 2. |
| sort_keys | bool | Sort dictionary keys. Defaults to False. |

function srsly.read_yaml

Load YAML from a file or standard input.

data = srsly.read_yaml("/path/to/file.yml")

| Argument | Type | Description |
| --- | --- | --- |
| path | str / Path | The file path or "-" to read from stdin. |
| RETURNS | dict / list | The loaded YAML content. |

function srsly.is_yaml_serializable

Check if a Python object is YAML-serializable.

assert srsly.is_yaml_serializable({"hello": "world"}) is True
assert srsly.is_yaml_serializable(lambda x: x) is False

| Argument | Type | Description |
| --- | --- | --- |
| obj | - | The object to check. |
| RETURNS | bool | Whether the object is YAML-serializable. |
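Putting write_yaml, read_yaml and is_yaml_serializable together: a sketch with a hypothetical save_config helper (not part of srsly) that checks a config before writing it, assuming a temporary file. Raising TypeError is a choice made for this example, not srsly behavior.

```python
import tempfile
from pathlib import Path

import srsly


def save_config(path, config):
    """Hypothetical helper: refuse to write configs YAML can't represent."""
    if not srsly.is_yaml_serializable(config):
        raise TypeError("config contains values that can't be serialized to YAML")
    srsly.write_yaml(path, config)


config = {"lang": "en", "pipeline": ["tagger", "parser"], "batch_size": 128}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "config.yml"
    save_config(path, config)
    loaded = srsly.read_yaml(path)
    # A config holding a function is rejected before anything is written
    try:
        save_config(path, {"callback": lambda x: x})
        rejected = False
    except TypeError:
        rejected = True

print(loaded, rejected)
```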