jsonl


About

jsonl is a lightweight Python library designed to simplify working with JSON Lines data, adhering to the jsonlines and ndjson specifications.

🎯 Features

  • 🌎 Provides an API similar to Python's standard json module.
  • 🚀 Supports custom (de)serialization via user-defined callbacks.
  • 🗜️ Built-in support for gzip, bzip2, xz compression formats and ZIP or TAR archives.
  • 🔧 Skips malformed lines during file loading.
  • 📥 Loads from URLs directly.
  • 🐍 No external dependencies: relies only on the Python standard library.
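For readers new to the format, the following sketch uses only the standard library (not jsonl itself) to show what a gzip-compressed JSON Lines file looks like and what skipping malformed lines means in practice. The helper name `iter_jsonl_lenient` and the file name `sample.jsonl.gz` are hypothetical and are not part of the jsonl API; the library's own behavior may differ in detail.

```python
import gzip
import json
import os
import tempfile


def iter_jsonl_lenient(path):
    """Yield one parsed value per line, skipping blank and malformed lines."""
    with gzip.open(path, mode="rt", encoding="utf-8") as fp:
        for line in fp:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue  # malformed line: skip it and keep reading


# Write a small compressed file: one JSON value per line.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "sample.jsonl.gz")  # hypothetical file name
with gzip.open(path, mode="wt", encoding="utf-8") as fp:
    fp.write('{"name": "Gilbert"}\n')
    fp.write('{"name": \n')  # deliberately broken line
    fp.write('{"name": "May"}\n')

records = list(iter_jsonl_lenient(path))
print(records)
```

Because every line is an independent JSON document, a reader can drop a broken line without losing the records around it.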

📦 Installation

To install jsonl using pip, run the following command:

pip install py-jsonl

⚡ Quick Start

Dumping data to a JSON Lines File

Note

Use jsonl.dump to incrementally write an iterable of dictionaries to a JSON Lines file:

# -*- coding: utf-8 -*-

import jsonl

data = [
    {"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]},
    {"name": "May", "wins": []},
]

jsonl.dump(data, "file.jsonl")

Loading data from a JSON Lines source

Note

Use jsonl.load to incrementally load a JSON Lines source (such as a filename, URL, or file-like object) as an iterator of dictionaries:

# -*- coding: utf-8 -*-

import jsonl

# Load data from a JSON Lines file
iterator = jsonl.load("file.jsonl")
print(tuple(iterator))

# Load data from a URL
iterator = jsonl.load("https://example.com/file.jsonl")
print(tuple(iterator))

Dump multiple JSON Lines Files into an Archive (ZIP or TAR)

Note

Use jsonl.dump_archive to incrementally write structured data to multiple JSON Lines files, which are then stored in a ZIP or TAR archive.

# -*- coding: utf-8 -*-

import jsonl

data = [
    # Create `file1.jsonl` within the archive
    ("file1.jsonl", [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]),
    # Create `file2.jsonl` within the archive
    ("path/to/file2.jsonl", [{"name": "Charlie", "age": 35}, {"name": "David", "age": 40}]),
    # Append to `file1.jsonl` within the archive
    ("file1.jsonl", [{"name": "Eve", "age": 28}]),
]
jsonl.dump_archive("archive.zip", data)

Load multiple JSON Lines Files from an Archive (ZIP or TAR)

Note

Use jsonl.load_archive to incrementally load multiple JSON Lines files from a ZIP or TAR archive.

# -*- coding: utf-8 -*-

import jsonl

# Load all JSON Lines files matching the pattern "*.jsonl" from a local archive
for filename, iterator in jsonl.load_archive("archive.zip"):
    print("Filename:", filename)
    print("Data:", tuple(iterator))

# Load all JSON Lines files matching the pattern "*.jsonl" from a remote archive
for filename, iterator in jsonl.load_archive("https://example.com/archive.zip"):
    print("Filename:", filename)
    print("Data:", tuple(iterator))
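As a rough mental model of pattern-based loading from an archive, the sketch below selects the members of a ZIP file whose names match "*.jsonl" and parses each one line by line, using only the standard library. It is not how jsonl.load_archive is implemented; `iter_archive` is a hypothetical helper, and the in-memory archive stands in for `archive.zip`.

```python
import fnmatch
import io
import json
import zipfile

# Build a small ZIP archive in memory (a stand-in for "archive.zip").
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as zf:
    zf.writestr("file1.jsonl", '{"name": "Alice", "age": 30}\n{"name": "Bob", "age": 25}\n')
    zf.writestr("path/to/file2.jsonl", '{"name": "Charlie", "age": 35}\n')
    zf.writestr("notes.txt", "not JSON Lines\n")


def iter_archive(fileobj, pattern="*.jsonl"):
    """Yield (member_name, records) for each member whose name matches pattern."""
    with zipfile.ZipFile(fileobj) as zf:
        for name in zf.namelist():
            if not fnmatch.fnmatch(name, pattern):
                continue  # ignore members that do not match the pattern
            with zf.open(name) as member:
                lines = io.TextIOWrapper(member, encoding="utf-8")
                yield name, [json.loads(line) for line in lines if line.strip()]


buffer.seek(0)
loaded = dict(iter_archive(buffer))
print(sorted(loaded))
```

Note that `fnmatch` lets `*` match path separators, so `path/to/file2.jsonl` is selected as well, while `notes.txt` is skipped.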

Dumping data to Multiple JSON Lines Files

Note

Use jsonl.dump_fork to incrementally write structured data to multiple JSON Lines files, which can be useful when you want to separate data based on some criteria.

# -*- coding: utf-8 -*-

import jsonl

data = [
    # Create `file1.jsonl` or overwrite it if it exists
    ("file1.jsonl", [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]),
    # Create `file2.jsonl` or overwrite it if it exists
    ("file2.jsonl", [{"name": "Charlie", "age": 35}, {"name": "David", "age": 40}]),
    # Append to `file1.jsonl`
    ("file1.jsonl", [{"name": "Eve", "age": 28}]),
]
jsonl.dump_fork(data)
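When the split criterion lives in the records themselves, the `(filename, items)` pairs can be generated rather than written by hand. The sketch below is one possible way to do that with the standard library; `fork_by` and the `team` field are hypothetical, and the commented-out call assumes jsonl.dump_fork accepts any iterable of such pairs and appends when a filename repeats, as the comments in the example above describe.

```python
import itertools
import operator

records = [
    {"name": "Alice", "age": 30, "team": "red"},
    {"name": "Bob", "age": 25, "team": "blue"},
    {"name": "Eve", "age": 28, "team": "red"},
]


def fork_by(items, key):
    """Yield (filename, group) pairs, one per run of consecutive items sharing a key value."""
    for value, group in itertools.groupby(items, key=operator.itemgetter(key)):
        yield f"{value}.jsonl", list(group)


pairs = list(fork_by(records, "team"))
for filename, group in pairs:
    print(filename, [item["name"] for item in group])

# Assumed usage, not executed here:
# jsonl.dump_fork(fork_by(records, "team"))
```

Since `itertools.groupby` only groups consecutive items, an unsorted input yields the same filename more than once, which mirrors the repeated `file1.jsonl` entry in the example above.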

📚 Documentation

For more detailed information and usage examples, refer to the project documentation.

🛠️ Development

To contribute to the project, use the commands below to run the tests, the linters, and the documentation build.

First, ensure you have the latest version of pip:

python -m pip install --upgrade pip

Running Unit Tests

Install the development dependencies and run the tests:

pip install --group=test --upgrade  # Install test dependencies, skip if already installed
python -m pytest tests/ # Run all tests
python -m pytest tests/ --cov # Run tests with coverage

Running Linters

pip install --group=lint --upgrade  # Install lint dependencies, skip if already installed
ruff check . # Run linter
spxl . # Run sphinx-linter for docstring issues
pymport . # Check for import issues

Building the Documentation

To build the documentation locally, use the following commands:

pip install --group=doc --upgrade  # Install doc dependencies, skip if already installed
mkdocs serve # Start live-reloading docs server
mkdocs build # Build the documentation site

🗒️ License

This project is licensed under the MIT license.