genro-storage


Universal storage abstraction for Python with pluggable backends

A modern, elegant Python library that provides a unified interface for accessing files across local filesystems, cloud storage (S3, GCS, Azure), and remote protocols (HTTP). Built on top of fsspec, genro-storage adds an intuitive mount-point system and a user-friendly API inspired by Unix filesystems.

Documentation

Status: Beta - Ready for Production Testing

Current Version: 0.7.0 · Last Updated: January 2026

  • Core implementation complete
  • 15 storage backends working (local, S3, GCS, Azure, HTTP, Memory, Base64, SMB, SFTP, ZIP, TAR, Git, GitHub, WebDAV, LibArchive)
  • 411 tests (401 passing, 10 skipped) with 85% coverage on Python 3.9-3.12
  • Full documentation on ReadTheDocs
  • Battle-tested code from Genropy (19+ years in production, storage abstraction since 2018)
  • Available on PyPI

Key Features

  • Async/await support - Transparent sync/async via @smartasync decorator
  • Native permission control - Configure readonly, readwrite, or delete permissions for any backend
  • Powered by fsspec - Leverage 20+ battle-tested storage backends
  • Mount point system - Organize storage with logical names like home:, uploads:, s3:
  • Intuitive API - Pathlib-inspired interface that feels natural and Pythonic
  • Intelligent copy strategies - Skip files by existence, size, or hash for efficient incremental backups
  • Progress tracking - Built-in callbacks for progress bars and logging during copy operations
  • Content-based comparison - Compare files by MD5 hash across different backends
  • Efficient hashing - Uses cloud metadata (S3 ETag) when available, avoiding downloads
  • External tool integration - call() method for working seamlessly with ffmpeg, imagemagick, pandoc, etc.
  • WSGI file serving - serve() method for web frameworks (Flask, Django, Pyramid) with ETag caching
  • MIME type detection - Automatic content-type detection from file extensions
  • Flexible configuration - Load mounts from YAML, JSON, or code (see the sketch after this list)
  • Dynamic paths - Support for callable paths that resolve at runtime (perfect for user-specific directories)
  • Cloud metadata - Get/set custom metadata on S3, GCS, Azure files
  • URL generation - Generate presigned URLs for S3, public URLs for sharing
  • Base64 utilities - Encode files to data URIs, download from URLs
  • S3 versioning - Access historical file versions (when S3 versioning enabled)
  • Test-friendly - In-memory backend for fast, isolated testing
  • Base64 data URIs - Embed data inline with automatic encoding (writable with mutable paths)
  • Production-ready backends - Built on 6+ years of Genropy production experience
  • Lightweight core - Optional backends installed only when needed
  • Cross-storage operations - Copy/move files between different storage types seamlessly
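
As a sketch of the flexible-configuration feature above: mount definitions can live in a YAML file and be fed to the documented configure() call. The file name and the PyYAML loading shown here are illustrative; if the library ships a dedicated file loader, prefer that (see the docs).

import yaml
from genro_storage import StorageManager

# storage.yaml contains a list of mount dicts, e.g.:
#   - name: home
#     protocol: local
#     base_path: /home/user
#   - name: uploads
#     protocol: s3
#     bucket: my-app-uploads
#     permissions: readonly
storage = StorageManager()
with open('storage.yaml') as f:
    storage.configure(yaml.safe_load(f))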

Why genro-storage vs raw fsspec?

While fsspec is powerful, genro-storage provides:

  • Mount point abstraction - Work with logical names instead of full URIs
  • Simpler API - Less verbose, more intuitive for common operations
  • Configuration management - Load storage configs from files
  • Enhanced utilities - Cross-storage copy, unified error handling

Think of it as what requests is to urllib: a friendlier interface to an excellent foundation.
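
As a rough illustration of the difference (the first half uses fsspec's public fsspec.open() API; the second mirrors the Quick Example below):

# Raw fsspec: full URIs, options repeated on every call
import fsspec

with fsspec.open('s3://my-app-uploads/users/123/avatar.jpg', 'rb') as f:
    data = f.read()

# genro-storage: configure the mount once, then use logical paths
from genro_storage import StorageManager

storage = StorageManager()
storage.configure([{'name': 'uploads', 'protocol': 's3', 'bucket': 'my-app-uploads'}])

data = storage.node('uploads:users/123/avatar.jpg').read_bytes()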

Perfect For

  • Multi-cloud applications that need storage abstraction
  • Data pipelines processing files from various sources
  • Web applications managing uploads across environments
  • CLI tools that work with local and remote files
  • Testing scenarios requiring storage mocking (see the in-memory sketch below)
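
For the testing use case, the built-in memory backend keeps tests fast and isolated. A minimal pytest-style sketch; the protocol string 'memory' follows the backend list below and is the only assumption here:

from genro_storage import StorageManager

def test_upload_roundtrip():
    # In-memory mount: no disk, no network, nothing to clean up
    storage = StorageManager()
    storage.configure([{'name': 'uploads', 'protocol': 'memory'}])

    node = storage.node('uploads:avatar.jpg')
    node.write_bytes(b'fake image bytes')

    assert node.exists()
    assert node.read_bytes() == b'fake image bytes'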

Quick Example

Synchronous Usage

from genro_storage import StorageManager

# Configure storage backends
storage = StorageManager()
storage.configure([
    {'name': 'home', 'protocol': 'local', 'base_path': '/home/user'},
    {'name': 'uploads', 'protocol': 's3', 'bucket': 'my-app-uploads'},
    {'name': 'backups', 'protocol': 'gcs', 'bucket': 'my-backups', 'permissions': 'readwrite'},
    {'name': 'public', 'protocol': 'http', 'base_path': 'https://cdn.example.com', 'permissions': 'readonly'},
    {'name': 'data', 'protocol': 'base64'}  # Inline base64 data
])

# Work with files using a unified API
node = storage.node('uploads:users/123/avatar.jpg')
if node.exists():
    # Copy from S3 to local
    node.copy_to(storage.node('home:cache/avatar.jpg'))

    # Read and process
    data = node.read_bytes()

    # Backup to GCS
    node.copy_to(storage.node('backups:avatars/user_123.jpg'))

# Base64 backend: embed data directly in URIs (data URI style)
# Read inline data
import base64
text = "Configuration data"
b64_data = base64.b64encode(text.encode()).decode()
node = storage.node(f'data:{b64_data}')
print(node.read_text())  # "Configuration data"

# Or write to create base64 (path updates automatically)
node = storage.node('data:')
node.write_text("New content")
print(node.path)  # "TmV3IGNvbnRlbnQ=" (base64 of "New content")

# Copy from S3 to base64 for inline use
s3_image = storage.node('uploads:photo.jpg')
b64_image = storage.node('data:')
s3_image.copy_to(b64_image)
data_uri = f"data:image/jpeg;base64,{b64_image.path}"

# Advanced features
# 1. Intelligent incremental backups (NEW!)
docs = storage.node('home:documents')
s3_backup = storage.node('uploads:backup/documents')

# Skip files that already exist (fastest)
docs.copy_to(s3_backup, skip='exists')

# Skip files with same size (fast, good accuracy)
docs.copy_to(s3_backup, skip='size')

# Skip files with same content (accurate, uses S3 ETag - fast!)
docs.copy_to(s3_backup, skip='hash')

# With progress tracking
from tqdm import tqdm
pbar = tqdm(desc="Backing up", unit="file")
docs.copy_to(s3_backup, skip='hash',
             progress=lambda cur, tot: pbar.update(1))
pbar.close()

# 2. Work with external tools using call() (ffmpeg, imagemagick, etc.)
video = storage.node('uploads:video.mp4')
thumbnail = storage.node('uploads:thumb.jpg')

# Automatically handles cloud download/upload
video.call('ffmpeg', '-i', video, '-vf', 'thumbnail', '-frames:v', '1', thumbnail)

# Or use local_path() for more control
with video.local_path(mode='r') as local_path:
    import subprocess
    subprocess.run(['ffmpeg', '-i', local_path, 'output.mp4'])

# 3. Serve files via WSGI (Flask, Django, Pyramid)
from flask import Flask, request
app = Flask(__name__)

@app.route('/files/<path:filepath>')
def serve_file(filepath):
    node = storage.node(f'uploads:{filepath}')
    # ETag caching, streaming, MIME types - all automatic!
    return node.serve(request.environ, lambda s, h: None, cache_max_age=3600)

# 4. Check MIME types
doc = storage.node('uploads:report.pdf')
print(doc.mimetype)  # 'application/pdf'

# 5. Dynamic paths for multi-user apps
def get_user_storage():
    user_id = get_current_user()
    return f'/data/users/{user_id}'

storage.configure([
    {'name': 'user', 'protocol': 'local', 'base_path': get_user_storage}
])
# Path resolves differently per user!

# 6. Cloud metadata
file = storage.node('uploads:document.pdf')
file.set_metadata({
    'Author': 'John Doe',
    'Department': 'Engineering'
})

# 7. Generate shareable URLs
url = file.url(expires_in=3600)  # S3 presigned URL

# 8. Encode to data URI
img = storage.node('home:logo.png')
data_uri = img.to_base64()  # data:image/png;base64,...

# 9. Download from internet
remote = storage.node('uploads:downloaded.pdf')
remote.fill_from_url('https://example.com/file.pdf')

Async Usage

All I/O methods use the @smartasync decorator for transparent sync/async support. The same StorageManager and StorageNode classes work in both contexts.

from genro_storage import StorageManager

# Same StorageManager works in both sync and async contexts
storage = StorageManager()
storage.configure([
    {'name': 'uploads', 'protocol': 's3', 'bucket': 'my-app-uploads'},
    {'name': 'cache', 'protocol': 'local', 'base_path': '/tmp/cache'}
])

# Use in async context (FastAPI, asyncio, etc.)
async def process_file(file_path: str):
    node = storage.node(f'uploads:{file_path}')

    # All I/O methods are awaitable in async context
    if await node.exists():
        data = await node.read_bytes()

        # Process and cache
        processed = process_data(data)
        cache_node = storage.node('cache:processed.dat')
        await cache_node.write_bytes(processed)

        return processed

    raise FileNotFoundError(file_path)

# FastAPI example
from fastapi import FastAPI, HTTPException

app = FastAPI()

@app.get("/files/{filepath:path}")
async def get_file(filepath: str):
    """Serve file from S3 storage."""
    node = storage.node(f'uploads:{filepath}')

    if not await node.exists():
        raise HTTPException(status_code=404, detail="File not found")

    return {
        "data": await node.read_bytes(),
        "size": await node.size(),
        "mime_type": node.mimetype  # Non-I/O property (sync)
    }

# Concurrent operations
import asyncio

async def backup_files(file_list):
    """Backup multiple files concurrently."""
    async def backup_one(filepath):
        source = storage.node(f'uploads:{filepath}')
        target = storage.node(f'backups:{filepath}')
        data = await source.read_bytes()
        await target.write_bytes(data)

    # Process all files in parallel
    await asyncio.gather(*[backup_one(f) for f in file_list])

Learning with Interactive Tutorials

The best way to learn genro-storage is through our hands-on Jupyter notebooks in the notebooks/ directory.

Run Online (No Installation Required)

Click the Binder badge to launch an interactive Jupyter environment in your browser. Ready in about two minutes!

Run Locally

# 1. Install Jupyter
pip install jupyter notebook

# 2. Navigate to notebooks directory
cd notebooks

# 3. Launch Jupyter
jupyter notebook

# 4. Open 01_quickstart.ipynb and start learning!

Note: Jupyter will open in your browser automatically. Execute cells sequentially with Shift+Enter.

Tutorial Contents

Notebook                  Topic                                Duration   Level
01 - Quickstart           Basic concepts and first steps       15 min     Beginner
02 - Backends             Storage backends and configuration   20 min     Beginner
03 - File Operations      Read, write, copy, directories       25 min     Beginner
04 - Virtual Nodes        iternode, diffnode, zip archives     30 min     Intermediate
05 - Copy Strategies      Smart copying and filtering          25 min     Intermediate
06 - Versioning           S3 version history and rollback      30 min     Intermediate
07 - Advanced Features    External tools, WSGI, metadata       35 min     Advanced
08 - Real World Examples  Complete use cases                   40 min     Advanced

Total time: ~3.5 hours • Start here: 01_quickstart.ipynb

See notebooks/README.md for the complete learning guide.

Installation

From GitHub (Recommended)

Install directly from GitHub without cloning:

# Base package
pip install git+https://github.com/genropy/genro-storage.git

# With S3 support
pip install "genro-storage[s3] @ git+https://github.com/genropy/genro-storage.git"

# With all backends
pip install "genro-storage[all] @ git+https://github.com/genropy/genro-storage.git"

From Source (Development)

Clone and install in editable mode:

# Clone repository
git clone https://github.com/genropy/genro-storage.git
cd genro-storage

# Install base package
pip install -e .

# Install with S3 support
pip install -e ".[s3]"

# Install with all backends
pip install -e ".[all]"

# Install for development
pip install -e ".[all,dev]"

Supported Backends

Install optional dependencies for specific backends:

# Cloud storage
pip install genro-storage[s3]          # Amazon S3
pip install genro-storage[gcs]         # Google Cloud Storage
pip install genro-storage[azure]       # Azure Blob Storage

# Network protocols
pip install genro-storage[http]        # HTTP/HTTPS
pip install genro-storage[smb]         # SMB/CIFS (Windows/Samba shares)
pip install genro-storage[sftp]        # SFTP (SSH File Transfer)
pip install genro-storage[webdav]      # WebDAV (Nextcloud, ownCloud, SharePoint)

# Archive formats
pip install genro-storage[libarchive]  # RAR, 7z, ISO, and 20+ formats

# Version control
# Git and GitHub are built into fsspec (no extra pip install;
# the Git backend needs system pygit2 -- see below)

# Other
pip install genro-storage[async]       # Async support
pip install genro-storage[all]         # All backends + async

Built-in backends (no extra dependencies):

  • Local filesystem
  • Memory (in-memory storage for testing)
  • Base64 (inline data URIs)
  • ZIP archives
  • TAR archives (with gzip, bzip2, xz compression)
  • Git repositories (requires system pygit2)
  • GitHub repositories
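
The archive backends let you browse an archive as if it were a directory. A hypothetical sketch: the protocol string 'zip' and the option used to point the mount at the archive (base_path here) are assumptions, so check the backend documentation for the exact keys:

from genro_storage import StorageManager

storage = StorageManager()
# Hypothetical config: mount a local ZIP file as a read-only tree
storage.configure([
    {'name': 'archive', 'protocol': 'zip',
     'base_path': '/data/bundle.zip', 'permissions': 'readonly'}
])

print(storage.node('archive:docs/readme.txt').read_text())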

Testing

# Unit tests (fast, no external dependencies)
pytest tests/test_local_storage.py -v

# Integration tests (requires Docker + MinIO)
docker-compose up -d
pytest tests/test_s3_integration.py -v

# All tests
pytest tests/ -v

# With coverage
pytest tests/ -v --cov=genro_storage

See TESTING.md for detailed testing instructions with MinIO.
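
For the MinIO-backed integration tests, the S3 mount has to point at the local endpoint instead of AWS. A hypothetical sketch, assuming backend-specific options pass through to s3fs (endpoint_url, key, and secret below are s3fs conventions and MinIO's default credentials, not confirmed genro-storage config keys; TESTING.md has the real setup):

from genro_storage import StorageManager

storage = StorageManager()
storage.configure([
    {'name': 'uploads', 'protocol': 's3', 'bucket': 'test-bucket',
     # Hypothetical passthrough options for the docker-compose MinIO
     'endpoint_url': 'http://localhost:9000',
     'key': 'minioadmin', 'secret': 'minioadmin'}
])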

Built With

  • fsspec - Pythonic filesystem abstraction
  • genro-toolbox - @smartasync for transparent sync/async
  • Modern Python (3.9+) with full type hints
  • Optional backends: s3fs, gcsfs, adlfs, aiohttp, smbprotocol, paramiko, webdav4, libarchive-c

Origins

genro-storage is extracted and modernized from Genropy, a Python web framework in production since 2006 (19+ years). The storage abstraction layer was introduced in 2018 and has been battle-tested in production for 6+ years. We're making this powerful storage abstraction available as a standalone library for the wider Python community.

Development Status

Phase: Beta - Production Testing

  • API Design Complete and Stable
  • Core Implementation Complete
  • FsspecBackend (15 storage backends: local, S3, GCS, Azure, HTTP, Memory, Base64, SMB, SFTP, ZIP, TAR, Git, GitHub, WebDAV, LibArchive)
  • Comprehensive Test Suite (411 tests, 85% coverage)
  • CI/CD with Python 3.9, 3.10, 3.11, 3.12
  • MD5 hashing and content-based equality
  • Base64 backend with writable mutable paths
  • Intelligent copy skip strategies (exists, size, hash, custom)
  • call() method for external tool integration (ffmpeg, imagemagick, etc.)
  • serve() method for WSGI file serving (Flask, Django, Pyramid)
  • mimetype property for automatic content-type detection
  • local_path() context manager for external tools
  • Callable path support for dynamic directories
  • Native permission control (readonly, readwrite, delete)
  • Cloud metadata get/set (S3, GCS, Azure)
  • URL generation (presigned URLs, data URIs)
  • S3 versioning support
  • Full Documentation on ReadTheDocs
  • MinIO Integration Testing
  • Transparent async/await support via @smartasync decorator
  • Ready for early adopters and production testing
  • Extended GCS/Azure integration testing in progress

Recent Releases:

  • v0.7.0 (January 2026) - Unified sync/async via @smartasync, removed AsyncStorageManager
  • v0.4.2 (October 2025) - Git, GitHub, WebDAV, LibArchive backends
  • v0.4.1 (October 2025) - SMB, SFTP, ZIP, TAR backends
  • v0.4.0 (October 2025) - Relative mounts with permissions, unified read/write API
  • v0.2.0 (October 2025) - Virtual nodes, tutorials, enhanced testing

Contributing

Contributions are welcome! We follow a Git Flow workflow with protected branches for code quality.

Quick Start:

  1. Read our Contributing Guide for detailed workflow and guidelines
  2. Fork the repository and create a feature branch from develop
  3. Make your changes with tests and documentation
  4. Submit a Pull Request to the develop branch

Branch Structure:

  • main - Production releases (protected, requires PR review)
  • develop - Integration branch (protected, requires PR review)
  • feature/* - Feature development branches
  • bugfix/* - Bug fixes
  • hotfix/* - Critical production fixes

See CONTRIBUTING.md for complete workflow documentation.

Areas for contribution:

  • Add integration tests for GCS and Azure backends
  • Improve test coverage (target: 90%+)
  • Add integration tests for new backends (SMB, SFTP, WebDAV, etc.)
  • Performance optimizations
  • Additional backend implementations

License

MIT License - See LICENSE for details


Made with ❤️ by the Genropy team
