DataQuery SDK

Professional Python SDK for the DataQuery API - High-performance data access with parallel downloads, time series queries, and seamless OAuth 2.0 authentication.

Features

High-Performance Downloads: Parallel file downloads with automatic retry and progress tracking
Time Series Queries: Query data by expressions, instruments, or groups with flexible filtering
OAuth 2.0 Authentication: Automatic token management and refresh
Connection Pooling: Optimized HTTP connections with configurable rate limiting
Pandas Integration: Direct conversion to DataFrames for analysis
Async & Sync APIs: Use async/await or synchronous methods based on your needs

Installation

pip install dataquery-sdk

Quick Start

1. Configure Credentials

Set your API credentials as environment variables:

export DATAQUERY_CLIENT_ID="your_client_id"
export DATAQUERY_CLIENT_SECRET="your_client_secret"

Or create a .env file in your project directory:

DATAQUERY_CLIENT_ID=your_client_id
DATAQUERY_CLIENT_SECRET=your_client_secret

2. Download Files

Synchronous (Python Scripts)

from dataquery import DataQuery

# Download all files for a date range
with DataQuery() as dq:
    results = dq.run_group_download(
        group_id="JPMAQS_GENERIC_RETURNS",
        start_date="20250101",
        end_date="20250131",
        destination_dir="./data"
    )
    print(f"Downloaded {results['successful_downloads']} files")

Asynchronous (Jupyter Notebooks)

from dataquery import DataQuery

# Download all files for a date range
async with DataQuery() as dq:
    results = await dq.run_group_download_async(
        group_id="JPMAQS_GENERIC_RETURNS",
        start_date="20250101",
        end_date="20250131",
        destination_dir="./data"
    )
    print(f"Downloaded {results['successful_downloads']} files")

3. Query Time Series Data

from dataquery import DataQuery

async with DataQuery() as dq:
    # Query by expression
    result = await dq.get_expressions_time_series_async(
        expressions=["DB(MTE,IRISH EUR 1.100 15-May-2029 LON,,IE00BH3SQ895,MIDPRC)"],
        start_date="20240101",
        end_date="20240131"
    )
    
    # Convert to pandas DataFrame
    df = dq.to_dataframe(result)
    print(df.head())

4. Discover Available Data

from dataquery import DataQuery

async with DataQuery() as dq:
    # List all available groups
    groups = await dq.list_groups_async(limit=100)
    
    # Convert to DataFrame for easy viewing
    groups_df = dq.to_dataframe(groups)
    print(groups_df[['group_id', 'group_name', 'description']])

Common Use Cases

Download Single File

from dataquery import DataQuery
from pathlib import Path

async with DataQuery() as dq:
    result = await dq.download_file_async(
        file_group_id="JPMAQS_GENERIC_RETURNS",
        file_datetime="20250115",
        destination_path=Path("./downloads")
    )
    print(f"Downloaded: {result.local_path}")

Query with Filters

async with DataQuery() as dq:
    # Get time series for Ireland bonds only
    result = await dq.get_group_time_series_async(
        group_id="FI_GO_BO_EA",
        attributes=["MIDPRC", "REPO_1M"],
        filter="country(IRL)",
        start_date="20240101",
        end_date="20240131"
    )
    
    df = dq.to_dataframe(result)

Search for Instruments

async with DataQuery() as dq:
    # Search for instruments by keywords
    results = await dq.search_instruments_async(
        group_id="FI_GO_BO_EA",
        keywords="irish"
    )
    
    # Use the results to query time series
    instrument_ids = [inst.instrument_id for inst in results.instruments[:5]]
    data = await dq.get_instrument_time_series_async(
        instruments=instrument_ids,
        attributes=["MIDPRC"],
        start_date="20240101",
        end_date="20240131"
    )

Performance Optimization

Parallel Downloads

async with DataQuery() as dq:
    # Download multiple files concurrently with parallel chunks
    results = await dq.run_group_download_async(
        group_id="JPMAQS_GENERIC_RETURNS",
        start_date="20250101",
        end_date="20250131",
        destination_dir="./data",
        max_concurrent=5,  # Download 5 files simultaneously
        num_parts=4        # Split each file into 4 parallel chunks
    )

Recommended Settings:

max_concurrent: 3-5 (concurrent file downloads)
num_parts: 2-8 (parallel chunks per file)

Rate Limiting

Configure rate limits to avoid API throttling:

from dataquery import DataQuery, ClientConfig

config = ClientConfig(
    client_id="your_client_id",
    client_secret="your_client_secret",
    rate_limit_rpm=300,  # Requests per minute
    max_retries=3,
    timeout=60.0
)

async with DataQuery(config=config) as dq:
    # Your code here
    pass

Configuration

Environment Variables

# Required
DATAQUERY_CLIENT_ID=your_client_id
DATAQUERY_CLIENT_SECRET=your_client_secret

# Optional - API Endpoints
DATAQUERY_BASE_URL=https://api-developer.jpmorgan.com
DATAQUERY_FILES_BASE_URL=https://api-strm-gw01.jpmchase.com

# Optional - Performance
DATAQUERY_MAX_RETRIES=3
DATAQUERY_TIMEOUT=60
DATAQUERY_RATE_LIMIT_RPM=300

Programmatic Configuration

from dataquery import DataQuery, ClientConfig

config = ClientConfig(
    client_id="your_client_id",
    client_secret="your_client_secret",
    base_url="https://api-developer.jpmorgan.com",
    max_retries=3,
    timeout=60.0,
    rate_limit_rpm=300
)

async with DataQuery(config=config) as dq:
    # Your code here
    pass

Error Handling

from dataquery import DataQuery
from dataquery.exceptions import (
    DataQueryError,
    AuthenticationError,
    NotFoundError,
    RateLimitError
)

async def safe_query():
    try:
        async with DataQuery() as dq:
            result = await dq.get_expressions_time_series_async(
                expressions=["DB(...)"],
                start_date="20240101",
                end_date="20240131"
            )
            return result
    except AuthenticationError as e:
        print(f"Authentication failed: {e}")
    except NotFoundError as e:
        print(f"Resource not found: {e}")
    except RateLimitError as e:
        print(f"Rate limit exceeded: {e}")
    except DataQueryError as e:
        print(f"API error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")

Date Formats

Absolute Dates

start_date="20240101"  # YYYYMMDD format
end_date="20241231"

Relative Dates

start_date="TODAY"      # Today
start_date="TODAY-1D"   # Yesterday
start_date="TODAY-1W"   # 1 week ago
start_date="TODAY-1M"   # 1 month ago
start_date="TODAY-1Y"   # 1 year ago

Calendar Conventions

Calendar	Description	Use Case
`CAL_WEEKDAYS`	Monday-Friday	International data (recommended)
`CAL_USBANK`	US banking days	US-only data (default)
`CAL_WEEKDAY_NOHOLIDAY`	All weekdays	Generic business days
`CAL_DEFAULT`	Calendar day	Include weekends

Examples

The examples/ directory contains comprehensive examples:

File Downloads: Single file, batch downloads, availability checks
Time Series: Expressions, instruments, groups with filters
Discovery: Search instruments, list groups, get attributes
Advanced: Grid data, auto-download, custom progress tracking

Run an example:

python examples/files/download_file.py
python examples/expressions/get_expressions_time_series.py

CLI Usage

The SDK includes a command-line interface:

# Download files
dataquery download --group-id JPMAQS_GENERIC_RETURNS \
                   --start-date 20250101 \
                   --end-date 20250131 \
                   --destination ./data

# List groups
dataquery list-groups --limit 100

# Check file availability
dataquery check-availability --file-group-id JPMAQS_GENERIC_RETURNS \
                             --date 20250115

API Reference

Core Methods

File Downloads

download_file_async() - Download a single file
run_group_download_async() - Download all files in a date range
list_available_files_async() - Check file availability

Time Series Queries

get_expressions_time_series_async() - Query by expression
get_instrument_time_series_async() - Query by instrument ID
get_group_time_series_async() - Query entire group with filters

Discovery

list_groups_async() - List available data groups
search_instruments_async() - Search for instruments
list_instruments_async() - List all instruments in a group
get_group_attributes_async() - Get available attributes
get_group_filters_async() - Get available filters

Utilities

to_dataframe() - Convert any response to pandas DataFrame
health_check_async() - Check API health
get_stats() - Get connection and rate limit statistics

For detailed API documentation, see the API Reference.

Requirements

Python 3.10 or higher
Dependencies:
- aiohttp>=3.8.0 - Async HTTP client
- pydantic>=2.0.0 - Data validation
- structlog>=23.0.0 - Structured logging
- python-dotenv>=1.0.0 - Environment variable management

Optional:

pandas>=2.0.0 - For DataFrame conversion

Development

Setup Development Environment

# Clone the repository
git clone https://github.com/dataquery/dataquery-sdk.git
cd dataquery-sdk

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Run Tests

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=dataquery --cov-report=html

# Run specific test file
pytest tests/test_client.py -v

Code Quality

# Format code
black dataquery/ tests/

# Check linting
flake8 dataquery/ tests/ examples/

# Type checking
mypy dataquery/

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For issues and questions:

GitHub Issues: Report a bug
Documentation: Read the docs
Email: support@dataquery.com

Changelog

See CHANGELOG.md for version history and release notes.

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
.github		.github
dataquery		dataquery
docs		docs
examples		examples
tests		tests
.flake8		.flake8
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
mkdocs.yml		mkdocs.yml
mypy.ini		mypy.ini
py.typed		py.typed
pyproject.toml		pyproject.toml

License

jpmorganchase/dataquery-sdk

Folders and files

Latest commit

History

Repository files navigation

DataQuery SDK

Features

Installation

Quick Start

1. Configure Credentials

2. Download Files

3. Query Time Series Data

4. Discover Available Data

Common Use Cases

Download Single File

Query with Filters

Search for Instruments

Performance Optimization

Parallel Downloads

Rate Limiting

Configuration

Environment Variables

Programmatic Configuration

Error Handling

Date Formats

Absolute Dates

Relative Dates

Calendar Conventions

Examples

CLI Usage

API Reference

Core Methods

Requirements

Development

Setup Development Environment

Run Tests

Code Quality

Contributing

License

Support

Changelog

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 6

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages