df-diskcache
df-diskcache is a Python library for caching pandas.DataFrame objects to local disk.

Install it with pip:

    pip install df-diskcache

Supports the following methods:

- get: Get a cache entry (pandas.DataFrame) for the key. Returns None if the key is not found.
- set: Create a cache entry with an optional time-to-live (TTL) for the key-value pair.
- update
- touch: Update the last accessed time of a cache entry to extend the TTL.
- delete
- prune: Delete expired cache entries.
- Dictionary-like operations:
    - __getitem__
    - __setitem__
    - __contains__
    - __delitem__

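
To make the TTL-related methods above concrete, here is a minimal in-memory sketch of how set, get, touch, and prune could interact. It is not df-diskcache's implementation: the class name `TTLCacheSketch`, the injectable `clock` parameter, and the choice that `get` does not refresh the last-accessed time are all assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCacheSketch:
    """Hypothetical in-memory stand-in for TTL semantics (not the library's code)."""

    DEFAULT_TTL = 3600  # seconds

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # key -> (value, ttl in seconds, last-accessed timestamp)
        self._entries: Dict[str, Tuple[Any, float, float]] = {}

    def _expired(self, key: str) -> bool:
        _, ttl, accessed = self._entries[key]
        return self._clock() - accessed > ttl

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        # A per-entry TTL takes precedence over the class-level default.
        effective_ttl = self.DEFAULT_TTL if ttl is None else ttl
        self._entries[key] = (value, effective_ttl, self._clock())

    def get(self, key: str) -> Any:
        # Missing and expired entries both look like a miss (None).
        if key not in self._entries or self._expired(key):
            return None
        return self._entries[key][0]

    def touch(self, key: str) -> None:
        # Reset the last-accessed time so the TTL window starts over.
        if key in self._entries:
            value, ttl, _ = self._entries[key]
            self._entries[key] = (value, ttl, self._clock())

    def prune(self) -> int:
        # Drop every expired entry and report how many were removed.
        expired = [k for k in self._entries if self._expired(k)]
        for k in expired:
            del self._entries[k]
        return len(expired)


# Drive the sketch with a fake clock so the example is deterministic.
now = [0.0]
ttl_cache = TTLCacheSketch(clock=lambda: now[0])
ttl_cache.set("a", "value-a", ttl=10)
ttl_cache.set("b", "value-b", ttl=10)

now[0] = 8.0
ttl_cache.touch("a")  # "a" was last accessed at t=8, "b" still at t=0

now[0] = 12.0
live = ttl_cache.get("a")     # within 10 seconds of the touch
gone = ttl_cache.get("b")     # more than 10 seconds since it was set
removed = ttl_cache.prune()   # only "b" is expired
```

The fake clock only keeps the example deterministic; the real library stores entries on disk and may treat reads differently.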
Sample Code:

    import pandas as pd
    from dfdiskcache import DataFrameDiskCache

    cache = DataFrameDiskCache()
    url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

    df = cache.get(url)
    if df is None:
        print("cache miss")
        df = pd.read_csv(url)
        cache.set(url, df)
    else:
        print("cache hit")

    print(df)

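
The miss-then-load pattern in the sample above tends to repeat for every data source, so it can be wrapped in a small helper. The sketch below is not part of df-diskcache: `get_or_load` and the `FakeCache` stand-in are hypothetical names, and the helper only assumes an object with the `get(key)` and `set(key, value, ttl=...)` calls shown in this README.

```python
from typing import Any, Callable, Optional


def get_or_load(cache: Any, key: str, loader: Callable[[], Any],
                ttl: Optional[int] = None) -> Any:
    """Return the cached value for key; on a miss, call loader() and cache the result."""
    value = cache.get(key)
    # Compare with `is None`: the truth value of a DataFrame is ambiguous,
    # so `if not value` would raise for a cached DataFrame.
    if value is None:
        value = loader()
        if ttl is None:
            cache.set(key, value)
        else:
            cache.set(key, value, ttl=ttl)
    return value


class FakeCache:
    """Hypothetical dict-backed stand-in so the example runs without the library."""

    def __init__(self) -> None:
        self.data = {}

    def get(self, key: str) -> Any:
        return self.data.get(key)

    def set(self, key: str, value: Any, ttl: Optional[int] = None) -> None:
        self.data[key] = value


load_calls = []


def expensive_load() -> str:
    # Record each call so we can see that the second lookup hits the cache.
    load_calls.append(1)
    return "payload"


fake = FakeCache()
first = get_or_load(fake, "k", expensive_load)
second = get_or_load(fake, "k", expensive_load)
```

With the real cache, the call would look like `get_or_load(cache, url, lambda: pd.read_csv(url))`.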
You can also access the cache with dictionary-style operations:
Sample Code:

    import pandas as pd
    from dfdiskcache import DataFrameDiskCache

    cache = DataFrameDiskCache()
    url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

    df = cache[url]
    if df is None:
        print("cache miss")
        df = pd.read_csv(url)
        cache[url] = df
    else:
        print("cache hit")

    print(df)

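
For readers less familiar with Python's container protocol, the following sketch shows how the four listed dunder methods correspond to the `cache[key]`, `cache[key] = value`, `key in cache`, and `del cache[key]` syntax. `DictLikeSketch` is a hypothetical stand-in, not the library's class. Returning None on a missing key mirrors the `if df is None` check in the sample above, while silently ignoring a `del` on a missing key is an assumption.

```python
from typing import Any, Dict


class DictLikeSketch:
    """Hypothetical stand-in mapping operators to the four dunder methods."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def __getitem__(self, key: str) -> Any:
        # Called for `obj[key]`; a miss yields None rather than KeyError.
        return self._store.get(key)

    def __setitem__(self, key: str, value: Any) -> None:
        # Called for `obj[key] = value`.
        self._store[key] = value

    def __contains__(self, key: str) -> bool:
        # Called for `key in obj`.
        return key in self._store

    def __delitem__(self, key: str) -> None:
        # Called for `del obj[key]`; missing keys are ignored (assumption).
        self._store.pop(key, None)


sketch = DictLikeSketch()
sketch["k"] = "value"
present_before = "k" in sketch
del sketch["k"]
present_after = "k" in sketch
missing = sketch["k"]
```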
Sample Code:

    import pandas as pd
    from dfdiskcache import DataFrameDiskCache

    DataFrameDiskCache.DEFAULT_TTL = 10  # you can override the default TTL (default: 3600 seconds)

    cache = DataFrameDiskCache()
    url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

    df = cache.get(url)
    if df is None:
        df = pd.read_csv(url)
        cache.set(url, df, ttl=60)  # you can set a TTL for the key-value pair

    print(df)