Key Features
ccflow (Composable Configuration Flow) is a collection of tools for workflow configuration, orchestration, and dependency injection.
It is intended to be flexible enough to handle diverse use cases, including data retrieval, validation, transformation, and loading (i.e. ETL workflows), model training, microservice configuration, and automated report generation.
Central to ccflow is the BaseModel class.
BaseModel is the base class for models in the ccflow framework.
A model is essentially a data class (a class with attributes).
The naming was inspired by the open source library Pydantic (BaseModel actually inherits from the Pydantic base model class).
CallableModel is the base class for a special type of BaseModel which can be called.
CallableModels are called with a context (something that derives from ContextBase) and return a result (something that derives from ResultBase).
As an example, you may have a SQLReader callable model that, when called with a DateRangeContext, queries a SQL database and returns an ArrowResult (a wrapper around an Arrow table) containing the data in the date range defined by the context.
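The call pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the ccflow API: the `Sketch*` classes are hypothetical stand-ins for a context, a result, and a callable model, the data source is an in-memory tuple rather than a SQL database, and inclusive date bounds are an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SketchDateRangeContext:
    """Stand-in for a context: the inputs that vary from call to call."""
    start_date: date
    end_date: date


@dataclass(frozen=True)
class SketchRowsResult:
    """Stand-in for a result: a typed wrapper around the returned rows."""
    rows: tuple


@dataclass(frozen=True)
class SketchReader:
    """Stand-in for a callable model: configuration lives in attributes,
    and calling the instance with a context returns a result."""
    source: tuple  # hypothetical in-memory "table" of (date, value) rows

    def __call__(self, context: SketchDateRangeContext) -> SketchRowsResult:
        # Keep only rows whose date falls inside the (assumed inclusive) range.
        selected = tuple(
            row for row in self.source
            if context.start_date <= row[0] <= context.end_date
        )
        return SketchRowsResult(rows=selected)


reader = SketchReader(source=(
    (date(2024, 1, 1), 10.0),
    (date(2024, 1, 2), 11.5),
    (date(2024, 1, 3), 9.75),
))
result = reader(SketchDateRangeContext(date(2024, 1, 2), date(2024, 1, 3)))
print(result.rows)
```

The point of the split is that the reader's configuration (where the data lives) is fixed when the object is built, while the context (which dates to fetch) is supplied per call.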
A ModelRegistry is a named collection of models.
A ModelRegistry can be loaded from YAML configuration, which means you can define a collection of models using YAML.
This is powerful because it gives you an easy way to define a collection of Python objects via configuration.
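To make the idea of "objects defined via configuration" concrete, here is a rough standard-library sketch. It does not use ModelRegistry or its loading mechanism: `KNOWN_TYPES`, `build_registry`, the `Sketch*` classes, and the `type` key are all hypothetical, and JSON stands in for YAML so that the example has no third-party dependencies.

```python
import json
from dataclasses import dataclass


@dataclass
class SketchCsvReader:
    path: str
    delimiter: str = ","


@dataclass
class SketchJsonWriter:
    path: str
    indent: int = 2


# Hypothetical mapping from a type name used in configuration to a class.
KNOWN_TYPES = {
    "SketchCsvReader": SketchCsvReader,
    "SketchJsonWriter": SketchJsonWriter,
}

# Configuration text; JSON is used here in place of YAML.
CONFIG_TEXT = """
{
  "raw_prices": {"type": "SketchCsvReader", "path": "prices.csv"},
  "report": {"type": "SketchJsonWriter", "path": "report.json", "indent": 4}
}
"""


def build_registry(config: dict) -> dict:
    """Turn {name: {"type": ..., **params}} into {name: instance}."""
    registry = {}
    for name, spec in config.items():
        params = dict(spec)                      # copy so the input is not mutated
        cls = KNOWN_TYPES[params.pop("type")]    # look up the class by its name
        registry[name] = cls(**params)           # remaining keys become attributes
    return registry


registry = build_registry(json.loads(CONFIG_TEXT))
print(registry["raw_prices"])
```

Each named entry in the configuration becomes one constructed object that the rest of the program can look up by name.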
Although you are free to define your own models (BaseModel implementations) to use in your flow graph, ccflow comes with some models that you can use off the shelf to solve common problems.
ccflow comes with a range of models for reading data.
The following table summarizes the available models.
Note: Some models are still in the process of being open sourced.
ccflow also comes with a range of models for writing data.
These are referred to as publishers.
You can "chain" publishers and callable models using PublisherModel to call a CallableModel and publish the results in one step.
In fact, ccflow comes with several implementations of PublisherModel for common publishing use cases.
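The chaining idea can be sketched as follows. This is an illustration only and does not reproduce the PublisherModel interface: `SketchJsonPublisher`, `SketchPublisherModel`, and `summarize` are hypothetical names, the context is a plain dict, and the "file" is an in-memory stream.

```python
import io
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchJsonPublisher:
    """Stand-in publisher: writes whatever data it is given as JSON."""
    stream: io.StringIO = field(default_factory=io.StringIO)

    def publish(self, data) -> str:
        json.dump(data, self.stream, sort_keys=True)
        return self.stream.getvalue()


@dataclass
class SketchPublisherModel:
    """Stand-in for the chaining idea: call a model, then publish its result."""
    model: Callable[[dict], dict]
    publisher: SketchJsonPublisher

    def __call__(self, context: dict) -> str:
        result = self.model(context)            # step 1: compute the result
        return self.publisher.publish(result)   # step 2: publish it


def summarize(context: dict) -> dict:
    """Hypothetical callable: summarizes the values carried by the context."""
    values = context["values"]
    return {"count": len(values), "total": sum(values)}


step = SketchPublisherModel(model=summarize, publisher=SketchJsonPublisher())
published = step({"values": [1, 2, 3]})
print(published)
```

Because the combined object is itself callable with a context, it can be used anywhere a single callable step is expected.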
The following table summarizes the "publisher" models.
Note: Some models are still in the process of being open sourced.
| Name | Path | Description |
|---|---|---|
| DictTemplateFilePublisher | ccflow.publishers | Publish data to a file after populating a Jinja template. |
| GenericFilePublisher | ccflow.publishers | Publish data using a generic "dump" Callable. Uses smart_open under the hood so that local and cloud paths are supported. |
| JSONPublisher | ccflow.publishers | Publish data to file in JSON format. |
| PandasFilePublisher | ccflow.publishers | Publish a pandas data frame to a file using an appropriate method on pd.DataFrame. For large-scale exporting (using parquet), see PandasParquetPublisher. |
| NarwhalsFilePublisher | ccflow.publishers | Publish a narwhals data frame to a file using an appropriate method on nw.DataFrame. |
| PicklePublisher | ccflow.publishers | Publish data to a pickle file. |
| PydanticJSONPublisher | ccflow.publishers | Publish a pydantic model to a json file. See Pydantic modeljson |
| YAMLPublisher | ccflow.publishers | Publish data to file in YAML format. |
| CompositePublisher | ccflow.publishers | Highly configurable publisher that decomposes a pydantic BaseModel or a dictionary into pieces and publishes each piece separately. |
| PrintPublisher | ccflow.publishers | Print data using python standard print. |
| LogPublisher | ccflow.publishers | Print data using python standard logging. |
| PrintJSONPublisher | ccflow.publishers | Print data in JSON format. |
| PrintYAMLPublisher | ccflow.publishers | Print data in YAML format. |
| PrintPydanticJSONPublisher | ccflow.publishers | Print pydantic model as json. See https://docs.pydantic.dev/latest/concepts/serialization/#modelmodel_dump_json |
| ArrowDatasetPublisher | Coming Soon! | |
| PandasDeltaPublisher | Coming Soon! | |
| EmailPublisher | Coming Soon! | |
| MatplotlibFilePublisher | Coming Soon! | |
| MLFlowArtifactPublisher | Coming Soon! | |
| MLFlowPublisher | Coming Soon! | |
| PandasParquetPublisher | Coming Soon! | |
| PlotlyFilePublisher | Coming Soon! | |
| XArrayPublisher | Coming Soon! | |
ccflow comes with "evaluators" that allow you to evaluate (i.e. run) CallableModels in different ways.
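One way to picture an evaluator is as a wrapper that decides how and when the underlying callable runs. The sketch below shows a logging wrapper and an in-memory caching wrapper composed together. It uses plain functions and is not the ccflow evaluator interface: `logging_evaluator`, `memory_cache_evaluator`, and `expensive_model` are hypothetical names, and the context is assumed to be hashable.

```python
import logging
from typing import Any, Callable, Hashable

logger = logging.getLogger("evaluator_sketch")

Model = Callable[[Hashable], Any]


def logging_evaluator(model: Model) -> Model:
    """Wrap a model so that every evaluation is logged before it runs."""
    def evaluate(context):
        logger.info("evaluating %r with context %r", model, context)
        return model(context)
    return evaluate


def memory_cache_evaluator(model: Model) -> Model:
    """Wrap a model so that repeated calls with the same context reuse the result."""
    cache = {}

    def evaluate(context):
        if context not in cache:
            cache[context] = model(context)
        return cache[context]
    return evaluate


calls = []  # records each context the underlying model actually ran for


def expensive_model(context: int) -> int:
    calls.append(context)
    return context * context


# Cache on the outside, logging on the inside: only cache misses are logged.
evaluate = memory_cache_evaluator(logging_evaluator(expensive_model))
first = evaluate(12)
second = evaluate(12)  # served from the cache; expensive_model does not run again
```

Stacking wrappers this way is the same idea as combining several evaluators into one: the model itself stays unchanged while the evaluation strategy varies.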
The following table summarizes the "evaluator" models.
Note: Some models are still in the process of being open sourced.
| Name | Path | Description |
|---|---|---|
| LazyEvaluator | ccflow.evaluators | Evaluator that only actually runs the callable once an attribute of the result is queried (by hooking into `__getattribute__`) |
| LoggingEvaluator | ccflow.evaluators | Evaluator that logs information about evaluating the callable. |
| MemoryCacheEvaluator | ccflow.evaluators | Evaluator that caches results in memory. |
| MultiEvaluator | ccflow.evaluators | An evaluator that combines multiple evaluators. |
| GraphEvaluator | ccflow.evaluators | Evaluator that evaluates the dependency graph of callable models in topologically sorted order. |
| ChunkedDateRangeEvaluator | Coming Soon! | |
| ChunkedDateRangeResultsAggregator | Coming Soon! | |
| RayChunkedDateRangeEvaluator | Coming Soon! | |
| DependencyTrackingEvaluator | Coming Soon! | |
| DiskCacheEvaluator | Coming Soon! | |
| ParquetCacheEvaluator | Coming Soon! | |
| RayCacheEvaluator | Coming Soon! | |
| RayGraphEvaluator | Coming Soon! | |
| RayDelayedDistributedEvaluator | Coming Soon! | |
| RetryEvaluator | Coming Soon! | |
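Evaluating a dependency graph in topologically sorted order means every step runs only after the steps it depends on. A minimal sketch of that ordering, using `graphlib.TopologicalSorter` from the Python standard library (3.9+), is shown below. It is not how GraphEvaluator is implemented: the graph, the step names, and the lambda steps are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each key depends on the names in its set.
dependencies = {
    "raw_prices": set(),
    "raw_volumes": set(),
    "features": {"raw_prices", "raw_volumes"},
    "report": {"features"},
}

# Hypothetical steps: each receives the results of its dependencies by name.
steps = {
    "raw_prices": lambda inputs: [10.0, 11.0, 12.0],
    "raw_volumes": lambda inputs: [100, 200, 300],
    "features": lambda inputs: [
        p * v for p, v in zip(inputs["raw_prices"], inputs["raw_volumes"])
    ],
    "report": lambda inputs: sum(inputs["features"]),
}

# static_order() yields each node only after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())

results = {}
for name in order:
    inputs = {dep: results[dep] for dep in dependencies[name]}
    results[name] = steps[name](inputs)

print(order)
print(results["report"])
```

The relative order of independent steps (here the two raw inputs) is not significant; only the constraint that dependencies come first matters.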
A Result is an object that holds the results from a callable model. It provides the equivalent of a strongly typed dictionary where the keys and schema are known upfront.
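The "strongly typed dictionary" idea can be illustrated with a frozen dataclass: the set of keys is fixed by the class definition, so an unexpected key is rejected at construction time. This is only an analogy built on the standard library. `SketchSummaryResult` is a hypothetical class, and unlike a Pydantic-based model, a plain dataclass does not validate value types at runtime.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SketchSummaryResult:
    """Hypothetical result with a schema that is known upfront."""
    row_count: int
    mean_value: float


result = SketchSummaryResult(row_count=3, mean_value=10.5)

# The keys are discoverable from the class, without needing an instance.
field_names = [f.name for f in fields(SketchSummaryResult)]

# A key that is not part of the schema is rejected when the object is built.
rejected = False
try:
    SketchSummaryResult(row_count=3, average=10.5)
except TypeError:
    rejected = True

print(field_names, rejected)
```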
The following table summarizes the "result" models.
| Name | Path | Description |
|---|---|---|
| GenericResult | ccflow.result | A generic result (holds anything). |
| DictResult | ccflow.result | A generic dict (key/value) result. |
| ArrowResult | ccflow.result.pyarrow | Holds an arrow table. |
| ArrowDateRangeResult | ccflow.result.pyarrow | Extension of ArrowResult for representing a table over a date range that can be divided by date, such that generation of any sub-range of dates gives the same results as the original table filtered for those dates. |
| NarwhalsResult | ccflow.result.narwhals | Holds a narwhals DataFrame or LazyFrame. |
| NarwhalsDataFrameResult | ccflow.result.narwhals | Holds a narwhals eager DataFrame. |
| NumpyResult | ccflow.result.numpy | Holds a numpy array. |
| PandasResult | ccflow.result.pandas | Holds a pandas dataframe. |
| XArrayResult | ccflow.result.xarray | Holds an xarray. |
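The property described for ArrowDateRangeResult (generating a sub-range gives the same rows as filtering the full table to those dates) can be checked on a toy example. The sketch below uses plain tuples instead of Arrow tables, and `generate` is a hypothetical generator whose rows depend only on their own date, which is the condition that makes the property hold.

```python
from datetime import date, timedelta


def generate(start: date, end: date) -> list:
    """Hypothetical per-date generator: each row depends only on its own date."""
    rows = []
    day = start
    while day <= end:
        rows.append((day, day.toordinal() % 7))
        day += timedelta(days=1)
    return rows


full = generate(date(2024, 1, 1), date(2024, 1, 10))

sub_start, sub_end = date(2024, 1, 4), date(2024, 1, 6)
filtered = [row for row in full if sub_start <= row[0] <= sub_end]
regenerated = generate(sub_start, sub_end)

# Filtering the full result and regenerating the sub-range agree row for row.
print(filtered == regenerated)
```

A generator whose output for a date depends on other dates in the requested range (for example a cumulative sum starting at the range start) would not satisfy this property.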
This wiki is autogenerated. To make updates, open a PR against the original source file in docs/wiki.