shcheklein/dvc-sql-example

DVC Azure SQL example

A simple DVC pipeline to fetch from an SQL DB, cache as parquet for reproducibility and faster processing.

Install

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Depending on your setup and machine, you might need to install an ODBC driver. The steps depend on the OS; please refer to the Microsoft ODBC setup docs.

Setup

Create an .env file with:

AZURE_CONNECTION_STRING="DRIVER={ODBC Driver 18 for SQL Server};SERVER=<server>.database.windows.net,1433;DATABASE=<db>;UID=<user>;PWD=<password>"

This file is in .gitignore.

Note! There are likely better ways to manage Azure credentials (e.g. using AD or managed identities). This example is kept simple, but we recommend exploring other options.
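To illustrate how a pipeline stage might consume this file, here is a minimal sketch. It is not necessarily how the scripts in this repository are written. The helper names `load_env` and `fetch_to_parquet`, the query, and the output path are hypothetical, and the sketch assumes `pyodbc`, `pandas`, and a parquet engine such as `pyarrow` are installed.

```python
from pathlib import Path


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines; a value may be wrapped in quotes."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only: the connection string itself contains "=".
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def fetch_to_parquet(query, out_path, env_path=".env"):
    """Run `query` against the database and cache the result as parquet."""
    # Imported lazily so load_env works without these packages installed.
    import pandas as pd
    import pyodbc

    conn_str = load_env(env_path)["AZURE_CONNECTION_STRING"]
    conn = pyodbc.connect(conn_str)
    try:
        df = pd.read_sql(query, conn)
    finally:
        conn.close()
    df.to_parquet(out_path, index=False)
    return len(df)
```

A stage could then call something like `fetch_to_parquet("SELECT * FROM <table>", "data.parquet")`, where both the table and the file name are placeholders.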

Running

Run dvc repro or dvc exp run to reproduce the pipeline. Use the regular dvc push, dvc pull, etc. to save and load artifacts.
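If you prefer to drive these steps from a script, the sequence can be sketched as below. This is only an illustration: the function names `dvc_commands` and `run_pipeline` are hypothetical, and the sketch assumes the `dvc` executable is on your PATH and a remote is already configured for `dvc push`.

```python
import subprocess


def dvc_commands(push=True):
    """Return the DVC invocations to run, in order."""
    commands = [["dvc", "repro"]]
    if push:
        commands.append(["dvc", "push"])
    return commands


def run_pipeline(push=True):
    """Reproduce the pipeline, then optionally upload the artifacts."""
    for cmd in dvc_commands(push):
        # check=True raises CalledProcessError, so a failed repro never reaches push.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(push=False)` only reproduces the pipeline locally.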
