🚌 DataJourney

🪶 Short version

Design-first open-source data management toolkit. Simplifies data workflows with modular, reproducible solutions.

🌲 Long version

DataJourney demonstrates how organizations can manage and use data effectively with open-source technologies. It's designed to help navigate the complex landscape of data tools, offering a structured approach to building scalable and reproducible data workflows.

Built on open-source principles, the framework guides users through the essential steps, from identifying goals and selecting tools to testing and customising workflows. Its flexible, modular design lets data professionals tailor DataJourney to their own needs.

🧱 Design Philosophy (LEGO)

Built from layers that can be added or removed, glued together with open source. Each layer has a certain strength of communication built in:

P0 (Base): Static home(s) to keep it together (GitHub)
P1 (Tooling): Tooling, strings (Powered by open source)
P2 (Maintenance + Monitoring): Env, automations (Pixi + GHA)
P3 (Abstraction): Layer(s), CLI/task manager for users to interact with (Pixi)

🛠 Current workflows covered

{✨ = Experimental, ✅ = Implemented}

✅ Python Packaging framework design principles
✅ GitHub Actions configured
✅ Vale.sh configured at PR level
✅ Pre-commit hooks configured for code linting/formatting
✨ LangChain Basics & workflows
✅ Environment management via pixi
✅ Reading data from online sources using intake
✅ Sample pipeline built using Dagster
✅ Dashboard built using holoviews + panel
✅ Exploratory data analysis (EDA) using mito
✅ Web UI built on Flask
✅ Web UI re-done and expanded with FastHTML
✅ Leverage AI models to analyse data: GitHub AI models (Beta)

☕️ Quickly getting started with DataJourney

Clone DJ: git clone git@github.com:sayantikabanik/DataJourney.git
Generate & add GITHUB_TOKEN, instructions here
- Required to run the LLM workflows
Switch directory: cd DataJourney
Download pixi: prefix.dev
Activate env: pixi shell
Install the DJ framework locally: pixi run DJ_package
List all the tasks: pixi task list
Execute a task from the list: pixi run <TASK>
Execute a task with verbosity enabled: pixi run -v <TASK>
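
The LLM workflows depend on GITHUB_TOKEN being available, so it can help to check for it before running them. The snippet below is a minimal sketch of such a check, assuming the token is exposed as an environment variable named GITHUB_TOKEN. The helper `github_token_is_set` is hypothetical and is not taken from the DJ codebase or from the GIT_TOKEN_CHECK task.

```python
import os
from typing import Mapping, Optional


def github_token_is_set(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if GITHUB_TOKEN is present and non-empty.

    Hypothetical helper for illustration only; not part of DataJourney.
    """
    # Fall back to the real process environment when no mapping is passed.
    env = os.environ if env is None else env
    # A missing or whitespace-only value counts as "not set".
    return bool(env.get("GITHUB_TOKEN", "").strip())


if __name__ == "__main__":
    if github_token_is_set():
        print("GITHUB_TOKEN found")
    else:
        print("GITHUB_TOKEN missing: the LLM workflows will not run")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise without touching your real environment.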

🏃🏽‍♀️ Active `tasks` under DJ

GIT_TOKEN_CHECK
DJ_package
DJ_pre_commit
DJ_dagster
DJ_fasthtml_app
DJ_flask_app
DJ_mito_app
DJ_panel_app
DJ_llm_analysis
DJ_hello_world_langchain
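
If you would rather trigger these tasks from a script than type them by hand, one option is to wrap pixi run with Python's standard subprocess module. This is only a sketch under stated assumptions: `build_pixi_command` and `run_task` are hypothetical names that do not exist in DJ, and the only pixi invocations used are the `pixi run <TASK>` and `pixi run -v <TASK>` forms from the quick start.

```python
import subprocess
from typing import List


def build_pixi_command(task: str, verbose: bool = False) -> List[str]:
    """Build the argument list for `pixi run [-v] <task>`.

    Hypothetical helper for illustration only; not part of DataJourney.
    """
    if not task:
        raise ValueError("task name must not be empty")
    command = ["pixi", "run"]
    if verbose:
        # Mirrors the verbose form `pixi run -v <TASK>`.
        command.append("-v")
    command.append(task)
    return command


def run_task(task: str, verbose: bool = False) -> int:
    """Run the task and return its exit code (requires pixi on PATH)."""
    completed = subprocess.run(build_pixi_command(task, verbose), check=False)
    return completed.returncode
```

For example, `run_task("DJ_pre_commit")` would run the pre-commit task and return its exit code, provided pixi is installed and the call is made from the DataJourney directory.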

🔌 About pre-commit-hooks and activating

As the name suggests, pre-commit hooks format the code according to PEP standards before each commit.

pixi run DJ_pre_commit

🦭 Executing LLM script: Generate stock price recommendations

pixi run DJ_llm_analysis

🪼 Execute pre-configured Dagster pipeline

pixi run DJ_dagster

🐙 Panel app

pixi run DJ_panel_app

NOTE: The generated dashboard is exported to HTML and saved as stock_price_twilio_dashboard

🐵 Mito

To explore further, visit trymito.io

pixi run DJ_mito_app

🦋 Display all data sources present via web UI

# Run FastHTML app
pixi run DJ_fasthtml_app
