A comprehensive reference guide mapping the AI, Machine Learning, Data Science, and Data Engineering ecosystem, with categorized tools, libraries, workflows, and Python usage.

**Supervised Learning**

- Regression, Classification
- Libraries: scikit-learn, XGBoost, LightGBM
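
Here is a concrete instance of regression: a minimal, library-free sketch of ordinary least squares with a single feature. scikit-learn's `LinearRegression` solves the multi-feature version of the same problem. The function names below are illustrative and do not come from any library.

```python
from typing import Sequence, Tuple

def fit_simple_linear_regression(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = cov(x, y) / var(x); the 1/n factors cancel, so plain sums suffice
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope: float, intercept: float, x: float) -> float:
    """Apply the fitted line to a new input."""
    return slope * x + intercept
```
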

**Unsupervised Learning**

- Clustering, PCA
- Libraries: scikit-learn, NumPy, SciPy
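
Clustering can be illustrated with Lloyd's k-means algorithm. This pure-Python sketch assumes 2-D points. For reproducibility, it seeds the centroids with the first `k` points. scikit-learn's `KMeans` uses k-means++ initialization by default and is the practical choice for real data.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def kmeans(points: Sequence[Point], k: int, iterations: int = 20) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate an assignment step and a centroid-update step."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids: List[Point] = list(points[:k])  # naive, deterministic initialization
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's old centroid
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return centroids, labels
```
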

**Reinforcement Learning (RL)**

- Q-Learning, DQN
- Libraries: OpenAI Gym (now maintained as Gymnasium), Stable-Baselines3
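
Q-learning fits in a few lines when the state space is tiny. This sketch uses a hypothetical corridor environment defined inline, so there is no Gym or Gymnasium dependency. The agent starts at state 0 and earns a reward of 1 on reaching the last state. DQN replaces the table `q` with a neural network, and that is where libraries like Stable-Baselines3 come in.

```python
import random
from typing import List

def q_learning(n_states: int = 5, episodes: int = 500, alpha: float = 0.5,
               gamma: float = 0.9, epsilon: float = 0.1, seed: int = 0) -> List[List[float]]:
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Each episode starts at state 0 and ends with reward 1 at the last state.
    """
    rng = random.Random(seed)
    moves = (-1, +1)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action choice; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = min(max(s + moves[a], 0), goal)  # walls at both ends
            reward = 1.0 if s_next == goal else 0.0
            # Bootstrap from the best next action, except at the terminal state.
            target = reward if s_next == goal else reward + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q
```

After training, the greedy policy (pick the action with the larger value in `q[s]`) should move right from every non-terminal state.
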

**Deep Learning (DL)**

- CNN, RNN, Transformers
- Libraries: TensorFlow, PyTorch, Keras
- Python used for: Model building, training, deployment
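
Transformers are built around scaled dot-product attention, softmax(Q·Kᵀ / √d_k)·V. The sketch below computes it on small nested lists purely to show the arithmetic. TensorFlow and PyTorch implement the same operation on batched tensors with GPU support.

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Each output row is a softmax-weighted average of the rows of v."""
    d_k = len(k[0])
    output = []
    for q_row in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)
        output.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return output
```
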

**Natural Language Processing (NLP)**

- Tasks: Sentiment Analysis, NER (Named Entity Recognition), Translation
- Libraries: NLTK, spaCy, Hugging Face Transformers, Gensim
- Python used for: Tokenization, text preprocessing, model training
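
Tokenization and simple text preprocessing can be sketched with the standard library alone. The regex and the tiny stopword list are illustrative assumptions. NLTK and spaCy tokenizers handle punctuation, contractions, and non-ASCII text far more carefully.

```python
import re
from collections import Counter
from typing import FrozenSet, List

TOKEN_PATTERN = re.compile(r"[a-z0-9']+")  # ASCII-only, illustrative pattern
STOPWORDS: FrozenSet[str] = frozenset({"the", "a", "an", "and", "is"})

def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return TOKEN_PATTERN.findall(text.lower())

def term_frequencies(text: str, stopwords: FrozenSet[str] = STOPWORDS) -> Counter:
    """Bag-of-words counts after dropping stopwords."""
    return Counter(t for t in tokenize(text) if t not in stopwords)
```
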

**Computer Vision (CV)**

- Tasks: Image Classification, Object Detection, Segmentation
- Libraries: OpenCV, Pillow (PIL), PyTorch, TensorFlow, Detectron2
- Python used for: Image preprocessing, feature extraction, modeling
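
A typical first preprocessing step is converting RGB pixels to normalized grayscale. This sketch assumes 8-bit RGB tuples in nested lists. It uses the ITU-R 601 luma weights (0.299, 0.587, 0.114), which are the weights Pillow documents for `Image.convert("L")`. Note that OpenCV's `cv2.imread` returns channels in BGR order, so its pixels would need reordering first.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

def to_grayscale(image: List[List[RGB]]) -> List[List[float]]:
    """Convert 8-bit RGB pixels to grayscale intensities scaled to [0, 1]."""
    return [
        [(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for (r, g, b) in row]
        for row in image
    ]
```
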

**Data Analysis**

- Types of analytics: Descriptive, Diagnostic, Predictive, Prescriptive
- Libraries: pandas, NumPy
- Tools: Jupyter, SQL, Python
- Python used for: EDA (exploratory data analysis), statistics, modeling
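
For exploratory analysis, the standard library's `statistics` module covers the basics. The helper below is a hypothetical, minimal analogue of pandas' `Series.describe()`. Like pandas, it reports the sample standard deviation (n - 1 denominator).

```python
import statistics
from typing import Dict, Sequence

def describe(values: Sequence[float]) -> Dict[str, float]:
    """Count, mean, sample std, min, median, and max of a numeric sample."""
    if len(values) < 2:
        raise ValueError("need at least two values for a sample standard deviation")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }
```
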

**Data Visualization**

- Libraries: Matplotlib, Seaborn, Plotly, Altair
- Tools: Tableau, Power BI
- Python used for: Plots, charts, dashboards
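
Before a histogram is drawn, the values are grouped into equal-width bins. This sketch computes the counts and bin edges that a plotting library renders as bars. It follows the bin convention documented for `numpy.histogram`: every bin is half-open except the last, which is closed.

```python
from typing import List, Sequence, Tuple

def histogram(values: Sequence[float], bins: int) -> Tuple[List[int], List[float]]:
    """Equal-width binning: returns (counts, edges) with len(edges) == bins + 1."""
    if bins < 1 or not values:
        raise ValueError("need at least one bin and one value")
    lo, hi = min(values), max(values)
    if lo == hi:  # degenerate range: widen it by 0.5 on each side
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # the maximum falls into the last bin
        counts[index] += 1
    return counts, edges
```
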

**Machine Learning Workflow**

- Libraries: scikit-learn, pandas, NumPy
- Python used throughout the ML lifecycle
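
Two steps recur throughout the ML lifecycle: holding out a test set and scoring predictions on it. These helpers are hypothetical re-implementations for illustration. scikit-learn's own `train_test_split` takes `test_size` and `random_state` parameters and also supports stratification.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_train_test(items: Sequence[T], test_fraction: float = 0.25,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the data with a fixed seed, then cut off the test portion."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```
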

**ETL and Data Integration**

- Tools: Apache NiFi, Talend, Informatica, dbt
- Python used with: pandas, PySpark
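
The extract-transform-load pattern can be shown end to end with the standard library. Here `csv` and `sqlite3` stand in for pandas or PySpark and for a real target system. The sample CSV, column names, and cleaning rules are made up for illustration.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterable, List, Tuple

# Made-up raw input: a padded name, a missing amount, and a lowercase name.
RAW_CSV = """order_id,customer,amount
1, Alice ,19.99
2,Bob,
3,carol,5.50
"""

Record = Tuple[int, str, float]

def extract(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dict per row (all values are still strings)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: Iterable[Dict[str, str]]) -> List[Record]:
    """Drop rows without an amount, trim and title-case names, and cast types."""
    records = []
    for row in rows:
        if not row["amount"].strip():
            continue
        records.append((int(row["order_id"]), row["customer"].strip().title(), float(row["amount"])))
    return records

def load(conn: sqlite3.Connection, records: List[Record]) -> None:
    """Create the target table if needed and insert the cleaned records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()
```
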

**Data Processing**

- Batch: Apache Spark, AWS Glue
- Stream: Apache Kafka, Apache Flink
- Python used for: Transformation logic, user-defined functions (UDFs)
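
Stream processors group events into time windows before aggregating them. This generator sketches a tumbling (fixed, non-overlapping) window sum. It assumes events arrive in timestamp order. Engines such as Flink and Spark also handle late data, for example with watermarks.

```python
from typing import Iterable, Iterator, Optional, Tuple

def tumbling_window_sums(events: Iterable[Tuple[float, float]],
                         window: float) -> Iterator[Tuple[float, float]]:
    """Yield (window_start, total) each time a window closes, then the final window.

    Windows that receive no events are skipped rather than reported as zero.
    """
    current_start: Optional[float] = None
    total = 0.0
    for ts, value in events:
        start = ts - (ts % window)  # align the timestamp to its window boundary
        if current_start is None:
            current_start = start
        elif start != current_start:
            yield current_start, total
            current_start, total = start, 0.0
        total += value
    if current_start is not None:
        yield current_start, total
```
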

**Workflow Orchestration**

- Tools: Apache Airflow, Prefect, Luigi
- Python used to define DAGs (task dependencies) and schedules
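
Orchestrators like Airflow model a pipeline as a DAG of tasks. In Airflow, dependencies are declared between operators, as in `extract >> transform`. The sketch below shows only the core idea: running tasks in dependency order with the standard library's `graphlib` (Python 3.9+). The task names are hypothetical, and scheduling, retries, and parallelism are left out.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

def run_pipeline(tasks: Dict[str, Callable[[], None]],
                 depends_on: Dict[str, Set[str]]) -> List[str]:
    """Run each task only after all of its upstream tasks; return the execution order."""
    # Include every task as a node, even ones with no upstream dependencies.
    graph = {name: depends_on.get(name, set()) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())  # raises CycleError on cycles
    for name in order:
        tasks[name]()  # a real orchestrator would add retries, logging, and scheduling
    return order
```
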

**Data Warehousing**

- Tools: BigQuery, Redshift, Snowflake
- Python used to connect via SQLAlchemy, pandas-gbq, and similar connectors
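
Most Python database drivers follow the DB-API 2.0 interface (PEP 249), and tools like SQLAlchemy and pandas build on such connections. This sketch runs a parameterized query and returns rows as dictionaries. An in-memory SQLite database with made-up data stands in for a warehouse. The `?` placeholder is SQLite's parameter style; other drivers may use `%s` or named parameters.

```python
import sqlite3
from typing import Any, Dict, List, Sequence

def fetch_as_dicts(conn: Any, sql: str, params: Sequence[Any] = ()) -> List[Dict[str, Any]]:
    """Run a query on any DB-API connection and return rows keyed by column name."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        columns = [col[0] for col in cur.description]  # first field of each entry is the name
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()

def demo_connection() -> sqlite3.Connection:
    """Hypothetical stand-in for a warehouse connection, with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", [("EU", 120.0), ("US", 200.0), ("EU", 80.0)])
    return conn
```
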

**Business Intelligence (BI)**

- Tools: Power BI, Tableau, Looker
- Python integration: Script execution, data export
- Use cases: KPIs, reports, dashboards
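
One way to integrate Python with a BI tool is to compute KPIs in Python and export them to a file the tool imports. In this minimal sketch, the (region, amount) input and the KPI names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def kpis_by_region(orders: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Aggregate (region, amount) pairs into order count, revenue, and average order value."""
    totals: Dict[str, List[float]] = defaultdict(lambda: [0, 0.0])
    for region, amount in orders:
        totals[region][0] += 1
        totals[region][1] += amount
    return {
        region: {"orders": n, "revenue": revenue, "avg_order_value": revenue / n}
        for region, (n, revenue) in sorted(totals.items())
    }

def to_csv(kpis: Dict[str, Dict[str, float]]) -> str:
    """Serialize the KPIs as CSV text, one row per region."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["region", "orders", "revenue", "avg_order_value"])
    for region, m in kpis.items():
        writer.writerow([region, m["orders"], f"{m['revenue']:.2f}", f"{m['avg_order_value']:.2f}"])
    return out.getvalue()
```
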

**Model Deployment**

- Tools: FastAPI, Flask, Docker, Kubernetes
- Python used to serve models as REST APIs
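
At its core, serving a model as a REST API means mapping an HTTP request to a prediction. This sketch uses the standard library's WSGI interface, which Flask applications also implement; FastAPI targets the asynchronous ASGI interface instead. A fixed linear scorer stands in for a trained model. The `call` helper is a hypothetical in-process test client.

```python
import io
import json
from wsgiref.util import setup_testing_defaults

WEIGHTS = [0.4, 0.6]  # hypothetical stand-in for a trained model's parameters

def predict(features):
    """Fixed linear scorer used in place of a real model."""
    return sum(w * x for w, x in zip(WEIGHTS, features))

def app(environ, start_response):
    """WSGI app exposing POST /predict, which expects a JSON body like {"features": [1, 2]}."""
    if environ["REQUEST_METHOD"] != "POST" or environ.get("PATH_INFO") != "/predict":
        status, payload = "404 Not Found", {"error": "not found"}
    else:
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            features = json.loads(environ["wsgi.input"].read(length))["features"]
            status, payload = "200 OK", {"prediction": predict(features)}
        except (ValueError, KeyError, TypeError):  # bad JSON, missing key, or non-numeric input
            status, payload = "400 Bad Request", {"error": 'expected JSON like {"features": [...]}'}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]

def call(wsgi_app, method, path, payload=None):
    """Hypothetical in-process test client: returns (status line, decoded JSON body)."""
    body = b"" if payload is None else json.dumps(payload).encode("utf-8")
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ.update({"REQUEST_METHOD": method, "PATH_INFO": path,
                    "CONTENT_LENGTH": str(len(body)), "wsgi.input": io.BytesIO(body)})
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    response = b"".join(wsgi_app(environ, start_response))
    return captured["status"], json.loads(response)
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the same `app` on port 8000. The reference server is meant for development; in production, a WSGI server runs the app, typically inside a Docker container.
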

**MLOps and CI/CD**

- Tools: MLflow, DVC, Kubeflow
- Python used for automation and pipeline creation
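
Experiment tracking comes down to recording the parameters, metrics, and a fingerprint of the input data for every run. The `RunLog` class below is a hypothetical in-memory stand-in, not MLflow's API. MLflow exposes this through calls such as `mlflow.log_param` and `mlflow.log_metric`. The MD5 content hash mirrors the kind of checksum DVC stores to detect data changes.

```python
import hashlib
import json
from typing import Any, Dict, List

def fingerprint(data: bytes) -> str:
    """Content hash of a dataset: any change to the bytes changes the hash."""
    return hashlib.md5(data).hexdigest()

class RunLog:
    """Hypothetical in-memory experiment log: one record per training run."""

    def __init__(self) -> None:
        self.runs: List[Dict[str, Any]] = []

    def log_run(self, params: Dict[str, Any], metrics: Dict[str, float], data: bytes) -> Dict[str, Any]:
        record = {"params": dict(params), "metrics": dict(metrics), "data_md5": fingerprint(data)}
        self.runs.append(record)
        return record

    def best(self, metric: str) -> Dict[str, Any]:
        """Return the run with the highest value of `metric`."""
        return max(self.runs, key=lambda run: run["metrics"][metric])

    def to_jsonl(self) -> str:
        """Serialize all runs as JSON Lines."""
        return "\n".join(json.dumps(run, sort_keys=True) for run in self.runs)
```
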

**Model Monitoring**

- Tools: Evidently AI, WhyLabs
- Python used to monitor models for drift and trigger retraining
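
Monitoring usually means comparing live feature distributions against the training data. This sketch computes the Population Stability Index (PSI), one common drift measure, using bins taken from the reference sample. The bin count, the `eps` floor, and any alert threshold are assumptions to tune. Values above about 0.25 are often read as significant drift and could trigger a retraining job.

```python
import math
from typing import List, Sequence

def _proportions(values: Sequence[float], lo: float, hi: float, bins: int, eps: float) -> List[float]:
    """Share of values per equal-width bin on [lo, hi], floored at eps."""
    width = (hi - lo) / bins if hi > lo else 1.0
    counts = [0] * bins
    for v in values:
        index = int((v - lo) / width)
        counts[min(max(index, 0), bins - 1)] += 1  # clip out-of-range values into the edge bins
    n = len(values)
    return [max(c / n, eps) for c in counts]  # the floor keeps empty bins out of log(0)

def population_stability_index(reference: Sequence[float], current: Sequence[float],
                               bins: int = 10, eps: float = 1e-4) -> float:
    """PSI = sum over bins of (cur - ref) * ln(cur / ref), bins defined on the reference data."""
    lo, hi = min(reference), max(reference)
    ref = _proportions(reference, lo, hi, bins, eps)
    cur = _proportions(current, lo, hi, bins, eps)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```
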

**Data Quality**

- Tools: Great Expectations, Deequ (PyDeequ for Python)
- Python used for writing expectations (declarative validation rules)
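
Data-quality tools express rules as declarative checks that run against each batch of data. The functions below are hypothetical plain-Python analogues of Great Expectations expectations such as `expect_column_values_to_not_be_null` and `expect_column_values_to_be_between`. They are not that library's API.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Result = Dict[str, Any]

def expect_not_null(rows: List[Row], column: str) -> Result:
    """Every row must have a non-None value in `column`."""
    bad = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {"check": f"{column} is not null", "success": not bad, "unexpected_rows": bad}

def expect_between(rows: List[Row], column: str, low: float, high: float) -> Result:
    """Non-null values in `column` must lie in [low, high]; nulls are left to expect_not_null."""
    bad = [
        i for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]
    return {"check": f"{column} between {low} and {high}", "success": not bad, "unexpected_rows": bad}

def validate(rows: List[Row], checks: List[Callable[[List[Row]], Result]]) -> Tuple[bool, List[Result]]:
    """Run all checks and report overall success plus per-check details."""
    results = [check(rows) for check in checks]
    return all(r["success"] for r in results), results
```
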

**Data Catalog and Metadata**

- Tools: Amundsen, Apache Atlas

**Synthetic and Test Data**

- Libraries: Faker, custom Python scripts

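
Synthetic test data only needs to be plausible and reproducible. This sketch uses a seeded `random.Random` and small, illustrative value pools. Faker supplies realistic, locale-aware providers (for example `fake.name()`) for the same job. The emails use the reserved `example.com` domain so they can never reach a real inbox.

```python
import random
from typing import Dict, List

FIRST_NAMES = ["Ana", "Ben", "Chen", "Dara", "Eli"]  # illustrative value pools
LAST_NAMES = ["Garcia", "Ito", "Khan", "Novak", "Smith"]
CITIES = ["Berlin", "Lagos", "Lima", "Osaka", "Toronto"]

def synthetic_customers(n: int, seed: int = 0) -> List[Dict[str, object]]:
    """Generate n reproducible fake customer records (no real personal data)."""
    rng = random.Random(seed)  # a dedicated, seeded generator makes runs repeatable
    records = []
    for i in range(1, n + 1):
        first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
        records.append({
            "customer_id": i,
            "name": f"{first} {last}",
            "email": f"{first}.{last}{i}@example.com".lower(),  # index keeps emails unique
            "city": rng.choice(CITIES),
            "age": rng.randint(18, 90),
        })
    return records
```
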
| Tool/Library | Used In | Purpose |
|---|---|---|
| Python | Everywhere | General scripting, analysis, modeling, deployment |
| Pandas | Data Science, ETL, Analytics | Data wrangling, tabular data, EDA |
| NumPy | ML, DL, Scientific Computing | Fast numerical computations |
| SciPy | Stats, ML, Signal/Image Processing | Advanced scientific computation |
| scikit-learn | ML (classification, regression, clustering) | Traditional ML modeling |
| TensorFlow | Deep Learning, CV, NLP | Neural networks, large-scale DL |
| PyTorch | Deep Learning, Research | Flexibility, academic research, vision/NLP tasks |
| Matplotlib | Visualization | Static plots and charts |
| Seaborn | Visualization | Statistical plots built on Matplotlib |
| Plotly | Interactive Visualization | Dashboards, web-based visual insights |
| Airflow | Data Engineering, MLOps | Workflow orchestration (Python DAGs) |
| FastAPI | MLOps, API Services | Fast, async REST APIs for ML models |
| MLflow/DVC | MLOps, CI/CD | Experiment tracking (MLflow), data and model version control (DVC) |
| NLTK/spaCy | NLP | Tokenization, text preprocessing |
| OpenCV | Computer Vision | Image preprocessing, detection |
| Hugging Face | NLP, Transformers | Pretrained models, pipelines |
