Problem Statement: Extracting meaningful insights from raw datasets and selecting the most effective machine learning models remains a significant challenge for both non-technical and technical users. Non-technical users and analysts often face steep learning curves due to complex tools and the need for coding expertise, while technical users struggle with fragmented workflows that lack intuitive interfaces for rapid experimentation, hyperparameter tuning, and performance comparison. This disconnect hinders efficient model development and slows down decision-making across teams.
Context and Background: In the current data-driven era, organizations and individuals increasingly rely on data analysis to guide strategic actions. However, most available tools require programming knowledge or familiarity with data science workflows, which creates a barrier for non-technical users and business professionals who need to make sense of data without specialized skills. Even technical users encounter inefficiencies due to disjointed tools and unintuitive interfaces, making it harder to iterate quickly, fine-tune models, and compare results effectively.
Purpose and Contribution: Synapse aims to democratize data analysis by providing a no-code, web-based platform that enables users to upload datasets, perform exploratory data analysis (EDA), and select the most appropriate machine learning model through a simple, conversational interface. The system bridges the gap between usability and advanced analytics by combining automation with natural language interaction.
Methods and Approach: Synapse includes a user-friendly web interface with two modes: a visual dashboard for EDA and Bayesian optimization, and a chatbot for natural language queries. Upon uploading a dataset file, the system automatically handles data preprocessing such as cleaning, encoding, and scaling. Users can visualize the dataset and interact with the chatbot. For model selection, Bayesian optimization is used to identify the best-fit algorithm for classification.
Results and Conclusion: Synapse successfully simplifies complex data tasks, enabling users to analyze and interpret their datasets without writing code. It demonstrates that combining automation, natural language processing, and model optimization can make machine learning more accessible, thereby enhancing decision-making for users across technical and non-technical domains.
- Engineered a full-stack, real-time ML platform that automates end-to-end ML workflows
- Integrated Bayesian Optimization to autonomously tune hyperparameters for diverse models
- Implemented an automated EDA pipeline generating insightful and interactive visualizations
- Developed a robust customisable preprocessing engine with intelligent missing value handling, feature selection, and scaling
- Embedded an AI chatbot to provide data-driven insights and statistical interpretations for technical and non-technical users
- **Backend & Frameworks**
  - Python (Flask): The core web framework used to build the application
  - Flask-SocketIO: Enables real-time, bi-directional communication for the EDA and training logs
  - Flask-SQLAlchemy: ORM for database management
  - Flask-Dance: Handles Google OAuth 2.0 authentication
- **Machine Learning & AI**
  - Scikit-Learn: Used for standard algorithms (SVM, KNN, Random Forest, etc.) and metrics
  - Scikit-Optimize (skopt): Powers the Bayesian Optimization engine for hyperparameter tuning
  - XGBoost & LightGBM: Advanced gradient boosting frameworks integrated into the pipeline
- **Visualization**
  - Matplotlib & Seaborn: Generate static charts such as correlation heatmaps and pairplots (rendered to Base64)
  - Pygal: Used for interactive vector-based (SVG) visualizations
- **Frontend**
  - HTML5 / CSS3 / JavaScript: Core technologies for the user interface
  - GSAP (GreenSock): Used for advanced animations and scroll triggers
  - Motion (Motion One): A modern animation library for UI transitions
  - JSZip & FileSaver.js: Allow users to zip and download generated charts directly from the browser
- **Database**
  - SQLite: A fast and simple database used for storing user data and task information
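Since Matplotlib and Seaborn charts are rendered to Base64, the server can embed them directly in HTML without writing image files to disk. A minimal sketch of that pattern (the chart contents and helper name are illustrative, not Synapse's actual code):

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, as a web server would use
import matplotlib.pyplot as plt


def fig_to_base64(fig) -> str:
    """Render a Matplotlib figure to a Base64-encoded PNG string."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)  # free the figure's memory on the server
    return base64.b64encode(buf.getvalue()).decode("ascii")


fig, ax = plt.subplots()
ax.bar(["a", "b", "c"], [3, 1, 2])
encoded = fig_to_base64(fig)

# The string can be embedded as: <img src="data:image/png;base64,{encoded}">
print(encoded[:8])  # → iVBORw0K (the Base64 form of the PNG magic bytes)
```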
To run the application locally, follow these steps:
First, ensure that you have Git and Python 3.8+ installed on your machine, then run the following commands:
```shell
# Clone the repository
git clone https://github.com/msr8/synapse
cd synapse/src

# Install the required dependencies
pip install -r requirements.txt

# Run the flask application
python run.py
```

The application will be accessible at http://127.0.0.1:5000 in your web browser.
> [!WARNING]
> These instructions are intended for local deployment only. For production deployment, use a production-ready server like Gunicorn or uWSGI, and consider using a reverse proxy like Nginx.
| URL Path | Description |
|---|---|
| `/` | Landing page |
| `/learn-more` | Information page about the project |
| `/dashboard` | User dashboard displaying tasks |
| `/login` | User login page |
| `/signup` | User registration page |
| `/logout` | Logs the user out |
| `/login/google-authorised/` | Google OAuth callback URL |
| `/task/<int:task_id>` | Main interface for a specific task |
| `/api/auth/login` | API to handle user login |
| `/api/auth/signup` | API to handle user registration |
| `/api/auth/change-username` | API to update the current user's username |
| `/api/auth/change-password` | API to update the current user's password |
| `/api/upload` | API to handle dataset uploads |
| `/api/task/set-target` | API to set the target column for a task |
| `/api/task/change-taskname` | API to rename a specific task |
| `/api/task/delete-task` | API to delete a task |
| `/api/task/chatbot/initialise` | API to start the LLM chat session |
| `/api/task/chatbot/chat` | API to send a message to the chatbot |
| `/api/task/chatbot/reset` | API to clear chat history |
We optimise over the following classification models using Bayesian optimization to find the best model and hyperparameters for a given dataset:
1) K-Nearest-Neighbours
| Hyperparameter | Description | Type | Range / Values |
|---|---|---|---|
| `n_neighbors` | Number of neighbors to use | Integer | 1 to 30 |
| `weights` | Weight function used in prediction | Categorical | uniform, distance |
| `metric` | Distance metric to use | Categorical | chebyshev, cosine, euclidean, manhattan, minkowski, sqeuclidean |
2) Support Vector Machine
| Hyperparameter | Description | Type | Range / Values |
|---|---|---|---|
| `C` | Regularization parameter | Float | 1e-4 to 1e+4 (log-uniform) |
| `kernel` | Kernel type to be used | Categorical | rbf, sigmoid, poly |
| `degree` | Degree of the polynomial kernel | Integer | 1 to 3 |
| `gamma` | Kernel coefficient | Categorical | scale |
3) Logistic Regression
| Hyperparameter | Description | Type | Range / Values |
|---|---|---|---|
| `C` | Inverse of regularization strength | Float | 1e-6 to 1e+6 (log-uniform) |
| `penalty` | Norm used in penalization | Categorical | l1, l2 |
| `solver` | Optimization algorithm | Categorical | liblinear, saga |
4) Decision Tree
| Hyperparameter | Description | Type | Range / Values |
|---|---|---|---|
| `criterion` | Function to measure split quality | Categorical | gini, entropy |
| `splitter` | Strategy used to choose split | Categorical | best, random |
| `max_depth` | Maximum depth of the tree | Integer | 1 to 10 |
| `min_samples_split` | Min samples required to split node | Integer | 2 to 10 |
| `min_samples_leaf` | Min samples required at leaf node | Integer | 1 to 10 |
| `max_features` | Number of features to consider | Categorical | None, sqrt, log2 |
5) Random Forest
| Hyperparameter | Description | Type | Range / Values |
|---|---|---|---|
| `n_estimators` | Number of trees in the forest | Integer | 10 to 100 |
| `criterion` | Function to measure split quality | Categorical | gini, entropy |
| `max_depth` | Maximum depth of the tree | Integer | 1 to 10 |
| `min_samples_split` | Min samples required to split node | Integer | 2 to 10 |
| `min_samples_leaf` | Min samples required at leaf node | Integer | 1 to 10 |
| `max_features` | Number of features to consider | Categorical | None, sqrt, log2 |
6) Extra Trees
| Hyperparameter | Description | Type | Range / Values |
|---|---|---|---|
| `n_estimators` | Number of trees in the forest | Integer | 10 to 100 |
| `criterion` | Function to measure split quality | Categorical | gini, entropy |
| `max_depth` | Maximum depth of the tree | Integer | 1 to 10 |
| `min_samples_split` | Min samples required to split node | Integer | 2 to 10 |
| `min_samples_leaf` | Min samples required at leaf node | Integer | 1 to 10 |
| `max_features` | Number of features to consider | Categorical | None, sqrt, log2 |
7) Gradient Boosting
| Hyperparameter | Description | Type | Range / Values |
|---|---|---|---|
| `n_estimators` | Number of boosting stages | Integer | 10 to 100 |
| `learning_rate` | Shrinks contribution of each tree | Float | 1e-6 to 1 (log-uniform) |
| `max_depth` | Maximum depth of estimators | Integer | 1 to 10 |
| `min_samples_split` | Min samples required to split node | Integer | 2 to 10 |
| `min_samples_leaf` | Min samples required at leaf node | Integer | 1 to 10 |
| `max_features` | Number of features to consider | Categorical | None, sqrt, log2 |
8) Light Gradient Boosting Machine (LGBM)
| Hyperparameter | Description | Type | Range / Values |
|---|---|---|---|
| `n_estimators` | Number of boosted trees | Integer | 10 to 100 |
| `learning_rate` | Boosting learning rate | Float | 1e-6 to 1 (log-uniform) |
| `max_depth` | Maximum tree depth | Integer | -1 to 15 |
| `num_leaves` | Max tree leaves for base learners | Integer | 10 to 50 |
| `min_child_samples` | Min data needed in a leaf | Integer | 5 to 20 |
| `subsample` | Subsample ratio of training instances | Float | 0.5 to 1.0 |
| `colsample_bytree` | Subsample ratio of columns per tree | Float | 0.5 to 1.0 |
| `reg_alpha` | L1 regularization term | Float | 0.0 to 5.0 |
| `reg_lambda` | L2 regularization term | Float | 0.0 to 5.0 |
9) Ada Boost
| Hyperparameter | Description | Type | Range / Values |
|---|---|---|---|
| `n_estimators` | Maximum number of estimators | Integer | 10 to 100 |
| `learning_rate` | Weight applied to each classifier | Float | 1e-6 to 1 (log-uniform) |
10) Bagging
| Hyperparameter | Description | Type | Range / Values |
|---|---|---|---|
| `n_estimators` | Number of base estimators | Integer | 10 to 100 |
| `max_samples` | Number of samples to draw | Float | 0.1 to 1.0 |
| `max_features` | Number of features to draw | Float | 0.1 to 1.0 |
| `bootstrap` | Draw samples with replacement | Boolean | True, False |
| `bootstrap_features` | Draw features with replacement | Boolean | True, False |