Skip to content

Latest commit

 

History

History
77 lines (53 loc) · 3.61 KB

README.md

File metadata and controls

77 lines (53 loc) · 3.61 KB

This code uses DBSCAN from the sklearn.cluster module to perform clustering on click data.


Class ClusterClicks: The ClusterClicks class performs DBSCAN clustering on the provided click data. It has the following attributes

clustering: An instance of DBSCAN from the sklearn.cluster module used for clustering. clusters: The result of the clustering fit. cluster_id: The cluster label of the last sample. And the following method:

fit: Perform DBSCAN clustering on the provided click data and return the cluster label of the last sample.

Function save_click_and_predict_cluster_api: This function accepts a request object of type NewClickRequest and a database session object. The function processes the incoming request and adds the new click data to the database and stores the click's coordinates in memory in the form of a list. Then the function performs clustering on the stored click data for the specific page ID and returns a response object of type NewClickResponse containing the cluster index and a flag indicating whether the cluster is new or not.

Function predict_cluster_api: This function accepts a request object of type NewClickRequest. The function processes the incoming request and extracts coordinates and page_id from it, stores the click's coordinates in memory in the form of a list. The function then clusters the incoming coordinates per page_id and returns a cluster_id that the input coordinates belong to.


NOTE: In this code, the global variable clicks stores the click data and clusters stores the result of the clustering fit, ordered by page_id.


What's included

  • FastAPI (examples in app/main.py)
  • SQLAlchemy (models in app/db_models.py)
  • PostgreSQL
  • NumPy

Getting started

  1. Install Docker
  2. Clone the repository
  3. Use docker-compose up in your Terminal to start the Docker container.
  4. The app is defaulted to run on localhost:8004
    • /: The root url (contents from app/main.py)

Rebuild infrastructure (not for code changes)

  • docker-compose build

Troubleshooting

  • docker-compose down (This will destroy all your containers for the project, might need it when you change db models)
  • docker-compose up

Infrastructure changes

I had to make these changes to ensure that the app could be setup:

  • To resolve the following error during initial setup:
ERROR [internal] load metadata for docker.io/tiangolo/uvicorn-gunicorn-fastapi:python3.8
  • Change Docker image from tiangolo/uvicorn-gunicorn-fastapi:python3.8 to tiangolo/uvicorn-gunicorn-fastapi:python3.10.

  • @contextmanager decorator was changed to a function in instance.py as without it, the CREATE TABLE was failing.

  • Also changed database id to not include uuid_generate_v4() function as PostgreSQL gave a function not found error. Subsequent change to gen_random_uuid() also failed so resorted to using autoincrement for PRIMARY KEY

  • Changed the APIRoute from /save_click_and_predict_cluster to /new_click per the files in tests/cases/test_cases_XX.json

Running tests

Installation

python3 -m pip install requests

Run test cases

python3 tests/test.py

NOTE: Due to the use of DBSCAN and it's parameters specifically eps, which determines how close points should be to each other to be considered a part of a cluster and min_samples, which indicates the number of neighbouring points required for a point to be considered as a dense region, or a valid cluster; the clusters created though valid do not conform to the test_cases. Further discussion on these settings are required.