Annotation Web Service (AWS) is a syntax highlighting web service based on a deep learning (DL) model. The goal was to build an API that uses the DL model to provide syntax highlighting for Java, Kotlin and Python3. Furthermore, the incoming requests should be used to train the DL model and to further improve its accuracy.
This `README.md` focuses on the technical aspects and on running and configuring the services. For a more in-depth description of the functionalities, the technologies and our development process, please consult our Wiki. The original motivation and requirements for this project can be found in the project instructions provided by the lecturers of the course.
The Annotation Web Service consists of the following microservices:
Microservice | Description | Technology
---|---|---
Annotation Service | Handles the annotation of code, i.e. lexing and highlighting. | Java with Spring Boot
Prediction Service | Handles the prediction of syntax highlighting. | Python with Flask
Training Service | Handles the regularly conducted training and the exchange of the underlying prediction models. | Python with Flask
Web API | Acts as the primary entry point for all services. | JS/TS with Nest.js
Every microservice runs in a Docker container. Extensive documentation for each microservice is provided in the Wiki.
In addition to the microservices listed above, we have implemented a number of utilities/helpers and a proof-of-concept demo frontend that uses the API provided by the microservices. These tools are intended for internal use only and thus do not adhere to the same code quality standards as the microservices. Nevertheless, they demonstrate how the API can be used in various environments.
Tool | Description | Technology |
---|---|---|
Demo Frontend | A single page webapp that demonstrates how the API could be used by a potential customer. | JS/TS with Vue |
Code Fetcher | A command line tool to download source code from GitHub and send it to the API. | Python |
Load Tester | A simple script that sends many concurrent requests to the API and analyzes the performance under heavy load. | JavaScript with k6 |
The microservices rely on a number of environment variables for their configuration. These variables are defined in a `.env` file in the project root, which `docker-compose.yml` uses to pass the configuration to the services. The following table gives an overview of the environment variables and their example values (a sample `.env` assembled from these values is shown after the table):
Variable Name | Description | Example Value
---|---|---
`MONGO_USERNAME` | The username used for the MongoDB. | hack3rz
`MONGO_PASSWORD` | The password used for the MongoDB. | palm_tree_poppin_out_the_powder_blue_sky
`MONGO_DATABASE_NAME` | The database name for the MongoDB. | aws
`MONGO_DATABASE_TEST_NAME` | The test database name for the MongoDB. | aws_test
`MONGO_PORT` | The port on which the MongoDB runs. | 27017
`MONGO_HOST` | The host for the MongoDB. | mongodb
`MONGO_AUTH_DATABASE` | The database used in MongoDB for authentication (holds the default users). | admin
`DB_CONNECTION_STRING` | The connection string for the MongoDB. | mongodb://hack3rz:palm_tree_poppin_out_the_powder_blue_sky@mongodb:27017/aws?authSource=admin
`MODEL_NAME` | The prefix used when storing a model locally on disk. | best
`MIN_TRAINING_BATCH_SIZE` | The minimum number of annotations required before a training run is started. | 100
`DEMO_FRONTEND_PORT` | The port on which the demo frontend runs. | 80
`WEB_API_PORT` | The port on which the web API runs. | 8081
`SWAGGER_UI_PORT` | The port on which the Swagger UI runs. | 8082
`ANNOTATION_SERVICE_PORT` | The port on which the annotation service runs. | 8083
`PREDICTION_SERVICE_PORT` | The port on which the prediction service runs. | 8084
`TRAINING_SERVICE_PORT` | The port on which the training service runs. | 8085
`NGINX_PORT` | The port on which the NGINX load balancer runs. | 4000
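For reference, a complete `.env` assembled from the example values above would look as follows (the credentials are examples and should be replaced for any real deployment):

```
MONGO_USERNAME=hack3rz
MONGO_PASSWORD=palm_tree_poppin_out_the_powder_blue_sky
MONGO_DATABASE_NAME=aws
MONGO_DATABASE_TEST_NAME=aws_test
MONGO_PORT=27017
MONGO_HOST=mongodb
MONGO_AUTH_DATABASE=admin
DB_CONNECTION_STRING=mongodb://hack3rz:palm_tree_poppin_out_the_powder_blue_sky@mongodb:27017/aws?authSource=admin
MODEL_NAME=best
MIN_TRAINING_BATCH_SIZE=100
DEMO_FRONTEND_PORT=80
WEB_API_PORT=8081
SWAGGER_UI_PORT=8082
ANNOTATION_SERVICE_PORT=8083
PREDICTION_SERVICE_PORT=8084
TRAINING_SERVICE_PORT=8085
NGINX_PORT=4000
```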
The following table lists all Docker containers that are part of the `docker-compose` setup and the images they use. All images prefixed with `richner` were developed as part of this project and are publicly available on DockerHub.
Name | Description | Image |
---|---|---|
Demo Frontend | The single-page demo frontend. | richner/demo-frontend:latest |
Web API | The web API that wraps all the other microservices. | richner/web-api:latest |
Swagger UI | The Swagger UI that holds the documentation for all services. | swaggerapi/swagger-ui |
Annotation | The annotation service. | richner/annotation-service:latest |
Prediction | The prediction service. | richner/prediction-service:latest |
Training | The training service. | richner/training-service:latest |
Nginx | The NGINX load-balancer and reverse-proxy. | nginx:latest |
MongoDB | The MongoDB that is used by the annotation, prediction and training service. | mongo:5.0.6 |
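Since the `richner` images are public, they can also be pulled individually from DockerHub, for example:

$ docker pull richner/annotation-service:latest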
Make sure that you use Docker Compose V2 and that it is activated in your Docker setup. In Docker Desktop, go to "Settings" and toggle the "Use Docker Compose V2" option under the "General" tab. More information can be found here. You can verify the setting by running:
$ docker-compose -v # should output v2.X.X
Use the following command to run all services using `docker-compose`:
$ docker-compose up --build --scale prediction=2 --scale annotation=2
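The `--scale` flags start two replicas each of the prediction and the annotation service. To verify that all containers, including the replicas, are up, you can list them:

$ docker-compose ps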
Builds sometimes fail on machines with a different processor architecture (e.g. on M1 MacBooks). In other cases the build might fail because old versions of the Docker containers are still stored locally. Use the following command for a clean new build:
$ docker-compose up -d --force-recreate --renew-anon-volumes --build --scale prediction=2 --scale annotation=2
The MongoDB is launched as a separate Docker container. The credentials are stored within the environment of the other containers, so they can access it.
A folder `data` in the project root is mounted as a volume for the database. This folder persists the data in the database even when the containers are reset. If you want to reset the database, you can simply delete the contents of this folder. The file `mongo-init.sh` is used to initialize the database with a new user and the credentials provided by the environment file.
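A reset could therefore look like this (stopping the stack first so that no container holds the volume open):

$ docker-compose down
$ rm -rf data/*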
Make sure the MongoDB container is running. Connect to the container's CLI and use the following command to access the database:
$ mongo --username "$MONGO_USERNAME" --password "$MONGO_PASSWORD"
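To get a CLI inside the container in the first place, you can use the compose service name (assuming the service is called `mongodb`, matching `MONGO_HOST`, and that the credentials are available in the container's environment):

$ docker-compose exec mongodb bash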
Alternatively, you can use a GUI like MongoDB Compass to access the database.
To demonstrate the scaling and redundancy possibilities within the API, NGINX acts as a load balancer and reverse proxy for the annotation and prediction microservices. Consequently, the Web API talks to NGINX, which in turn forwards the requests to the respective microservices. This allows us to scale both the annotation and the prediction service. The load is distributed using a round-robin method. The configuration for NGINX can be found in the `nginx.conf.template` file.
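As a rough, hypothetical sketch of what such a configuration looks like (the actual file is `nginx.conf.template`; the service names and ports are taken from the environment variables above, and round-robin is NGINX's default balancing strategy):

```
upstream annotation {
    server annotation:8083;  # Docker's DNS resolves the scaled replicas
}

upstream prediction {
    server prediction:8084;
}

server {
    listen 4000;  # NGINX_PORT

    location /annotation/ {
        proxy_pass http://annotation/;
    }

    location /prediction/ {
        proxy_pass http://prediction/;
    }
}
```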
All the endpoints of each microservice are documented using Swagger. Each microservice contains an `openapi.json` file that documents its endpoints using the OpenAPI specification. When the `docker-compose` setup is running, an additional Swagger UI container is available at `localhost:8082` (the `SWAGGER_UI_PORT`).
Currently, it is not possible to automate the deployment with GitHub Actions because our student subscription via UZH does not have the privilege to create a service account, which would be required for automated deployments. However, the Docker containers can be deployed to Azure manually. Make sure your Azure account is an owner of the Azure resource group hack3rz and that the Azure CLI is installed on your machine. Then use the following commands to deploy the containers:
- Log in to Azure with your credentials and set up the context for Azure Container Instances (ACI):
$ az login
$ az account set --subscription 02b30768-05c8-4ad0-acc8-dda03818d4d6
$ az acr login --name hack3rzacr
$ docker login azure
- Run the following shell script to deploy or redeploy the containers:
$ sh deploy-azure.sh
- After a successful deployment you can check the status of the deployed containers in the Azure Portal. The public domain name is `hack3rz-aws.switzerlandnorth.azurecontainer.io`, and the demo is accessible via http://hack3rz-aws.switzerlandnorth.azurecontainer.io. A test request can be made with the following command:
$ curl -X 'POST' \
'http://hack3rz-aws.switzerlandnorth.azurecontainer.io:8081/api/v1/highlight' \
-H 'accept: */*' \
-H 'Content-Type: application/json' \
-d '{
"code": "public static void main(String args[]){ System.out.println(\"testing\") }",
"language": "java"
}'
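Alternatively to the Azure Portal, the status of the container group can also be checked with the Azure CLI (assuming it was deployed into the hack3rz resource group):

$ az container list --resource-group hack3rz --output table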
The container configuration for the deployment on Azure can be found in the file `docker-compose-azure.yml`. The deployment configuration found there is a preliminary version. The following restrictions apply:
- Azure Cosmos DB is used instead of MongoDB
- Swagger UI is not deployed
- Nginx is not deployed
- Training Service cronjob does not work
- There is no model update
- Only a single instance of the prediction service is deployed
- Only a single instance of the annotation service is deployed
Consequently, the deployment only acts as a proof-of-concept and does not yet fully reflect the local docker setup.
A demo is accessible via http://hack3rz-aws.switzerlandnorth.azurecontainer.io.
Attention: The restrictions / caveats mentioned above apply. Use `docker-compose` to test our service with its full functionality.
This project has been built by team Hack3rz as part of the Advanced Software Engineering course at the University of Zurich in the spring semester 2022:
It is based on the following libraries: