worker_rest_api
This document addresses the problem of providing a consistent API exposed by each algorithm worker container.
Common to all algorithms:
- dataset (is it common?)
Disjoint parts:
- different datasets may be needed (more than one)
- other parameters may be needed, such as ROI
Asynchronous variant:
- Requirements for the worker:
  a. Heartbeat
  b. 202 if a new job is accepted - immediate response from the worker
  c. returns the response to the master when it finishes
  d. 503 if scaled down & preoccupied
- Requirements for the master:
  a. Manage heartbeats for different types of workers independently
  b. Serve returned results (?)
  c. Renew the request on 503
- Benefits:
  a. possibility to provide partial completion status
  b. possible fault tolerance
- Cons:
  a. complexity
- POST request to the worker: master -> worker

      POST /worker/job -> 503 Service Unavailable
      POST /worker/job -> 503 Service Unavailable
      POST /worker/job -> 503 Service Unavailable
      POST /worker/job -> 503 Service Unavailable
      POST /worker/job -> 202 Accepted { id: job_id }
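A minimal sketch of how a worker could serve this exchange, assuming Flask; the `/worker/job` path and the 202/503 behaviour come from the requirements above, while the single-slot lock, function names and background thread are illustrative assumptions.

```python
import threading
import uuid

from flask import Flask, jsonify, request

app = Flask(__name__)
busy = threading.Lock()  # single-slot worker: one job at a time


def run_job_in_background(job_id, payload):
    try:
        pass  # placeholder for the actual GMM/DiviK computation
    finally:
        busy.release()


@app.route('/worker/job', methods=['POST'])
def accept_job():
    # 503 while scaled down / preoccupied with another job
    if not busy.acquire(blocking=False):
        return jsonify({'error': 'worker busy'}), 503
    job_id = str(uuid.uuid4())
    payload = request.get_json()
    threading.Thread(target=run_job_in_background, args=(job_id, payload)).start()
    # immediate 202 once the job has been accepted
    return jsonify({'id': job_id}), 202
```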
- Meanwhile, the worker notifies the master about its health: worker -> master (e.g. every 10 s)

      POST /master/healthcheck
      { processed_job: job_id, response_type: 'gmm_response' }
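A hedged sketch of the worker-side heartbeat loop; the `/master/healthcheck` path and payload are taken from above, whereas the master address, the request timeout and the error handling are assumptions.

```python
import time

import requests

MASTER_URL = 'http://master:5000'  # assumed master address


def heartbeat_loop(current_job_id, interval_seconds=10):
    # Periodically tell the master that this worker is alive and what it processes.
    while True:
        try:
            requests.post(
                f'{MASTER_URL}/master/healthcheck',
                json={'processed_job': current_job_id,
                      'response_type': 'gmm_response'},
                timeout=5,
            )
        except requests.RequestException:
            # a missed heartbeat is handled on the master side, so just keep trying
            pass
        time.sleep(interval_seconds)
```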
- Finally, the worker responds to the master with the results: worker -> master (retried for up to 60 s)

      Success -> POST /master
      {
        id: string = job_id,
        response_type: string = 'gmm_response',
        result: object = { ...? }
      }

      Failure in algorithm -> POST /master
      {
        id: string = job_id,
        response_type: string = 'error',
        stack_trace: string,
        exception: string,
        message: string
      }

      Failure in Web service -> no request to the master -> worker removed from the master's list because of the healthcheck
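A sketch, assuming the `requests` library, of delivering the success payload to the master with retries for up to 60 s; the error payload (`response_type: 'error'` with stack trace, exception and message) would be delivered the same way. The master address and retry delay are assumptions.

```python
import time

import requests

MASTER_URL = 'http://master:5000'  # assumed master address


def report_result(job_id, result, deadline_seconds=60, retry_delay_seconds=5):
    # Try to deliver the result to the master, retrying for up to ~60 s.
    payload = {'id': job_id, 'response_type': 'gmm_response', 'result': result}
    started = time.monotonic()
    while time.monotonic() - started < deadline_seconds:
        try:
            response = requests.post(f'{MASTER_URL}/master', json=payload, timeout=5)
            if response.ok:
                return True
        except requests.RequestException:
            pass
        time.sleep(retry_delay_seconds)
    # Delivery failed; the healthcheck mechanism will eventually drop this worker.
    return False
```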
Synchronous variant:
- Requirements for the worker:
  a. 200 with payload when finished - response after potentially several minutes
- Requirements for the master:
  a. Serve returned results
- Benefits:
  a. Simplicity of implementation
- Cons:
  a. No fault tolerance
- The master posts a job to the worker: master -> worker

      POST /worker/job -> timeout - retry
      POST /worker/job -> timeout - retry
      POST /worker/job -> blocked by the worker until the job finishes
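A rough sketch of the master's side of this variant, assuming the `requests` library: post the job, block until the worker answers, and retry on timeout. The worker URL, the number of attempts and the timeout value are assumptions.

```python
import requests


def post_job_synchronously(worker_url, job, attempts=3, timeout_seconds=600):
    # Post the job and block until the worker answers; retry when the call times out.
    for _ in range(attempts):
        try:
            return requests.post(f'{worker_url}/worker/job', json=job,
                                 timeout=timeout_seconds)
        except requests.Timeout:
            continue  # timed out - retry, as in the exchange above
    raise RuntimeError('worker did not respond within the allowed attempts')
```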
- The worker responds with the result: worker -> master

      Success -> 200 OK
      {
        response_type: string = 'gmm_response',
        result: object = {
          output_file: string = '\\share\data\output_file',
        }
      }

      Failure in algorithm -> 500 Internal Server Error
      {
        response_type: string = 'algorithm_error',
        stack_trace: string,
        exception: string,
        message: string
      }

      Failure in Web service -> no output
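An illustrative handler for this variant, again assuming Flask: compute within the request and answer 200 with the result location or 500 with the error payload described above. `run_algorithm` is a stand-in for the real GMM/DiviK entry point, not part of the original design.

```python
import traceback

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route('/worker/job', methods=['POST'])
def run_job():
    settings = request.get_json()
    try:
        # run_algorithm is a hypothetical stand-in for the actual computation
        output_file = run_algorithm(settings)
    except Exception as exc:
        return jsonify({'response_type': 'algorithm_error',
                        'stack_trace': traceback.format_exc(),
                        'exception': type(exc).__name__,
                        'message': str(exc)}), 500
    return jsonify({'response_type': 'gmm_response',
                    'result': {'output_file': output_file}}), 200
```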
Involved services:
- 2x Web
- 1x master
- 1x database: MSSQL/Postgres?
- Nx worker: GMM/DiviK/other
    web traffic -> Web           Worker
                       \        /
                   ...  DB - Master ...
                       /        \
    web traffic -> Web           Worker
Web = front + API + nginx
Responsibilities:
- accept computation requests from external world (API):
- DiviK
- GMM
- ROI
- allow retrieval of computation artifacts
- serve the frontend (due to the Docker architecture)
- save information about the computation task type, which allows retrieving the worker name
- serve computed resources
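A minimal sketch of the Web service's API surface under these responsibilities, assuming Flask; the route names, task-type identifiers and artifact directory are illustrative assumptions, and the database write is omitted.

```python
import uuid

from flask import Flask, jsonify, send_from_directory

app = Flask(__name__)
ARTIFACT_DIR = '/share/data'  # assumed shared-storage location


@app.route('/api/<task_type>', methods=['POST'])
def submit(task_type):
    # Accept DiviK/GMM/ROI computation requests from the external world.
    if task_type not in ('divik', 'gmm', 'roi'):
        return jsonify({'error': 'unknown task type'}), 404
    task_id = str(uuid.uuid4())
    # The task type would be persisted here so that the worker name can be
    # resolved later (database access is omitted in this sketch).
    return jsonify({'id': task_id}), 202


@app.route('/artifacts/<path:name>', methods=['GET'])
def artifact(name):
    # Serve computed resources / allow retrieval of computation artifacts.
    return send_from_directory(ARTIFACT_DIR, name)
```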
Master = single process
Responsibilities:
- zookeeper = sends tasks to the proper workers
- knows which worker (by host name) is responsible for a particular job (this information should be retrieved from the database by job type)
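A sketch of that dispatch responsibility; `sqlite3` stands in here for the MSSQL/Postgres database mentioned below, and the table and column names are assumptions.

```python
import sqlite3

import requests


def dispatch(job_type, job_payload, db_path='master.db'):
    # Resolve the responsible worker's host name by job type, then forward the job.
    with sqlite3.connect(db_path) as connection:
        row = connection.execute(
            'SELECT worker_host FROM worker_registry WHERE job_type = ?',
            (job_type,),
        ).fetchone()
    if row is None:
        raise LookupError(f'no worker registered for job type {job_type!r}')
    worker_host = row[0]
    return requests.post(f'http://{worker_host}/worker/job',
                         json=job_payload, timeout=10)
```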
Worker = API + algorithm calculation
Responsibilities:
- accept incoming computation requests
- fetch files from storage
- perform computation with defined settings
- save results to files
- return the location of the results, the success status and the total duration
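A compact sketch of a single worker job following these responsibilities; the shared-storage path, output file layout and the `run_computation` stand-in are assumptions.

```python
import json
import time
from pathlib import Path

OUTPUT_DIR = Path('/share/data')  # assumed shared-storage location


def process(job_id, input_path, settings):
    started = time.monotonic()
    data = Path(input_path).read_bytes()          # fetch files from storage
    result = run_computation(data, settings)      # stand-in for the actual algorithm
    output_file = OUTPUT_DIR / f'{job_id}.json'
    output_file.write_text(json.dumps(result))    # save results to files
    return {'output_file': str(output_file),      # location of the results
            'success': True,
            'duration_seconds': time.monotonic() - started}
```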
                       Web           Worker
                      /   \         /
    web traffic -> LB  ...  DB - Master ...
                      \   /         \
                       Web           Worker