@RobertLucian RobertLucian commented Oct 8, 2020

Closes #619, closes #1252.

For the Python, ONNX, and TensorFlow predictors, the following functionality is added:

  • Live model reloading - loads/reloads/unloads models from S3 buckets while the API is running.
  • Multi-model caching - gives the API access to thousands of models while keeping only a few loaded in memory at a time.
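The multi-model caching behavior described above can be sketched as an LRU cache keyed by model name. This is an illustrative sketch only, not the implementation in this PR: `ModelCache` and its `loader` callable are hypothetical names, and a real loader would download and deserialize the model from S3.

```python
from collections import OrderedDict


class ModelCache:
    """Illustrative LRU model cache: at most `capacity` models stay in
    memory; the least recently used one is evicted when the cache is full.
    (Hypothetical sketch, not the actual predictor implementation.)"""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader  # callable: model name -> loaded model
        self._models = OrderedDict()  # insertion order tracks recency

    def get(self, name):
        if name in self._models:
            self._models.move_to_end(name)  # mark as most recently used
            return self._models[name]
        model = self.loader(name)  # e.g. fetch from S3 and deserialize
        self._models[name] = model
        if len(self._models) > self.capacity:
            self._models.popitem(last=False)  # evict least recently used
        return model


# Usage: many models are addressable, but only 2 are resident in memory.
cache = ModelCache(capacity=2, loader=lambda name: f"<model {name}>")
cache.get("a")
cache.get("b")
cache.get("a")  # "a" becomes most recently used
cache.get("c")  # "b" is the LRU entry and gets evicted
print(list(cache._models))  # → ['a', 'c']
```

The same structure extends naturally to live reloading: the loader can compare S3 object timestamps and re-fetch a model whose remote copy has changed.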

Checklist:

  • run make test and make lint
  • test individual high-level functionalities
    • Python predictor + live reloading of models.
    • Python predictor + caching of models.
    • ONNX predictor + live reloading of models.
    • ONNX predictor + caching of models.
    • TensorFlow predictor + live reloading of models.
    • TensorFlow predictor + caching of models.
  • add more pertinent logs
  • retrieve metadata for the models
  • create examples for each of the above tested functionalities
  • re-test each functionality using the provided examples
  • add documentation
  • create tickets for the remaining tasks

@RobertLucian RobertLucian merged commit 1cd5c22 into master Nov 11, 2020
@RobertLucian RobertLucian deleted the feature/multi-model-caching branch November 11, 2020 02:13
Labels: enhancement (new feature or request), performance (a performance improvement)
Linked issues (may be closed by merging this pull request):

  • Support live reloading of models
  • Serve a collection of custom models based on LRU