Add support for multi model caching & live reloading #1428
Merged
290 commits
243d27d
Add some boilerplate to clear the head
RobertLucian d1181af
Add even more boilerplate
RobertLucian fd9ec82
WIP crons
RobertLucian 8a72779
WIP for model validation on the serving container
RobertLucian 31d1640
Few comments on lib.model.validation
RobertLucian 6511653
Add json converter for model template
RobertLucian c53909c
WIP add validation for model template
RobertLucian 4de6270
Add checks/exceptions for model template
RobertLucian 3014205
Fix GenericPlaceholder validation
RobertLucian 3ebf70a
Add recursive model validation and add fixes
RobertLucian b78fede
Small fix for validate_integer_placeholder
RobertLucian b5ad9d3
Replace ExclAlternativePlaceholder w/ OneOfAllPlaceholder
RobertLucian b6098ed
Implement OneOfAll Placeholder for validation
RobertLucian 15a7d97
Fix AnyPlaceholder when the folder is empty
RobertLucian 66730c7
Add validation for PlaceholderGroup
RobertLucian c1ac84f
Small mods to validation
RobertLucian 6adb9cc
Add docstrings + add TensorFlowNeuronPredictor
RobertLucian b690ae6
Add validation for models:dir in Python
RobertLucian 69b37bd
Make certain functions "private"
RobertLucian 9d6ecc4
Implement part of the SimpleModelMonitor
RobertLucian 580b946
Add LockedFile class and more
RobertLucian 40f557f
WIP SimpleModelMonitor
RobertLucian 3532dd4
Add mechanism for live-model-reloading
RobertLucian b864f15
Few forgotten docstrings
RobertLucian cd34eca
Add structure to hold models in memory
RobertLucian ef17e85
WIP - live-reloading w/ parts of model caching
RobertLucian 8e2fc2a
Finalize ONNX client for non-models:dir paths
RobertLucian 0d00be4
Add logic for making predictions when models:dir is set for ONNX client
RobertLucian 718b791
Don't encode when writing to locked file
RobertLucian d5f5e1b
Renaming types & comment
RobertLucian 7a74c34
Add RWLock
RobertLucian 166ee6b
Refactor serving library
RobertLucian 9269263
WIP on LRU, model tree & syncing
RobertLucian e29563b
Add ModelsTree class & Rm LockedStateAndLRU
RobertLucian dd13b40
WIP on model locking, global locking for GC, model structs
RobertLucian 7f4a636
Add docstrings to existing methods
RobertLucian 4800d86
Improve docstrings
RobertLucian 26be81f
WIP MMC
RobertLucian 4c5efb2
WIP MMC
RobertLucian 262dcb8
WIP MMC refactor
RobertLucian fc2bbcc
WIP MMC implement retrieving mechanism for ONNX
RobertLucian 0502a11
Fix ReadWriteLock & add writer-preferring RW lock
RobertLucian 63c54b7
Add preference policy changer to ReadWriteLock
RobertLucian 87d401c
Remove comments & add TODO
RobertLucian b1cebe7
Implement v. selection when model v. is latest or highest
RobertLucian 413b523
Implement tag counting func. for model holder
RobertLucian 2e781a4
Improve tag counting functionality for models holder
RobertLucian 8930482
Implement download model callback when caching
RobertLucian 45aac6c
Replace global -> model access for models tree
RobertLucian c1aa9f2
Rename model.lru to model.model
RobertLucian c642107
Parse timestamp from datetime when downloading model
RobertLucian 1a5479e
Apply locking for ModelsTree.update_models method
RobertLucian ddf896f
Reduce the locking time for the model tree
RobertLucian e5a0022
Implement model upstream timestamp update when caching is disabled
RobertLucian 1bc7f7f
Remove models that no longer appear in model_names, if any present at…
RobertLucian 644a97a
Load/remove locally-provided models
RobertLucian 05029c9
Fix missing UTC timestamp when caching is disabled
RobertLucian 2e127be
Support non-version models for all predictor types (not enabled)
RobertLucian 78f823c
Enable non-version model support for all predictor types
RobertLucian ab17828
Add GC cron for when caching is enabled
RobertLucian efa1507
Use abstract threading class for all crons
RobertLucian 54e200d
Python package import errors
RobertLucian 0ec565a
Add models tree updater cron
RobertLucian 0550145
Add "latest"/"highest" model tree preloader cron
RobertLucian 2f70930
WIP MMC
RobertLucian 320e651
Local models availability & Predictor impl
RobertLucian 8c7db86
Various fixes, PythonPredictor client, Predictor
RobertLucian eb6e2fd
Set load method for Python Predictor client
RobertLucian 7dc18c4
WIP TensorFlow API client
RobertLucian be46404
WIP TensorFlowServing API client
RobertLucian f3c1358
Implement TFS API for loading/unloading models dynamically
RobertLucian 99d8b46
Add extra arguments to TFS container
RobertLucian 88794cc
Add models to TFS tree even if they failed to load
RobertLucian 1574537
WIP MMC - TF client
RobertLucian d43ce17
WIP MMC - TF client
RobertLucian cefd834
Fully implement the TensorFlowServingAPI's class
RobertLucian 3f52919
Add extra keyword-arguments to load model func
RobertLucian a577031
WIP MMC - TF client
RobertLucian 46e75f4
WIP MMC - add remove callback for TF client
RobertLucian 253bff5
Add model version detector for TF client when num procs > 1
RobertLucian a25321e
Pass kwargs to load callback for TF client when num procs > 1
RobertLucian 4058861
Add cron to update TFS models independently when num procs > 1
RobertLucian e57d6f8
Mods to the cron that updates TFS models when num procs > 1
RobertLucian 01067db
Add all crons to the predictor class
RobertLucian df37358
Fix cron scheduling in Predictor class
RobertLucian 500b534
Support non-versioned TF models in the CLI
RobertLucian 8a58bd0
WIP MMC - python model validation
RobertLucian e542a6c
WIP MMC - tensorflow model validation
RobertLucian 88570a8
Rename functions more appropriately
RobertLucian cb80530
WIP MMC - rewrite ONNX validation for the CLI
RobertLucian 55fde7b
WIP MMC - rewrite TF validation (tested)
RobertLucian 4200da0
WIP MMC - rewrite python validation
RobertLucian 781327a
Properly wrap error messages for model validations
RobertLucian 6cc38a5
Don't ignore hidden files/folder when validating models
RobertLucian 5e45b50
Use variable to specify whether path is S3 or not
RobertLucian 9cc481b
Add testing logs to CLI API validation
RobertLucian 4501f1d
Remove model downloading with downloader container
RobertLucian d4f856a
Add a bunch of number slice to string slice funcs
RobertLucian fa6364d
Cache only local models when local provider is used
RobertLucian 1af5e53
Refactor & fix module import issues
RobertLucian 57abbdf
Fix files-based locking class
RobertLucian 6d80bca
Further fixes for file-based locking
RobertLucian 9261111
Add missing argument to lock classes
RobertLucian 71ca8ad
Fix thread locking syntax error
RobertLucian 7f5a31c
Fix logic errors in thread locking module
RobertLucian 9dd353f
Rename thread locking classes
RobertLucian b17585c
Prevent attribute not found error when __del__ is called
RobertLucian 0221d18
Add helper API dumps (temporarily)
RobertLucian 3e62767
Add another spec output helper (tbr)
RobertLucian ef1b0dc
Fix bugs in CuratedModelResources class
RobertLucian 932241a
Fix bugs with lib.concurrency/model
RobertLucian 7d1ffdc
Fix bugs with ModelTreeUpdater
RobertLucian 9ef1951
Fix find_all_s3_models func when is_dir_used=True
RobertLucian 4c65f1a
Fix multiple-bucket bug with the find_all_s3_models function
RobertLucian 6dcc7f6
Fix model validation bug (critical bug)
RobertLucian 46ffb3d
Fix multiple bugs
RobertLucian ba457d2
Ensure model path suffix
RobertLucian b0109bb
ONNX can no longer be specified as a single obj
RobertLucian 17ba5ca
WIP FileBasedModelsTreeUpdater (fixing/testing)
RobertLucian b7ebd0d
Fix WithBreak exception not getting suppressed
RobertLucian 59d361b
WIP fix find_ondisk_models_with_lock, LockedFile; FileBasedModelsTree…
RobertLucian 2733a5c
Further fixes for FileBasedModelsTreeUpdater
RobertLucian 2055d4e
Fix find_all_s3_models and FileBasedModelsTreeUpdater (fully tested)
RobertLucian 069e85d
Fix find_ondisk_model_info helper function
RobertLucian 706a8cf
Fix helper function find_ondisk_models
RobertLucian 4a3b9cf
Fix bugs with TFSModelLoader
RobertLucian 481f3ea
Fix timestamp bleeding into versions that have the same prefix as tha…
RobertLucian 817502f
Fix TFSModelLoader (with tests)
RobertLucian 40b5b79
Re-enable TFS communication
RobertLucian cd4d1e4
Fix bugs with TFSModelLoader & TensorFlowServingAPI
RobertLucian 49b8a66
Further fixes for the TFSModelLoader
RobertLucian 6c1b9d9
Fix TFSModelLoader when TFS is crashing due to OOM
RobertLucian 1486d23
Reload TFSModelLoader when TFS server crashes
RobertLucian 90ed102
Bugs and a bit of refactoring
RobertLucian c7cba58
Fix bug where the wrong versions were updated
RobertLucian f8b2f8b
Rename TFSModelLoader method
RobertLucian c022a98
Eliminate race condition when acquiring model lock
RobertLucian 46c8568
WIP Python client
RobertLucian cd4e30e
Add NumLocalModels func to API struct
RobertLucian f654440
Allow tfs module to be imported when dependencies are not in
RobertLucian 6ccc18e
Allow ONNX client to be imported without dependencies installed
RobertLucian 60c37a1
Fix issues in the serving startup scripts
RobertLucian 6c0b4a9
Object is not indexable - using parentheses
RobertLucian 4db0181
Fixing a bunch of runtime errors
RobertLucian 9d6c288
Fix logger issue & fix more errors
RobertLucian 85fa362
Better exception message & fix validation bug
RobertLucian da52d7e
Fix syntax errors for ModelsGC
RobertLucian f03fb25
Fix model reloading cron for multiple processes (while running)
RobertLucian 2193faa
Ensure "/" suffix for model paths, use daemons, add grpcio to req
RobertLucian 220c97f
Use file-based locking just when caching is disabled
RobertLucian 963d1bf
Fixes for model live-reloading for Python/ONNX
RobertLucian cf01e59
Finalize fixes for the side-reloading feature for the ONNX and Python…
RobertLucian 9375813
Use shared volume for local TF deployments
RobertLucian 6f6a354
Fix issues with TFSModelLoader cron
RobertLucian 56f2e6f
Fix segmentation fault with the TFS API
RobertLucian 0790b13
Dw models in parallel and sync output logs when side-reloading
RobertLucian 3af8e0e
Fixing bugs with the TFSModelLoader (for edge cases)
RobertLucian 4b805e7
Fixes for the caching mechanism for the TF pred
RobertLucian 42f4f54
Fixes to the caching mechanism for the TF pred
RobertLucian e3bd863
Raise exception when model is not loaded (caching + TF)
RobertLucian 7ab7cfa
Don't download a model if it has already been downloaded (caching + a…
RobertLucian 8817b3d
Further fixes for the caching mechanism
RobertLucian 39c6ea3
Fix removal process of stale models for the GC
RobertLucian 22723d6
Fix max version meth when caching enabled + others
RobertLucian 1917aba
Fixes for the model caching GC
RobertLucian c18654d
Reset global preference policy for accessing resources for the GC
RobertLucian 3db7150
Improve logging for all predictor types
RobertLucian 409902f
Return metadata for models when live-reloading
RobertLucian c124a84
Model metadata retrieving from serving container
RobertLucian 1dff2fc
Add model stats when cortex getting
RobertLucian eb05980
Show model stats for TF & live reloading when cx getting
RobertLucian 6ce653f
Add examples and refactor all other examples
RobertLucian b6c4875
Fix merge conflicts for 'master' into feature/multi-model-caching
RobertLucian ec83408
Code fixes + make lint
RobertLucian 95179c8
Fix FileTree function output
RobertLucian 40aefd9
Merge branch 'master' into feature/multi-model-caching
RobertLucian 1772344
Bunch of fixes
RobertLucian 77466c3
Bunch of fixes
RobertLucian 4456fcb
Core tweaks to the GC for the MMC
RobertLucian ba56336
Merge branch 'master' into feature/multi-model-caching
RobertLucian a568f40
Log message correction
RobertLucian 3db0030
Merge branch 'master' into feature/multi-model-caching
RobertLucian 42d5ba5
Add get_model method for the ONNX predictor & fix ONNX example
RobertLucian be7e03c
Fixes to the CLI & to the MMC for the TF pred
RobertLucian 245f8ef
Handle situations when TFS gets unresponsive when caching is enabled
RobertLucian 45d4b73
Fix local models not working when caching enabled and ONNX/Py predict…
RobertLucian 288e6f4
Fix cron for when local models & the Python/ONNX predictors are used
RobertLucian 21c7e77
Disable caching for BatchAPI kind
RobertLucian b611871
Merge branch 'master' into feature/multi-model-caching
RobertLucian 1297147
Fix batch issues (with the live reloading feature)
RobertLucian 0dadc4d
Wait for TFS to be responsive when live-reloading
RobertLucian e88c087
Fix merge conflicts from 'master' into feature/multi-model-caching
RobertLucian a71d69b
Fix model paths for multi-model-classifier example
RobertLucian 2917159
Multiple procs for TF pred w/ caching disabled
RobertLucian 8a27981
Switch prints with logs
RobertLucian 7748fba
Address PR comments
RobertLucian e9c0b4d
Merge branch 'master' into feature/multi-model-caching
RobertLucian bc06d06
Indentation for comment
RobertLucian dce2dd9
Merge branch 'master' into feature/multi-model-caching
RobertLucian e24c799
Merge branch 'master' into feature/multi-model-caching
RobertLucian 6fc8fff
Remove CORTEX_MODEL_SOURCE_TYPE, CORTEX_MODEL_CACHE_SIZE, CORTEX_MODE…
RobertLucian f46b169
Add documentation for live-reloading/model caching
RobertLucian b897669
Merge branch 'master' into feature/multi-model-caching
RobertLucian 4fa7b05
Fixing the multi-model endpoint guide
RobertLucian acd30c8
Semantic fixes to models page
RobertLucian bbecb31
Doc fixes and re-ordering
RobertLucian 33e48e5
Merge branch 'master' into feature/multi-model-caching
RobertLucian b2200e7
Address 1st round of comments
RobertLucian fa0bf09
Add 2nd round of requested changes
RobertLucian 2cad972
Make lint
RobertLucian ac87a4a
Replace highest with latest and remove highest tag
RobertLucian 3e129f5
3rd round of addressing comments
RobertLucian 0ca3a9c
Expose underlying error when validating models
RobertLucian 4cd15d0
4th round of addressing comments
RobertLucian 4d3246a
Address merge conflicts from 'master' into feature/multi-model-caching
RobertLucian 7814ade
A few nits
RobertLucian 6edbf08
Fix merge conflicts from 'master' into feature/multi-model-caching
RobertLucian 39d9996
Merge branch 'master' into feature/multi-model-caching
RobertLucian f1fe6fa
Fix a few bugs
RobertLucian d184fcf
Run the python init script as a service
RobertLucian bb0ee67
"versions" field not set to empty list, but to None
RobertLucian e190408
Fix a bug with the live-reloading for TFS
RobertLucian ec5ef24
Don't restart service if exit code = 0; don't kill other processes if…
RobertLucian 8cf7a65
Prevent script.py from exiting when a cron runs; prevent HandleReload…
RobertLucian d05ac78
Move while loop after the touch of init_script_run.txt
RobertLucian 85ada50
Layout fixes for the model validation in the CLI
RobertLucian c5b2ab9
Make lint
RobertLucian 0559d54
Fix bug for the MMC Python/ONNX when dir field is used
RobertLucian 035ffb1
Allow additional files for ONNX/TensorFlowNeuron models
RobertLucian 6c70e45
Fix merge conflicts from 'master' into feature/multi-model-caching
RobertLucian 8dd7fcb
Merge branch 'master' into feature/multi-model-caching
RobertLucian 1cb6f73
Fix error formatting + disallow the addition of extra files for ONNX …
RobertLucian b63a621
Merge branch 'master' into feature/multi-model-caching
RobertLucian 7658d2d
Misc updates
deliahu 7780e64
Update models.md
deliahu bee8d0a
Fix TensorFlow + Inferentia + 1 process not live reloading
RobertLucian 03b1cee
Prevent the crons from re-downloading the same model over and over again
RobertLucian 3b7004d
Fix models not getting reloaded if the timestamp is older
RobertLucian 101641f
Merge branch 'master' into feature/multi-model-caching
RobertLucian 1fe6b7f
Remove slipped comment
RobertLucian 27aadf2
Fix bug that would make the API think the TensorFlow Neuron predictor…
RobertLucian 8c33b76
Fix bug that would lead to un-loadable non-versioned models when usin…
RobertLucian 076deb2
Start batch after scrip_py service has run; otherwise the API fails
RobertLucian 8fd2d8f
Set log level to INFO
RobertLucian 010199a
Fix model paths in each cortex.yaml example
RobertLucian d3fc746
Update docs
deliahu 4e25b99
Fix model path for batch example
RobertLucian 7673297
Update models.md
deliahu 552038c
update docstring
deliahu 056f319
Merge branch 'master' of github.com:cortexlabs/cortex into feature/mu…
deliahu
@@ -0,0 +1,21 @@
/*
Copyright 2020 Cortex Labs, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package cmd

const (
	_timeFormat = "02 Jan 06 15:04:05 MST"
)