Description
We use kiwix_storagelib to implement an S3-based optimization cache in the scrapers. However, this leads to redundant code across scrapers, because we always store a version of the file along with the optimizer version as metadata. This could be better implemented once in scraperlib. As a start, we could have a caching module with three functions (or perhaps a class with three methods). The three primary things we need are:
- download_from_cache()
- upload_to_cache()
- check_credentials()
There are several ways to implement this, but it should at least do the following:
- Compare optimizer_version
- Compare file_version
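As a rough illustration of the class-based option, here is a minimal sketch. It is not an existing scraperlib or kiwix_storagelib API: `OptimizationCache`, `InMemoryBackend`, and the metadata keys are hypothetical names, and the in-memory backend only stands in for the S3 bucket so that the version comparison logic can be shown on its own.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class InMemoryBackend:
    """Hypothetical stand-in for an S3 bucket: key -> (data, metadata)."""

    objects: Dict[str, Tuple[bytes, Dict[str, str]]] = field(default_factory=dict)
    reachable: bool = True


@dataclass
class OptimizationCache:
    """Hypothetical cache wrapper; names and signatures are assumptions."""

    backend: InMemoryBackend
    optimizer_version: str

    def check_credentials(self) -> bool:
        # A real backend would verify access to the bucket here.
        return self.backend.reachable

    def upload_to_cache(self, key: str, data: bytes, file_version: str) -> None:
        # Always store both versions as metadata next to the file.
        meta = {
            "optimizer_version": self.optimizer_version,
            "file_version": file_version,
        }
        self.backend.objects[key] = (data, meta)

    def download_from_cache(self, key: str, file_version: str) -> Optional[bytes]:
        # Return the cached data only if both versions match, else None.
        entry = self.backend.objects.get(key)
        if entry is None:
            return None
        data, meta = entry
        if meta.get("optimizer_version") != self.optimizer_version:
            return None
        if meta.get("file_version") != file_version:
            return None
        return data
```

With this shape, a scraper would call `download_from_cache()` first and fall back to optimizing and calling `upload_to_cache()` when it gets `None`.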
Optionally, it could also check the file's upload date and discard the cached copy if it is older than a specified amount of time. If we go for a class-based approach, we could also explore ways to improve performance.
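The optional age check could be as small as the following sketch. The function name `is_stale` and its parameters are hypothetical, and it assumes the upload date is available as a timezone-aware `datetime`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(
    uploaded_at: datetime,
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the cached file is older than max_age (hypothetical helper)."""
    # Allow injecting `now` so the check is easy to test.
    if now is None:
        now = datetime.now(timezone.utc)
    return now - uploaded_at > max_age
```

A cached entry for which `is_stale()` returns `True` would be treated like a cache miss.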