Description
For a given time period, we calculate metrics such as the number of reviewed, unreviewed, confirmed, and false detections, and generate a list of comments and tags (see /api/metrics/system on the Swagger page). When the timeframe is set to "all", all data is pulled from the system, and the request takes ~30 seconds to load.
Several possible options are outlined below.
Option 1: evaluate different data stores
Cosmos DB is not well suited to cross-partition queries or to pulling large amounts of data in a single request. Alternative services such as Azure SQL can be evaluated.
Option 2: tweak IQueryable usage
IQueryable uses lazy (deferred) evaluation, which can include pushing query execution to the database engine. Currently, the full list of data is fetched at the point below and processed in memory:
Note that the ToList call forces immediate evaluation. We may need some C# and Cosmos DB experts to evaluate whether deferring evaluation further would improve performance. Be wary of potentially higher RU usage.
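The trade-off between fetching everything and pushing work to the data store can be illustrated outside of C# and Cosmos DB. The sketch below is only an analogy: it uses Python's built-in sqlite3 module as a stand-in data store, and the table name, column names, and state values are hypothetical, not taken from the project. It contrasts counting in memory after fetching every row (comparable to calling ToList early) with letting the engine aggregate and return one row per state.

```python
import sqlite3
from collections import Counter

# Stand-in data store; NOT Cosmos DB. Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO metadata (state) VALUES (?)",
    [("unreviewed",), ("confirmed",), ("confirmed",), ("false_detection",)],
)


def counts_in_memory(conn: sqlite3.Connection) -> dict:
    """Fetch every row, then count per state in application memory."""
    rows = conn.execute("SELECT id, state FROM metadata").fetchall()
    return dict(Counter(state for _id, state in rows))


def counts_pushed_down(conn: sqlite3.Connection) -> dict:
    """Let the engine aggregate; only one row per state is transferred."""
    rows = conn.execute(
        "SELECT state, COUNT(*) FROM metadata GROUP BY state"
    ).fetchall()
    return dict(rows)


# Both approaches yield the same counts; they differ in how much data moves.
in_memory = counts_in_memory(conn)
pushed_down = counts_pushed_down(conn)
```

Whether the equivalent aggregation can be expressed through the Cosmos DB LINQ provider for this query, and what it costs in RUs, would still need to be measured.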
Option 3: track running count
A new service can track running counts for the metrics as detection candidates are flagged or as moderators act on candidates. The moderator portal would then only need to fetch the counts instead of all data.
A new container would be added to Cosmos DB to hold either a single document with the running counts or multiple documents partitioned by time range. A serverless solution (such as Azure Functions) can trigger on changes to the original "metadata" collection, fetch the prior counts from the document in the new container, update the running counts as appropriate, and persist them by overwriting the count document in the new container.
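The update step described above can be sketched as a pure function, independent of Azure Functions or Cosmos DB. This is a minimal sketch under stated assumptions: the state names ("unreviewed", "confirmed", "false_detection"), the function names, and the rule that "reviewed" means every candidate that is no longer unreviewed are all hypothetical and not confirmed by the existing code.

```python
from typing import Dict, Optional


def apply_change(
    counts: Dict[str, int], old_state: Optional[str], new_state: str
) -> Dict[str, int]:
    """Return new running counts after one candidate changes state.

    old_state is None when a candidate is newly flagged.
    State names are hypothetical placeholders.
    """
    updated = dict(counts)  # copy so the prior count document is not mutated
    if old_state is not None:
        updated[old_state] = updated.get(old_state, 0) - 1
    updated[new_state] = updated.get(new_state, 0) + 1
    return updated


def summarize(counts: Dict[str, int]) -> Dict[str, int]:
    """Derive the portal metrics from the per-state running counts."""
    unreviewed = counts.get("unreviewed", 0)
    total = sum(counts.values())
    return {
        # Assumption: "reviewed" is everything that is not unreviewed.
        "reviewed": total - unreviewed,
        "unreviewed": unreviewed,
        "confirmed": counts.get("confirmed", 0),
        "false_detection": counts.get("false_detection", 0),
    }


# Two candidates are flagged, then a moderator confirms one of them.
counts: Dict[str, int] = {}
counts = apply_change(counts, None, "unreviewed")
counts = apply_change(counts, None, "unreviewed")
counts = apply_change(counts, "unreviewed", "confirmed")
summary = summarize(counts)
```

In a real trigger, the fetch, update, and overwrite of the count document would need to tolerate concurrent invocations; that part is not shown here.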