You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Out of 4.88M package downloads in the last 6 weeks, the operation is unknown for 4.53M of them.
We could reconsider storing this data when revisiting the design of the NuGet statistics feature, and if the v3 protocol somehow allows collecting this data.
Note
Even when we no longer import these datapoints into the stats database, we still have all the data in the w3c-logs we upload to blob storage.
Impact on database
A database size reduction could be achieved totalling 2,585.90 MB (or 2.53 GB)!
Removing these indices will also improve performance of insert/delete operations against the [dbo].[Fact_Download] and [dbo].[Fact_Dist_Download] tables.
In addition, we generate less churn on the data by avoiding clean-up operations.
The following database objects would become obsolete:
Non-clustered index [dbo].[Dimension_Operation].[Dimension_Operation_NCI_Operation] (size = 16 KB)
The following stored procedure would be modified:
[dbo].[DownloadReportRecentPopularityDetailByPackage] would no longer need to join with the [dbo].[Dimension_Operation] table.
The following database objects would decrease in size:
Non-clustered index [dbo].[Fact_Download].[Fact_Download_NCI_Package_Id] would no longer need to include the Dimension_Operation_Id column.
Impact on jobs
The Stats.CreateAzureCdnWarehouseReports job would be modified to no longer process the Operation dimension data, and likely gain faster throughput (less data, less table-joins, smaller blobs to upload to storage).
The Stats.ImportAzureCdnStatistics job would no longer need to store the Operation dimension data, which would improve import speed, throughput, and overall performance of the job.
We could:
avoid the need to check whether an Operation already exists, and if not, create it during import;
avoid the need to retain already imported Operation dimensions into in-memory cache;
reduce complexity of import by no more linking Operation dimension identifiers to the download facts.
Impact on gallery
Remove the Operation pivot from the UI
Stats-report blobs would no longer include operation dimension data (and thus be smaller + faster to consume)
The text was updated successfully, but these errors were encountered:
There is no reason to do this work, since after we reduce the raw data in the DB to 43 days from 90 days, the DB will be small enough to get us to the stats redesign work.
Is this really still needed? The v3 client does not emit this data using the
NuGet-Operation
HTTP header. See NuGet/Home#3708Check this example: https://www.nuget.org/stats/packages/Newtonsoft.Json?groupby=Operation
Out of 4.88M package downloads in the last 6 weeks, the operation is unknown for 4.53M of them.
We could reconsider storing this data when revisiting the design of the NuGet statistics feature, and if the v3 protocol somehow allows collecting this data.
Impact on database
A database size reduction could be achieved totalling 2,585.90 MB (or 2.53 GB)!
Removing these indices will also improve performance of insert/delete operations against the
[dbo].[Fact_Download]
and[dbo].[Fact_Dist_Download]
tables.In addition, we generate less churn on the data by avoiding clean-up operations.
The following database objects would become obsolete:
[dbo].[Dimension_Operation]
(size = 0.14 MB)[dbo].[Fact_Download].[Dimension_Operation_Id]
(size = 2,585.75 MB)[dbo].[EnsureOperationDimensionsExist]
[dbo].[Dimension_Operation].[Dimension_Operation_NCI_Operation]
(size = 16 KB)The following stored procedure would be modified:
[dbo].[DownloadReportRecentPopularityDetailByPackage]
would no longer need to join with the[dbo].[Dimension_Operation]
table.The following database objects would decrease in size:
[dbo].[Fact_Download].[Fact_Download_NCI_Package_Id]
would no longer need to include theDimension_Operation_Id
column.Impact on jobs
The
Stats.CreateAzureCdnWarehouseReports
job would be modified to no longer process the Operation dimension data, and likely gain faster throughput (less data, less table-joins, smaller blobs to upload to storage).The
Stats.ImportAzureCdnStatistics
job would no longer need to store the Operation dimension data, which would improve import speed, throughput, and overall performance of the job.We could:
Impact on gallery
The text was updated successfully, but these errors were encountered: