Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new interface to NumericDocValues to close #370 #371

Closed

Conversation

theolivenbaum
Copy link
Contributor

As mentioned in #370, we measured a significant number of allocations when reading values from the cache, due to a variable capture inside a lambda and a new FieldCache.T instance being created.

Adding an interface to the class NumericDocValues for each of the supported types instead of using the lambda, and adding a cached FieldCache.T for each type on the NumericDocValues class that is created only on first use.

@NightOwl888
Copy link
Contributor

Thanks. I finally had a chance to take a close look at this. Since adding the delegate was a recent change in 34758c1 (pull #348), I ran benchmarks both on reverting that commit vs the changes in this PR (both rebased against master).

Benchmarks

IndexFiles

Click to expand
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1082 (1909/November2018Update/19H2)
Intel Core i7-8850H CPU 2.60GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.403
  [Host]                                : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-beta00005                       : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-beta00006                       : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-beta00007                       : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-beta00008                       : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-beta00009                       : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-beta00010                       : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-beta00011                       : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-beta00012                       : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-wip00_before_GH-370             : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-wip01_revertdelegates_GH-370    : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-wip02_refactorfieldcache_GH-370 : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT

InvocationCount=1  IterationCount=15  LaunchCount=2  
UnrollFactor=1  WarmupCount=10  
Method Job NuGetReferences Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
IndexFiles 4.8.0-beta00005 Lucene.Net.Analysis.Common 4.8.0-beta00005 687.7 ms 12.01 ms 16.44 ms 43000.0000 8000.0000 7000.0000 220.89 MB
IndexFiles 4.8.0-beta00006 Lucene.Net.Analysis.Common 4.8.0-beta00006 683.8 ms 15.25 ms 21.88 ms 43000.0000 8000.0000 7000.0000 220.94 MB
IndexFiles 4.8.0-beta00007 Lucene.Net.Analysis.Common 4.8.0-beta00007 680.3 ms 13.02 ms 19.49 ms 44000.0000 8000.0000 7000.0000 221 MB
IndexFiles 4.8.0-beta00008 Lucene.Net.Analysis.Common 4.8.0-beta00008 672.6 ms 13.61 ms 18.64 ms 44000.0000 8000.0000 7000.0000 221.39 MB
IndexFiles 4.8.0-beta00009 Lucene.Net.Analysis.Common 4.8.0-beta00009 704.5 ms 35.29 ms 51.73 ms 44000.0000 8000.0000 7000.0000 221.28 MB
IndexFiles 4.8.0-beta00010 Lucene.Net.Analysis.Common 4.8.0-beta00010 674.4 ms 11.88 ms 17.41 ms 44000.0000 8000.0000 7000.0000 221.23 MB
IndexFiles 4.8.0-beta00011 Lucene.Net.Analysis.Common 4.8.0-beta00011 675.7 ms 13.38 ms 19.61 ms 44000.0000 8000.0000 7000.0000 221.29 MB
IndexFiles 4.8.0-beta00012 Lucene.Net.Analysis.Common 4.8.0-beta00012 701.7 ms 9.53 ms 13.05 ms 56000.0000 7000.0000 6000.0000 287 MB
IndexFiles 4.8.0-wip00_before_GH-370 Lucene.Net.Analysis.Common 4.8.0-ci0000002208 673.6 ms 7.85 ms 11.00 ms 43000.0000 8000.0000 7000.0000 220.03 MB
IndexFiles 4.8.0-wip01_revertdelegates_GH-370 Lucene.Net.Analysis.Common 4.8.0-ci0000002212 673.7 ms 20.74 ms 29.08 ms 44000.0000 8000.0000 7000.0000 220.16 MB
IndexFiles 4.8.0-wip02_refactorfieldcache_GH-370 Lucene.Net.Analysis.Common 4.8.0-ci0000002213 671.3 ms 15.92 ms 22.31 ms 44000.0000 8000.0000 7000.0000 220.17 MB

SearchFiles

Click to expand
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1082 (1909/November2018Update/19H2)
Intel Core i7-8850H CPU 2.60GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.403
  [Host]                                : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-beta00005                       : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-beta00006                       : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-beta00007                       : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-beta00008                       : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-beta00009                       : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-beta00010                       : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-beta00011                       : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-beta00012                       : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-wip00_before_GH-370             : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-wip01_revertdelegates_GH-370    : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  4.8.0-wip02_refactorfieldcache_GH-370 : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT

IterationCount=15  LaunchCount=2  WarmupCount=10  
Method Job NuGetReferences Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
SearchFiles 4.8.0-beta00005 Lucene.Net.Analysis.Common 4.8.0-beta00005,Lucene.Net.QueryParser 4.8.0-beta00005 142.74 ms 3.010 ms 4.317 ms 8750.0000 500.0000 - 41.39 MB
SearchFiles 4.8.0-beta00006 Lucene.Net.Analysis.Common 4.8.0-beta00006,Lucene.Net.QueryParser 4.8.0-beta00006 145.19 ms 4.234 ms 6.206 ms 8750.0000 500.0000 - 41.37 MB
SearchFiles 4.8.0-beta00007 Lucene.Net.Analysis.Common 4.8.0-beta00007,Lucene.Net.QueryParser 4.8.0-beta00007 145.13 ms 3.663 ms 5.253 ms 8750.0000 250.0000 - 41.26 MB
SearchFiles 4.8.0-beta00008 Lucene.Net.Analysis.Common 4.8.0-beta00008,Lucene.Net.QueryParser 4.8.0-beta00008 93.55 ms 9.737 ms 14.273 ms 8666.6667 500.0000 - 40.33 MB
SearchFiles 4.8.0-beta00009 Lucene.Net.Analysis.Common 4.8.0-beta00009,Lucene.Net.QueryParser 4.8.0-beta00009 86.96 ms 2.683 ms 3.848 ms 8666.6667 500.0000 - 40.33 MB
SearchFiles 4.8.0-beta00010 Lucene.Net.Analysis.Common 4.8.0-beta00010,Lucene.Net.QueryParser 4.8.0-beta00010 93.86 ms 2.843 ms 4.256 ms 8600.0000 - - 40.2 MB
SearchFiles 4.8.0-beta00011 Lucene.Net.Analysis.Common 4.8.0-beta00011,Lucene.Net.QueryParser 4.8.0-beta00011 93.08 ms 3.083 ms 4.518 ms 8666.6667 500.0000 - 40.19 MB
SearchFiles 4.8.0-beta00012 Lucene.Net.Analysis.Common 4.8.0-beta00012,Lucene.Net.QueryParser 4.8.0-beta00012 97.00 ms 5.825 ms 8.719 ms 8833.3333 333.3333 - 40.81 MB
SearchFiles 4.8.0-wip00_before_GH-370 Lucene.Net.Analysis.Common 4.8.0-ci0000002208,Lucene.Net.QueryParser 4.8.0-ci0000002208 86.74 ms 0.621 ms 0.891 ms 7000.0000 333.3333 - 33.17 MB
SearchFiles 4.8.0-wip01_revertdelegates_GH-370 Lucene.Net.Analysis.Common 4.8.0-ci0000002212,Lucene.Net.QueryParser 4.8.0-ci0000002212 85.16 ms 1.542 ms 2.307 ms 7000.0000 333.3333 - 33.17 MB
SearchFiles 4.8.0-wip02_refactorfieldcache_GH-370 Lucene.Net.Analysis.Common 4.8.0-ci0000002213,Lucene.Net.QueryParser 4.8.0-ci0000002213 84.36 ms 1.144 ms 1.677 ms 7000.0000 333.3333 - 33.17 MB

While the results do show a minor performance gain, the margin of error accounts for nearly all of it. Unlike removing other delegates in #383, we are not seeing a significant drop in memory allocations when doing "basic" benchmarks.

Failing Tests

This PR has introduced 5 new failing tests.

image

You can view the details of these failures at https://dev.azure.com/LuceneNET-Temp/Lucene.NET/_build/results?buildId=1170&view=ms.vss-test-web.build-test-results-tab

Double-caching

While I don't know for a fact that this is a problem, the comments indicate these values are already cached.

                // Not cached here by FieldCacheImpl (cached instead
                // per-thread by SegmentReader):

Digging into SegmentReader, there are 2 caches both of which use ThreadLocal<T>. I suspect that one of those caches is what the comment refers to. Putting these instances into a static variable and reusing them across threads seems like it might introduce a thread safety problem.

Summary

Basically, our three options boil down to:

  1. Leave FieldCacheImpl as-is
  2. Revert 34758c1
  3. Fix the failing tests and keep this design change

Changing to use delegates rather than using concrete subclasses was done deliberately to reduce the number of classes to maintain. So, the choice with reverting 34758c1 is raw performance vs maintainability. I am on the fence, but since the gains of reverting it are marginal at best, I am leaning toward leaving it as-is.

Changing the design to this PR means diverging from the Lucene source. It will be slightly more of a maintenance burden to upgrade to the next version of Lucene. While this is something that can be seriously considered if the performance gains are significant enough, there doesn't seem to be enough of a gain here to warrant the design change (not to mention the failing tests). These benchmarks are very narrowly scoped, though. If there is another metric that can be benchmarked that shows a significant gain, then we can reconsider.

That being said, don't be discouraged by this. While we haven't been able to use very many of your commits so far, the fact that you are identifying new issues that were previously unknown makes your contributions to Lucene.NET valuable. We have already improved search performance by around 10% over 4.8.0-beta00012 thanks in part to your help.

@theolivenbaum
Copy link
Contributor Author

Hi @NightOwl888 - sorry for the long time since my last answer - but I was off last week on long due holidays :)

I think this change is significant for our usage of Lucene, but probably is not captured by the benchmark as they only enumerate some results.

On our use-case, we often need to enumerate all hits up to a limit (usually hundreds of thousands) - and read the value of an identifier field to match with our data for further processing - which is what causes the large number of calls to this method. I can try to push a new benchmark to reflect that, but measuring locally running our code directly referencing the Lucene projects instead of the nuget package gave us a significant reduction in allocations.

@theolivenbaum
Copy link
Contributor Author

BTW - no worries reg. the comments - happy to keep contributing on my spare time as possible - will do some catch up on the feedback on the other PRs to see what I missed since I was off!

@NightOwl888
Copy link
Contributor

@theolivenbaum

I just submitted PR #389 and noticed while I was writing it up that you replied here. I am working on getting a release vote started in the next few days, so I put together a PR that reverts the commit 34758c1. Rolling back the change for the release is the safest route, because it has already been thoroughly tested.

measuring locally running our code directly referencing the Lucene projects instead of the nuget package gave us a significant reduction in allocations.

Good to hear. That boxing issue was the main culprit - it reduced RAM consumption by around 15%. But changing to use generic parameters on Debugging.Assert also helped a lot. On the testing side things slowed down by about 15% when running tests locally, but on Azure DevOps the tests are averaging about 10% faster.

@NightOwl888
Copy link
Contributor

Fixed in #389

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants