Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Collection with lightweight _SearchIndexer. #667

Merged
merged 30 commits into from
Feb 28, 2022
Merged

Conversation

bdice
Copy link
Member

@bdice bdice commented Jan 30, 2022

Description

This PR (draft) minimizes the Collection class for the two use cases it has in signac's internals: detecting schema and finding job ids.

The Collection class is 1174 lines and the _SearchIndexer class is 273 lines (including a helper method that I inlined, which makes it appear larger).

To-do:

  • Measure performance
    • See benchmarks below. This PR gives a 20-30% speedup for small projects and 7-15% speedup for large projects.
  • Change index format from list of dicts with an _id key like [{"a": 0, "_id": "abcdef"}] to a dict like {"abcdef": {"a": 0}}. This simplifies several areas of the code and makes _SearchIndexer's semantics dict-like.
  • Expand schema/find tests to cover all filter use cases
  • Reduce _SearchIndexer class to bare minimum amount of code
  • 100% test coverage of _index.py
  • Evaluate whether to remove _SearchIndexer and use a plain dict with helper functions?
    • This is viable. It would mean removing the public methods find and build_index, and instead defining them as free functions that act on a dict with certain assumptions about structure. Do we like this idea?
  • Remove Collection from public API
    • Planning a follow-up PR.
  • Consider moving "filter" docs to a better place
    • Investigate this in a follow-up PR when removing Collection.

Motivation and Context

Resolves #664.

Types of Changes

  • Documentation update
  • Bug fix
  • New feature
  • Breaking change1

1The change breaks (or has the potential to break) existing functionality.

Checklist:

If necessary:

  • I have updated the API documentation as part of the package doc-strings.
  • I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
  • I have updated the changelog and added all related issue and pull request numbers for future reference (if applicable). See example below.

@bdice bdice changed the base branch from master to next January 30, 2022 22:55
@codecov
Copy link

codecov bot commented Jan 30, 2022

Codecov Report

Merging #667 (752cebc) into next (c1ee933) will increase coverage by 0.01%.
The diff coverage is 90.69%.

❗ Current head 752cebc differs from pull request most recent head 6187c01. Consider uploading reports for the commit 6187c01 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##             next     #667      +/-   ##
==========================================
+ Coverage   86.06%   86.07%   +0.01%     
==========================================
  Files          51       52       +1     
  Lines        5038     5122      +84     
  Branches     1109     1131      +22     
==========================================
+ Hits         4336     4409      +73     
- Misses        498      515      +17     
+ Partials      204      198       -6     
Impacted Files Coverage Δ
signac/common/errors.py 100.00% <ø> (ø)
signac/contrib/__init__.py 100.00% <ø> (ø)
signac/contrib/errors.py 94.11% <ø> (ø)
signac/contrib/filterparse.py 88.04% <ø> (ø)
signac/contrib/migration/__init__.py 83.05% <ø> (ø)
signac/synced_collections/_caching.py 0.00% <0.00%> (ø)
signac/contrib/utility.py 66.21% <50.00%> (ø)
signac/contrib/import_export.py 77.36% <72.41%> (+0.05%) ⬆️
signac/sync.py 82.50% <75.00%> (ø)
signac/__main__.py 76.04% <77.77%> (-1.30%) ⬇️
... and 30 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c1ee933...6187c01. Read the comment docs.

@bdice
Copy link
Member Author

bdice commented Feb 8, 2022

Benchmarks below! I only compared the "find/search" benchmarks below -- all the other benchmarks are basically unchanged between master, next, and this branch.

Overall I would say the time spent searching appears to have dropped by 20-30% for lean search filters over small projects, and by 7-15% for richer search filters over larger projects. Not bad.

edit: I added benchmarks for a newer commit, which are even better. More like 30-40% gains for some queries.

Commit master (dac370a) next (d26930f) optimize-collection (37b84f1) optimize-collection (f4a05d4)
time_search_lean_filter (100) 351 us 385 us 265 us 231 us
time_search_lean_filter (1,000) 5.82 ms 6.01 ms 4.10 ms 3.83 ms
time_search_lean_filter (10,000) 156 ms 156 ms 144 ms 141 ms
time_search_rich_filter (100) 1.43 ms 1.50 ms 1.24 ms 0.945 ms
time_search_rich_filter (1,000) 25.7 ms 25.8 ms 23.5 ms 14.7 ms
time_search_rich_filter (10,000) 281 ms 278 ms 259 ms 231 ms

@bdice bdice marked this pull request as ready for review February 8, 2022 05:03
@bdice bdice requested review from a team as code owners February 8, 2022 05:03
@bdice bdice requested review from vyasr, Charlottez112 and csadorf and removed request for a team February 8, 2022 05:03
@bdice bdice self-assigned this Feb 8, 2022
Copy link
Contributor

@csadorf csadorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on!

Since this is largely a copy of the full Collection class, is the plan to remove it in a subsequent step or are we keeping both? If we keep both I think we should make an attempt at using composition to avoid the code duplication.

_PRIMARY_KEY = "_id"


class _SlimCollection(dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some ideas for a slightly better name.

Note that the return value is often called index. I think we had a similar class name somewhere else, but all of the indexing stuff will be removed, right? The full collection could use this class internally (composition).

Suggested change
class _SlimCollection(dict):
class _Index(dict):

This would express a bit more directly what the difference between this and the Collection class is:

Suggested change
class _SlimCollection(dict):
class _InMemoryCollection(dict):

The full collection class could also be based on this, right? (inheritance)

Suggested change
class _SlimCollection(dict):
class _BaseCollection(dict):

Copy link
Member Author

@bdice bdice Feb 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csadorf I appreciate these suggestions! I had a similar series of ideas -- I kept _SlimCollection for the initial round review partly because I knew the name should change but didn't want to commit to a name until reviewing its use cases and getting feedback like what you provided. I lean towards _Index, which should not conflict with any other internals now that we've removed other forms of indexing from the next branch. My only hesitation is that it has a method build_index, which returns a dict that is organized by the given key. I wonder if that method (or the class) should be named differently. Otherwise the result of _Index.build_index(key) is just a dict with a particular structure, while it sounds like it should be a class factory method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I've renamed this class to _Index.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would strongly prefer a more disambiguated name. While I agree that _Index is not currently conflicting with anything, I don't want to hobble our ability to use related names in the future. Otherwise we could run into the same issues that we now have in signac-flow where vague naming leads to ambiguity between concepts like groups, aggregates, bundles, etc. Having reviewed the code, since the critical operation is the ability to search the elements maybe it's something like a _SearchableIndex? I would definitely like to avoid the term Collection given a) the overloads associated with collections.abc (and now SyncedCollections) and b) we are no longer trying to specifically make this class behave so much like a MongoDB collection

Copy link
Member Author

@bdice bdice Feb 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree a more disambiguated name would be nice, but I was unable to come up with one that felt right. I am not opposed to _SearchableIndex but would like some consensus. I suppose something like _SearchableKeystore could also be used to avoid the term index as I previously mentioned the awkwardness of build_index returning something that is not related to the _Index class. See also: https://en.wikipedia.org/wiki/Search_data_structure

@bdice
Copy link
Member Author

bdice commented Feb 8, 2022

is the plan to remove it in a subsequent step or are we keeping both? If we keep both I think we should make an attempt at using composition to avoid the code duplication.

As I mentioned in our dev meeting today, my plan is to remove the Collection class entirely in a follow-up PR. Because I changed the class's internal data structures, it would probably be awkward to try and retain the old class while using composition or inheritance to reduce duplication.

@bdice bdice changed the title Replace Collection with lightweight _SlimCollection Replace Collection with lightweight _Index. Feb 8, 2022
@vyasr
Copy link
Contributor

vyasr commented Feb 9, 2022

I will get around to reviewing this soon, but in the meantime @bdice do you still think

Evaluate whether to remove _Index and use a plain dict with helper functions?

is viable and worth pursuing? What would be the pros and cons of doing that vs the current approach? I'm all for using the most stripped down solution that we can in 2.0. The other outstanding items in the PR description's checklist can definitely be addressed in a follow-up PR, but if there's a chance that we're going to end up removing this new class again because there's a simpler alternative it would be good to figure that out now.

@bdice
Copy link
Member Author

bdice commented Feb 9, 2022

@vyasr dict-only with free functions for find/build_index is viable. “Worth pursuing” is up for debate. Pro: no class = simpler? Con: pure dict inputs must have a particular structure that has no centralized home for documentation without a class. The new class inherits directly from dict, so the semantics of the class would not change significantly if we switched to using free functions. There is no internal state besides the data in the index.

@csadorf
Copy link
Contributor

csadorf commented Feb 9, 2022

There is no internal state besides the data in the index.

But would those indexes not be the best argument for a dedicated class to maintain that state? Building those indexes are the primary purpose of the class, no? Also, keep in mind that the Collection class was originally supposed to be a near drop-in for pymongo.Collection, that would certainly be lost at that point.

@bdice
Copy link
Member Author

bdice commented Feb 9, 2022

But would those indexes not be the best argument for a dedicated class to maintain that state?

The Collection class had support for building indices and tracking those internally in the class state. However, signac's use of the Collection was limited to two cases where the Collection was constructed, searched, and then went out of scope. As a result, the cached indices of the Collection class were never used more than once, and thus it made no sense to cache them. The logic for determining whether the index was dirty and needed to be rebuilt was unnecessary because the index was only queried once, and thus it was always "dirty" before that first query.

For simplicity, I removed key-based indices from the internal state of the _Index object. The _build_job_statepoint_index function has been refactored in this PR to store key-based indices locally in that function, rather than in the internal state of the _Index object:

indexes = {}
for _id in index.find():
doc = index[_id]
for key, _ in _nested_dicts_to_dotted_keys(doc):
if key.split(".")[0] == "sp":
indexes[key] = index.build_index(key)

For us to see any kind of benefit from re-using the key-based indices, we would need to heavily refactor the Project to use a Collection-like or _Index-like object instead of a plain dict for the statepoint cache so that the project can access those indices multiple times. We'd also have to track the "dirty" state whenever jobs are added or removed. The logic would be somewhat complicated and I'm not convinced that is worth the effort, since it would only benefit "find" and "schema" operations, which are generally not our bottlenecks.

@bdice bdice force-pushed the optimize-collection branch from 047333f to c485021 Compare February 11, 2022 23:39
@bdice bdice requested a review from csadorf February 11, 2022 23:41
@bdice
Copy link
Member Author

bdice commented Feb 11, 2022

@csadorf @vyasr This is ready for review. I would like to keep the current PR scope and hold off on switching to a pure dict with free functions for the build/find features. If you think that using a plain dict (rather than a subclass) is an improvement, please say so, but I would like to do that in a follow-up PR rather than this one.

@bdice bdice added enhancement New feature or request refactor Code refactoring labels Feb 11, 2022
@bdice bdice force-pushed the optimize-collection branch from 752cebc to 6187c01 Compare February 28, 2022 03:47
@bdice bdice enabled auto-merge (squash) February 28, 2022 04:04
@bdice bdice disabled auto-merge February 28, 2022 04:04
@bdice bdice changed the title Replace Collection with lightweight _Index. Replace Collection with lightweight _SearchIndexer. Feb 28, 2022
@bdice bdice enabled auto-merge (squash) February 28, 2022 04:06
@bdice bdice disabled auto-merge February 28, 2022 04:10
@bdice bdice merged commit 690457d into next Feb 28, 2022
@bdice bdice deleted the optimize-collection branch February 28, 2022 04:10
vyasr added a commit that referenced this pull request Mar 14, 2022
* Add optimized _SlimCollection.

* Further minimize _SlimCollection.

* Remove unused methods from _SlimCollection.

* Make _SlimCollection a dict.

* Add tests.

* Add test for invalid filter.

* Change _SlimCollection to dict initialization semantics, remove _id from document in favor of being the dict key.

* Update _SlimCollection tests to use dict semantics.

* Improve docs, move build_index to method.

* Update error handling.

* Add test coverage to Collection for invalid filter.

* Remove handling of unexpected errors.

This exception handling was introduced in bfb7a27 but its purpose is not clear.

* Remove if statement that is impossible to reach.

* Remove optimization note about searching by primary key.

* Add early exit for logical expressions.

* Rename _SlimCollection to _Index.

* Update docs.

* Update docs.

* Refactoring/PR comments.

* More refactoring/PR comments.

* Simplify error check.

* Use tighter logic in build_index.

* Fix test name.

* Update signac/contrib/_index.py

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Use copy and pop optimization.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Revise documentation.

* Use isinstance when True is expected for micro-optimization.

* Remove extraneous helper function and unnecessary data copy operation.

* Apply suggestions from code review

* Rename to _SearchIndexer.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
vyasr added a commit that referenced this pull request Apr 19, 2022
* Add optimized _SlimCollection.

* Further minimize _SlimCollection.

* Remove unused methods from _SlimCollection.

* Make _SlimCollection a dict.

* Add tests.

* Add test for invalid filter.

* Change _SlimCollection to dict initialization semantics, remove _id from document in favor of being the dict key.

* Update _SlimCollection tests to use dict semantics.

* Improve docs, move build_index to method.

* Update error handling.

* Add test coverage to Collection for invalid filter.

* Remove handling of unexpected errors.

This exception handling was introduced in bfb7a27 but its purpose is not clear.

* Remove if statement that is impossible to reach.

* Remove optimization note about searching by primary key.

* Add early exit for logical expressions.

* Rename _SlimCollection to _Index.

* Update docs.

* Update docs.

* Refactoring/PR comments.

* More refactoring/PR comments.

* Simplify error check.

* Use tighter logic in build_index.

* Fix test name.

* Update signac/contrib/_index.py

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Use copy and pop optimization.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Revise documentation.

* Use isinstance when True is expected for micro-optimization.

* Remove extraneous helper function and unnecessary data copy operation.

* Apply suggestions from code review

* Rename to _SearchIndexer.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
vyasr added a commit that referenced this pull request Apr 21, 2022
* Add optimized _SlimCollection.

* Further minimize _SlimCollection.

* Remove unused methods from _SlimCollection.

* Make _SlimCollection a dict.

* Add tests.

* Add test for invalid filter.

* Change _SlimCollection to dict initialization semantics, remove _id from document in favor of being the dict key.

* Update _SlimCollection tests to use dict semantics.

* Improve docs, move build_index to method.

* Update error handling.

* Add test coverage to Collection for invalid filter.

* Remove handling of unexpected errors.

This exception handling was introduced in bfb7a27 but its purpose is not clear.

* Remove if statement that is impossible to reach.

* Remove optimization note about searching by primary key.

* Add early exit for logical expressions.

* Rename _SlimCollection to _Index.

* Update docs.

* Update docs.

* Refactoring/PR comments.

* More refactoring/PR comments.

* Simplify error check.

* Use tighter logic in build_index.

* Fix test name.

* Update signac/contrib/_index.py

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Use copy and pop optimization.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Revise documentation.

* Use isinstance when True is expected for micro-optimization.

* Remove extraneous helper function and unnecessary data copy operation.

* Apply suggestions from code review

* Rename to _SearchIndexer.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
vyasr added a commit that referenced this pull request May 2, 2022
* Add optimized _SlimCollection.

* Further minimize _SlimCollection.

* Remove unused methods from _SlimCollection.

* Make _SlimCollection a dict.

* Add tests.

* Add test for invalid filter.

* Change _SlimCollection to dict initialization semantics, remove _id from document in favor of being the dict key.

* Update _SlimCollection tests to use dict semantics.

* Improve docs, move build_index to method.

* Update error handling.

* Add test coverage to Collection for invalid filter.

* Remove handling of unexpected errors.

This exception handling was introduced in bfb7a27 but its purpose is not clear.

* Remove if statement that is impossible to reach.

* Remove optimization note about searching by primary key.

* Add early exit for logical expressions.

* Rename _SlimCollection to _Index.

* Update docs.

* Update docs.

* Refactoring/PR comments.

* More refactoring/PR comments.

* Simplify error check.

* Use tighter logic in build_index.

* Fix test name.

* Update signac/contrib/_index.py

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Use copy and pop optimization.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Revise documentation.

* Use isinstance when True is expected for micro-optimization.

* Remove extraneous helper function and unnecessary data copy operation.

* Apply suggestions from code review

* Rename to _SearchIndexer.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
bdice added a commit that referenced this pull request Jun 14, 2022
* Add optimized _SlimCollection.

* Further minimize _SlimCollection.

* Remove unused methods from _SlimCollection.

* Make _SlimCollection a dict.

* Add tests.

* Add test for invalid filter.

* Change _SlimCollection to dict initialization semantics, remove _id from document in favor of being the dict key.

* Update _SlimCollection tests to use dict semantics.

* Improve docs, move build_index to method.

* Update error handling.

* Add test coverage to Collection for invalid filter.

* Remove handling of unexpected errors.

This exception handling was introduced in bfb7a27 but its purpose is not clear.

* Remove if statement that is impossible to reach.

* Remove optimization note about searching by primary key.

* Add early exit for logical expressions.

* Rename _SlimCollection to _Index.

* Update docs.

* Update docs.

* Refactoring/PR comments.

* More refactoring/PR comments.

* Simplify error check.

* Use tighter logic in build_index.

* Fix test name.

* Update signac/contrib/_index.py

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Use copy and pop optimization.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Revise documentation.

* Use isinstance when True is expected for micro-optimization.

* Remove extraneous helper function and unnecessary data copy operation.

* Apply suggestions from code review

* Rename to _SearchIndexer.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
bdice added a commit that referenced this pull request Aug 1, 2022
* Add optimized _SlimCollection.

* Further minimize _SlimCollection.

* Remove unused methods from _SlimCollection.

* Make _SlimCollection a dict.

* Add tests.

* Add test for invalid filter.

* Change _SlimCollection to dict initialization semantics, remove _id from document in favor of being the dict key.

* Update _SlimCollection tests to use dict semantics.

* Improve docs, move build_index to method.

* Update error handling.

* Add test coverage to Collection for invalid filter.

* Remove handling of unexpected errors.

This exception handling was introduced in bfb7a27 but its purpose is not clear.

* Remove if statement that is impossible to reach.

* Remove optimization note about searching by primary key.

* Add early exit for logical expressions.

* Rename _SlimCollection to _Index.

* Update docs.

* Update docs.

* Refactoring/PR comments.

* More refactoring/PR comments.

* Simplify error check.

* Use tighter logic in build_index.

* Fix test name.

* Update signac/contrib/_index.py

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Use copy and pop optimization.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Revise documentation.

* Use isinstance when True is expected for micro-optimization.

* Remove extraneous helper function and unnecessary data copy operation.

* Apply suggestions from code review

* Rename to _SearchIndexer.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
bdice added a commit that referenced this pull request Oct 7, 2022
* Add optimized _SlimCollection.

* Further minimize _SlimCollection.

* Remove unused methods from _SlimCollection.

* Make _SlimCollection a dict.

* Add tests.

* Add test for invalid filter.

* Change _SlimCollection to dict initialization semantics, remove _id from document in favor of being the dict key.

* Update _SlimCollection tests to use dict semantics.

* Improve docs, move build_index to method.

* Update error handling.

* Add test coverage to Collection for invalid filter.

* Remove handling of unexpected errors.

This exception handling was introduced in bfb7a27 but its purpose is not clear.

* Remove if statement that is impossible to reach.

* Remove optimization note about searching by primary key.

* Add early exit for logical expressions.

* Rename _SlimCollection to _Index.

* Update docs.

* Update docs.

* Refactoring/PR comments.

* More refactoring/PR comments.

* Simplify error check.

* Use tighter logic in build_index.

* Fix test name.

* Update signac/contrib/_index.py

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Use copy and pop optimization.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Revise documentation.

* Use isinstance when True is expected for micro-optimization.

* Remove extraneous helper function and unnecessary data copy operation.

* Apply suggestions from code review

* Rename to _SearchIndexer.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
bdice added a commit that referenced this pull request Oct 27, 2022
* Add optimized _SlimCollection.

* Further minimize _SlimCollection.

* Remove unused methods from _SlimCollection.

* Make _SlimCollection a dict.

* Add tests.

* Add test for invalid filter.

* Change _SlimCollection to dict initialization semantics, remove _id from document in favor of being the dict key.

* Update _SlimCollection tests to use dict semantics.

* Improve docs, move build_index to method.

* Update error handling.

* Add test coverage to Collection for invalid filter.

* Remove handling of unexpected errors.

This exception handling was introduced in bfb7a27 but its purpose is not clear.

* Remove if statement that is impossible to reach.

* Remove optimization note about searching by primary key.

* Add early exit for logical expressions.

* Rename _SlimCollection to _Index.

* Update docs.

* Update docs.

* Refactoring/PR comments.

* More refactoring/PR comments.

* Simplify error check.

* Use tighter logic in build_index.

* Fix test name.

* Update signac/contrib/_index.py

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Use copy and pop optimization.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Revise documentation.

* Use isinstance when True is expected for micro-optimization.

* Remove extraneous helper function and unnecessary data copy operation.

* Apply suggestions from code review

* Rename to _SearchIndexer.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
vyasr added a commit that referenced this pull request Oct 30, 2022
* Add optimized _SlimCollection.

* Further minimize _SlimCollection.

* Remove unused methods from _SlimCollection.

* Make _SlimCollection a dict.

* Add tests.

* Add test for invalid filter.

* Change _SlimCollection to dict initialization semantics, remove _id from document in favor of being the dict key.

* Update _SlimCollection tests to use dict semantics.

* Improve docs, move build_index to method.

* Update error handling.

* Add test coverage to Collection for invalid filter.

* Remove handling of unexpected errors.

This exception handling was introduced in bfb7a27 but its purpose is not clear.

* Remove if statement that is impossible to reach.

* Remove optimization note about searching by primary key.

* Add early exit for logical expressions.

* Rename _SlimCollection to _Index.

* Update docs.

* Update docs.

* Refactoring/PR comments.

* More refactoring/PR comments.

* Simplify error check.

* Use tighter logic in build_index.

* Fix test name.

* Update signac/contrib/_index.py

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Use copy and pop optimization.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

* Revise documentation.

* Use isinstance when True is expected for micro-optimization.

* Remove extraneous helper function and unnecessary data copy operation.

* Apply suggestions from code review

* Rename to _SearchIndexer.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request refactor Code refactoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize Collection for internal use in Project
3 participants