
Searching within a subset of documents #227

Answered by detaos
agmontpetit asked this question in Q&A

The first part is easy. The passages you pass to the indexer either already have IDs from the TSV or are given a sequential ID.

TSV format we use:

    [ID] \t [Passage Text] \t [Passage title / other metadata]

So you don't get the passage IDs from the indexer; you give them to the indexer. That way you already know them.
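
A minimal sketch of the "you give the IDs to the indexer" idea, assuming the three-column TSV layout above and an indexer that reads one passage per line. The helpers `write_collection` and `read_collection` are hypothetical, not part of any indexer's API. IDs are assigned sequentially from 0 in file order, which is also the safest choice if your indexer expects the ID column to match the line number.

```python
import os
import tempfile
from typing import Dict, List, Tuple


def _clean(field: str) -> str:
    # Tabs and newlines would break the one-passage-per-line TSV layout,
    # so collapse all whitespace runs into single spaces before writing.
    return " ".join(field.split())


def write_collection(path: str, passages: List[Tuple[str, str]]) -> Dict[int, str]:
    """Write (text, title) pairs as `ID<TAB>text<TAB>title` lines.

    IDs are assigned sequentially from 0 in file order; the returned dict
    maps each ID to its (cleaned) title, so the caller knows every ID up front.
    """
    id_to_title: Dict[int, str] = {}
    with open(path, "w", encoding="utf-8") as f:
        for pid, (text, title) in enumerate(passages):
            clean_title = _clean(title)
            f.write(f"{pid}\t{_clean(text)}\t{clean_title}\n")
            id_to_title[pid] = clean_title
    return id_to_title


def read_collection(path: str) -> Dict[int, Tuple[str, str]]:
    """Read the TSV back into {ID: (text, title)}."""
    collection: Dict[int, Tuple[str, str]] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            # maxsplit=2 keeps any extra metadata in the third column intact.
            pid, text, title = line.rstrip("\n").split("\t", 2)
            collection[int(pid)] = (text, title)
    return collection


# Usage: the IDs are known before indexing ever starts.
passages = [("A passage about indexing.", "Intro"), ("Filtering\tby metadata.", "Guide")]
path = os.path.join(tempfile.mkdtemp(), "collection.tsv")
ids = write_collection(path, passages)
print(ids)
print(read_collection(path)[1])
```

Because the IDs are written by you, the same numbers can key whatever page metadata you later use for subsetting.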

The second part is best done with filtering. I'm assuming you have some way to define the subset; we do that with page metadata. Here's the search call that passes a filtering function:

        results = searcher.search(query.query, k=query.k, filter_fn=lambda pids: torch.tensor(
            [index for index in pids.numpy().tolist() if keepResult(query.conditions, index)], …
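
The answer doesn't show `keepResult`. Below is one hypothetical way to write it, assuming page metadata is held in a dict keyed by passage ID (for example, parsed from the TSV's third column) and that `query.conditions` is a dict of metadata fields that must match exactly. The names `make_keep_result` and `filter_pids` are illustrative only. It is plain Python so it runs without torch; inside `filter_fn`, the resulting list is what gets wrapped in `torch.tensor(...)`.

```python
from typing import Any, Callable, Iterable, List, Mapping

# Hypothetical metadata store: passage ID -> page metadata.
Metadata = Mapping[int, Mapping[str, Any]]
KeepFn = Callable[[Mapping[str, Any], int], bool]


def make_keep_result(metadata: Metadata) -> KeepFn:
    """Build a keepResult(conditions, pid) predicate over `metadata`."""

    def keep_result(conditions: Mapping[str, Any], pid: int) -> bool:
        meta = metadata.get(pid)
        if meta is None:
            # Passage with no metadata: leave it out of the subset.
            return False
        # Keep the passage only if every condition matches its metadata exactly.
        return all(meta.get(key) == value for key, value in conditions.items())

    return keep_result


def filter_pids(pids: Iterable[int], conditions: Mapping[str, Any], keep_result: KeepFn) -> List[int]:
    """Same list comprehension as in filter_fn, minus the tensor conversion."""
    return [pid for pid in pids if keep_result(conditions, pid)]


metadata = {
    0: {"site": "docs", "lang": "en"},
    1: {"site": "blog", "lang": "en"},
    2: {"site": "docs", "lang": "fr"},
}
keep_result = make_keep_result(metadata)
print(filter_pids([0, 1, 2, 3], {"site": "docs"}, keep_result))  # [0, 2]
print(filter_pids([0, 1, 2, 3], {"site": "docs", "lang": "en"}, keep_result))  # [0]
```

Unknown IDs are dropped rather than kept, so a stale metadata table shrinks results instead of leaking passages from outside the subset. Also note that `torch.tensor([])` defaults to a floating-point dtype, so if the filtered list can be empty it may be worth passing an integer `dtype` explicitly when converting back.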


Answer selected by agmontpetit