Fix typo #1

Merged
merged 1 commit into from
Oct 21, 2019
24 changes: 12 additions & 12 deletions README.md
@@ -105,12 +105,12 @@ Input data format <a name="input"></a>

The input file should list all completions in
*lexicographical* order.
- For example, see the the file `test_data/trec05_efficiency_queries/trec05_efficiency_queries.completions`.
+ For example, see the the file `test_data/trec_05_efficiency_queries/trec_05_efficiency_queries.completions`.

The first column represent the
ID of the completion; the other columns contain the
tokens separated by white spaces.
- (The IDs for the file `trec05_efficiency_queries.completions` are
+ (The IDs for the file `trec_05_efficiency_queries.completions` are
fake, i.e., they do not take into account any
particular assignment.)

@@ -119,49 +119,49 @@ preparing the datasets for indexing:

1. The command

- $ extract_dict.py trec05_efficiency_queries/trec05_efficiency_queries.completions
+ $ extract_dict.py trec_05_efficiency_queries/trec_05_efficiency_queries.completions

extract the dictionary
from a file listing all completions in textual form.

2. The command

- $ python map_dataset.py trec05_efficiency_queries/trec05_efficiency_queries.completions
+ $ python map_dataset.py trec_05_efficiency_queries/trec_05_efficiency_queries.completions

maps strings to integer ids.

3. The command

- $ python build_stats.py trec05_efficiency_queries/trec05_efficiency_queries.completions.mapped
+ $ python build_stats.py trec_05_efficiency_queries/trec_05_efficiency_queries.completions.mapped

calulcates the dataset statistics.

4. The command

- $ python build_inverted_and_forward.py trec05_efficiency_queries/trec05_efficiency_queries.completions
+ $ python build_inverted_and_forward.py trec_05_efficiency_queries/trec_05_efficiency_queries.completions

builds the inverted and forward files.

If you run the scripts in the reported order, you will get:

- - `trec05_efficiency_queries.completions.dict`: lists all the distinct
+ - `trec_05_efficiency_queries.completions.dict`: lists all the distinct
tokens in the completions sorted in lexicographical
order.

- - `trec05_efficiency_queries.completions.mapped`: lists all completions
+ - `trec_05_efficiency_queries.completions.mapped`: lists all completions
whose tokens have been mapped to integer ids
as assigned by a lexicographically-sorted
string dictionary (that should be built from the
- tokens listed in `trec05_efficiency_queries.completions.dict`).
+ tokens listed in `trec_05_efficiency_queries.completions.dict`).
Each completion terminates with the id `0`.

- - `trec05_efficiency_queries.completions.mapped.stats` contains some
+ - `trec_05_efficiency_queries.completions.mapped.stats` contains some
statistics about the datasets, needed to build
the data structures more efficiently.

- `trec05_efficiency_queries.completions.inverted` is the inverted file.

- - `trec05_efficiency_queries.completions.forward` is the forward file. Note that each list is *not* sorted, thus the lists are the same as the ones contained in `trec05_efficiency_queries.completions.mapped` but sorted in docID order.
+ - `trec_05_efficiency_queries.completions.forward` is the forward file. Note that each list is *not* sorted, thus the lists are the same as the ones contained in `trec_05_efficiency_queries.completions.mapped` but sorted in docID order.

Benchmarks <a name="benchmarks"></a>
----------
@@ -174,4 +174,4 @@ Live demo <a name="demo"></a>
----------

Start the web server with the program `./web_server <port> <index_filename>` and access the demo at
- `localhost:<port>`.
+ `localhost:<port>`.
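
Reviewer note: the README text touched by this diff describes the input format (first column is the completion ID, remaining columns are whitespace-separated tokens, completions listed in lexicographical order) and the `.mapped` output (tokens replaced by integer ids from a lexicographically sorted dictionary, each completion terminated by `0`). The sketch below illustrates that format in Python. It is not the code of `extract_dict.py` or `map_dataset.py`; every function name is hypothetical, and two details are assumptions rather than facts from the README: token ids start at `1` (so that `0` stays free as the terminator), and "lexicographical order" is checked on token lists.

```python
def parse_completions(lines):
    """Split each line into (completion_id, tokens).

    The first column is the ID; the other columns are the tokens.
    The ID is kept as a string because its exact type is not stated.
    """
    parsed = []
    for line in lines:
        fields = line.split()  # splits on any run of white space
        if not fields:
            continue  # skip blank lines
        parsed.append((fields[0], fields[1:]))
    return parsed


def is_lexicographically_sorted(parsed):
    """Check that completions appear in non-decreasing order.

    Assumption: order is compared on the token lists.
    """
    token_lists = [tokens for _, tokens in parsed]
    return all(a <= b for a, b in zip(token_lists, token_lists[1:]))


def build_dictionary(parsed):
    """Return all distinct tokens, sorted lexicographically."""
    return sorted({tok for _, tokens in parsed for tok in tokens})


def map_completions(parsed, dictionary):
    """Replace tokens by integer ids and terminate each completion with 0.

    Assumption: ids start at 1, following the sorted dictionary order.
    """
    token_to_id = {tok: i + 1 for i, tok in enumerate(dictionary)}
    return [[token_to_id[t] for t in tokens] + [0] for _, tokens in parsed]


if __name__ == "__main__":
    sample = ["7 a b", "3 a c", "9 b"]  # made-up data, IDs are arbitrary
    parsed = parse_completions(sample)
    dictionary = build_dictionary(parsed)
    print(is_lexicographically_sorted(parsed))
    print(dictionary)
    print(map_completions(parsed, dictionary))
```

A check like `is_lexicographically_sorted` could be run on a completions file before indexing to catch unsorted input early; whether the project's own scripts perform such a check is not stated in the diff.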