
New results #399


Merged
erikbern merged 4 commits into main from erikbern/new-results on Apr 29, 2023

Conversation

@erikbern (Owner) commented Apr 27, 2023

See new plots

@erikbern changed the title from "New resutls" to "New results" on Apr 27, 2023
@erikbern mentioned this pull request on Apr 27, 2023
@erikbern (Owner, Author)

Will add a couple more datasets to this shortly!

@thomasahle (Contributor)

Oh no, looks like fast_pq/tinyknn regressed substantially since #363 (comment) and is completely broken on mnist.
Are these new runs final? Or do we still have time to fix bugs?

@erikbern (Owner, Author)

> Oh no, looks like fast_pq/tinyknn regressed substantially since #363 (comment) and is completely broken on mnist.
> Are these new runs final? Or do we still have time to fix bugs?

If you fix it in the next few hours, then I can re-run it. But I want to tear down the r6.16xlarge instance soon (it's ~$100/day)

@erikbern mentioned this pull request on Apr 27, 2023
@thomasahle (Contributor)

Is the run actually using the latest ann-benchmarks code? It seems the string fast_pq should have been replaced in the repo

@erikbern (Owner, Author)

> Is the run actually using the latest ann-benchmarks code? It seems the string fast_pq should have been replaced in the repo

I just realized I haven't re-run install.py in the last few days. Let me wipe the fast_pq data and re-run all benchmarks (will also wipe opensearchknn).

@thomasahle (Contributor)

> Oh no, looks like fast_pq/tinyknn regressed substantially since #363 (comment) and is completely broken on mnist.
> Are these new runs final? Or do we still have time to fix bugs?

> If you fix it in the next few hours, then I can re-run it. But I want to tear down the r6.16xlarge instance soon (it's ~$100/day)

I found the problem. Just testing my fix now. Will push it asap.

@thomasahle (Contributor) commented Apr 27, 2023

It's pushed. Building the Docker image now should git clone the newest version.

@erikbern (Owner, Author)

Pushed latest results. Going to run gist-960-euclidean as well just to have a high dimensional dataset that's a bit larger. It will probably take 10h.

@WPJiang commented Apr 28, 2023

> Pushed latest results. Going to run gist-960-euclidean as well just to have a high dimensional dataset that's a bit larger. It will probably take 10h.

Hi @erikbern, there is only one line (tinyknn) in the new glove-25 result; would you please check it?

@erikbern (Owner, Author)

> Hi @erikbern, there is only one line (tinyknn) in the new glove-25 result; would you please check it?

You're right, I accidentally ran glove-25 only for tinyknn. Let me run it for all algos.

@erikbern mentioned this pull request on Apr 28, 2023
@thomasahle (Contributor)

It's strange. This is what I get when I spin up an r6i.8xlarge myself and run tinyknn on sift. Maybe I pushed the update too late, after you had already rerun install.py? Or maybe there's some other issue I can't think of...
[attached plot: sift-128-euclidean]
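
For reference, a single-algorithm run like this can be reproduced roughly as follows. This is a hedged sketch assuming the standard run.py flags for selecting a dataset and an algorithm; it is not a verbatim command from this thread.

```python
# Hedged sketch: rerun one algorithm on one dataset via the benchmark runner.
# The flags are assumed to match run.py's command-line interface.
import subprocess

subprocess.run(
    [
        "python", "run.py",
        "--dataset", "sift-128-euclidean",
        "--algorithm", "tinyknn",
    ],
    check=True,
)
```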

@erikbern (Owner, Author)

It's possible the image didn't rebuild because of Docker layer caching. I didn't check closely. I'll rerun soon though.
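
One way to rule out layer caching is to rebuild the algorithm image with caching disabled. Below is a minimal docker-py sketch of that idea; the build context, Dockerfile name, and image tag are illustrative assumptions, not the project's actual install script.

```python
# Hedged sketch: force a clean rebuild so a cached layer can't mask a newer `git clone`.
# Paths, Dockerfile name, and tag are assumptions for illustration only.
import docker

client = docker.from_env()
image, build_log = client.images.build(
    path=".",                          # assumed build context (repo root)
    dockerfile="Dockerfile.tinyknn",   # hypothetical per-algorithm Dockerfile
    tag="ann-benchmarks-tinyknn",      # hypothetical image tag
    nocache=True,                      # ignore all cached layers
)
for chunk in build_log:
    if "stream" in chunk:
        print(chunk["stream"], end="")
```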

@thomasahle (Contributor)

If the Docker container times out during querying, is the data gathered still saved? Or is saving the data part of the job being run inside the Docker container, so it gets lost? In the latter case I can try to reduce the number of query_args to prevent timeouts.
It seems to me that the build-args/groups each have their own 7200s timeout, so reducing them probably won't make a difference?

@erikbern (Owner, Author)

The process running inside the container saves the data to a directory that's mounted into the container from the outside – so if the container is killed after 2h, any data saved up to that point is kept.

Generally the 2h timeout happens during index building though, not afterwards during query processing. That step is usually pretty fast.
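
To illustrate the setup described above, here is a rough docker-py sketch; the image name, mount paths, entrypoint arguments, and the 7200s figure are assumptions for illustration, not the project's actual runner code. The key point is that results are written to a bind mount on the host, so they survive the container being killed at the timeout.

```python
# Hedged sketch: results live on a host bind mount, so a timeout/kill does not lose
# whatever was written before it. All names and paths below are illustrative.
import os
import docker

client = docker.from_env()
container = client.containers.run(
    "ann-benchmarks-tinyknn",                     # hypothetical image name
    ["--dataset", "sift-128-euclidean"],          # hypothetical entrypoint arguments
    volumes={
        os.path.abspath("results"): {"bind": "/home/app/results", "mode": "rw"},
        os.path.abspath("data"): {"bind": "/home/app/data", "mode": "rw"},
    },
    detach=True,
)
try:
    container.wait(timeout=2 * 3600)  # ~2h budget for this run
except Exception:
    container.kill()                  # partial results already under results/ are kept
```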

@WPJiang commented Apr 28, 2023

> It's possible the image didn't rebuild because of Docker layer caching. I didn't check closely. I'll rerun soon though.
I think it's possible because the result file for glove-25 already existed when you ran it, which can be checked by the file date. The following code in main.py will cause the execution of this configuration to be skipped:
```python
for query_arguments in query_argument_groups:
    fn = get_result_filename(args.dataset, args.count, definition, query_arguments, args.batch)
    if args.force or not os.path.exists(fn):
        not_yet_run.append(query_arguments)
```
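
If a stale file is the culprit, passing --force (per args.force above) makes main.py rerun the configuration anyway; alternatively the stale result files can be deleted so the check no longer skips them. A hedged sketch of the latter, assuming results are stored under a results/ directory named after the dataset:

```python
# Hedged sketch: delete stale tinyknn results for the glove-25 dataset so the
# skip check above no longer fires. The directory layout is an assumption.
import glob
import os

for fn in glob.glob("results/glove-25*/**/*tinyknn*", recursive=True):
    if os.path.isfile(fn):
        print("removing stale result:", fn)
        os.remove(fn)
```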

@thomasahle (Contributor)

> Generally the 2h timeout happens during index building though

It seems some of the last args were taking more than half an hour, so I trimmed the list down. Now none of the rounds should take more than a minute or two.
#406

@erikbern (Owner, Author)

Merging this for now, but I'm planning to run this again in just a week or two. Will also polish the graphs a bit. But I don't want perfection to get in the way of getting something updated out. Won't promote this for now.

@erikbern merged commit 3d1b2e8 into main on Apr 29, 2023
@erikbern deleted the erikbern/new-results branch on April 29, 2023 at 03:38