-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zoekt-webserver: add JSON API #414
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keegancsmith if this looks generally acceptable to you, I can write actual test cases. I have done simple tests locally on my zoekt checkout:
λ echo -n '{"Q": "foobar"}' | xh post localhost:6070/api/search | jq -r .Result.Files[0].LineMatches[0].Line | base64 -d
in := "[a-zA-Z]fooBAR"⏎
rpc/internal/srv/srv.go
Outdated
// We need a different type for JSON requests because the query needs to be | ||
// parsed. | ||
type JSONListArgs struct { | ||
Q string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not downcased these (e.g. Q
-> q
) with JSON tags because it'd be inconsistent for these to be downcased but zoekt.ListOptions
or zoekt.SearchOptions
not to be (and so on, recursively). I can of course downcase all of that if that's the right thing to do.
web/server.go
Outdated
@@ -172,7 +172,8 @@ func NewMux(s *Server) (*http.ServeMux, error) { | |||
mux.HandleFunc("/print", s.servePrint) | |||
} | |||
if s.RPC { | |||
mux.Handle(rpc.DefaultRPCPath, rpc.Server(traceAwareSearcher{s.Searcher})) // /rpc | |||
mux.Handle(rpc.DefaultRPCPath, rpc.Server(traceAwareSearcher{s.Searcher})) // /rpc | |||
mux.Handle("/api/", http.StripPrefix("/api", rpc.JSONServer(traceAwareSearcher{s.Searcher}))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bikeshedding on URL paths welcome.
Any thoughts on this PR? I'd ping the "Seach core" team @keegancsmith assigned but it seems like it's private or something else prevents me from doing so 🌞. |
I am going to take a look. I had a brief look already but I won't be able to finish my review today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this! I think the way to go is to extend serveSearch
rather than creating a separate API. For example, we could change serveSearchErr
to return zoekt.SearchResult
instead of ApiSearchResult
and, in serveSearch
, switch based on the requested format. Does this make sense?
I imagine as part of this refactor you have to pull out a converter zoekt.SearchResult -> ApiSearchResult
from serveSearchErr
and call it in serveSearch
.
To maintain backward compatibility we would have to keep the existing "json" as is (we know there are users relying on this) and add another format, EG "json2", "json_new" (I obviously have no imagination when it comes to names :-)).
WDYT?
@stefanhengl this is possible to do, I think, but there are some questions to be answered. I didn't implement this in For example, it takes an entirely different behavior if it thinks you're only querying for repos: Lines 245 to 264 in 2478bf5
There's no such behavior in the gob API. It goes from there to constrain or modify the request in ways that seem tightly fitted toward satisfying the HTML interface of the webserver. It really does feel like the existing JSON API was bolted onto this function after the fact. I'm aiming for a clean break, which is why I have straightforwardly exposed the request and response API objects in this PR. Additionally, you described how we might refactor to vary the response format, but what do we do about the request format? Are you suggesting to not expose Consider that |
Ok. I suggest to create a separate package As you have seen, we have two other API routes, |
I've moved the JSON endpoint into its own package as suggested. I also added a test mirroring that in rpc. The only functional change is that I've relaxed the /list endpoint to allow a missing query, such that all repos will be returned. I think it makes sense to want to list all repos this way, though the equivalent in /search remains nonsense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks so much. I'll let @stefanhengl take a look over before merging in.
return | ||
} | ||
if searchArgs.Opts == nil { | ||
searchArgs.Opts = &zoekt.SearchOptions{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: in the webserver handler it does some adjustments to SearchOptions based on the number of results it expects to find which is pretty neat. This may be useful behaviour to have in this handler if opts is not set:
Lines 290 to 316 in d1964a3
ctx := r.Context() | |
if result, err := s.Searcher.Search(ctx, q, &zoekt.SearchOptions{EstimateDocCount: true}); err != nil { | |
return nil, err | |
} else if numdocs := result.ShardFilesConsidered; numdocs > 10000 { | |
// If the search touches many shards and many files, we | |
// have to limit the number of matches. This setting | |
// is based on the number of documents eligible after | |
// considering reponames, so large repos (both | |
// android, chromium are about 500k files) aren't | |
// covered fairly. | |
// 10k docs, 50 num -> max match = (250 + 250 / 10) | |
sOpts.ShardMaxMatchCount = num*5 + (5*num)/(numdocs/1000) | |
// 10k docs, 50 num -> max important match = 4 | |
sOpts.ShardMaxImportantMatch = num/20 + num/(numdocs/500) | |
} else { | |
// Virtually no limits for a small corpus; important | |
// matches are just as expensive as normal matches. | |
n := numdocs + num*100 | |
sOpts.ShardMaxImportantMatch = n | |
sOpts.ShardMaxMatchCount = n | |
sOpts.TotalMaxMatchCount = n | |
sOpts.TotalMaxImportantMatch = n | |
} | |
sOpts.MaxDocDisplayCount = num | |
sOpts.DebugScore = debugScore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I explicitly imitated what rpc does (or does not) do when implementing this endpoint, instead of imitating the webserver. Am I correct in saying that rpc does not do this? Does sourcegraph (as clients of rpc/stream) implement something similar, or?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup RPC does not do this, so on sourcegraph's side we implement that behaviour. So doing it will improve the ergonomics for the client of this API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I extracted a shared function that's used by both the new JSON endpoint and the webserver. I put it in the new json package, but if you think there's a better place, let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This adds a JSON API that directly exposes the Search and List functions at /api/search and /api/list in zoekt-webserveer, gated behind the existing -rpc flag. Closes sourcegraph#410.
Thanks to you both! |
This adds a JSON API that directly exposes the Search and List functions
at /api/search and /api/list in zoekt-webserveer, gated behind the
existing -rpc flag.
Closes #410.