Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zoekt-webserver: add JSON API #414

Merged
merged 1 commit into from
Sep 9, 2022
Merged

Conversation

isker
Copy link
Contributor

@isker isker commented Aug 25, 2022

This adds a JSON API that directly exposes the Search and List functions
at /api/search and /api/list in zoekt-webserveer, gated behind the
existing -rpc flag.

Closes #410.

Copy link
Contributor Author

@isker isker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keegancsmith if this looks generally acceptable to you, I can write actual test cases. I have done simple tests locally on my zoekt checkout:

λ echo -n '{"Q": "foobar"}' | xh post localhost:6070/api/search | jq -r .Result.Files[0].LineMatches[0].Line | base64 -d
	in := "[a-zA-Z]fooBAR"⏎   

rpc/internal/srv/srv.go Outdated Show resolved Hide resolved
// We need a different type for JSON requests because the query needs to be
// parsed.
type JSONListArgs struct {
Q string
Copy link
Contributor Author

@isker isker Aug 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not downcased these (e.g. Q -> q) with JSON tags because it'd be inconsistent for these to be downcased but zoekt.ListOptions or zoekt.SearchOptions not to be (and so on, recursively). I can of course downcase all of that if that's the right thing to do.

web/server.go Outdated
@@ -172,7 +172,8 @@ func NewMux(s *Server) (*http.ServeMux, error) {
mux.HandleFunc("/print", s.servePrint)
}
if s.RPC {
mux.Handle(rpc.DefaultRPCPath, rpc.Server(traceAwareSearcher{s.Searcher})) // /rpc
mux.Handle(rpc.DefaultRPCPath, rpc.Server(traceAwareSearcher{s.Searcher})) // /rpc
mux.Handle("/api/", http.StripPrefix("/api", rpc.JSONServer(traceAwareSearcher{s.Searcher})))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshedding on URL paths welcome.

@keegancsmith keegancsmith requested a review from a team August 26, 2022 07:12
@isker
Copy link
Contributor Author

isker commented Aug 31, 2022

Any thoughts on this PR? I'd ping the "Seach core" team @keegancsmith assigned but it seems like it's private or something else prevents me from doing so 🌞.

@stefanhengl
Copy link
Member

I am going to take a look. I had a brief look already but I won't be able to finish my review today.

Copy link
Member

@stefanhengl stefanhengl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this! I think the way to go is to extend serveSearch rather than creating a separate API. For example, we could change serveSearchErr to return zoekt.SearchResult instead of ApiSearchResult and, in serveSearch, switch based on the requested format. Does this make sense?

I imagine as part of this refactor you have to pull out a converter zoekt.SearchResult -> ApiSearchResult from serveSearchErr and call it in serveSearch.

To maintain backward compatibility we would have to keep the existing "json" as is (we know there are users relying on this) and add another format, EG "json2", "json_new" (I obviously have no imagination when it comes to names :-)).

WDYT?

rpc/internal/srv/srv.go Outdated Show resolved Hide resolved
@isker
Copy link
Contributor Author

isker commented Sep 1, 2022

@stefanhengl this is possible to do, I think, but there are some questions to be answered. I didn't implement this in serveSearchErr to start with because it has a lot of strange behaviors compared to the gob API that I wanted to avoid.

For example, it takes an entirely different behavior if it thinks you're only querying for repos:

zoekt/web/server.go

Lines 245 to 264 in 2478bf5

repoOnly := true
query.VisitAtoms(q, func(q query.Q) {
_, ok := q.(*query.Repo)
repoOnly = repoOnly && ok
})
if repoOnly {
repos, err := s.serveListReposErr(q, queryStr, r)
if err == nil {
return &ApiSearchResult{Repos: repos}, nil
}
return nil, err
}
if qt, ok := q.(*query.Type); ok && qt.Type == query.TypeRepo {
repos, err := s.serveListReposErr(q, queryStr, r)
if err == nil {
return &ApiSearchResult{Repos: repos}, nil
}
return nil, err
}

There's no such behavior in the gob API.

It goes from there to constrain or modify the request in ways that seem tightly fitted toward satisfying the HTML interface of the webserver. It really does feel like the existing JSON API was bolted onto this function after the fact. I'm aiming for a clean break, which is why I have straightforwardly exposed the request and response API objects in this PR.

Additionally, you described how we might refactor to vary the response format, but what do we do about the request format? Are you suggesting to not expose zoekt.SearchOptions in this API, instead just accepting the relatively limited inputs serveSearchErr does today? Or should our new format value also fork request parsing behavior?

Consider that serveSearchErr serves GET requests with a query string; I'd want the new API to be served over POST with a JSON body, both for HTTP semantic reasons (this really is a remote procedure call that does not fit into the model of idempotency implied by GET), and, much more importantly, for practical implementation reasons (zoekt.SearchOptions is not flat, and so is not trivially parsable out of a query string).

@stefanhengl
Copy link
Member

Ok. I suggest to create a separate package json at the root of the repository. This way you loose all your JSON prefixes and the code stays nice and contained. You can move all your code from rpc to there.

As you have seen, we have two other API routes, rpc and stream, which serve the same content. I am not sure if /api is most intuitive route for the JSON API. Maybe @keegancsmith has a better idea?

@isker
Copy link
Contributor Author

isker commented Sep 5, 2022

I've moved the JSON endpoint into its own package as suggested. I also added a test mirroring that in rpc.

The only functional change is that I've relaxed the /list endpoint to allow a missing query, such that all repos will be returned. I think it makes sense to want to list all repos this way, though the equivalent in /search remains nonsense.

Copy link
Member

@keegancsmith keegancsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks so much. I'll let @stefanhengl take a look over before merging in.

json/json.go Show resolved Hide resolved
return
}
if searchArgs.Opts == nil {
searchArgs.Opts = &zoekt.SearchOptions{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: in the webserver handler it does some adjustments to SearchOptions based on the number of results it expects to find which is pretty neat. This may be useful behaviour to have in this handler if opts is not set:

zoekt/web/server.go

Lines 290 to 316 in d1964a3

ctx := r.Context()
if result, err := s.Searcher.Search(ctx, q, &zoekt.SearchOptions{EstimateDocCount: true}); err != nil {
return nil, err
} else if numdocs := result.ShardFilesConsidered; numdocs > 10000 {
// If the search touches many shards and many files, we
// have to limit the number of matches. This setting
// is based on the number of documents eligible after
// considering reponames, so large repos (both
// android, chromium are about 500k files) aren't
// covered fairly.
// 10k docs, 50 num -> max match = (250 + 250 / 10)
sOpts.ShardMaxMatchCount = num*5 + (5*num)/(numdocs/1000)
// 10k docs, 50 num -> max important match = 4
sOpts.ShardMaxImportantMatch = num/20 + num/(numdocs/500)
} else {
// Virtually no limits for a small corpus; important
// matches are just as expensive as normal matches.
n := numdocs + num*100
sOpts.ShardMaxImportantMatch = n
sOpts.ShardMaxMatchCount = n
sOpts.TotalMaxMatchCount = n
sOpts.TotalMaxImportantMatch = n
}
sOpts.MaxDocDisplayCount = num
sOpts.DebugScore = debugScore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explicitly imitated what rpc does (or does not) do when implementing this endpoint, instead of imitating the webserver. Am I correct in saying that rpc does not do this? Does sourcegraph (as clients of rpc/stream) implement something similar, or?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup RPC does not do this, so on sourcegraph's side we implement that behaviour. So doing it will improve the ergonomics for the client of this API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted a shared function that's used by both the new JSON endpoint and the webserver. I put it in the new json package, but if you think there's a better place, let me know.

web/server.go Show resolved Hide resolved
Copy link
Member

@stefanhengl stefanhengl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

json/json.go Outdated Show resolved Hide resolved
This adds a JSON API that directly exposes the Search and List functions
at /api/search and /api/list in zoekt-webserveer, gated behind the
existing -rpc flag.

Closes sourcegraph#410.
@stefanhengl stefanhengl merged commit c70c729 into sourcegraph:main Sep 9, 2022
@isker
Copy link
Contributor Author

isker commented Sep 9, 2022

Thanks to you both!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RPC: support non-gob encodings?
3 participants