
Add support to log slow queries in Frontend #1744


Merged
merged 7 commits into from Nov 7, 2019
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -10,6 +10,7 @@
* `-ingester.max-global-series-per-metric`
* [FEATURE] Flush chunks with stale markers early with `ingester.max-stale-chunk-idle`. #1759
* [FEATURE] EXPERIMENTAL: Added new KV Store backend based on memberlist library. Components can gossip about tokens and ingester states, instead of using Consul or Etcd. #1721
* [FEATURE] Allow Query Frontend to log slow queries with `frontend.log-queries-longer-than`. #1744
* [ENHANCEMENT] Allocation improvements in adding samples to Chunk. #1706
* [ENHANCEMENT] Consul client now follows recommended practices for blocking queries wrt returned Index value. #1708
* [ENHANCEMENT] Consul client can optionally rate-limit itself during Watch (used e.g. by ring watchers) and WatchPrefix (used by HA feature) operations. Rate limiting is disabled by default. New flags added: `--consul.watch-rate-limit`, and `--consul.watch-burst-size`. #1708
1 change: 1 addition & 0 deletions docs/prometheus-frontend.yml
@@ -16,6 +16,7 @@ server:
  http_listen_port: 9091

frontend:
  log_queries_longer_than: 1s
  split_queries_by_day: true
  align_queries_with_step: true
  cache_results: true
25 changes: 21 additions & 4 deletions pkg/querier/frontend/frontend.go
@@ -4,6 +4,7 @@ import (
	"bytes"
	"context"
	"flag"
	"fmt"
	"io"
	"io/ioutil"
	"math/rand"
@@ -15,6 +16,7 @@ import (
	"github.com/NYTimes/gziphandler"
	"github.com/go-kit/kit/log"
	"github.com/go-kit/kit/log/level"
	opentracing "github.com/opentracing/opentracing-go"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
@@ -42,16 +44,18 @@ var (

// Config for a Frontend.
type Config struct {
	MaxOutstandingPerTenant int           `yaml:"max_outstanding_per_tenant"`
	CompressResponses       bool          `yaml:"compress_responses"`
	DownstreamURL           string        `yaml:"downstream"`
	LogQueriesLongerThan    time.Duration `yaml:"log_queries_longer_than"`
}

// RegisterFlags adds the flags required to config this to the given FlagSet.
func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
	f.IntVar(&cfg.MaxOutstandingPerTenant, "querier.max-outstanding-requests-per-tenant", 100, "Maximum number of outstanding requests per tenant per frontend; requests beyond this error with HTTP 429.")
	f.BoolVar(&cfg.CompressResponses, "querier.compress-http-responses", false, "Compress HTTP responses.")
	f.StringVar(&cfg.DownstreamURL, "frontend.downstream-url", "", "URL of downstream Prometheus.")
	f.DurationVar(&cfg.LogQueriesLongerThan, "frontend.log-queries-longer-than", 0, "Log queries that are slower than the specified duration. 0 to disable.")
}

// Frontend queues HTTP requests, dispatches them to backends, and handles retries
@@ -139,7 +143,20 @@ func (f *Frontend) Handler() http.Handler {
}

func (f *Frontend) handle(w http.ResponseWriter, r *http.Request) {
	userID, err := user.ExtractOrgID(r.Context())
	if err != nil {
		server.WriteError(w, err)
		return
	}

	startTime := time.Now()
	resp, err := f.roundTripper.RoundTrip(r)
	queryResponseTime := time.Since(startTime)

	if f.cfg.LogQueriesLongerThan > 0 && queryResponseTime > f.cfg.LogQueriesLongerThan {
		level.Info(f.log).Log("msg", "slow query", "org_id", userID, "url", fmt.Sprintf("http://%s", r.Host+r.RequestURI), "time_taken", queryResponseTime.String())
	}

	if err != nil {
		server.WriteError(w, err)
		return
@@ -187,7 +204,7 @@ func (c *httpgrpcHeadersCarrier) Set(key, val string) {
})
}

// RoundTripGRPC round trips a proto (instead of a HTTP request).
func (f *Frontend) RoundTripGRPC(ctx context.Context, req *ProcessRequest) (*ProcessResponse, error) {
	// Propagate trace context in gRPC too - this will be ignored if using HTTP.
	tracer, span := opentracing.GlobalTracer(), opentracing.SpanFromContext(ctx)