
Commit

Merge branch 'main' into s3_contextErr
liguozhong committed Nov 8, 2022
2 parents 5122b1f + fe49e66 commit 93f7baf
Showing 21 changed files with 355 additions and 72 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -61,6 +61,7 @@ Check the history of the branch FIXME.
* [7264](https://github.com/grafana/loki/pull/7264) **bboreham**: Chunks: decode varints directly from byte buffer, for speed.
* [7263](https://github.com/grafana/loki/pull/7263) **bboreham**: Dependencies: klauspost/compress package to v1.15.11; improves performance.
* [7270](https://github.com/grafana/loki/pull/7270) **wilfriedroset**: Add support for `username` to redis cache configuration.
* [6952](https://github.com/grafana/loki/pull/6952) **DylanGuedes**: Experimental: Introduce a new feature named stream sharding.

##### Fixes
* [7453](https://github.com/grafana/loki/pull/7453) **periklis**: Add single compactor http client for delete and gennumber clients
@@ -98,6 +99,7 @@ Check the history of the branch FIXME.
* [7414](https://github.com/grafana/loki/pull/7414) **thepalbi**: Add basic tracing support

##### Fixes
* [7394](https://github.com/grafana/loki/pull/7394) **liguozhong**: Fix issue with the Cloudflare target that caused it to stop working after it received an error in the logpull request as explained in issue https://github.com/grafana/loki/issues/6150
* [6766](https://github.com/grafana/loki/pull/6766) **kavirajk**: fix(logql): Make `LabelSampleExtractor` ignore processing the line if it doesn't contain that specific label. Fixes unwrap behavior explained in the issue https://github.com/grafana/loki/issues/6713
* [7016](https://github.com/grafana/loki/pull/7016) **chodges15**: Fix issue with dropping logs when a file based SD target's labels are updated
* [7461](https://github.com/grafana/loki/pull/7461) **MarNicGit**: Promtail: Fix collecting userdata field from Windows Event Log
2 changes: 1 addition & 1 deletion Makefile
@@ -428,7 +428,7 @@ fluent-bit-plugin:
go build $(DYN_GO_FLAGS) -buildmode=c-shared -o clients/cmd/fluent-bit/out_grafana_loki.so ./clients/cmd/fluent-bit/

fluent-bit-image:
$(SUDO) docker build -t $(IMAGE_PREFIX)/fluent-bit-plugin-loki:$(IMAGE_TAG) -f clients/cmd/fluent-bit/Dockerfile .
$(SUDO) docker build -t $(IMAGE_PREFIX)/fluent-bit-plugin-loki:$(IMAGE_TAG) --build-arg LDFLAGS="-s -w $(GO_LDFLAGS)" -f clients/cmd/fluent-bit/Dockerfile .

fluent-bit-push:
$(SUDO) $(PUSH_OCI) $(IMAGE_PREFIX)/fluent-bit-plugin-loki:$(IMAGE_TAG)
26 changes: 20 additions & 6 deletions clients/cmd/fluent-bit/Dockerfile
@@ -1,10 +1,24 @@
FROM golang:1.19.2 as build
COPY . /src/loki
WORKDIR /src/loki
RUN make clean && make BUILD_IN_CONTAINER=false fluent-bit-plugin
FROM golang:1.19.2@sha256:b850621230956a6d960d6d7cfaba6a8a2e8e245b230a928ef66aa0cfd065e229 AS builder

FROM fluent/fluent-bit:1.8
COPY --from=build /src/loki/clients/cmd/fluent-bit/out_grafana_loki.so /fluent-bit/bin
COPY . /src

WORKDIR /src

ARG LDFLAGS
ENV CGO_ENABLED=1

RUN go build \
-trimpath -ldflags "${LDFLAGS}" \
-tags netgo \
-buildmode=c-shared \
-o clients/cmd/fluent-bit/out_grafana_loki.so \
/src/clients/cmd/fluent-bit

FROM fluent/fluent-bit:1.9.9@sha256:3045036b2ef35eae09a5f40273a0f1fbd70ca4d67e80918bfd0676b16ba43a29

COPY --from=builder /src/clients/cmd/fluent-bit/out_grafana_loki.so /fluent-bit/bin
COPY clients/cmd/fluent-bit/fluent-bit.conf /fluent-bit/etc/fluent-bit.conf

EXPOSE 2020

CMD ["/fluent-bit/bin/fluent-bit", "-e","/fluent-bit/bin/out_grafana_loki.so", "-c", "/fluent-bit/etc/fluent-bit.conf"]
29 changes: 22 additions & 7 deletions clients/cmd/fluent-bit/README.md
@@ -4,25 +4,40 @@

This plugin is implemented with [Fluent Bit's Go plugin](https://github.com/fluent/fluent-bit-go) interface. It pushes logs to Loki using a GRPC connection.

> syslog and systemd input plugin have not been tested yet, feedback appreciated.
> **Warning**
> `syslog` and `systemd` input plugins have not been tested yet. Feedback is appreciated: file [an issue](https://github.com/grafana/loki/issues/new?template=bug_report.md) if you encounter any misbehavior.
## Building

Prerequisites:
**Prerequisites**

* Go 1.16+
* gcc (for cgo)

To build the output plugin library file (`out_grafana_loki.so`), you can use:
To [build](https://docs.fluentbit.io/manual/development/golang-output-plugins#build-a-go-plugin) the output plugin library file `out_grafana_loki.so`, run the following from the root directory of the Loki source code:

```bash
make fluent-bit-plugin
$ make fluent-bit-plugin
```

You can also build the docker image with the plugin pre-installed using:
You can also build the Docker image with the plugin pre-installed using:

```bash
make fluent-bit-image
$ make fluent-bit-image
```

Finally if you want to test you can use `make fluent-bit-test` to send some logs to your local Loki instance.
## Running

```bash
$ fluent-bit -e out_grafana_loki.so -c /etc/fluent-bit.conf
```

**Testing**

Issue the following command to send `/var/log` logs to your `http://localhost:3100/loki/api/` Loki instance for testing:

```bash
$ make fluent-bit-test
```

You can easily override the address by setting the `$LOKI_URL` environment variable.
2 changes: 1 addition & 1 deletion clients/pkg/promtail/client/client.go
@@ -94,7 +94,7 @@ func NewMetrics(reg prometheus.Registerer, streamLagLabels []string) *Metrics {
}, []string{HostLabel})

m.countersWithHost = []*prometheus.CounterVec{
m.encodedBytes, m.sentBytes, m.droppedBytes, m.sentEntries, m.droppedEntries,
m.encodedBytes, m.sentBytes, m.droppedBytes, m.sentEntries, m.droppedEntries, m.batchRetries,
}

streamLagLabelsMerged := []string{HostLabel, ClientLabel}
3 changes: 3 additions & 0 deletions clients/pkg/promtail/targets/cloudflare/target.go
@@ -158,6 +158,9 @@ func (t *Target) pull(ctx context.Context, start, end time.Time) error {
level.Warn(t.logger).Log("msg", "failed iterating over logs, out of cloudflare range, not retrying", "err", err, "start", start, "end", end, "retries", backoff.NumRetries())
return nil
} else if err != nil {
if it != nil {
it.Close()
}
errs.Add(err)
backoff.Wait()
continue
34 changes: 34 additions & 0 deletions clients/pkg/promtail/targets/cloudflare/target_test.go
@@ -101,6 +101,37 @@ func Test_CloudflareTarget(t *testing.T) {
require.Greater(t, newPos, end.UnixNano())
}

func Test_RetryErrorLogpullReceived(t *testing.T) {
var (
w = log.NewSyncWriter(os.Stderr)
logger = log.NewLogfmtLogger(w)
end = time.Unix(0, time.Hour.Nanoseconds())
start = time.Unix(0, end.Add(-30*time.Minute).UnixNano())
client = fake.New(func() {})
cfClient = newFakeCloudflareClient()
)
cfClient.On("LogpullReceived", mock.Anything, start, end).Return(&fakeLogIterator{
err: ErrorLogpullReceived,
}, nil).Times(2) // just retry once
// replace the client
getClient = func(apiKey, zoneID string, fields []string) (Client, error) {
return cfClient, nil
}
defaultBackoff.MinBackoff = 0
defaultBackoff.MaxBackoff = 5
ta := &Target{
logger: logger,
handler: client,
client: cfClient,
config: &scrapeconfig.CloudflareConfig{
Labels: make(model.LabelSet),
},
metrics: NewMetrics(nil),
}

require.NoError(t, ta.pull(context.Background(), start, end))
}

func Test_RetryErrorIterating(t *testing.T) {
var (
w = log.NewSyncWriter(os.Stderr)
@@ -124,6 +155,9 @@ func Test_RetryErrorIterating(t *testing.T) {
`{"EdgeStartTimestamp":3, "EdgeRequestHost":"foo.com"}`,
},
}, nil).Once()
cfClient.On("LogpullReceived", mock.Anything, start, end).Return(&fakeLogIterator{
err: ErrorLogpullReceived,
}, nil).Once()
// replace the client.
getClient = func(apiKey, zoneID string, fields []string) (Client, error) {
return cfClient, nil
15 changes: 13 additions & 2 deletions clients/pkg/promtail/targets/cloudflare/util_test.go
@@ -9,6 +9,8 @@ import (
"github.com/stretchr/testify/mock"
)

var ErrorLogpullReceived = errors.New("error logpull received")

type fakeCloudflareClient struct {
mock.Mock
}
@@ -45,7 +47,12 @@ func (f *fakeLogIterator) Next() bool {
func (f *fakeLogIterator) Err() error { return f.err }
func (f *fakeLogIterator) Line() []byte { return []byte(f.current) }
func (f *fakeLogIterator) Fields() (map[string]string, error) { return nil, nil }
func (f *fakeLogIterator) Close() error { return nil }
func (f *fakeLogIterator) Close() error {
if f.err == ErrorLogpullReceived {
f.err = nil
}
return nil
}

func newFakeCloudflareClient() *fakeCloudflareClient {
return &fakeCloudflareClient{}
@@ -54,7 +61,11 @@ func newFakeCloudflareClient() *fakeCloudflareClient {
func (f *fakeCloudflareClient) LogpullReceived(ctx context.Context, start, end time.Time) (cloudflare.LogpullReceivedIterator, error) {
r := f.Called(ctx, start, end)
if r.Get(0) != nil {
return r.Get(0).(cloudflare.LogpullReceivedIterator), nil
it := r.Get(0).(cloudflare.LogpullReceivedIterator)
if it.Err() == ErrorLogpullReceived {
return it, it.Err()
}
return it, nil
}
return nil, r.Error(1)
}
4 changes: 2 additions & 2 deletions docs/sources/installation/helm/reference.md
@@ -1978,7 +1978,7 @@ null
<tr>
<td>monitoring.serviceMonitor.metricsInstance.annotations</td>
<td>object</td>
<td>MerticsInstance annotations</td>
<td>MetricsInstance annotations</td>
<td><pre lang="json">
{}
</pre>
@@ -1996,7 +1996,7 @@ true
<tr>
<td>monitoring.serviceMonitor.metricsInstance.labels</td>
<td>object</td>
<td>Additional MatricsInstance labels</td>
<td>Additional MetricsInstance labels</td>
<td><pre lang="json">
{}
</pre>
16 changes: 16 additions & 0 deletions docs/sources/operations/troubleshooting.md
@@ -65,6 +65,22 @@ attempt to run a [LogCLI](../../tools/logcli/) query in as direct a manner as yo
- Adjust the [Grafana dataproxy timeout](https://grafana.com/docs/grafana/latest/administration/configuration/#dataproxy). Configure Grafana with a large enough dataproxy timeout.
- Check timeouts for reverse proxies or load balancers between your client and Grafana. Queries to Grafana are made from your local browser with Grafana serving as a proxy (a dataproxy). Therefore, connections from your client to Grafana must have their timeout configured as well.

## Cache Generation errors
Loki cache generation number errors (Loki >= 2.6)

### error loading cache generation numbers

- Symptom:

- Loki logs errors such as `msg="error loading cache generation numbers" err="unexpected status code: 403"` or `msg="error getting cache gen numbers from the store"`.

- Investigation:

- Check the `loki_delete_cache_gen_load_failures_total` metric on the `/metrics` endpoint; it indicates whether the problem has occurred. If the value is greater than 1, there is a problem with that component.

- Send an HTTP GET request to the `/loki/api/v1/cache/generation_numbers` route.
- If the response is `"deletion is not available for this tenant"`, the deletion API is not enabled for the tenant. To enable it, set `allow_deletes: true` for this tenant in the configuration settings. For more details, see https://grafana.com/docs/loki/latest/operations/storage/logs-deletion/

## Troubleshooting targets

Promtail exposes two web pages that can be used to understand how its service
2 changes: 1 addition & 1 deletion operator/Makefile
@@ -150,7 +150,7 @@ scorecard: generate go-generate bundle ## Run scorecard test

.PHONY: lint
lint: $(GOLANGCI_LINT) | generate ## Run golangci-lint on source code.
$(GOLANGCI_LINT) run ./...
$(GOLANGCI_LINT) run --timeout=5m ./...

.PHONY: lint-prometheus
lint-prometheus: $(PROMTOOL) ## Run promtool check against recording rules and alerts.
18 changes: 9 additions & 9 deletions pkg/ingester/instance.go
@@ -751,12 +751,7 @@ func parseShardFromRequest(reqShards []string) (*astmapper.ShardAnnotation, erro
}

func isDone(ctx context.Context) bool {
select {
case <-ctx.Done():
return true
default:
return false
}
return ctx.Err() != nil
}

// QuerierQueryServer is the GRPC server stream we use to send batch of entries.
@@ -790,7 +785,10 @@ func sendBatches(ctx context.Context, i iter.EntryIterator, queryServer QuerierQ
stats.AddIngesterBatch(int64(batchSize))
batch.Stats = stats.Ingester()

if err := queryServer.Send(batch); err != nil {
if isDone(ctx) {
break
}
if err := queryServer.Send(batch); err != nil && err != context.Canceled {
return err
}
stats.Reset()
@@ -811,8 +809,10 @@ func sendSampleBatches(ctx context.Context, it iter.SampleIterator, queryServer

stats.AddIngesterBatch(int64(size))
batch.Stats = stats.Ingester()

if err := queryServer.Send(batch); err != nil {
if isDone(ctx) {
break
}
if err := queryServer.Send(batch); err != nil && err != context.Canceled {
return err
}
