using gcs for storage with default helm templates results in "caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"" #932

Closed
@shepely

Description


Hello!

I'm trying to set up Loki with Helm to use GCS for object storage, while for index storage we're planning to eventually use Cassandra. So there is no Bigtable in the setup, which I assume is not needed for GCS usage, as there is no documentation contradicting this assumption. For the sake of simplicity, I'll keep the default boltdb configuration for index storage below.

I've followed these brief instructions https://github.com/grafana/loki/blob/master/docs/operations.md#google-cloud-storage and this production setup

and got some ideas from #256 as well.

As a result, Loki returns an error on an attempt to flush data to GCS:
level=error ts=2019-08-22T09:29:31.858305985Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"

To Reproduce
Steps to reproduce the behavior:

  1. Create a GCS bucket.
  2. Create a GCP service account and a private JSON key for it.
  3. In the bucket permissions, grant the service account access by assigning the Storage Object Admin role (also tried with Storage Legacy Bucket Owner). The equivalent CLI commands are sketched after this list.
  4. Clone https://github.com/grafana/loki/tree/master/production/helm/loki to some local folder.
  5. Add a secrets.yaml file and place the created JSON key in it:
loki_access_gcs: |+
    {
      "type": "service_account",
      "project_id": "my-project",
      "private_key_id": "123456789",
      "private_key": "-----BEGIN PRIVATE KEY-----\nmykey\n-----END PRIVATE KEY-----\n",
      "client_email": "loki-access-gcs@my-project.iam.gserviceaccount.com",
      "client_id": "123456789",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/loki-access-gcs%40my-project.iam.gserviceaccount.com"
    }

Using helm secrets plugin for encryption: https://github.com/futuresimple/helm-secrets
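
For reference, here is a sketch of steps 1-3 and the secret encryption as CLI commands; the project, bucket, and service account names are the same placeholders used throughout this report:

# Step 1: create the GCS bucket.
gsutil mb -p my-project gs://my-bucket-name/

# Step 2: create the service account and a private JSON key for it.
gcloud iam service-accounts create loki-access-gcs --project my-project
gcloud iam service-accounts keys create key.json \
    --iam-account loki-access-gcs@my-project.iam.gserviceaccount.com

# Step 3: grant the service account Storage Object Admin on the bucket.
gsutil iam ch \
    serviceAccount:loki-access-gcs@my-project.iam.gserviceaccount.com:roles/storage.objectAdmin \
    gs://my-bucket-name

# Encrypt the secrets file with the helm-secrets plugin (assumes sops is configured).
helm secrets enc loki/secrets.yaml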

  6. In the existing templates/secret.yaml, add a new secret:
---
apiVersion: v1
kind: Secret
metadata:
  name: loki-access-gcs
type: Opaque
data:
  key.json: {{ .Values.loki_access_gcs | b64enc }}
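As a sanity check (an assumed verification step, not part of the chart), the rendered secret can be decoded to confirm the key survived templating and base64 encoding intact:

# loki is the namespace used later in this report.
kubectl get secret loki-access-gcs -n loki -o jsonpath='{.data.key\.json}' | base64 -d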
  7. Slightly modify the existing templates/statefulset.yaml to include the new GOOGLE_APPLICATION_CREDENTIALS env var (the three additions are summarized after the full template below):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: {{ template "loki.fullname" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    app: {{ template "loki.name" . }}
    chart: {{ template "loki.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
  annotations:
    {{- toYaml .Values.annotations | nindent 4 }}
spec:
  podManagementPolicy: {{ .Values.podManagementPolicy }}
  replicas: {{ .Values.replicas }}
  selector:
    matchLabels:
      app: {{ template "loki.name" . }}
      release: {{ .Release.Name }}
  serviceName: {{ template "loki.fullname" . }}-headless
  updateStrategy:
    {{- toYaml .Values.updateStrategy | nindent 4 }}
  template:
    metadata:
      labels:
        app: {{ template "loki.name" . }}
        name: {{ template "loki.name" . }}
        release: {{ .Release.Name }}
        {{- with .Values.podLabels }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
        {{- with .Values.podAnnotations }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
    spec:
      serviceAccountName: {{ template "loki.serviceAccountName" . }}
    {{- if .Values.priorityClassName }}
      priorityClassName: {{ .Values.priorityClassName }}
    {{- end }}
      securityContext:
        {{- toYaml .Values.securityContext | nindent 8 }}
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          args:
            - "-config.file=/etc/loki/loki.yaml"
          {{- range $key, $value := .Values.extraArgs }}
            - "-{{ $key }}={{ $value }}"
          {{- end }}
          volumeMounts:
            - name: config
              mountPath: /etc/loki
            - name: storage
              mountPath: "/data"
              subPath: {{ .Values.persistence.subPath }}
            - name: loki-access-gcs
              mountPath: /etc/secrets
          ports:
            - name: http-metrics
              containerPort: {{ .Values.config.server.http_listen_port }}
              protocol: TCP
          livenessProbe:
            {{- toYaml .Values.livenessProbe | nindent 12 }}
          readinessProbe:
            {{- toYaml .Values.readinessProbe | nindent 12 }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          securityContext:
            readOnlyRootFilesystem: true
          env:
            {{- if .Values.tracing.jaegerAgentHost }}
            - name: JAEGER_AGENT_HOST
              value: "{{ .Values.tracing.jaegerAgentHost }}"
            {{- end }}
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /etc/secrets/key.json
      nodeSelector:
        {{- toYaml .Values.nodeSelector | nindent 8 }}
      affinity:
        {{- toYaml .Values.affinity | nindent 8 }}
      tolerations:
        {{- toYaml .Values.tolerations | nindent 8 }}
      terminationGracePeriodSeconds: {{ .Values.terminationGracePeriodSeconds }}
      volumes:
        - name: config
          secret:
            secretName: {{ template "loki.fullname" . }}
        - name: loki-access-gcs
          secret:
            secretName: loki-access-gcs
  {{- if not .Values.persistence.enabled }}
        - name: storage
          emptyDir: {}
  {{- else if .Values.persistence.existingClaim }}
        - name: storage
          persistentVolumeClaim:
            claimName: {{ .Values.persistence.existingClaim }}
  {{- else }}
  volumeClaimTemplates:
  - metadata:
      name: storage
      annotations:
        {{- toYaml .Values.persistence.annotations | nindent 8 }}
    spec:
      accessModes:
        {{- toYaml .Values.persistence.accessModes | nindent 8 }}
      resources:
        requests:
          storage: {{ .Values.persistence.size | quote }}
      storageClassName: {{ .Values.persistence.storageClassName }}
  {{- end }}
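For clarity, the only deltas from the stock statefulset template are these three stanzas; everything else above is unchanged chart boilerplate:

# Under volumeMounts: mount the secret into the container.
            - name: loki-access-gcs
              mountPath: /etc/secrets
# Under env: point the Google client libraries at the key.
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /etc/secrets/key.json
# Under volumes: back the mount with the new secret.
        - name: loki-access-gcs
          secret:
            secretName: loki-access-gcs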
  8. Modify loki/values.yaml to use gcs as the object_store and add your bucket name (the storage-related delta is summarized after the full file below):
## Affinity for pod assignment
## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
affinity: {}
# podAntiAffinity:
#   requiredDuringSchedulingIgnoredDuringExecution:
#   - labelSelector:
#       matchExpressions:
#       - key: app
#         operator: In
#         values:
#         - loki
#     topologyKey: "kubernetes.io/hostname"

## StatefulSet annotations
annotations: {}

# Enable tracing for debugging; requires installing Jaeger and specifying the right jaeger_agent_host
tracing:
  jaegerAgentHost:

config:
  auth_enabled: false
  ingester:
    chunk_idle_period: 15m
    chunk_block_size: 262144
    lifecycler:
      ring:
        kvstore:
          store: inmemory
        replication_factor: 1

      ## Different ring configs can be used. E.g. Consul
      # ring:
      #   store: consul
      #   replication_factor: 1
      #   consul:
      #     host: "consul:8500"
      #     prefix: ""
      #     httpclienttimeout: "20s"
      #     consistentreads: true
  limits_config:
    enforce_metric_name: false
    reject_old_samples: true
    reject_old_samples_max_age: 168h
  schema_config:
    configs:
    - from: 2018-04-15
      store: boltdb
      object_store: gcs
      schema: v9
      index:
        prefix: index_
        period: 168h
  server:
    http_listen_port: 3100
  storage_config:
    boltdb:
      directory: /data/loki/index
    gcs:
      bucket_name: my-bucket-name
  chunk_store_config:
    max_look_back_period: 0
  table_manager:
    retention_deletes_enabled: false
    retention_period: 0

image:
  repository: grafana/loki
  tag: v0.3.0
  pullPolicy: IfNotPresent

## Additional Loki container arguments, e.g. log level (debug, info, warn, error)
extraArgs: {}
  # log.level: debug

livenessProbe:
  httpGet:
    path: /ready
    port: http-metrics
  initialDelaySeconds: 45

networkPolicy:
  enabled: false

## ref: https://kubernetes.io/docs/user-guide/node-selection/
nodeSelector:
  fixed: "true"

## Enable persistence using Persistent Volume Claims
## ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
## If you set enabled to "true", you need to:
## - create a PV of at least 10Gi in the same namespace as Loki
## - keep storageClassName the same as the setting below
persistence:
  enabled: false
  accessModes:
  - ReadWriteOnce
  size: 10Gi
  storageClassName: default
  annotations: {}
  # subPath: ""
  # existingClaim:

## Pod Labels
podLabels: {}

## Pod Annotations
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "http-metrics"

podManagementPolicy: OrderedReady

## Assign a PriorityClassName to pods if set
# priorityClassName:

rbac:
  create: true
  pspEnabled: true

readinessProbe:
  httpGet:
    path: /ready
    port: http-metrics
  initialDelaySeconds: 45

replicas: 1

resources: {}
# limits:
#   cpu: 200m
#   memory: 256Mi
# requests:
#   cpu: 100m
#   memory: 128Mi

securityContext:
  fsGroup: 10001
  runAsGroup: 10001
  runAsNonRoot: true
  runAsUser: 10001

service:
  type: ClusterIP
  nodePort:
  port: 3100
  annotations: {}
  labels: {}

serviceAccount:
  create: true
  name:

terminationGracePeriodSeconds: 30

## Tolerations for pod assignment
## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
tolerations:
- key: "fixed"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"

# The values to set in the PodDisruptionBudget spec
# If not set then a PodDisruptionBudget will not be created
podDisruptionBudget: {}
# minAvailable: 1
# maxUnavailable: 1

updateStrategy:
  type: RollingUpdate

serviceMonitor:
  enabled: false
  interval: ""
  9. Assuming that Promtail is already running on your nodes, update Loki:
 helm secrets upgrade --install loki loki/ -f loki/values.yaml -f loki/secrets.yaml
  10. Lastly, check the Loki logs to see whether you get errors similar to:
level=error ts=2019-08-22T11:57:30.752325389Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"
level=error ts=2019-08-22T11:57:30.75423081Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"
level=error ts=2019-08-22T11:57:30.761445231Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"
level=error ts=2019-08-22T11:57:30.765350267Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"
level=error ts=2019-08-22T11:57:30.772100702Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"
level=error ts=2019-08-22T11:57:30.772169302Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"

Expected behavior
Data is flushed to GCS.

Environment:

  • Infrastructure: GKE 1.13
  • Deployment tool: Helm

Additional information
To validate that the JSON key for the service account itself is valid, I've exec'ed into a devbox container within the same GKE cluster as Loki and performed the following:

root@devbox-68bd5ccc68-lxbfv:/# vi key.json
root@devbox-68bd5ccc68-lxbfv:/# cat key.json
{
    "type": "service_account",
    "project_id": "my-project",
    "private_key_id": "123456789",
    "private_key": "-----BEGIN PRIVATE KEY-----\nmykey\n-----END PRIVATE KEY-----\n",
    "client_email": "loki-access-gcs@my-project.iam.gserviceaccount.com",
    "client_id": "123456789",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/loki-access-gcs%40my-project.iam.gserviceaccount.com"
}
root@devbox-68bd5ccc68-lxbfv:/# gcloud auth activate-service-account --key-file key.json
Activated service account credentials for: [loki-access-gcs@my-project.iam.gserviceaccount.com]

root@devbox-68bd5ccc68-lxbfv:/# touch test.txt
root@devbox-68bd5ccc68-lxbfv:/# vi test.txt
root@devbox-68bd5ccc68-lxbfv:/# gsutil cp test.txt gs://my-bucket-name/
Copying file://test.txt [Content-Type=text/plain]...
/ [1 files][   10.0 B/   10.0 B]
Operation completed over 1 objects/10.0 B.

I've also exec'ed into the Loki container to ensure that key.json is properly mounted; see below:

k exec -it -n loki loki-0 sh
/ $ cat etc/secrets/key.json
{
  "type": "service_account",
  "project_id": "my-project",
  "private_key_id": "123456789",
  "private_key": "-----BEGIN PRIVATE KEY-----\nmykey\n-----END PRIVATE KEY-----\n",
  "client_email": "loki-access-gcs@my-project.iam.gserviceaccount.com",
  "client_id": "123456789",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/loki-access-gcs%40my-project.iam.gserviceaccount.com"
}
/ $ echo $GOOGLE_APPLICATION_CREDENTIALS
/etc/secrets/key.json

P.S.: Obviously, all sensitive data has been replaced with sample values (e.g. project name, bucket name, etc.)

Please advise on how to approach the issue, or confirm whether this is a bug, as I can't be certain that the setup above is correct. Thanks!
