Description
Hello!
I'm trying to set up Loki with Helm to use GCS for object storage, while for index storage we're planning to eventually use Cassandra. So there is no Bigtable in the setup, which I assume is not needed for GCS usage, as there is no documentation contradicting this assumption. For the sake of simplicity, I'll keep the default BoltDB configuration for index storage below.
I've followed these modest instructions https://github.com/grafana/loki/blob/master/docs/operations.md#google-cloud-storage and this production setup, and got some ideas from #256 as well. As a result, Loki returns an error on an attempt to flush data to GCS:
level=error ts=2019-08-22T09:29:31.858305985Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"
To Reproduce
Steps to reproduce the behavior:
- Create a GCS bucket.
- Create a GCP service account and a private JSON key for it.
- In the bucket permissions, grant access to the SA by assigning the Storage Object Admin role (also tried with Storage Legacy Bucket Owner); a command-line sketch of these first three steps follows the key below.
- Clone https://github.com/grafana/loki/tree/master/production/helm/loki to some local folder
- Add a secrets.yaml file and place the created JSON key in it:
loki_access_gcs: |+
{
"type": "service_account",
"project_id": "my-project",
"private_key_id": "123456789",
"private_key": "-----BEGIN PRIVATE KEY-----\nmykey\n-----END PRIVATE KEY-----\n",
"client_email": "loki-access-gcs@my-project.iam.gserviceaccount.com",
"client_id": "123456789",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/loki-access-gcs%40my-project.iam.gserviceaccount.com"
}
Using the helm-secrets plugin for encryption: https://github.com/futuresimple/helm-secrets
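For reference, the first three steps roughly correspond to the following commands (project, bucket and service-account names are the same placeholders as below; treat this as a sketch rather than the exact commands that were run):
# Rough equivalent of steps 1-3, with placeholder names
gsutil mb -p my-project gs://my-bucket-name/
gcloud iam service-accounts create loki-access-gcs --project my-project
gcloud iam service-accounts keys create key.json \
  --iam-account loki-access-gcs@my-project.iam.gserviceaccount.com
# Grant Storage Object Admin on the bucket to the service account
gsutil iam ch \
  serviceAccount:loki-access-gcs@my-project.iam.gserviceaccount.com:roles/storage.objectAdmin \
  gs://my-bucket-name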
- In the existing templates/secret.yaml, add a new secret:
---
apiVersion: v1
kind: Secret
metadata:
name: loki-access-gcs
type: Opaque
data:
key.json: {{ .Values.loki_access_gcs | b64enc }}
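After deploying, the rendered secret can be checked to make sure it decodes back to valid JSON (namespace loki and secret name as above):
# Decode the mounted key material straight from the cluster
kubectl -n loki get secret loki-access-gcs -o jsonpath='{.data.key\.json}' | base64 -d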
- Slightly modify the existing templates/statefulset.yaml to mount the new secret and set the GOOGLE_APPLICATION_CREDENTIALS env var:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: {{ template "loki.fullname" . }}
namespace: {{ .Release.Namespace }}
labels:
app: {{ template "loki.name" . }}
chart: {{ template "loki.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
annotations:
{{- toYaml .Values.annotations | nindent 4 }}
spec:
podManagementPolicy: {{ .Values.podManagementPolicy }}
replicas: {{ .Values.replicas }}
selector:
matchLabels:
app: {{ template "loki.name" . }}
release: {{ .Release.Name }}
serviceName: {{ template "loki.fullname" . }}-headless
updateStrategy:
{{- toYaml .Values.updateStrategy | nindent 4 }}
template:
metadata:
labels:
app: {{ template "loki.name" . }}
name: {{ template "loki.name" . }}
release: {{ .Release.Name }}
{{- with .Values.podLabels }}
{{- toYaml . | nindent 8 }}
{{- end }}
annotations:
checksum/config: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
{{- with .Values.podAnnotations }}
{{- toYaml . | nindent 8 }}
{{- end }}
spec:
serviceAccountName: {{ template "loki.serviceAccountName" . }}
{{- if .Values.priorityClassName }}
priorityClassName: {{ .Values.priorityClassName }}
{{- end }}
securityContext:
{{- toYaml .Values.securityContext | nindent 8 }}
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
args:
- "-config.file=/etc/loki/loki.yaml"
{{- range $key, $value := .Values.extraArgs }}
- "-{{ $key }}={{ $value }}"
{{- end }}
volumeMounts:
- name: config
mountPath: /etc/loki
- name: storage
mountPath: "/data"
subPath: {{ .Values.persistence.subPath }}
- name: loki-access-gcs
mountPath: /etc/secrets
ports:
- name: http-metrics
containerPort: {{ .Values.config.server.http_listen_port }}
protocol: TCP
livenessProbe:
{{- toYaml .Values.livenessProbe | nindent 12 }}
readinessProbe:
{{- toYaml .Values.readinessProbe | nindent 12 }}
resources:
{{- toYaml .Values.resources | nindent 12 }}
securityContext:
readOnlyRootFilesystem: true
env:
{{- if .Values.tracing.jaegerAgentHost }}
- name: JAEGER_AGENT_HOST
value: "{{ .Values.tracing.jaegerAgentHost }}"
{{- end }}
- name: GOOGLE_APPLICATION_CREDENTIALS
value: /etc/secrets/key.json
nodeSelector:
{{- toYaml .Values.nodeSelector | nindent 8 }}
affinity:
{{- toYaml .Values.affinity | nindent 8 }}
tolerations:
{{- toYaml .Values.tolerations | nindent 8 }}
terminationGracePeriodSeconds: {{ .Values.terminationGracePeriodSeconds }}
volumes:
- name: config
secret:
secretName: {{ template "loki.fullname" . }}
- name: loki-access-gcs
secret:
secretName: loki-access-gcs
{{- if not .Values.persistence.enabled }}
- name: storage
emptyDir: {}
{{- else if .Values.persistence.existingClaim }}
- name: storage
persistentVolumeClaim:
claimName: {{ .Values.persistence.existingClaim }}
{{- else }}
volumeClaimTemplates:
- metadata:
name: storage
annotations:
{{- toYaml .Values.persistence.annotations | nindent 8 }}
spec:
accessModes:
{{- toYaml .Values.persistence.accessModes | nindent 8 }}
resources:
requests:
storage: {{ .Values.persistence.size | quote }}
storageClassName: {{ .Values.persistence.storageClassName }}
{{- end }}
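After deploying, a quick way to confirm that these template changes actually reach the pod spec (pod name as in my setup):
# The env var and the secret mount should both show up in the rendered pod
kubectl -n loki get pod loki-0 -o yaml | grep -A1 GOOGLE_APPLICATION_CREDENTIALS
kubectl -n loki get pod loki-0 -o yaml | grep -B2 -A2 '/etc/secrets'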
- Modify loki/values.yaml to use gcs as the object_store and add your bucket name:
## Affinity for pod assignment
## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
affinity: {}
# podAntiAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# - labelSelector:
# matchExpressions:
# - key: app
# operator: In
# values:
# - loki
# topologyKey: "kubernetes.io/hostname"
## StatefulSet annotations
annotations: {}
# enable tracing for debug, need install jaeger and specify right jaeger_agent_host
tracing:
jaegerAgentHost:
config:
auth_enabled: false
ingester:
chunk_idle_period: 15m
chunk_block_size: 262144
lifecycler:
ring:
kvstore:
store: inmemory
replication_factor: 1
## Different ring configs can be used. E.g. Consul
# ring:
# store: consul
# replication_factor: 1
# consul:
# host: "consul:8500"
# prefix: ""
# httpclienttimeout: "20s"
# consistentreads: true
limits_config:
enforce_metric_name: false
reject_old_samples: true
reject_old_samples_max_age: 168h
schema_config:
configs:
- from: 2018-04-15
store: boltdb
object_store: gcs
schema: v9
index:
prefix: index_
period: 168h
server:
http_listen_port: 3100
storage_config:
boltdb:
directory: /data/loki/index
gcs:
bucket_name: my-bucket-name
chunk_store_config:
max_look_back_period: 0
table_manager:
retention_deletes_enabled: false
retention_period: 0
image:
repository: grafana/loki
tag: v0.3.0
pullPolicy: IfNotPresent
## Additional Loki container arguments, e.g. log level (debug, info, warn, error)
extraArgs: {}
# log.level: debug
livenessProbe:
httpGet:
path: /ready
port: http-metrics
initialDelaySeconds: 45
## Enable persistence using Persistent Volume Claims
networkPolicy:
enabled: false
## ref: https://kubernetes.io/docs/user-guide/node-selection/
nodeSelector:
fixed: "true"
## ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
## If you set enabled as "True", you need :
## - create a pv which above 10Gi and has same namespace with loki
## - keep storageClassName same with below setting
persistence:
enabled: false
accessModes:
- ReadWriteOnce
size: 10Gi
storageClassName: default
annotations: {}
# subPath: ""
# existingClaim:
## Pod Labels
podLabels: {}
## Pod Annotations
podAnnotations:
prometheus.io/scrape: "true"
prometheus.io/port: "http-metrics"
podManagementPolicy: OrderedReady
## Assign a PriorityClassName to pods if set
# priorityClassName:
rbac:
create: true
pspEnabled: true
readinessProbe:
httpGet:
path: /ready
port: http-metrics
initialDelaySeconds: 45
replicas: 1
resources: {}
# limits:
# cpu: 200m
# memory: 256Mi
# requests:
# cpu: 100m
# memory: 128Mi
securityContext:
fsGroup: 10001
runAsGroup: 10001
runAsNonRoot: true
runAsUser: 10001
service:
type: ClusterIP
nodePort:
port: 3100
annotations: {}
labels: {}
serviceAccount:
create: true
name:
terminationGracePeriodSeconds: 30
## Tolerations for pod assignment
## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
tolerations:
- key: "fixed"
operator: "Equal"
value: "true"
effect: "NoSchedule"
# The values to set in the PodDisruptionBudget spec
# If not set then a PodDisruptionBudget will not be created
podDisruptionBudget: {}
# minAvailable: 1
# maxUnavailable: 1
updateStrategy:
type: RollingUpdate
serviceMonitor:
enabled: false
interval: ""
- Assuming that Promtail is already running on your nodes, update Loki:
helm secrets upgrade --install loki loki/ -f loki/values.yaml -f loki/secrets.yaml
- Lastly, check the Loki logs to see whether you get errors similar to:
level=error ts=2019-08-22T11:57:30.752325389Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"
level=error ts=2019-08-22T11:57:30.75423081Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"
level=error ts=2019-08-22T11:57:30.761445231Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"
level=error ts=2019-08-22T11:57:30.765350267Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"
level=error ts=2019-08-22T11:57:30.772100702Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"
level=error ts=2019-08-22T11:57:30.772169302Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"
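For what it's worth, re-deploying with debug logging (the extraArgs block in values.yaml supports log.level) might give more context around these 401s; a sketch, assuming helm-secrets passes --set through unchanged:
# Equivalent to uncommenting "log.level: debug" under extraArgs in values.yaml
helm secrets upgrade --install loki loki/ \
  -f loki/values.yaml -f loki/secrets.yaml \
  --set "extraArgs.log\.level=debug"
kubectl -n loki logs -f loki-0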
Expected behavior
Data is flushed to GCS.
Environment:
- Infrastructure: GKE 1.13
- Deployment tool: Helm
Additional information
To validate that the JSON key for the service account is itself valid, I exec'ed into a devbox container within the same GKE cluster as Loki and performed the following:
root@devbox-68bd5ccc68-lxbfv:/# vi key.json
root@devbox-68bd5ccc68-lxbfv:/# cat key.json
{
"type": "service_account",
"project_id": "my-project",
"private_key_id": "123456789",
"private_key": "-----BEGIN PRIVATEKEY-----\nmykey\n-----END PRIVATE KEY-----\n",
"client_email":"loki-access-gcs@my-project.iam.gserviceaccountcom",
"client_id": "123456789",
"auth_uri": "https://accounts.google.com/o/oauth2auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https:/www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.comrobot/v1/metadata/x509loki-access-gcs%40my-project.iam.gserviceaccountcom"
}
root@devbox-68bd5ccc68-lxbfv:/# gcloud auth activate-service-account --key-file key.json
Activated service account credentials for: [loki-access-gcs@my-project.iam.gserviceaccount.com]
root@devbox-68bd5ccc68-lxbfv:/# touch test.txt
root@devbox-68bd5ccc68-lxbfv:/# vi test.txt
root@devbox-68bd5ccc68-lxbfv:/# gsutil cp test.txt gs://my-bucket-name/
Copying file://test.txt [Content-Type=text/plain]...
/ [1 files][ 10.0 B/ 10.0 B]
Operation completed over 1 objects/10.0 B.
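The bucket-level binding can also be inspected from the same devbox:
# The service account should be listed with roles/storage.objectAdmin
gsutil iam get gs://my-bucket-name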
Also, I exec'ed into the Loki container to ensure that key.json is properly mounted; see below:
k exec -it -n loki loki-0 sh
/ $ cat etc/secrets/key.json
{
"type": "service_account",
"project_id": "my-project",
"private_key_id": "123456789",
"private_key": "-----BEGIN PRIVATE KEY-----\nmykey\n-----END PRIVATE KEY-----\n",
"client_email": "loki-access-gcs@my-project.iam.gserviceaccount.com",
"client_id": "123456789",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/loki-access-gcs%40my-project.iam.gserviceaccount.com"
}
/ $ echo $GOOGLE_APPLICATION_CREDENTIALS
/etc/secrets/key.json
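To rule out any difference between the original key and what actually gets mounted, the mounted file could be pulled out and run through the same gsutil test (assumes a machine with kubectl, gcloud and gsutil available):
# Repeat the auth + upload test with the key exactly as Loki sees it
kubectl -n loki exec loki-0 -- cat /etc/secrets/key.json > mounted-key.json
gcloud auth activate-service-account --key-file mounted-key.json
echo test > test.txt
gsutil cp test.txt gs://my-bucket-name/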
P.S.: Obviously, all sensitive data has been replaced with sample values (e.g. project name, bucket name, etc.).
Please advise on how to approach the issue, or confirm whether this is a bug, as I can't be certain that the setup above is correct. Thanks!