
[Filebeat] Inefficient Grok expression in Postgresql module causes Cluster to Stop Responding #7201

Closed
@gingerwizard

Description

  • Version: 6.2.4
  • Operating System: Linux (Kubernetes on GCP)
  • PostgreSQL: 9.6.2

The postgresql module for Filebeat contains a grok expression for parsing log lines: https://github.com/elastic/beats/blob/master/filebeat/module/postgresql/log/ingest/pipeline.json#L8-L13

This expression works but is extremely inefficient. Even at moderate load (100 docs/sec), indexing latency increases and the ES bulk queue fills up, preventing any indexing on the cluster. ES hot_threads highlights the hot spot:

97.4% (486.9ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000005][bulk][T#1]'
     3/10 snapshots sharing following 16 elements
       org.joni.ByteCodeMachine.matchAt(ByteCodeMachine.java:203)
       org.joni.Matcher.matchCheck(Matcher.java:304)
       org.joni.Matcher.searchInterruptible(Matcher.java:457)
       org.joni.Matcher.search(Matcher.java:318)
       org.elasticsearch.ingest.common.Grok.captures(Grok.java:206)
       org.elasticsearch.ingest.common.GrokProcessor.execute(GrokProcessor.java:65)
       org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:100)

Testing indicates that the pattern at https://github.com/elastic/beats/blob/master/filebeat/module/postgresql/log/ingest/pipeline.json#L18 is responsible.
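The failure mode here is backtracking: joni (the regex engine behind the Grok processor, visible in the `org.joni.ByteCodeMachine.matchAt` frames above) can be forced into exponential work by patterns with nested or overlapping quantifiers. The snippet below is NOT the actual pattern from pipeline.json; it is a classic pathological regex used purely to illustrate the effect, here with Python's `re` engine, which backtracks the same way:

```python
import re
import time

# Illustrative only: nested quantifiers over the same character force the
# backtracking engine to explore exponentially many ways to split the input
# before it can report a failed match.
pathological = re.compile(r"^(x+x+)+y$")
linear = re.compile(r"^x+y$")

subject = "x" * 24  # no trailing "y", so both patterns must fail

start = time.perf_counter()
assert linear.match(subject) is None
fast = time.perf_counter() - start

start = time.perf_counter()
assert pathological.match(subject) is None
slow = time.perf_counter() - start

# The linear pattern fails in a single pass; the nested one explores
# on the order of 2^24 split points first.
print(f"linear: {fast:.6f}s, pathological: {slow:.3f}s")
```

The same asymmetry is why the anchored, mostly-literal patterns proposed below behave so much better than the original expression under load.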

On PostgreSQL 9.6.2 this isn't actually necessary, and the following is sufficient:

{
  "grok": {
    "field": "message",
    "ignore_missing": true,
    "patterns": [
      "^%{LOCALDATETIME:postgresql.log.timestamp} %{WORD:postgresql.log.timezone} \\[%{NUMBER:postgresql.log.thread_id}\\] %{USERNAME:postgresql.log.user}@%{HOSTNAME:postgresql.log.database} %{WORD:postgresql.log.level}:  duration: %{NUMBER:postgresql.log.duration} ms  statement: %{GREEDYDATA:postgresql.log.query}",
      "^%{LOCALDATETIME:postgresql.log.timestamp} %{WORD:postgresql.log.timezone} \\[%{NUMBER:postgresql.log.thread_id}\\] \\[%{USERNAME:postgresql.log.user}\\]@\\[%{HOSTNAME:postgresql.log.database}\\] %{WORD:postgresql.log.level}:  duration: %{NUMBER:postgresql.log.duration} ms  statement: %{GREEDYDATA:postgresql.log.query}",
      "^%{LOCALDATETIME:postgresql.log.timestamp} %{WORD:postgresql.log.timezone} \\[%{NUMBER:postgresql.log.thread_id}\\] %{USERNAME:postgresql.log.user}@%{HOSTNAME:postgresql.log.database} %{WORD:postgresql.log.level}:  ?%{GREEDYDATA:postgresql.log.message}",
      "^%{LOCALDATETIME:postgresql.log.timestamp} %{WORD:postgresql.log.timezone} \\[%{NUMBER:postgresql.log.thread_id}\\] \\[%{USERNAME:postgresql.log.user}\\]@\\[%{HOSTNAME:postgresql.log.database}\\] %{WORD:postgresql.log.level}:  ?%{GREEDYDATA:postgresql.log.message}",
      "^%{LOCALDATETIME:postgresql.log.timestamp} %{WORD:postgresql.log.timezone} \\[%{NUMBER:postgresql.log.thread_id}\\] %{WORD:postgresql.log.level}:  ?%{GREEDYDATA:postgresql.log.message}"
    ],
    "pattern_definitions": {
      "LOCALDATETIME": "[-0-9]+ %{TIME}"
    }
  }
}
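For anyone wanting to sanity-check these patterns without a running cluster, here is a rough Python translation of the first pattern (duration/statement lines). It is an approximation, not exact grok semantics: `LOCALDATETIME` is defined above as `[-0-9]+ %{TIME}`, i.e. a date followed by a time of day, and the `USERNAME`/`HOSTNAME` classes are simplified. The sample log line is hypothetical but follows the PostgreSQL 9.6 format the module targets:

```python
import re

# Approximate Python equivalent of the first grok pattern above (a sketch,
# not exact grok semantics).
LOG_LINE = re.compile(
    r"^(?P<timestamp>[-0-9]+ \d{2}:\d{2}:\d{2}(?:\.\d+)?) "   # LOCALDATETIME
    r"(?P<timezone>\w+) "                                      # WORD
    r"\[(?P<thread_id>\d+)\] "                                 # NUMBER in []
    r"(?P<user>[a-zA-Z0-9._-]+)@(?P<database>[\w.-]+) "        # user@database
    r"(?P<level>\w+):  duration: (?P<duration>[\d.]+) ms"      # level + duration
    r"  statement: (?P<query>.*)$"                             # the statement
)

# Hypothetical sample line in the PostgreSQL 9.6 log format.
sample = ("2017-07-31 13:36:42.585 CEST [2298] postgres@postgres LOG:"
          "  duration: 37.118 ms  statement: SELECT * FROM users;")

m = LOG_LINE.match(sample)
assert m is not None
print(m.group("duration"), m.group("query"))
```

Because every pattern is anchored with `^` and consists almost entirely of literal separators, a non-matching line fails fast instead of triggering the backtracking seen in the hot_threads dump.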

I believe the above can be further refactored, possibly into a single expression. @ph for further action.
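As a sketch of that consolidation (untested against real grok semantics, and the field names simply mirror the captures above): the five patterns differ only in whether the `user@database` prefix is present, whether it is bracketed, and whether the tail is a `duration ... statement` pair or a plain message. All of that can be expressed with optional groups and one alternation, as in this Python approximation:

```python
import re

# Sketch of a single consolidated expression; the "user@database" prefix
# (bare or bracketed) and the "duration ... statement" tail are optional.
SINGLE = re.compile(
    r"^(?P<timestamp>[-0-9]+ \d{2}:\d{2}:\d{2}(?:\.\d+)?) "
    r"(?P<timezone>\w+) "
    r"\[(?P<thread_id>\d+)\] "
    r"(?:\[?(?P<user>[a-zA-Z0-9._-]+)\]?@\[?(?P<database>[\w.-]+)\]? )?"
    r"(?P<level>\w+): "
    r"(?: duration: (?P<duration>[\d.]+) ms  statement: (?P<query>.*)"
    r"| ?(?P<message>.*))$"
)

# Hypothetical lines covering the three shapes the five patterns handle.
lines = [
    "2017-07-31 13:36:42.585 CEST [2298] postgres@postgres LOG:"
    "  duration: 37.118 ms  statement: SELECT 1;",
    "2017-07-31 13:36:42.585 CEST [2298] [postgres]@[postgres] LOG:"
    "  duration: 37.118 ms  statement: SELECT 1;",
    "2017-07-31 13:36:42.585 CEST [2298] LOG:"
    "  database system is ready to accept connections",
]

for line in lines:
    m = SINGLE.match(line)
    assert m is not None, line
```

Whether collapsing to one pattern is actually faster than five anchored alternatives would need benchmarking; optional groups reintroduce some backtracking, so the five-pattern version may well be the safer choice.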
