PostgreSQL output plugin #3428

Closed · wants to merge 79 commits

Conversation

@svenklemm commented Nov 5, 2017

This is an output plugin to store metrics in a PostgreSQL database. It will create the required tables if they do not yet exist. Table names are derived from the metric name.
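
For illustration, the table generated for a metric named cpu with tags host, cpu and fields usage_user, usage_system might look roughly like this (the column types shown are assumptions, not the plugin's exact DDL):

-- Illustrative only: one table per measurement, one column per tag/field.
CREATE TABLE IF NOT EXISTS cpu (
    time         timestamptz NOT NULL,
    host         text,
    cpu          text,
    usage_user   float8,
    usage_system float8
);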

#3408

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

@cwadley commented Feb 19, 2018

@svenklemm Any update on the status of this PR? As per #3408 it seems like it might be ready to merge. Maybe add @danielnelson as a reviewer? I'd love to use this plugin.

@svenklemm (Author)

@cwadley I've been using it for a couple of weeks now without problems, but it could use more testing and proper integration tests.

@cwadley commented Feb 26, 2018

@svenklemm I'll get to work :)

@otherpirate (Contributor)

Hi folks,

First things first: thanks @svenklemm, I learned a lot from your code.

I think PR #3912 is a better/faster solution for large numbers of metrics.

@saaros commented Apr 19, 2018

I wrote an experimental batching mode on top of this branch, see the commits in https://github.com/saaros/telegraf/tree/postgres_output_batching

This speeds up the processing of 10k metric batches 5-10x on my test environment when running locally and even more when telegraf & pg are running on separate hosts.

@otherpirate (Contributor)

Hi @saaros,

I did something pretty similar, but even faster using the COPY command; you can check it out in #3912.

@saaros commented Apr 19, 2018

Thanks @otherpirate -- I had a look at your PR, but since it didn't have any DDL support I figured it's faster to add batching to this version than DDL to the other one.

Would be great to get something in good enough shape to be merged to master.

@svenklemm (Author)

I've merged the batch changes from @saaros into this PR.
@danielnelson could you give any feedback on what's missing to get this merged?

@danielnelson (Contributor)

@svenklemm Thanks for the heads up. It might take a while yet, but we will look at this plugin as well as #3408, #3912, and #4205.

@svenklemm (Author)

@danielnelson any update on this?

@danielnelson (Contributor)

@svenklemm Sorry, I haven't had time to work on this yet.

@danielnelson added this to the 1.9.0 milestone Sep 4, 2018
@glinton (Contributor) commented Oct 17, 2018

Oops, that review was for the most recent changes.

@glinton self-requested a review October 17, 2018 17:19
@glinton (Contributor) left a comment

Thanks

address = "host=localhost user=postgres sslmode=verify-full"

## A list of tags to exclude from storing. If not specified, all tags are stored.
# ignored_tags = ["foo", "bar"]

Remove this; it seems you opted for the global tagexclude instead.

}

func (p *Postgresql) generateInsert(tablename string, columns []string) string {

nitpick: remove this blank line.

quoted = append(quoted, quoteIdent(column))
}

sql := fmt.Sprintf("INSERT INTO %s(%s) VALUES(%s)", quoteIdent(tablename), strings.Join(quoted, ","), strings.Join(placeholder, ","))

nitpick: just return the fmt.Sprintf(...) result directly instead of assigning it to sql first.
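
For clarity, a sketch of the function with both nitpicks applied; it assumes the plugin's existing quoteIdent helper and the fmt/strings imports, and the placeholder construction is inferred from the snippet above:

// generateInsert builds a parameterized INSERT statement for the given table
// and columns, e.g. INSERT INTO "cpu"("time","host") VALUES($1,$2).
func (p *Postgresql) generateInsert(tablename string, columns []string) string {
	placeholder := make([]string, 0, len(columns))
	quoted := make([]string, 0, len(columns))
	for i, column := range columns {
		placeholder = append(placeholder, fmt.Sprintf("$%d", i+1))
		quoted = append(quoted, quoteIdent(column))
	}
	// Return the statement directly instead of assigning it to sql first.
	return fmt.Sprintf("INSERT INTO %s(%s) VALUES(%s)",
		quoteIdent(tablename), strings.Join(quoted, ","), strings.Join(placeholder, ","))
}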

p.Tables[tablename] = true
}

var columns []string

Initialize columns with "time" as its first value to skip the append on line 231.

}

var columns []string
var values []interface{}

Initialize values with metric.Time() as its first value to skip the append on line 232.
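
In other words, a tiny sketch (assuming the surrounding loop appends the tag and field columns afterwards):

// Seed both slices with the timestamp so the later explicit appends of
// "time" and metric.Time() are no longer needed.
columns := []string{"time"}
values := []interface{}{metric.Time()}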


err := p.db.QueryRow(query, where_values...).Scan(&tag_id)
if err != nil {
// log.Printf("I! Foreign key reference not found %s: %v", tablename, err)

Remove the commented-out code.

}

for table_and_cols, values := range batches {
// log.Printf("Writing %d metrics into %s", len(params[table_and_cols]), table_and_cols)

Remove the commented-out code.

@russorat modified the milestones: 1.9.0, 1.10 Oct 22, 2018
@glinton previously approved these changes Oct 25, 2018
@markusr commented Nov 12, 2018

@danielnelson @glinton
Can you merge this? I would like to have this feature.

@sandervandegeijn (Contributor)

Nice work, would like to have this very much :)

@russorat modified the milestones: 1.10.0, 1.11.0 Jan 14, 2019
@danielnelson modified the milestones: 1.11.0, 1.12.0 May 24, 2019
@phemmer (Contributor) commented May 19, 2020

Just ran across another issue.
Some Telegraf inputs take advantage of the fact that if you send multiple points with different fields to the same series with the same timestamp, InfluxDB will automatically merge them into the same point.
For example:

mymeasurement,mytag=x foo=1 1234567890
mymeasurement,mytag=x bar=2 1234567890

Will effectively be merged together as

mymeasurement,mytag=x foo=1,bar=2 1234567890

When the postgresql output sends these, they're getting inserted as separate records.
For example, one input which does this is the cpu input. Here's an example config:

[[inputs.cpu]]
 name_override = "cputest"
 collect_cpu_time = true
[[outputs.postgresql]]
 connection = "host=localhost port=5432 dbname=x user=telegraf password=y"
 table_template = "CREATE TABLE {TABLE}({COLUMNS}); SELECT create_hypertable({TABLELITERAL},'time');"

Which outputs points like this:

> cputest,cpu=cpu0 time_guest=0,time_guest_nice=0,time_idle=173118.64,time_iowait=189.16,time_irq=124.16,time_nice=29.53,time_softirq=96.45,time_steal=75.42,time_system=275.77,time_user=503.56 1589862145000000000
> cputest,cpu=cpu0 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1589862145000000000
> cputest,cpu=cpu1 time_guest=0,time_guest_nice=0,time_idle=173140.95,time_iowait=221.75,time_irq=122.46,time_nice=13.45,time_softirq=100.72,time_steal=75.79,time_system=277.15,time_user=449.39 1589862145000000000
> cputest,cpu=cpu1 usage_guest=0,usage_guest_nice=0,usage_idle=98.03921569007893,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=1.9607843137997953 1589862145000000000
> cputest,cpu=cpu-total time_guest=0,time_guest_nice=0,time_idle=346259.6,time_iowait=410.91,time_irq=246.63,time_nice=42.99,time_softirq=197.18,time_steal=151.22,time_system=552.92,time_user=952.95 1589862145000000000
> cputest,cpu=cpu-total usage_guest=0,usage_guest_nice=0,usage_idle=98.01980198402109,usage_iowait=0,usage_irq=0.9900990099480206,usage_nice=0,usage_softirq=0.9900990099508347,usage_steal=0,usage_system=0,usage_user=0 1589862145000000000

And if you query the data in postgres, it looks like this:

epitome=# select time, cpu, time_user, usage_user from cputest;
          time          |    cpu    | time_user |     usage_user      
------------------------+-----------+-----------+---------------------
 2020-05-19 04:24:40+00 | cpu-total |    954.28 |                    
 2020-05-19 04:24:40+00 | cpu1      |    450.14 |                    
 2020-05-19 04:24:40+00 | cpu0      |    504.14 |                    
 2020-05-19 04:24:50+00 | cpu-total |           | 0.40000000000020464
 2020-05-19 04:24:50+00 | cpu-total |    954.36 |                    
 2020-05-19 04:24:50+00 | cpu1      |           |   0.501002004007604
 2020-05-19 04:24:50+00 | cpu1      |    450.19 |                    
 2020-05-19 04:24:50+00 | cpu0      |           | 0.20000000000038654
 2020-05-19 04:24:50+00 | cpu0      |    504.16 |                    
 2020-05-19 04:25:00+00 | cpu-total |           | 0.40000000000020464
 2020-05-19 04:25:00+00 | cpu-total |    954.44 |                    
 2020-05-19 04:25:00+00 | cpu1      |           | 0.49950049950014835
 2020-05-19 04:25:00+00 | cpu1      |    450.24 |                    
 2020-05-19 04:25:00+00 | cpu0      |           |  0.4008016032056275
 2020-05-19 04:25:00+00 | cpu0      |     504.2 |                    
 2020-05-19 04:25:10+00 | cpu-total |           |  0.5505505505505631
 2020-05-19 04:25:10+00 | cpu-total |    954.55 |                    
 2020-05-19 04:25:10+00 | cpu1      |           |   0.800800800801388
 2020-05-19 04:25:10+00 | cpu1      |    450.32 |                    
 2020-05-19 04:25:10+00 | cpu0      |           | 0.29970029970031614
 2020-05-19 04:25:10+00 | cpu0      |    504.23 |                    
(21 rows)

Since it seems that TimescaleDB doesn't merge points like InfluxDB does (and vanilla postgres definitely doesn't), I think telegraf needs to be responsible for handling this and doing the merge.

Now I'm not sure that this needs to be handled by the postgres plugin itself though. This seems like it might be a good candidate for a processor plugin. I'm also not sure if processors receive points as a continuous stream, or batches like output plugins. If it is a continuous stream, I think it would be good to add a "merge" processor which adds a short time buffer (e.g. 10ms), such that when a point comes through, it's held back for that configured amount of time looking for other points to merge.

@endersonmaia

> When the postgresql output sends these, they're getting inserted as separate records.

If we had the time field as a PK/unique constraint and did an UPSERT, would that solve it?

INSERT INTO .... ON CONFLICT UPDATE...

See: https://www.postgresql.org/docs/12/sql-insert.html#SQL-ON-CONFLICT
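
Roughly, and using the cputest columns from the example above as stand-ins, that upsert would look like:

-- Assumes a unique constraint such as UNIQUE (time, cpu) to act as the conflict target.
INSERT INTO cputest (time, cpu, time_user, usage_user)
VALUES ('2020-05-19 04:24:50+00', 'cpu0', 504.16, NULL)
ON CONFLICT (time, cpu) DO UPDATE
SET time_user  = COALESCE(EXCLUDED.time_user,  cputest.time_user),
    usage_user = COALESCE(EXCLUDED.usage_user, cputest.usage_user);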

@phemmer (Contributor) commented May 19, 2020

> If we had the time field as a PK/unique constraint and did an UPSERT, would that solve it?
>
> INSERT INTO .... ON CONFLICT UPDATE...

Well right now the plugin doesn't use INSERT, it uses COPY. While bulk inserts do exist, and would allow ON CONFLICT to work, I'm not sure of the implications of switching.

In addition, the creation of a unique constraint will impact performance, but I'm not sure by how much.

@endersonmaia

> Well right now the plugin doesn't use INSERT, it uses COPY. While bulk inserts do exist, and would allow ON CONFLICT to work, I'm not sure of the implications of switching.

🤔

AFAIK, COPY doesn't support `ON CONFLICT` or anything similar.

We could COPY into a TEMPORARY TABLE and then do an INSERT ... ON CONFLICT from it, but that feels a bit hacky to me.
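
A sketch of that staging approach (table names and the unique constraint are illustrative, not what the plugin does today):

-- Stage the batch in a temp table, then merge it into the real table.
-- Assumes cputest has a unique constraint on (time, cpu) to act as the conflict target.
BEGIN;
CREATE TEMPORARY TABLE cputest_staging (LIKE cputest INCLUDING DEFAULTS) ON COMMIT DROP;
COPY cputest_staging FROM STDIN;  -- fast bulk load of the batch from the client
INSERT INTO cputest (time, cpu, time_user, usage_user)
SELECT time, cpu, time_user, usage_user FROM cputest_staging
ON CONFLICT (time, cpu) DO UPDATE
SET time_user  = COALESCE(EXCLUDED.time_user,  cputest.time_user),
    usage_user = COALESCE(EXCLUDED.usage_user, cputest.usage_user);
COMMIT;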

@phemmer (Contributor) commented May 19, 2020

> Now I'm not sure that this needs to be handled by the postgres plugin itself though. This seems like it might be a good candidate for a processor plugin. I'm also not sure if processors receive points as a continuous stream, or batches like output plugins. If it is a continuous stream, I think it would be good to add a "merge" processor which adds a short time buffer (e.g. 10ms), such that when a point comes through, it's held back for that configured amount of time looking for other points to merge.

Well it seems that such a plugin already exists: https://github.com/influxdata/telegraf/tree/master/plugins/aggregators/merge
I swear I checked aggregators prior to my post, but it seems I'm blind.

The way the plugin is implemented isn't the greatest solution, as it operates in batches rather than as a stream, meaning that there's still the risk of a single point getting written twice if it's in 2 different batches. But it's better than nothing.
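
For anyone else who ends up here, enabling it in the Telegraf config looks roughly like this (check the aggregator's README for the current options):

[[aggregators.merge]]
  ## Drop the original metrics so only the merged point reaches the outputs.
  drop_original = true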

@phemmer (Contributor) commented Jun 8, 2020

Just ran into another issue. I caused a situation (totally my fault) where a few records couldn't be written to the DB, and so errors were returned back to Telegraf. I'm not sure how this plugin handles transactions, but other records were inserted fine. The problem is that the plugin kept trying to resend the bad records alongside the good records. This resulted in massive amounts of duplicate data in the database (approx 3.8 million rows).

I think the error handling approach needs to be changed.
I'm not too familiar with error handling with COPY, but from some quick research, it seems like a good solution to this problem would be to have a config param that allows the user to select one of 3 options for error handling:

  1. Use transactions, and fail the whole transaction, retrying it later. The whole thing succeeds, or the whole thing fails.
  2. Ignore errors and continue. Just drop the bad records (with a log message telling the user), allowing the good records, and don't retry.
  3. Use COPY error logging to insert the bad records into a temp table, and periodically retry those records. Perhaps also with an expiry time.

As it is, the only way I could stop the massive data duplication was to bounce telegraf so it'd discard the bad records and stop retrying them.

@danielnelson removed this from the 1.15.0 milestone Jun 26, 2020
@bithavoc

Thanks for such a great project; any ETA on merging this PR?

@sjwang90 (Contributor)

@svenklemm or @blagojts, what are your thoughts around @phemmer's error handling concerns above?

@duanhongyi

I like this PR very much. Is there a timeline for merging it?

@ZedZipDev

Hi team,
By the way, still waiting for this plugin.
Thanks

@phemmer (Contributor) commented Oct 2, 2020

Can we get some clarification on the state of this plugin & pull request?

After about 10 months without any update or response, it would appear that it has been abandoned. This is unfortunate, but it happens as priorities change. But if it needs a new maintainer, can we state that so people know it's open for someone to pick up and carry?

@sjwang90 (Contributor) commented Oct 2, 2020

@svenklemm, please chime in if you would like to continue to own this. Otherwise, anyone who's interested in building on @svenklemm's code should feel free to address the necessary fixes.

@phemmer Seems like you've provided a lot of great feedback if you'd like to take it over.

@phemmer (Contributor) commented Oct 13, 2020

I don't know that I have the time to be able to take this on. But I'll keep it near the top of my list of potential work when not pressed for other work.

In the meantime I'll keep adding my experiences with it.
So on that note, have another bit of feedback :-)

Currently the plugin's table_template value allows you to define some SQL that'll be run when the table needs to be created. I have one strong recommendation for a change regarding this, and a lesser request.

  • Change it to accept a list of statements instead of a single statement (see the sketch after this list).
    I ran into a limitation when I wanted the plugin to create a TimescaleDB table and also enable compression on it. You cannot do table_template = "CREATE TABLE ...; SELECT create_hypertable(...); ALTER TABLE ...". If you try, you get cannot insert multiple commands into a prepared statement (SQLSTATE 42601).
    So the solution is to allow passing multiple statements. All the statements should still be executed within the same transaction, though.

  • Add an option to create a view when using tags_as_foreignkeys. This view would transparently join the main table with the tag table. It might also be nice to put the tables and the views in separate schemas; with the view you'll rarely use the underlying tables themselves, so putting them in a separate schema would get them out of the way.
    The construction of this view might be somewhat opinionated (what do you name the view, what do you name the underlying tables, etc.), so it might also be good to provide parameters so the naming can be controlled.
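
A sketch of what the first suggestion could look like in the config; this is a hypothetical shape for table_template, not something the current plugin accepts:

[[outputs.postgresql]]
  connection = "host=localhost port=5432 dbname=x user=telegraf password=y"
  ## Hypothetical: a list of statements, all executed in one transaction
  ## when a new measurement table is created.
  table_template = [
    "CREATE TABLE {TABLE}({COLUMNS})",
    "SELECT create_hypertable({TABLELITERAL}, 'time')",
    "ALTER TABLE {TABLE} SET (timescaledb.compress)",
  ]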

@AtakanColak

As someone who has used this plugin for months, I must warn everyone that this plugin does not scale well with many agents and many metrics, and it has many performance issues that quadruple when tags_as_foreignkeys is used. My recommendation for this plugin is a complete overhaul, a rewrite, or even switching to Gorm v2, which has support for create from map. I would work on one if only I had time.

@phemmer (Contributor) commented Oct 13, 2020

> As someone who has used this plugin for months, I must warn everyone that this plugin does not scale well with many agents and many metrics, and it has many performance issues that quadruple when tags_as_foreignkeys is used. My recommendation for this plugin is a complete overhaul, a rewrite, or even switching to Gorm v2, which has support for create from map. I would work on one if only I had time.

That's interesting. Can you elaborate any further? Given your statement that switching to an ORM will help performance, this implies that the performance issue is client side. I'm having a hard time imagining what that could be. I would expect the issue to be on the database side as the number of different series grows, in which case it's not an ORM problem but one of defining a proper segmentation key.

Also I'm not familiar with GORM, but ORMs tend to revolve around mapping defined types (e.g. struct{}s) to DB schemas. Given that you cannot create defined types for all the possible input metrics, how would that even work?

@AtakanColak commented Oct 13, 2020

>> As someone who has used this plugin for months, I must warn everyone that this plugin does not scale well with many agents and many metrics, and it has many performance issues that quadruple when tags_as_foreignkeys is used. My recommendation for this plugin is a complete overhaul, a rewrite, or even switching to Gorm v2, which has support for create from map. I would work on one if only I had time.
>
> That's interesting. Can you elaborate any further? Given your statement that switching to an ORM will help performance, this implies that the performance issue is client side. I'm having a hard time imagining what that could be. I would expect the issue to be on the database side as the number of different series grows, in which case it's not an ORM problem but one of defining a proper segmentation key.
>
> Also I'm not familiar with GORM, but ORMs tend to revolve around mapping defined types (e.g. struct{}s) to DB schemas. Given that you cannot create defined types for all the possible input metrics, how would that even work?

Performance issues exist both on the DB side and the client side. For example, tags_as_foreignkeys caused a block on the DB side with high CPU and memory usage. On the client (Telegraf) side, this plugin builds the queries by hand to my knowledge, whereas Gorm already optimizes this, so I think that would be an improvement over the high CPU usage of this plugin, which I measured at about 60% of Telegraf's total CPU usage. If it were purely DB-related, the high CPU usage on the client side would be suspicious. And regarding the type part, I think it would need a workaround if Gorm were to be used, with tables still created by hand but metrics passed directly to Gorm as maps.

I would like to reiterate my warning especially for deployments in commercial projects.

@sjwang90 added the "help wanted" label Oct 13, 2020
@phemmer (Contributor) commented Nov 11, 2020

Just FYI, I've started looking into addressing some of the issues on this plugin.

@AtakanColak I think I found the performance issue you're experiencing. The issue is that when using tags_as_foreignkeys, Telegraf has to perform a query against the tag table to find the tag ID for a given series. Additionally, it performs this task on a one-by-one basis, which is extremely slow. Now the plugin does cache the IDs it obtains; however, it's only an LRU cache with a default size of 1000 (which isn't documented in TimescaleDB's docs), and that is very easily exceeded. And once it is exceeded, the performance immediately tanks. There is no in-between: as soon as you hit 1001 items, the LRU is effectively useless. On top of that, when Telegraf first starts up, the cache is cold, and in my case it took several minutes to populate it, during which time Telegraf can't keep up and drops metrics.

I was able to solve this issue by changing the code to use a deterministic key, meaning Telegraf computes the tag ID itself and doesn't have to do a lookup.
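
As a rough illustration of the idea (this is my own sketch of a deterministic tag ID, not the code from the follow-up branch):

package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// deterministicTagID hashes the sorted tag key/value pairs into a stable
// integer, so the plugin can compute the tag ID locally instead of doing a
// SELECT against the tag table for every new series.
func deterministicTagID(tags map[string]string) int64 {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := fnv.New64a()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0}) // separator to avoid ambiguous concatenations
		h.Write([]byte(tags[k]))
		h.Write([]byte{0})
	}
	return int64(h.Sum64())
}

func main() {
	fmt.Println(deterministicTagID(map[string]string{"host": "db1", "cpu": "cpu0"}))
}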

With this change I was able to go from about 15k points per minute to about 50k.

Additionally, with the aggregators.merge plugin being needed as explained here, that plugin ended up being the next bottleneck. I opened PR #8391 to address that issue.

With that, and some other hackery involving multiple connections, I'm able to get telegraf up to about 1m points per minute, a massive improvement over 15k. But it's still telegraf bottlenecking, and not the DB.

I'm going to clean up the plugin a bit so I can remove some of the performance hacks I had to do, and will submit to a new branch.

@sjwang90 added the "wip" label and removed the "help wanted" label Nov 11, 2020
@phemmer mentioned this pull request Jan 5, 2021
@phemmer (Contributor) commented Jan 5, 2021

I've put up a draft PR of the changes over at #8651. While it is a draft, it should be fully functional, and I would welcome feedback.

The plugin has been massively reworked. But it's a lot more flexible, fault tolerant, and performant.

@sjwang90 added the "plugin/output" label Jan 26, 2021
@sjwang90 (Contributor) commented Feb 1, 2021

Closing this in favor of #8651.

@sjwang90 closed this Feb 1, 2021
@devosalain

I use FreeBSD 13.0. Where can I download the source code for this plugin to compile it?

Labels: new plugin, plugin/output