Skip to content

Implement SQL Subscriber #55

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 7, 2018
Merged

Conversation

ChrisBr
Copy link
Collaborator

@ChrisBr ChrisBr commented Nov 30, 2018

ActiveSupport instrumentation sql.active_record events are getting
now also written into the InfluxDB.

Therefore the RenderSubscriber was splitted into a base class
which gets reused in the SqlSubscriber.

The queries also getting normalized from parameters first, e.g.
'SELECT * FROM posts where id = 42' becomes
'SELECT * FROM posts where id = xxx'. This idea was shameless copied
from http://stevenyue.com/blogs/tracking-sql-queries-in-rails/.

https://guides.rubyonrails.org/active_support_instrumentation.html#sql-active-record

This got extracted #53 and needs to get rebased as soon as #53 is merged.

@ChrisBr ChrisBr changed the title Sql subscriber Implement SQL Subscriber Nov 30, 2018
@ChrisBr
Copy link
Collaborator Author

ChrisBr commented Nov 30, 2018

I'll sample a few queries from a few different Rails apps I've access to, and throw it against your normalizer. I suspect the interesting queries (those with sub-sejects, joins, and/or group-by clauses) are going to be boiled down too much. I fear that the simple regex is both too simple has a too large CPU footprint (I've seen hundrets of queries on a single page).

That would be great. As said, maybe I can also run it for a few days on our server. With the path element we pretty quickly experienced that it is not usable as a tag.

Skylight does something similar, they combine it with the partial and view results

That looks to me, as if the omit the conditions from the query (I see select from venues, but no select from venue where x in (y). This suggests two tags (table_name and op).

Hm, right. I will maybe dig a little into the skylight gem to see how they normalize the queries.

@ChrisBr
Copy link
Collaborator Author

ChrisBr commented Nov 30, 2018

Ok, seems skylight indeed does only store operation and table:

require 'skylight'

result = Skylight.lex_sql("SELECT name FROM users WHERE id = 42")
puts result.inspect
# => ["SELECT FROM users", "SELECT name FROM users WHERE id = ?"]

Additionally they do the normalization in a C extension. So I agree that we follow your suggestion to store operation and table as tags and the normalized query as a field value.

Question: Do you think the regex will really be a problem performance wise? Do you have any other suggestion (except writing a C extension as well?). Or we just go ahead with the regex for now and implement a switch to disable this metric (e.g. when setting the series_name to nil / ""). What do you think?

@dmke dmke mentioned this pull request Nov 30, 2018
4 tasks
@dmke
Copy link
Collaborator

dmke commented Nov 30, 2018

Additionally they do the normalization in a C extension. So I agree that we follow your suggestion to store operation and table as tags and the normalized query as a field value.

👍

Question: Do you think the regex will really be a problem performance wise? Do you have any other suggestion (except writing a C extension as well?). Or we just go ahead with the regex for now and implement a switch to disable this metric (e.g. when setting the series_name to nil / ""). What do you think?

I'd prefer not to write and/or maintain C extensions. Having an off-switch for a regex-based query instrumentation sounds great – but I'd have this feature disabled by default, where the consumer needs to opt-in.

@dmke dmke added this to the 1.0 milestone Dec 1, 2018
@ChrisBr
Copy link
Collaborator Author

ChrisBr commented Dec 1, 2018

I'd prefer not to write and/or maintain C extensions. Having an off-switch for a regex-based query instrumentation sounds great – but I'd have this feature disabled by default, where the consumer needs to opt-in.

@dmke ok, I will implement this as soon as #53 is implemented.

@ChrisBr ChrisBr force-pushed the sql_subscriber branch 3 times, most recently from ceaaa4b to 3637a8f Compare December 2, 2018 17:24
@ChrisBr
Copy link
Collaborator Author

ChrisBr commented Dec 2, 2018

@dmke Rebased and implemented your feedback:

  • SQL reports are disabled by default
  • We report now sql operation, class name and name
  • The normalized query gets stored as an additional value :sql

Please have a look again, thanks 👍

Copy link
Collaborator

@dmke dmke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, some nitpicks. I'm not sure whether the SimpleSubscriber class add an unnecessary level of abstraction... let me think about this a bit.

README.md Outdated
@@ -72,6 +73,10 @@ defined in `lib/influxdb/rails/configuration.rb`

Out of the box, you'll automatically get reporting of your controller,
view, and db runtimes and rendering of template, partial and collection for each request.
Reporting of SQL queries is disabled by default because it is still in experimental mode
and currently requires String parsing which might cause performance issues on query
intensive applications. You can enable it by setting the ``series_name_for_sql``
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- ``series_name_for_sql``
+ `series_name_for_sql`

timestamp: ts
InfluxDB::Rails.client.write_point series_name,
values: { value: value },
tags: tags(payload),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the optimization introduced in 1734e4f.

end

describe "#track?" do
it { expect(described_class.new("INSERT").table_name).to be true }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to confuse track? with table_name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right ... and it was not even executed because of the missing _spec.rb postfix 🤦‍♂️

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A shame Rubocop didn't catch that, whilst it complains about everything else :-D

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That... causes too much work :-)

{ value: ((finished - started) * 1000).ceil }
end

def time_stamp(finished)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use timestamp, not time_stamp.

ActiveSupport instrumentation sql.active_record events are getting
now also written into the InfluxDB.

Therefore the RenderSubscriber was splitted into a base class
which gets reused in the SqlSubscriber.

The queries also getting normalized from parameters first, e.g.
'SELECT * FROM posts where id = 42' becomes
'SELECT * FROM posts where id = xxx'. This idea was shameless copied
from http://stevenyue.com/blogs/tracking-sql-queries-in-rails/.

https://guides.rubyonrails.org/active_support_instrumentation.html#sql-active-record
@ChrisBr
Copy link
Collaborator Author

ChrisBr commented Dec 2, 2018

@dmke fixed

I'm not sure whether the SimpleSubscriber class add an unnecessary level of abstraction... let me think about this a bit.

Hm, I see you ... however, the render and sql subscriber share quite some functionality 🤔 I was even thinking about moving everything into the Subscriber class and just overwrite the call method in RequestSubscriber as this potentially is the only subscriber which sends several datapoints to Influx. All other subscriber send a single data point. Or maybe we also split the request subscriber into just sending a single datapoint but subscribe it three times instead? Might be a little cleaner ...

@dmke
Copy link
Collaborator

dmke commented Dec 2, 2018

RequestSubscriber [...] is the only subscriber which sends several datapoints to Influx.
[...] split the request subscriber into just sending a single datapoint but subscribe it three times instead? Might be a little cleaner ...

Cleaner maybe, but this would be unnecessary implementation work for a temporary workaround. Pending #56, I'd like to resolve this in the next week, so you don't need to touch this ;-)

@dmke dmke merged commit 3a46a3e into InfluxCommunity:1.0-beta Dec 7, 2018
@ChrisBr
Copy link
Collaborator Author

ChrisBr commented Dec 9, 2018

Thanks for merging and releasing a new version ❤️ And happy birthday (xing...) 🍰 🎉

@dmke
Copy link
Collaborator

dmke commented Dec 9, 2018

No problem (and thanks! 🎈)

@ChrisBr ChrisBr deleted the sql_subscriber branch January 13, 2019 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants