Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce log option #197

Open
y-ken opened this issue Sep 14, 2020 · 4 comments
Open

Reduce log option #197

y-ken opened this issue Sep 14, 2020 · 4 comments

Comments

@y-ken
Copy link
Contributor

y-ken commented Sep 14, 2020

Do you have some idea to reduce log volume of ingestion as a option?

Once I have ingestion for near the billion of records, There are so huge outputs generated.
I would like to have some option to reduce log volume. But --log-level warn or --log-level error is too much reduced.

2020-09-14 04:02:42.879 +0000 [INFO] (0017:task-0000): SQL: FETCH FORWARD 10000 FROM cur
2020-09-14 04:02:42.890 +0000 [INFO] (0017:task-0000): > 0.01 seconds
2020-09-14 04:02:42.891 +0000 [INFO] (0017:task-0000): Fetched 500 rows.
2020-09-14 04:02:42.893 +0000 [INFO] (0017:task-0000): Fetched 1,000 rows.
2020-09-14 04:02:42.896 +0000 [INFO] (0017:task-0000): Fetched 2,000 rows.
2020-09-14 04:02:42.903 +0000 [INFO] (0017:task-0000): Fetched 4,000 rows.
2020-09-14 04:02:42.916 +0000 [INFO] (0017:task-0000): Fetched 8,000 rows.
2020-09-14 04:02:42.921 +0000 [INFO] (0017:task-0000): SQL: FETCH FORWARD 10000 FROM cur
2020-09-14 04:02:42.931 +0000 [INFO] (0017:task-0000): > 0.01 seconds
2020-09-14 04:02:42.933 +0000 [INFO] (0017:task-0000): Fetched 500 rows.
2020-09-14 04:02:42.935 +0000 [INFO] (0017:task-0000): Fetched 1,000 rows.
2020-09-14 04:02:42.938 +0000 [INFO] (0017:task-0000): Fetched 2,000 rows.
2020-09-14 04:02:42.944 +0000 [INFO] (0017:task-0000): Fetched 4,000 rows.
2020-09-14 04:02:42.957 +0000 [INFO] (0017:task-0000): Fetched 8,000 rows.
2020-09-14 04:02:42.962 +0000 [INFO] (0017:task-0000): SQL: FETCH FORWARD 10000 FROM cur
2020-09-14 04:02:42.971 +0000 [INFO] (0017:task-0000): > 0.01 seconds
2020-09-14 04:02:42.974 +0000 [INFO] (0017:task-0000): Fetched 500 rows.
2020-09-14 04:02:42.976 +0000 [INFO] (0017:task-0000): Fetched 1,000 rows.
2020-09-14 04:02:42.979 +0000 [INFO] (0017:task-0000): Fetched 2,000 rows.
2020-09-14 04:02:42.985 +0000 [INFO] (0017:task-0000): Fetched 4,000 rows.
2020-09-14 04:02:43.000 +0000 [INFO] (0017:task-0000): Fetched 8,000 rows.
2020-09-14 04:02:43.006 +0000 [INFO] (0017:task-0000): SQL: FETCH FORWARD 10000 FROM cur
2020-09-14 04:02:43.113 +0000 [INFO] (0017:task-0000): > 0.11 seconds
2020-09-14 04:02:43.116 +0000 [INFO] (0017:task-0000): Fetched 500 rows.
2020-09-14 04:02:43.118 +0000 [INFO] (0017:task-0000): Fetched 1,000 rows.
2020-09-14 04:02:43.121 +0000 [INFO] (0017:task-0000): Fetched 2,000 rows.
2020-09-14 04:02:43.127 +0000 [INFO] (0017:task-0000): Fetched 4,000 rows.
2020-09-14 04:02:43.140 +0000 [INFO] (0017:task-0000): Fetched 8,000 rows.
@hiroyuki-sato
Copy link
Member

Hello, @y-ken

Have you ever tried |? Otherwise, I think you need to modify the source code.

embulk run conf.yml | grep 'FETCH FORWARD'
2020-09-15 09:52:53.710 +0900 [INFO] (0016:task-0000): SQL: FETCH FORWARD 10000 FROM cur
2020-09-15 09:52:53.738 +0900 [INFO] (0016:task-0000): SQL: FETCH FORWARD 10000 FROM cur

@y-ken
Copy link
Contributor Author

y-ken commented Sep 15, 2020

Hi, yes, | fgrep -v ": Fetched" may help reduce log output as workaround.
According to the code, I thought logger.debug is suitable for some of the log output for production.
Thus I have reported to ask idea.

public ResultSet fetch() throws SQLException
{
logger.info("SQL: " + fetchSql);
long startTime = System.currentTimeMillis();
ResultSet rs = fetchStatement.executeQuery();
double seconds = (System.currentTimeMillis() - startTime) / 1000.0;
logger.info(String.format("> %.2f seconds", seconds));
return rs;
}

@hiroyuki-sato
Copy link
Member

IMO(a contributor comment).
If you introduce a new option and the option keeps current behavior, this proposal will be a good improvement.
(I think that changing loglevel isn't good way. I'm worrying someone capture this log entries.)

fetch_log_output: false # true by default.

@hito4t
Copy link
Contributor

hito4t commented Dec 28, 2020

Hi @y-ken and @hiroyuki-sato ,

Thank you for your proposals!

I think it is no problem to change the loglevel of the fetch logs from INFO to DEBUG.

  • The plugin outputs 'SQL: FECTH ...' every time, but it's enough to output it only first time, because the SQL is never changed.
  • The log 'Fetch NNN rows' wouldn't be important, because it only shows the progress and the final count isn't output.
  • The log '> NN.N seconds' wouldn't be important too, because it only shows the time to fetching the first row, and a lot of rows might be fetched after that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants