
structured logging for CSV format #43283

Open
linux-china opened this issue Nov 25, 2024 · 9 comments
Labels
status: pending-design-work Needs design work before any code can be developed type: enhancement A general enhancement
Milestone

Comments

@linux-china

Structured logging now has built-in support for Elastic Common Schema (ecs), Graylog Extended Log Format (gelf), and Logstash (logstash). Is there any plan to support CSV format? DuckDB, DataFusion, and clickhouse-local can all query CSV directly with SQL, and CSV is friendly to AWK and DataFrame tools too.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Nov 25, 2024
@mhalbritter
Contributor

mhalbritter commented Nov 29, 2024

That's quite an interesting idea, I can see that this is useful. We have to do some research on existing CSV formats, I guess.

@mhalbritter mhalbritter added type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged labels Nov 29, 2024
@mhalbritter mhalbritter added this to the 3.x milestone Nov 29, 2024
@linux-china
Author

linux-china commented Nov 29, 2024

> That's quite an interesting idea, I can see that this is useful. We have to do some research on existing CSV formats, I guess.

Here are my current examples:

$ duckdb -c "select count(*) from read_csv('http://localhost:8888/actuator/logfile') where column01 = 'ERROR'"
$ duckdb -c "select * from read_csv('http://localhost:8888/actuator/csv/metrics') where name like 'jvm%'"

Of course, you can union logfiles/metrics from multiple instances, which is very convenient.
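For example (a sketch only; the instance URLs below are hypothetical, and DuckDB's read_csv accepts a list of sources so the files are unioned into one table automatically):

```sql
-- Count ERROR lines across two hypothetical instances in one query.
-- read_csv accepts a list of files/URLs and reads them as a single table.
SELECT count(*)
FROM read_csv([
    'http://host1:8888/actuator/logfile',
    'http://host2:8888/actuator/logfile'
])
WHERE column01 = 'ERROR';
```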

@ivamly
Contributor

ivamly commented Nov 29, 2024

Hello, @mhalbritter!
May I work on this issue? If so, do you have any suggestions or guidance on how to get started? Thank you!

@linux-china
Author

Examples for logfile, metrics, env, and beans with CSV support:

(screenshot: example CSV output from the logfile, metrics, env, and beans endpoints)

@mhalbritter
Contributor

Hey @ivamly, thanks for the offer. For this issue, we'd like to spend some time on design work, so it's not open for contributions yet.

@mhalbritter mhalbritter added the status: pending-design-work Needs design work before any code can be developed label Dec 2, 2024
@wilkinsona
Member

wilkinsona commented Dec 2, 2024

@linux-china are you aware of any standards or conventions for the column ordering, the contents and their format, and so on in the CSV data for logging?

@linux-china
Author

linux-china commented Dec 2, 2024

Currently I use @JsonPropertyOrder to convert a POJO to CSV, as follows:

@JsonPropertyOrder({"id", "nick", "email", "tags"})
public class User {
    private Integer id;
    private String nick;
    private String email;
    private String tags;
    // getters and setters omitted
}

For column types and formats, I think CSV Schema Language 1.2 is somewhat complicated.

DuckDB uses a struct style (column1: type, column2: type), for example:

SELECT *
FROM read_csv('flights.csv',
    delim = '|',
    header = true,
    columns = {
        'FlightDate': 'DATE',
        'UniqueCarrier': 'VARCHAR',
        'OriginCityName': 'VARCHAR',
        'DestCityName': 'VARCHAR'
    });

For column ordering and format, I think the following is fine:

logging.structured.csv.format=column1:type, column2:type, mdc_user, key_code, message

The type name is not required if the column is text, and most of the time a type is not necessary at all. The mdc_ prefix is for MDC values, and the key_ prefix is for KeyValuePair from SLF4J 2.0's fluent API.
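To make the column list above concrete, here is a rough sketch of how such a formatter might assemble one CSV line (this is not Spring Boot's StructuredLogFormatter contract; the class and method names are made up for illustration, and quoting follows RFC 4180 conventions):

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: assemble one CSV log line from resolved column values.
public class CsvLogLine {

    // Quote a field only when it contains a delimiter, quote, or newline,
    // doubling embedded quotes per RFC 4180.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuoting = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuoting) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join already-resolved column values (timestamp, level, MDC, key-values,
    // message, ...) into a single CSV record.
    static String format(List<String> columns) {
        return columns.stream().map(CsvLogLine::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Columns per the proposed format: timestamp, level, mdc_user, key_code, message
        String line = format(List.of("2024-12-02T10:15:30Z", "ERROR", "alice",
                "E42", "failed, see \"details\""));
        System.out.println(line);
        // 2024-12-02T10:15:30Z,ERROR,alice,E42,"failed, see ""details"""
    }
}
```

The quoting-only-when-needed choice keeps the common case (plain log levels, timestamps) byte-identical to the raw value, which matters for AWK-style processing.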

Another question is about CSV headers. For a newly created logfile or a rotated logfile, the headers should be added as the first line.

@philwebb
Member

philwebb commented Dec 2, 2024

> Another question is about CSV headers. For a newly created logfile or a rotated logfile, the headers should be added as the first line.

This will be quite tricky for us as currently StructuredLogFormatter has no knowledge of the way logs are being written. It will also be difficult if an app is restarted and appends to an existing log.

@linux-china
Author

@philwebb CSV headers are not a must; most developers will supply a schema themselves, either from a schema registry or entered by hand.


6 participants