[BUG] CSV output does not handle entries with CRLF #3514

@Michael-S

What is the bug?
According to RFC 4180, CSV output of entries like a\nb or a\r\nb should enclose them in double quotes, giving "a\nb" and "a\r\nb", respectively. See section 2, rule 6, https://www.rfc-editor.org/rfc/rfc4180
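
For reference, here is a minimal sketch of the quoting that section 2 of RFC 4180 describes (rules 6 and 7): a field containing a comma, a double quote, CR, or LF is wrapped in double quotes, and any embedded double quote is doubled. The helper name csv_quote_field is hypothetical and only illustrates the rule. It is not the plugin's code.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of OpenSearch): print one CSV field,
# quoted according to RFC 4180 section 2, rules 6 and 7.
# The body runs in a subshell so its variables do not leak to the caller.
csv_quote_field() (
    field=$1
    cr=$(printf '\r')   # carriage return
    lf='
'                       # line feed
    case $field in
        *,* | *\"* | *"$cr"* | *"$lf"*)
            # Rule 7: double every embedded double quote.
            out=
            rest=$field
            while :; do
                case $rest in
                    *\"*)
                        out=$out${rest%%\"*}\"\"
                        rest=${rest#*\"}
                        ;;
                    *)
                        out=$out$rest
                        break
                        ;;
                esac
            done
            # Rule 6: enclose the whole field in double quotes.
            printf '"%s"' "$out"
            ;;
        *)
            # Nothing special in the field, so emit it unchanged.
            printf '%s' "$field"
            ;;
    esac
)

# Usage: the quote case from document 2 in the reproduction steps.
csv_quote_field '"a" b'; echo
```

With this rule, the value "a" b becomes """a"" b", which matches what the plugin already emits for quotes. A value with an embedded newline would be wrapped the same way, with the line break kept inside the quotes.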

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Add values to a test index
#!/usr/bin/env bash

# Assuming OPENSEARCH_SERVER is the server,
# OPENSEARCH_USER is the user, OPENSEARCH_PASSWORD is the password,
# and the user has the rights to create and query this index.

# Add an entry to the index that has a newline in it.
curl -XPUT "https://${OPENSEARCH_SERVER}:9200/csv_error/_doc/1" \
    --insecure \
    -H 'Content-Type: application/json' \
    -d'{  "field1": "a\nb" }' --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
# Optional, add an entry that has a quote in it.
curl -XPUT "https://${OPENSEARCH_SERVER}:9200/csv_error/_doc/2" \
    --insecure \
    -H 'Content-Type: application/json' \
    -d'{ "field1": "\"a\" b" }' --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
# Optional, add an entry that has no quotes, newlines, or commas.
curl -XPUT "https://${OPENSEARCH_SERVER}:9200/csv_error/_doc/3" \
    --insecure \
    -H 'Content-Type: application/json' \
    -d'{ "field1": "a b" }' --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
  2. Query the index to CSV output or RAW output
#!/usr/bin/env bash

# CSV
curl -XPOST "https://${OPENSEARCH_SERVER}:9200/_plugins/_sql?format=csv" \
    --insecure \
    -H 'Content-Type: application/json' \
    -d'{ "query": "SELECT * FROM csv_error LIMIT 50" }' \
    --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"

# Raw
curl -XPOST "https://${OPENSEARCH_SERVER}:9200/_plugins/_sql?format=raw" \
    --insecure \
    -H 'Content-Type: application/json' \
    -d'{ "query": "SELECT * FROM csv_error LIMIT 50" }' \
    --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"

Output:

field1
a                      <-- no start quote
b                      <-- no end quote
"""a"" b"

What is the expected behavior?
Entries containing \r or \n, in any combination, should be enclosed in double quotes.
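
One way to check a response against this expectation is to count logical CSV records rather than physical lines. The sketch below assumes records are terminated by \n and that quotes appear only as RFC 4180 quoting. The helper name csv_record_count is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count logical CSV records on stdin, header included.
# A physical line ends a record only when the running number of double
# quotes is even, i.e. when we are not inside a quoted field.
csv_record_count() {
    awk '
        {
            # gsub returns how many double quotes this physical line holds.
            quotes += gsub(/"/, "&")
            if (quotes % 2 == 0) records++
        }
        END { print records + 0 }
    '
}

# The output reported above: a and b are counted as separate records.
printf 'field1\na\nb\n"""a"" b"\n' | csv_record_count

# The same rows with the newline entry quoted as expected.
printf 'field1\n"a\nb"\n"""a"" b"\n' | csv_record_count
```

For the header plus documents 1 and 2, the expected count is 3. The unquoted output yields 4, because a CSV reader cannot tell the embedded newline from a record separator.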

What is your host/environment?

  • OS: Ubuntu 20.04
  • Version: 2.19.1
  • Plugins: default

Do you have any additional context?
I should have caught this when I worked on #3050

Labels

bug
