
Add a streaming-based CSV parser to k6 #2976

Closed
@oleiade

Description

Users with large CSV files (say, larger than 500 MB) who also run a large number of VUs are directly affected by the problems k6 currently has with handling large files.

As a result, we would like k6 to offer an alternative way to handle CSV files. Ideally, it would be streaming-based and hold only a subset of the data in memory at any given time. That way, k6's memory footprint would remain sustainable for such users.
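To make the streaming idea concrete, here is a minimal sketch in plain JavaScript (not k6 code) of a parser that consumes a file as a sequence of text chunks and yields one row object at a time. Only the trailing partial line is kept between chunks, so memory use is bounded by line length rather than by file size. The name `parseCsvStream` is hypothetical, and the sketch assumes a header row, no quoted fields, and no newlines inside fields.

```javascript
// Hypothetical streaming CSV-to-object parser (sketch, not k6 code).
// Assumes a header row, no quoted fields, and no newlines inside fields.
function* parseCsvStream(chunks, delimiter = ',') {
  let buffer = ''; // trailing text that does not yet form a complete line
  let header = null; // column names, taken from the first non-blank line

  // Turns one complete line into zero or one row objects.
  function* emit(line) {
    if (line.trim() === '') return; // skip blank lines
    const fields = line.split(delimiter).map((field) => field.trim());
    if (header === null) {
      header = fields;
      return;
    }
    const row = {};
    header.forEach((name, i) => {
      row[name] = fields[i];
    });
    yield row;
  }

  for (const chunk of chunks) {
    buffer += chunk;
    const lines = buffer.split(/\r?\n/);
    buffer = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) {
      yield* emit(line);
    }
  }
  yield* emit(buffer); // flush a final line that lacks a trailing newline
}

// Usage: chunk boundaries can fall anywhere, even inside a field name.
const chunks = [
  'username,password,em',
  'ail\npawel,test123,pawel@k6.io\nal',
  'ice,secret,alice@example.com',
];
for (const row of parseCsvStream(chunks)) {
  console.log(row.username, row.email);
}
// prints "pawel pawel@k6.io", then "alice alice@example.com"
```

Because the parser is a generator, callers pull rows on demand, which is what would let a `next()`-style API like the one below avoid materializing the whole file.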

Non-final target API

import http from 'k6/http';
import { csv } from 'k6/files'; // proposed module, not part of k6 yet

const filename = '10M-rows.csv';

// Expected file content:
// username,password,email
// pawel,test123,pawel@k6.io
// ...

// Not using the old open() API, which loads the whole file into memory:
// const fileContents = open(filename);

// Placeholder for the streaming file-opening API this proposal depends on
// (see "Prerequisites" below); its name and shape are not decided yet.
const fileHandle = streamingOpenFileHandler(filename);

const csvHandler = csv.objectReader(fileHandle.stream, {
  delimiter: ',',
  consumptionStrategy: 'uniqueSequential', // VU-safe, non-repeating.
  endOfFileHandling: 'startFromBeginning', // what to do when we run out of rows
});

export default function () {
  const object = csvHandler.next(); // unique row across all VUs

  const res = http.post('http://test.k6.io/login', {
    user: object.username,
    pass: object.password,
  });
}
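The `consumptionStrategy: 'uniqueSequential'` and `endOfFileHandling: 'startFromBeginning'` options could behave roughly like the sketch below: one shared cursor hands out rows in file order and reopens the source once it runs dry. The `SharedRowReader` class, its `openRows` factory, and the non-rewinding fallback are hypothetical. Since each k6 VU runs its own JavaScript runtime, a real implementation would presumably need to keep this cursor outside the VUs (for example in k6's Go code, behind a lock); the sketch only models the intended behavior within a single runtime.

```javascript
// Hypothetical model of `consumptionStrategy: 'uniqueSequential'` combined
// with `endOfFileHandling: 'startFromBeginning'` (sketch, not k6 code).
class SharedRowReader {
  // `openRows` returns a fresh iterator over the file's rows, e.g. a
  // streaming parser over the file reopened at offset 0.
  constructor(openRows, endOfFileHandling = 'startFromBeginning') {
    this.openRows = openRows;
    this.endOfFileHandling = endOfFileHandling;
    this.rows = openRows();
  }

  // Each call returns the next row in file order, so within one pass over
  // the file no two callers receive the same row.
  next() {
    let result = this.rows.next();
    if (result.done) {
      if (this.endOfFileHandling !== 'startFromBeginning') {
        throw new Error('CSV rows exhausted'); // hypothetical "stop" behavior
      }
      this.rows = this.openRows(); // rewind by reopening the source
      result = this.rows.next();
      if (result.done) {
        throw new Error('CSV source contains no rows');
      }
    }
    return result.value;
  }
}

// Usage: three iterations (standing in for VUs) share one reader over two rows.
const rows = [{ username: 'pawel' }, { username: 'alice' }];
const reader = new SharedRowReader(() => rows[Symbol.iterator]());
const seen = [];
for (let i = 0; i < 3; i++) {
  seen.push(reader.next().username);
}
console.log(seen.join(',')); // pawel,alice,pawel
```

Rewinding by reopening, rather than caching rows already read, keeps memory bounded; the trade-off is that it needs the underlying file to be re-readable from the start.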

Prerequisites

However, providing such an alternative CSV parser that works for both open-source and cloud users is currently blocked by the issues listed in "improving the handling of large files in k6".

Namely, we would first need the ability to access such files, seek through them, and stream their content without first decompressing them to disk and without first loading their whole content into memory. Another prerequisite is an API that lets scripts open a file and then read from it as a separate step, as opposed to storing its whole content in memory upfront.
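The kind of file access described above, reading a file piece by piece from arbitrary offsets, can be illustrated outside k6 with Node.js's `fs` module, which k6 scripts cannot use. The sketch reads fixed-size chunks with positional reads, so only one chunk is held in memory and seeking amounts to choosing a new start offset. `readChunks` is a hypothetical helper; to feed a text-based parser, its chunks would still need decoding with a streaming-aware decoder, since a multi-byte UTF-8 character can straddle a chunk boundary.

```javascript
// Illustration only: Node.js's `fs` module, not a k6 API.
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: yields the file's bytes in chunks of at most
// `chunkSize`, starting at byte `startOffset`. Only one chunk is held in
// memory at a time, and seeking is just a matter of choosing the offset.
function* readChunks(filePath, chunkSize = 64 * 1024, startOffset = 0) {
  const fd = fs.openSync(filePath, 'r');
  const buffer = Buffer.alloc(chunkSize);
  try {
    let position = startOffset;
    while (true) {
      // Positional read: neither depends on nor moves the file's own cursor.
      const bytesRead = fs.readSync(fd, buffer, 0, chunkSize, position);
      if (bytesRead === 0) break; // end of file
      position += bytesRead;
      // Copy the bytes out, because `buffer` is reused for the next read.
      yield Buffer.from(buffer.subarray(0, bytesRead));
    }
  } finally {
    fs.closeSync(fd); // also runs if the consumer stops iterating early
  }
}

// Usage: skip the 18-byte header line, then read the rest 8 bytes at a time.
const demoFile = path.join(os.tmpdir(), 'k6-chunk-demo.csv');
fs.writeFileSync(demoFile, 'username,password\npawel,test123\n');
const parts = [...readChunks(demoFile, 8, 18)].map((c) => c.toString('utf8'));
console.log(JSON.stringify(parts)); // ["pawel,te","st123\n"]
fs.unlinkSync(demoFile);
```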
