Add CSV column filtering and re-ordering during parsing #498

Closed
nyamsprod opened this issue Sep 30, 2023 · 0 comments · Fixed by #499

nyamsprod commented Sep 30, 2023

Feature Request

Q            A
New Feature  yes
BC Break     no

Introduction

TabularDataReader::getRecords accepts an optional header to apply to the CSV document. Currently, the method only accepts a list of unique strings, each representing a header value.

  • If the list is shorter than the CSV column count, the CSV columns without a header are removed from the returned Iterator.
  • If the list is longer than the CSV column count, the extra columns are added with a null value.
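
The two rules above can be sketched in plain PHP. This is only an illustration of the behaviour described here, not the library's actual code; `combineWithList` is a hypothetical helper introduced for the example.

```php
<?php

/**
 * Hypothetical helper: applies a list of header names to a single record.
 */
function combineWithList(array $record, array $header): array
{
    $count = count($header);
    // Pad short records with null, then drop the columns beyond the header length.
    $record = array_slice(array_pad($record, $count, null), 0, $count);

    return array_combine($header, $record);
}

$record = ['Abel', '14', 'M', '2004'];

// Shorter list: the last two columns are dropped.
$short = combineWithList($record, ['Firstname', 'Count']);
// ['Firstname' => 'Abel', 'Count' => '14']

// Longer list: the extra header is added with a null value.
$long = combineWithList($record, ['Firstname', 'Count', 'Gender', 'Year', 'City']);
// ['Firstname' => 'Abel', 'Count' => '14', 'Gender' => 'M', 'Year' => '2004', 'City' => null]
```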

It would be useful to allow more fine-grained mapping by relaxing the requirement that a list of strings be submitted.

Proposal

The optional array accepted by TabularDataReader::getRecords should no longer be restricted to a list. It becomes a header mapper, which maps each CSV column offset to a header value. This complements the current behaviour with the following rules:

  • If the submitted array is empty, no header is added.
  • Otherwise, each integer array key corresponds to a CSV column offset, starting at 0, and the array value is the header value for that column.
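
A minimal sketch of these rules in plain PHP follows. `mapRecord` is a hypothetical function used only for illustration, not part of the library, and returning null for an offset that is missing from the record is an assumption carried over from the current behaviour for extra headers.

```php
<?php

/**
 * Hypothetical helper: applies a header mapper (column offset => header name)
 * to a single record.
 */
function mapRecord(array $record, array $headerMap): array
{
    // Empty mapper: no header is added, the record is returned untouched.
    if ($headerMap === []) {
        return $record;
    }

    $mapped = [];
    foreach ($headerMap as $offset => $name) {
        // Assumption: an offset missing from the record yields null.
        $mapped[$name] = $record[$offset] ?? null;
    }

    return $mapped;
}

$record = ['Abel', '14', 'M', '2004'];

$mapped = mapRecord($record, [3 => 'Year', 0 => 'Firstname', 1 => 'Count']);
// ['Year' => '2004', 'Firstname' => 'Abel', 'Count' => '14']

$untouched = mapRecord($record, []);
// ['Abel', '14', 'M', '2004']
```

The mapper's iteration order decides the output order, and any offset left out of the mapper (here offset 2) is skipped.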

This means that using TabularDataReader::getRecords you will be able to:

  • skip CSV columns
  • re-order CSV columns

Let's assume the following CSV document:

Abel,14,M,2004
Abiga,6,F,2004
Aboubacar,8,M,2004
Aboubakar,6,M,2004

With the new feature we would be able to do the following:

use League\Csv\Reader;

$csv = <<<CSV
Abel,14,M,2004
Abiga,6,F,2004
Aboubacar,8,M,2004
Aboubakar,6,M,2004
CSV;

$reader = Reader::createFromString($csv);
// getRecords returns an Iterator, so collect the records before reading the first one
$records = $reader->getRecords([3 => 'Year', 0 => 'Firstname', 1 => 'Count']);
var_dump([...$records][0]);
// returns something like this
// array:3 [
//     "Year" => "2004"
//     "Firstname" => "Abel"
//     "Count" => "14"
//  ]

  • The column containing the gender information is skipped
  • The other columns are re-arranged
  • To avoid breaking everything else, once the re-arrangement is done, the header given to the returned TabularDataReader should be the result of calling array_values on the submitted header, so that the header list it receives reflects the new header order and content.
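
As a short illustration of the last point, this is what array_values produces for the header mapper used above (the variable names are only for the example):

```php
<?php

$headerMap = [3 => 'Year', 0 => 'Firstname', 1 => 'Count'];

// array_values keeps the iteration order and re-indexes the keys from 0,
// which turns the mapper back into a plain header list.
$header = array_values($headerMap);
// ['Year', 'Firstname', 'Count']
```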

This feature request encompasses the current behaviour while extending it.

Affected methods are:

  • Reader::getRecords
  • ResultSet::getRecords
  • Statement::process

Possible BC break

If the header mapper is not a list, the result may differ from previous releases.
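
To make the difference concrete, here is a plain PHP sketch comparing the two interpretations of a non-list array. It assumes that previous releases ignore the array keys and apply the header values positionally; that assumption is not confirmed by this issue, and the code does not call the library.

```php
<?php

$record = ['Abel', '14', 'M', '2004'];
$header = [1 => 'Count', 0 => 'Firstname']; // not a list: keys are out of order

// Assumed previous behaviour: keys are ignored, names are applied positionally.
$before = array_combine(
    array_values($header),
    array_slice($record, 0, count($header))
);
// ['Count' => 'Abel', 'Firstname' => '14']

// Proposed behaviour: each key selects the CSV column offset.
$after = [];
foreach ($header as $offset => $name) {
    $after[$name] = $record[$offset] ?? null;
}
// ['Count' => '14', 'Firstname' => 'Abel']
```

When the submitted array is a list, both interpretations give the same result, which is why only non-list arrays are affected.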
