-
-
Notifications
You must be signed in to change notification settings - Fork 338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add delimiter detection to Reader #16
Comments
Hi, As it stands the library was not design to validate CSV. So detecting CSV controls is out of the scope of the library, for now. CSV documents come in essentially 2 forms, those you create and those you import from outside your controlled environment.
This means that whoever is providing you some data using CSV should tell which controls is being used and how the CSV is being construct just like with any other data transfer format. If the information is not given you should contact the provider and ask him/her to give it to you! So you end up with the first situation where you do know the CSV constrols without processing the document. If processing fails then it's either a bug in the library or the API documentation is wrong!!
What processing software like Open office do is giving you hint or suggestion, the final decision is always made by you because sometimes (often I may say) they are incorrect. |
I have flag your issue as enhancement and I'm working on adding your feature request but I have some questions and maybe you could help me shaping the feature with your feedback. I've created a function called $nb_rows = 5; //the numbers of rows to parse to detect the delimiter by default $nb_rows = 1;
$additional_delimiters = ["\s", "|"]; //additional_delimiters (",", ";", "\t") are already considered
$csv = new Reader('/path/to/my/csv/file');
$res = $csv->detectDelimiter($nb_rows, $additional_delimiters); Now the crucial part is what the return value
using this first option you can do something like this if (is_null($res)) {
echo 'no delimiter detected';
} elseif (false === $res) {
echo 'the CSV is inconsistent it uses several different delimiters';
} else {
$csv->setDelimiter($res);
} In the second option This array will contained the founded delimiters.
$nb_delimiters = count($res);
if ($nb_delimiters == 1) {
$csv->setDelimiter($res[0]);
} elseif ($nb_delimiters > 1) {
echo 'the CSV uses ', implode(' ', $res). ' delimiters';
} else {
echo 'no delimiter detected';
} Can you tell me which version you prefer and why ? Of course if you have a better solution it is more than welcomed 👍 Of note:
|
Hi, I got the impression that you weren't that interested in the feature so it came as a really nice surprise to see you doing this. I'm more than willing to help out if you've got this in a branch that I can try out and send PRs for. How about using exceptions? Either an InconsistentDelimiterException and a NoDelimiterException or a more generic CSVParseException that tells you what the problem is in a message or a method. I think that's more clear than keeping track of the different meanings for null and false in your first option. If you end up getting an exception odds are that reading the file wouldn't give you a good result anyway. Another question is if \s really should be considered as a valid delimiter considering that you'd have to switch the parsing strategy. Using \s really only works if the text is in enclosures so you'd have to keep track of that, you can't just split the string on \s and then count the results. I'm willing to help out with these kinds of special scenarios so that you don't have to spend time on it. Is this stuff in a branch that I can check out and help you with? |
Hi,
|
# The first commit's message is: # This is a combination of 4 commits. # The first commit's message is: implementing detectDelimiter #16 # The 2nd commit message will be skipped: # implementing detectDelimiter with Exceptions #16 # The 3rd commit message will be skipped: # implementing detectDelimiter #16 # The 4th commit message will be skipped: # implementing detectDelimiter with Exceptions #16 # The 2nd commit message will be skipped: # implementing detectDelimiter with Exceptions #16
adding detectDelimiter to AbstractCsv class #16
Hello,
I've tried out your and many other CSV libraries in the hope of finding one that automatically detects the delimiter and sadly none of the libraries I've tried have had a go at this.
I find this odd since it seems like a really useful abstraction that I as the consumer of the library don't want to worry about. The delimiter is a ;? Bakame should have my back without me having to set this manually.
Just grabbing the first line and counting occurences of valid delimiters for a csv file should cover 95% of the cases. Open office lists tab, comma, semicolon and space as valid delimiters for csv.
I'm guessing that if space is the delimiter then you have to use an enclosure for it to work.
The text was updated successfully, but these errors were encountered: