Add tests for testing block_given? functionality of DataFrame.from_csv() method. #308

Open
v0dro opened this issue Feb 10, 2017 · 14 comments

Comments

@v0dro
Member

v0dro commented Feb 10, 2017

The DataFrame.from_csv method currently has a provision for accepting blocks and performing some manipulation on a row that has been read before loading the data into a dataframe.

However, there are no tests in io_spec.rb for testing this.

The tests should also cover error conditions thoroughly.
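As a rough illustration of the behaviour such tests would need to pin down, here is a minimal sketch of the `block_given?` pattern. The `load_rows` method is a hypothetical stand-in written for this example; it is not Daru's `from_csv` and makes no claim about what that method actually yields to its block.

```ruby
# Hypothetical stand-in for a reader method; NOT Daru's actual from_csv.
# It splits each line on commas and, if a block was given, passes the
# row through the block before collecting it.
def load_rows(lines)
  lines.map do |line|
    row = line.chomp.split(',')
    # Only hand the row to the caller when a block was supplied.
    block_given? ? yield(row) : row
  end
end

# Without a block, rows come back untouched (as arrays of strings).
plain = load_rows(["1,2", "3,4"])

# With a block, each row is transformed before it is collected.
doubled = load_rows(["1,2", "3,4"]) { |row| row.map { |v| Integer(v) * 2 } }

# Error condition: an exception raised inside the block propagates
# out of the reader to the caller.
begin
  load_rows(["1,x"]) { |row| row.map { |v| Integer(v) } }
  error_raised = false
rescue ArgumentError
  error_raised = true
end
```

A spec for the real method would presumably check the same three cases: no block, a block that transforms the data, and a block that raises.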

@zverok
Collaborator

zverok commented Feb 15, 2017

@v0dro is there a real-life reason for this functionality in the first place?

@v0dro
Member Author

v0dro commented Feb 25, 2017

I recently saw a PR that attempted to remove the line containing block_given? and the specs passed since there was no test of this sort.

@v0dro v0dro closed this as completed Feb 25, 2017
@v0dro v0dro reopened this Feb 25, 2017
@zverok
Collaborator

zverok commented Feb 25, 2017

Yes, I understand why we need the test if the functionality exists.
But my question is: WHY does the functionality exist?

@v0dro
Member Author

v0dro commented Feb 27, 2017

Say you have a CSV column that contains dates in the form DD MONTH YEAR (for example, 12 february 2016) and you want to convert them to DateTime objects with your own conversion logic while reading the file into a dataframe. The easiest way to do that is to pass a block to from_csv that modifies the data as it comes in.
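A minimal sketch of that idea, assuming a reader that yields each row to the block as a hash. The `read_csv_rows` helper, the column names, and the sample data are all made up for illustration; the helper uses naive comma splitting and is not Daru's `from_csv`, whose block semantics may differ.

```ruby
require 'date'

# Hypothetical helper, NOT Daru's from_csv. It treats the first line as
# the header and yields each subsequent row (a Hash) to the block, if given.
# Naive comma splitting: no quoting or escaping is handled.
def read_csv_rows(text)
  header, *lines = text.lines.map(&:chomp)
  keys = header.split(',')
  lines.map do |line|
    row = keys.zip(line.split(',')).to_h
    block_given? ? yield(row) : row
  end
end

# Made-up sample data with dates in DD MONTH YEAR form.
csv_text = "name,joined\nalice,12 february 2016\nbob,25 march 2017\n"

# The block converts the 'joined' column while the rows are being read.
rows = read_csv_rows(csv_text) do |row|
  row.merge('joined' => DateTime.strptime(row['joined'], '%d %B %Y'))
end
```

With the block, `rows` holds DateTime objects in the `joined` column; without it, the same call would return the raw strings.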

@gnilrets
Contributor

I process a lot of CSV files and have gotten into the habit of reading all fields as strings and doing the conversions after the data has been built into a dataframe.

@zverok
Collaborator

zverok commented Feb 28, 2017

Got it, thanks 👍

@v0dro
Member Author

v0dro commented Mar 1, 2017

@gnilrets for smaller dataframes and simpler usage scenarios, I think passing a block is more readable and straightforward.

@gusandrianos

I want to work on this. I'm looking for something I can do for GSoC and I think it's a good fit.

@v0dro
Member Author

v0dro commented Mar 26, 2017

@gusandrianos yes, this would be a great and simple issue to start with. Have you had a look at the source code yet? You should hurry up with your proposal, since the deadline for submitting the final proposal is 4th April.

@gusandrianos

gusandrianos commented Mar 26, 2017

@v0dro This wasn't what I had in mind for GSoC, so having to submit patches for every organization I am interested in kind of caught me off guard. I'll try to solve this quickly, as this is the only thing missing from my proposal. :)

Anything you want me to know before starting?

@v0dro
Member Author

v0dro commented Mar 26, 2017

Well, this is a pretty easy patch, so I don't think you will require my help for it. Make sure you submit your draft proposal early. A proposal without a patch submission is also fine, since we can start evaluating it. You can always add information about the code submission later.

@gusandrianos

@v0dro That's awesome, I haven't really found anything that fits me better than SciRuby.

@gusandrianos

gusandrianos commented Mar 26, 2017

I am a bit confused. Can you give a usage example? I've been stuck on this for a while now.

@parthm
Contributor

parthm commented Oct 10, 2017

Can we close this (based on the discussion on #413)? #428 has been filed for removal of block support.

5 participants