Extract popular images - Data Frame implementation #382
Scala doesn't allow more than one overloaded alternative of a method to define default arguments. In the RDD implementation, the minWidth and minHeight arguments were optional. In the current DataFrame implementation, they are required. If they need to remain optional, I can
|
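To illustrate the overloading constraint described above, here is a minimal sketch using plain Scala collections rather than Spark. The `Image` case class, the `filterBySize` name, and the default value of 30 are assumptions made for illustration only; they are not taken from this PR. It shows one possible way to keep the size arguments optional: only explicit overloads, with the shorter one forwarding the defaults.

```scala
// Hypothetical stand-in for an image row; not the project's actual schema.
final case class Image(url: String, width: Int, height: Int)

object PopularImagesSketch {
  // Assumed default thresholds, chosen only for this example.
  val DefaultMinWidth: Int = 30
  val DefaultMinHeight: Int = 30

  // If two overloads of the same method both declared default arguments,
  // e.g. one taking Seq[Image] and one taking List[Image], each with
  // `minWidth: Int = 30, minHeight: Int = 30`, the Scala 2 compiler would
  // reject the definition. Only one alternative may define defaults.

  // Full-arity overload: no default arguments.
  def filterBySize(images: Seq[Image], limit: Int, minWidth: Int, minHeight: Int): Seq[Image] =
    images.filter(i => i.width >= minWidth && i.height >= minHeight).take(limit)

  // Shorter overload: forwards the assumed defaults explicitly.
  def filterBySize(images: Seq[Image], limit: Int): Seq[Image] =
    filterBySize(images, limit, DefaultMinWidth, DefaultMinHeight)
}
```

With this shape, callers can omit both size arguments or pass both, at the cost of not being able to override just one of them by name.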
Codecov Report
@@            Coverage Diff            @@
##           master    #382    +/-   ##
=========================================
+ Coverage   76.47%   76.7%   +0.22%
=========================================
  Files          40      41      +1
  Lines        1437    1451     +14
  Branches      268     268
=========================================
+ Hits         1099    1113     +14
  Misses        221     221
  Partials      117     117
|
A new method, @SinghGursimran tests? 😃 |
That makes sense to me too! |
Let's use a different convention for "end-to-end" functionalities. One option would be to have all UDFs be verb phrases, e.g., |
@lintool so, should we have @SinghGursimran change the existing |
Yes, if you like my suggestion of nouns vs. verbs. I.e., UDFs are verbs, "do this". |
Cool. Does that make sense, @SinghGursimran? |
@ruebot |
@SinghGursimran so, for the test: can we assert on other items in the returned DataFrame that do not depend on the order in which the rows are returned? |
Actually, for the archive available in the resources, the count is 1 for each data entry in the row. |
Yes, let's do that to get something in there, and we can loop back around to it later and see if we can improve it. |
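As a rough sketch of the order-independent assertion discussed above: when every row has the same count, the sort order among ties is not guaranteed, so a test can compare the returned rows as a multiset instead of by position. The `ImageCount` case class and `sameRows` helper are hypothetical names used only for this example, and plain collections stand in for the rows that a real test would obtain from the DataFrame.

```scala
// Hypothetical result row; in a real test these values would come from
// collecting the returned DataFrame.
final case class ImageCount(url: String, md5: String, count: Long)

object OrderIndependentCheck {
  // Count how many times each distinct row occurs.
  private def occurrences(rows: Seq[ImageCount]): Map[ImageCount, Int] =
    rows.groupBy(identity).map { case (row, group) => row -> group.size }

  // True when both sequences hold the same rows with the same multiplicity,
  // regardless of the order in which they appear.
  def sameRows(actual: Seq[ImageCount], expected: Seq[ImageCount]): Boolean =
    occurrences(actual) == occurrences(expected)
}
```

A test could then assert `OrderIndependentCheck.sameRows(actualRows, expectedRows)` rather than checking the first row, which would stay stable even if tied rows come back in a different order.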
Tested on 10 local GeoCities WARCs:
I'll squash and merge once we get the test. |
#28) * Add example for Scala DF version of "Extract Most Frequent Images MD5 Hash". - See archivesunleashed/aut#382 * rename
Extract popular images - Data Frame implementation
#380
For Testing: