Implement Python versions of Matchbox utilities #408

Closed
ruebot opened this issue Jan 17, 2020 · 0 comments · Fixed by #463
ruebot commented Jan 17, 2020

| RDD | Scala DF | Python DF |
|---|---|---|
| ComputeImageSize | | |
| ComputeMD5RDD | ComputeMD5DF | in progress |
| ComputeSHA1RDD | ComputeSHA1DF | in progress |
| DetectLanguageRDD | DetectLanguageDF | in progress |
| DetectMimeTypeTika | DetectMimeTypeTikaDF | |
| ExtractBoilerPipeTextRDD | ExtractBoilerPipeTextDF | |
| ExtractDateRDD | ExtractDateDF | |
| ExtractDomainRDD | ExtractDomainDF | ✔️ |
| ExtractImageDetails | | |
| ExtractImageLinksRDD | ExtractImageLinksDF | |
| ExtractLinksRDD | ExtractLinksDF | |
| ExtractTextFromPDFs | - | |
| GetExtensionMimeRDD | GetExtensionMimeDF | |
| RemoveHTMLRDD | RemoveHTMLDF | ✔️ |
| RemoveHTTPHeaderRDD | RemoveHTTPHeaderDF | ✔️ |
| NERClassifier | - | |
| RemovePrefixWWW | RemovePrefixWWWDF | ✔️ |

Stealing @SinghGursimran's very helpful tables here 😃
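For anyone picking up one of the open rows, here is a rough sketch of what a few of the utilities in the table above might look like as plain Python functions. This is only an illustration using the standard library: the function names (`compute_md5`, `compute_sha1`, `extract_domain`, `remove_prefix_www`) are hypothetical, and the behaviour is an assumption, not necessarily what the Scala versions do or how the Python side of this project wraps them.

```python
import hashlib
import re
from urllib.parse import urlparse


def compute_md5(data: bytes) -> str:
    """Return the hex MD5 digest of raw bytes (hypothetical stand-in for ComputeMD5)."""
    return hashlib.md5(data).hexdigest()


def compute_sha1(data: bytes) -> str:
    """Return the hex SHA-1 digest of raw bytes (hypothetical stand-in for ComputeSHA1)."""
    return hashlib.sha1(data).hexdigest()


def extract_domain(url: str) -> str:
    """Return the host part of a URL, or an empty string if none can be parsed.

    Assumption: an unparseable URL maps to "", which may differ from the Scala behaviour.
    """
    host = urlparse(url).hostname
    return host or ""


def remove_prefix_www(domain: str) -> str:
    """Strip a single leading "www." from a domain (hypothetical stand-in for RemovePrefixWWW)."""
    return re.sub(r"^www\.", "", domain)


if __name__ == "__main__":
    url = "https://www.example.com/path?x=1"
    print(remove_prefix_www(extract_domain(url)))
    print(compute_md5(b"hello"), compute_sha1(b"hello"))
```

Functions like these could then be registered as Spark UDFs or kept as thin wrappers around the existing Scala implementations; which approach fits best is left open here.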

ruebot added a commit that referenced this issue May 19, 2020
- Resolves #408
- Alphabetizes DataFrameLoader functions
- Alphabetizes UDF functions
- Move DataFrameLoader to df packages
- Move UDFs out of df into their own package
- Rename UDFs (no more DF tagged to the end).
- Update tests as necessary
- Partially addresses #410, #409
- Supersedes #412.
ianmilligan1 pushed a commit that referenced this issue May 19, 2020