Improve performance of non-UTF8 encoded ASCII files processing
#572 Paradoxically no need to pass charset name when splitting custom encoded text. (8a881c8)
#572 Improve performance of ASCII files processing with custom charset. (02d6da8)
Related commits: cf2b579, 6c84d22
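Commit 8a881c8 notes that splitting custom-encoded text does not need the charset name. A minimal sketch of why that can hold, assuming an ASCII-compatible charset in which the line feed is the single byte 0x0A and that byte never occurs inside a multi-byte character: records can be cut at 0x0A on the raw bytes and decoded only afterwards. `ByteLineSplitter`, `split` and `isLfSafe` are hypothetical names for illustration, not Cobrix code; UTF-16 and UTF-32 do not satisfy the assumption.

```scala
import java.nio.charset.{Charset, StandardCharsets}
import java.util.Arrays

// Hypothetical helper, not part of Cobrix: cuts raw bytes into records at LF
// without knowing the charset, leaving decoding to the caller.
object ByteLineSplitter {
  private val LF: Byte = '\n'.toByte
  private val CR: Byte = '\r'.toByte

  // True if the charset encodes "\n" as the single byte 0x0A. This is a
  // necessary condition for byte-level splitting, not a sufficient one: the
  // charset must also never use 0x0A inside a multi-byte character.
  def isLfSafe(charset: Charset): Boolean =
    Arrays.equals("\n".getBytes(charset), Array(LF))

  // Splits at every 0x0A byte; a CR directly before the LF (CRLF endings)
  // is dropped. A final record without a trailing LF is kept as-is.
  def split(data: Array[Byte]): Vector[Array[Byte]] = {
    val records = Vector.newBuilder[Array[Byte]]
    var start = 0
    var i = 0
    while (i < data.length) {
      if (data(i) == LF) {
        val end = if (i > start && data(i - 1) == CR) i - 1 else i
        records += Arrays.copyOfRange(data, start, end)
        start = i + 1
      }
      i += 1
    }
    if (start < data.length) records += Arrays.copyOfRange(data, start, data.length)
    records.result()
  }
}
```

For example, splitting the ISO-8859-1 bytes of `"café\r\nnaïve\nlast"` and decoding each record with the same charset yields `café`, `naïve` and `last`; the charset is only needed in the decoding step.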
Background
Currently, Cobrix uses RDDs and .textFile() for handling UTF-8 encoded ASCII files. For other encodings, a slower processing path is used. This can be improved based on the idea in this Spark pull request:
https://github.com/apache/spark/pull/21287/files
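For context, `.textFile()` turns each Hadoop `Text` line into a `String` via `Text.toString`, which always decodes as UTF-8. One possible way to support other charsets (an assumption, not necessarily what the linked pull request or Cobrix does) is to keep Hadoop's line reading by calling `sparkContext.hadoopFile[LongWritable, Text, TextInputFormat](path)` and change only the decoding step. Because `Text` may reuse a backing array longer than the current line, only the first `getLength` bytes of `getBytes` are valid. The hypothetical `RecordDecoder` below sketches just that decoding step in plain Scala, without Spark:

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical helper, not part of Cobrix or Spark: decodes one line's bytes
// with an explicit charset instead of the UTF-8 conversion done by Text.toString.
object RecordDecoder {
  // Only the first `length` bytes of `buffer` are treated as the record,
  // mirroring Hadoop Text, whose getBytes may return a longer reused array.
  def decode(buffer: Array[Byte], length: Int, charset: Charset): String =
    new String(buffer, 0, length, charset)

  // Same, but looks the charset up by name. Handy inside Spark closures,
  // since java.nio.charset.Charset is not Serializable while a String is.
  def decode(buffer: Array[Byte], length: Int, charsetName: String): String =
    decode(buffer, length, Charset.forName(charsetName))
}
```

In a Spark job the call might look like `rdd.map { case (_, text) => RecordDecoder.decode(text.getBytes, text.getLength, charsetName) }`, with `charsetName` a `String` captured by the closure. Decoding the same bytes as UTF-8 instead of the file's real charset produces replacement characters (U+FFFD) for non-ASCII bytes, which is why the UTF-8 path cannot simply be reused.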
Feature
Improve performance of non-UTF8 encoded ASCII files processing