Request to Add Support for IBM-277 (EBCDIC Code Page 277) in Cobrix #722
Comments
Hi, yes, we will add CP277. I might reach out to you with questions. It might take a week or so. In the meantime, there is a workaround. You can define a custom code page like this: cobrix/spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/utils/CustomCodePage.scala (line 21 in d3e953f)
And then, you can use it in Spark when loading data by specifying the fully qualified class name of the custom code page:
If you decide to go the custom code page way, please share your CP277 table. This would speed up incorporating it directly into Cobrix. But in any case, we are going to add the support for CP277 soon.
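To illustrate what a custom code page amounts to, here is a minimal sketch of a 256-entry lookup table in Python. It is not the Cobrix `CustomCodePage` class and not the Cobrix API. The helper names `build_table` and `decode` are hypothetical, and the five override positions are assumptions based on the commonly published CP277 layout that should be checked against an authoritative table. The table is deliberately incomplete: it starts from Python's built-in `cp037` codec and overrides only a few entries.

```python
# Sketch only: a code page is a 256-entry table mapping each EBCDIC byte
# to one Unicode character. Python ships a cp037 codec but no cp277 codec,
# so this starts from cp037 and overrides selected positions.

def build_table(base_codec, overrides):
    """Return a 256-entry list of characters for single-byte decoding."""
    table = [bytes([b]).decode(base_codec) for b in range(256)]
    for byte_value, char in overrides.items():
        table[byte_value] = char
    return table

def decode(data, table):
    """Decode bytes by looking each byte up in the table."""
    return "".join(table[b] for b in data)

# ASSUMPTION: a partial list of positions where CP277 differs from CP037.
# Verify every entry against an authoritative CP277 table before use.
CP277_PARTIAL_OVERRIDES = {
    0x4A: "#",
    0x4F: "!",
    0x5B: "Å",
    0x7B: "Æ",
    0x7C: "Ø",
}

table = build_table("cp037", CP277_PARTIAL_OVERRIDES)
sample = decode(bytes([0x7B, 0x7C, 0x5B]), table)
```

A complete table would need every position where the two code pages differ, including the lowercase æ, ø, and å.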
Going with the table from here: https://en.wikibooks.org/wiki/Character_Encodings/Code_Tables/EBCDIC/EBCDIC_277. Will check against https://www.ibm.com/docs/en/SSEQ5Y_12.0.0/com.ibm.pcomm.doc/reference/pdf/hcp_referenceV58.pdf
Hi,
Great, so if you are not in a rush, I'll let you know when the code page is added sometime next week, and you can test whether it works for your use case. In that case, you don't have to add a custom code page.
Hi again, Thank you so much! I didn't have time to look into it last week. But I would definitely like to test it out with the current test data I have.
The fix is merged. The 'cp277' code page should be available now. Can you test the current master branch? Or, if you want, I can generate a snapshot artifact for you.
Hi,
Awesome! This will be part of the new version soon.
Released v2.7.9 with the fix.
I'm currently working with COBOL files that use the IBM-277 (EBCDIC Code Page 277) encoding, which is primarily used for Danish and Norwegian characters. Unfortunately, it appears that CP277 is not one of the built-in EBCDIC code pages in Cobrix. As a result, I'm unable to process these files correctly due to the missing support for certain characters, such as Æ, Ø, and Å, among others.
Additionally, I tried using the CP037 code page to parse the data, but it led to incorrect results. For instance, the # character is parsed as a whitespace " " and the ! character is parsed as pipe "|". This issue makes it challenging to work with COBOL files encoded in CP277 using the current built-in options.
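The mismatch can be reproduced outside Cobrix with a short sketch using Python's built-in `cp037` codec. It assumes that CP277 stores `#` at byte 0x4A and `!` at byte 0x4F, which is consistent with the symptoms above but should be confirmed against a CP277 table. Python's `cp037` codec decodes 0x4A as the cent sign; a decoder that does not support that character may well emit a space instead, although that is a guess about the behavior reported here.

```python
# Bytes that (by assumption) hold "#" and "!" in a CP277-encoded record.
raw = bytes([0x4A, 0x4F])

# Decoding them with the CP037 table yields different characters,
# because CP037 assigns these two positions to the cent sign and the pipe.
decoded_as_cp037 = raw.decode("cp037")

print(repr(decoded_as_cp037))  # prints '¢|'
```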
I would like to kindly request that support for IBM-277 (CP277) be added to Cobrix as an available option. This would allow users like me to seamlessly decode files that use this EBCDIC encoding, particularly for Danish and Norwegian-specific characters.