Request to Add Support for IBM-277 (EBCDIC Code Page 277) in Cobrix #722

Closed
Maggie9402 opened this issue Oct 23, 2024 · 9 comments · Fixed by #724
Labels
enhancement New feature or request

Comments

@Maggie9402

I'm currently working with COBOL files that use the IBM-277 (EBCDIC Code Page 277) encoding, which is primarily used for Danish and Norwegian characters. Unfortunately, it appears that CP277 is not one of the built-in EBCDIC code pages in Cobrix. As a result, I'm unable to process these files correctly due to the missing support for certain characters, such as Æ, Ø, and Å, among others.

Additionally, I tried using the CP037 code page to parse the data, but it led to incorrect results. For instance, the # character is parsed as a whitespace " " and the ! character is parsed as pipe "|". This issue makes it challenging to work with COBOL files encoded in CP277 using the current built-in options.

I would like to kindly request that support for IBM-277 (CP277) be added to Cobrix as an available option. This would allow users like me to seamlessly decode files that use this EBCDIC encoding, particularly for Danish and Norwegian-specific characters.
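
The mismatch described above can be reproduced outside Cobrix. The following is a minimal sketch that assumes the JDK's own `IBM277` and `IBM037` charsets are available (they ship in the `jdk.charsets` module of a full JDK); these are not the tables Cobrix uses internally, and the `CodePageMismatch` object is a hypothetical name used only for illustration. It encodes a few characters as CP277 bytes and then decodes the same bytes with CP037:

```scala
import java.nio.charset.Charset

object CodePageMismatch {
  // JDK charsets, used here only for illustration; Cobrix ships its own tables.
  val cp277: Charset = Charset.forName("IBM277")
  val cp037: Charset = Charset.forName("IBM037")

  /** Encode text as CP277 bytes, then decode those bytes with the wrong (CP037) table. */
  def misdecode(text: String): String = new String(text.getBytes(cp277), cp037)

  def main(args: Array[String]): Unit = {
    // '!', '#', and the Danish/Norwegian letters Æ, Ø, Å (written as Unicode escapes)
    for (ch <- Seq("!", "#", "\u00C6", "\u00D8", "\u00C5")) {
      val hex = ch.getBytes(cp277).map(b => f"0x${b & 0xff}%02X").mkString(" ")
      println(s"'$ch' -> $hex -> decoded as CP037: '${misdecode(ch)}'")
    }
  }
}
```

With the JDK tables, `!` comes back as `|`, matching the behaviour reported above, while plain Latin letters and digits are unaffected because both code pages place them at the same byte values.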

Maggie9402 added the enhancement label Oct 23, 2024
@yruslan
Collaborator

yruslan commented Oct 25, 2024

Hi, yes, we will add CP277. I might reach out to you with questions. It might take a week or so.

In the meantime, there is a workaround. You can define a custom code page like this:

class CustomCodePage extends SingleByteCodePage(CustomCodePage.ebcdicToAsciiMapping) {

And then, you can use it in Spark when loading data by specifying the fully qualified class name of the custom code page:

.option("ebcdic_code_page_class", "za.co.absa.cobrix.spark.cobol.source.utils.CustomCodePage")

If you decide to go the custom code page route, please share your CP277 table. This would speed up incorporating it directly into Cobrix. In any case, we are going to add support for CP277 soon.
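
For anyone trying the workaround before built-in support lands, one possible way to obtain a 256-entry CP277 table is to derive it from the JDK's `IBM277` charset rather than typing it by hand. This is only a sketch under assumptions: `Cp277Table`, `ebcdicToUnicode`, and `decode` are hypothetical names, the JDK table may differ in details from whatever Cobrix ends up shipping, and it is assumed (not confirmed in this thread) that `SingleByteCodePage` accepts an `Array[Char]` indexed by byte value.

```scala
import java.nio.charset.Charset

object Cp277Table {
  // Assumption: the JDK's IBM277 charset is an acceptable source for the mapping.
  private val jdkCp277: Charset = Charset.forName("IBM277")

  /** 256-entry table: index = EBCDIC byte value, element = decoded character. */
  val ebcdicToUnicode: Array[Char] =
    Array.tabulate(256)(i => new String(Array(i.toByte), jdkCp277).charAt(0))

  /** Decode raw bytes with the table, one character per byte. */
  def decode(bytes: Array[Byte]): String =
    bytes.map(b => ebcdicToUnicode(b & 0xff)).mkString

  // In the workaround above, this array would stand in for
  // CustomCodePage.ebcdicToAsciiMapping (assuming the constructor takes an Array[Char]).
}
```

Printing `Cp277Table.ebcdicToUnicode` in rows of 16 would also give a table in a form that is easy to share in the issue.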

@Maggie9402
Author

Hi,
Thank you for responding so quickly. I’ve been looking into using the custom code page, but as I’m still new to this, it may take me a little time to test it fully.

@yruslan
Collaborator

yruslan commented Oct 25, 2024

Great, so if you're not in a rush, I'll let you know when the code page is added sometime next week, and you can test whether it works for your use case. In that case, you won't have to add a custom code page.

@Maggie9402
Author

Hi again,

Thank you so much! I didn't have time to look into it last week. But I would definitely like to test it out with the current test data I have.

@yruslan
Collaborator

yruslan commented Oct 30, 2024

The fix is merged. The 'cp277' code page should be available now. Can you test the current master branch? Or, if you want, I can generate a snapshot artifact for you.

.option("ebcdic_code_page", "cp277")
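
In context, the option above would sit in an ordinary Cobrix read. The following is a sketch only: the copybook and data paths are hypothetical placeholders, the `Cp277ReadExample` object name is made up for illustration, and it assumes a Spark session with a `spark-cobol` build that includes the CP277 change on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object Cp277ReadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cp277-example").getOrCreate()

    val df = spark.read
      .format("cobol")
      .option("copybook", "/path/to/copybook.cpy") // hypothetical path
      .option("ebcdic_code_page", "cp277")         // built-in code page added by this fix
      .load("/path/to/ebcdic/data")                // hypothetical path

    // Inspect a few decoded records to check characters such as Æ, Ø, and Å.
    df.show(false)
  }
}
```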

yruslan reopened this Oct 30, 2024
@Maggie9402
Author

Hi,
Great news! Thanks for your help! I generated a snapshot from the feature branch this morning, and it gives me the expected decoded output with the test sample :)

@yruslan
Collaborator

yruslan commented Oct 30, 2024

Awesome! This will be part of the new version soon.

@yruslan
Collaborator

yruslan commented Nov 8, 2024

Released v2.7.9 with the fix

yruslan closed this as completed Nov 8, 2024