Spark DataFrame Writer for Cobol datafiles #415
Comments
This sounds great. The demand for the feature seems to exist already, but the feature requires a lot of effort. This could be a good collaboration. As soon as the implementation of VBVR is finished (probably by the end of next week), I can prepare a design document for a Cobol file writer. We can discuss the features the writer can support and prioritize the ones required for your use case. Features that are useful but not immediately required for you we can implement later from our side.
I had a meeting to discuss the first draft of these requirements, and one of my peers suggested that while dynamically creating a copybook from a Spark schema and declarative configuration is a nice feature, it might be complex to implement and isn't really necessary for an MVP. My colleague suggested that a better idea would be to require a copybook layout to be passed into the DataFrame writer, since we would have to set static field sizes for every column in the data frame anyway. Of course we would have to verify that the DataFrame schema can be mapped to the copybook schema, but that may be an easier lift than programmatically generating a copybook. In our use case the copybook is defined by our business partner, and we would have to ensure that the DataFrame we generate maps to the service contract (copybook) that they are expecting.

Also on the subject of narrowing the MVP features: our use case only requires a single code page (I believe it is cp037, but I will verify with the business partner) and only big endian. All of our data ingest code uses CodePageCommon, which has worked adequately so far.
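For illustration, here is a rough sketch of what a copybook-first writer call could look like. Cobrix does not ship a DataFrame writer, so the `cobol` sink and all of the writer option names below are hypothetical placeholders modeled on the existing reader options, not a real API:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch only: Cobrix provides no DataFrame writer today,
// so the "cobol" sink and every writer option below are assumed
// placeholders echoing the existing reader options.
val spark = SparkSession.builder().appName("cobol-writer-sketch").getOrCreate()
import spark.implicits._

// DataFrame whose schema must be verified against the partner's copybook.
val df = Seq(("ACCT000001", BigDecimal("125.50"))).toDF("ACCOUNT_ID", "AMOUNT")

// Copybook supplied by the business partner (static field sizes).
val copybook =
  """      01  TRANSACTION-RECORD.
    |          05  ACCOUNT-ID  PIC X(10).
    |          05  AMOUNT      PIC S9(9)V99 COMP-3.
    |""".stripMargin

df.write
  .format("cobol")                        // assumed sink name
  .option("copybook_contents", copybook)  // conform to the existing layout
  .option("ebcdic_code_page", "cp037")    // single code page for the MVP
  .option("record_format", "VB")          // RDW-prefixed variable-length records
  .save("/path/to/outbound.dat")          // hypothetical output path
```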
Good. We can start looking into requirements in about 2 weeks. Actually, generating our own copybook from a Spark dataframe is easier, since we can choose the output data types. Conforming to an existing copybook would require supporting the plethora of formats that COBOL supports (PICTURE, USAGE, etc.). But conforming to an existing copybook is something that is usually required, so it is something we should implement at some point anyway, and since it matches your use case we can look into it first. Supporting only cp037, or basic + cp037, is good as well.
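As an aside, cp037 itself is already reachable from the JVM: EBCDIC code page 037 ships with the JDK's extended charset provider, so a writer could delegate the text encoding to `java.nio`. A minimal sketch, assuming a full JDK where the `jdk.charsets` module is present:

```scala
import java.nio.charset.Charset

// EBCDIC code page 037 is part of the JDK's extended charsets,
// so no extra dependency is needed for the encoding step.
val cp037 = Charset.forName("IBM037")   // "cp037" is an accepted alias

val ebcdicBytes: Array[Byte] = "HELLO".getBytes(cp037)
println(ebcdicBytes.map(b => f"0x${b & 0xFF}%02X").mkString(" "))
// 0xC8 0xC5 0xD3 0xD3 0xD6

val roundTrip = new String(ebcdicBytes, cp037)  // back to "HELLO"
```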
What about data formats? Do you need support for F, V, and VB (RDW, no RDW, BDW+RDW), or can we just start with basic V (RDW)?
I have a colleague researching this now, but the preliminary answer is that we need the FB and VB formats. In a day or two I'll have a final answer and copybooks for you to review.
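For reference while these formats are being evaluated: a V record is prefixed with a four-byte Record Descriptor Word, and VB additionally prefixes each block with a Block Descriptor Word of the same shape. Below is a minimal sketch of building an RDW, assuming the IBM convention where the two-byte big-endian length includes the four RDW bytes; some systems exclude them, so this is worth verifying against the partner's side:

```scala
import java.io.ByteArrayOutputStream

// Prefix a record payload with a Record Descriptor Word (RDW).
// IBM convention: the two-byte big-endian length counts the payload
// plus the 4 RDW bytes; the trailing two bytes are zero. Other tools
// sometimes exclude the RDW from the count, so treat the "+ 4" below
// as an assumption to confirm.
def withRdw(payload: Array[Byte]): Array[Byte] = {
  val len = payload.length + 4       // length including the RDW itself
  val out = new ByteArrayOutputStream()
  out.write((len >> 8) & 0xFF)       // high byte (big-endian)
  out.write(len & 0xFF)              // low byte
  out.write(0)                       // reserved
  out.write(0)                       // reserved
  out.write(payload)
  out.toByteArray
}

// A 6-byte payload becomes a 10-byte V record: 00 0A 00 00 <payload>
val record = withRdw("CREDIT".getBytes("IBM037"))
```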
Mark is leaving Nordstrom, and I will be taking over as the contact for Nordstrom.
@yruslan, as @milehighhokie indicated, I have accepted a new position at another company, and Bill will be taking over this issue for my former employer. We had a turnover meeting this morning, and I reminded him that you are still waiting on copybook examples for the outbound data transfer use case that I outlined in this issue. I want to extend my thanks for the excellent support I have received while using Cobrix, and in particular I appreciate the opportunity to collaborate with you on adding the new record format readers.
Thanks for the kind words, Mark! Enjoy the holiday season and the best of luck at the new role! @milehighhokie , looking forward to future collaboration. |
Hi @yruslan, we have a similar requirement for a copybook writer. You have closed this issue; did you make any progress on the Spark DataFrame writer for copybook data files?
Hi, sorry, the writer would require a lot of effort, and we have neither the capacity nor the internal demand for it at the moment.
Background
I work for a credit card company in the retail sector, and we are currently using Cobrix to acquire data from our credit card transaction processor and produce business events to Kafka for our event-driven architecture and analytics platform.
Thanks to @yruslan and his work on #338, Cobrix is now fully functional for our data ingest use case; however, our electronic data interchange with this business partner is bidirectional.
For example, we receive mainframe data transmissions for things like customer purchases and account status. But we also have to transmit monetary data to our mainframe-based partner for things like credits and adjustments, and non-monetary data for account configuration changes, including but not limited to change of address.
Additionally, we believe that such a feature could be used to simplify the process of creating test data for our system.
Feature
Implement a Spark DataFrame writer for Cobol data. The feature should:
Proposed Solution [Optional]
We could contribute development labor to the implementation of this feature; however, we would need assistance with the high-level design should such a feature be accepted. At this point I would like to open a discussion about how the feature might be implemented and how we can help with the architecture of the solution.