How to implement EDIFACT Syntax elements #18

Open
nerdoc opened this issue May 5, 2020 · 25 comments

Comments

@nerdoc
Member

nerdoc commented May 5, 2020

I just want to gather a few ideas on how to implement interchange syntax elements like UNB, UNH, UNZ etc.: Messages, Control Header, and other things.
There are syntax implementation guidelines from the UNECE, and other sources on the internet, like here

The best approach would be to write some sort of "syntax description language" that describes these syntax elements, and to parse it. There are a few options for the notation of this "syntax description":

  • external files (YAML, TOML, CSV etc.)
  • python classes

I had the idea of writing these descriptions as classes, the way Django does in its ORM, describing a syntax element by a class using attributes:

from enum import Enum
from typing import Optional


class CharType(Enum):
    ALPHA = 0
    ALPHANUMERIC = 1
    NUMERIC = 2

class SyntaxElement:
    """Common description of an EDIFACT syntax element"""
    def __init__(
        self,
        id: str,
        mandatory: bool = False,
        type: Optional[CharType] = None,
        length: Optional[int] = None,
        max_length: Optional[int] = None,
    ):
        # checks...

        self.id = id
        self.mandatory = mandatory
        self.type = type
        # max=True: length is an upper bound (e.g. an..6), not an exact length (e.g. a4)
        self.max = False
        self.length = 0
        if length:
            self.length = length
        elif max_length:
            self.length = max_length
            self.max = True

The first data elements of an Interchange Control Header (UNB) would then be:

class InterchangeControlHeader:
    syntax_identifier = SyntaxElement(
        id="0001", mandatory=True, type=CharType.ALPHA, length=4
    )
    syntax_version_number = SyntaxElement(
        id="0002", mandatory=True, type=CharType.ALPHANUMERIC, length=1
    )
    service_code_list_directory_version_number = SyntaxElement(
        id="0080", type=CharType.ALPHANUMERIC, max_length=6
    )
    character_encoding = SyntaxElement(
        id="0133", type=CharType.ALPHANUMERIC, max_length=3
    )

@JocelynDelalande what do you think about that?

@JocelynDelalande
Contributor

LGTM! (Implementing it as Python classes is a good idea IMHO; that will be the simplest way.)

A simple note on wording: SyntaxElement should be called Component to respect EDIFACT terminology.

Do you want to use those classes to build the API (replacing the current dict/list approach), or merely for data validation, leaving the API in the same style at the Component and Element level?

@nerdoc
Member Author

nerdoc commented May 7, 2020

Yeah, I just made up a buzzword; it could be named better. Component? Hm, dunno. I found a list of "service codes" here: https://www.stylusstudio.com/edifact/40102/codelist.htm
Maybe ServiceCode would be appropriate too?
Initially I was only thinking about data validation, not the API. OMG, EDIFACT is complicated. Why did I start this project again? ;-)
I think the API should be built with other structures. #19 seems a better place to discuss that, thanks for opening it. So let's keep this issue in mind for validation.
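A minimal sketch of how descriptors like SyntaxElement could be used purely for validation. The validate() method and its error messages are hypothetical (not part of pydifact), and the character checks are simplified: real EDIFACT numeric values may also contain a decimal mark or a sign.

```python
from enum import Enum
from typing import List, Optional


class CharType(Enum):
    ALPHA = 0
    ALPHANUMERIC = 1
    NUMERIC = 2


class SyntaxElement:
    """Describes one data element, e.g. 0001 (a4) or 0080 (an..6)."""

    def __init__(self, id: str, mandatory: bool = False,
                 type: Optional[CharType] = None,
                 length: Optional[int] = None,
                 max_length: Optional[int] = None):
        self.id = id
        self.mandatory = mandatory
        self.type = type
        self.max = False  # True if self.length is an upper bound
        self.length = 0
        if length:
            self.length = length
        elif max_length:
            self.length = max_length
            self.max = True

    def validate(self, value: Optional[str]) -> List[str]:
        """Hypothetical check; returns error messages (empty list = valid)."""
        if not value:
            return [f"{self.id}: mandatory element missing"] if self.mandatory else []
        errors = []
        if self.length and self.max and len(value) > self.length:
            errors.append(f"{self.id}: longer than {self.length} characters")
        elif self.length and not self.max and len(value) != self.length:
            errors.append(f"{self.id}: must be exactly {self.length} characters")
        # Simplified character checks; ALPHANUMERIC accepts anything here.
        if self.type is CharType.ALPHA and not value.isalpha():
            errors.append(f"{self.id}: must be alphabetic")
        elif self.type is CharType.NUMERIC and not value.isdigit():
            errors.append(f"{self.id}: must be numeric")
        return errors


syntax_identifier = SyntaxElement("0001", mandatory=True, type=CharType.ALPHA, length=4)
print(syntax_identifier.validate("UNOA"))  # []
print(syntax_identifier.validate("UNO"))   # ['0001: must be exactly 4 characters']
```

Returning a list instead of raising lets a validator report every problem in an interchange at once.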

@nerdoc
Member Author

nerdoc commented May 7, 2020

It would be good to create something like SegmentTables that define how messages are structured. These have to include segment groups (for loops) too, which can be nested. I am still not sure whether Python classes are the best way to do that, or whether plain lists of, e.g., tuples would be better...

From your link, "example of segment groups for the Extended Payment Order (PAYEXT)", first group:

[
    # (tag, name, status: M = mandatory, C = conditional, max. repetitions)
    ("UNH", "Message header", "M", 1),
    ("BGM", "Beginning of Message", "M", 1),
    ("BUS", "Business function", "C", 1),
    # ...
]

This looks a bit cleaner for creating these descriptions. But we would have to write another parser which handles nested groups etc. This is easier done in Python classes, but it is more boilerplate code to write when defining structures.
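A rough sketch of the extra parser the tuple approach would need, under these assumptions: segment entries are (tag, name, status, max repetitions) as above, a nested group is written as (group name, [entries], status, max repetitions), and every group starts with its trigger segment. The table is illustrative (LINES is a made-up group, not a real message definition), and the matching is greedy with no backtracking.

```python
from typing import List, Tuple, Union

# Illustrative table; "LINES" is a made-up group, not taken from a real message.
TABLE: List[Tuple[str, Union[str, list], str, int]] = [
    ("UNH", "Message header", "M", 1),
    ("BGM", "Beginning of Message", "M", 1),
    ("BUS", "Business function", "C", 1),
    ("LINES", [
        ("LIN", "Line item", "M", 1),
        ("QTY", "Quantity", "C", 9),
    ], "C", 99),
    ("UNT", "Message trailer", "M", 1),
]


def match(table, tags: List[str], pos: int = 0) -> int:
    """Consume segment tags from `pos` according to `table`; return the new position."""
    # key is a segment tag, or a group name when content is a list of entries
    for key, content, status, max_repeats in table:
        count = 0
        while count < max_repeats and pos < len(tags):
            if isinstance(content, list):
                # A group (re)starts only when its trigger segment comes next.
                if tags[pos] != content[0][0]:
                    break
                pos = match(content, tags, pos)  # recurse into the nested group
            elif tags[pos] == key:
                pos += 1
            else:
                break
            count += 1
        if status == "M" and count == 0:
            raise ValueError(f"Mandatory {key} missing at position {pos}")
    return pos


def validate(table, tags: List[str]) -> None:
    end = match(table, tags)
    if end != len(tags):
        raise ValueError(f"Unexpected segment {tags[end]} at position {end}")


validate(TABLE, ["UNH", "BGM", "LIN", "QTY", "LIN", "UNT"])  # passes silently
```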
I like Django very much, and this idea came from how Django creates database tables in its ORM just from Python class declarations that describe the data object. This would be similar here.
I don't know; @JocelynDelalande, what do you think?
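For reference, the Django-like mechanism itself needs no dependencies: Python keeps class-body attributes in definition order (guaranteed since 3.6), and `__set_name__` tells each descriptor its attribute name. A minimal sketch; Field and Declarative are hypothetical names, not pydifact classes.

```python
class Field:
    """Declarative field, similar in spirit to a Django model field."""

    def __init__(self, tag: str, mandatory: bool = False):
        self.tag = tag
        self.mandatory = mandatory
        self.name = None  # set automatically by __set_name__

    def __set_name__(self, owner, name):
        self.name = name


class Declarative:
    """Base class that collects its subclasses' Field attributes in order."""

    fields: list = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # vars(cls) preserves class-body order, so duplicate tags stay distinguishable.
        cls.fields = [v for v in vars(cls).values() if isinstance(v, Field)]


class MessageStart(Declarative):
    header = Field("UNH", mandatory=True)
    beginning = Field("BGM", mandatory=True)
    business_function = Field("BUS")


print([(f.name, f.tag, f.mandatory) for f in MessageStart.fields])
# [('header', 'UNH', True), ('beginning', 'BGM', True), ('business_function', 'BUS', False)]
```

The ordered field list is what a parser would walk; the class body stays as compact as the tuple list while still allowing nested groups as class references.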

@JocelynDelalande
Contributor

I like Django very much, and came up with this idea from how Django creates database tables in its ORM just from Python class declarations which describe the data object. This would be similar here.
I don't know, @JocelynDelalande what do you think?

I do not really have any strong opinion on that, sorry.

@nerdoc
Member Author

nerdoc commented May 15, 2020

Do you know whether an official, downloadable, machine-readable definition of those fields is available anywhere? Something like XML or even CSV that could be parsed?
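If definitions were available as CSV, most of the parsing work would be the EDIFACT representation notation (a = alphabetic, n = numeric, an = alphanumeric, ".." = up to that length). A sketch assuming a made-up three-column CSV layout; the rows are the four elements from the InterchangeControlHeader example above.

```python
import csv
import io
import re
from typing import NamedTuple


class Representation(NamedTuple):
    char_type: str  # "a" alphabetic, "n" numeric, "an" alphanumeric
    length: int
    is_max: bool    # True for "up to" lengths written with ".."


_REPRESENTATION = re.compile(r"^(an|a|n)(\.\.)?(\d+)$")


def parse_representation(text: str) -> Representation:
    match = _REPRESENTATION.match(text.strip())
    if not match:
        raise ValueError(f"Unknown representation: {text!r}")
    char_type, dots, length = match.groups()
    return Representation(char_type, int(length), dots is not None)


# Hypothetical CSV layout: element id, status (M/C), representation.
CSV_DATA = """id,status,representation
0001,M,a4
0002,M,an1
0080,C,an..6
0133,C,an..3
"""

definitions = {
    row["id"]: (row["status"] == "M", parse_representation(row["representation"]))
    for row in csv.DictReader(io.StringIO(CSV_DATA))
}
print(definitions["0080"])  # (False, Representation(char_type='an', length=6, is_max=True))
```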

@maanas
Contributor

maanas commented Jun 9, 2020

Using Django-style classes is a good idea. An alternative would be to use YAML files in which we specify the options; YAML handles the nesting well.

@nerdoc
Member Author

nerdoc commented Jun 20, 2020

YAML would be an option, but I definitely want to keep pydifact as clean (few dependencies) as possible. So "Django"-like classes will be the way to go.

@srinirokz

Where can I ask questions?

@nerdoc
Member Author

nerdoc commented Aug 18, 2020

@srinirokz Here, if it's related to this issue. Otherwise just open a new issue.

@nerdoc
Member Author

nerdoc commented Aug 22, 2020

I asked www.stylusstudio.com, which has a good overview of EDIFACT data, whether I may parse their site using BeautifulSoup and extract Service Code data from it. This would make it much easier to obtain a larger amount of Service Code data.
If they don't agree (which I expect), I'll ask the official UN source by mail too. I can't find anywhere whether the EDIFACT data are protected by patents etc. or whether they can be used freely.

@theangryangel
Contributor

theangryangel commented Jan 12, 2021

Hi people 👋, I've been poking at pydifact today. Am I right in reading that this issue is about adding support for definitions, which would describe things like segment groups and enable smarter reading/writing? Am I correct that without this it's up to users of the library to split groups?

@nerdoc
Member Author

nerdoc commented Jan 13, 2021

Yes, that's right. I just don't have time ATM to implement this, so development has slowed down a bit. Nevertheless, PRs are more than welcome... I'd like to create a high-level API to read and write documents, including groups.
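Until such definitions exist, splitting groups is left to the user. A simple heuristic, sketched under the assumption that each repeated group starts with a known trigger segment (e.g. LIN for order lines) and that UNS separates the detail section from the summary; nested groups are not handled. Segments are plain (tag, elements) tuples standing in for parsed segments, and split_groups is a hypothetical helper, not pydifact API.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[str, list]  # (tag, elements), a stand-in for a parsed segment


def split_groups(segments: Sequence[Segment], trigger: str, end_tags: Sequence[str] = ()):
    """Split a flat segment list into header, repeated groups, and trailer."""
    header: List[Segment] = []
    groups: List[List[Segment]] = []
    trailer: List[Segment] = []
    for segment in segments:
        tag = segment[0]
        if trailer or tag in end_tags:
            trailer.append(segment)        # everything from the first end tag on
        elif tag == trigger:
            groups.append([segment])       # trigger segment opens a new group
        elif groups:
            groups[-1].append(segment)     # belongs to the current group
        else:
            header.append(segment)         # before the first trigger
    return header, groups, trailer


segments = [("BGM", ["220"]), ("DTM", []), ("LIN", ["1"]), ("QTY", []),
            ("LIN", ["2"]), ("QTY", []), ("UNS", ["S"]), ("CNT", [])]
header, groups, trailer = split_groups(segments, trigger="LIN", end_tags=("UNS",))
print([[tag for tag, _ in group] for group in groups])  # [['LIN', 'QTY'], ['LIN', 'QTY']]
```

This is exactly the kind of message-specific knowledge (trigger tags, section boundaries) that a definition layer would move out of user code.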

@theangryangel
Copy link
Contributor

Cool, I just wanted to check that I wasn't fundamentally misunderstanding something :)

I spent far longer than I would like to admit yesterday getting a handle on a project, dealing with malformed sample edifact files, reading specs, etc. and by the end of the day I was going a bit cross-eyed 😅

Practically I wasn't "budgeting" for time to add support, but I may be down the path far enough that a PR or two will come this way over the next few days.

@nerdoc
Member Author

nerdoc commented Jan 13, 2021

Oh, and malformed sample EDIFACT files? Please just correct them; any help is welcome. I stumbled into creating this library because I needed a good EDIFACT library in Python and there wasn't one. I found good code in PHP and ported it, learning EDIFACT on the fly. So don't expect this library to come from a company with many years of EDIFACT experience. I'm a medical doctor, and I code in my free time ;-)
But I'm committed to using this in a medical project, so quality comes first.

@theangryangel
Contributor

theangryangel commented Jan 27, 2021

I've had a bit of time to work on this today. I'm not super happy with it, but it's a rough proof of concept.

I've tried to avoid touching the core library so far; Component is effectively just a wrapper for Segment. SegmentGroup and SegmentLoop are higher-level classes used to describe the file format.

For the samples, reading works relatively well, and at the moment it looks a little something like this:

class OrderLine(SegmentGroup):
    line_id = Component("LIN", mandatory=True)
    description = Component("IMD", mandatory=True)
    quantity = Component("QTY", mandatory=True)
    moa = Component("MOA")
    pri = Component("PRI")
    rff = Component("RFF", mandatory=True)


class Order(SegmentGroup):
    purchase_order_id = Component("BGM", mandatory=True)
    date = Component("DTM", mandatory=True)
    delivery_date = Component("DTM", mandatory=True)
    delivery_instructions = Component("FTX", mandatory=True)
    supplier_id_gln = Component("NAD", mandatory=True)
    supplier_id_tprg = Component("NAD", mandatory=True)
    ref = Component("RFF", mandatory=True)
    ship_to = Component("NAD", mandatory=True)

    ship_to_contact = Component("CTA", mandatory=True)
    ship_to_phone = Component("COM", mandatory=True)
    ship_to_email = Component("COM", mandatory=True)
    cux = Component("CUX", mandatory=True)
    tdt = Component("TDT", mandatory=True)

    lines = SegmentLoop(
        OrderLine,
        max=99,
        mandatory=True
    )

    uns = Component("UNS", mandatory=True)
    cnt = Component("CNT", mandatory=True)


TYPE_TO_PARSER_DICT = {
    "ORDERS": Order
}


for message in interchange.get_messages():
    cls = TYPE_TO_PARSER_DICT.get(message.type)
    if not cls:
        raise NotImplementedError("Unsupported message type '{}'".format(message.type))

    obj = cls()
    obj.from_message(message)
    print(obj)

I'm still tinkering, so any input over the API, or suggestions on whether or not I should touch the core library would be appreciated.
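To make the proof of concept above easier to discuss, here is a self-contained sketch of how a SegmentGroup could consume a flat segment list in declaration order, with a SegmentLoop repeating its nested group while the group's first segment matches. It is not the actual PoC code: consume(), first_tag() and the (tag, elements) tuples standing in for pydifact segments are assumptions.

```python
from typing import List, Tuple

Segment = Tuple[str, list]  # (tag, elements), a stand-in for pydifact segments


class Component:
    def __init__(self, tag: str, mandatory: bool = False):
        self.tag, self.mandatory = tag, mandatory


class SegmentLoop:
    def __init__(self, group: type, max: int = 1, mandatory: bool = False):
        self.group, self.max, self.mandatory = group, max, mandatory


class SegmentGroup:
    _fields: list = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Class-body order decides matching order, so repeated tags (DTM, NAD) work.
        cls._fields = [(name, spec) for name, spec in vars(cls).items()
                       if isinstance(spec, (Component, SegmentLoop))]

    @classmethod
    def first_tag(cls) -> str:
        return cls._fields[0][1].tag  # assumes a group starts with a Component

    def consume(self, segments: List[Segment], pos: int = 0) -> int:
        """Assign matching segments to attributes; return the next position."""
        for name, spec in self._fields:
            if isinstance(spec, Component):
                if pos < len(segments) and segments[pos][0] == spec.tag:
                    setattr(self, name, segments[pos])
                    pos += 1
                elif spec.mandatory:
                    raise ValueError(f"{type(self).__name__}.{name}: {spec.tag} missing")
                else:
                    setattr(self, name, None)
            else:
                items = []
                while (len(items) < spec.max and pos < len(segments)
                       and segments[pos][0] == spec.group.first_tag()):
                    item = spec.group()
                    pos = item.consume(segments, pos)
                    items.append(item)
                if spec.mandatory and not items:
                    raise ValueError(f"{type(self).__name__}.{name}: no {spec.group.__name__}")
                setattr(self, name, items)
        return pos


class Line(SegmentGroup):
    line_id = Component("LIN", mandatory=True)
    quantity = Component("QTY", mandatory=True)
    price = Component("PRI")


class Order(SegmentGroup):
    purchase_order_id = Component("BGM", mandatory=True)
    date = Component("DTM", mandatory=True)
    delivery_date = Component("DTM", mandatory=True)
    lines = SegmentLoop(Line, max=99, mandatory=True)
    cnt = Component("CNT", mandatory=True)


segments = [("BGM", ["220", "PO1"]), ("DTM", ["137"]), ("DTM", ["2"]),
            ("LIN", ["1"]), ("QTY", ["5"]), ("LIN", ["2"]), ("QTY", ["3"]),
            ("PRI", ["9.99"]), ("CNT", ["2"])]
order = Order()
order.consume(segments)
print(order.delivery_date, len(order.lines), order.lines[1].price)
# ('DTM', ['2']) 2 ('PRI', ['9.99'])
```

Keeping the matching logic in SegmentGroup means Component can stay a thin wrapper, and the core parser does not need to change.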

@cmsdroff

cmsdroff commented Apr 3, 2022

I asked www.stylusstudio.com, which has a good overview of EDIFACT data, whether I may parse their site using BeautifulSoup and extract Service Code data from it. This would make it much easier to obtain a larger amount of Service Code data.
If they don't agree (which I expect), I'll ask the official UN source by mail too. I can't find anywhere whether the EDIFACT data are protected by patents etc. or whether they can be used freely.

The UN publishes its work openly, and it's free to use. From what I can get out of the tooling used to maintain the libraries, the EDIFACT outputs are somewhat limited compared to the reference data models.

I coordinate the Transport and Logistics domain in UN/CEFACT, so if you need anything to support this work, maybe I can help. Just give me a wishlist and I'll see what I have.

Keep up the good work on pydifact; I've used it for some work and find it very good!

@nerdoc
Member Author

nerdoc commented Apr 3, 2022

Hey @cmsdroff, thanks for the kind words. ATM pydifact is just low level. At some point I want to add a higher-level API, but I'm uncertain how to do this.
I need EDIFACT for medical data exchange, which is still quite common in e.g. Austria, mainly for sending doctors' reports from laboratories or specialists to general practitioners.
I am very busy ATM, so from my side pydifact stays at its current status for the moment, though I happily accept PRs. In the long term I will definitely use it myself, so there's no way I'll abandon it.

What would be helpful is a list of the higher-level definitions used in practice; everything I have found so far is a bit clumsy and not really helpful, for me at least...

@sabas

sabas commented Nov 23, 2023

@nerdoc I maintain https://github.com/php-edifact, and I just started using your library to check whether I can do something similar to the work I am doing in PHP.
If you need anything, you can ask me as well!
For example, I converted the UN/CEFACT schemas to XML (the edifact-mapping project), and I experimented with converting them into JSON Schemas...

@nerdoc
Member Author

nerdoc commented Nov 24, 2023

Interesting. Yes, this would be cool. How did you implement/convert the schemas? Manually?

@sabas

sabas commented Nov 27, 2023

I wrote a converter from the XML (https://github.com/php-edifact/edifact-mapping) to a schema, and I am thinking of releasing it somehow... If you want to write me an email, I can send you something :-)
