Interoperability #8
Was there also something similar to the text interchange format set up for dependency parsing annotations (e.g. a network or simple structure that NLP applications can consume) during the 2017 workshop?
Not that I recall, although we took some inspiration for the TIF from the CoNLL-U format.
Java and MALLET may be interesting. I didn't realize this until recently, but MALLET has its own token-level format it can export (their "save current topic model state" functionality).
I have my doubts about standardizing on a specific interchange format, as it quickly leads to different groups standardizing on different formats, and often tool X will not work with format Y, and format Z won't have feature F that someone needs. Even for R DTMs there are multiple competing standards, which I'm sure all have their benefits. But I'd love to talk about how to ensure that we can all use each other's software, and how to make sure we don't reinvent the same wheel too many times.
A shared corpus data format is kind of a big problem -- there's just no standardized method at all: if you want to import into various tools, or if you're writing a new tool, it's not clear what format to support. It seems like some people would be happy with a solution just within the R world; I'm curious how that has worked so far. I have this little JSONL format I keep using for many things, but I have no idea whether it's really a general-purpose solution.
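The commenter's actual JSONL schema isn't specified in the thread, but the general idea of JSONL as a corpus format (one JSON object per line, each holding a document and its metadata) can be sketched as follows; the field names `id`, `text`, and `meta` here are illustrative assumptions, not the commenter's format:

```python
import io
import json

# Hypothetical record layout: one document per line, with an id, the raw
# text, and a free-form metadata dict. Any schema along these lines works,
# since each line is an independent JSON object.
docs = [
    {"id": "doc1", "text": "A shared corpus format is a big problem.", "meta": {"lang": "en"}},
    {"id": "doc2", "text": "There is no standardized method at all.", "meta": {"lang": "en"}},
]

def write_jsonl(stream, records):
    """Serialize each record as a single JSON object on its own line."""
    for rec in records:
        stream.write(json.dumps(rec, ensure_ascii=False) + "\n")

def read_jsonl(stream):
    """Parse one record per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]

# Round-trip through an in-memory buffer (a real pipeline would use a file).
buf = io.StringIO()
write_jsonl(buf, docs)
buf.seek(0)
roundtrip = read_jsonl(buf)
print(roundtrip[0]["id"])
```

One appeal of this layout for interoperability is that it is line-oriented: corpora can be streamed, split, and concatenated with ordinary shell tools, and every language discussed in the thread (R, Python, Perl, Java) has a JSON parser.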
I will attend. Re interoperability: I believe there are many different aspects/levels to this: (1) data, (2) model specification/implementation, (3) learning vs. inference, and so on. For instance, I may want to use a Perl script (hypothetically, although I rarely do so) for data preparation, use PyTorch to specify, implement, and train a model on this prepared data, and then load it up in R for analysis.
This ought to be interpreted very broadly this year, since we have broadened the group to include Python alongside the previous year's mainly-R community. This year it should cover interoperability of packages with one another, but also of toolkits from one language environment (e.g. Python) with another (e.g. R).
Some discussion of the Text Interchange Format would also be useful, as this was something we developed last year but have left officially unfinished.