
Interoperability #8

Open
kbenoit opened this issue Mar 13, 2018 · 6 comments

Comments

@kbenoit
Collaborator

kbenoit commented Mar 13, 2018

This ought to be interpreted very broadly this year, since we have broadened the group to include Python alongside the previous year's mainly R community. It should cover interoperability of packages with one another, but also between toolkits from one language environment (e.g. Python) and another (e.g. R).

Some discussion of the Text Interchange Format would also be useful, as this was something we developed last year but have left officially unfinished.
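As a point of reference for that discussion, the TIF convention represents a corpus as a table with a unique character `doc_id` column and a `text` column. A minimal sketch of checking that convention, assuming that representation (the field names follow the TIF draft; the validator itself is illustrative, not part of any spec):

```python
# Hypothetical sketch: a TIF-style corpus is essentially a table where
# every row has a unique string "doc_id" and a string "text" field.

def validate_tif_corpus(rows):
    """Check that rows look like a TIF-style corpus:
    each row has a string doc_id (unique) and a string text."""
    seen = set()
    for row in rows:
        if not isinstance(row.get("doc_id"), str) or not isinstance(row.get("text"), str):
            return False
        if row["doc_id"] in seen:
            return False  # doc_id values must be unique
        seen.add(row["doc_id"])
    return True

corpus = [
    {"doc_id": "d1", "text": "Interoperability matters."},
    {"doc_id": "d2", "text": "Formats should round-trip."},
]
print(validate_tif_corpus(corpus))  # True
```

Any tool that can produce or consume a table of this shape can, in principle, interoperate with the others regardless of language.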

@jwijffels
Contributor

Was there also something similar to the Text Interchange Format set up for dependency-parsing annotations (e.g. a network or simple structure usable by NLP applications) during the 2017 workshop?

@kbenoit
Collaborator Author

kbenoit commented Apr 7, 2018

Not that I recall, although we took some inspiration for the TIF from the CoNLL-U format.
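For context on that inspiration: CoNLL-U encodes one token per line as ten tab-separated fields, with `_` marking a missing value. A minimal parsing sketch (the field names follow the published CoNLL-U layout; the helper function is illustrative):

```python
# CoNLL-U token lines carry ten tab-separated fields, in this order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

def parse_conllu_line(line):
    """Parse one CoNLL-U token line into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(FIELDS, values))

tok = parse_conllu_line("1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_")
print(tok["form"], tok["upos"], tok["head"])  # Dogs NOUN 2
```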

@brendano

Java and MALLET may be interesting. I didn't realize this until recently, but Mallet has its own token-level format it can export (their "save current topic model state" functionality).

@vanatteveldt

I have my doubts about standardizing on a specific interchange format, as it quickly leads to different groups standardizing on different formats; often tool X will not work with format Y, and format Z won't have feature F that someone needs. Even for R DTMs there are multiple competing standards, which I'm sure all have their benefits.

But I'd love to talk about how to ensure that we can all use each other's software, and how to make sure we don't reinvent the same wheel too many times.

@brendano

A shared corpus data format is kind of a big problem -- there's just no standardized method at all, if you wanted to import into various tools, and if you're writing a new tool, it's not clear what format you want to support. It seems like some people would be happy just with a solution within the R world; I'm curious how that has worked so far.

I have this little JSONL format I keep using for many things, but I have no idea if it's really a general purpose solution.
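JSON Lines is appealingly simple for this: one JSON object per line, which streams well for large corpora. A sketch of a round-trip, assuming each record carries a document id and its text (the field names here are illustrative, not a standard):

```python
import io
import json

# One JSON object per line; field names below are illustrative only.
docs = [{"id": "d1", "text": "hello"}, {"id": "d2", "text": "world"}]

# Writing: serialize each record on its own line.
buf = io.StringIO()
for doc in docs:
    buf.write(json.dumps(doc) + "\n")

# Reading: one json.loads per line, so large files never need
# to be parsed as a single JSON document.
round_tripped = [json.loads(line) for line in buf.getvalue().splitlines()]
print(round_tripped == docs)  # True
```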

@kyunghyuncho

i will attend.

re interoperability: i believe there are many different aspects/levels to this: (1) data, (2) model specification/implementation, (3) learning vs. inference, and so on. for instance, i may want to use a perl script (hypothetically, although i rarely do so) for data preparation, use pytorch to specify, implement and train a model on that prepared data, and then load it up in R for analysis.
