Networkx for enhanced dependencies #1295

Merged
AngledLuffa merged 3 commits into dev from networkx_only
Oct 15, 2023

AngledLuffa (Collaborator) commented Oct 15, 2023

Use the networkx library to represent enhanced dependencies when reading in a UD data file

Also, correctly process the extra (empty) words when ignore_gapping is False. That includes reading them, attaching them to the sentence as a second list, and writing them back out in the conll or dict formats.

Building the networkx graph is a little slow, so it is lazily initialized. Code paths that never create enhanced dependencies, such as the Pipeline, do not pay that cost.

Install networkx in setup.py
Pass the Sentence around when making Words and Tokens (they are only ever created as part of Sentence creation)
The Words attach their enhanced deps to the sentence's graph

Add a utility method to check if the enhanced dependencies are empty or not
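
The lazy graph and the emptiness check described above could look roughly like the following. This is only a sketch, not the code in this PR: the `Sentence` class here is a stand-in, and the names `enhanced_graph`, `add_enhanced_dep`, and `has_enhanced_dependencies`, as well as the choice of `nx.MultiDiGraph`, are assumptions made for illustration.

```python
import networkx as nx


class Sentence:
    """Hypothetical stand-in for the real Sentence class."""

    def __init__(self):
        # The graph is not built until something actually needs it,
        # so sentences without enhanced deps never construct one.
        self._enhanced_graph = None

    @property
    def enhanced_graph(self):
        # Lazy initialization: create the graph on first access only.
        if self._enhanced_graph is None:
            self._enhanced_graph = nx.MultiDiGraph()
        return self._enhanced_graph

    def add_enhanced_dep(self, head, dependent, deprel):
        # Nodes are tuples: word 3 is (3,), empty node 8.1 is (8, 1).
        self.enhanced_graph.add_edge(head, dependent, deprel=deprel)

    def has_enhanced_dependencies(self):
        # Read the private attribute directly so that merely asking
        # the question does not force the graph to be created.
        graph = self._enhanced_graph
        return graph is not None and graph.number_of_edges() > 0


sentence = Sentence()
print(sentence.has_enhanced_dependencies())  # no graph has been built yet
sentence.add_enhanced_dep((2,), (1,), "nsubj")
print(sentence.has_enhanced_dependencies())
```

Checking the private attribute rather than the property keeps the emptiness test cheap for sentences that came out of the Pipeline.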

Treat empty nodes as tuples in the enhanced dependency graph (those nodes are not kept in the data conversion yet, though...).

We store all parents as tuples in the enhanced graph, not just the ones which represent empty nodes. That makes the sorting code easier later on, when retrieving the deps
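
To illustrate why uniform tuples simplify sorting: in Python 3 a plain int and a tuple cannot be compared, so a list mixing `3` with `(8, 1)` would fail to sort, while `(3,)`, `(8,)`, and `(8, 1)` order naturally. The helpers below are hypothetical and not taken from this PR; they only sketch the idea.

```python
def parse_node_id(text):
    """Turn a CoNLL-U ID such as "3" or "8.1" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def format_deps(parents):
    """Render (head_tuple, deprel) pairs as a DEPS-style string.

    Because every head is a tuple, sorted() orders regular words and
    empty nodes together: (8,) sorts before (8, 1), which sorts before (9,).
    """
    if not parents:
        return "_"
    return "|".join(
        "%s:%s" % (".".join(str(i) for i in head), deprel)
        for head, deprel in sorted(parents)
    )


parents = [
    (parse_node_id("8.1"), "conj"),
    (parse_node_id("3"), "nsubj"),
    (parse_node_id("8"), "obj"),
]
print(format_deps(parents))  # 3:nsubj|8:obj|8.1:conj
```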
…dices into their own list

The only other use of conll2dict can just ignore the empty words

Then we pass the empty words to the Document constructor and attach them to a Sentence

Output empty_words as part of the conll or when calling to_dict()
Includes a test of the output formats
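
As a rough sketch of keeping empty words in their own list while reading, the function below splits tab-separated CoNLL-U rows by whether the ID column contains a decimal point. The function name `split_empty_words` and its return shape are assumptions for illustration; only the `ignore_gapping` flag name comes from the description above.

```python
def split_empty_words(lines, ignore_gapping=True):
    """Separate regular word rows from empty-node rows (IDs like "5.1").

    Returns (words, empty_words), each a list of column lists. When
    ignore_gapping is True the empty nodes are dropped entirely.
    """
    words, empty_words = [], []
    for line in lines:
        fields = line.split("\t")
        if "." in fields[0]:
            if not ignore_gapping:
                empty_words.append(fields)
            continue
        words.append(fields)
    return words, empty_words


# "Sue likes coffee and Bill tea", with an empty node for the elided verb.
rows = [
    ["1", "Sue", "Sue", "_", "_", "_", "2", "nsubj", "2:nsubj", "_"],
    ["2", "likes", "like", "_", "_", "_", "0", "root", "0:root", "_"],
    ["3", "coffee", "coffee", "_", "_", "_", "2", "obj", "2:obj", "_"],
    ["4", "and", "and", "_", "_", "_", "5", "cc", "5.1:cc", "_"],
    ["5", "Bill", "Bill", "_", "_", "_", "2", "conj", "5.1:nsubj", "_"],
    ["5.1", "likes", "like", "_", "_", "_", "_", "_", "2:conj", "_"],
    ["6", "tea", "tea", "_", "_", "_", "5", "orphan", "5.1:obj", "_"],
]
lines = ["\t".join(row) for row in rows]

words, empty_words = split_empty_words(lines, ignore_gapping=False)
print(len(words), [fields[0] for fields in empty_words])
```

Keeping the two lists separate means callers that only care about regular words can ignore the second return value.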
@AngledLuffa AngledLuffa merged commit 0d1fe2d into dev Oct 15, 2023
@AngledLuffa AngledLuffa deleted the networkx_only branch October 15, 2023 04:07