knowledge-graphformer

What do transformers learn?

Unlike word embeddings, which compress the co-occurrence statistics of a word into one single global vector, transformers learn instance-specific (contextual) word vectors, i.e. the vector for the same word varies with the sentence it occurs in.

Transformers are known to learn something similar to knowledge graphs within their attention matrices.

The paper Visualizing and Measuring the Geometry of BERT has shown that "just as there are specific syntactic subspaces, there is evidence for subspaces that represent semantic information".
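
As a concrete illustration of the contrast above, the following sketch extracts the hidden state of the same word in two different sentences and checks that the contextual vectors differ. It is not part of this repository; it assumes the Hugging Face transformers and torch packages and the bert-base-uncased checkpoint.

```python
# Illustrative only: compares contextual vectors of the same word in two sentences.
# Assumes `pip install torch transformers`; not part of this repository.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence: str, word: str) -> torch.Tensor:
    """Return the hidden state at the position of `word` in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, hidden_dim)
    word_id = tokenizer.convert_tokens_to_ids(word)
    position = (inputs["input_ids"][0] == word_id).nonzero()[0, 0]
    return hidden[position]

v_river = word_vector("The boat drifted toward the bank of the river.", "bank")
v_money = word_vector("She deposited the cheque at the bank.", "bank")

# Unlike a single global word embedding, the two contextual vectors differ.
cos = torch.nn.functional.cosine_similarity(v_river, v_money, dim=0)
print(f"cosine similarity between the two 'bank' vectors: {cos.item():.3f}")
```

With a static word embedding the two vectors would be identical; here the similarity is below 1 because the surrounding sentence changes the representation.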

Knowledge graphs and transformers

  • Understanding a piece of text involves understanding every word within it
  • Word embeddings are useful since they provide context for every word within the sentence
  • Fundamentally, words / tokens are knowledge about the world
  • Would embeddings that encode the subgraph around every node (word) enable better understanding of the text?
  • Currently, language modelling - learning to predict the next word - learns word representations along with language syntax, in a way (see the sketch after this list)
  • This semi-supervised method enables us to train on large amounts of data
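
For concreteness, the next-word objective mentioned in the list above reduces to a cross-entropy loss over shifted token positions. The following is a minimal sketch with made-up tensor shapes, not this repository's actual training loop.

```python
# Minimal sketch of the next-token (language modelling) objective; hypothetical
# tensor shapes, not this repository's training code.
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 32000, 128, 8
logits = torch.randn(batch, seq_len, vocab_size)         # model output per position
tokens = torch.randint(0, vocab_size, (batch, seq_len))  # input token ids

# Predict token t+1 from positions up to t: shift logits and targets by one.
pred = logits[:, :-1, :].reshape(-1, vocab_size)
target = tokens[:, 1:].reshape(-1)
loss = F.cross_entropy(pred, target)
print(loss.item())
```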

I try to ask the following questions:

  • By combining word and language representations, are we complicating the problem?
  • Can the transformer learn a knowledge graph embedding (of every word) and the language syntax (learnt as weights) separately?

This variation of the transformer model aims to learn graph embeddings and language syntax in separate embedding subspaces by design.

  • By learning the KG separately, we get a kind of universal language model that can be reused across multiple languages by changing only the syntax model (a hypothetical sketch of this idea follows below)
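
To make the intended design concrete, here is a hypothetical sketch of an embedding layer that keeps a knowledge-graph subspace and a syntax subspace separate by construction. The class and dimension names are illustrative assumptions, not the layers actually used in this repository.

```python
# Hypothetical sketch of the "separate subspaces" idea: each token embedding is the
# concatenation of a knowledge-graph (entity) embedding and a syntax embedding.
# Names and dimensions are illustrative; this is not the repository's actual code.
import torch
import torch.nn as nn

class SplitEmbedding(nn.Module):
    def __init__(self, vocab_size: int, kg_dim: int, syntax_dim: int):
        super().__init__()
        self.kg_embedding = nn.Embedding(vocab_size, kg_dim)          # world knowledge
        self.syntax_embedding = nn.Embedding(vocab_size, syntax_dim)  # language syntax

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # The two subspaces occupy disjoint slices of the model dimension,
        # so downstream weights can in principle specialise per subspace.
        return torch.cat(
            [self.kg_embedding(token_ids), self.syntax_embedding(token_ids)], dim=-1
        )

embed = SplitEmbedding(vocab_size=32000, kg_dim=256, syntax_dim=256)
tokens = torch.randint(0, 32000, (2, 10))   # (batch, seq_len)
print(embed(tokens).shape)                  # torch.Size([2, 10, 512])
```

Under such a design, reusing the model for another language would amount to swapping the syntax embedding (and syntax-side weights) while keeping the knowledge-graph subspace fixed.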

Note: This is a Work in Progress!!

The base transformer implementation is forked from transformer-pytorch

Prerequisite

This project uses spaCy to tokenize text for the wmt32k dataset. So, if you want to run the wmt32k problem, which is a de/en translation dataset, you should download the language models first with the following commands.

$ pip install spacy
$ python -m spacy download en
$ python -m spacy download de
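
Optionally, you can check that the models are installed with a quick load-and-tokenize test. The en/de shortcut names below match the download commands above and assume an older spaCy release that still supports shortcut links.

```python
# Optional sanity check that the spaCy language models are installed.
# Uses the `en`/`de` shortcut names from the commands above (older spaCy releases).
import spacy

nlp_en = spacy.load("en")
nlp_de = spacy.load("de")

print([t.text for t in nlp_en("The boat drifted toward the bank.")])
print([t.text for t in nlp_de("Das Boot trieb zum Ufer.")])
```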

Usage

  1. Train a model.
$ python train.py --problem wmt32k --output_dir ./output --data_dir ./wmt32k_data
or
$ python train.py --problem lm1b --output_dir ./output --data_dir ./lm1b_data

If you want to try fast_transformer, pass the --model argument after installing tcop-pytorch.

$ python train.py --problem lm1b --output_dir ./output --data_dir ./lm1b_data --model fast_transformer
  2. You can translate a single sentence with the trained model.
$ python decoder.py --translate --data_dir ./wmt32k_data --model_dir ./output/last/models
