Boli Chen*, Yao Fu*, Guangwei Xu, Pengjun Xie, Chuanqi Tan, Mosha Chen, Liping Jing. Probing BERT in Hyperbolic Spaces. ICLR 2021.
$ pip install -r requirements.txt
Probe checkpoints are in the folder `./SentimentProbe/checkpoint`.
Use the Jupyter Notebook `Sentiment-Visualization-Poincare/Euclidean.ipynb`.
Play with it by changing the input text, e.g., `BERT is a good model`.
Use the Jupyter Notebook `SyntaxProbe/Syntax-Visualization-Poincare/Euclidean.ipynb`.
Equation (1), the Poincare distance, is implemented in the `geoopt` library (`Stereographic` class); in our work we simply call its implementation.
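For reference, a minimal sketch of calling that implementation (the curvature `c=1.0` and the tensor shapes here are illustrative choices, not the repo's settings):

```python
import torch
import geoopt

# PoincareBall is geoopt's Stereographic manifold with negative curvature;
# its .dist() implements the Poincare distance of Equation (1).
ball = geoopt.PoincareBall(c=1.0)

# Map two batches of tangent vectors onto the ball, then measure distance.
x = ball.expmap0(0.1 * torch.randn(5, 64))
y = ball.expmap0(0.1 * torch.randn(5, 64))
print(ball.dist(x, y))  # shape (5,): one Poincare distance per row pair
```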
Our Poincare probe is implemented in `SyntaxProbe/probe/hyper.py` (`PoincareProbe` class) and `SentimentProbe/probe/probe.py` (`PoincareProbe` class).
Equations (2) and (3) in the paper correspond to:
- `SyntaxProbe/probe/hyper.py`, lines 80 to 82, for syntax
- `SentimentProbe/probe/probe.py`, lines 60 to 62, for sentiment

Equations (4) and (5) in the paper correspond to `SyntaxProbe/probe/hyper.py`, `PoincareProbeBase` class, functions `distance` (lines 16 to 34) and `depth` (lines 36 to 50).

Equation (6) in the paper corresponds to `SentimentProbe/probe/probe.py`, lines 63 to 64. See the sketch below for how these pieces compose.
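Here is a simplified sketch of the composition (not the repository code: it assumes a single Euclidean linear projection followed by `expmap0`, and the class name and dimensions are made up for illustration; see the files above for the exact forms of Equations (2)-(5)):

```python
import torch
import torch.nn as nn
import geoopt


class PoincareProbeSketch(nn.Module):
    """Illustrative sketch of Equations (2)-(5); not the repo's classes."""

    def __init__(self, dim_in: int = 768, dim_probe: int = 64, c: float = 1.0):
        super().__init__()
        self.ball = geoopt.PoincareBall(c=c)
        self.proj = nn.Linear(dim_in, dim_probe)  # Euclidean parameters

    def project(self, h: torch.Tensor) -> torch.Tensor:
        # Eqs. (2)-(3), simplified: linearly map BERT embeddings, then treat
        # the result as a tangent vector at the origin and exp-map it onto
        # the ball.
        return self.ball.expmap0(self.proj(h))

    def distance(self, h: torch.Tensor) -> torch.Tensor:
        # Eq. (4): squared Poincare distance between every pair of words.
        x = self.project(h)  # (seq_len, dim_probe)
        return self.ball.dist(x.unsqueeze(1), x.unsqueeze(0)) ** 2

    def depth(self, h: torch.Tensor) -> torch.Tensor:
        # Eq. (5): squared distance from the origin as a notion of depth.
        x = self.project(h)
        return self.ball.dist0(x) ** 2


probe = PoincareProbeSketch()
h = torch.randn(7, 768)         # 7 words of mock BERT embeddings
print(probe.distance(h).shape)  # torch.Size([7, 7])
print(probe.depth(h).shape)     # torch.Size([7])
```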
Optimization:
- For the syntax probe, no parameters lie in the hyperbolic space (we project embeddings into the hyperbolic space, but the projection parameters are themselves Euclidean), so we use the ordinary Adam optimizer.
- For the sentiment probe, the two meta embeddings are in the hyperbolic space (`SentimentProbe/probe/probe.py`, lines 51 to 52) while the rest are in the Euclidean space, so we use `RiemannianAdam` for the meta embeddings in `SentimentProbe/run.py`, lines 91 to 92. Note that this optimizer detects which space each parameter lives in and updates it accordingly; see the sketch after this list.
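A minimal sketch of this mixed setup (parameter names and sizes are illustrative, not the repo's):

```python
import torch
import geoopt
from geoopt.optim import RiemannianAdam

ball = geoopt.PoincareBall(c=1.0)

# Hyperbolic parameters: ManifoldParameter tags a tensor with its manifold.
meta_pos = geoopt.ManifoldParameter(ball.expmap0(0.1 * torch.randn(64)), manifold=ball)
meta_neg = geoopt.ManifoldParameter(ball.expmap0(0.1 * torch.randn(64)), manifold=ball)

# Euclidean parameters: an ordinary linear layer.
linear = torch.nn.Linear(768, 64)

# RiemannianAdam inspects each parameter: ManifoldParameters receive
# Riemannian updates (rescaled gradients plus a retraction back onto the
# ball), while plain tensors get standard Adam steps.
optimizer = RiemannianAdam(
    [{"params": [meta_pos, meta_neg]}, {"params": linear.parameters()}],
    lr=1e-3,
)
```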
You will need access to the PTB dataset from the LDC. Then use the scripts from Hewitt to preprocess the data. You only need the `convert_splits_to_depparse.sh` and `convert_conll_to_raw.py` files, since we use our own `SyntaxProbe/data.py` to get BERT embeddings from the current huggingface `transformers` library (current = 2021.03.19). You will need to name the datasets and create folders accordingly; check `SyntaxProbe/data.py` for the directory names. A sketch of the embedding extraction is given below.
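For orientation, a minimal sketch of the core extraction step (the model name and layer index are illustrative choices; `SyntaxProbe/data.py` additionally handles subword alignment, dataset naming, and storage):

```python
import torch
from transformers import BertModel, BertTokenizer

# The model name here is an illustrative choice; see SyntaxProbe/data.py
# for the configuration actually used in the repo.
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased", output_hidden_states=True)
model.eval()

inputs = tokenizer("BERT is a good model", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple of (num_layers + 1) tensors, each of
# shape (batch, seq_len, hidden); pick one layer to feed the probe.
layer_embeddings = outputs.hidden_states[7]
print(layer_embeddings.shape)
```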
Then use `run.py`, which can be found in the sub-folders `./SyntaxProbe` and `./SentimentProbe`. This enables you to reproduce Tables 1 and 2, our major results.
We do not use fixed random seeds, and generally we do not recommend using any. We recommend doing multiple training runs and observing the mean and variance of the performance.
The figures can be reproduced by modifying the Jupyter notebooks.
Q: If I only know sequence-to-sequence modeling but want to really understand what you are doing, what should I read to get sufficient background knowledge?
A: To understand our paper, you will need (a) background on dependency parsing, (b) background on hyperbolic geometry, and (c) background on probing BERT.
For parsing, start with SLP Chapter 14 to learn what dependency parsing is, then read the biaffine parser paper to see what makes a good parser. A good practice is to put the two side by side and do comparative-contrastive reading. After reading them, you will know the basics of parsing used in our paper (e.g., the meaning of UUAS in Table 1 and of the edges in Figure 3).
For hyperbolic geometry, start with blog posts about differential geometry (tensors, manifolds). Then read the Poincare embeddings paper and a blog post about it. After that, you should have deep enough knowledge to understand the techniques in our Section 3.
For probing syntax in BERT, start with the structural probe paper (the primary paper we follow), then a discussion paper about a probe and a parser (which motivates the extensive discussion of probe sensitivity in our introduction).