# Demos
The LIT team maintains a number of hosted demos, as well as pre-built launchers for some common tasks and model types.
For publicly-visible demos hosted on Google Cloud, see https://pair-code.github.io/lit/demos/.
## Classification

### Sentiment and NLI

Hosted instance: https://pair-code.github.io/lit/demos/glue.html
Code: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue_demo.py
- Multi-task demo:
- Sentiment analysis as a binary classification task (SST-2) on single sentences.
- Natural Language Inference (NLI) using MultiNLI, as a three-way classification task with two-segment input (premise, hypothesis).
- STS-B textual similarity task (see Regression / Scoring below).
- Switch tasks using the Settings (⚙️) menu.
- BERT models of different sizes, built on HuggingFace TF2 (Keras).
- Supports the widest range of LIT interpretability features:
- Model output probabilities, custom thresholds, and multiclass metrics.
- Jitter plot of output scores, to find confident examples or ones near the margin.
- Embedding projector to find clusters in representation space.
- Integrated Gradients, LIME, and other salience methods.
- Attention visualization.
- Counterfactual generators, including HotFlip for targeted adversarial perturbations.
Tip: check out a case study for this demo on the public LIT website: https://pair-code.github.io/lit/tutorials/sentiment
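To run this demo locally, launch the example script as a Python module. A minimal sketch; the `--quickstart` flag (which loads smaller models for faster startup) may vary by release:

```sh
# Launch the GLUE demo on localhost:5432 with small, fast-loading models.
python -m lit_nlp.examples.glue_demo --quickstart --port=5432
```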
### Multilingual (XNLI)

Code: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/xnli_demo.py
- The XNLI dataset consists of a subset of MultiNLI translated into 14 different languages.
- Specify the `--languages=en,jp,hi,...` flag to select which languages to load (see the launch sketch below).
- NLI as a three-way classification task with two-segment input (premise, hypothesis).
- Fine-tuned multilingual BERT model.
- Salience methods work with non-whitespace-delimited text, by using the model's wordpiece tokenization.
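As a concrete launch sketch, the flag from the list above can be passed on the command line; `--port` is the standard LIT server flag, and the language codes must be ones XNLI actually covers:

```sh
# Load only English, Spanish, and Hindi examples from XNLI.
python -m lit_nlp.examples.xnli_demo --languages=en,es,hi --port=5432
```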
## Regression / Scoring

### Textual Similarity (STS-B)

Hosted instance: https://pair-code.github.io/lit/demos/glue.html?models=stsb&dataset=stsb_dev
Code: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue_demo.py
- STS-B textual similarity task, predicting scores on a range from 0 (unrelated) to 5 (very similar).
- BERT models built on HuggingFace TF2 (Keras).
- Supports a wide range of LIT interpretability features:
- Model output scores and metrics.
- Scatter plot of scores and error, and jitter plot of true labels for quick filtering.
- Embedding projector to find clusters in representation space.
- Integrated Gradients, LIME, and other salience methods.
- Attention visualization.
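The hosted instance above selects the STS-B model and eval set via URL query parameters. Assuming the shared glue_demo.py launcher exposes a matching `--models` flag (the exact flag name and syntax are an assumption and may differ by release), a local launch might look like:

```sh
# Launch the shared GLUE launcher with the STS-B regression model.
# NOTE: the --models value mirrors the hosted URL's models=stsb parameter;
# verify against the script's own flag definitions.
python -m lit_nlp.examples.glue_demo --models=stsb --port=5432
```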
## Sequence-to-Sequence

### Machine Translation / Summarization (T5)

Code: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/t5_demo.py
- Supports HuggingFace TF2 (Keras) models as well as the TensorFlow SavedModel format.
- Visualize beam candidates and highlight diffs against references.
- Visualize per-token decoder hypotheses to see where the model veers away from desired output.
- Filter examples by ROUGE score against reference.
- Embeddings from last layer of model, visualized with UMAP or PCA.
- Task wrappers to handle pre- and post-processing for summarization and machine translation tasks.
- Pre-loaded eval sets for CNNDM and WMT.
Tip: check out a case study for this demo on the public LIT website: https://pair-code.github.io/lit/tutorials/generation
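A minimal local launch sketch; downloading the T5 weights and eval sets can take a while on the first run:

```sh
# Launch the T5 summarization / translation demo on the default LIT port.
python -m lit_nlp.examples.t5_demo --port=5432
```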
## Language Modeling

### BERT and GPT-2

Code: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/lm_demo.py
- Compare multiple BERT and GPT-2 models side-by-side on a variety of plain-text corpora.
- LM visualization supports different modes:
- BERT masked language model: click-to-mask, and query model at that position.
- GPT-2 shows left-to-right hypotheses for each target token.
- Embedding projector to show latent space of the model.
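A launch sketch, assuming the script's `--models` flag accepts HuggingFace model names for the side-by-side comparison described above (check the script's flag definitions to confirm):

```sh
# Compare a masked LM (BERT) and a left-to-right LM (GPT-2) in one instance.
python -m lit_nlp.examples.lm_demo --models=bert-base-uncased,gpt2 --port=5432
```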
## Structured Prediction

### Gender Bias in Coreference

Code: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/coref/coref_demo.py
- Gold-mention coreference model, trained on OntoNotes.
- Evaluate on the Winogender schemas (Rudinger et al. 2018) which test for gendered associations with profession names.
- Visualizations of coreference edges, as well as binary classification between two candidate referents.
- Stratified metrics for quantifying model bias as a function of pronoun gender or Bureau of Labor Statistics profession data.
Tip: check out a case study for this demo on the public LIT website: https://pair-code.github.io/lit/tutorials/coref
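A minimal local launch, following the nested module path of the code linked above:

```sh
# Launch the coreference / Winogender demo.
python -m lit_nlp.examples.coref.coref_demo --port=5432
```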
## Multimodal

### Tabular Data: Penguin Classification

Code: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/penguin_demo.py
- Binary classification on the penguin dataset.
- Demonstrates the use of LIT on non-text data (numeric and categorical features).
- Use partial-dependence plots to understand feature importance on individual examples, selections, or the entire evaluation dataset.
- Use binary classifier threshold setters to find best thresholds for slices of examples to achieve specific fairness constraints, such as demographic parity.
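A minimal launch sketch for exploring these tabular features locally:

```sh
# Launch the penguin (tabular data) demo.
python -m lit_nlp.examples.penguin_demo --port=5432
```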
### Image Classification with MobileNet

Code: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/image_demo.py
- Classification on ImageNet labels using a MobileNet model.
- Demonstrates the use of LIT on image data.
- Explore results of multiple gradient-based image saliency techniques in the Salience Maps module.
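A minimal launch sketch; the MobileNet weights may be downloaded on the first run:

```sh
# Launch the image classification demo.
python -m lit_nlp.examples.image_demo --port=5432
```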