Understanding and effectively using sentence embeddings, a cornerstone of modern Natural Language Processing (NLP), is often challenging due to their "black-box" nature and the limitations of traditional aggregate evaluation metrics. This project addresses these challenges by providing a novel framework and Visual Analytics (VA) tool designed to give researchers deeper, interactive insight into how different embedding models, composition functions, and similarity metrics influence textual representations.
This work's core contribution is the VA tool itself, which integrates comprehensive visualizations with interactive filtering and detailed drill-down capabilities. To enable this granular analysis, we developed an experimental pipeline that systematically processes and normalizes the outputs of diverse embedding models, ensuring the data fed to the VA application is consistent and comparable. This framework is a valuable artifact in its own right: it supports reproducing the results and extending the tool's dataset, and it motivated the subsequent development of the VA tool.
The VA tool enhances embedding model interpretability by allowing visual exploration of embedding behavior across different configurations and layers. It facilitates systematic comparison across models, even those with disparate architectures, within a unified analytical environment. Crucially, it moves beyond aggregate performance by focusing on the error gap between predicted and actual similarity scores. Through detailed examples, interactive error gap heatmaps, and an Alternative Functions Heatmap for specific challenging instances, the tool enables fine-grained evaluation, revealing nuanced model strengths and limitations often obscured by summary statistics. This work provides an intuitive platform for diagnosing model failures, understanding representational biases, and fostering more informed decisions in the development and application of sentence embeddings.
- Interactive Visual Analytics: Explore sentence embedding behavior with rich, interactive visualizations.
- Error Gap Analysis: Focus on the discrepancies between predicted and actual similarity scores for in-depth model diagnosis.
- Comparative Analysis: Systematically compare diverse embedding models and composition functions within a unified environment.
- Drill-down Capabilities: Investigate specific challenging instances with detailed error gap heatmaps and alternative function analyses.
- Reproducible Experimental Framework: A robust pipeline for processing and normalizing embedding model outputs, ensuring consistent and comparable data.
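To make the error gap idea concrete, here is a minimal sketch (illustrative only, not the tool's actual code; the function names and the 0–5 STS-B gold scale are assumptions): the gap is the absolute difference between a model's predicted similarity and the gold similarity score, once both are on the same scale.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def error_gap(predicted, gold, gold_scale=5.0):
    """Absolute gap between a predicted similarity in [0, 1] and a
    gold STS-B score in [0, gold_scale], after rescaling the gold score."""
    return abs(predicted - gold / gold_scale)

# Toy sentence pair: high gold similarity, mediocre predicted similarity
pred = cosine_similarity([0.2, 0.9, 0.1], [0.3, 0.4, 0.8])
print(error_gap(pred, gold=4.6))
```

Aggregating these per-pair gaps into heatmaps (rather than a single correlation number) is what lets the tool surface specific instances where a model fails.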
This project is built with Python and Plotly Dash.
To set up the project, you will need a typical Python environment.
- Clone the repository:
git clone https://github.com/david-xander/visual-analytics-tool-sentence-embeddings
cd visual-analytics-tool-sentence-embeddings
- Install dependencies:
pip install -r requirements.txt
The experimental framework processes and normalizes embedding model outputs.
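As one common normalization approach, min-max rescaling maps raw similarity scores from different models onto a shared [0, 1] scale; the sketch below is a hedged illustration of the idea, not necessarily the exact method used by the pipeline.

```python
def min_max_normalize(scores):
    """Rescale a list of raw similarity scores to [0, 1] so that
    outputs of models with different score ranges become comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # degenerate case: all scores identical
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

print(min_max_normalize([0.12, 0.85, 0.43, 0.99]))
```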
To run the framework:
python run_experiment.py

The VA tool is a Plotly Dash application that runs as a web server.
To run the server:
python run_dashboard.py

Once the server is running, you can access the VA tool through your web browser, typically at http://127.0.0.1:8050/.
The datasets generated by the experimental framework, particularly the extended STS-B dataset with computed similarities and correlation results, are included within this GitHub repository to facilitate direct use with the VA tool.
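The correlation results mentioned above are typically Pearson (or Spearman) correlations between a model's predicted similarities and the gold STS-B scores. A minimal Pearson sketch, for illustration only (the framework's actual implementation may differ):

```python
import math

def pearson(x, y):
    """Pearson correlation between predicted and gold similarity scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

pred = [0.9, 0.2, 0.7, 0.4]   # toy predicted similarities
gold = [4.5, 1.0, 3.8, 2.2]   # toy gold STS-B scores
print(round(pearson(pred, gold), 3))
```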
Note: Intermediate embedding files (".pt" files mentioned in Chapter 3 of the thesis) are not included due to GitHub's file size limits. The current version of the VA tool does not include embeddings visualization through a 2D or 3D scatterplot view, making these files unnecessary for its functionality.
The composition functions used in this project are derived from the AllSpark project. I would like to acknowledge their valuable contribution:
- GitHub Repository: https://github.com/adriangh-ai/AllSpark
- Publication: Ghajari, Adrián, Victor Fresno, and Enrique Amigo. “Platform for exploring Semantic Composition from pre-trained Language Models and static embeddings.” In Proceedings of the Annual Conference of the Spanish Association for Natural Language Processing: Projects and Demonstrations (SEPLN-PD 2022), Vol-3224:52–56. A Coruña, Spain: CEUR-WS, 2022. http://ceur-ws.org/Vol-3224/paper13.pdf.