Visualizations #6

Open
wants to merge 85 commits into
base: main
85 commits
3ea9d47
display cos distance of each nugget in document view
nils-bz May 29, 2024
282ef5a
add 3D grid to document widget which could later be used for visualiz…
nils-bz May 30, 2024
5b881fe
Show the cosine similarity value beneath the nuggets names
eneapane Jun 1, 2024
38e6fe9
Add bar chart, design and button need to be improved
eneapane Jun 1, 2024
4b47368
add scatterplot
tagzyassi Jun 11, 2024
5e28aaa
Adjust buttons in the view, lay they side by side
eneapane Jun 12, 2024
1df9746
Add labels on click for bar chart
eneapane Jun 13, 2024
4345330
Add colored bar charts
eneapane Jun 13, 2024
52790ed
show pca reduced embedding of attribute in DocumentWidget
nils-bz Jun 14, 2024
d05f514
fix type of point passed to update grid leading to wrong points being…
nils-bz Jun 14, 2024
e40734c
refactor dim_red_value computation and add nugget embeddings to grid
nils-bz Jun 16, 2024
ea7eb23
Add full screen 3D Grid View in separate window
eneapane Jun 22, 2024
bd10599
fixed scatterplot/barchart accumulation error
tagzyassi Jun 22, 2024
4268ebe
Remove print debugs and fix error
eneapane Jun 23, 2024
b5ee9fe
highlight currently selected nugget in visualizer
nils-bz Jun 24, 2024
488e3c7
ignore __pycache__ folder in wannadb_ui
nils-bz Jun 24, 2024
a4c5fb2
remove pycache folder
nils-bz Jun 24, 2024
1c089a0
rm pycache
Dongtaes Jun 25, 2024
66cab25
implement possibility to use T-SNE dimension reduction
nils-bz Jun 25, 2024
a145e0c
add static annotation indicating corresponding nugget to items in 3D …
nils-bz Jun 30, 2024
c65418d
Add list of most likely choices in the interactive matching widget
eneapane Jul 1, 2024
8e60985
Make last commit's code more pythonic
eneapane Jul 1, 2024
9af0422
adjust annotation boxes for scatterplot/barchart
tagzyassi Jul 3, 2024
d0c986e
change colormap of scatterplot
tagzyassi Jul 6, 2024
8f2d5a4
enhance 3D grid by adding distances to annotations and possibility to…
nils-bz Jul 10, 2024
fa9fc1d
Change buttons layout below 3D Grid
eneapane Jul 10, 2024
3657db4
add 3D grid enhancements to fullscreen grid
nils-bz Jul 15, 2024
fec2290
make bar chart horizontally scrollable
eneapane Jul 20, 2024
b661458
Adjust on-click bar chart
eneapane Jul 20, 2024
44ca5af
implement simple visualizer for document overview
nils-bz Jul 25, 2024
3123fcc
several grid related enhancements and complete refactoring of visuali…
nils-bz Jul 27, 2024
300f59e
fix several grid related errors
nils-bz Jul 27, 2024
f44f5b9
improve visualizations in document overview: threshold label displays…
nils-bz Jul 29, 2024
42bc0b6
implement first version of list visualizing changed best matches
nils-bz Jul 30, 2024
2b161a3
fix issue with not correctly highlighted confirmed matches
nils-bz Jul 30, 2024
e44e224
improve tooltips related to newly added nuggets shown to the user
nils-bz Jul 30, 2024
767f05a
Track the user usage of the visual gadgets, preparation for the study
eneapane Aug 4, 2024
ba02670
fix data not being set initially for scatterPlot and barChart
nils-bz Aug 5, 2024
76097a8
add possibility to en-/disable visualizations
nils-bz Aug 5, 2024
e31b4c6
add lists indicating which nuggets moved below/above threshold due to…
nils-bz Aug 7, 2024
5326ee2
Improve logging, and add logs to .gitignore
eneapane Aug 7, 2024
78d602a
Small fix
eneapane Aug 7, 2024
c3cb2d4
Merge branch 'visualizations' into study
eneapane Aug 7, 2024
5f3a5ce
track match/no_match button
Dongtaes Aug 15, 2024
4e69cfd
Logging ready for show bar chart, show scatter plot, and embedding vi…
eneapane Aug 7, 2024
3b65b9d
fix issue with match update list
nils-bz Aug 15, 2024
19e0d7f
add dimension reducer to preprocess script
nils-bz Aug 16, 2024
28d0ee3
remove duplicated nuggets in preprocessing phase
nils-bz Aug 17, 2024
4307b03
fix several issues with changes lists and some small UI improvements
nils-bz Aug 17, 2024
36f8d22
add possibility to switch between 3 levels of visualizations
nils-bz Aug 17, 2024
8e7e07d
added accessibility button with IBM color palette
Dongtaes Aug 17, 2024
69f91ed
Track Show Suggestions in 3D butoon
eneapane Aug 24, 2024
223910f
track tooltips in logs/user_report.txt
eneapane Aug 24, 2024
0be66c0
add json file for jupyter processing
eneapane Aug 24, 2024
fcf52da
more relevant information on which tooltip was activated, no informat…
eneapane Aug 24, 2024
7576d1d
Merge IBM Buttons and vis. branch
Dongtaes Aug 27, 2024
1187195
Bug Fix of the Accessibility Button
Dongtaes Aug 27, 2024
8f53f5a
add simple legend for 3D grid
nils-bz Sep 8, 2024
b46ef0f
Add tutorial for using bar chart
eneapane Sep 11, 2024
90391d9
Fix y-axis annotation
eneapane Sep 11, 2024
70ce693
Remove scatter plot
eneapane Sep 11, 2024
d1e3dbd
Show tutorial for bar chart only once per application usage
eneapane Sep 11, 2024
b895ce9
Fix bar chart with suggestions
eneapane Sep 13, 2024
1c4f9ce
implement information popups, splash screen and corresponding help menu
nils-bz Sep 14, 2024
da654bc
adjust resource folder
nils-bz Sep 14, 2024
b8c38a8
move helper model classes to proper location
nils-bz Sep 14, 2024
95fc8b9
replace png with svg
nils-bz Sep 19, 2024
0a9609f
increase nugget size in grid
nils-bz Sep 20, 2024
070cbc1
fix wrong colors in legend
nils-bz Sep 20, 2024
9a05279
fix custom-match workflow
nils-bz Sep 20, 2024
d81e4b2
display similarity instead of distance in nugget list of document view
nils-bz Sep 24, 2024
88be1a6
improve var name
nils-bz Sep 24, 2024
7fae912
add documentation for PointLegend class
nils-bz Sep 24, 2024
0256faf
Add comments for bar chart
eneapane Sep 26, 2024
27c32e4
add remaining documentation to visualizations.py
nils-bz Sep 28, 2024
ce438d5
start documenting data_insights.py
nils-bz Sep 28, 2024
35edb3f
add remaining documentation to data_insights.py
nils-bz Sep 28, 2024
726a882
add further documentation
nils-bz Sep 29, 2024
41d9838
add further documentation
nils-bz Sep 29, 2024
43e29a9
add further documentation
nils-bz Sep 29, 2024
2e7f653
Add comments study.py
eneapane Sep 30, 2024
95bc71d
Refactor text
eneapane Sep 30, 2024
9a39b2d
fix enabling / disabling of accessible color palette
nils-bz Sep 30, 2024
af3e376
change color indicating threshold change and appearance of attribute …
nils-bz Sep 30, 2024
6cb8bd0
Merge branch 'visualizations' of https://github.com/DataManagementLab…
nils-bz Sep 30, 2024
6 changes: 4 additions & 2 deletions .gitignore
@@ -1,6 +1,7 @@
 /venv/
 /.idea/
-/__pycache__/
+**/__pycache__/
+/wannadb_ui/__pycache__/
 /.pytest_cache/

 /models/
@@ -10,4 +11,5 @@
 .bson
 /evaluation/datasets/aviation/documents
 /evaluation/datasets/nobel/documents
-/evaluation/results/
+/evaluation/results/
+/logs/
2 changes: 2 additions & 0 deletions main.py
@@ -1,6 +1,7 @@
 import logging
 import sys

+from PyQt6.QtCore import Qt
 from PyQt6.QtWidgets import QApplication

 from wannadb.resources import ResourceManager
@@ -14,6 +15,7 @@

 with ResourceManager() as resource_manager:
     # set up PyQt application
+    QApplication.setAttribute(Qt.ApplicationAttribute.AA_ShareOpenGLContexts)
     app = QApplication(sys.argv)

     window = MainWindow()
10 changes: 8 additions & 2 deletions requirements.txt
@@ -69,7 +69,7 @@ murmurhash==1.0.9
 # thinc
 nltk==3.8
 # via sentence-transformers
-numpy==1.21.4
+numpy==1.26.4
 # via
 # aset (setup.py)
 # blis
@@ -148,7 +148,7 @@ scikit-learn==1.0.1
 # via
 # aset (setup.py)
 # sentence-transformers
-scipy==1.7.2
+scipy==1.13.1
 # via
 # aset (setup.py)
 # scikit-learn
@@ -231,5 +231,11 @@ wasabi==0.10.1
 # spacy
 # thinc

+pyqtgraph==0.13.7
+
+PyOpenGL==3.1.7
+
+PyOpenGL_accelerate==3.1.7
+
 # The following packages are considered to be unsafe in a requirements file:
 # setuptools
7 changes: 5 additions & 2 deletions scripts/preprocess.py
@@ -6,11 +6,12 @@
 from wannadb.configuration import Pipeline
 from wannadb.data.data import Document, DocumentBase
 from wannadb.interaction import EmptyInteractionCallback
+from wannadb.preprocessing.dimension_reduction import PCAReducer
 from wannadb.preprocessing.embedding import BERTContextSentenceEmbedder, RelativePositionEmbedder, SBERTTextEmbedder, SBERTLabelEmbedder
 from wannadb.preprocessing.extraction import StanzaNERExtractor, SpacyNERExtractor
 from wannadb.preprocessing.label_paraphrasing import OntoNotesLabelParaphraser, SplitAttributeNameLabelParaphraser
 from wannadb.preprocessing.normalization import CopyNormalizer
-from wannadb.preprocessing.other_processing import ContextSentenceCacher
+from wannadb.preprocessing.other_processing import ContextSentenceCacher, DuplicatedNuggetsCleaner
 from wannadb.resources import ResourceManager
 from wannadb.statistics import Statistics
 from wannadb.status import EmptyStatusCallback
@@ -68,7 +69,9 @@ def main() -> None:
         SBERTLabelEmbedder("SBERTBertLargeNliMeanTokensResource"),
         SBERTTextEmbedder("SBERTBertLargeNliMeanTokensResource"),
         BERTContextSentenceEmbedder("BertLargeCasedResource"),
-        RelativePositionEmbedder()
+        RelativePositionEmbedder(),
+        DuplicatedNuggetsCleaner(),
+        PCAReducer()
     ])

     document_base = DocumentBase(documents, [])
220 changes: 220 additions & 0 deletions wannadb/change_captor.py
@@ -0,0 +1,220 @@
"""
Module providing model classes which can be used to capture the changes caused by user feedback and propagate them to
the UI.
These changes are computed after every feedback given by the user.
"""

from typing import Optional, Union, List

from PyQt6.QtGui import QColor

from wannadb.data.data import InformationNugget
from wannadb_ui.common import ThresholdPosition, AddedReason


class BestMatchUpdate:
    """
    Instances of this class represent an update of the best match of a document.

    Each instance provides the old best match and the new best match of a document as well as a count specifying how
    often similar best match changes happened.
    Another best match change is considered similar if it happened in the same feedback round and its new best match
    is equal.

    Attributes
    ----------
    old_best_match
        The old best match of the related document.
    new_best_match
        The new best match of the related document.
    count
        The number of similar best match changes that happened in the same feedback round.
    """

    def __init__(self, old_best_match: str, new_best_match: str, count: int):
        """
        Parameters
        ----------
        old_best_match: str
            The old best match of the related document.
        new_best_match: str
            The new best match of the related document.
        count: int
            The number of similar best match changes that happened in the same feedback round.
        """

        self._old_best_match: str = old_best_match
        self._new_best_match: str = new_best_match
        self._count: int = count

    @property
    def old_best_match(self) -> str:
        return self._old_best_match

    @property
    def new_best_match(self) -> str:
        return self._new_best_match

    @property
    def count(self) -> int:
        return self._count
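
Review note: the docstring above says that similar changes collapse into one update carrying a count. The snippet below is only a minimal sketch of that grouping, not the code this PR uses. It assumes the changes of one feedback round arrive as `(old, new)` string pairs. The `BestMatchUpdate` dataclass is a stand-in for the class in this file, `aggregate_best_match_updates` is a hypothetical helper, and keeping the first old match seen per group is an assumption the docstring does not settle.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BestMatchUpdate:
    """Stand-in mirroring the read-only properties of the class in this file."""
    old_best_match: str
    new_best_match: str
    count: int


def aggregate_best_match_updates(changes: List[Tuple[str, str]]) -> List[BestMatchUpdate]:
    """Group the changes of ONE feedback round by their new best match (hypothetical helper)."""
    first_old: Dict[str, str] = {}
    counts: Dict[str, int] = {}
    for old, new in changes:
        first_old.setdefault(new, old)  # assumption: keep the first old match seen per group
        counts[new] = counts.get(new, 0) + 1
    # dicts preserve insertion order, so groups come out in first-seen order
    return [BestMatchUpdate(first_old[new], new, n) for new, n in counts.items()]


updates = aggregate_best_match_updates([
    ("Berlin", "Munich"),
    ("Hamburg", "Munich"),
    ("2019", "2020"),
])
```

Here the two changes that end in "Munich" share one update with `count == 2`, while the "2020" change keeps `count == 1`.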


class ThresholdPositionUpdate:
    """
    Instances of this class represent an update of the position of a nugget's distance relative to the threshold.

    Each instance provides the text of the nugget whose position changed, the old position (above or below), the new
    position (above or below), the old and new distance of the nugget as well as a count indicating how often similar
    changes happened in the same feedback round.
    A change is considered similar if it happened in the same feedback round, the text represented by the nugget is
    equal, and it has the same type of update (above -> below or below -> above).

    As mentioned, an instance of this class can cover multiple changes if the texts of the changed nuggets are equal.
    In this case the distance-related properties are None, as the instance does not refer to a single nugget.
    """

    def __init__(self,
                 nugget_text: str,
                 old_position: Optional[ThresholdPosition], new_position: ThresholdPosition,
                 old_distance: Optional[float], new_distance: Optional[float],
                 count: int):
        """
        Parameters
        ----------
        nugget_text: str
            Text of the nuggets whose position relative to the threshold changed.
        old_position: Optional[ThresholdPosition]
            Previous position of the covered nuggets relative to the threshold (above or below). May be None.
        new_position: ThresholdPosition
            New position of the covered nuggets relative to the threshold (above or below).
        old_distance: Optional[float]
            Old distance associated with the nugget. If multiple nuggets are covered by this instance, this will be
            None.
        new_distance: Optional[float]
            New distance associated with the nugget. If multiple nuggets are covered by this instance, this will be
            None.
        count: int
            Number of similar changes that happened in the same feedback round.
        """

        self._nugget_text: str = nugget_text
        self._old_position: Optional[ThresholdPosition] = old_position
        self._new_position: ThresholdPosition = new_position
        # The distances are None when the instance covers more than one nugget.
        self._old_distance: Optional[float] = old_distance
        self._new_distance: Optional[float] = new_distance
        self._count: int = count

    @property
    def nugget_text(self) -> str:
        return self._nugget_text

    @property
    def old_position(self) -> Optional[ThresholdPosition]:
        return self._old_position

    @property
    def new_position(self) -> ThresholdPosition:
        return self._new_position

    @property
    def old_distance(self) -> Optional[float]:
        return self._old_distance

    @property
    def new_distance(self) -> Optional[float]:
        return self._new_distance

    @property
    def count(self) -> int:
        return self._count
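
Review note: the merging rule in the docstring (same nugget text, same direction of the crossing, distances dropped once more than one nugget is covered) can be sketched as below. This is an illustration under stated assumptions, not the implementation in this PR. `ThresholdPosition` here is a stand-in enum whose member names are assumed, and `Change` and `merge_changes` are hypothetical names introduced only for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class ThresholdPosition(Enum):
    """Stand-in for wannadb_ui.common.ThresholdPosition; member names are assumed."""
    ABOVE = "above"
    BELOW = "below"


@dataclass(frozen=True)
class Change:
    """One nugget crossing the threshold (hypothetical input record)."""
    nugget_text: str
    old_position: ThresholdPosition
    new_position: ThresholdPosition
    old_distance: float
    new_distance: float


@dataclass(frozen=True)
class ThresholdPositionUpdate:
    """Stand-in mirroring the properties of the class in this file."""
    nugget_text: str
    old_position: Optional[ThresholdPosition]
    new_position: ThresholdPosition
    old_distance: Optional[float]
    new_distance: Optional[float]
    count: int


def merge_changes(changes: List[Change]) -> List[ThresholdPositionUpdate]:
    """Merge the changes of one feedback round that share text and crossing direction."""
    groups: Dict[Tuple[str, ThresholdPosition, ThresholdPosition], List[Change]] = defaultdict(list)
    for change in changes:
        groups[(change.nugget_text, change.old_position, change.new_position)].append(change)

    updates = []
    for (text, old_pos, new_pos), members in groups.items():
        # Distances only make sense when exactly one nugget is covered.
        single = members[0] if len(members) == 1 else None
        updates.append(ThresholdPositionUpdate(
            text, old_pos, new_pos,
            single.old_distance if single else None,
            single.new_distance if single else None,
            len(members),
        ))
    return updates


ABOVE, BELOW = ThresholdPosition.ABOVE, ThresholdPosition.BELOW
merged = merge_changes([
    Change("Lufthansa", ABOVE, BELOW, 0.62, 0.41),
    Change("Lufthansa", ABOVE, BELOW, 0.58, 0.39),
    Change("Boeing", BELOW, ABOVE, 0.30, 0.55),
])
```

The two "Lufthansa" changes end up in one update with `count == 2` and both distances set to `None`, while the single "Boeing" change keeps its distances.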


class NewlyAddedNuggetContext:
    """
    Instances of this class represent a nugget newly added to the document overview.
    Each instance provides information about the old and new distance of the nugget as well as the reason why the
    system newly added the nugget.
    """

    def __init__(self,
                 nugget: InformationNugget,
                 old_distance: Optional[float],
                 new_distance: float,
                 added_reason: AddedReason):
        """
        Parameters
        ----------
        nugget: InformationNugget
            Newly added nugget.
        old_distance: Optional[float]
            Old distance associated with the nugget. May be None.
        new_distance: float
            New distance associated with the nugget.
        added_reason: AddedReason
            Reason for the nugget being newly added.
        """

        self._nugget: InformationNugget = nugget
        self._old_distance: Optional[float] = old_distance
        self._new_distance: float = new_distance
        self._added_reason: AddedReason = added_reason

    @property
    def nugget(self) -> InformationNugget:
        return self._nugget

    @property
    def old_distance(self) -> Optional[float]:
        return self._old_distance

    @property
    def new_distance(self) -> float:
        return self._new_distance

    @property
    def added_reason(self) -> AddedReason:
        return self._added_reason


class NuggetUpdatesContext:
    """
    Wrapper class bundling the different types of nugget-related updates.
    Nugget-related updates are `NewlyAddedNuggetContext`, `ThresholdPositionUpdate` and `BestMatchUpdate`. Each
    instance holds one list of updates for each of these three update types.
    """

    def __init__(self,
                 newly_added_nugget_contexts: List[NewlyAddedNuggetContext],
                 best_match_updates: List[BestMatchUpdate],
                 threshold_position_updates: List[ThresholdPositionUpdate]):
        """
        Parameters
        ----------
        newly_added_nugget_contexts: List[NewlyAddedNuggetContext]
            List of all `NewlyAddedNuggetContext` instances wrapped by this instance.
        best_match_updates: List[BestMatchUpdate]
            List of all `BestMatchUpdate` instances wrapped by this instance.
        threshold_position_updates: List[ThresholdPositionUpdate]
            List of all `ThresholdPositionUpdate` instances wrapped by this instance.
        """

        self._newly_added_nugget_contexts: List[NewlyAddedNuggetContext] = newly_added_nugget_contexts
        self._best_match_updates: List[BestMatchUpdate] = best_match_updates
        self._threshold_position_updates: List[ThresholdPositionUpdate] = threshold_position_updates

    @property
    def newly_added_nugget_contexts(self) -> List[NewlyAddedNuggetContext]:
        return self._newly_added_nugget_contexts

    @property
    def best_match_updates(self) -> List[BestMatchUpdate]:
        return self._best_match_updates

    @property
    def threshold_position_updates(self) -> List[ThresholdPositionUpdate]:
        return self._threshold_position_updates
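
Review note: since the wrapper only exposes three lists, a consumer on the UI side would typically read them once per feedback round. The following is a hedged sketch of such a consumer, not code from this PR. `CountedUpdate` and the `NuggetUpdatesContext` dataclass are stand-ins for the classes in this file, and `summarize` is a hypothetical helper. It assumes that summing `count` over the updates gives the total number of underlying changes, which matches the docstrings above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class CountedUpdate:
    """Stand-in for BestMatchUpdate / ThresholdPositionUpdate: only `count` matters here."""
    count: int


@dataclass(frozen=True)
class NuggetUpdatesContext:
    """Stand-in mirroring the three list properties of the wrapper in this file."""
    newly_added_nugget_contexts: List[object] = field(default_factory=list)
    best_match_updates: List[CountedUpdate] = field(default_factory=list)
    threshold_position_updates: List[CountedUpdate] = field(default_factory=list)


def summarize(context: NuggetUpdatesContext) -> str:
    """Build a one-line summary of a feedback round (hypothetical helper)."""
    added = len(context.newly_added_nugget_contexts)
    # Each update may cover several similar changes, so sum the counts.
    best = sum(update.count for update in context.best_match_updates)
    crossed = sum(update.count for update in context.threshold_position_updates)
    return f"{added} nuggets added, {best} best-match changes, {crossed} threshold crossings"


summary = summarize(NuggetUpdatesContext(
    newly_added_nugget_contexts=[object()],
    best_match_updates=[CountedUpdate(2), CountedUpdate(1)],
    threshold_position_updates=[CountedUpdate(3)],
))
```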


