Update #1

Merged: 193 commits merged on May 2, 2020.
83e663d
Fix link for Multiple Maps t-SNE
Yekut Sep 27, 2018
43bfbf1
Merge pull request #17 from ZiyaoWei/master
cgpotts Nov 3, 2018
f840f79
Exclude the main data folder
cgpotts Mar 20, 2019
95a27a6
Remove units we won't have time for this year
cgpotts Mar 20, 2019
22048ef
Remove older bake-offs
cgpotts Mar 20, 2019
08be73d
Remove older Tensorflow code
cgpotts Mar 20, 2019
6129648
Remove older homeworks
cgpotts Mar 20, 2019
46a27f7
New PyTorch modules
cgpotts Mar 20, 2019
6a6103a
Updated NumPy ML classes
cgpotts Mar 20, 2019
9c3e919
New homeworks/bakeoffs
cgpotts Mar 20, 2019
945e109
Version update
cgpotts Mar 20, 2019
9fdd64c
Switch to Apache license
cgpotts Mar 20, 2019
c018c81
Overview/admin updates
cgpotts Mar 20, 2019
0f56eb8
Improved notebook and supporting code
cgpotts Mar 20, 2019
91a910f
Test updates
cgpotts Mar 20, 2019
fbb735e
NumPy TreeNN update
cgpotts Mar 20, 2019
f8075c7
Merge pull request #18 from cgpotts/pre2019-revisions
cgpotts Mar 20, 2019
203326c
Modern TensorFlow DL classes
cgpotts Mar 22, 2019
5a4c0ed
Expanded tests, with fixes
cgpotts Mar 22, 2019
2b0eb70
SST unit improvements
cgpotts Mar 22, 2019
a5efc41
Support for incremental testing and error accumulation
cgpotts Mar 24, 2019
3bc2b65
Updated tests for incremental testing
cgpotts Mar 24, 2019
186f07b
Updated methods notebook to use PyTorch model
cgpotts Mar 24, 2019
0f8678a
Updated bake-off 1 instructions
cgpotts Mar 24, 2019
cebde30
Updated bake-off instructions
cgpotts Mar 25, 2019
a478861
Updated bake-off winning criteria
cgpotts Mar 25, 2019
53f3055
Diagram for distributed representations as classifier features
cgpotts Mar 25, 2019
dfd758e
Additional NLI datasets
cgpotts Mar 25, 2019
d37e286
Newer references on hypothesis-only baselines
cgpotts Mar 25, 2019
ad83dcc
Return the featurizer with experimental results
cgpotts Mar 26, 2019
1aab57d
Illustrative use of the MultiNLI annotations
cgpotts Mar 26, 2019
940bca2
random_state argument for sst.experiment
cgpotts Mar 26, 2019
0f3e965
Test for sst.experiment
cgpotts Mar 26, 2019
e516fad
Minor homework updates
cgpotts Mar 29, 2019
848a6e1
SST unit improvements
cgpotts Mar 29, 2019
3a47a49
PyTorch module documentation
cgpotts Mar 29, 2019
caa40b2
NumPy module documentation tweaks
cgpotts Mar 29, 2019
24d34e5
New PyTorch subtree supervision module
cgpotts Mar 29, 2019
2b06c1a
Removes some unused functions
cgpotts Mar 29, 2019
a889b15
Small change to experiment reporting
cgpotts Mar 29, 2019
aaae2a7
Improved course set-up instructions
cgpotts Mar 29, 2019
b194c54
Updated tests
cgpotts Mar 29, 2019
29edc69
Gradient checking for the NumPy models
cgpotts Mar 29, 2019
e46ce79
Expanded model tests
cgpotts Mar 31, 2019
66580c3
Allow for raw vector inputs (e.g., BERT and ELMo inputs)
cgpotts Mar 31, 2019
3147aa4
Improved documentation
cgpotts Mar 31, 2019
5826f43
Restore missing table of contents
cgpotts Mar 31, 2019
f4f3c9b
Minor tweaks and re-run
cgpotts Mar 31, 2019
c79e4dd
Bug fix due to @atticusg; TreeNN now passes gradient checks
cgpotts Mar 31, 2019
8246441
New practical intro to using contextual word reps
cgpotts Mar 31, 2019
0972552
Remove encoding check; Python 2 is not supported
cgpotts Mar 31, 2019
85423cd
Typo corrections in the o/e definition
cgpotts Apr 1, 2019
71a528c
Corrects the table of contents
cgpotts Apr 1, 2019
e5cd2e6
New data distribution link
cgpotts Apr 1, 2019
58659ee
Add files via upload
lucy3 Apr 3, 2019
b054c4a
Merge pull request #19 from lucy3/master
cgpotts Apr 3, 2019
2d2c422
Updated conda and git commands
ignaciocases Apr 3, 2019
c39633e
CUDA-enabled GPU
ignaciocases Apr 3, 2019
7e94e0f
CUDA-enabled GPU
ignaciocases Apr 3, 2019
94f38bf
Clarifies that gigawordnyt-advmod-matrix.csv.gz isn't part of the bak…
cgpotts Apr 4, 2019
975c0b9
GPU speed-ups from pin_memory and non_blocking (h/t @zijwang)
cgpotts Apr 5, 2019
78a3700
Small change to environment instructions
cgpotts Apr 5, 2019
375a73e
Merge pull request #21 from cgpotts/pr/20
cgpotts Apr 5, 2019
d7bfebc
Clarification about external vectors
cgpotts Apr 6, 2019
1d5950a
Corrects double-log bug in correlation_test (h/t Minfa!)
cgpotts Apr 7, 2019
c264cc4
exposes distfunc argument for full_word_similarity_evaluation
cgpotts Apr 8, 2019
5f3f75d
fix torch_tree_nn loss normalization error
zijwang Apr 8, 2019
e29d8aa
Remove unnecessary tree model hidden_dim args (h/t @zijwang)
cgpotts Apr 8, 2019
57a01cc
Merge pull request #22 from zijwang/fix-loss-error
cgpotts Apr 8, 2019
0ee1437
fix model.eval() and explicitly add model.train()
zijwang Apr 9, 2019
924b4b8
add fix_random_seed util function
zijwang Apr 9, 2019
2e37c94
Merge pull request #23 from zijwang/fix_model_eval
cgpotts Apr 9, 2019
681d297
More control over which seeds get set
cgpotts Apr 9, 2019
05a7bc5
Tests that the seeds get set as the user desires
cgpotts Apr 9, 2019
eead8e9
Merge pull request #25 from cgpotts/docs-and-user-control
cgpotts Apr 9, 2019
a289c82
Typo in autoencoder function name
ignaciocases Apr 9, 2019
2ffb517
Merge pull request #26 from ignaciocases/master
cgpotts Apr 9, 2019
18f9f1a
Use pin_memory=False to avoid GPU problems
cgpotts Apr 15, 2019
38915b3
Small clarification concerning what it means to be an original system
cgpotts Apr 15, 2019
01fe54e
Fix typos and broken examples in rel_ext_01_task.ipynb
wcmac Apr 16, 2019
391ec24
Merge branch 'master' of https://github.com/cgpotts/cs224u
wcmac Apr 16, 2019
c042b06
Keeps the full matrix off the GPU
cgpotts Apr 17, 2019
47be08f
A couple HW2 clarifications
cgpotts Apr 17, 2019
0b571d9
Merge pull request #27 from cgpotts/cp-autoencoder-gpu
cgpotts Apr 18, 2019
cd3820e
Handling vocab arguments in cross-validation, with tests
cgpotts Apr 20, 2019
84246b1
Bug fixes in the sentence-encoding RNN
cgpotts Apr 21, 2019
bf83930
Fix typos in rel_ext_01_task.ipynb
wcmac Apr 21, 2019
b4b5347
Merge branch 'master' of https://github.com/cgpotts/cs224u
wcmac Apr 21, 2019
bc6ed8b
Fix typos in rel_ext_02_experiments.ipynb
wcmac Apr 22, 2019
4f02dd9
Add missing elements to params list, and expand associated tests
cgpotts Apr 22, 2019
472d405
Local import of PyTorch and TensorFlow for env flexibility
cgpotts Apr 25, 2019
2af8b47
Add files via upload
lucy3 Apr 26, 2019
4201cf0
Merge pull request #28 from lucy3/master
cgpotts Apr 29, 2019
f6a5f11
Work-around to display weights for models using sparse coef_
cgpotts Apr 29, 2019
f636c4f
PyTorch tutorial from @ignaciocases
cgpotts Apr 29, 2019
af89e61
Typo corrections in sections on micro-F1 and average precision
cgpotts May 3, 2019
c76ecb0
Significant revision of the metrics notebook
cgpotts May 6, 2019
ee339a6
Adds slideshow structure
cgpotts May 6, 2019
2351fa4
Change "TensorFlow" to "PyTorch" when discussing early stopping
cgpotts May 6, 2019
d26b77e
New notes on project evaluation
cgpotts May 6, 2019
db7e211
Minor HW typo corrections and clarifications
cgpotts Jun 1, 2019
4fae732
Specify UTF-8 encoding explicitly when opening data files
dbarkar Jun 10, 2019
4762cc2
Merge pull request #29 from mastermind-/fix-open-encoding
cgpotts Jun 17, 2019
7db4160
Initial updates for 2020
cgpotts Oct 22, 2019
e0b28d1
Merge pull request #33 from cgpotts/scpd-winter-2020
cgpotts Oct 22, 2019
551a6c0
Formatting improvements to projects.md
cgpotts Oct 22, 2019
de0e305
Fix out-dated data file link
cgpotts Oct 30, 2019
20c2771
Minor assignment notebook updates
cgpotts Nov 11, 2019
98ce0d3
Updates related to setting random seeds
cgpotts Nov 18, 2019
aba8f6c
Minor typo fix
insop Dec 24, 2019
d8e9ef2
Fixed the markdown table for Observed/Expected
insop Dec 24, 2019
3d985a6
Fix the broken link for Turney and Pantel
insop Dec 24, 2019
f20efa8
Update the code comment for vocab_overlap_crosstab
insop Dec 25, 2019
669cbfb
minor typo fix
insop Dec 26, 2019
b514014
Merge pull request #35 from insop/insop/master
cgpotts Dec 26, 2019
0da0c64
Merge pull request #36 from insop/insop/hw1
cgpotts Dec 26, 2019
f867097
Potential typo, since predict_one_proba is not used
insop Dec 27, 2019
b6e0e84
Fix minor typoe
insop Dec 27, 2019
9b85b2c
Fix minor typo
insop Dec 27, 2019
fa0688e
Merge pull request #37 from insop/insop/master_hw2
cgpotts Dec 31, 2019
22b557c
Typo fix in `test_subword_enrichment`:
insop Dec 31, 2019
315d12b
Merge pull request #38 from insop/insop/master_hw1_2
cgpotts Jan 12, 2020
ce3c9c2
Corrected docstring for predict_one; h/t @insop
cgpotts Jan 12, 2020
c441440
Wordsim HW questions as functions, with included tests
cgpotts Jan 12, 2020
686df0f
rel_ext HW question 1 as a function, with a test
cgpotts Jan 12, 2020
5987d13
wordentail HW questions as functions, with tests
cgpotts Jan 12, 2020
a259f13
Updated title for the wordentail HW/bakeoff
cgpotts Jan 12, 2020
3824761
Resolve conflict from previous merge
insop Jan 13, 2020
ba304b2
Merge pull request #39 from insop/insop/resolve_conflict
cgpotts Jan 13, 2020
48d1856
Potential typo fix
insop Jan 15, 2020
ed041c0
Merge pull request #40 from insop/insop/sst_03
cgpotts Jan 17, 2020
9f32588
Negotiate TensorFlow v1 and v2 when setting random seeds
cgpotts Jan 18, 2020
d70ccd9
PyTorch models: warm_start option and serialization methods
cgpotts Jan 21, 2020
0db2923
PyTorch models: tests for the serialization methods
cgpotts Jan 21, 2020
83cb1c7
small doc fix
stas00 Jan 29, 2020
fd11554
Merge pull request #41 from stas00/tweaks
cgpotts Jan 31, 2020
46ec94c
Optional y arg to tree network fit methods to allow cross-validation
cgpotts Feb 2, 2020
3e2b4d3
Adds Gradescope env variable conditionals
cgpotts Feb 4, 2020
c5d12df
New vectorize=False option to support deep learning models
cgpotts Feb 5, 2020
b52bec1
Corrected test_run_ppmi_lsa_pipeline (h/t @AndrewLim1990)
cgpotts Feb 11, 2020
7beb0eb
[setup.ipynb] deactivate does not accept arguments
Feb 18, 2020
6308339
Merge pull request #43 from krsnaa/master
cgpotts Feb 19, 2020
6f14a8f
added display codes in the scope of autograder checks in the explanat…
e-budur Feb 25, 2020
30b07ba
Merge pull request #45 from e-budur/adding-displays-in-autograder-cod…
cgpotts Feb 25, 2020
016c8e2
Default required tokens for create_pretrained_embedding
cgpotts Feb 26, 2020
defa1be
Minor updates to improve result displays
cgpotts Feb 26, 2020
e86758b
Update `conda --version`
insop Feb 26, 2020
484e402
Merge pull request #46 from insop/insop/typo
cgpotts Feb 26, 2020
04dde53
revert to an assertion value of 0.57 in the function test_run_ppmi_ls…
arun-ghontale Feb 28, 2020
125f118
Merge pull request #1 from arun-ghontale/fix_assertion_value
arun-ghontale Feb 28, 2020
53d5547
Merge pull request #47 from arun-ghontale/fix_assertion_value
cgpotts Feb 28, 2020
78f834e
add pass statements to the last two if blocks
arun-ghontale Feb 29, 2020
58cf9e1
add pass statements to the last two if conditions
arun-ghontale Feb 29, 2020
ebc097f
Merge pull request #48 from arun-ghontale/add_pass_statements
cgpotts Feb 29, 2020
44f5e00
Fixes a docstring bug (h/t Jennifer from XCS224u)
cgpotts Feb 29, 2020
8461455
Remove a misleading (unused) attribute
cgpotts Feb 29, 2020
e277256
Initial colors module
cgpotts Feb 29, 2020
c3a54f6
Update broken link
insop Mar 1, 2020
e296584
Merge pull request #49 from insop/insop/update_wordsim_link
cgpotts Mar 1, 2020
940674a
Restores mysteriously missing distfunc arg for full_word_similarity_e…
cgpotts Mar 1, 2020
34622a2
Fix get_reader_pair_overlap sort bug (h/t @vuqpham)
cgpotts Mar 1, 2020
afe371a
Merge pull request #2 from cgpotts/master
e-budur Mar 6, 2020
ba8d5df
fixed header parameters in the readers of the SimVerb datasets.
e-budur Mar 6, 2020
4543f5f
Fix `convert_tag` minor check
insop Mar 7, 2020
631976f
Merge pull request #52 from insop/insop/minor_typo
cgpotts Mar 7, 2020
2150d08
Featurizing and fine-tuning with Hugging Face (BERT) and AllenNLP (ELMo)
cgpotts Mar 11, 2020
bfe49d3
Github-friendly intradocument hyperlinks
cgpotts Mar 13, 2020
3db119f
Fix incorrect year in the version string
cgpotts Mar 13, 2020
455fda5
Merge pull request #51 from e-budur/fixing-header-parameters-in-reade…
cgpotts Mar 15, 2020
fd0c75f
Control of sampling_rate for experiment train and test
cgpotts Mar 17, 2020
cc45c2e
Edits to text of rel_ext_01_task.ipynb
wcmac Mar 18, 2020
2e47ecd
Merge pull request #53 from cgpotts/2020-03-rel_ext_text-edits
cgpotts Mar 18, 2020
015a83b
Update np_rnn_classifier.py
wcmac Mar 19, 2020
2538c6a
Remove outdated NLI diagram
cgpotts Mar 21, 2020
4ef26f7
Updates to support the new Adversarial NLI dataset
cgpotts Mar 21, 2020
3db0d48
Minor edits to rel_ext notebooks
wcmac Mar 23, 2020
3bdb224
Merge pull request #54 from cgpotts/20200322-minor-edits
cgpotts Mar 23, 2020
7a692a7
Minor typo fix in hw_rel_ext.ipynb
wcmac Mar 24, 2020
f09d481
Improvements to glove2dict, get_vocab, create_pretrained_embedding
cgpotts Apr 1, 2020
1f89324
Proper handling of the optimizer for warm starts
cgpotts Apr 1, 2020
53b6218
Merge pull request #55 from cgpotts/elmo-bert
cgpotts Apr 1, 2020
148970d
Correct outdated data distribution links
cgpotts Apr 5, 2020
8a0e0df
Minor typo fix
wcmac Apr 9, 2020
2c377b9
Merge pull request #56 from cgpotts/typo-fix
wcmac Apr 9, 2020
6c6d0c2
Aligning weight update step with notebook description
cgpotts Apr 20, 2020
b400df1
Fix typo in nli_01_task_and_data.ipynb
wcmac Apr 27, 2020
37c3bbf
Error msg correction regarding vectorize=False limitations
cgpotts Apr 27, 2020
b08167e
Merge branch 'master' into 20200426-typo-fix
wcmac Apr 27, 2020
4415cfb
Merge pull request #57 from cgpotts/20200426-typo-fix
cgpotts Apr 27, 2020
36fe6ba
Typo fix in nli_02_models.ipynb
wcmac Apr 27, 2020
98fff80
Merge pull request #58 from cgpotts/20200427-typo-fix
wcmac Apr 27, 2020
7f01ab4
Fixing a typo in nli_02_models.ipynb
wcmac Apr 27, 2020
b0321d4
Merge pull request #59 from cgpotts/20200427-typo-fix-2
wcmac Apr 27, 2020
1 change: 1 addition & 0 deletions .gitignore
@@ -57,6 +57,7 @@ target/


# Data
data/*
trees/*
glove.6B/*
vsmdata/*
541 changes: 201 additions & 340 deletions LICENSE

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion README.md
@@ -1,6 +1,6 @@
# CS224u: Natural Language Understanding

-Code for [the Stanford course](http://web.stanford.edu/class/cs224u/)
+Code for [the Stanford course](http://web.stanford.edu/class/cs224u/). The code is written to run under Python 3.7; [setup.ipynb](setup.ipynb) provides additional details.

# Instructors

241 changes: 241 additions & 0 deletions colors.py
@@ -0,0 +1,241 @@
from collections import defaultdict
import colorsys
import csv
import matplotlib.pyplot as plt
import matplotlib.patches as mpatch

__author__ = "Christopher Potts"
__version__ = "CS224u, Stanford, Spring 2020"


TURN_BOUNDARY = " ### "


class ColorsCorpusReader:
    """Basic interface for the Stanford Colors in Context corpus:

    https://cocolab.stanford.edu/datasets/colors.html

    Parameters
    ----------
    src_filename : str
        Full path to the corpus file.
    word_count : int or None
        If int, then only examples with `word_count` words in their
        'contents' field are included (as estimated by the number of
        whitespace tokens). If None, then all examples are returned.
    normalize_colors : bool
        The colors in the corpus are in HLS format with values
        [0, 360], [0, 100], [0, 100]. If `normalize_colors=True`,
        these are scaled into [0, 1], [0, 1], [0, 1].

    Usage
    -----
    corpus = ColorsCorpusReader('filteredCorpus.csv')

    for ex in corpus.read():
        # ...

    """
    def __init__(self, src_filename, word_count=None, normalize_colors=True):
        self.src_filename = src_filename
        self.word_count = word_count
        self.normalize_colors = normalize_colors

    def read(self):
        """The main interface to the corpus.

        As in the paper, turns taken in the same game and round are
        grouped together into a single `ColorsCorpusExample` instance
        with the turn texts separated by `TURN_BOUNDARY`, formatted
        as a string.

        Yields
        ------
        `ColorsCorpusExample` with the `normalize_colors` attribute set
        as in `self.normalize_colors` in this class.

        """
        grouped = defaultdict(list)
        with open(self.src_filename) as f:
            reader = csv.DictReader(f)
            for row in reader:
                if row['role'] == 'speaker' and self._word_count_filter(row):
                    grouped[(row['gameid'], row['roundNum'])].append(row)
        for rows in grouped.values():
            yield ColorsCorpusExample(
                rows, normalize_colors=self.normalize_colors)

    def _word_count_filter(self, row):
        return self.word_count is None or \
            row['contents'].count(" ") == (self.word_count-1)


class ColorsCorpusExample:
    """Interface to individual examples in the Stanford Colors in
    Context corpus.

    Parameters
    ----------
    rows : list of dict
        This contains all of the turns associated with a given game
        and round. The assumption is that all of the key-value pairs
        in these dicts are the same except for the keys describing
        the message itself, such as 'contents'.
    normalize_colors : bool
        The colors in the corpus are in HLS format with values
        [0, 360], [0, 100], [0, 100]. If `normalize_colors=True`,
        these are scaled into [0, 1], [0, 1], [0, 1].

    Usage
    -----
    We assume that these instances are created by `ColorsCorpusReader`.
    For an example of one being created directly, see
    `test/test_colors.py::test_color_corpus_example`.

    Note
    ----
    There are values in the corpus that are present in `rows` but
    not captured in attributes right now, to keep this code from
    growing very complex. It should be straightforward to bring
    in these additional attributes by subclassing this class.

    """
    def __init__(self, rows, normalize_colors=True):
        self.normalize_colors = normalize_colors
        self.contents = TURN_BOUNDARY.join([r['contents'] for r in rows])
        # Make sure our assumptions about these rows are correct:
        self._check_row_alignment(rows)
        row = rows[0]
        self.gameid = row['gameid']
        self.roundNum = int(row['roundNum'])
        self.condition = row['condition']
        self.outcome = row['outcome'] == 'true'
        self.clickStatus = row['clickStatus']
        self.color_data = []
        for typ in ['click', 'alt1', 'alt2']:
            self.color_data.append({
                'type': typ,
                'Status': row['{}Status'.format(typ)],
                'rep': self._get_color_rep(row, typ),
                'speaker': int(row['{}LocS'.format(typ)]),
                'listener': int(row['{}LocL'.format(typ)])})
        self.colors = self._get_reps_in_order('Status')
        self.listener_context = self._get_reps_in_order('listener')
        self.speaker_context = self._get_reps_in_order('speaker')

    def parse_turns(self):
        """Turns the `contents` string into a list by splitting on
        `TURN_BOUNDARY`.

        Returns
        -------
        list of str

        """
        return self.contents.split(TURN_BOUNDARY)

    def display(self, typ='model'):
        """Prints the example to the screen in an intuitive format: the
        utterance text appears first, followed by the three color
        patches, with the target identified by a black border in the
        'speaker' and 'model' variants.

        Parameters
        ----------
        typ : str
            Should be 'model', 'speaker', or 'listener'. This
            determines the order in which the color patches are given.
            For 'speaker' and 'listener', this is the order in the corpus.
            For 'model', it is a version with the two distractors
            printed in their canonical order and the target given last.

        Raises
        ------
        ValueError
            If `typ` isn't one of 'model', 'speaker', 'listener'.

        Prints
        ------
        text to standard output and three color patches as a
        `matplotlib.pyplot` image. For notebook usage, this should
        all embed nicely.

        """
        print(self.contents)
        if typ == 'model':
            colors = self.colors
            target_index = 2
        elif typ == 'listener':
            colors = self.listener_context
            target_index = None
        elif typ == 'speaker':
            colors = self.speaker_context
            target_index = self._get_target_index('speaker')
        else:
            raise ValueError('`typ` options: "model", "listener", "speaker"')

        rgbs = [self._convert_hls_to_rgb(*c) for c in colors]

        fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(3, 1))

        for i, c in enumerate(rgbs):
            ec = c if (i != target_index or typ == 'listener') else "black"
            patch = mpatch.Rectangle((0, 0), 1, 1, color=c, ec=ec, lw=8)
            axes[i].add_patch(patch)
            axes[i].axis('off')

    def _get_color_rep(self, row, typ):
        rep = []
        for dim in ['H', 'L', 'S']:
            colname = "{}Col{}".format(typ, dim)
            rep.append(float(row[colname]))
        if self.normalize_colors:
            rep = self._scale_color(*rep)
        return rep

    def _convert_hls_to_rgb(self, h, l, s):
        if not self.normalize_colors:
            h, l, s = self._scale_color(h, l, s)
        return colorsys.hls_to_rgb(h, l, s)

    @staticmethod
    def _scale_color(h, l, s):
        return [h/360, l/100, s/100]

    def _get_reps_in_order(self, field):
        colors = [(d[field], d['rep']) for d in self.color_data]
        return [rep for s, rep in sorted(colors)]

    def _get_target_index(self, field):
        for d in self.color_data:
            if d['Status'] == 'target':
                return d[field] - 1

    @staticmethod
    def _check_row_alignment(rows):
        """We expect all the dicts in `rows` to have the same
        keys and values except for the keys associated with the
        messages. This function tests that this assumption holds.

        """
        keys = set(rows[0].keys())
        for row in rows[1:]:
            if set(row.keys()) != keys:
                raise RuntimeError(
                    "The dicts in the `rows` argument to `ColorsCorpusExample` "
                    "must have all the same keys.")
        exempted = {'contents', 'msgTime',
                    'numRawWords', 'numRawChars',
                    'numCleanWords', 'numCleanChars'}
        keys = keys - exempted
        for row in rows[1:]:
            for key in keys:
                if rows[0][key] != row[key]:
                    raise RuntimeError(
                        "The dicts in the `rows` argument to `ColorsCorpusExample` "
                        "must have all the same key values except for the keys "
                        "associated with the message. The key {} has values {} "
                        "and {}".format(key, rows[0][key], row[key]))

    def __str__(self):
        return self.contents
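In `colors.py`, each color patch is stored as an HLS triple. When `normalize_colors=True`, hue is divided by 360 and lightness and saturation by 100, and the normalized triple is what `colorsys.hls_to_rgb` needs for display. The following is a minimal standalone sketch of that conversion, assuming raw corpus values in the ranges stated in the docstrings. `scale_hls` and `corpus_hls_to_rgb` are hypothetical helper names used only for illustration; they are not part of `colors.py`.

```python
import colorsys


def scale_hls(h, l, s):
    """Scale a raw corpus HLS triple (hue in [0, 360], lightness and
    saturation in [0, 100]) into the [0, 1] ranges that `colorsys`
    expects. Mirrors `ColorsCorpusExample._scale_color`."""
    return [h / 360, l / 100, s / 100]


def corpus_hls_to_rgb(h, l, s, normalized=False):
    """Convert an HLS triple to an RGB triple with values in [0, 1].

    If `normalized` is False, the triple is assumed to be in raw corpus
    units and is scaled first, mirroring the branch in
    `ColorsCorpusExample._convert_hls_to_rgb`.
    """
    if not normalized:
        h, l, s = scale_hls(h, l, s)
    return colorsys.hls_to_rgb(h, l, s)


if __name__ == "__main__":
    # Pure red in raw corpus units: hue 0, lightness 50, saturation 100.
    print(corpus_hls_to_rgb(0, 50, 100))
    # The same color, already normalized as with normalize_colors=True.
    print(corpus_hls_to_rgb(*scale_hls(0, 50, 100), normalized=True))
```

Both calls go through the same arithmetic, so whether a triple was normalized at read time only affects *where* the scaling happens, not the resulting RGB values.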
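`ColorsCorpusReader.read` keeps only rows whose `role` is `'speaker'` and optionally filters on an approximate word count (number of spaces plus one). It then groups the surviving rows by `(gameid, roundNum)`, and `ColorsCorpusExample` joins each group's `contents` with `TURN_BOUNDARY`. The sketch below reproduces that grouping on a tiny in-memory CSV so it runs without the corpus file. `group_speaker_turns` is a hypothetical helper. The CSV has only the columns the grouping step reads, and its rows are made-up toy data, not corpus examples.

```python
import csv
import io
from collections import defaultdict

TURN_BOUNDARY = " ### "  # same separator as in colors.py


def group_speaker_turns(csv_text, word_count=None):
    """Group speaker turns by (gameid, roundNum), following the logic of
    `ColorsCorpusReader.read`, and return a dict mapping each key to the
    turns joined with TURN_BOUNDARY."""
    grouped = defaultdict(list)
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Word count is approximated as (number of spaces + 1).
        keep_length = (word_count is None or
                       row['contents'].count(" ") == word_count - 1)
        if row['role'] == 'speaker' and keep_length:
            grouped[(row['gameid'], row['roundNum'])].append(row)
    return {key: TURN_BOUNDARY.join(r['contents'] for r in rows)
            for key, rows in grouped.items()}


# Toy rows for illustration only (not taken from the corpus).
TOY_CSV = """gameid,roundNum,role,contents
g1,1,speaker,the blue one
g1,1,listener,which one?
g1,1,speaker,darker blue
g1,2,speaker,green
"""

if __name__ == "__main__":
    for key, contents in group_speaker_turns(TOY_CSV).items():
        print(key, repr(contents))
```

Returning plain strings instead of `ColorsCorpusExample` objects keeps the sketch independent of the color columns, which the real class requires. Note that the listener turn is dropped before grouping, and that `word_count` filters individual turns, not whole rounds.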