Commit 1362565

Daniel Khashabi authored
Merge pull request #688 from mssammon/master
update core-utilities readme with configurator info
2 parents 738c990 + ce376c6 commit 1362565

6 files changed: 158 additions and 32 deletions

commasrl/pom.xml

Lines changed: 1 addition & 13 deletions
@@ -20,7 +20,7 @@
         <pluginRepository>
             <id>CogcompSoftware</id>
             <name>CogcompSoftware</name>
-            <url>http://cogcomp.cs.illinois.edu/m2repo/</url>
+            <url>http://macniece.seas.upenn.edu/m2repo/</url>
         </pluginRepository>
     </pluginRepositories>

@@ -164,16 +164,4 @@
             </plugin>
         </plugins>
     </reporting>
-
-    <distributionManagement>
-        <repository>
-            <id>CogcompSoftware</id>
-            <name>CogcompSoftware</name>
-            <url>scp://legolas.cs.illinois.edu:/srv/data/cogcomp/html/m2repo</url>
-        </repository>
-        <site>
-            <id>CogcompSoftwareDoc</id>
-            <url>scp://legolas.cs.illinois.edu:/srv/data/cogcomp/html/software/doc/${project.artifactId}</url>
-        </site>
-    </distributionManagement>
 </project>

core-utilities/README.md

Lines changed: 58 additions & 0 deletions
@@ -401,6 +401,64 @@ to specify only non-default configuration options when you instantiate
 one or more classes that use `ResourceManager` and `Configurator`
 to manage configuration options.

+A configurator must inherit from `edu.illinois.cs.cogcomp.core.utilities.configuration.Configurator`
+and override the `getDefaultConfig()` method. Declare each configuration parameter as a public static
+`edu.illinois.cs.cogcomp.core.utilities.configuration.Property` field in the Configurator subclass you
+create. These fields define the configuration parameter keys and default values, and make the key names
+available to clients.
+
+The `getDefaultConfig()` method should simply populate a `ResourceManager` object with all default values --
+there are convenience methods in `Configurator` to make this easier.
+
+When implementing your Annotator's simplest constructor, instantiate your configurator subclass and
+call its `getDefaultConfig()` method. Implement a second constructor that takes a `ResourceManager`
+as its argument; in it, instantiate your configurator and call its `getConfig(ResourceManager)` method.
+
+Here's the constructor code from ChunkerAnnotator:
+```
+public ChunkerAnnotator(boolean lazilyInitialize) {
+    this(lazilyInitialize, new ChunkerConfigurator().getDefaultConfig());
+}
+
+public ChunkerAnnotator(boolean lazilyInitialize, ResourceManager rm) {
+    super(ViewNames.SHALLOW_PARSE, new String[] {ViewNames.POS}, lazilyInitialize, new ChunkerConfigurator().getConfig(rm));
+}
+```
+
+This lets you specify only the configuration parameters you wish to change from their default values, using the keys specified
+in the relevant Configurator's Property fields. You can
+either directly populate a Java `Properties` object with these key/value pairs, then instantiate a `ResourceManager` and call
+your Configurator's `getConfig()` method with that; or you can write these non-default values into a text file with each
+key/value pair on a new line, with the key and value separated by a tab character.
+
+For the Chunker example, suppose you want to change the model path. In `ChunkerConfigurator` this is specified as:
+```
+public static final Property MODEL_PATH = new Property("modelPath", MODEL_DIR_PATH.value
+        + MODEL_NAME.value + ".lc");
+```
+
+To directly populate a Properties object:
+```
+import edu.illinois.cs.cogcomp.core.resources.ResourceManager;
+import java.util.Properties;
+import edu.illinois.cs.cogcomp.chunker.main.ChunkerAnnotator;
+
+Properties props = new Properties();
+props.setProperty(ChunkerConfigurator.MODEL_PATH.key, "/some/other/path");
+ChunkerAnnotator ca = new ChunkerAnnotator(new ResourceManager(props));
+```
+
+To use a text file instead, create a text file (for this example, "config/altChunker.config") with the single line:
+```
+modelPath	/some/other/path
+```
+
+...and use it to instantiate a ResourceManager object:
+
+```
+ChunkerAnnotator ca = new ChunkerAnnotator(new ResourceManager("config/altChunker.config"));
+```
+
 ### Serialization and Deserialization

 To store `TextAnnotation` objects on disk, serialization/deserialization is supported in the following formats:
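
The README addition describes a pattern: defaults are declared as `Property` fields, and `getConfig()` merges caller-supplied overrides with those defaults. A minimal sketch of that merge, assuming nothing about the real cogcomp-nlp classes, might look like the following. `ConfigSketch`, its nested `Property`, `defaultConfig`, `mergeWithDefaults`, and `parseTabSeparated` are hypothetical stand-ins built on `java.util.Properties`, and the `beamSize` key and both default values are invented for illustration.

```java
import java.util.Properties;

public class ConfigSketch {
    /** Hypothetical stand-in for a key/default-value pair (not the cogcomp Property class). */
    static final class Property {
        final String key;
        final String value;

        Property(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Illustrative defaults; the values and the beamSize key are made up for this sketch.
    static final Property MODEL_PATH = new Property("modelPath", "/default/model.lc");
    static final Property BEAM_SIZE = new Property("beamSize", "5");

    /** Plays the role of getDefaultConfig(): every key gets its default value. */
    static Properties defaultConfig() {
        Properties defaults = new Properties();
        defaults.setProperty(MODEL_PATH.key, MODEL_PATH.value);
        defaults.setProperty(BEAM_SIZE.key, BEAM_SIZE.value);
        return defaults;
    }

    /** Plays the role of getConfig(rm): overrides win, untouched keys keep their defaults. */
    static Properties mergeWithDefaults(Properties overrides) {
        Properties merged = defaultConfig();
        for (String key : overrides.stringPropertyNames()) {
            merged.setProperty(key, overrides.getProperty(key));
        }
        return merged;
    }

    /** Parses "key<TAB>value" lines, one pair per line, skipping blank lines. */
    static Properties parseTabSeparated(String text) {
        Properties props = new Properties();
        for (String line : text.split("\n")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] parts = line.split("\t", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected key<TAB>value: " + line);
            }
            props.setProperty(parts[0].trim(), parts[1].trim());
        }
        return props;
    }

    public static void main(String[] args) {
        Properties merged = mergeWithDefaults(parseTabSeparated("modelPath\t/some/other/path\n"));
        System.out.println(merged.getProperty("modelPath"));
        System.out.println(merged.getProperty("beamSize"));
    }
}
```

Only the overridden key changes; every other key keeps its default, which is the behaviour the README text attributes to `getConfig()`.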

core-utilities/src/test/java/edu/illinois/cs/cogcomp/utilities/JsonSerializerTest.java

Lines changed: 90 additions & 15 deletions
@@ -18,6 +18,7 @@
 import edu.illinois.cs.cogcomp.core.datastructures.textannotation.Relation;
 import edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation;
 import edu.illinois.cs.cogcomp.core.datastructures.textannotation.View;
+import edu.illinois.cs.cogcomp.core.io.IOUtils;
 import edu.illinois.cs.cogcomp.core.utilities.DummyTextAnnotationGenerator;
 import edu.illinois.cs.cogcomp.core.utilities.JsonSerializer;
 import edu.illinois.cs.cogcomp.core.utilities.SerializationHelper;
@@ -26,20 +27,21 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

+import java.io.IOException;
 import java.util.Arrays;
 import java.util.HashMap;
 import java.util.Map;
 import java.util.TreeMap;

-import static org.junit.Assert.assertEquals;
-import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.*;

 /**
  * Simple sanity tests for JsonSerializer.
  *
  * @author mssammon
  */
 public class JsonSerializerTest {
+    private static final String RHYME_VIEW_NAME = "rhyme";
     private static Logger logger = LoggerFactory.getLogger(JsonSerializerTest.class);

     TextAnnotation ta = DummyTextAnnotationGenerator.generateAnnotatedTextAnnotation(new String[] {
@@ -122,7 +124,19 @@ public static void verifyDeserializedJsonString(String json, TextAnnotation ta)

     @Test
     public void testSerializerWithCharOffsets() {
-        View rhymeView = new View("rhyme", "test", ta, 0.4 );
+
+        addRhymeViewToTa(ta);
+
+        String taJson = SerializationHelper.serializeToJson(ta, true);
+        logger.info(taJson);
+
+        JsonObject jobj = (JsonObject) new JsonParser().parse(taJson);
+        JsonSerializerTest.verifySerializedJSONObject(jobj, ta);
+    }
+
+
+    private static void addRhymeViewToTa(TextAnnotation someTa) {
+        View rhymeView = new View(RHYME_VIEW_NAME, "test", someTa, 0.4);

         Map< String, Double > newLabelsToScores = new TreeMap< String, Double >();
         String[] labels = { "eeny", "meeny", "miny", "mo" };
@@ -131,32 +145,26 @@ public void testSerializerWithCharOffsets() {
         for ( int i = 0; i < labels.length; ++i )
             newLabelsToScores.put(labels[i], scores[i]);

-        Constituent first = new Constituent( newLabelsToScores, "rhyme", ta, 2, 4 );
+        Constituent first = new Constituent(newLabelsToScores, RHYME_VIEW_NAME, someTa, 2, 4);
         rhymeView.addConstituent(first);

         /**
          * no constraint on scores -- don't have to sum to 1.0
          */
         for ( int i = labels.length -1; i > 0; --i )
-            newLabelsToScores.put( labels[i], scores[3-i] );
+            newLabelsToScores.put(labels[i], scores[3-i]);

-        Constituent second = new Constituent( newLabelsToScores, "rhyme", ta, 2, 4 );
+        Constituent second = new Constituent(newLabelsToScores, RHYME_VIEW_NAME, someTa, 2, 4);
         rhymeView.addConstituent(second);

         Map<String, Double> relLabelsToScores = new TreeMap<>();
-        relLabelsToScores.put( "Yes", 0.8 );
-        relLabelsToScores.put( "No", 0.2 );
+        relLabelsToScores.put("Yes", 0.8);
+        relLabelsToScores.put("No", 0.2);

         Relation rel = new Relation( relLabelsToScores, first, second );
         rhymeView.addRelation(rel);

-        ta.addView("rhyme", rhymeView);
-
-        String taJson = SerializationHelper.serializeToJson(ta, true);
-        logger.info(taJson);
-
-        JsonObject jobj = (JsonObject) new JsonParser().parse(taJson);
-        JsonSerializerTest.verifySerializedJSONObject(jobj, ta);
+        someTa.addView(RHYME_VIEW_NAME, rhymeView);
     }

     @Test
@@ -171,4 +179,71 @@ public void testJsonSerializabilityWithOffsets() throws Exception {

         JsonSerializerTest.verifyDeserializedJsonString(json, ta);
     }
+
+    /**
+     * make sure that if an already serialized TextAnnotation object is modified and reserialized,
+     * (and written to the same target file), that the file is updated correctly
+     */
+    @Test
+    public void testJsonSerializedTaUpdate() {
+
+        // make sure we aren't using a TA already updated with "rhyme" view
+        TextAnnotation localTa = DummyTextAnnotationGenerator.generateAnnotatedTextAnnotation(new String[] {
+                ViewNames.POS, ViewNames.NER_CONLL, ViewNames.SRL_VERB}, false, 3); // no noise
+
+        String serTestDir = "serTestDir";
+        if(!IOUtils.exists(serTestDir))
+            IOUtils.mkdir(serTestDir);
+        else if (IOUtils.isFile(serTestDir))
+            throw new IllegalStateException("ERROR: test directory " + serTestDir + " already exists as file.");
+        else
+            try {
+                IOUtils.cleanDir(serTestDir);
+            } catch (IOException e) {
+                e.printStackTrace();
+                throw new IllegalStateException("ERROR: test directory " + serTestDir + " could not be cleaned. Permissions?");
+            }
+        if (!IOUtils.getListOfFilesInDir(serTestDir).isEmpty())
+            throw new IllegalStateException("ERROR: test directory " + serTestDir + " already contains files even after cleaning.");
+
+        String fileName = serTestDir + "/arbitrary.json";
+        boolean forceOverwrite = true;
+        boolean useJson = true;
+        try {
+            SerializationHelper.serializeTextAnnotationToFile(localTa, fileName, forceOverwrite, useJson);
+        } catch (IOException e) {
+            e.printStackTrace();
+            fail("error trying to serialize json file " + fileName + ".");
+        }
+
+        TextAnnotation taDeser = null;
+        try {
+            taDeser = SerializationHelper.deserializeTextAnnotationFromFile(fileName, useJson);
+        } catch (Exception e) {
+            e.printStackTrace();
+            fail("error trying to deserialize json file " + fileName + ".");
+        }
+        assertTrue(taDeser.hasView(ViewNames.SRL_VERB));
+        assertFalse(taDeser.hasView(RHYME_VIEW_NAME));
+        addRhymeViewToTa(taDeser);
+        assertTrue(taDeser.hasView(RHYME_VIEW_NAME));
+
+        try {
+            SerializationHelper.serializeTextAnnotationToFile(taDeser, fileName, forceOverwrite, useJson);
+        } catch (IOException e) {
+            e.printStackTrace();
+            fail("error trying to serialize json file " + fileName + " for second time.");
+        }
+
+        TextAnnotation taDeserDeser = null;
+        try {
+            taDeserDeser = SerializationHelper.deserializeTextAnnotationFromFile(fileName, useJson);
+        } catch (Exception e) {
+            e.printStackTrace();
+            fail("error trying to deserialize json file " + fileName + " for second time.");
+        }
+
+        assertTrue(taDeserDeser.hasView(RHYME_VIEW_NAME));
+        assertTrue(taDeserDeser.getView(RHYME_VIEW_NAME).getConstituents().size() > 0);
+    }
 }
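
The new `testJsonSerializedTaUpdate` test follows a save, load, modify, save, load cycle against a single target file and checks that the second load sees the modification. The same cycle can be sketched with JDK classes only. This is an illustration of the test's shape, not the cogcomp-nlp serialization code: `RoundTripSketch`, `save`, `load`, and `updateAndReload` are hypothetical names, and a plain list of view names written one per line stands in for a serialized `TextAnnotation`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RoundTripSketch {
    /** Writes one view name per line; the default open options truncate an existing file. */
    static void save(List<String> viewNames, Path file) throws IOException {
        Files.write(file, viewNames, StandardCharsets.UTF_8);
    }

    /** Reads the view names back into a mutable list. */
    static List<String> load(Path file) throws IOException {
        return new ArrayList<>(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    /** Runs save, load, modify, save, load on one file and returns the final contents. */
    static List<String> updateAndReload(List<String> initial, String extraView) {
        try {
            // A fresh temporary directory replaces the "create or clean serTestDir" step.
            Path dir = Files.createTempDirectory("serTestDir");
            Path file = dir.resolve("arbitrary.txt");
            try {
                save(initial, file);
                List<String> reloaded = load(file);
                reloaded.add(extraView);
                save(reloaded, file); // second write to the same target file
                return load(file);
            } finally {
                Files.deleteIfExists(file);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> result = updateAndReload(Arrays.asList("POS", "NER_CONLL", "SRL_VERB"), "rhyme");
        System.out.println(result);
    }
}
```

Using a temporary directory is a design choice of this sketch: it avoids depending on the state of a fixed `serTestDir` in the working directory, at the cost of not exercising the cleanup logic the real test performs.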

pom.xml

Lines changed: 3 additions & 3 deletions
@@ -53,19 +53,19 @@
         <repository>
             <id>CogcompSoftware</id>
             <name>CogcompSoftware</name>
-            <url>http://cogcomp.org/m2repo/</url>
+            <url>http://macniece.seas.upenn.edu/m2repo/</url>
         </repository>
     </repositories>

     <distributionManagement>
         <repository>
             <id>CogcompSoftware</id>
             <name>CogcompSoftware</name>
-            <url>scp://legolas.cs.illinois.edu:/srv/data/cogcomp/html/m2repo</url>
+            <url>scp://macniece.seas.upenn.edu:/pool0/webserver/htdocs/public/m2repo</url>
         </repository>
         <site>
             <id>CogcompSoftwareDoc</id>
-            <url>scp://legolas.cs.illinois.edu:/srv/data/cogcomp/html/software/doc/</url>
+            <url>scp://macniece.seas.upenn.edu:/pool0/webserver/htdocs/public/software/doc/</url>
         </site>
     </distributionManagement>

pos/scripts/generateParenPosFormat.pl

Lines changed: 3 additions & 0 deletions
@@ -29,6 +29,9 @@
 use strict;
 use Carp;

+
+# wsj-dir must be the path to the 'parsed/' directory of a Penn Treebank
+# distribution.
 croak "Usage: $0 wsj-dir out-dir" unless @ARGV == 2;

 my $wsjDir = shift;

pos/src/main/java/edu/illinois/cs/cogcomp/pos/POSConfigurator.java

Lines changed: 3 additions & 1 deletion
@@ -21,7 +21,9 @@
 public class POSConfigurator extends Configurator {
     /** A configurable prefix. */
     public static final Property CORPUS_PREFIX = new Property("corpusPrefix",
-            "/shared/corpora/corporaWeb/written/eng/POS/");
+            "/shared/experiments/mssammon/workspace-github/cogcomp-nlp/pos/corpus-paren-format/");
+    // "/shared/corpora/corporaWeb/written/eng/POS/");
+
     /** The file containing the training set. */
     public static final Property TRAINING_DATA = new Property("trainingData", CORPUS_PREFIX.value
             + "00-18.br");
