Commit

documents column reordering
pruizf committed Oct 9, 2023
1 parent 2d7a9fd commit 4135ff7
Showing 3 changed files with 109 additions and 110 deletions.
12 changes: 6 additions & 6 deletions README.md
@@ -34,12 +34,12 @@ For more details about the categorization, see [our paper](https://univoak.eu/is
| ---- | ---- | --- |
| 0 | speaker | Character name |
| 1 | gender | Character gender |
-| 2 | author | author name |
-| 3 | date | A date for the play |
-| 4 | date_type | When it was written, first printed, or print date for the edition we used |
-| 5 | social_class | Character social class, we estimated this based on information in the *dramatis personæ* |
-| 6 | job | Character's profession as in the *dramatis personæ* |
-| 7 | job_category | Professional category using our own taxonomy |
+| 2 | social_class | Character social class, we estimated this based on information in the *dramatis personæ* |
+| 3 | job | Character's profession as in the *dramatis personæ* |
+| 4 | job_category | Professional category using our own taxonomy |
+| 5 | author | author name |
+| 6 | date | A date for the play |
+| 7 | date_type | When it was written, first printed, or print date for the edition we used |
| 8 | segment_number | For emotion analysis, the plays get divided into homogeneous segments. This field can be ignored for other purposes. |
| 9 | play_short_name | Corresponds to the play's filename in the TEI directories (without *.xml*) |
| 10 | genre | We have comedy, drama, volksstueck, tale (*Märel*) |
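
This hunk changes column positions: `author`, `date` and `date_type` move from indices 2-4 to 5-7, while `social_class`, `job` and `job_category` move from 5-7 to 2-4. Code that reads the CSV by position would therefore pick up different fields. A minimal sketch of reading by column name instead, assuming the CSV has a header row carrying the names in the table; the sample rows and their values are invented for illustration:

```python
import csv
import io

# Hypothetical one-row samples (invented values), truncated to eight columns.
OLD_ORDER = (
    "speaker,gender,author,date,date_type,social_class,job,job_category\n"
    "Hans,M,Example Author,1850,written,lower,Knecht,servant\n"
)
NEW_ORDER = (
    "speaker,gender,social_class,job,job_category,author,date,date_type\n"
    "Hans,M,lower,Knecht,servant,Example Author,1850,written\n"
)


def read_by_name(text):
    """Parse CSV text into a list of dicts keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text)))


def read_by_position(text):
    """Parse CSV text into a list of lists, skipping the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[1:]


old_named = read_by_name(OLD_ORDER)
new_named = read_by_name(NEW_ORDER)
old_positional = read_by_position(OLD_ORDER)
new_positional = read_by_position(NEW_ORDER)

# Name-based access is unaffected by the reordering.
print(old_named[0]["author"], new_named[0]["author"])

# Index 2 means something different before and after the reordering.
print(old_positional[0][2], new_positional[0][2])
```

Looking fields up by name keeps downstream scripts independent of the column order documented here.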
206 changes: 103 additions & 103 deletions metadata_analysis.ipynb

Large diffs are not rendered by default.

1 change: 0 additions & 1 deletion pre_treatment/script/postprocess_character_speech_df.py
@@ -130,7 +130,6 @@

# some speaker names had trailing whitespace
outdf['speaker'] = outdf.speaker.apply(lambda x:x.strip())
-#outdf['date'] = outdf.date.astype(int)

# write out
outdf.to_csv(outdf_path, index=False)
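
The context lines of this hunk strip whitespace from speaker names, and the one deleted line was a commented-out integer cast of `date`. If a downstream consumer still wants integer dates, one tolerant option is sketched below. It assumes pandas is available; the sample frame and its values are invented for illustration and are not taken from the corpus.

```python
import pandas as pd

# Invented sample rows; the real frame is built elsewhere in the script.
outdf = pd.DataFrame(
    {
        "speaker": ["Hans ", " Marie"],
        "date": ["1850", "unknown"],
    }
)

# Vectorised whitespace stripping; for string values this matches
# apply(lambda x: x.strip()).
outdf["speaker"] = outdf["speaker"].str.strip()

# Coerce unparseable dates to missing values rather than raising,
# then use the nullable integer dtype so missing values are allowed.
outdf["date"] = pd.to_numeric(outdf["date"], errors="coerce").astype("Int64")

print(outdf)
```

A plain `astype(int)` raises on values that are not numeric, which the nullable `Int64` dtype avoids by representing them as missing.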
