Fixes to the ConceptNet build process and some readers #148
Snakefile
Outdated
CORE_DATASET_NAMES += ["emoji/{}".format(lang) for lang in EMOJI_LANGUAGES]

DATASET_NAMES = CORE_DATASET_NAMES + ["dbpedia/dbpedia_en"]
In the code-review-fixes-20171117 branch, I moved this part to after the TESTMODE variables are set, to ensure that in test mode CORE_DATASET_NAMES += ["emoji/{}".format(lang) for lang in EMOJI_LANGUAGES] runs with EMOJI_LANGUAGES equal to ['en', 'en_001'] rather than all languages.
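To illustrate the ordering issue, here is a minimal sketch in plain Python, not the actual Snakefile. The function names and the contents of FULL_EMOJI_LANGUAGES are hypothetical stand-ins. Only the test-mode value ['en', 'en_001'] and the "emoji/{}" name pattern come from the code under review.

```python
# Hypothetical stand-in for the full language list; the real list lives in the Snakefile.
FULL_EMOJI_LANGUAGES = ["en", "en_001", "de", "fr"]


def names_built_before_override(testmode):
    """Build the emoji dataset names first, then apply the test-mode override."""
    emoji_languages = list(FULL_EMOJI_LANGUAGES)
    names = ["emoji/{}".format(lang) for lang in emoji_languages]
    if testmode:
        # Too late: `names` was already computed from the full list.
        emoji_languages = ["en", "en_001"]
    return names


def names_built_after_override(testmode):
    """Apply the test-mode override first, then build the emoji dataset names."""
    emoji_languages = list(FULL_EMOJI_LANGUAGES)
    if testmode:
        emoji_languages = ["en", "en_001"]
    return ["emoji/{}".format(lang) for lang in emoji_languages]
```

With the first ordering, test mode still yields a name for every language, because the list comprehension is evaluated once and does not track later reassignment of the variable it read from.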
Oh whoops, I should have paid more attention to the merge conflict.
Snakefile
Outdated
CORE_DATASET_NAMES += ["conceptnet4/conceptnet4_flat_{}".format(num) for num in range(10)]
CORE_DATASET_NAMES += ["ptt_petgame/part{}".format(num) for num in range(1, 13)]
CORE_DATASET_NAMES += ["wiktionary/{}".format(lang) for lang in WIKTIONARY_LANGUAGES]
CORE_DATASET_NAMES += ["emoji/{}".format(lang) for lang in EMOJI_LANGUAGES]
I think this is still incorrect, because CORE_DATASET_NAMES gets updated with emoji files before EMOJI_LANGUAGES is overwritten in line 107.
output:
    DATA + "/raw/{dirname}/{filename}"
shell:
    "wget -nv {RAW_DATA_URL}/{wildcards.dirname}/{wildcards.filename} -O {output}"
    "unzip {input} raw/{wildcards.dirname}/{wildcards.filename} -d {DATA}"
It looks like every time a single new file is added to the process, one would have to re-download the entire conceptnet-raw-data-5.5.zip package. Would it be possible to unbundle it?
/r/MannerOf, not /r/IsA