Refactor deepdive-initdb
feiranwang committed Sep 20, 2015
1 parent 05852e4 commit 2bec633
Showing 8 changed files with 117 additions and 57 deletions.
14 changes: 10 additions & 4 deletions doc/doc/advanced/deepdiveapp.md
@@ -65,15 +65,21 @@ deepdive help run
### Initializing Database

```bash
deepdive initdb
deepdive initdb [TABLE]
```

This command initializes the underlying database configured for the application by creating necessary tables and loading the initial data into them.
It makes sure the following:
If `TABLE` is not given, it ensures the following:

1. The configured database is created.
2. The tables defined in `schema.sql` are created.
3. The data that exist under `input/` are loaded into the tables with the help of `load.sh`.
2. The tables defined in `schema.sql` (for a DeepDive application) or `app.ddlog` (for a DDlog application) are created.
3. The data that exists under `input/` is loaded into the tables with the help of `init.sh`.

If `TABLE` is given (supported only for DDlog applications), it ensures the following:

1. The configured database is created.
2. The given table is created.
3. The data that exists under `input/` is loaded into `TABLE` with the help of `init_TABLE.sh`.


### Running Pipelines
57 changes: 32 additions & 25 deletions shell/deepdive-initdb
@@ -1,6 +1,10 @@
#!/usr/bin/env bash
# deepdive-initdb -- Initializes the underlying database for the DeepDive application
# > deepdive initdb
# Initializes the whole database.
#
# > deepdive initdb TABLE
# Initializes the given table.
##
set -eu

@@ -14,30 +18,33 @@ cd "$DEEPDIVE_APP"
# make sure database is created based on the database type
db-init "$@"

# make sure the necessary tables are all created
if [[ -e app.ddlog ]]; then
if [[ $# -gt 0 ]]; then
tmp=$(mktemp -d "${TMPDIR:-/tmp}"/deepdive-initdb.XXXXXXX)
trap 'rm -rf "$tmp"' EXIT
schema_json="$tmp"/schema.json
ddlog export-schema app.ddlog > "$schema_json"
for t in "$@"; do
deepdive-sql "DROP TABLE IF EXISTS $t CASCADE"
ddlog_initdb $schema_json $t | deepdive-sql
done
else
# TODO export schema.sql from ddlog instead of running initdb pipeline
deepdive-run initdb
generate_schema_json() {
    tmp=$(mktemp -d "${TMPDIR:-/tmp}"/deepdive-initdb.XXXXXXX)
    trap 'rm -rf "$tmp"' EXIT
    schema_json="$tmp"/schema.json
    ddlog export-schema app.ddlog > "$schema_json"
}

# if a list of table names is given, initialize the corresponding tables
if [[ $# -gt 0 ]]; then
    [[ -e app.ddlog ]] || error "deepdive initdb TABLE is only available for ddlog applications"
    generate_schema_json
    for t in "$@"; do
        schema_json_to_sql "$schema_json" "$t" | deepdive-sql
        if [[ -x "input/init_$t.sh" ]]; then
            "input/init_$t.sh"
        fi
    done
else # no arguments given, init database
    if [[ -e app.ddlog ]]; then
        generate_schema_json
        schema_json_to_sql "$schema_json" | deepdive-sql
    elif [[ -e schema.sql ]]; then
        db-prompt <schema.sql
    fi
    ! [[ -x input/init.sh ]] || {
        # XXX set the legacy environment variables
        export APP_HOME=$DEEPDIVE_APP
        input/init.sh "$@"
    }
fi
# run all DDL statements in schema.sql if available
if [[ -e schema.sql ]]; then
db-prompt <schema.sql
fi

# load the input data
! [[ -x input/init.sh && $# -eq 0 ]] || {
# XXX set the legacy environment variables
export APP_HOME=$DEEPDIVE_APP
input/init.sh "$@"
}
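
The control flow of the new `deepdive-initdb` can be summarized with a small model. The following is a minimal sketch in Python, not part of the commit: the function `plan_initdb` and its parameters are hypothetical names introduced for illustration, and the `db-init` call that precedes these steps is left out. It only lists which steps the script would take for a given set of arguments and files.

```python
def plan_initdb(tables, has_ddlog, has_schema_sql, executable):
    """Return the ordered steps deepdive-initdb would take (illustrative model).

    tables         -- table names passed on the command line (may be empty)
    has_ddlog      -- whether app.ddlog exists in the application directory
    has_schema_sql -- whether schema.sql exists in the application directory
    executable     -- set of paths that exist and are executable
    """
    steps = []
    if tables:
        # Per-table initialization is rejected for non-DDlog applications.
        if not has_ddlog:
            raise ValueError(
                "deepdive initdb TABLE is only available for ddlog applications")
        for t in tables:
            steps.append("create table %s from app.ddlog" % t)
            script = "input/init_%s.sh" % t
            if script in executable:
                steps.append("run " + script)
    else:
        # Whole-database initialization: app.ddlog takes precedence over schema.sql.
        if has_ddlog:
            steps.append("create all tables from app.ddlog")
        elif has_schema_sql:
            steps.append("run schema.sql")
        if "input/init.sh" in executable:
            steps.append("run input/init.sh")
    return steps


if __name__ == "__main__":
    print(plan_initdb(["articles"], True, False, {"input/init_articles.sh"}))
    print(plan_initdb([], False, True, {"input/init.sh"}))
```

Note that in this model, as in the script, `input/init.sh` is never run when table names are given; only the matching `input/init_TABLE.sh` scripts are.
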
1 change: 1 addition & 0 deletions shell/driver.mysql/schema_json_to_sql
12 changes: 8 additions & 4 deletions shell/driver.postgresql/db-init
@@ -5,7 +5,11 @@
##
set -eu

{
dropdb $DBNAME || true
createdb $DBNAME
} >/dev/null
if [[ $# -gt 0 ]]; then
    # table names given: keep an existing database, create it only if missing
    createdb "$DBNAME" >/dev/null || true
else
    {
        dropdb "$DBNAME" || true
        createdb "$DBNAME"
    } >/dev/null
fi
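
The practical difference between the two branches is that whole-database initialization is destructive while per-table initialization is not. A minimal sketch of that distinction, assuming a hypothetical helper `db_init_commands` (not part of the commit) that returns each command together with a flag saying whether its failure is tolerated:

```python
def db_init_commands(dbname, tables):
    """Model the PostgreSQL db-init branches as (argv, failure_tolerated) pairs."""
    if tables:
        # Table names given: leave an existing database alone and ignore the
        # createdb error that occurs when the database already exists.
        return [(["createdb", dbname], True)]
    # No table names: drop the database (ignoring failure if it is absent),
    # then recreate it; a createdb failure here aborts the script.
    return [(["dropdb", dbname], True), (["createdb", dbname], False)]


if __name__ == "__main__":
    for argv, tolerated in db_init_commands("deepdive_spouse", []):
        print(" ".join(argv), "(failure tolerated)" if tolerated else "")
```

The database name `deepdive_spouse` is only a placeholder for whatever `$DBNAME` resolves to.
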
37 changes: 37 additions & 0 deletions shell/driver.postgresql/schema_json_to_sql
@@ -0,0 +1,37 @@
#! /usr/bin/env python
# Generate CREATE TABLE statements from a ddlog-exported schema,
# for the given table or, if no table name is given, for all tables.
# Usage: schema_json_to_sql SCHEMA.JSON [TABLE_NAME]

import json, sys

def generate_create_table_sql(schema, table):
    columns_json = schema["relations"][table]["columns"]
    # variable relation
    if "variable_type" in schema["relations"][table]:
        columns = range(len(columns_json) + 2)
        columns[-2] = "id bigint"
        label_type = "boolean" if schema["relations"][table]["variable_type"] == "boolean" else "int"
        columns[-1] = "label " + label_type
    else:
        columns = range(len(columns_json))
    for k, v in columns_json.iteritems():
        columns[v["index"]] = "%s %s" %(k, v["type"])
    return "DROP TABLE IF EXISTS %s CASCADE; CREATE TABLE %s(%s);" %(table, table, ", ".join(columns))

def main():
    # load schema.json
    with open(sys.argv[1]) as schema_file:
        schema = json.load(schema_file)
    # initialize all tables
    if len(sys.argv) <= 2:
        print ' '.join([generate_create_table_sql(schema, table) for table in schema["relations"].keys()])
    else:
        table = sys.argv[2]
        # the given table is not in the schema, do nothing
        if table not in schema["relations"]:
            print ""
        else:
            print generate_create_table_sql(schema, table)

if __name__ == "__main__":
    main()
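
The committed script targets Python 2 (`print` statements, `iteritems`, item assignment into the result of `range`). To illustrate how it orders columns by their `index` and appends `id` and `label` for variable relations, here is a minimal Python 3 sketch of the same logic. It is not the committed code: the function name `create_table_sql` is hypothetical, and the shape of the schema dictionary is inferred from the keys the script reads (`relations`, `columns`, `index`, `type`, `variable_type`) rather than from any ddlog documentation.

```python
def create_table_sql(schema, table):
    """Build the DROP/CREATE statement for one relation (Python 3 sketch)."""
    relation = schema["relations"][table]
    # Place each column at the position given by its "index" field.
    columns = [None] * len(relation["columns"])
    for name, spec in relation["columns"].items():
        columns[spec["index"]] = "%s %s" % (name, spec["type"])
    # Variable relations get two extra trailing columns: id and label.
    if "variable_type" in relation:
        label_type = "boolean" if relation["variable_type"] == "boolean" else "int"
        columns += ["id bigint", "label " + label_type]
    return "DROP TABLE IF EXISTS %s CASCADE; CREATE TABLE %s(%s);" % (
        table, table, ", ".join(columns))


# Hand-written schema fragment (assumed shape) mirroring two tables of the
# spouse example used in the tests.
schema = {
    "relations": {
        "articles": {
            "columns": {
                "text": {"index": 1, "type": "text"},
                "article_id": {"index": 0, "type": "text"},
            }
        },
        "has_spouse": {
            "columns": {"relation_id": {"index": 0, "type": "text"}},
            "variable_type": "boolean",
        },
    }
}

if __name__ == "__main__":
    print(create_table_sql(schema, "articles"))
    print(create_table_sql(schema, "has_spouse"))
```

Because each column carries an explicit `index`, the generated column order does not depend on the order of keys in the JSON object.
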
1 change: 0 additions & 1 deletion stage.sh
@@ -60,7 +60,6 @@ stage util/active.sh util/
stage util/calibration.py util/
stage util/calibration.plg util/
stage util/pgtsv_to_json util/
stage util/ddlog_initdb util/

# DDlog compiler
stage util/ddlog bin/
29 changes: 29 additions & 0 deletions test/postgresql/deepdive_initdb.bats
100644 → 100755
@@ -7,6 +7,35 @@ setup() {
cd "$BATS_TEST_DIRNAME"/spouse_example
}

@test "$DBVARIANT schema_json_to_sql a single table" {
cd ddlog || skip
tmp=$(mktemp -d "${TMPDIR:-/tmp}"/deepdive-initdb.XXXXXXX)
schema_json="$tmp"/schema.json
ddlog export-schema app.ddlog > "$schema_json"
expected='DROP TABLE IF EXISTS articles CASCADE; CREATE TABLE articles(article_id text, text text);'
[[ $(schema_json_to_sql $schema_json articles) = "$expected" ]]
}

@test "$DBVARIANT schema_json_to_sql without arguments" {
cd ddlog || skip
tmp=$(mktemp -d "${TMPDIR:-/tmp}"/deepdive-initdb.XXXXXXX)
schema_json="$tmp"/schema.json
ddlog export-schema app.ddlog > "$schema_json"
expected='DROP TABLE IF EXISTS articles CASCADE;'
expected+=' CREATE TABLE articles(article_id text, text text);'
expected+=' DROP TABLE IF EXISTS people_mentions CASCADE;'
expected+=' CREATE TABLE people_mentions(sentence_id text, start_position int, length int, text text, mention_id text);'
expected+=' DROP TABLE IF EXISTS has_spouse_features CASCADE;'
expected+=' CREATE TABLE has_spouse_features(relation_id text, feature text);'
expected+=' DROP TABLE IF EXISTS has_spouse CASCADE;'
expected+=' CREATE TABLE has_spouse(relation_id text, id bigint, label boolean);'
expected+=' DROP TABLE IF EXISTS has_spouse_candidates CASCADE;'
expected+=' CREATE TABLE has_spouse_candidates(person1_id text, person2_id text, sentence_id text, description text, relation_id text, is_true boolean);'
expected+=' DROP TABLE IF EXISTS sentences CASCADE;'
expected+=' CREATE TABLE sentences(document_id text, sentence text, words text[], lemma text[], pos_tags text[], dependencies text[], ner_tags text[], sentence_offset int, sentence_id text);'
[[ $(schema_json_to_sql $schema_json) = "$expected" ]]
}

@test "$DBVARIANT initdb from ddlog" {
cd ddlog || skip
deepdive initdb articles
23 changes: 0 additions & 23 deletions util/ddlog_initdb

This file was deleted.
