Ddlog update #376

Merged: 5 commits, Sep 24, 2015

Changes from 1 commit
Add support for command: deepdive initdb TABLE
feiranwang committed Sep 4, 2015
commit 70e8342c2bafcdf7f69f5f6d448acaa643945c42
17 changes: 14 additions & 3 deletions shell/deepdive-initdb
@@ -16,16 +16,27 @@ db-init "$@"

 # make sure the necessary tables are all created
 if [[ -e app.ddlog ]]; then
-    # TODO export schema.sql from ddlog instead of running initdb pipeline
-    deepdive-run initdb
+    if [[ $# -gt 0 ]]; then
Contributor:
Rather than having these argument count checks buried deep inside, I think it's much clearer to define initdb's behavior entirely differently when arguments are specified. Please see my comment on the PR for reorganizing.
+        tmp=$(mktemp -d "${TMPDIR:-/tmp}"/deepdive-initdb.XXXXXXX)
+        trap 'rm -rf "$tmp"' EXIT
+        schema_json="$tmp"/schema.json
+        ddlog export-schema app.ddlog > "$schema_json"
+        for t in "$@"; do
+            deepdive-sql "DROP TABLE IF EXISTS $t CASCADE"
+            ddlog_initdb $schema_json $t | deepdive-sql
+        done
+    else
+        # TODO export schema.sql from ddlog instead of running initdb pipeline
+        deepdive-run initdb
+    fi
 fi
 # run all DDL statements in schema.sql if available
 if [[ -e schema.sql ]]; then
     db-prompt <schema.sql
 fi

 # load the input data
-! [[ -x input/init.sh ]] || {
+! [[ -x input/init.sh && $# -eq 0 ]] || {
     # XXX set the legacy environment variables
     export APP_HOME=$DEEPDIVE_APP
     input/init.sh "$@"
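Taken together, the hunk gives `deepdive initdb` two modes: with table arguments it exports the schema and re-initializes just those tables; with no arguments it falls back to the existing `deepdive-run initdb` pipeline (and only then runs `input/init.sh`). A minimal Python 3 sketch of that dispatch (`initdb_plan` is a hypothetical name; the temp-dir and trap handling from the shell are omitted):

```python
def initdb_plan(tables):
    """Mirror the dispatch in shell/deepdive-initdb: per-table re-init
    when table arguments are given, full pipeline otherwise.
    Returns the commands the script would run, as strings."""
    if not tables:  # matches the else branch of: if [[ $# -gt 0 ]]
        return ["deepdive-run initdb"]
    # export the ddlog schema once, then drop and recreate each table
    plan = ["ddlog export-schema app.ddlog > schema.json"]
    for t in tables:
        plan.append('deepdive-sql "DROP TABLE IF EXISTS %s CASCADE"' % t)
        plan.append("ddlog_initdb schema.json %s | deepdive-sql" % t)
    return plan
```

The `$# -eq 0` change at the end of the hunk means a per-table re-init never reloads input data.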
1 change: 1 addition & 0 deletions stage.sh
@@ -60,6 +60,7 @@ stage util/active.sh util/
 stage util/calibration.py util/
 stage util/calibration.plg util/
 stage util/pgtsv_to_json util/
+stage util/ddlog_initdb util/

 # DDlog compiler
 stage util/ddlog bin/
15 changes: 15 additions & 0 deletions test/postgresql/deepdive_initdb.bats
@@ -0,0 +1,15 @@
+#!/usr/bin/env bats
+# Tests for initdb
Contributor:
Maybe it's more important to test the python script for generating CREATE TABLE statements?

+. "$BATS_TEST_DIRNAME"/env.sh >&2
+
+setup() {
+    cd "$BATS_TEST_DIRNAME"/spouse_example
+}
+
+@test "$DBVARIANT initdb from ddlog" {
+    cd ddlog || skip
+    deepdive initdb articles
+    [[ $(deepdive sql eval "SELECT * FROM articles" format=csv header=1) = 'article_id,text' ]]
+    deepdive sql "INSERT INTO articles VALUES ('foo', 'bar')"
+}
23 changes: 23 additions & 0 deletions util/ddlog_initdb
@@ -0,0 +1,23 @@
+#! /usr/bin/env python
+# Generate create table statement given a ddlog exported schema and a table name.
+# Usage: ddlog_initdb SCHEMA.JSON TABLE_NAME
Contributor:
Let's give a better name to this script. How about schema_json_to_sql?

I was originally thinking this SQL generator should go under each driver, e.g., to handle DISTRIBUTED BY in GP and so on. If you agree, I think it'll be a matter of just moving this to shell/driver.postgresql/ and keeping a symlink or clone under driver.mysql/.

+import json, sys
+
+def main():
+    # load schema.json
+    with open(sys.argv[1]) as schema_file:
+        schema = json.load(schema_file)
+    table = sys.argv[2]
+    # the given table is not in the schema, do nothing
+    if table not in schema["relations"]:
+        print ""
+    else:
+        columns_json = schema["relations"][table]["columns"]
+        columns = range(len(columns_json))
+        for k, v in columns_json.iteritems():
+            columns[v["index"]] = "%s %s" %(k, v["type"])
+        print "CREATE TABLE %s(%s)\n" %(table, ", ".join(columns))
+
+if __name__ == "__main__":
+    main()
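The script above is Python 2 (`print` statements, `dict.iteritems`, and index-assignment into a `range` list). The same logic in Python 3, run against a sample of the `schema.json` shape the script assumes; the sample itself is illustrative, modeled on the `articles` table exercised by the bats test:

```python
import json

# Assumed schema.json shape: "relations" -> table name -> "columns" ->
# column name -> {"index": position, "type": SQL type}. The exact
# `ddlog export-schema` output format is inferred from the script above.
SCHEMA_JSON = """
{
  "relations": {
    "articles": {
      "columns": {
        "text":       {"index": 1, "type": "text"},
        "article_id": {"index": 0, "type": "text"}
      }
    }
  }
}
"""

def create_table_sql(schema, table):
    # Unknown table: emit nothing, matching the script's empty print
    if table not in schema["relations"]:
        return ""
    columns_json = schema["relations"][table]["columns"]
    # Place each column at its declared "index" so that JSON key order
    # (which is unordered) cannot scramble the column order
    columns = [None] * len(columns_json)
    for name, info in columns_json.items():
        columns[info["index"]] = "%s %s" % (name, info["type"])
    return "CREATE TABLE %s(%s)" % (table, ", ".join(columns))

schema = json.loads(SCHEMA_JSON)
print(create_table_sql(schema, "articles"))
# → CREATE TABLE articles(article_id text, text text)
```

Note how `article_id` comes first even though `text` appears first in the JSON: the `index` field, not key order, determines column position, which is what makes the bats test's `article_id,text` header check deterministic.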