Ddlog update #376

Merged
merged 5 commits into from
Sep 24, 2015

Conversation

feiranwang
Contributor

  1. Fix the delta deriver for incremental function calls; revise syntax.
  2. Add support for automatically setting parallelism globally for extractors.
  3. Add a deepdive initdb TABLE command for initializing a single table.

@chrismre
Contributor

@feiranwang Is this waiting on @netj's approval? He should be assigned :)

@@ -91,6 +91,15 @@ fullConfig=$run_dir/deepdive.conf
ddlog compile "${ddlogFiles[@]}"
export PIPELINE= # XXX ddlog shouldn't emit this
: ${Pipeline:=endtoend}

# set PARALLELISM env var, use max parallelism if the variable is not set
if [[ $(uname) = 'Linux' ]]; then
Contributor

Checking if nproc or sysctl is available makes more sense than relying on uname. You could chain the options with something like:

: ${PARALLELISM:=$({
    # Linux typically has coreutils which includes nproc
    nproc ||
    # OS X
    sysctl -n hw.ncpu ||
    # fall back to 1
    echo 1
} 2>/dev/null)}
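
A variant of the same idea, as a minimal sketch: probe for each tool with `command -v` before calling it, and validate the result before using it. The function name `detect_parallelism` is hypothetical and not part of DeepDive; only `nproc`, `sysctl -n hw.ncpu`, and the `PARALLELISM` variable come from the discussion above.

```shell
# Hypothetical helper (not part of DeepDive): print the number of CPUs,
# probing for each tool before running it.
detect_parallelism() {
    local n
    n=
    # GNU coreutils, typical on Linux
    if command -v nproc >/dev/null 2>&1; then
        n=$(nproc 2>/dev/null)
    fi
    # OS X / BSD, tried only if nproc gave nothing
    if [ -z "$n" ] && command -v sysctl >/dev/null 2>&1; then
        n=$(sysctl -n hw.ncpu 2>/dev/null)
    fi
    # fall back to 1 unless we got a positive integer
    case $n in
        ''|*[!0-9]*|0) n=1 ;;
    esac
    echo "$n"
}

# keep a value the user already set; otherwise use the detected one
: "${PARALLELISM:=$(detect_parallelism)}"
export PARALLELISM
```

Because the assignment uses `:=`, a value such as `PARALLELISM=4` set by the user before the script runs is left untouched.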

@netj
Contributor

netj commented Sep 13, 2015

Doesn't deepdive initdb TABLE still drop and create the whole database before creating the given table, affecting others? Here's what I think users expect from the initdb command:

  • When there are arguments, DD should drop/create/load just those specified tables. Assuming it's a DDlog app, DD should drop/create the tables from the DDlog schema, then optionally load data to the new tables from an assumed path under input/ by some naming convention. It's an error if it's not a DDlog app.
  • When no argument is given, DD should drop/create all known tables. If it's a DDlog app, all tables defined in the schema should be created then loaded as if the names were all given manually. If it's not DDlog, it should rely on schema.sql and input/init.sh to initialize the database. For this last non-DDlog case, DD should perhaps do a dropdb to be backward compatible.

In any case, DD should first make sure the database is created.
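
The behavior proposed above could be organized roughly as follows. This is a sketch only: `initdb_dispatch`, the `APP_IS_DDLOG` override, and the helpers `ensure_database`, `init_ddlog_table`, `list_ddlog_tables`, and `init_legacy_database` are hypothetical names rather than existing DeepDive commands, the table names are placeholders, and the stubs only print what they would do.

```shell
# Hypothetical stubs; real implementations would talk to the database.
ensure_database()      { echo "ensure database exists"; }
init_ddlog_table()     { echo "drop/create/load table $1"; }
list_ddlog_tables()    { echo "articles sentences"; }  # placeholder table names
init_legacy_database() { echo "dropdb, then schema.sql and input/init.sh"; }

# Sketch of the proposed dispatch. APP_IS_DDLOG (an assumption, for testing)
# overrides the default check for app.ddlog in the current directory.
initdb_dispatch() {
    local is_ddlog t
    is_ddlog=${APP_IS_DDLOG:-$([ -e app.ddlog ] && echo yes || echo no)}

    # in any case, make sure the database exists first
    ensure_database

    if [ $# -gt 0 ]; then
        # arguments given: initialize just those tables, DDlog apps only
        if [ "$is_ddlog" != yes ]; then
            echo "initdb TABLE requires a DDlog app" >&2
            return 1
        fi
        for t in "$@"; do init_ddlog_table "$t"; done
    elif [ "$is_ddlog" = yes ]; then
        # no arguments, DDlog app: as if every schema table were named
        for t in $(list_ddlog_tables); do init_ddlog_table "$t"; done
    else
        # no arguments, non-DDlog app: backward-compatible full reset
        init_legacy_database
    fi
}
```

Splitting on whether arguments are present at the top level keeps the single-table path from ever reaching the code that drops the whole database.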

@@ -16,8 +16,19 @@ db-init "$@"

# make sure the necessary tables are all created
if [[ -e app.ddlog ]]; then
# TODO export schema.sql from ddlog instead of running initdb pipeline
deepdive-run initdb
if [[ $# -gt 0 ]]; then
Contributor

Rather than having these argument count checks buried deep inside, I think it's much clearer to define initdb's behavior entirely differently when arguments are specified. Please see my comment on the PR for reorganizing.

@netj
Contributor

netj commented Sep 13, 2015

Nice updates. Please see my comments.

@feiranwang
Contributor Author

Thanks! Will update accordingly.

@feiranwang
Contributor Author

@netj Updated. Thanks!

@feiranwang feiranwang force-pushed the ddlog_update branch 2 times, most recently from 30fda2e to 2bec633 Compare September 20, 2015 10:40
@netj
Contributor

netj commented Sep 24, 2015

Looks good, merging.

netj added a commit that referenced this pull request Sep 24, 2015
@netj netj merged commit 3f5c360 into master Sep 24, 2015
@netj netj deleted the ddlog_update branch September 24, 2015 04:32
3 participants