DB Schema Export Tool

The DB Schema Export Tool exports SQL Server or PostgreSQL database objects as schema files. Exported objects include tables, views, stored procedures, functions, and synonyms, as well as database properties, including database roles and logins.

Uses integrated authentication or a named user to connect to a Microsoft SQL Server database.

Uses a named user, password, and port to connect to a PostgreSQL database.

Continuous Integration

The latest version of the application is available for one month on the AppVeyor CI server


Features

In addition to creating schema files for tables, views, stored procedures, functions, and synonyms, the program can also export table data to create SQL files with an INSERT INTO statement for each row.

The DB Schema Export Tool has both a command line interface and a GUI mode. Starting the program without any parameters will show the GUI.

The command line interface is useful for automatically exporting schema files on a regular basis. Example command (wrapped here for readability):

DB_Schema_Export_Tool.exe
  C:\Cached_DBSchema
  /Server:Proteinseqs
  /DBList:Manager_Control,Protein_Sequences
  /Sync:"F:\Projects\Database_Schema\DMS"
  /Git /Commit
  /L /LogDir:Logs
  /Data:ProteinSeqs_Data_Tables.txt

Console switches

DB_Schema_Export_Tool.exe
 SchemaFileDirectory /Server:ServerName
 /DB:Database /DBList:CommaSeparatedDatabaseName
 [/DBUser:username] [/DBPass:password]
 [/PgUser:username] [/PgPass:password] [/PgPort:5432]
 [/DirectoryPrefix:PrefixText] [/NoSubdirectory] [/CreateDBDirectories]
 [/DataTables:TableDataToExport.txt] [/Map:ColumnMapping.txt]
 [/DataSort:PgDumpTableDataSortOrder.txt]
 [/TableFilterList] [/SchemaSkipList]
 [/ColumnFilter:FileName] [/DateFilter:FileName]
 [/NameFilter:FilterSpec]
 [/DataExportOrder:FileName]
 [/Schema:SchemaName] [/ExistingDDL:SchemaFileName]
 [/NoAutoData] [/ExportAllData] [/MaxRows:1000] [/NoData]
 [/DisableStatementLogging:true] [/MinLogDuration:5000]
 [/DeleteExtraRows] [/ForceTruncate:TableList]
 [/DisableTriggers] 
 [/ScriptLoad] [/ScriptUser:username] [/ScriptDB:Database] 
 [/ScriptHost:HostName] [/ScriptPort:Port]
 [/SnakeCase] [/PgDump] [/KeepPgDumpFile]
 [/PgInsert] [/PgInsertChunkSize:50000]
 [/ServerInfo] [/NoSchema]
 [/Sync:TargetDirectoryPath] [/Git] [/Svn] [/Hg] [/Commit]
 [/L] [/LogFile:BaseName] [/LogDir:LogDirectoryPath]
 [/Preview] [/Stats] [/Trace]
 [/ParamFile:ParamFileName.conf] [/CreateParamFile]

SchemaFileDirectory is the path to the directory where the schema files will be saved (aka the output directory)

  • Optionally use named parameter /OutputDir:DirectoryPath

To process a single database, use /Server and /DB

  • The server is assumed to be Microsoft SQL Server
  • However, if /PgUser is provided (or PgUser is defined in the parameter file), the server will be treated as a PostgreSQL server, as shown in the example below
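
Here, the server and database names are hypothetical; the first command connects to SQL Server, while the second is treated as PostgreSQL because /PgUser is present:

DB_Schema_Export_Tool.exe C:\Cached_DBSchema /Server:MySqlServer /DB:MyDatabase
DB_Schema_Export_Tool.exe C:\Cached_DBSchema /Server:MyPgServer /DB:MyDatabase /PgUser:pgreader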

Use /DBList to process several databases (separate names with commas)

Use /DBUser and /DBPass to specify a username and password for connecting to a SQL Server instance

  • If not defined, will use Windows Integrated Authentication

Use /PgUser to define the username to use to connect to a PostgreSQL database

Use /PgPass to define the password for the PostgreSQL user

  • As described below, you can create a file named pgpass.conf at %APPDATA%\postgresql to track passwords for PostgreSQL users
  • The format is server:port:database:username:password
  • pg_dump.exe will obtain the password from this file if it exists
  • On Linux, the file is stored at ~/.pgpass

Use /PgPort to specify the port to use for PostgreSQL

  • Defaults to 5432

By default, a subdirectory named DBSchema__DatabaseName will be created below SchemaFileDirectory/

  • Customize this prefix text using /DirectoryPrefix

Use /NoSubdirectory to disable automatically creating a subdirectory for the database being exported

  • Note: subdirectories will always be created if you use /DBList and specify more than one database

Use /CreateDBDirectories:False to disable creating a subdirectory for the schema files for each database

Use /DataTables or /Data to define a text file with table names (one name per line) for which the data should be exported. In addition to table names defined in /Data, there are default tables which will have their data exported; disable the defaults using /NoAutoData

  • Also supports a multi-column, tab-delimited format
    • Put <skip> in the TargetTableName column to indicate that the table should not be included when using /ExportAllData
  • The file can also track renamed views
  • File format
SourceTableName TargetSchemaName TargetTableName PgInsert KeyColumn(s)
T_Analysis_State_Name public t_analysis_job_state true job_state_id
T_DatasetRatingName public t_dataset_rating_name true dataset_rating_id
T_Log_Entries public t_log_entries false
T_Job_Events cap t_job_Events false
T_Job_State_Name cap t_job_state_name true job
T_Users public t_users true user_id
x_T_MgrState public <skip>
V_Cell_Culture public v_biomaterial
V_Cell_Culture_Date public v_biomaterial_date

Tables with PgInsert set to true will have data insert commands formatted as PostgreSQL compatible INSERT INTO statements using the ON CONFLICT (key_column) DO UPDATE SET syntax

  • This is only applicable when exporting data from SQL Server
  • Use the KeyColumn(s) column to specify the primary key (or pair of unique keys) used for determining a conflict
    • Any columns that are not key columns will be updated when a conflict is found
    • If PgInsert is true, but the KeyColumn(s) column is empty, the program will look for an identity column on the source table
  • Example generated SQL:
INSERT INTO mc.t_mgr_types (mt_type_id, mt_type_name, mt_active)
OVERRIDING SYSTEM VALUE
VALUES
  (1, 'Analysis', 1),
  (2, 'Archive', 0),
  (3, 'Archive Verify', 0),
  (4, 'Capture', 1),
  (5, 'Space', 1),
  (6, 'Data Import', 1)
ON CONFLICT (mt_type_id)
DO UPDATE SET
  mt_type_name = EXCLUDED.mt_type_name,
  mt_active = EXCLUDED.mt_active;

-- Set the sequence's current value to the maximum current ID
SELECT setval('mc.t_mgr_types_mt_type_id_seq', (SELECT MAX(mt_type_id) FROM mc.t_mgr_types));

-- View the current ID for the sequence
SELECT currval('mc.t_mgr_types_mt_type_id_seq');

Use /Map or /ColumnMap to define a tab-delimited text file mapping source column names to target column names, by table

  • Use keyword <skip> in the TargetColumnName column to indicate that a source column should not be included in the output file
  • If /SnakeCase is used to auto-change column names, mappings in this file will override the snake case auto-conversion
  • File format:
SourceTableName SourceColumnName TargetColumnName
T_Analysis_Job AJ_jobID job
T_Analysis_Job AJ_start start
T_Analysis_Job AJ_finish finish
t_users name_with_prn <skip>

Use /DataSort (or /PgDumpDataSort or /PgDumpTableDataSortOrder) to define a tab-delimited text file that defines the columns to use for sorting data exported using PgDump

  • The export file will be re-written with data sorted by the specified column numbers (1-based)
  • Table names should include the schema, e.g. "public.t_event_target" or "sw.t_job_state_name"
  • Sort_Columns should list the column number (or comma-separated list of column numbers) to sort the data on
    • "1" means to sort using the first column
    • "2, 3" means to sort using the second column, then the third column
  • Values in Sort_Numeric should be True, False, or an empty string
    • True means to assume the data in the first sort column is numeric (integer or double)
    • False is ignored; if any sort column has numbers for every row, it will be sorted as numeric
  • File format:
Table_Name Sort_Columns Sort_Numeric
cap.t_automatic_jobs 1, 2
cap.t_scripts 2
cap.t_scripts_history 3, 1
cap.t_step_tools 2, 1
cap.t_task_state_name 1 True
public.t_analysis_job_state 1 True
public.t_default_psm_job_parameters 2, 3, 4, 5, 6
public.t_user_operations_permissions 1, 2 True
sw.t_job_state_name 1 True
timetable.task 2, 1 True

Use /TableFilterList or /TableNameFilter to specify a table name (or comma separated list of names) to restrict table export operations

  • This is useful for exporting the data from just a single table (or a few tables)
  • This parameter does not support reading names from a file; it only supports actual table names
  • If a file was defined via the /DataTables parameter, the list of tables loaded from that file will be filtered by this list (matching both SourceTableName and TargetTableName)
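
For example, one possible invocation to export data from just two tables (server, database, and table names are hypothetical):

DB_Schema_Export_Tool.exe C:\Cached_DBSchema /Server:MyServer /DB:MyDatabase /ExportAllData /TableFilterList:T_Users,T_Log_Entries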

Use /SchemaSkipList to specify a schema name (or comma separated list of names) of schemas to skip

  • This is useful if using partitioned tables

Use /ColumnFilter or /TableDataColumnFilter to define a tab-delimited text file that defines columns to skip when exporting data from tables

  • The program auto-skips computed columns and timestamp columns when exporting data as PostgreSQL compatible INSERT INTO statements
  • File format:
Table Column Comment
T_Cached_Dataset_Links DS_RowVersion In PostgreSQL, tracked via data type xid; comes from t_dataset.xmin
T_Cached_Dataset_Links SPath_RowVersion In PostgreSQL, tracked via data type xid; comes from t_storage_path.xmin
T_EUS_Proposals Numeric_ID Computed column (redundant here since the program auto-skips computed columns)

Use /DateFilter or /TableDataDateFilter to define a tab-delimited text file that defines date filters to use when exporting data from tables

  • The data file will include the start date in the name, for example: mc.t_log_entries_Data_Since_2020-01-01.sql
  • File format:
SourceTableName DateColumnName MinimumDate
T_Event_Log Entered 2020-01-01
T_Log_Entries posting_time 2020-01-01
T_ParamValue last_affected 2020-01-01
T_ParamValue_OldManagers last_affected 2016-01-07

Use /NameFilter to define a filter to apply to exported tables, views, procedures, etc.

  • Will only export objects that contain this text in the name
    • This is only applicable when exporting objects from SQL Server
  • Use ^ to indicate the object names must start with the text, for example ^Auto to start with Auto
    • This only works if using a parameter file, since the Windows command line parser removes ^
  • Use $ to indicate the object names must end with the text, for example Report$ to end with Report
  • Other RegEx qualifiers are supported; matches are case insensitive
  • Multiple filters can be provided by separating with commas
    • At the command line, surround comma separated lists with double quotes
    • Will not split on commas if the object name filter has square brackets
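
Since ^ only works via a parameter file, a hypothetical Options.conf entry to export only objects whose names start with Auto or end with Report might be (assuming the parameter file key matches the switch name):

NameFilter=^Auto,Report$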

Use /DataExportOrder to define a text file with table names (one name per line) defining the order that data should be exported from tables

  • Should have table names only, without schema
  • Tables not listed in this file will have their data exported alphabetically by table name
  • If the first line is Table_Name, will assume that it is a header line (and will thus ignore it)
  • If /ScriptLoad is used to create a shell script for loading data (see below), the data will be loaded in the order specified by the DataExportOrder file, followed by tables not listed in the file
    • This is useful for assuring that data is loaded into the database in an order that will guarantee that foreign keys are valid
  • For example, given a multi-campus university that tracks graduate students and professors using four tables:
graph TD;
    T_Campus-->T_Department;
    T_Department-->T_Professor;
    T_Department-->T_Student;

the DataExportOrder file should have the following:

T_Campus
T_Department
T_Professor
T_Student
  • Loading data in this order assures that when the department data is loaded, the Campus_ID for each department will refer to an existing row in the campus table
  • Next, when the professor and student data is loaded, the Department_ID for each person will refer to an existing row in the department table

Use /Schema or /DefaultSchema to define the default schema name to use when exporting data from tables

  • Entries in the /DataTables file will override this default schema

Use /ExistingDDL or /ExistingSchema to define a text file that should be parsed to rename columns, based on data loaded from the /ColumnMap file

  • Will create a new text file with CREATE TABLE blocks that include the new column names
    • Will skip columns with <skip>
  • Will skip tables (and views, procedures, etc.) that are defined in the /DataTables file but have <skip> in the TargetTableName column
    • Does not rename tables
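
As an illustration, using the /ColumnMap example shown earlier (table and column definitions are hypothetical), a CREATE TABLE block in the existing DDL file like this:

CREATE TABLE T_Analysis_Job (
    AJ_jobID int NOT NULL,
    AJ_start datetime NULL,
    AJ_finish datetime NULL
);

would be rewritten with the mapped column names (the table name is unchanged):

CREATE TABLE T_Analysis_Job (
    job int NOT NULL,
    start datetime NULL,
    finish datetime NULL
);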

Use /ExportAllData or /ExportAllTables to export data from every table in the database

  • Will skip tables (and views) that are defined in the /DataTables file but have <skip> in the TargetTableName column
  • Ignored if /NoTableData is true

Use /MaxRows to define the maximum number of data rows to export

  • Defaults to 1000
  • Use 0 to export all rows
  • If you use /ExportAllData but do not specify /MaxRows, the program will auto set /MaxRows to 0

Use /NoTableData or /NoData to prevent any table data from being exported

  • This parameter is useful when processing an existing DDL file with /ExistingDDL

By default, the table data load script will update the PostgreSQL system settings to set log_min_duration_statement to -1, which disables logging of statements that execute longer than a given threshold

  • If your system has log_min_duration_statement defined, and if the insert queries take longer than the threshold, the entire insert query (including all of the data values) will be written to the log file
    • If you are importing a large amount of data, this could lead to very large PostgreSQL log file sizes
  • Using /DisableStatementLogging:True (the default) avoids the possibility of long-running INSERT INTO queries bloating the log files
    • The database user must be a superuser to update the log_min_duration_statement system setting, since that setting affects all current and future connections in all databases on the server
  • After importing all of the data, the script will set log_min_duration_statement to 5000 (meaning to log statements that run longer than 5 seconds)
    • This can be customized using /MinLogDuration (or MinLogDurationAfterLoad= in a parameter file)
  • Use /DisableStatementLogging:False to disable adjusting the value of log_min_duration_statement

Use /MinLogDuration to customize the value to set log_min_duration_statement to once all of the data has been imported

  • The default is /MinLogDuration:5000
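
A minimal sketch of the corresponding PostgreSQL commands (the generated script's exact statements may differ; updating this system setting requires superuser privileges):

-- Disable duration-based statement logging before loading data
ALTER SYSTEM SET log_min_duration_statement = -1;
SELECT pg_reload_conf();

-- After the data load, log statements that run longer than 5 seconds
ALTER SYSTEM SET log_min_duration_statement = 5000;
SELECT pg_reload_conf();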

Use /DeleteExtraRows to create files with commands to delete extra rows in database tables (provided the table has one or more primary keys)

  • If /ScriptLoad is enabled (or the parameter file has ScriptLoad=True), the shell script will use these files to delete extra rows prior to importing new data

  • If /DataExportOrder was used to specify the export order for table data, the shell script will delete extra rows from tables in the reverse order from that defined in the DataExportOrder file

    • The logic for this is that the DataExportOrder file assures that parent tables are loaded before child tables; deleting extra rows in the reverse order avoids foreign key violations when rows are removed from parent tables
  • If a table has a single-column primary key, the delete queries will be of the form:

DELETE FROM "t_user_status" target
WHERE status BETWEEN 'Active' AND 'Obsolete' AND
      NOT EXISTS ( SELECT S.status
                   FROM "t_user_status" S
                   WHERE target.status = S.status AND
                         S.status in (
                             'Active','Inactive','Obsolete'
                         )
                  );
  • If a table has a multi-column primary key, the delete queries will be of the form:
DELETE FROM t_internal_std_composition target
WHERE target.component_id BETWEEN 10 AND 72 AND
      NOT EXISTS ( SELECT S.component_id
                         ,S.mix_id
                   FROM t_internal_std_composition S
                   WHERE target.component_id = S.component_id AND
                         target.mix_id = S.mix_id AND
                         (
                           S.component_id = 10 AND S.mix_id = 1 OR
                           S.component_id = 10 AND S.mix_id = 3 OR
                           S.component_id = 11 AND S.mix_id = 1 OR
                           S.component_id = 11 AND S.mix_id = 3 OR
                           S.component_id = 13 AND S.mix_id = 1 OR
                           S.component_id = 13 AND S.mix_id = 3 OR
                           S.component_id = 21 AND S.mix_id = 5 OR
                           S.component_id = 22 AND S.mix_id = 5
                         )
                  );

Use /ForceTruncate to specify a table name (or comma separated list of names) for tables that have a primary key but that you would rather truncate instead of using either of the DELETE FROM queries shown above (an example command follows the notes below)

  • This is useful for large tables where the DELETE FROM queries run very slowly
  • Names in this list can match either source table names or target table names
  • Ignored if /DeleteExtraRows is false
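
For example (hypothetical names), to truncate two large tables instead of generating DELETE FROM queries for them:

DB_Schema_Export_Tool.exe C:\Cached_DBSchema /Server:MyServer /DB:MyDatabase /ExportAllData /DeleteExtraRows /ForceTruncate:T_Event_Log,T_Log_Entries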

Use /DisableTriggers (or /IncludeDisableTriggerCommands) to include trigger disable / enable commands in data export files

  • Ignored if PgInsert is true since session_replication_role is set to "replica" prior to inserting/deleting data
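
For reference, a sketch of what these commands look like in PostgreSQL (the table name is hypothetical and the tool's exact output may differ):

-- Included at the top and bottom of a data export file when /DisableTriggers is used
ALTER TABLE t_users DISABLE TRIGGER ALL;
-- ... INSERT INTO or COPY statements here ...
ALTER TABLE t_users ENABLE TRIGGER ALL;

-- When PgInsert is true, triggers are instead bypassed for the session via
SET session_replication_role = replica;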

Use /ScriptLoad or /Script to create a bash script file for loading exported table data into a PostgreSQL database

Use /ScriptUser to define the username to use when calling psql in the bash script

  • If not defined, the username of the account running the DB Schema Export Tool will be used

Use /ScriptDB to define the target database name to use when calling psql in the bash script

  • The default is /ScriptDB:dms

Use /ScriptHost to define the host name to use when calling psql in the bash script

  • The default is /ScriptHost:localhost

Use /ScriptPort to define the port number to use when calling psql in the bash script

  • The default is /ScriptPort:5432
  • The port number will only be included in the bash script if it is not the default
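
A sketch of the psql calls the generated bash script might contain (file and user names are hypothetical):

psql -d dms -h localhost -U d3l243 -f t_campus_Data.sql
psql -d dms -h localhost -U d3l243 -f t_department_Data.sql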

Use /SnakeCase to auto change column names from Upper_Case and UpperCase to lower_case when exporting data from tables

  • Also used for table names when exporting data from tables
  • Entries in the /DataTables and /ColumnMap files will override auto-generated snake_case names

Use /PgDump or /PgDumpData to specify that exported data should use COPY commands instead of INSERT INTO statements

  • With SQL Server databases, table data will be exported using pg_dump compatible COPY commands when /PgDump is used
  • With PostgreSQL data, table data will be exported using the pg_dump application
  • With SQL Server data and PostgreSQL data, if /PgDump is not provided, data is exported with INSERT INTO statements
  • With SQL Server data, if /PgDump is provided, but the /DataTables file has true in the PgInsert column, INSERT INTO statements will be used for that table
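
For reference, pg_dump compatible COPY commands embed tab-delimited rows, terminated by a line with a backslash and a period; for example (using the table from the INSERT INTO example above):

COPY mc.t_mgr_types (mt_type_id, mt_type_name, mt_active) FROM stdin;
1	Analysis	1
2	Archive	0
\.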

Use /KeepPgDumpFile to not delete the _AllObjects_.sql file created for each database

  • By default, this is false, meaning the _AllObjects_.sql file will be deleted (since it is no longer needed because a separate .sql file has been created for each object)
    • If any unhandled scripting commands are encountered, the _AllObjects_.sql file will not be deleted, even if KeepPgDumpFile is false
  • In addition, when this is true, the original PgDump table data files are stored in the Replaced_PgDump_Files directory

The /PgInsert switch applies when exporting data from SQL Server database tables

  • While reading the TableDataToExport.txt file, default the PgInsert column to true when /PgInsert is provided, meaning table data will be exported via INSERT INTO statements, as mentioned above
  • Ignored for PostgreSQL data

Use /PgInsertChunkSize to specify the number of values to insert at a time when PgInsert is true for a table

  • Only applicable when exporting data from SQL Server

Use /ServerInfo to export server settings, logins, and SQL Server Agent jobs

Use /NoSchema to skip exporting schema (tables, views, functions, etc.)

Use /Sync to copy new/changed files from the output directory to an alternative directory

  • Since only new or changed files are copied, this prevents file timestamps in the target directory from being updated every time the schema is exported

Use /Git to auto-update any new or changed files using Git

Use /Svn to auto-update any new or changed files using Subversion

Use /Hg to auto-update any new or changed files using Mercurial

Use /Commit to commit any updates to the repository

Use /L to log messages to a file

Use /LogFile to specify the base name for the log file

  • Log files will be named using the base name, followed by the current date

Use /LogDir or /LogDirectory to specify the directory to save the log file in

  • By default, the log file is created in the current working directory

Use /Preview to count the number of database objects that would be exported

Use /Stats to show (but not log) export stats

Use /Trace to show additional debug messages at the console

The processing options can be specified in a parameter file using /ParamFile:Options.conf or /Conf:Options.conf or /P:Options.conf

  • Define options using the format ArgumentName=Value
  • Lines starting with # or ; will be treated as comments
  • Additional arguments on the command line can supplement or override the arguments in the parameter file
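
A minimal sketch of a parameter file, assuming the key names mirror the command line switch names (use /CreateParamFile, described below, to generate an authoritative example):

# Example Options.conf (hypothetical values)
OutputDir=C:\Cached_DBSchema
Server=Proteinseqs
DBList=Manager_Control,Protein_Sequences
Sync=F:\Projects\Database_Schema\DMS
Git=True
Commit=True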

Use /CreateParamFile to create an example parameter file

  • By default, the example parameter file content is shown at the console
  • To create a file named Options.conf, use /CreateParamFile:Options.conf

Software Dependencies

This program leverages the SQL Server Management Objects (SMO) Framework

  • All of the required DLLs should be included in the DBSchemaExportTool.zip file

In order to use the /Git, /Svn, or /Hg switches, you need the following software installed and the executables present at a specific location:

  • Git
  • Subversion
  • Mercurial

PostgreSQL

Schema export from PostgreSQL databases utilizes the pg_dump utility

  • pg_dump.exe is available as part of the PostgreSQL for Windows installer from the EDB website
  • Run the installer
    • De-select installing PostgreSQL Server
  • Copy pg_dump.exe plus several DLLs from C:\Program Files\PostgreSQL\16\bin to the directory with the DB_Schema_Export_Tool executable
    • See batch file Copy_pg_dump_files.bat
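
A minimal sketch of such a copy step (assuming PostgreSQL 16 in the default install location; the actual batch file may copy a specific list of DLLs):

copy "C:\Program Files\PostgreSQL\16\bin\pg_dump.exe" .
copy "C:\Program Files\PostgreSQL\16\bin\*.dll" .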

File pgpass.conf at %APPDATA%\postgresql can be used to store the password for the PostgreSQL user

  • On Linux, the file is stored at ~/.pgpass
mkdir %APPDATA%\postgresql
echo prismweb3:5432:dms:dmsreader:dmspasswordhere > %APPDATA%\postgresql\pgpass.conf

Example call to pg_dump:

pg_dump.exe -h prismweb3 -p 5432 -U dmsreader -d dms --schema-only --format=p --file=_AllObjects_.sql

Note that on Windows the .sql file created by pg_dump.exe includes extra carriage returns, resulting in CR CR LF instead of CR LF

  • DB_Schema_Export_Tool.exe removes the extra carriage returns while parsing the SQL dump file

Contacts

Written by Matthew Monroe for the Department of Energy (PNNL, Richland, WA)
E-mail: matthew.monroe@pnnl.gov or proteomics@pnnl.gov
Website: https://github.com/PNNL-Comp-Mass-Spec/ or https://www.pnnl.gov/integrative-omics
Source code: https://github.com/PNNL-Comp-Mass-Spec/DB-Schema-Export-Tool

License

The DB Schema Export Tool is licensed under the 2-Clause BSD License; you may not use this program except in compliance with the License. You may obtain a copy of the License at https://opensource.org/licenses/BSD-2-Clause

Copyright 2019 Battelle Memorial Institute
