Commit c3a3f37

add tpch-dbgen (#4658)
1 parent 70713cc commit c3a3f37

100 files changed: 30331 additions, 0 deletions

ydb/library/benchmarks/gen/tpch-dbgen/LICENSE

Lines changed: 320 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 220 additions & 0 deletions
@@ -0,0 +1,220 @@
1+
# @(#)PORTING.NOTES 2.1.8.1
2+
3+
Table of Contents
4+
==================
5+
1. General Program Structure
6+
2. Naming Conventions and Variable Usage
7+
3. Porting Procedures
8+
4. Compilation Options
9+
5. Customizing QGEN
10+
6. Further Enhancements
11+
7. Known Porting Problems
12+
8. Reporting Problems
13+
14+
1. General Program Structure
15+
16+
The code provided with TPC-H and TPC-R benchmarks includes a database
17+
population generator (DBGEN) and a query template translator (QGEN). It
18+
is written in ANSI-C, and is meant to be easily portable to a broad variety
19+
of platforms. The program is composed of five source files and some
20+
support and header files. The main modules are:
21+
22+
build.c: each table in the database schema is represented by a
23+
routine mk_XXXX, which populates a structure
24+
representing one row in table XXXX.
25+
See Also: dsstypes.h, bm_utils.c, rnd.*
26+
print.c: each table in the database schema is represented by a
27+
routine pr_XXXX, which prints the contents of a
28+
structure representing one row in table XXXX.
29+
See Also: dsstypes.h, dss.h
30+
driver.c: this module contains the main control functions for
31+
DBGEN, including command line parsing, distribution
32+
management, database scaling and the calls to mk_XXXX
33+
and pr_XXXX for each table generated.
34+
qgen.c: this module contains the main control functions for
35+
QGEN, including query template parsing.
36+
varsub.c: each query template includes one or more parameter
37+
substitution points; this routine handles the
38+
parameter generation for the TPC-H/TPC-R benchmark.
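The mk_XXXX / pr_XXXX split described above can be sketched as follows. This is a minimal illustration, not the DBGEN source: the struct demo_region_t, the routines mk_demo_region and pr_demo_region, the sample names and the '|'-separated output layout are all assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical row layout; the real structures live in dsstypes.h. */
typedef struct {
    long key;
    char name[26];
} demo_region_t;

/* Illustrative sample values only. */
static const char *demo_names[] = {
    "AFRICA", "AMERICA", "ASIA", "EUROPE", "MIDDLE EAST"
};

/* mk_ style routine: populate the structure for one row.
 * Returns 0 on success, -1 if the index is out of range. */
int mk_demo_region(long index, demo_region_t *r)
{
    if (index < 0 || index >= 5)
        return -1;
    r->key = index;
    strncpy(r->name, demo_names[index], sizeof r->name - 1);
    r->name[sizeof r->name - 1] = '\0';
    return 0;
}

/* pr_ style routine: print one populated row.
 * Returns the number of characters written, as fprintf does. */
int pr_demo_region(FILE *out, const demo_region_t *r)
{
    return fprintf(out, "%ld|%s|\n", r->key, r->name);
}
```

A driver in the style of driver.c would call the mk_ routine and then the pr_ routine once per row, which keeps row construction separate from output formatting.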
39+
40+
The support utilities provide a generalized set of functions for data
41+
generation and include:
42+
43+
bm_utils.c: data type generators, string management and
44+
portability routines.
45+
46+
rnd.*: a general purpose random number generator used
47+
throughout the code.
48+
49+
dss.h:
50+
shared.h: a set of '#defines' for limits, formats and fixed
51+
values
52+
dsstypes.h: structure definitions for each table definition
53+
54+
2. Naming Conventions and Variable Usage
55+
56+
Since DBGEN will be maintained by a large number of people, it is
57+
particularly important to observe the coding, variable naming and usage
58+
conventions detailed here.
59+
60+
#define
61+
--------
62+
All #define directives are found in header files (*.h). In general,
63+
the header files segregate variables and macros as follows:
64+
rnd.h -- anything exclusively referenced by rnd.c
65+
dss.h -- general defines for the benchmark, including *all*
66+
extern declarations (see below).
67+
shared.h -- defines related to the tuple definitions in
68+
dsstypes.h. Isolated to ease automatic processing needed by many
69+
direct load routines (see below).
70+
dsstypes.h -- structure definitions and typedef directives to
71+
detail the contents of each table's tuples.
72+
config.h -- any porting and configuration related defines should
73+
go here, to localize the changes necessary to move the suite
74+
from one machine to another.
75+
tpcd.h -- defines related to QGEN, rather than DBGEN
76+
77+
extern
78+
------
79+
DBGEN and QGEN make extensive use of extern declarations. This could
80+
probably stand to be changed at some point, but has made the rapid
81+
turnaround of prototypes easier. In order to be sure that each
82+
declaration was matched by exactly one definition per executable,
83+
they are all declared as EXTERN, a macro dependent on DECLARER. In
84+
any module that defines DECLARER, all variables declared EXTERN will
85+
be defined as globals. DECLARER should be defined only in modules
86+
containing a main() routine.
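The EXTERN/DECLARER pattern described above might look roughly like this. It is a sketch under stated assumptions: the macro names EXTERN and DECLARER come from the notes, but the exact macro text and the variables demo_scale and demo_verbose are hypothetical.

```c
/* Only the module that holds main() would define DECLARER. */
#define DECLARER

/* ---- what a shared header such as dss.h might contain ---- */
#ifdef DECLARER
#define EXTERN            /* expands to nothing: the lines below define globals */
#else
#define EXTERN extern     /* every other module sees declarations only */
#endif

EXTERN long demo_scale;   /* hypothetical global, not a real DBGEN variable */
EXTERN int  demo_verbose; /* hypothetical global */

/* ---- ordinary code that uses the globals ---- */
long demo_set_scale(long s)
{
    demo_scale = s;
    return demo_scale;
}
```

A module that omits the DECLARER define would get `extern long demo_scale;` from the same header, so each variable ends up with one definition per executable.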
87+
88+
Naming Conventions
89+
------------------
90+
defines
91+
o All defines use upper case
92+
o All defines use a table prefix, if appropriate:
93+
O_* relates to orders table
94+
L_* relates to lineitem table
95+
P_* relates to part table
96+
PS_* relates to partsupplier table
97+
C_* relates to customer table
98+
S_* relates to supplier table
99+
N_* relates to nation table
100+
R_* relates to region table
101+
T_* relates to time table
102+
o All defines have a usage suffix, if appropriate:
103+
*_TAG environment variable name
104+
*_DFLT environment variable default
105+
*_MAX upper bound
106+
*_MIN lower bound
107+
*_LEN average length
108+
*_SD random number seed (see rnd.*)
109+
*_FMT printf format string
110+
*_SCL divisor (for scaled arithmetic)
111+
*_SIZE tuple length
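As an illustration of the naming scheme above, a define combines a table prefix with a usage tag. The names and values below are invented for the example and are not taken from dss.h.

```c
/* Hypothetical defines that follow the conventions listed above. */
#define L_DEMO_QTY_MIN  1    /* L_ = lineitem table, _MIN = lower bound */
#define L_DEMO_QTY_MAX  50   /* L_ = lineitem table, _MAX = upper bound */

/* Clamp a quantity into [L_DEMO_QTY_MIN, L_DEMO_QTY_MAX]. */
long demo_clamp_qty(long q)
{
    if (q < L_DEMO_QTY_MIN)
        return L_DEMO_QTY_MIN;
    if (q > L_DEMO_QTY_MAX)
        return L_DEMO_QTY_MAX;
    return q;
}
```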
112+
113+
3. Porting Procedures
114+
115+
The code provided should be easily portable to any machine providing an
116+
ANSI C compiler.
117+
-- Copy makefile.suite to makefile
118+
-- Edit the makefile to match the name of your C compiler
119+
and to include appropriate compilation options in the CFLAGS
120+
definition
121+
-- make.
122+
123+
Special care should be taken in modifying any of the monetary calcu-
124+
lations in DBGEN. These have proven to be particularly sensitive to
125+
portability problems. If you decide to create the routines for inline
126+
data load (see below), be sure to compare the resulting data to that
127+
generated by a flat file data generation to be sure that all numeric
128+
conversions have been correct.
129+
130+
If the compile generates errors, refer to "Compilation Options", below.
131+
The problem you are encountering may already have been addressed in the
132+
code.
133+
134+
If the compile is successful, but QGEN is not generating the appropriate
135+
query syntax for your environment, refer to "Customizing QGEN", below.
136+
137+
For other problems, refer to "Reporting Problems" at the end of this
138+
document.
139+
140+
4. Compilation Options
141+
142+
config.h and makefile.suite contain a number of compile time options intended
143+
to make the process of porting the code provided with TPC-H/TPC-R as easy as
144+
possible on a broad range of platforms. Most ports should consist of reviewing
145+
the possible settings described in config.h and modifying the makefile
146+
to employ them appropriately.
147+
148+
5. Customizing QGEN
149+
150+
QGEN relies on a number of vendor-specific conventions to generate
151+
appropriate query syntax. These are controlled by #defines in tpcd.h,
152+
and enabled by a #define in config.h. If you find that the syntax
153+
generated by QGEN is not sufficient for your environment you will need
154+
to modify these two files. It is strongly recommended that you not change
155+
the general organization of the files.
156+
157+
Currently defined options are:
158+
159+
VTAG -- marks a variable substitution point [:]
160+
QDIR_TAG -- environment variable which points to query templates
161+
[DSS_QUERY]
162+
GEN_QUERY_PLAN -- syntax to generate a query plan ["Set Explain On;"]
163+
START_TRAN -- syntax to begin a transaction ["Begin Work;"]
164+
END_TRAN -- syntax to end a transaction ["Commit Work;"]
165+
SET_OUTPUT -- syntax to redirect query output ["Output to"]
166+
SET_ROWCOUNT -- syntax to set the number of rows returned
167+
["{return %d rows}"]
168+
SET_DBASE -- syntax to connect to a database
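A vendor block in tpcd.h could be laid out along these lines. The string values are the bracketed defaults listed above; the switch name DEMO_VENDOR and the helper demo_emit_rowcount are assumptions made for this sketch, and snprintf requires C99 rather than strict ANSI C.

```c
#include <stdio.h>

/* Assumption: stands in for the vendor switch that config.h would define. */
#define DEMO_VENDOR

/* What tpcd.h might hold for that vendor. */
#ifdef DEMO_VENDOR
#define GEN_QUERY_PLAN  "Set Explain On;"
#define START_TRAN      "Begin Work;"
#define END_TRAN        "Commit Work;"
#define SET_OUTPUT      "Output to"
#define SET_ROWCOUNT    "{return %d rows}"
#endif

/* Hypothetical helper: expand the row-count directive for a query.
 * Returns the snprintf result (length of the formatted text). */
int demo_emit_rowcount(char *buf, size_t len, int rows)
{
    return snprintf(buf, len, SET_ROWCOUNT, rows);
}
```

Keeping every vendor-specific string behind one switch means a port to a new DBMS adds a block of defines and leaves the translation code untouched.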
169+
170+
6. Further Enhancements
171+
172+
load_stub.c provides entry points for two likely enhancements.
173+
174+
The ld_XXXX routines make it possible to load the
175+
database directly from DBGEN without first writing the database
176+
population out to the filesystem. This may prove particularly useful
177+
when loading larger database populations. Be particularly careful about
178+
monetary amounts. To assure portability, all monetary calculations are
179+
done using long integers (which hold money amounts as a number of
180+
pennies). These will need to be scaled to dollars and cents (by dividing
181+
by 100), before the values are presented to the DBMS.
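One way to do the scaling described above without going through floating point is to split the integer amount into dollars and cents. This is a minimal sketch: demo_fmt_money is a hypothetical helper, not part of load_stub.c, and it uses snprintf (C99) and does not handle LONG_MIN.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format an amount held as a whole number of cents as "dollars.cents".
 * Returns the snprintf result (length of the formatted text). */
int demo_fmt_money(char *buf, size_t len, long cents)
{
    const char *sign = (cents < 0) ? "-" : "";
    long mag = labs(cents);          /* magnitude; undefined for LONG_MIN */

    return snprintf(buf, len, "%s%ld.%02ld", sign, mag / 100, mag % 100);
}
```

For example, 63287 cents formats as "632.87". Comparing such output against the flat files produced by DBGEN is a cheap check that a direct-load routine has not changed any monetary value.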
182+
183+
The hd_XXXX routines allow header information to be written before the
184+
creation of the flat files. This should allow systems that require
185+
formatting information in database load files to use DBGEN with only
186+
a small amount of custom code.
187+
188+
qgen.c defines the translation table for query templates in the
189+
routine qsub().
190+
191+
varsub.c defines the parameter substitutions in the routine varsub().
192+
193+
If you are porting DBGEN to a machine that supports a native word
194+
size larger than 32 bits, you may wish to modify the default values for
195+
BITS_PER_LONG and MAX_LONG. These values are used in the generation of
196+
the sparse primary keys in the order and lineitem tables. The code has
197+
been structured to run on any machine supporting a 32 bit long, but
198+
may be slightly more efficient on machines that are able to make use of
199+
a larger native type.
200+
201+
7. Known Porting Problems
202+
203+
The current codeline will not compile under SunOS 4.1. Solaris 2.4 and later
204+
are supported, and anyone wishing to use DBGEN on a Sun platform is
205+
encouraged to use one of these OS releases.
206+
207+
208+
8. Reporting Problems
209+
210+
The code provided with TPC-H/TPC-R has been written to be easily portable,
211+
and has been tested on a wide variety of platforms. If you have any
212+
trouble porting the code to your platform, please help us to correct
213+
the problem in a later release by sending the following information
214+
to the TPC D subcommittee:
215+
216+
Computer Make and Model
217+
Compiler Type and Revision Number
218+
Brief Description of the problem
219+
Suggested modification to correct the problem
220+
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
l|l|sum_qty |sum_base_price |sum_disc_price |sum_charge |avg_qty |avg_price |avg_disc |count_order
2+
A|F|37734107.00|56586554400.73|53758257134.87|55909065222.83|25.52|38273.13|0.05| 1478493
3+
N|F|991417.00|1487504710.38|1413082168.05|1469649223.19|25.52|38284.47|0.05| 38854
4+
N|O|74476040.00|111701729697.74|106118230307.61|110367043872.50|25.50|38249.12|0.05| 2920374
5+
R|F|37719753.00|56568041380.90|53741292684.60|55889619119.83|25.51|38250.85|0.05| 1478870
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
c_custkey |c_name |revenue |c_acctbal |n_name |c_address |c_phone |c_comment
2+
57040|Customer#000057040 |734235.25|632.87|JAPAN |Eioyzjf4pp |22-895-641-3466|sits. slyly regular requests sleep alongside of the regular inst
3+
143347|Customer#000143347 |721002.69|2557.47|EGYPT |1aReFYv,Kw4 |14-742-935-3718|ggle carefully enticing requests. final deposits use bold, bold pinto beans. ironic, idle re
4+
60838|Customer#000060838 |679127.31|2454.77|BRAZIL |64EaJ5vMAHWJlBOxJklpNc2RJiWE |12-913-494-9813| need to boost against the slyly regular account
5+
101998|Customer#000101998 |637029.57|3790.89|UNITED KINGDOM |01c9CILnNtfOQYmZj |33-593-865-6378|ress foxes wake slyly after the bold excuses. ironic platelets are furiously carefully bold theodolites
6+
125341|Customer#000125341 |633508.09|4983.51|GERMANY |S29ODD6bceU8QSuuEJznkNaK |17-582-695-5962|arefully even depths. blithely even excuses sleep furiously. foxes use except the dependencies. ca
7+
25501|Customer#000025501 |620269.78|7725.04|ETHIOPIA | W556MXuoiaYCCZamJI,Rn0B4ACUGdkQ8DZ |15-874-808-6793|he pending instructions wake carefully at the pinto beans. regular, final instructions along the slyly fina
8+
115831|Customer#000115831 |596423.87|5098.10|FRANCE |rFeBbEEyk dl ne7zV5fDrmiq1oK09wV7pxqCgIc|16-715-386-3788|l somas sleep. furiously final deposits wake blithely regular pinto b
9+
84223|Customer#000084223 |594998.02|528.65|UNITED KINGDOM |nAVZCs6BaWap rrM27N 2qBnzc5WBauxbA |33-442-824-8191| slyly final deposits haggle regular, pending dependencies. pending escapades wake
10+
54289|Customer#000054289 |585603.39|5583.02|IRAN |vXCxoCsU0Bad5JQI ,oobkZ |20-834-292-4707|ely special foxes are quickly finally ironic p
11+
39922|Customer#000039922 |584878.11|7321.11|GERMANY |Zgy4s50l2GKN4pLDPBU8m342gIw6R |17-147-757-8036|y final requests. furiously final foxes cajole blithely special platelets. f
12+
6226|Customer#000006226 |576783.76|2230.09|UNITED KINGDOM |8gPu8,NPGkfyQQ0hcIYUGPIBWc,ybP5g, |33-657-701-3391|ending platelets along the express deposits cajole carefully final
13+
922|Customer#000000922 |576767.53|3869.25|GERMANY |Az9RFaut7NkPnc5zSD2PwHgVwr4jRzq |17-945-916-9648|luffily fluffy deposits. packages c
14+
147946|Customer#000147946 |576455.13|2030.13|ALGERIA |iANyZHjqhyy7Ajah0pTrYyhJ |10-886-956-3143|ithely ironic deposits haggle blithely ironic requests. quickly regu
15+
115640|Customer#000115640 |569341.19|6436.10|ARGENTINA |Vtgfia9qI 7EpHgecU1X |11-411-543-4901|ost slyly along the patterns; pinto be
16+
73606|Customer#000073606 |568656.86|1785.67|JAPAN |xuR0Tro5yChDfOCrjkd2ol |22-437-653-6966|he furiously regular ideas. slowly
17+
110246|Customer#000110246 |566842.98|7763.35|VIETNAM |7KzflgX MDOq7sOkI |31-943-426-9837|egular deposits serve blithely above the fl
18+
142549|Customer#000142549 |563537.24|5085.99|INDONESIA |ChqEoK43OysjdHbtKCp6dKqjNyvvi9 |19-955-562-2398|sleep pending courts. ironic deposits against the carefully unusual platelets cajole carefully express accounts.
19+
146149|Customer#000146149 |557254.99|1791.55|ROMANIA |s87fvzFQpU |29-744-164-6487| of the slyly silent accounts. quickly final accounts across the
20+
52528|Customer#000052528 |556397.35|551.79|ARGENTINA |NFztyTOR10UOJ |11-208-192-3205| deposits hinder. blithely pending asymptotes breach slyly regular re
21+
23431|Customer#000023431 |554269.54|3381.86|ROMANIA |HgiV0phqhaIa9aydNoIlb |29-915-458-2654|nusual, even instructions: furiously stealthy n
