Commit 51ba870

Merge branch 'master' into feature_streaming_enhancments
2 parents 5f51db1 + 93caee4 commit 51ba870

13 files changed: +54 -27 lines changed
.github/workflows/onrelease.yml

Lines changed: 3 additions & 0 deletions
@@ -30,6 +30,9 @@ jobs:
 - name: Install
   run: pip install pipenv

+- name: Install dependencies
+  run: pipenv install --dev
+
 - name: Build dist
   run: pipenv run python setup.py sdist bdist_wheel

.github/workflows/push.yml

Lines changed: 3 additions & 0 deletions
@@ -37,6 +37,9 @@ jobs:
 - name: Install
   run: pip install pipenv

+- name: Install dependencies
+  run: pipenv install --dev
+
 - name: Run tests
   run: make test

CHANGELOG.md

Lines changed: 4 additions & 2 deletions
@@ -3,7 +3,7 @@
 ## Change History
 All notable changes to the Databricks Labs Data Generator will be documented in this file.

-### Unreleased
+### Version 0.3.2

 #### Changed
 * Additional migration of tests to use of `pytest`
@@ -14,6 +14,7 @@ All notable changes to the Databricks Labs Data Generator will be documented in
 * Changed build labelling to comply with PEP440

 #### Fixed
+* Fixed compatibility of build with older versions of runtime that rely on `pyparsing` version 2.4.7

 #### Added
 * Added support for additional streaming source types and for use of custom streaming sources
@@ -22,7 +23,8 @@ All notable changes to the Databricks Labs Data Generator will be documented in
 * Parsing of SQL expressions to determine column dependencies

 #### Notes
-* This does not change actual order of column building - but adjusts which phase columns are built in
+* The enhancements to build ordering does not change actual order of column building -
+but adjusts which phase columns are built in


 ### Version 0.3.1

CONTRIBUTING.md

Lines changed: 2 additions & 1 deletion
@@ -14,7 +14,8 @@ warrant that you have the legal authority to do so.
 # Building the code

 ## Package Dependencies
-See the contents of the file `python/require.txt` to see the Python package dependencies
+See the contents of the file `python/require.txt` to see the Python package dependencies.
+Dependent packages are not installed automatically by the `dbldatagen` package.

 ## Python compatibility

Pipfile

Lines changed: 11 additions & 8 deletions
@@ -6,13 +6,6 @@ verify_ssl = true
 [dev-packages]
 pytest = "*"
 pytest-cov = "*"
-
-numpy = "1.22.0"
-pyspark = "3.1.3"
-pyarrow = "1.0.1"
-pandas = "1.1.3"
-pyparsing = ">=2.4.7,<3.0.9"
-
 sphinx = ">=2.0.0,<3.1.0"
 nbsphinx = "*"
 numpydoc = "0.8"
@@ -21,6 +14,16 @@ ipython = "7.31.1"
 pydata-sphinx-theme = "*"
 recommonmark = "*"
 sphinx-markdown-builder = "*"
+bumpversion = "*"
+
+[packages]
+numpy = "==1.22.0"
+pyspark = "==3.1.3"
+pyarrow = "==4.0.1"
+wheel = "==0.38.4"
+pandas = "==1.2.4"
+setuptools = "==65.6.3"
+pyparsing = "==2.4.7"

 [requires]
-python_version = "3.8"
+python_version = ">=3.8.10"

README.md

Lines changed: 16 additions & 1 deletion
@@ -60,7 +60,7 @@ details of use and many examples.

 Release notes and details of the latest changes for this specific release
 can be found in the Github repository
-[here](https://github.com/databrickslabs/dbldatagen/blob/release/v0.3.2a0/CHANGELOG.md)
+[here](https://github.com/databrickslabs/dbldatagen/blob/release/v0.3.2/CHANGELOG.md)

 # Installation

@@ -126,6 +126,21 @@ examples.

 The Github repository also contains further examples in the examples directory

+## Spark and Databricks Runtime Compatibility
+The `dbldatagen` package is intended to be compatible with recent LTS versions of the Databricks runtime including
+older LTS versions at least from 10.4 LTS and later. It also aims to be compatible with Delta Live Table runtimes
+including `current` and `preview`.
+
+While we dont specifically drop support for older runtimes, changes in Pyspark APIs or
+APIs from dependent packages such as `numpy`, `pandas`, `pyarrow` and `pyparsing` make cause issues with older
+runtimes.
+
+Installing `dbldatagen` explicitly does not install releases of dependent packages so as to preserve the curated
+set of packages installed in any Databricks runtime environment.
+
+When building on local environments, the `Pipfile` and requirements files are used to determine the versions
+tested against for releases and unit tests.
+
 ## Project Support
 Please note that all projects released under [`Databricks Labs`](https://www.databricks.com/learn/labs)
 are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements

dbldatagen/_version.py

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ def get_version(version):
     return version_info


-__version__ = "0.3.2a0" # DO NOT EDIT THIS DIRECTLY! It is managed by bumpversion
+__version__ = "0.3.2" # DO NOT EDIT THIS DIRECTLY! It is managed by bumpversion
 __version_info__ = get_version(__version__)

dbldatagen/schema_parser.py

Lines changed: 4 additions & 4 deletions
@@ -273,18 +273,18 @@ def _cleanseSQL(cls, sql_string):

         # skip over quoted identifiers even if they contain quotes
         quoted_ident = pp.QuotedString(quoteChar="`", escQuote="``")
-        quoted_ident.set_parse_action(lambda s, loc, toks: f"`{toks[0]}`")
+        quoted_ident.setParseAction(lambda s, loc, toks: f"`{toks[0]}`")

         stringForm1 = pp.Literal('r') + pp.QuotedString(quoteChar="'")
         stringForm2 = pp.Literal('r') + pp.QuotedString(quoteChar='"')
         stringForm3 = pp.QuotedString(quoteChar="'", escQuote=r"\'")
         stringForm4 = pp.QuotedString(quoteChar='"', escQuote=r'\"')
         stringForm = stringForm1 ^ stringForm2 ^ stringForm3 ^ stringForm4
-        stringForm.set_parse_action(lambda s, loc, toks: "' '")
+        stringForm.setParseAction(lambda s, loc, toks: "' '")

         parser = quoted_ident ^ stringForm

-        transformed_string = parser.transform_string(sql_string)
+        transformed_string = parser.transformString(sql_string)

         return transformed_string

@@ -312,7 +312,7 @@ def columnsReferencesFromSQLString(cls, sql_string, filter=None):
         ident = pp.Word(pp.alphas, pp.alphanums + "_") | pp.QuotedString(quoteChar="`", escQuote="``")
         parser = ident

-        references = parser.search_string(cleansed_sql_string)
+        references = parser.searchString(cleansed_sql_string)

         results = set([item for sublist in references for item in sublist])
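
Note on the renames above: the snake_case method names were introduced in pyparsing 3.0, so they are not available in pyparsing 2.4.7, while the camelCase names exist in 2.4.7 and remain as synonyms in the 3.0.x releases. This commit simply uses the camelCase names. A different approach, not the one taken here, would be to resolve the method name at call time. The sketch below illustrates that idea with a stand-in class so that it runs without pyparsing installed; `resolve_method` and `LegacyParser` are hypothetical names invented for this illustration.

```python
def resolve_method(obj, *names):
    """Return the first callable attribute of obj found among the given names."""
    for name in names:
        method = getattr(obj, name, None)
        if callable(method):
            return method
    raise AttributeError(f"{type(obj).__name__} has none of: {', '.join(names)}")


class LegacyParser:
    """Hypothetical stand-in for a parser that only exposes a camelCase method."""

    def transformString(self, text):
        # Placeholder behaviour for the demo; a real parser would rewrite tokens.
        return text.upper()


parser = LegacyParser()

# Prefer the newer snake_case name, fall back to the camelCase one.
transform = resolve_method(parser, "transform_string", "transformString")
result = transform("select a from t")
print(result)
```

Calling the camelCase names directly, as the commit does, is simpler when the oldest supported release is the constraint; the lookup above only pays off if both naming schemes must be supported without relying on the synonyms.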

docs/source/conf.py

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@
 author = 'Databricks Inc'

 # The full version, including alpha/beta/rc tags
-release = "0.3.2a0" # DO NOT EDIT THIS DIRECTLY! It is managed by bumpversion
+release = "0.3.2" # DO NOT EDIT THIS DIRECTLY! It is managed by bumpversion


 # -- General configuration ---------------------------------------------------

python/.bumpversion.cfg

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.3.2a0
+current_version = 0.3.2
 commit = False
 tag = False
 parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+){0,1}(?P<release>\D*)(?P<build>\d*)
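
To see how the `parse` pattern above splits the old and new version strings, the following sketch applies it with Python's `re` module. The pattern is copied from the config line; the helper name `split_version` and the use of `re.fullmatch` are assumptions made for this illustration and say nothing about how bumpversion applies the pattern internally.

```python
import re

# Pattern copied verbatim from the `parse` setting in python/.bumpversion.cfg.
PARSE = re.compile(
    r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+){0,1}(?P<release>\D*)(?P<build>\d*)"
)


def split_version(version):
    """Return the named parts of a version string (hypothetical helper)."""
    match = PARSE.fullmatch(version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return match.groupdict()


old = split_version("0.3.2a0")  # pre-release label before this commit
new = split_version("0.3.2")    # release label after this commit

print(old)
print(new)
```

For `0.3.2a0` the `release` and `build` groups capture `a` and `0`; for `0.3.2` both groups match the empty string, which is what distinguishes the final release from the alpha build.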

python/dev_require.txt

Lines changed: 4 additions & 4 deletions
@@ -1,14 +1,14 @@
 # The following packages are used in building the test data generator framework.
 # All packages used are already installed in the Databricks runtime environment for version 6.5 or later
-numpy==1.19.2
+numpy==1.22.0
 pandas==1.2.4
 pickleshare==0.7.5
 py4j==0.10.9
-pyarrow==4.0.0
-pyspark>=3.1.2
+pyarrow==4.0.1
+pyspark>=3.1.3
 python-dateutil==2.8.1
 six==1.15.0
-pyparsing>=2.4.7, <= 3.0.9
+pyparsing==2.4.7

 # The following packages are required for development only
 wheel==0.36.2

python/require.txt

Lines changed: 3 additions & 3 deletions
@@ -4,11 +4,11 @@ numpy==1.22.0
 pandas==1.2.5
 pickleshare==0.7.5
 py4j==0.10.9
-pyarrow==4.0.0
-pyspark>=3.1.2
+pyarrow==4.0.1
+pyspark>=3.1.3
 python-dateutil==2.8.1
 six==1.15.0
-pyparsing>=2.4.7, <= 3.0.9
+pyparsing==2.4.7

 # The following packages are required for development only
 wheel==0.36.2
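
Because the README text added in this commit says dependent packages are not installed by `dbldatagen` itself, it can be useful to check a local environment against the pins shown above. The sketch below is one possible way to do that with the standard library only; the helper names (`parse_pins`, `installed_version`, `exact_pin_mismatches`) are hypothetical, the parser reads only the first specifier on each line, and it is not part of the repository.

```python
import re
from importlib import metadata

# Matches "name<op>version"; only the first specifier on a line is read.
PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(==|>=|<=|~=|!=|<|>)\s*([^\s,#]+)")


def parse_pins(text):
    """Map package name -> (operator, version) for each requirement line."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        match = PIN.match(line)
        if match:
            pins[match.group(1).lower()] = (match.group(2), match.group(3))
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def exact_pin_mismatches(pins):
    """Return {name: (wanted, found)} for '==' pins that differ from the environment."""
    report = {}
    for name, (op, wanted) in pins.items():
        if op != "==":
            continue  # range specifiers are out of scope for this sketch
        found = installed_version(name)
        if found != wanted:
            report[name] = (wanted, found)
    return report


# Excerpt of the pins from python/require.txt after this commit.
SAMPLE = """\
pyarrow==4.0.1
pyspark>=3.1.3
pyparsing==2.4.7
"""

pins = parse_pins(SAMPLE)
print(pins)
print(exact_pin_mismatches(pins))
```

The mismatch report depends on whatever is installed locally, so its contents will vary between environments.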

setup.py

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@

 setuptools.setup(
     name="dbldatagen",
-    version="0.3.2a0",
+    version="0.3.2",
     author="Ronan Stokes, Databricks",
     description="Databricks Labs - PySpark Synthetic Data Generator",
     long_description=long_description,
