Description
Expected Behavior
I have a nested schema for the data set and want to use withColumnSpec to set value template patterns for the attributes bankAcctId, bankProduct, storeGroup, association, merchantId, and terminalId
when generating the synthetic data.
from pyspark.sql.types import StringType, StructField, StructType

my_schema = StructType(
    [
        StructField(
            "bank",
            StructType(
                [
                    StructField("bankAcctId", StringType()),
                    StructField("bankProduct", StringType()),
                ]
            ),
        ),
        StructField(
            "merchDetails",
            StructType(
                [
                    StructField("storeGroup", StringType()),
                    StructField("association", StringType()),
                    StructField("merchantId", StringType()),
                    StructField(
                        "terminal",
                        StructType(
                            [
                                StructField("terminalId", StringType()),
                                StructField("cardholderActivatedTerm", StringType()),
                                StructField(
                                    "posInteractionTerminalEntryMode", StringType()
                                ),
                            ]
                        ),
                    ),
                ]
            ),
        ),
    ]
)
I tried the following snippet to build the synthetic data:
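import dbldatagen as dg

# Base spec built from the nested schema (spark and row_count are defined elsewhere)
testDataSpec = (
    dg.DataGenerator(spark, name="test_data_set1", rows=row_count, partitions=4)
    .withIdOutput()
    .withSchema(my_schema)
)
# Attempt to address the nested fields by dotted path; this is the call that fails
testDataSpec = (
    testDataSpec.withColumnSpec("bank.bankAcctId", template=r"\\n-\\n")
    .withColumnSpec("merchDetails.storeGroup", template=r"\\n-\\n")
)
dfTestData = testDataSpec.build()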
Execution failed with the following error:
dbldatagen.utils.DataGenError: DataGenError(msg=' column `bank.bankAcctId` must refer to defined column', baseException=None)
I am looking for some direction, or an example, on how to use withColumnSpec with nested fields.
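One possible direction, offered as an unverified sketch rather than a confirmed dbldatagen recipe: the error suggests that only top-level columns (here bank and merchDetails) are known to the spec, so each leaf could instead be generated as a flat, templated column and the nested structs assembled afterwards with Spark SQL's named_struct. The helper below only builds the expression strings and does not need Spark; the name named_struct_expr is hypothetical, and passing the result to withColumn through an expr argument (with the flat helper columns omitted from the output) is an assumption that has not been checked against version 0.3.5.

```python
def named_struct_expr(fields):
    """Build a Spark SQL named_struct(...) expression from a nested dict.

    Keys are struct field names. A value is either the name of a flat
    source column (str) or another dict describing a nested struct.
    This helper is illustrative and not part of dbldatagen.
    """
    parts = []
    for name, source in fields.items():
        # Recurse for nested structs, otherwise reference the flat column by name
        value = named_struct_expr(source) if isinstance(source, dict) else source
        parts.append(f"'{name}', {value}")
    return "named_struct(" + ", ".join(parts) + ")"


# Expressions for the two top-level structs; only a subset of leaves is shown
bank_expr = named_struct_expr(
    {"bankAcctId": "bankAcctId", "bankProduct": "bankProduct"}
)
merch_expr = named_struct_expr(
    {
        "storeGroup": "storeGroup",
        "association": "association",
        "merchantId": "merchantId",
        "terminal": {"terminalId": "terminalId"},
    }
)

print(bank_expr)
print(merch_expr)
```

Under that assumption, bankAcctId, storeGroup, and the other leaves would each be declared as flat string columns with their own template, and bank_expr and merch_expr would define the struct columns that reference them.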
Your Environment
Running on a Mac M1 Pro (macOS Ventura 13.5)
dbldatagen version used: 0.3.5