How to set template and min,max value for a nested schema attribute #229

Open
@galaxy79

Description

Expected Behavior

I have a nested schema for my data set and want to set value template patterns for the attributes bankAcctId, bankProduct, storeGroup, association, merchantId, and terminalId using withColumnSpec to generate synthetic data.

from pyspark.sql.types import StringType, StructField, StructType

my_schema = StructType(
    [
        StructField(
            "bank",
            StructType(
                [
                    StructField("bankAcctId", StringType()),
                    StructField("bankProduct", StringType()),
                ]
            ),
        ),
        StructField(
            "merchDetails",
            StructType(
                [
                    StructField("storeGroup", StringType()),
                    StructField("association", StringType()),
                    StructField("merchantId", StringType()),
                    StructField(
                        "terminal",
                        StructType(
                            [
                                StructField("terminalId", StringType()),
                                StructField("cardholderActivatedTerm", StringType()),
                                StructField(
                                    "posInteractionTerminalEntryMode", StringType()
                                ),
                            ]
                        ),
                    ),
                ]
            ),
        ),
    ]
)

I tried the code snippet below to build the synthetic data:

import dbldatagen as dg

# row_count is assumed to be defined earlier (not shown in this snippet)
testDataSpec = (
    dg.DataGenerator(spark, name="test_data_set1", rows=row_count, partitions=4)
    .withIdOutput()
    .withSchema(my_schema)
)

testDataSpec = (
    testDataSpec.withColumnSpec("bank.bankAcctId", template=r"\\n-\\n")
    .withColumnSpec("merchDetails.storeGroup", template=r"\\n-\\n")
)
dfTestData = testDataSpec.build()

The code failed with the following error:

dbldatagen.utils.DataGenError: DataGenError(msg=' column `bank.bankAcctId` must refer to defined column', baseException=None)

I am looking for some direction or an example of how to do this.
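
One possible direction, sketched under assumptions rather than confirmed against dbldatagen 0.3.5: the error suggests that `withColumnSpec` only resolves top-level column names (`bank`, `merchDetails`), so a dotted path such as `bank.bankAcctId` is not found. A workaround that may apply is to generate the leaf values as flat columns marked `omit=True` and then assemble each struct with a SQL `named_struct` expression. The helper names `named_struct_expr`, `leaf_names`, and `build_spec`, and the `BANK_FIELDS` / `MERCH_FIELDS` mappings, are hypothetical. The template string is copied unchanged from the snippet above, and applying it to every leaf column is only for illustration.

```python
# Maps each struct field to the flat column that supplies it;
# a nested dict stands for a nested struct.
BANK_FIELDS = {"bankAcctId": "bankAcctId", "bankProduct": "bankProduct"}
MERCH_FIELDS = {
    "storeGroup": "storeGroup",
    "association": "association",
    "merchantId": "merchantId",
    "terminal": {
        "terminalId": "terminalId",
        "cardholderActivatedTerm": "cardholderActivatedTerm",
        "posInteractionTerminalEntryMode": "posInteractionTerminalEntryMode",
    },
}


def named_struct_expr(fields):
    """Build a Spark SQL named_struct(...) expression, recursing into nested dicts."""
    parts = []
    for name, source in fields.items():
        value = named_struct_expr(source) if isinstance(source, dict) else source
        parts.append(f"'{name}', {value}")
    return "named_struct(" + ", ".join(parts) + ")"


def leaf_names(fields):
    """Return the flat source column names referenced by a field mapping."""
    names = []
    for source in fields.values():
        names.extend(leaf_names(source) if isinstance(source, dict) else [source])
    return names


def build_spec(spark, my_schema, row_count):
    """Sketch: flat, omitted leaf columns plus struct columns built via expr."""
    import dbldatagen as dg  # deferred so the helpers above need no Spark install

    spec = dg.DataGenerator(
        spark, name="test_data_set1", rows=row_count, partitions=4
    ).withIdOutput()

    # Leaf columns carry the template and are dropped from the final output.
    for col in leaf_names(BANK_FIELDS) + leaf_names(MERCH_FIELDS):
        spec = spec.withColumn(col, "string", template=r"\\n-\\n", omit=True)

    # Struct columns reuse the types from my_schema instead of withSchema().
    for name, fields in (("bank", BANK_FIELDS), ("merchDetails", MERCH_FIELDS)):
        spec = spec.withColumn(
            name,
            my_schema[name].dataType,
            expr=named_struct_expr(fields),
            baseColumn=leaf_names(fields),
        )
    return spec
```

With this sketch, `build_spec(spark, my_schema, row_count).build()` would be expected to return a DataFrame with `bank` and `merchDetails` struct columns. That outcome is an assumption and has not been verified on version 0.3.5.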

Your Environment

Running on a Mac with an M1 Pro (macOS Ventura 13.5)

  • dbldatagen version used: 0.3.5

Metadata

Labels

documentation: Improvements or additions to documentation
