
labs.run_model_bpa_bulk() does not create the report #836

@MatheusFelicio08

Description


Describe the bug
labs.run_model_bpa_bulk(workspace='Workspace_name')

  • The function does not complete execution; it stops while finishing adding data to the lakehouse.
  • Used Python 3.11 in a Power BI notebook; PySpark did not work at all.

To Reproduce
Version: 0.12.0


1- Run %pip install semantic-link-labs
2- Import the library


3- Run labs.run_model_bpa_bulk(workspace='Operação Cloud Dados')

Error:

ValueError Traceback (most recent call last)
Cell In[6], line 1
----> 1 labs.run_model_bpa_bulk(workspace='Operação Cloud Dados')

File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy/_utils/_log.py:371, in mds_log.&lt;locals&gt;.get_wrapper.&lt;locals&gt;.log_decorator_wrapper(*args, **kwargs)
368 start_time = time.perf_counter()
370 try:
--> 371 result = func(*args, **kwargs)
373 # The invocation for get_message_dict moves after the function
374 # so it can access the state after the method call
375 message.update(extractor.get_completion_message_dict(result, arg_dict))

File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy_labs/model_bpa_bulk.py:178, in run_model_bpa_bulk(rules, extended, language, workspace, skip_models, skip_models_in_workspace)
169 print(
170 f"{icons.in_progress} Saving the Model BPA results of the '{wksp}' workspace to the '{output_table}' within the lakehouse attached to this notebook..."
171 )
173 schema = {
174     key.replace(" ", "_"): value
175     for key, value in icons.bpa_schema.items()
176 }
--> 178 save_as_delta_table(
179 dataframe=df,
180 delta_table_name=output_table,
181 write_mode="append",
182 schema=schema,
183 merge_schema=True,
184 )
185 print(
186 f"{icons.green_dot} Saved BPA results to the '{output_table}' delta table."
187 )
189 print(f"{icons.green_dot} Bulk BPA scan complete.")

File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy/_utils/_log.py:371, in mds_log.&lt;locals&gt;.get_wrapper.&lt;locals&gt;.log_decorator_wrapper(*args, **kwargs)
368 start_time = time.perf_counter()
370 try:
--> 371 result = func(*args, **kwargs)
373 # The invocation for get_message_dict moves after the function
374 # so it can access the state after the method call
375 message.update(extractor.get_completion_message_dict(result, arg_dict))

File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy_labs/_helper_functions.py:1003, in save_as_delta_table(dataframe, delta_table_name, write_mode, merge_schema, schema, lakehouse, workspace)
1000 if merge_schema:
1001 write_args["schema_mode"] = "merge"
-> 1003 write_deltalake(**write_args)
1004 else:
1005 writer = spark_df.write.mode(write_mode).format("delta")

File ~/jupyter-env/python3.11/lib/python3.11/site-packages/deltalake/writer.py:326, in write_deltalake(table_or_uri, data, schema, partition_by, mode, file_options, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group, name, description, configuration, schema_mode, storage_options, partition_filters, predicate, large_dtypes, engine, writer_properties, custom_metadata)
324 elif engine == "pyarrow":
325 if schema_mode == "merge":
--> 326 raise ValueError(
327 "schema_mode 'merge' is not supported in pyarrow engine. Use engine=rust"
328 )
329 # We need to write against the latest table version
331 num_indexed_cols, stats_cols = get_num_idx_cols_and_stats_columns(
332 table._table if table is not None else None, configuration
333 )

ValueError: schema_mode 'merge' is not supported in pyarrow engine. Use engine=rust

Steps to fix the behavior:
If I run this code in the notebook, the step completes successfully:

file_path = "/home/trusted-service-user/jupyter-env/python3.11/lib/python3.11/site-packages/sempy_labs/_helper_functions.py"

with open(file_path, "r") as file:
    lines = file.readlines()

with open(file_path, "w") as file:
    for line in lines:
        file.write(line)
        if 'write_args["schema_mode"] = "merge"' in line:
            # Insert the engine override right after the schema_mode line,
            # matching its indentation so the patched file stays valid Python
            indent = line[: len(line) - len(line.lstrip())]
            file.write(indent + 'write_args["engine"] = "rust"\n')

# Restart the kernel and run all the commands again.
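Editing the installed package file works, but the change is lost on every reinstall. A less invasive variant of the same workaround is to monkeypatch the writer at runtime so it always receives engine="rust". The helper below is a generic sketch; the patch target (write_deltalake imported into sempy_labs._helper_functions) is an assumption based on the traceback and should be verified against the installed version:

```python
import functools

def force_kwargs(func, **forced):
    """Wrap func so the given keyword arguments are always injected,
    overriding whatever the caller passed for the same keys."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        kwargs.update(forced)  # the forced values win
        return func(*args, **kwargs)
    return wrapper

# Hypothetical usage (assumes sempy_labs._helper_functions imports
# write_deltalake at module level; check your installed version):
#
#   import sempy_labs._helper_functions as hf
#   hf.write_deltalake = force_kwargs(hf.write_deltalake, engine="rust")
```

Because the patch lives in the notebook session, it does not rewrite site-packages and disappears on kernel restart.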

Expected behavior
The function should run without additional scripts.

Additional Steps
After running the "fix" script above, the next step would be labs.create_model_bpa_semantic_model().

But it raises this error:

🟢 The 'ModelBPA' semantic model was created within the 'Operação Cloud Dados' workspace.
🟢 The 'BPAResults' table has been added to the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'modelbparesults' partition has been added to the 'BPAResults' table in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Capacity_Name' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Capacity_Id' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Workspace_Name' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Workspace_Id' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Dataset_Name' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Dataset_Id' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Configured_By' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Rule_Name' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Category' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Severity' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Object_Type' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Object_Name' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Description' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'URL' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'RunId' column has been added to the 'BPAResults' table as a 'Int64' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.

AttributeError Traceback (most recent call last)
Cell In[6], line 1
----> 1 labs.create_model_bpa_semantic_model()

File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy/_utils/_log.py:371, in mds_log.&lt;locals&gt;.get_wrapper.&lt;locals&gt;.log_decorator_wrapper(*args, **kwargs)
368 start_time = time.perf_counter()
370 try:
--> 371 result = func(*args, **kwargs)
373 # The invocation for get_message_dict moves after the function
374 # so it can access the state after the method call
375 message.update(extractor.get_completion_message_dict(result, arg_dict))

File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy_labs/_model_bpa_bulk.py:268, in create_model_bpa_semantic_model(dataset, lakehouse, lakehouse_workspace)
266 table_exists = True
267 if not table_exists:
--> 268 add_table_to_direct_lake_semantic_model(
269 dataset=dataset,
270 table_name=t_name,
271 lakehouse_table_name="modelbparesults",
272 workspace=lakehouse_workspace_id,
273 refresh=False,
274 )
275 with connect_semantic_model(
276 dataset=dataset, readonly=False, workspace=lakehouse_workspace_id
277 ) as tom:
278 # Fix column names
279 for c in tom.all_columns():

File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy/_utils/_log.py:371, in mds_log.&lt;locals&gt;.get_wrapper.&lt;locals&gt;.log_decorator_wrapper(*args, **kwargs)
368 start_time = time.perf_counter()
370 try:
--> 371 result = func(*args, **kwargs)
373 # The invocation for get_message_dict moves after the function
374 # so it can access the state after the method call
375 message.update(extractor.get_completion_message_dict(result, arg_dict))

File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy_labs/directlake/_update_directlake_partition_entity.py:214, in add_table_to_direct_lake_semantic_model(dataset, table_name, lakehouse_table_name, refresh, workspace)
212 dType = r["Data Type"]
213 dt = _convert_data_type(dType)
--> 214 tom.add_data_column(
215 table_name=table_name,
216 column_name=lakeCName,
217 source_column=lakeCName,
218 data_type=dt,
219 )
220 print(
221 f"{icons.green_dot} The '{lakeCName}' column has been added to the '{table_name}' table as a '{dt}' data type in the '{dataset_name}' semantic model within the '{workspace_name}' workspace."
222 )
224 if refresh:

File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy_labs/tom/_model.py:562, in TOMWrapper.add_data_column(self, table_name, column_name, source_column, data_type, format_string, hidden, description, display_folder, data_category, key, summarize_by, lineage_tag, source_lineage_tag)
558 import Microsoft.AnalysisServices.Tabular as TOM
559 import System
561 data_type = (
--> 562 data_type.capitalize()
563 .replace("Integer", "Int64")
564 .replace("Datetime", "DateTime")
565 )
566 if summarize_by is None:
567 summarize_by = "Default"

AttributeError: 'NoneType' object has no attribute 'capitalize'
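The AttributeError suggests that _convert_data_type returned None for a Delta type it did not recognize, most likely the Timestamp column's type, which is why adding that column manually unblocks the run. As an illustration only (the type names and the fallback below are assumptions, not sempy_labs' actual mapping), a defensive version of such a conversion would return a fallback instead of None:

```python
# Illustrative sketch of a defensive Delta-to-Tabular type mapping.
# The keys and the "String" fallback are assumptions for demonstration.
_TYPE_MAP = {
    "string": "String",
    "bigint": "Int64",
    "int": "Int64",
    "double": "Double",
    "boolean": "Boolean",
    "timestamp": "DateTime",
}

def convert_data_type(delta_type: str) -> str:
    # Returning a fallback instead of None avoids the later crash
    # at data_type.capitalize()
    return _TYPE_MAP.get(delta_type.lower(), "String")
```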

I suspect that not all of the data is being created in the lakehouse table.

It finishes if I run this fix:

from sempy_labs.tom import connect_semantic_model

# Parameters for your environment
dataset_name = "ModelBPA"
workspace_id = "4751DEF9-98CA-4A6F-AE71-C7910959740E"  # replace with the real ID
table_name = "BPAResults"
column_name = "Timestamp"

# Isolated test to add the Timestamp column
with connect_semantic_model(dataset=dataset_name, readonly=False, workspace=workspace_id) as tom:
    try:
        # Force an explicit data type for the Timestamp column
        tom.add_data_column(
            table_name=table_name,
            column_name=column_name,
            source_column=column_name,
            data_type="String",
        )
        print(f"✅ Column '{column_name}' was added successfully as 'String'.")
    except Exception as e:
        print(f"❌ Error adding column '{column_name}': {e}")

But the report shows no data, as if fields were missing.

Desktop (please complete the following information):

  • OS: Windows 11 Pro
  • Browser: Chrome
  • Version: 140

