Describe the bug
labs.run_model_bpa_bulk(workspace='Workspace_name')
- The function does not complete execution; it stops while saving the results to the lakehouse.
- Used Python 3.11 on a Power BI notebook; PySpark did not work at all.
To Reproduce
Version 0.12.0
1. Run %pip install semantic-link-labs
2. Import the library
3. Run labs.run_model_bpa_bulk(workspace='Operação Cloud Dados')
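For convenience, the three steps as one notebook cell (the alias import sempy_labs as labs in step 2 and the version pin are my assumptions based on the calls in this report):

    # Consolidated repro as a single notebook cell
    %pip install semantic-link-labs==0.12.0
    import sempy_labs as labs

    labs.run_model_bpa_bulk(workspace='Operação Cloud Dados')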
Error:
ValueError Traceback (most recent call last)
Cell In[6], line 1
----> 1 labs.run_model_bpa_bulk(workspace='Operação Cloud Dados')
File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy/_utils/_log.py:371, in mds_log..get_wrapper..log_decorator_wrapper(*args, **kwargs)
368 start_time = time.perf_counter()
370 try:
--> 371 result = func(*args, **kwargs)
373 # The invocation for get_message_dict moves after the function
374 # so it can access the state after the method call
375 message.update(extractor.get_completion_message_dict(result, arg_dict))
File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy_labs/model_bpa_bulk.py:178, in run_model_bpa_bulk(rules, extended, language, workspace, skip_models, skip_models_in_workspace)
169 print(
170 f"{icons.in_progress} Saving the Model BPA results of the '{wksp}' workspace to the '{output_table}' within the lakehouse attached to this notebook..."
171 )
173 schema = {
174 key.replace(" ", ""): value
175 for key, value in icons.bpa_schema.items()
176 }
--> 178 save_as_delta_table(
179 dataframe=df,
180 delta_table_name=output_table,
181 write_mode="append",
182 schema=schema,
183 merge_schema=True,
184 )
185 print(
186 f"{icons.green_dot} Saved BPA results to the '{output_table}' delta table."
187 )
189 print(f"{icons.green_dot} Bulk BPA scan complete.")
File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy/_utils/_log.py:371, in mds_log..get_wrapper..log_decorator_wrapper(*args, **kwargs)
368 start_time = time.perf_counter()
370 try:
--> 371 result = func(*args, **kwargs)
373 # The invocation for get_message_dict moves after the function
374 # so it can access the state after the method call
375 message.update(extractor.get_completion_message_dict(result, arg_dict))
File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy_labs/_helper_functions.py:1003, in save_as_delta_table(dataframe, delta_table_name, write_mode, merge_schema, schema, lakehouse, workspace)
1000 if merge_schema:
1001 write_args["schema_mode"] = "merge"
-> 1003 write_deltalake(**write_args)
1004 else:
1005 writer = spark_df.write.mode(write_mode).format("delta")
File ~/jupyter-env/python3.11/lib/python3.11/site-packages/deltalake/writer.py:326, in write_deltalake(table_or_uri, data, schema, partition_by, mode, file_options, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group, name, description, configuration, schema_mode, storage_options, partition_filters, predicate, large_dtypes, engine, writer_properties, custom_metadata)
324 elif engine == "pyarrow":
325 if schema_mode == "merge":
--> 326 raise ValueError(
327 "schema_mode 'merge' is not supported in pyarrow engine. Use engine=rust"
328 )
329 # We need to write against the latest table version
331 num_indexed_cols, stats_cols = get_num_idx_cols_and_stats_columns(
332 table._table if table is not None else None, configuration
333 )
ValueError: schema_mode 'merge' is not supported in pyarrow engine. Use engine=rust
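For reference, a minimal sketch of the deltalake call the error points at: per the write_deltalake signature shown in the traceback, schema_mode="merge" is only accepted together with engine="rust" (the table path and data below are placeholders, not the library's actual values):

    import pandas as pd
    from deltalake import write_deltalake

    df = pd.DataFrame({"Rule_Name": ["Example"], "Severity": ["Warning"]})  # placeholder data

    # The default pyarrow engine raises the ValueError above when
    # schema_mode="merge"; requesting the rust engine avoids it.
    write_deltalake(
        "/lakehouse/default/Tables/modelbparesults",  # hypothetical table path
        df,
        mode="append",
        schema_mode="merge",
        engine="rust",
    )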
Steps to reproduce/Fix the behavior:
If I run this code in the notebook, the step completes successfully:
    file_path = "/home/trusted-service-user/jupyter-env/python3.11/lib/python3.11/site-packages/sempy_labs/_helper_functions.py"

    with open(file_path, "r") as file:
        lines = file.readlines()

    with open(file_path, "w") as file:
        for line in lines:
            file.write(line)
            if 'write_args["schema_mode"] = "merge"' in line:
                # Insert the engine override right after the matched line,
                # reusing its indentation so the patched file stays valid Python.
                indent = line[: len(line) - len(line.lstrip())]
                file.write(indent + 'write_args["engine"] = "rust"\n')

    # Restart the kernel and run all the commands again.
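A less invasive alternative (an untested sketch) would be to monkey-patch the writer at runtime instead of rewriting the installed file, assuming _helper_functions imports write_deltalake by name, as the traceback suggests:

    import deltalake
    import sempy_labs._helper_functions as hf

    _original_write = deltalake.write_deltalake

    def _write_with_rust(*args, **kwargs):
        # Force the rust engine whenever a schema merge is requested.
        if kwargs.get("schema_mode") == "merge":
            kwargs.setdefault("engine", "rust")
        return _original_write(*args, **kwargs)

    # _helper_functions.py calls write_deltalake(**write_args) directly, so
    # rebinding the name in that module redirects the call (assumption).
    hf.write_deltalake = _write_with_rust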
Expected behavior
The function runs without requiring additional scripts.
Additional Steps
After applying the patch above, the next step is labs.create_model_bpa_semantic_model(), but it fails with an error:
🟢 The 'ModelBPA' semantic model was created within the 'Operação Cloud Dados' workspace.
🟢 The 'BPAResults' table has been added to the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'modelbparesults' partition has been added to the 'BPAResults' table in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Capacity_Name' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Capacity_Id' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Workspace_Name' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Workspace_Id' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Dataset_Name' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Dataset_Id' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Configured_By' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Rule_Name' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Category' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Severity' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Object_Type' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Object_Name' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'Description' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'URL' column has been added to the 'BPAResults' table as a 'String' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
🟢 The 'RunId' column has been added to the 'BPAResults' table as a 'Int64' data type in the 'ModelBPA' semantic model within the 'Operação Cloud Dados' workspace.
AttributeError Traceback (most recent call last)
Cell In[6], line 1
----> 1 labs.create_model_bpa_semantic_model()
File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy/_utils/_log.py:371, in mds_log..get_wrapper..log_decorator_wrapper(*args, **kwargs)
368 start_time = time.perf_counter()
370 try:
--> 371 result = func(*args, **kwargs)
373 # The invocation for get_message_dict moves after the function
374 # so it can access the state after the method call
375 message.update(extractor.get_completion_message_dict(result, arg_dict))
File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy_labs/_model_bpa_bulk.py:268, in create_model_bpa_semantic_model(dataset, lakehouse, lakehouse_workspace)
266 table_exists = True
267 if not table_exists:
--> 268 add_table_to_direct_lake_semantic_model(
269 dataset=dataset,
270 table_name=t_name,
271 lakehouse_table_name="modelbparesults",
272 workspace=lakehouse_workspace_id,
273 refresh=False,
274 )
275 with connect_semantic_model(
276 dataset=dataset, readonly=False, workspace=lakehouse_workspace_id
277 ) as tom:
278 # Fix column names
279 for c in tom.all_columns():
File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy/_utils/_log.py:371, in mds_log..get_wrapper..log_decorator_wrapper(*args, **kwargs)
368 start_time = time.perf_counter()
370 try:
--> 371 result = func(*args, **kwargs)
373 # The invocation for get_message_dict moves after the function
374 # so it can access the state after the method call
375 message.update(extractor.get_completion_message_dict(result, arg_dict))
File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy_labs/directlake/_update_directlake_partition_entity.py:214, in add_table_to_direct_lake_semantic_model(dataset, table_name, lakehouse_table_name, refresh, workspace)
212 dType = r["Data Type"]
213 dt = _convert_data_type(dType)
--> 214 tom.add_data_column(
215 table_name=table_name,
216 column_name=lakeCName,
217 source_column=lakeCName,
218 data_type=dt,
219 )
220 print(
221 f"{icons.green_dot} The '{lakeCName}' column has been added to the '{table_name}' table as a '{dt}' data type in the '{dataset_name}' semantic model within the '{workspace_name}' workspace."
222 )
224 if refresh:
File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy_labs/tom/_model.py:562, in TOMWrapper.add_data_column(self, table_name, column_name, source_column, data_type, format_string, hidden, description, display_folder, data_category, key, summarize_by, lineage_tag, source_lineage_tag)
558 import Microsoft.AnalysisServices.Tabular as TOM
559 import System
561 data_type = (
--> 562 data_type.capitalize()
563 .replace("Integer", "Int64")
564 .replace("Datetime", "DateTime")
565 )
566 if summarize_by is None:
567 summarize_by = "Default"
AttributeError: 'NoneType' object has no attribute 'capitalize'
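My reading of this traceback (not verified): _convert_data_type returned None for at least one column type, most likely the Timestamp column, and add_data_column immediately calls .capitalize() on that result. A two-line illustration of the failure mode:

    # What add_data_column effectively does when _convert_data_type returns None
    data_type = None
    data_type.capitalize()  # AttributeError: 'NoneType' object has no attribute 'capitalize'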
I suspect not all the data is being created in the lakehouse table. The step finishes if I run this fix:
    from sempy_labs.tom import connect_semantic_model

    # Parameters for your environment
    dataset_name = "ModelBPA"
    workspace_id = "4751DEF9-98CA-4A6F-AE71-C7910959740E"  # replace with the real ID
    table_name = "BPAResults"
    column_name = "Timestamp"

    # Isolated test to add the Timestamp column
    with connect_semantic_model(dataset=dataset_name, readonly=False, workspace=workspace_id) as tom:
        try:
            # Pass an explicit data type so add_data_column does not receive None
            tom.add_data_column(
                table_name=table_name,
                column_name=column_name,
                source_column=column_name,
                data_type="String",
            )
            print(f"✅ Column '{column_name}' added successfully as 'String'.")
        except Exception as e:
            print(f"❌ Error adding column '{column_name}': {e}")
But the report shows no data, as if fields were missing.
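To check whether the rows actually landed in the lakehouse table, something like the following could be run (the path is hypothetical and assumes the default lakehouse is attached to the notebook):

    from deltalake import DeltaTable

    # Hypothetical path to the BPA results table in the attached lakehouse
    dt = DeltaTable("/lakehouse/default/Tables/modelbparesults")
    pdf = dt.to_pandas()
    print(len(pdf), "rows")
    print(pdf.columns.tolist())  # check whether e.g. 'Timestamp' made it in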
Desktop (please complete the following information):
- OS: Windows 11 Pro
- Browser: Chrome
- Version: 140