fix: Resolve the issue of conflicts between columns added during the analysis process and the original data columns in the Spark version. #1518

frelion · 2023-12-09T06:14:06Z

In the Spark version, the program adds auxiliary columns such as Count and Std to the DataFrame at runtime.

If the data being analyzed already contains columns with these names, the auxiliary columns conflict with the original ones.

Solution:
Before the analysis, append the suffix "_customer" to every column name of the DataFrame.
Remove the suffix when displaying the results.
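The renaming strategy above can be sketched in plain Python on a list of column names. This is a minimal illustration, not the code from this PR. The helper names `add_suffix`, `strip_suffix` and `conflicts` are hypothetical, and "Count"/"Std" stand in for the auxiliary columns. In PySpark, a renamed list like this could be applied with `df.toDF(*add_suffix(df.columns))`.

```python
# Suffix named in the PR description; everything else here is a hypothetical sketch.
SUFFIX = "_customer"


def add_suffix(columns):
    """Append SUFFIX to every original column name before the analysis."""
    return [f"{name}{SUFFIX}" for name in columns]


def strip_suffix(name):
    """Remove one trailing SUFFIX for display; other names are returned unchanged."""
    if name.endswith(SUFFIX):
        return name[: -len(SUFFIX)]
    return name


def conflicts(columns, auxiliary):
    """Return the sorted names that appear both in the data and among auxiliary columns."""
    return sorted(set(columns) & set(auxiliary))


# Usage: the original data already has columns named like the auxiliary ones.
original = ["id", "Count", "Std"]
auxiliary = ["Count", "Std"]  # hypothetical stand-ins for the added columns

print(conflicts(original, auxiliary))           # ['Count', 'Std']
renamed = add_suffix(original)
print(renamed)                                  # ['id_customer', 'Count_customer', 'Std_customer']
print(conflicts(renamed, auxiliary))            # []
print([strip_suffix(c) for c in renamed])       # ['id', 'Count', 'Std']
```

The sketch compares names exactly, whereas Spark resolves column names case-insensitively by default, so a real check would also need to treat "count" and "Count" as the same name.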

PeterlitsZo

Looks good to me. This should solve the problem.


codecov · 2024-05-06T17:07:11Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.25%. Comparing base (2d9a24b) to head (925bcee).
Report is 25 commits behind head on develop.

❗ Current head 925bcee differs from the pull request's most recent head 2d7f8bb. Consider uploading reports for the commit 2d7f8bb to get more accurate results.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #1518      +/-   ##
===========================================
+ Coverage    90.08%   90.25%   +0.17%     
===========================================
  Files          195      195              
  Lines         6383     6383              
===========================================
+ Hits          5750     5761      +11     
+ Misses         633      622      -11

Flag	Coverage Δ
py3.8-ubuntu-22.04-pandas	`90.25% <ø> (+0.17%)`	⬆️

Flags with carried forward coverage won't be shown.

PeterlitsZo reviewed Dec 9, 2023


frelion added 2 commits March 26, 2024 10:10

Resolve the issue of conflicts between columns added during the analysis process and the original data columns in the Spark version.

7fdc07a

remove trailing whitespace

925bcee

fabclmnt force-pushed the fix/spark_column_conflict branch from 93555c1 to 925bcee on March 26, 2024 17:10

fabclmnt approved these changes May 6, 2024


Merge branch 'develop' into fix/spark_column_conflict

2d7f8bb

fabclmnt merged commit ddcb388 into ydataai:develop May 6, 2024
4 of 7 checks passed
