[SPARK-35343][PYTHON] Make the conversion from/to pandas data-type-based for non-ExtensionDtypes #32592

Closed

@xinrong-meng xinrong-meng commented May 19, 2021

What changes were proposed in this pull request?

Make the conversion from/to pandas data-type-based for non-ExtensionDtypes.
NOTE: The Ops class per ExtensionDtype, and its data-type-based conversion from/to pandas, will be implemented in a separate PR; see https://issues.apache.org/jira/browse/SPARK-35614.

Why are the changes needed?

The conversion from/to pandas currently contains logic that checks each data type and branches on it.
That makes the code hard to change and maintain.
Since an Ops class per non-ExtensionDtype data type has already been introduced, the conversion from/to pandas for non-ExtensionDtypes should be made data-type-based as well.
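As a rough illustration of what "data-type-based" means here, the sketch below moves per-type conversion logic out of an if/elif chain and into one small class per type. It is not the code from this PR: the names TypeOps, DateOps, OPS_BY_TYPE, prepare_column and restore_column are hypothetical, and plain Python lists stand in for pandas Series so that the example runs without extra dependencies.

```python
from datetime import date
from typing import Any, Dict, List


class TypeOps:
    """Hypothetical base class: one subclass per data type."""

    def prepare(self, values: List[Any]) -> List[Any]:
        # Default: the values need no adjustment on the way out.
        return list(values)

    def restore(self, values: List[Any]) -> List[Any]:
        # Default: the values need no adjustment on the way back.
        return list(values)


class DateOps(TypeOps):
    """Dates are sent as ISO strings and parsed back afterwards."""

    def prepare(self, values: List[Any]) -> List[Any]:
        return [None if v is None else v.isoformat() for v in values]

    def restore(self, values: List[Any]) -> List[Any]:
        return [None if v is None else date.fromisoformat(v) for v in values]


# Hypothetical registry: the conversion code looks up the Ops object
# for a column's type instead of branching on the type itself.
OPS_BY_TYPE: Dict[str, TypeOps] = {"long": TypeOps(), "date": DateOps()}


def prepare_column(type_name: str, values: List[Any]) -> List[Any]:
    return OPS_BY_TYPE[type_name].prepare(values)


def restore_column(type_name: str, values: List[Any]) -> List[Any]:
    return OPS_BY_TYPE[type_name].restore(values)


dates = [date(2021, 6, 7), None]
wire = prepare_column("date", dates)
round_trip = restore_column("date", wire)
```

With this layout, supporting a new type means adding a class and a registry entry; the two conversion functions stay unchanged.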

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests.

Keyword: SPARK-35337

SparkQA commented May 19, 2021

Test build #138714 has finished for PR 32592 at commit a8c7f8f.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43236/

SparkQA commented May 19, 2021

Test build #138717 has finished for PR 32592 at commit b2a6442.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 19, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43236/

SparkQA commented May 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43239/

SparkQA commented May 19, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43239/

SparkQA commented Jun 1, 2021

Test build #139177 has finished for PR 32592 at commit 6af8165.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43696/

SparkQA commented Jun 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43696/

SparkQA commented Jun 1, 2021

Test build #139184 has finished for PR 32592 at commit 69b47f0.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -87,7 +92,6 @@ def __init__(self, dtype: Dtype, spark_type: DataType):
self.spark_type = spark_type

@property
@abstractmethod
Member Author

@abstractmethod is removed so that the mypy checks pass. With it in place, mypy fails with:

python/pyspark/pandas/internal.py:1050: error: Cannot instantiate abstract class 'DataTypeOps' with abstract attribute 'pretty_name'
python/pyspark/pandas/internal.py:1441: error: Cannot instantiate abstract class 'DataTypeOps' with abstract attribute 'pretty_name'

Reference mypy issue: python/mypy#1843.
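
The errors above suggest that the base class is instantiated directly at those call sites, which a type checker rejects while any attribute is marked abstract. A minimal sketch of that situation and of the workaround described in this comment follows. It assumes a factory-style base class whose __new__ hands back a subclass instance; Ops, LongOps and _REGISTRY are hypothetical names, not the pyspark classes.

```python
from typing import Dict, Type


class Ops:
    """Hypothetical stand-in for a factory-style base class."""

    def __new__(cls, kind: str) -> "Ops":
        # Calling the base class picks a registered subclass for `kind`;
        # unknown kinds fall back to the base class itself.
        if cls is Ops:
            return object.__new__(_REGISTRY.get(kind, Ops))
        return object.__new__(cls)

    def __init__(self, kind: str) -> None:
        self.kind = kind

    @property
    def pretty_name(self) -> str:
        # No @abstractmethod here: subclasses are still expected to
        # override this, but the base class is no longer abstract, so
        # a call such as Ops("long") is not flagged by the checker.
        raise NotImplementedError()


class LongOps(Ops):
    @property
    def pretty_name(self) -> str:
        return "integrals"


_REGISTRY: Dict[str, Type[Ops]] = {"long": LongOps}

ops = Ops("long")  # an instance of LongOps at runtime
```

The trade-off is that a subclass that forgets to override pretty_name is only caught at runtime, when the property is first read, rather than at instantiation time.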

SparkQA commented Jun 1, 2021

Test build #139186 has finished for PR 32592 at commit 6f0bc5e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43704/

SparkQA commented Jun 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43705/

SparkQA commented Jun 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43706/

SparkQA commented Jun 2, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43704/

SparkQA commented Jun 2, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43705/

SparkQA commented Jun 2, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43706/

SparkQA commented Jun 2, 2021

Test build #139185 has finished for PR 32592 at commit 5966987.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 2, 2021

Test build #139190 has finished for PR 32592 at commit ba1207f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 2, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43710/

SparkQA commented Jun 2, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43710/

@xinrong-meng xinrong-meng changed the title from "[WIP][SPARK-35343][PYTHON] Make conversion from/to pandas data-type-based" to "[WIP][SPARK-35343][PYTHON] Make the conversion from/to pandas (for non-ExtensionDtype) data-type-based." Jun 2, 2021
@xinrong-meng xinrong-meng changed the title from "[WIP][SPARK-35343][PYTHON] Make the conversion from/to pandas (for non-ExtensionDtype) data-type-based." to "[WIP][SPARK-35343][PYTHON] Make the conversion from/to pandas data-type-based for non-ExtensionDtypes" Jun 2, 2021

SparkQA commented Jun 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43882/

SparkQA commented Jun 5, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43882/

SparkQA commented Jun 5, 2021

Test build #139360 has finished for PR 32592 at commit de83d86.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xinrong-meng xinrong-meng requested a review from ueshin June 7, 2021 16:36

SparkQA commented Jun 7, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43948/

SparkQA commented Jun 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43952/

SparkQA commented Jun 7, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43952/

SparkQA commented Jun 7, 2021

Test build #139426 has finished for PR 32592 at commit 555209b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin ueshin left a comment

LGTM.

ueshin commented Jun 7, 2021

Thanks! merging to master.

@ueshin ueshin closed this in 04a8d2c Jun 7, 2021