twitter_classifier/Collect.scala: Pending TODO can be completed: SPARK-3390 was fixed in Spark 1.2.0. #50

Closed
@MiguelPeralvo

Line 42 of reference-apps/twitter_classifier/scala/src/main/scala/com/databricks/apps/twitter_classifier/Collect.scala can now be safely removed, as SPARK-3390 was fixed in pull request #2364 for Apache Spark 1.2.0.

If you use Spark 1.2.0, this is the code that can be removed:

.filter(!_.contains("boundingBoxCoordinates")) // TODO(vida): Remove this workaround when SPARK-3390 is fixed.

If you remove it while running Spark 1.1.0, Collect.scala will still run, but ExamineAndTrain.scala will fail with a "scala.MatchError: StructType(List())" exception. The failure is caused by the "boundingBoxCoordinates" JSON entries, which Spark 1.1.0 doesn't handle properly.
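
For anyone who needs to support both versions for a while, the workaround could be applied only when running on a Spark release older than 1.2.0. The following is a minimal sketch in plain Scala with no Spark dependency. The object `BoundingBoxWorkaround` and its methods are hypothetical names that are not part of the reference app, and the sketch assumes the Spark version is available as a string such as "1.1.0".

```scala
// Hypothetical helper, not part of the reference app.
object BoundingBoxWorkaround {

  // Extracts (major, minor) from a version string such as "1.1.0" or "1.2.0-SNAPSHOT".
  // Returns None when the string cannot be parsed.
  def majorMinor(version: String): Option[(Int, Int)] =
    version.split("\\.").toList match {
      case major :: minor :: _ =>
        scala.util.Try((major.toInt, minor.takeWhile(_.isDigit).toInt)).toOption
      case _ => None
    }

  // SPARK-3390 was fixed in 1.2.0, so only earlier versions need the filter.
  // An unparseable version keeps the workaround, which is the conservative choice.
  def needsWorkaround(sparkVersion: String): Boolean =
    majorMinor(sparkVersion) match {
      case Some((major, minor)) => major < 1 || (major == 1 && minor < 2)
      case None                 => true
    }

  // Same predicate as the line in Collect.scala, applied to a plain Seq of JSON lines.
  def filterTweets(lines: Seq[String], sparkVersion: String): Seq[String] =
    if (needsWorkaround(sparkVersion)) lines.filter(!_.contains("boundingBoxCoordinates"))
    else lines
}
```

In the app itself the same condition would guard the existing `.filter(...)` call on the tweet stream. If only Spark 1.2.0 or later is supported, simply deleting the line is simpler.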
