Description
What kind of issue is this?
Bug report.
Possibly related to #792, but not the same issue.
Issue description
saveToEs is writing null case class fields as null values instead of ignoring them.
The behavior should be in line with spark-sql, where null fields are ignored by default.
From the spark-sql documentation: "By default, elasticsearch-hadoop will ignore null values in favor of not writing any field at all. Since a DataFrame is meant to be treated as structured tabular data, you can enable writing nulls as null valued fields for DataFrame objects only by toggling the es.spark.dataframe.write.null setting to true."
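For comparison, the DataFrame write path already honors that default. A minimal sketch of the equivalent DataFrame write (assuming the same case classes, a local SparkSession, and an illustrative testidx/withnulls_df target) that should produce documents without the null fields:

import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._

val spark = SparkSession.builder()
  .appName("DataFrame null serialization")
  .master("local[2]")
  .config("es.index.auto.create", "true")
  .config("es.nodes", "localhost:9200")
  .getOrCreate()
import spark.implicits._

// With the default es.spark.dataframe.write.null = false,
// None/null fields are dropped from the indexed documents.
Seq(
  WithNulls("all fields", Some("field2_1"), "field3_1", Some(WithNullsInner("field4_1", "field5_1"))),
  WithNulls("None and nulls", None, null, None)
).toDF().saveToEs("testidx/withnulls_df")

The RDD-based saveToEs below does not behave this way.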
Steps to reproduce
Code:
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

case class WithNulls(desc: String, field2: Option[String], field3: String, inner: Option[WithNullsInner])
case class WithNullsInner(field4: String, field5: String)

val conf = new SparkConf()
  .setAppName("Testing null serialization.")
  .setMaster("local[2]")
  .set("es.index.auto.create", "true")
  .set("es.nodes", "localhost:9200")

new SparkContext(conf).parallelize(
  List(
    WithNulls("all fields", Some("field2_1"), "field3_1", Some(WithNullsInner("field4_1", "field5_1"))),
    // None/null fields here should be omitted from the indexed document
    WithNulls("None and nulls", None, null, None)
  )
).saveToEs("testidx/withnulls")
Stack trace: not applicable (no exception is thrown; the documents are simply indexed with null-valued fields).
Current response:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "testidx",
        "_type": "withnulls",
        "_id": "AVxEYKgiaEb7vk_5fa7O",
        "_score": 1,
        "_source": {
          "desc": "None and nulls",
          "field2": null,
          "field3": null,
          "inner": null
        }
      },
      {
        "_index": "testidx",
        "_type": "withnulls",
        "_id": "AVxEYKgkaEb7vk_5fa7P",
        "_score": 1,
        "_source": {
          "desc": "all fields",
          "field2": "field2_1",
          "field3": "field3_1",
          "inner": {
            "field4": "field4_1",
            "field5": "field5_1"
          }
        }
      }
    ]
  }
}
Expected response:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "testidx",
        "_type": "withnulls",
        "_id": "AVxEYKgiaEb7vk_5fa7O",
        "_score": 1,
        "_source": {
          "desc": "None and nulls"
        }
      },
      {
        "_index": "testidx",
        "_type": "withnulls",
        "_id": "AVxEYKgkaEb7vk_5fa7P",
        "_score": 1,
        "_source": {
          "desc": "all fields",
          "field2": "field2_1",
          "field3": "field3_1",
          "inner": {
            "field4": "field4_1",
            "field5": "field5_1"
          }
        }
      }
    ]
  }
}
Version Info
OS: Ubuntu 16.04.2 LTS
JVM: openjdk version "1.8.0_131"
Hadoop/Spark: Spark 2.1.0
ES-Hadoop: 5.3
ES: tested with 2.2 and 5.3