Description
What kind of an issue is this?
- Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below. The easier it is to track down the bug, the faster it is solved.
- Feature Request. Start by telling us what problem you’re trying to solve. Often a solution already exists! Don’t send pull requests to implement new features without first getting our support. Sometimes we leave features out on purpose to keep the project small.
Feature description
Allow reading into a DataFrame/RDD from indices that use dynamic field mapping configurations.
For example, I have an index with dynamic_date_formats, created like this:
PUT my_index
{
  "mappings": {
    "dynamic_date_formats": ["MM/dd/yyyy"]
  }
}
PUT my_index/_doc/1
{
  "create_date": "09/25/2015"
}
While trying to read from my_index, the following exception is thrown:
spark.read.format("org.elasticsearch.spark.sql").load("my_index")
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: invalid map received dynamic_date_formats=[MM/dd/yyyy]
at org.elasticsearch.hadoop.serialization.dto.mapping.FieldParser.parseField(FieldParser.java:146)
at org.elasticsearch.hadoop.serialization.dto.mapping.FieldParser.parseMapping(FieldParser.java:88)
at org.elasticsearch.hadoop.serialization.dto.mapping.FieldParser.parseIndexMappings(FieldParser.java:69)
at org.elasticsearch.hadoop.serialization.dto.mapping.FieldParser.parseMappings(FieldParser.java:40)
at org.elasticsearch.hadoop.rest.RestClient.getMappings(RestClient.java:321)
at org.elasticsearch.hadoop.rest.RestClient.getMappings(RestClient.java:307)
at org.elasticsearch.hadoop.rest.RestRepository.getMappings(RestRepository.java:293)
at org.elasticsearch.spark.sql.SchemaUtils$.discoverMappingAndGeoFields(SchemaUtils.scala:103)
at org.elasticsearch.spark.sql.SchemaUtils$.discoverMapping(SchemaUtils.scala:91)
at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema$lzycompute(DefaultSource.scala:229)
at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema(DefaultSource.scala:229)
at org.elasticsearch.spark.sql.ElasticsearchRelation$$anonfun$schema$1.apply(DefaultSource.scala:233)
at org.elasticsearch.spark.sql.ElasticsearchRelation$$anonfun$schema$1.apply(DefaultSource.scala:233)
at scala.Option.getOrElse(Option.scala:121)
at org.elasticsearch.spark.sql.ElasticsearchRelation.schema(DefaultSource.scala:233)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:417)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:242)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:197)
... 49 elided
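
A possible (untested) partial workaround might be to pass an explicit schema to the reader so Spark does not have to ask the connector for one; whether this avoids the mapping discovery call entirely is not confirmed here. A minimal Scala sketch, assuming an active SparkSession named spark and the create_date field from the example above:

import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Supply the schema up front instead of letting the connector discover it
// from the index mapping. Untested against an index using dynamic_date_formats;
// mapping discovery may still be triggered later at scan time.
val schema = StructType(Seq(StructField("create_date", StringType, nullable = true)))

val df = spark.read
  .format("org.elasticsearch.spark.sql")
  .schema(schema)
  .load("my_index")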
Version Info
OS: Ubuntu 20.04
JVM: OpenJDK 64-Bit Server VM, Java 1.8.0_275
Hadoop/Spark: Spark 2.4.7, Scala 2.11.12
ES-Hadoop: elasticsearch-spark-20_2.11-7.7.0
ES: Elasticsearch 7.7