Closed
Description
I have a problem reading data from an Elasticsearch index with PySpark. My documents look like this:
{
  "user_id": 123,
  "features": {
    "hashtags": [
      {
        "text": "hello",
        "count": 2
      },
      {
        "text": "world",
        "count": 1
      }
    ]
  }
}
...
When the data is loaded, Elasticsearch seems to return an empty list of objects. My DataFrame after reading looks like this:
+----------+-------------------+
| features| user_id|
+----------+-------------------+
|{[{}, {}]}| 123|
| {[{}]}| 384|
| {[{}]}| 94|
|{[{}, {}]}| 880|
+----------+-------------------+
I read the data from Elasticsearch using this configuration:
tweets = sqlContext.read.format("org.elasticsearch.spark.sql") \
    .option("es.nodes", "localhost") \
    .option("es.port", "9200") \
    .option("es.read.field.as.array.include", "features.hashtags") \
    .option("es.read.field.include", "user_id, features.hashtags") \
    .option("es.resource", "twitter") \
    .load().limit(10)
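One likely cause: when `es.read.field.include` is set, the connector prunes any field not explicitly listed, and a parent path such as `features.hashtags` may not pull in its nested leaf fields, leaving the inner objects empty. A sketch of a revised read configuration, assuming a local cluster and an index named `twitter` (untested against a live cluster), that lists the leaf fields explicitly:

```python
# Sketch: list the nested leaf fields (text, count) in
# es.read.field.include so the connector does not prune them,
# and avoid spaces inside the comma-separated field list.
# Assumes the elasticsearch-hadoop connector is on the classpath.
tweets = sqlContext.read.format("org.elasticsearch.spark.sql") \
    .option("es.nodes", "localhost") \
    .option("es.port", "9200") \
    .option("es.read.field.as.array.include", "features.hashtags") \
    .option("es.read.field.include",
            "user_id,features.hashtags.text,features.hashtags.count") \
    .option("es.resource", "twitter") \
    .load() \
    .limit(10)
```

As a quick sanity check, dropping the `es.read.field.include` option entirely and inspecting `tweets.printSchema()` shows whether the connector can read the nested fields at all; if it can, the problem is in the include filter rather than the mapping.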
Can you help me resolve this?