Elastic Tools

Elastic Tools is an Apache Spark based command line tool designed to execute export/import operations for ElasticSearch in a scalable and flexible fashion.

Because it is built on Spark, it natively supports different storage systems, such as the local file system, HDFS, AWS S3, and Azure Blob Storage, as well as storage formats like Parquet, JSON, and CSV.
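
This flexibility comes directly from Spark's pluggable data source API: the output format is a choice on the DataFrame writer, and the storage system is selected by the URI scheme of the path. A minimal sketch of that idea in Scala (the paths and bucket name below are hypothetical, not taken from this project):

    import org.apache.spark.sql.SparkSession

    object StorageSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("storage-sketch").getOrCreate()

        // One DataFrame, many storage targets: the URI scheme selects the
        // storage system (file://, hdfs://, s3a://, wasbs://), while the
        // writer method selects the format.
        val df = spark.read.parquet("file:///tmp/elastic")
        df.write.json("hdfs:///backups/elastic-json")                         // hypothetical HDFS path
        df.write.option("header", "true").csv("s3a://my-bucket/elastic-csv")  // hypothetical bucket

        spark.stop()
      }
    }

Note that the S3 and Azure schemes additionally require the matching Hadoop filesystem connectors on the classpath.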

Supported features:

  • Export (ElasticSearch -> Storage)

    The export command reads data from the given ElasticSearch index(es) and writes it to storage as a batch Spark job.

    Option          Description
    path            Path on the storage system (required)
    es.index        Name of the ElasticSearch index (required)
    es.nodes        ElasticSearch hostname (required)
    es.port         ElasticSearch port number (default: 9200)
    es.mapping.id   Name of the id field in the given dataset (default: id)
    format          Storage format: parquet, json, or csv (default: parquet)
    partitions      Number of partitions used during the write operation (default: 10)
    compression     Compression format: gzip, bzip2, or snappy (default: none)

    Example:

    $ ./elastic-tool export "es.index=test/doc" "es.nodes=localhost" "path=/tmp/elastic"
    Starting export job with options: Map(es.index -> test/doc, es.nodes -> localhost, path -> /tmp/elastic)
    2017-11-17 00:17:39 WARN  SparkContext:66 - Support for Scala 2.10 is deprecated as of Spark 2.1.0
    2017-11-17 00:17:39 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    elastic options: Map(es.index -> test/doc, es.nodes.wan.only -> true, es.mapping.id -> id, pushdown -> true, es.nodes -> localhost)
    [Stage 1:>                                                         (0 + 4) / 10]
    Backup completed in 36862 ms    
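
    For reference, an export like the one above maps onto a small Spark job built on the elasticsearch-spark (elasticsearch-hadoop) connector. A rough sketch, reusing the option values printed in the example log; this is an illustration of the technique, not necessarily the project's exact implementation:

    import org.apache.spark.sql.SparkSession

    object ExportSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("export-sketch").getOrCreate()

        // Read the index through the elasticsearch-spark SQL data source,
        // mirroring the options the tool prints in its log.
        val df = spark.read
          .format("org.elasticsearch.spark.sql")
          .option("es.nodes", "localhost")
          .option("es.port", "9200")
          .option("es.nodes.wan.only", "true")
          .option("pushdown", "true")
          .load("test/doc")

        // The partitions and compression options map onto a repartition
        // and a writer option, respectively.
        df.repartition(10)
          .write
          .option("compression", "snappy")
          .parquet("/tmp/elastic")

        spark.stop()
      }
    }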
    
  • Import (ElasticSearch <- Storage)

    The import command reads data from storage and writes it to ElasticSearch.

    Option          Description
    path            Path on the storage system (required)
    es.index        Name of the ElasticSearch index (required)
    es.nodes        ElasticSearch hostname (required)
    es.port         ElasticSearch port number (default: 9200)
    es.mapping.id   Name of the id field in the given dataset (default: id)
    format          Storage format: parquet or json (default: parquet)
    partitions      Number of partitions used during the read operation (default: 10)

    Example:

    $ ./elastic-tool import "es.index=test/doc" "es.nodes=localhost" "path=/tmp/elastic"
    Starting restore job with options: Map(es.index -> test/doc, es.nodes -> localhost, path -> /tmp/elastic)
    2017-11-17 00:19:09 WARN  SparkContext:66 - Support for Scala 2.10 is deprecated as of Spark 2.1.0
    2017-11-17 00:19:09 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    elastic options: Map(es.index -> test/doc, es.nodes.wan.only -> true, es.mapping.id -> id, pushdown -> true, es.nodes -> localhost)
    Restore completed in 67540 ms
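
    The import direction is the mirror image: read the exported dataset back from storage and index it with the same connector. A rough sketch under the same assumptions as the export example above (option values taken from the example log, not from the project's source):

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object ImportSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("import-sketch").getOrCreate()

        // Read the previously exported dataset back from storage.
        val df = spark.read.parquet("/tmp/elastic")

        // es.mapping.id uses the dataset's id column as the document _id,
        // so re-running the import overwrites documents instead of
        // duplicating them.
        df.write
          .format("org.elasticsearch.spark.sql")
          .option("es.nodes", "localhost")
          .option("es.port", "9200")
          .option("es.nodes.wan.only", "true")
          .option("es.mapping.id", "id")
          .mode(SaveMode.Append)
          .save("test/doc")

        spark.stop()
      }
    }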
    
