Skip to content

Latest commit

 

History

History
364 lines (234 loc) · 15.7 KB

export-pg-from-queries.md

File metadata and controls

364 lines (234 loc) · 15.7 KB
NAME
        neptune-export.sh export-pg-from-queries - Export property graph to CSV
        or JSON from Gremlin queries.

SYNOPSIS
        neptune-export.sh export-pg-from-queries
                [ --alb-endpoint <applicationLoadBalancerEndpoint> ]
                [ {-b | --batch-size} <batchSize> ] [ --clone-cluster ]
                [ --clone-cluster-correlation-id <cloneCorrelationId> ]
                [ --clone-cluster-instance-type <cloneClusterInstanceType> ]
                [ --clone-cluster-replica-count <replicaCount> ]
                [ {--cluster-id | --cluster | --clusterid} <clusterId> ]
                [ {-cn | --concurrency} <concurrency> ]
                {-d | --dir} <directory> [ --disable-ssl ]
                [ {-e | --endpoint} <endpoint>... ] [ --export-id <exportId> ]
                [ {-f | --queries-file} <queriesFile> ] [ --format <format> ]
                [ --include-type-definitions ] [ --janus ]
                [ --lb-port <loadBalancerPort> ] [ --log-level <log level> ]
                [ --max-content-length <maxContentLength> ] [ --merge-files ]
                [ --nlb-endpoint <networkLoadBalancerEndpoint> ]
                [ {-o | --output} <output> ] [ {-p | --port} <port> ]
                [ --partition-directories <partitionDirectories> ]
                [ --per-label-directories ] [ --profile <profiles>... ]
                [ {-q | --queries | --query | --gremlin} <queries>... ]
                [ {--region | --stream-region} <region> ]
                [ --serializer <serializer> ]
                [ --stream-large-record-strategy <largeStreamRecordHandlingStrategy> ]
                [ --stream-name <streamName> ] [ {-t | --tag} <tag> ]
                [ --timeout-millis <timeoutMillis> ] [ --two-pass-analysis ]
                [ --use-iam-auth ] [ --use-ssl ]

OPTIONS
        --alb-endpoint <applicationLoadBalancerEndpoint>
            Application load balancer endpoint (optional: use only if
            connecting to an IAM DB enabled Neptune cluster through an
            application load balancer (ALB) – see https://github.com/aws-samples/aws-dbs-refarch-graph/tree/master/src/connecting-using-a-load-balancer#connecting-to-amazon-neptune-from-clients-outside-the-neptune-vpc-using-aws-application-load-balancer).

            This option may occur a maximum of 1 times


            This option is part of the group 'load-balancer' from which only
            one option may be specified


        -b <batchSize>, --batch-size <batchSize>
            Batch size (optional, default 64). Reduce this number if your
            queries trigger CorruptedFrameExceptions.

            This option may occur a maximum of 1 times


        --clone-cluster
            Clone an Amazon Neptune cluster.

            This option may occur a maximum of 1 times


        --clone-cluster-correlation-id <cloneCorrelationId>
            Correlation ID to be added to a correlation-id tag on the cloned
            cluster.

            This option may occur a maximum of 1 times


        --clone-cluster-instance-type <cloneClusterInstanceType>
            Instance type for cloned cluster (by default neptune-export will
            use the same instance type as the source cluster).

            This options value is restricted to the following set of values:
                db.r4.large
                db.r4.xlarge
                db.r4.2xlarge
                db.r4.4xlarge
                db.r4.8xlarge
                db.r5.large
                db.r5.xlarge
                db.r5.2xlarge
                db.r5.4xlarge
                db.r5.8xlarge
                db.r5.12xlarge
                db.t3.medium
                r4.large
                r4.xlarge
                r4.2xlarge
                r4.4xlarge
                r4.8xlarge
                r5.large
                r5.xlarge
                r5.2xlarge
                r5.4xlarge
                r5.8xlarge
                r5.12xlarge
                t3.medium

            This option may occur a maximum of 1 times


        --clone-cluster-replica-count <replicaCount>
            Number of read replicas to add to the cloned cluster (default, 0).

            This option may occur a maximum of 1 times


            This options value must fall in the following range: 0 <= value <= 15


        --cluster-id <clusterId>, --cluster <clusterId>, --clusterid
        <clusterId>
            ID of an Amazon Neptune cluster. If you specify a cluster ID,
            neptune-export will use all of the instance endpoints in the
            cluster in addition to any endpoints you have specified using the
            endpoint options.

            This option may occur a maximum of 1 times


            This option is part of the group 'endpoint or clusterId' from which
            at least one option must be specified


        -cn <concurrency>, --concurrency <concurrency>
            Concurrency – the number of parallel queries used to run the export
            (optional, default 4).

            This option may occur a maximum of 1 times


        -d <directory>, --dir <directory>
            Root directory for output.

            This option may occur a maximum of 1 times


            This options value must be a path to a directory. The provided path
            must be readable and writable.


        --disable-ssl
            Disables connectivity over SSL.

            This option may occur a maximum of 1 times


        -e <endpoint>, --endpoint <endpoint>
            Neptune endpoint(s) – supply multiple instance endpoints if you
            want to load balance requests across a cluster.

            This option is part of the group 'endpoint or clusterId' from which
            at least one option must be specified


        --export-id <exportId>
            Export id

            This option may occur a maximum of 1 times


        -f <queriesFile>, --queries-file <queriesFile>
            Path to JSON queries file (file path, or 'https' or 's3' URI).

            This option may occur a maximum of 1 times


        --format <format>
            Output format (optional, default 'csv').

            This options value is restricted to the following set of values:
                json
                csv
                csvNoHeaders
                neptuneStreamsJson
                neptuneStreamsSimpleJson

            This option may occur a maximum of 1 times


        --include-type-definitions
            Include type definitions from column headers (optional, default
            'false').

            This option may occur a maximum of 1 times


        --janus
            Use JanusGraph serializer.

            This option may occur a maximum of 1 times


        --lb-port <loadBalancerPort>
            Load balancer port (optional, default 80).

            This option may occur a maximum of 1 times


            This options value represents a port and must fall in one of the
            following port ranges: 1-1023, 1024-49151


        --log-level <log level>
            Log level (optional, default 'error').

            This options value is restricted to the following set of values:
                trace
                debug
                info
                warn
                error

            This option may occur a maximum of 1 times


        --max-content-length <maxContentLength>
            Max content length (optional, default 50000000).

            This option may occur a maximum of 1 times


        --merge-files
            Merge files for each vertex or edge label (currently only supports
            CSV files for export-pg).

            This option may occur a maximum of 1 times


        --nlb-endpoint <networkLoadBalancerEndpoint>
            Network load balancer endpoint (optional: use only if connecting to
            an IAM DB enabled Neptune cluster through a network load balancer
            (NLB) – see https://github.com/aws-samples/aws-dbs-refarch-graph/tree/master/src/connecting-using-a-load-balancer#connecting-to-amazon-neptune-from-clients-outside-the-neptune-vpc-using-aws-network-load-balancer).

            This option may occur a maximum of 1 times


            This option is part of the group 'load-balancer' from which only
            one option may be specified


        -o <output>, --output <output>
            Output target (optional, default 'file').

            This options value is restricted to the following set of values:
                files
                stdout
                devnull
                stream

            This option may occur a maximum of 1 times


        -p <port>, --port <port>
            Neptune port (optional, default 8182).

            This option may occur a maximum of 1 times


            This options value represents a port and must fall in one of the
            following port ranges: 1-1023, 1024-49151


        --partition-directories <partitionDirectories>
            Partition directory path (e.g. 'year=2021/month=07/day=21').

            This option may occur a maximum of 1 times


        --per-label-directories
            Create a subdirectory for each distinct vertex or edge label.

            This option may occur a maximum of 1 times


        --profile <profiles>
            Name of an export profile.

        -q <queries>, --queries <queries>, --query <queries>, --gremlin
        <queries>
            Gremlin queries (format: name="semi-colon-separated list of
            queries" OR "semi-colon-separated list of queries").

        --region <region>, --stream-region <region>
            AWS Region in which your Amazon Kinesis Data Stream is located.

            This option may occur a maximum of 1 times


        --serializer <serializer>
            Message serializer – (optional, default 'GRAPHBINARY_V1D0').

            This options value is restricted to the following set of values:
                GRAPHSON
                GRAPHSON_V1D0
                GRAPHSON_V2D0
                GRAPHSON_V3D0
                GRAPHBINARY_V1D0
                GRYO_V1D0
                GRYO_V3D0
                GRYO_LITE_V1D0

            This option may occur a maximum of 1 times


        --stream-large-record-strategy <largeStreamRecordHandlingStrategy>
            Strategy for dealing with records to be sent to Amazon Kinesis that
            are larger than 1 MB.

            This options value is restricted to the following set of values:
                dropAll
                splitAndDrop
                splitAndShred

            This option may occur a maximum of 1 times


        --stream-name <streamName>
            Name of an Amazon Kinesis Data Stream.

            This option may occur a maximum of 1 times


        -t <tag>, --tag <tag>
            Directory prefix (optional).

            This option may occur a maximum of 1 times


        --timeout-millis <timeoutMillis>
            Query timeout in milliseconds (optional).

            This option may occur a maximum of 1 times


        --two-pass-analysis
            Perform two-pass analysis of query results (optional, default
            'false').

            This option may occur a maximum of 1 times


        --use-iam-auth
            Use IAM database authentication to authenticate to Neptune
            (remember to set the SERVICE_REGION environment variable).

            This option may occur a maximum of 1 times


        --use-ssl
            Enables connectivity over SSL. This option is
            deprecated: neptune-export will always connect via SSL unless you
            use --disable-ssl to explicitly disable connectivity over SSL.

            This option may occur a maximum of 1 times


EXAMPLES
        bin/neptune-export.sh export-pg-from-queries -e neptunedbcluster-xxxxxxxxxxxx.cluster-yyyyyyyyyyyy.us-east-1.neptune.amazonaws.com -d /home/ec2-user/output -q person="g.V().hasLabel('Person').has('birthday', lt('1985-01-01')).project('id', 'first_name', 'last_name', 'birthday').by(id).by('firstName').by('lastName').by('birthday');g.V().hasLabel('Person').has('birthday', gte('1985-01-01')).project('id', 'first_name', 'last_name', 'birthday').by(id).by('firstName').by('lastName').by('birthday')" -q post="g.V().hasLabel('Post').has('imageFile').range(0, 250000).project('id', 'image_file', 'creation_date', 'creator_id').by(id).by('imageFile').by('creationDate').by(in('CREATED').id());g.V().hasLabel('Post').has('imageFile').range(250000, 500000).project('id', 'image_file', 'creation_date', 'creator_id').by(id).by('imageFile').by('creationDate').by(in('CREATED').id());g.V().hasLabel('Post').has('imageFile').range(500000, 750000).project('id', 'image_file', 'creation_date', 'creator_id').by(id).by('imageFile').by('creationDate').by(in('CREATED').id());g.V().hasLabel('Post').has('imageFile').range(750000, -1).project('id', 'image_file', 'creation_date', 'creator_id').by(id).by('imageFile').by('creationDate').by(in('CREATED').id())" --concurrency 6

            Parallel export of Person data in 2 shards, sharding on the
            'birthday' property, and Post data in 4 shards, sharding on range,
            using 6 threads

        bin/neptune-export.sh export-pg-from-queries -e neptunedbcluster-xxxxxxxxxxxx.cluster-yyyyyyyyyyyy.us-east-1.neptune.amazonaws.com -d /home/ec2-user/output -q person="g.V().hasLabel('Person').has('birthday', lt('1985-01-01')).project('id', 'first_name', 'last_name', 'birthday').by(id).by('firstName').by('lastName').by('birthday');g.V().hasLabel('Person').has('birthday', gte('1985-01-01')).project('id', 'first_name', 'last_name', 'birthday').by(id).by('firstName').by('lastName').by('birthday')" -q post="g.V().hasLabel('Post').has('imageFile').range(0, 250000).project('id', 'image_file', 'creation_date', 'creator_id').by(id).by('imageFile').by('creationDate').by(in('CREATED').id());g.V().hasLabel('Post').has('imageFile').range(250000, 500000).project('id', 'image_file', 'creation_date', 'creator_id').by(id).by('imageFile').by('creationDate').by(in('CREATED').id());g.V().hasLabel('Post').has('imageFile').range(500000, 750000).project('id', 'image_file', 'creation_date', 'creator_id').by(id).by('imageFile').by('creationDate').by(in('CREATED').id());g.V().hasLabel('Post').has('imageFile').range(750000, -1).project('id', 'image_file', 'creation_date', 'creator_id').by(id).by('imageFile').by('creationDate').by(in('CREATED').id())" --concurrency 6 --format json

            Parallel export of Person data and Post data as JSON