tap-cloudwatch is a Singer tap for extracting log data from the AWS CloudWatch Logs Insights API.
Built with the Meltano Singer SDK.
Capabilities:
- catalog
- state
- discover
- about
- stream-maps
- schema-flattening
Setting | Required | Default | Description |
---|---|---|---|
aws_access_key_id | False | None | The access key for your AWS account. |
aws_secret_access_key | False | None | The secret key for your AWS account. |
aws_session_token | False | None | The session key for your AWS account. This is only needed when you are using temporary credentials. |
aws_profile | False | None | The AWS credentials profile name to use. The profile must be configured and accessible. |
aws_endpoint_url | False | None | The complete URL to use for the constructed client. |
aws_region_name | False | None | The AWS region name (e.g. us-east-1). |
start_date | True | None | The earliest record date to sync. |
end_date | False | None | The last record date to sync. This tap uses a 5 minute buffer to allow CloudWatch logs to arrive in full. If you request data up to the current time, the end_date is automatically adjusted to now minus 5 minutes. |
log_group_name | True | None | The log group on which to perform the query. |
query | True | None | The query string to use. For more information, see CloudWatch Logs Insights Query Syntax. |
batch_increment_s | False | 3600 | The size of the time window to query by, default 3,600 seconds (i.e. 1 hour). If the result set for a batch exceeds the maximum limit of 10,000 records, the tap queries the same window again starting at the timestamp of the most recent record received. This means the same data may be scanned more than once but less than twice, depending on how far the result set exceeded the 10,000-record maximum. For example, a batch window with 15,000 records would be scanned once to receive the first 10,000 results, then roughly 5,000 records would be scanned again to retrieve the rest, so that batch was scanned about 1.5 times in total. To avoid this, set the batch window small enough that it does not exceed the 10,000-record limit. |
stream_maps | False | None | Config object for stream maps capability. For more information check out Stream Maps. |
stream_map_config | False | None | User-defined config values to be used within map expressions. |
flattening_enabled | False | None | 'True' to enable schema flattening and automatically expand nested properties. |
flattening_max_depth | False | None | The max depth to flatten schemas. |
A full list of supported settings and capabilities is available by running: tap-cloudwatch --about
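For reference, a config file for this tap might look something like the following; the log group name, query, dates, and region below are purely illustrative placeholders.

```bash
# Illustrative only: write a minimal config.json for tap-cloudwatch.
# The log group name, query, start_date, and region are placeholder values.
cat > config.json <<'EOF'
{
  "log_group_name": "/aws/lambda/my-function",
  "query": "fields @timestamp, @message | sort @timestamp asc",
  "start_date": "2023-01-01T00:00:00Z",
  "aws_region_name": "us-east-1",
  "batch_increment_s": 3600
}
EOF
```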
- The tap always leaves a 5 minute buffer from real time to handle any late or out-of-order logs on the CloudWatch side, to guarantee all data is replicated. Challenges related to this were first observed and discussed in #25. This means that if you run the tap with no end_date configured, it will attempt to retrieve data up until the current time minus 5 minutes.
- Currently the tap uses a limit of 20 queries at a time. It sends a start_query API call, then goes back to retrieve the data later once the query has completed (a rough sketch of this two-step flow is shown after this list).
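For context, that two-step flow (start a query, then collect its results once it completes) can be sketched with the AWS CLI roughly as follows; the log group, query string, and epoch timestamps are placeholder values.

```bash
# Rough sketch of the start_query / get_query_results flow the tap relies on,
# shown with the AWS CLI. All values below are placeholders.
QUERY_ID=$(aws logs start-query \
  --log-group-name "/aws/lambda/my-function" \
  --start-time 1672531200 \
  --end-time 1672534800 \
  --query-string 'fields @timestamp, @message | sort @timestamp asc' \
  --query 'queryId' --output text)

# Poll until the status reported here is "Complete", then read the results.
aws logs get-query-results --query-id "$QUERY_ID"
```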
This Singer tap will automatically import any environment variables within the working directory's .env if the --config=ENV is provided, such that config values will be considered if a matching environment variable is set either in the terminal context or in the .env file.
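As an illustration, assuming the SDK's usual TAP_CLOUDWATCH_<SETTING> environment variable naming convention, a .env file might contain entries like these (all values are placeholders):

```bash
# Hypothetical .env entries; names assume the TAP_CLOUDWATCH_<SETTING> convention.
TAP_CLOUDWATCH_AWS_REGION_NAME=us-east-1
TAP_CLOUDWATCH_LOG_GROUP_NAME=/aws/lambda/my-function
TAP_CLOUDWATCH_QUERY='fields @timestamp, @message | sort @timestamp asc'
TAP_CLOUDWATCH_START_DATE=2023-01-01T00:00:00Z
```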
You can easily run tap-cloudwatch by itself or in a pipeline using Meltano.
tap-cloudwatch --version
tap-cloudwatch --help
tap-cloudwatch --config CONFIG --discover > ./catalog.json
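A full sync can then be run by piping the tap's output into any Singer target; target-jsonl below is just an example target, and the command assumes a config.json like the one sketched earlier.

```bash
# Example end-to-end invocation (target-jsonl is an arbitrary example target).
tap-cloudwatch --config config.json | target-jsonl
```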
Follow these instructions to contribute to this project.
pipx install poetry
poetry install
Create tests within the tap_cloudwatch/tests subfolder and then run:
poetry run tox -e pytest
Coverage reports are generated at tap_cloudwatch/tests/codecoverage/.
You can also test the tap-cloudwatch CLI interface directly using poetry run:
poetry run tap-cloudwatch --help
Testing with Meltano
Note: This tap will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.
Next, install Meltano (if you haven't already) and any needed plugins:
# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd tap-cloudwatch
meltano install
Now you can test and orchestrate using Meltano:
# Test invocation:
meltano invoke tap-cloudwatch --version
# OR run a test `elt` pipeline:
meltano elt tap-cloudwatch target-jsonl
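If you prefer not to edit meltano.yml by hand, settings can also be supplied through Meltano's CLI; the values below are placeholders.

```bash
# Example of setting tap config through Meltano (placeholder values).
meltano config tap-cloudwatch set log_group_name "/aws/lambda/my-function"
meltano config tap-cloudwatch set start_date 2023-01-01T00:00:00Z
```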
See the dev guide for more instructions on how to use the SDK to develop your own taps and targets.
Using create_export_task to efficiently bulk export logs to S3, then reading that data.
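As a rough sketch of that approach, the CreateExportTask API can be driven like this (shown here with the AWS CLI; the bucket, log group, and millisecond timestamps are placeholders):

```bash
# Rough sketch of the bulk-export approach: export a time range of a log group
# to S3 with CreateExportTask. All values below are placeholders.
aws logs create-export-task \
  --log-group-name "/aws/lambda/my-function" \
  --from 1672531200000 \
  --to 1672534800000 \
  --destination "my-export-bucket" \
  --destination-prefix "cloudwatch-exports"
```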