Commit bcae26c

docs(clp-s): Describe more compression options; Update out-of-date description of archive-path option for decompression and search. (#1030)
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
1 parent 8ccccce commit bcae26c

1 file changed: +46 -13 lines changed

docs/src/user-guide/core-clp-s.md

Lines changed: 46 additions & 13 deletions
@@ -12,11 +12,32 @@ Usage:
 ```

 * `archives-dir` is the directory that archives should be written to.
-* `input-path` is any new-line-delimited JSON (ndjson) log file or directory containing such files.
-* `options` allow you to specify things like which field should be considered as the log event's
-  timestamp (`--timestamp-key <field-path>`), or whether to fully parse array entries and encode
-  them into dedicated columns (`--structurize-arrays`).
-  * For a complete list, run `./clp-s c --help`
+* `input-path` is a filesystem path or URL to either:
+  * a new-line-delimited JSON (ndjson) log file;
+  * a KV-IR file; or
+  * a directory containing such files.
+* `options` allow you to specify how data gets compressed into an archive. For example:
+  * `--single-file-archive` specifies that single-file archives should be produced (i.e., each
+    archive is a single file in `archives-dir`).
+  * `--file-type <json|kv-ir>` specifies whether the input files are encoded as ndjson or KV-IR.
+  * `--timestamp-key <field-path>` specifies which field should be treated as each log event's
+    timestamp.
+  * `--target-encoded-size <size>` specifies the threshold (in bytes) at which archives are split,
+    where `size` is the total size of the dictionaries and encoded messages in an archive.
+    * This option acts as a soft limit on memory usage for compression, decompression, and search.
+    * This option significantly affects compression ratio.
+  * `--structurize-arrays` specifies that arrays should be fully parsed and array entries should be
+    encoded into dedicated columns.
+  * `--auth <s3|none>` specifies the authentication method that should be used for network requests
+    if the input path is a URL.
+    * When S3 authentication is enabled, we issue a GET request following the [AWS Signature Version
+      4 specification][aws-signature-v4]. This request uses the environment variables
+      `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and, optionally, `AWS_SESSION_TOKEN` if it
+      exists.
+    * For more information on usage with S3, see our
+      [dedicated guide](guides-using-object-storage/index).
+
+For a complete list of options, run `./clp-s c --help`.

 ### Examples

@@ -37,6 +58,14 @@ Specifying the timestamp-key will create a range-index for the timestamp column
 compression ratio and search performance.
 :::

+**Compress a KV-IR file stored on S3 into a single-file archive:**
+
+```shell
+AWS_ACCESS_KEY_ID='...' AWS_SECRET_ACCESS_KEY='...' \
+./clp-s c --single-file-archive --file-type kv-ir --auth s3 /mnt/data/archives \
+https://my-bucket.s3.us-east-2.amazonaws.com/kv-ir-log.clp
+```
+
 **Set the target encoded size to 1 GiB and the compression level to 6 (3 by default)**

 ```shell
@@ -52,13 +81,14 @@ compression ratio and search performance.
 Usage:

 ```shell
-./clp-s x [<options>] <archives-dir> <output-dir>
+./clp-s x [<options>] <archives-path> <output-dir>
 ```

-* `archives-dir` is a directory containing archives.
+* `archives-path` is a directory containing archives, a path to an archive, or a URL pointing to a
+  single-file archive.
 * `output-dir` is the directory that decompressed logs should be written to.
-* `options` allow you to specify things like a specific archive (from within `archives-dir`) to
-  decompress (`--archive-id <archive-id>`).
+* `options` allow you to specify things like a specific archive (from within `archives-path`, if it
+  is a directory) to decompress (`--archive-id <archive-id>`).
   * For a complete list, run `./clp-s x --help`

 ### Examples
@@ -74,13 +104,14 @@ Usage:
 Usage:

 ```shell
-./clp-s s [<options>] <archives-dir> <kql-query>
+./clp-s s [<options>] <archives-path> <kql-query>
 ```

-* `archives-dir` is a directory containing archives.
+* `archives-path` is a directory containing archives, a path to an archive, or a URL pointing to a
+  single-file archive.
 * `kql-query` is a [KQL](reference-json-search-syntax) query.
-* `options` allow you to specify things like a specific archive (from within `archives-dir`) to
-  search (`--archive-id <archive-id>`).
+* `options` allow you to specify things like a specific archive (from within `archives-path`, if it
+  is a directory) to search (`--archive-id <archive-id>`).
   * For a complete list, run `./clp-s s --help`

 ### Examples
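
The decompression and search hunks above both change `archives-dir` to `archives-path`, which may be a directory of archives, a path to one archive, or a URL pointing to a single-file archive, and they note that `--archive-id` only applies in the directory case. The sketch below illustrates that rule for the `x` command. The helper names `classify_archives_path` and `print_decompress_cmd` are hypothetical and not part of clp-s, and treating any `http://` or `https://` prefix as a URL is an assumption. The command is printed rather than executed.

```shell
# Hypothetical helper (not part of clp-s): report which of the three documented
# forms an archives-path takes. Matching on http(s):// is an assumption.
classify_archives_path() {
    case "$1" in
        http://*|https://*) echo "url" ;;
        *)
            if [ -d "$1" ]; then
                echo "directory"
            elif [ -f "$1" ]; then
                echo "archive"
            else
                echo "unknown"
                return 1
            fi
            ;;
    esac
}

# Hypothetical helper: print (do not run) a decompression command. --archive-id
# is only included when archives-path is a directory, because the option picks
# one archive from within that directory.
print_decompress_cmd() {
    archives_path=$1
    output_dir=$2
    archive_id=${3:-}
    kind=$(classify_archives_path "$archives_path") || return 1
    if [ "$kind" = "directory" ] && [ -n "$archive_id" ]; then
        echo "./clp-s x --archive-id $archive_id $archives_path $output_dir"
    else
        echo "./clp-s x $archives_path $output_dir"
    fi
}
```

Calling `print_decompress_cmd /mnt/data/archives /mnt/data/out <archive-id>` on an existing directory would print a command that includes `--archive-id`, whereas passing a URL drops the option.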
@@ -125,3 +156,5 @@ compressed data:**
   the same file.
 * In addition, there are a few limitations, related to querying arrays, described in the search
   syntax [reference](reference-json-search-syntax).
+
+[aws-signature-v4]: https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html
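
The new `--auth s3` text says the signed request uses `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and, optionally, `AWS_SESSION_TOKEN`. A pre-flight check along the following lines could catch missing credentials before `clp-s` is invoked. This is only a sketch: `check_s3_env` is a hypothetical helper, not something clp-s provides, and treating an empty value as missing is an assumption.

```shell
# Hypothetical pre-flight helper (not part of clp-s): verify that the
# environment variables named in the docs for S3 authentication are set.
check_s3_env() {
    missing=""
    for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
        # Look up the variable whose name is stored in $var; unset or empty
        # values are both treated as missing (an assumption of this sketch).
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            missing="$missing $var"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing" >&2
        return 1
    fi
    # AWS_SESSION_TOKEN is optional, so only report whether it is present.
    if [ -n "${AWS_SESSION_TOKEN:-}" ]; then
        echo "credentials set (with session token)"
    else
        echo "credentials set (no session token)"
    fi
}
```

One way to use it is to guard the compression command from the example in this commit, e.g. `check_s3_env && ./clp-s c --auth s3 ...`, so the run stops early when a required variable is absent.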
