Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issues in AptSources artifact and support deb822 format #2851

Merged
merged 20 commits into from
Aug 27, 2023

Conversation

misje
Copy link
Contributor

@misje misje commented Jul 29, 2023

Linux.Debian.AptSources

The Linux.Debian.AptSources artifact has a number of limitations, all of which
the changes in this pull request aims to resolve. The new artifact also provides
parsers that can be imported in other artifacts for inspecting every single
field in these configuration files.

Resolved issues

The artifact does not work on current Debian-based systems. One reason is an
increased usage of the option "signed-by", used to manually specify the gpg key
file location. The original regex only supports the option "arch" or no
options. The main reason is the that Release files have been replaced with
InRelease files. The format is similar, just with a gpg key embedded at the end
of the file. Note that these keys often contain a gpg header "Version: […] that
confuses the original VQL.

Fixes and enhancements:

  • Support for sources files (multi-line deb822 format)
  • Support options other than "arch" in list files
  • Support for any valid usage of whitespace in list files
  • Support for comments in (at the end of) list files
  • Support for protocols other than http/https in list files
  • Support for multiple values in fields/options (arch, lang, targets, components)
  • Parse InRelease files as well as Release files, and prevent the "Version:
    GnuPG" header line from interfering with the "Version" field
  • Group the query results by cache file, preventing duplicate entries without
    records

After improving the list file parser, I also added a parser for the new file
format, likely to replace the one-line list file format in the future: deb822.
This file format allows each entry to have multiple values (deb + deb-src,
multiple URIs and options), and each file may contain multiple entries
separated by blank lines. GPG keys may be embedded, and the format is very
lenient on whitespace usage and allowing multi-line values.

The parser should handle any kind of strange, but valid, formatting, like

Key: Value
Key2  : Value
Key3:
# Comment
# Comment
		  Value

and so on. Due to limitations in the go regex library (no support for
look-arounds) and the complexity of multi-value fields and options, the parser
VQL is complex and split in a number of functions.

Columns are dynamic in order to support every known (and future/unknown)
options. This has pros and cons, but it is future-proof. Every known option
name is normalised to the names referred to in the man page, which are
camel-cased (pascal-cased) and plural.

Considerations

The main motivation behind improving this artifact was to have a working apt
sources parser. Since this artifact is part of velociraptor and not the
exchange (velociraptor-docs), I've tried to make the parsers handle everything
I can throw at it. I am happy to include tests, but before I do that, I would
like to get some feedback.

I kept the original query, but I also included two others. I feel that a table
including every parsed field, like all options, is handy. Likewise, a flattened
version is also very useful to work on in a notebook. This results in two
rounds of parsing sources files (due to how flatting happens at an early
stage), and the third query parses cache files as well, like before. This
shouldn't have any noticeable effect on any normal system, but if there are
ways I can rewrite the VQL do only parse the files once, for both the combined
and flattened results, that would of course by much nicer. And perhaps
something belongs in notebook suggestions?

I see lots of potential in this artifact, but for starters, I'd like it to
work and be useful for importing in other artifacts. Working with
inrelease-path and signed-by options in future enhancements may be useful.

I find it a bit tricky to get an overview of all the supported features of
artifact YAML, so feel free to point out how I may improve the documentation,
ways to export functions (without including all "private" helper functions),
and how to improve the current three–query solution.

@CLAassistant
Copy link

CLAassistant commented Jul 29, 2023

CLA assistant check
All committers have signed the CLA.

@scudette
Copy link
Contributor

Thanks! this looks great. It would be great if we can make a test for it. Just create a test file which might be a snippet of a real file and store it in https://github.com/Velocidex/velociraptor/tree/master/artifacts/testdata/files

Then add a test in https://github.com/Velocidex/velociraptor/tree/master/artifacts/testdata/server/testcases say call it debian.in.yaml

then you can run test with

GOLDEN=debian make golden

This will make .out.yaml file and we can review it

@misje
Copy link
Contributor Author

misje commented Aug 2, 2023

Certainly, I'll work on the tests.

Since you don't have any objections to the queries for now, I'm moving on to real-life tests/collections. I'm about to review results, but I'd like to get help on deciphering the following error, only occurring on 0.6.8-2 clients (0.6.9 runs fine): "1:53: unexpected token "," (expected )". The log contains no errors or details, but the state is ERROR and the message is as previously mentioned.

@scudette
Copy link
Contributor

scudette commented Aug 2, 2023

This happens for an expression like select foo, * from ...

In 0.6.8 you can only put the * in the first position but in 0.6.9 it can go in any position so the added columns can go first

You can think if it is worth keeping the order for backwards support or not

misje and others added 19 commits August 23, 2023 22:06
- Support for sources files (multi-line deb822 format)
- Support options other than "arch" in list files
- Support for any valid usage of whitespace in list files
- Support for comments in (at the end of) list files
- Support for protocols other than http/https in list files
- Support for multiple values in fields/options (arch, lang, targets, components)
- Parse InRelease files as well as Release files, and prevent the "Version:
  GnuPG" header line from interfering with the "Version" field
- Group the query results by cache file, preventing duplicate entries without
  records
- Export parsing function
I missed a feature of both the one-line and deb822 formats, allowing one
to add or remove values from the default without overring the defaults.
The syntax is "[lang-=en]" in one-line and "Languages-Remove: en" in
deb822.

I doubt that these are used often, but I spotted a use in the wild, so
it should be supported.
Both oneline and deb822 formats support repeated keys. That is,
"[arch=amd64 arch=i386]" should result in both "amd64" and "i836" to be
included.
In particular, parse "cdrom" sources lines correctly. These are not
uncommon in most OSes after an install, albeit typically commented out.

Quoting URIs, suites and components seems to be supported in the apt
source code, but processing these fails later in apt. Support is not
documented. However, the use of [] as quotes is often used for the
"cdrom" protocol. Modify the regex so that these lines are parsed
correctly. Also support fully quoted URIs in "", but stop there.

Also support comments at the end of options, as the source code mentions
that this is supported (this is an easy adjustment of the regex).
Perhaps this is wrong, but "Maintainer" does not seem to suit the values
for Origin. Additionally, the field is called Origin, so perhaps this
was a mistake.
This isn't supported according to the man page, but the values are
parsed as if it were. I've seen examples of incorrect use in the wild,
so do the same as apt-get: allow it.
@scudette scudette merged commit d89cf7a into Velocidex:master Aug 27, 2023
1 check passed
scudette added a commit to scudette/velociraptor that referenced this pull request Aug 28, 2023
…x#2851)

# Linux.Debian.AptSources

The Linux.Debian.AptSources artifact has a number of limitations, all of
which
the changes in this pull request aims to resolve. The new artifact also
provides
parsers that can be imported in other artifacts for inspecting every
single
field in these configuration files.

## Resolved issues

The artifact does not work on current Debian-based systems. One reason
is an
increased usage of the option "signed-by", used to manually specify the
gpg key
file location. The original regex only supports the option "arch" or no
options. The main reason is the that Release files have been replaced
with
InRelease files. The format is similar, just with a gpg key embedded at
the end
of the file. Note that these keys often contain a gpg header "Version:
[…] that
confuses the original VQL.

**Fixes and enhancements**:

- Support for sources files (multi-line [deb822
format](https://manpages.debian.org/bookworm/apt/sources.list.5.en.html#DEB822-STYLE_FORMAT))
- Support options other than "arch" in list files
- Support for any valid usage of whitespace in list files
- Support for comments in (at the end of) list files
- Support for protocols other than http/https in list files
- Support for multiple values in fields/options (arch, lang, targets,
components)
- Parse InRelease files as well as Release files, and prevent the
"Version:
  GnuPG" header line from interfering with the "Version" field
- Group the query results by cache file, preventing duplicate entries
without
  records

After improving the list file parser, I also added a parser for the new
file
format, likely to replace the one-line list file format in the future:
deb822.
This file format allows each entry to have multiple values (deb +
deb-src,
multiple URIs and options), and each file may contain multiple entries
separated by blank lines. GPG keys may be embedded, and the format is
very
lenient on whitespace usage and allowing multi-line values.

The parser should handle any kind of strange, but valid, formatting,
like

```
Key: Value
Key2  : Value
Key3:
# Comment
# Comment
		  Value
```

and so on. Due to limitations in the go regex library (no support for
look-arounds) and the complexity of multi-value fields and options, the
parser
VQL is complex and split in a number of functions.

Columns are dynamic in order to support every known (and future/unknown)
options. This has pros and cons, but it is future-proof. Every known
option
name is normalised to the names referred to in the man page, which are
camel-cased (pascal-cased) and plural.

## Considerations

The main motivation behind improving this artifact was to have a working
apt
sources parser. Since this artifact is part of velociraptor and not the
exchange (velociraptor-docs), I've tried to make the parsers handle
everything
I can throw at it. I am happy to include tests, but before I do that, I
would
like to get some feedback.

I kept the original query, but I also included two others. I feel that a
table
including every parsed field, like all options, is handy. Likewise, a
flattened
version is also very useful to work on in a notebook. This results in
two
rounds of parsing sources files (due to how flatting happens at an early
stage), and the third query parses cache files as well, like before.
This
shouldn't have any noticeable effect on any normal system, but if there
are
ways I can rewrite the VQL do only parse the files once, for both the
combined
and flattened results, that would of course by much nicer. And perhaps
something belongs in notebook suggestions?

I see lots of potential in this artifact, but for starters, I'd like it
to
work and be useful for importing in other artifacts. Working with
inrelease-path and signed-by options in future enhancements may be
useful.

I find it a bit tricky to get an overview of all the supported features
of
artifact YAML, so feel free to point out how I may improve the
documentation,
ways to export functions (without including all "private" helper
functions),
and how to improve the current three–query solution.

---------

Co-authored-by: Mike Cohen <mike@velocidex.com>
scudette added a commit that referenced this pull request Aug 28, 2023
# Linux.Debian.AptSources

The Linux.Debian.AptSources artifact has a number of limitations, all of
which
the changes in this pull request aims to resolve. The new artifact also
provides
parsers that can be imported in other artifacts for inspecting every
single
field in these configuration files.

## Resolved issues

The artifact does not work on current Debian-based systems. One reason
is an
increased usage of the option "signed-by", used to manually specify the
gpg key
file location. The original regex only supports the option "arch" or no
options. The main reason is the that Release files have been replaced
with
InRelease files. The format is similar, just with a gpg key embedded at
the end
of the file. Note that these keys often contain a gpg header "Version:
[…] that
confuses the original VQL.

**Fixes and enhancements**:

- Support for sources files (multi-line [deb822
format](https://manpages.debian.org/bookworm/apt/sources.list.5.en.html#DEB822-STYLE_FORMAT))
- Support options other than "arch" in list files
- Support for any valid usage of whitespace in list files
- Support for comments in (at the end of) list files
- Support for protocols other than http/https in list files
- Support for multiple values in fields/options (arch, lang, targets,
components)
- Parse InRelease files as well as Release files, and prevent the
"Version:
  GnuPG" header line from interfering with the "Version" field
- Group the query results by cache file, preventing duplicate entries
without
  records

After improving the list file parser, I also added a parser for the new
file
format, likely to replace the one-line list file format in the future:
deb822.
This file format allows each entry to have multiple values (deb +
deb-src,
multiple URIs and options), and each file may contain multiple entries
separated by blank lines. GPG keys may be embedded, and the format is
very
lenient on whitespace usage and allowing multi-line values.

The parser should handle any kind of strange, but valid, formatting,
like

```
Key: Value
Key2  : Value
Key3:
# Comment
# Comment
		  Value
```

and so on. Due to limitations in the go regex library (no support for
look-arounds) and the complexity of multi-value fields and options, the
parser
VQL is complex and split in a number of functions.

Columns are dynamic in order to support every known (and future/unknown)
options. This has pros and cons, but it is future-proof. Every known
option
name is normalised to the names referred to in the man page, which are
camel-cased (pascal-cased) and plural.

## Considerations

The main motivation behind improving this artifact was to have a working
apt
sources parser. Since this artifact is part of velociraptor and not the
exchange (velociraptor-docs), I've tried to make the parsers handle
everything
I can throw at it. I am happy to include tests, but before I do that, I
would
like to get some feedback.

I kept the original query, but I also included two others. I feel that a
table
including every parsed field, like all options, is handy. Likewise, a
flattened
version is also very useful to work on in a notebook. This results in
two
rounds of parsing sources files (due to how flatting happens at an early
stage), and the third query parses cache files as well, like before.
This
shouldn't have any noticeable effect on any normal system, but if there
are
ways I can rewrite the VQL do only parse the files once, for both the
combined
and flattened results, that would of course by much nicer. And perhaps
something belongs in notebook suggestions?

I see lots of potential in this artifact, but for starters, I'd like it
to
work and be useful for importing in other artifacts. Working with
inrelease-path and signed-by options in future enhancements may be
useful.

I find it a bit tricky to get an overview of all the supported features
of
artifact YAML, so feel free to point out how I may improve the
documentation,
ways to export functions (without including all "private" helper
functions),
and how to improve the current three–query solution.

---------

Co-authored-by: Mike Cohen <mike@velocidex.com>
@misje misje deleted the improve-aptsources branch September 2, 2023 14:49
scudette pushed a commit that referenced this pull request Sep 6, 2023
#2851 was merged a bit too early. My mistake, I should have marked it as
a draft. A misplaced `WHERE URIs` prevented deb822 parsing to work
altogether. This is now fixed, and I've included some sources files to
the tests, along with another lists file that includes two common
options.

I've added these new files to the tests Sources and SourcesFlattened,
but not SourcesCacheFiles.

I've also fixed the VQL suggestions after realising that `get()` doesn't
work in WHERE as I assumed it would (see discussion
[here](https://discord.com/channels/624244632552734750/865024982399320114/1147470384359284746)
with a detailed explanation from Mike).
scudette pushed a commit that referenced this pull request Sep 10, 2023
#2851 was merged a bit too early. My mistake, I should have marked it as
a draft. A misplaced `WHERE URIs` prevented deb822 parsing to work
altogether. This is now fixed, and I've included some sources files to
the tests, along with another lists file that includes two common
options.

I've added these new files to the tests Sources and SourcesFlattened,
but not SourcesCacheFiles.

I've also fixed the VQL suggestions after realising that `get()` doesn't
work in WHERE as I assumed it would (see discussion
[here](https://discord.com/channels/624244632552734750/865024982399320114/1147470384359284746)
with a detailed explanation from Mike).
scudette pushed a commit that referenced this pull request Sep 10, 2023
#2851 was merged a bit too early. My mistake, I should have marked it as
a draft. A misplaced `WHERE URIs` prevented deb822 parsing to work
altogether. This is now fixed, and I've included some sources files to
the tests, along with another lists file that includes two common
options.

I've added these new files to the tests Sources and SourcesFlattened,
but not SourcesCacheFiles.

I've also fixed the VQL suggestions after realising that `get()` doesn't
work in WHERE as I assumed it would (see discussion
[here](https://discord.com/channels/624244632552734750/865024982399320114/1147470384359284746)
with a detailed explanation from Mike).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants