-
Notifications
You must be signed in to change notification settings - Fork 492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issues in AptSources artifact and support deb822 format #2851
Conversation
Thanks! this looks great. It would be great if we can make a test for it. Just create a test file which might be a snippet of a real file and store it in https://github.com/Velocidex/velociraptor/tree/master/artifacts/testdata/files Then add a test in https://github.com/Velocidex/velociraptor/tree/master/artifacts/testdata/server/testcases say call it debian.in.yaml then you can run test with
This will make .out.yaml file and we can review it |
Certainly, I'll work on the tests. Since you don't have any objections to the queries for now, I'm moving on to real-life tests/collections. I'm about to review results, but I'd like to get help on deciphering the following error, only occurring on 0.6.8-2 clients (0.6.9 runs fine): "1:53: unexpected token "," (expected )". The log contains no errors or details, but the state is ERROR and the message is as previously mentioned. |
This happens for an expression like select foo, * from ... In 0.6.8 you can only put the * in the first position but in 0.6.9 it can go in any position so the added columns can go first You can think if it is worth keeping the order for backwards support or not |
- Support for sources files (multi-line deb822 format) - Support options other than "arch" in list files - Support for any valid usage of whitespace in list files - Support for comments in (at the end of) list files - Support for protocols other than http/https in list files - Support for multiple values in fields/options (arch, lang, targets, components) - Parse InRelease files as well as Release files, and prevent the "Version: GnuPG" header line from interfering with the "Version" field - Group the query results by cache file, preventing duplicate entries without records - Export parsing function
I missed a feature of both the one-line and deb822 formats, allowing one to add or remove values from the default without overring the defaults. The syntax is "[lang-=en]" in one-line and "Languages-Remove: en" in deb822. I doubt that these are used often, but I spotted a use in the wild, so it should be supported.
Both oneline and deb822 formats support repeated keys. That is, "[arch=amd64 arch=i386]" should result in both "amd64" and "i836" to be included.
In particular, parse "cdrom" sources lines correctly. These are not uncommon in most OSes after an install, albeit typically commented out. Quoting URIs, suites and components seems to be supported in the apt source code, but processing these fails later in apt. Support is not documented. However, the use of [] as quotes is often used for the "cdrom" protocol. Modify the regex so that these lines are parsed correctly. Also support fully quoted URIs in "", but stop there. Also support comments at the end of options, as the source code mentions that this is supported (this is an easy adjustment of the regex).
Perhaps this is wrong, but "Maintainer" does not seem to suit the values for Origin. Additionally, the field is called Origin, so perhaps this was a mistake.
This isn't supported according to the man page, but the values are parsed as if it were. I've seen examples of incorrect use in the wild, so do the same as apt-get: allow it.
f1134b7
to
e87515e
Compare
…x#2851) # Linux.Debian.AptSources The Linux.Debian.AptSources artifact has a number of limitations, all of which the changes in this pull request aims to resolve. The new artifact also provides parsers that can be imported in other artifacts for inspecting every single field in these configuration files. ## Resolved issues The artifact does not work on current Debian-based systems. One reason is an increased usage of the option "signed-by", used to manually specify the gpg key file location. The original regex only supports the option "arch" or no options. The main reason is the that Release files have been replaced with InRelease files. The format is similar, just with a gpg key embedded at the end of the file. Note that these keys often contain a gpg header "Version: […] that confuses the original VQL. **Fixes and enhancements**: - Support for sources files (multi-line [deb822 format](https://manpages.debian.org/bookworm/apt/sources.list.5.en.html#DEB822-STYLE_FORMAT)) - Support options other than "arch" in list files - Support for any valid usage of whitespace in list files - Support for comments in (at the end of) list files - Support for protocols other than http/https in list files - Support for multiple values in fields/options (arch, lang, targets, components) - Parse InRelease files as well as Release files, and prevent the "Version: GnuPG" header line from interfering with the "Version" field - Group the query results by cache file, preventing duplicate entries without records After improving the list file parser, I also added a parser for the new file format, likely to replace the one-line list file format in the future: deb822. This file format allows each entry to have multiple values (deb + deb-src, multiple URIs and options), and each file may contain multiple entries separated by blank lines. GPG keys may be embedded, and the format is very lenient on whitespace usage and allowing multi-line values. The parser should handle any kind of strange, but valid, formatting, like ``` Key: Value Key2 : Value Key3: # Comment # Comment Value ``` and so on. Due to limitations in the go regex library (no support for look-arounds) and the complexity of multi-value fields and options, the parser VQL is complex and split in a number of functions. Columns are dynamic in order to support every known (and future/unknown) options. This has pros and cons, but it is future-proof. Every known option name is normalised to the names referred to in the man page, which are camel-cased (pascal-cased) and plural. ## Considerations The main motivation behind improving this artifact was to have a working apt sources parser. Since this artifact is part of velociraptor and not the exchange (velociraptor-docs), I've tried to make the parsers handle everything I can throw at it. I am happy to include tests, but before I do that, I would like to get some feedback. I kept the original query, but I also included two others. I feel that a table including every parsed field, like all options, is handy. Likewise, a flattened version is also very useful to work on in a notebook. This results in two rounds of parsing sources files (due to how flatting happens at an early stage), and the third query parses cache files as well, like before. This shouldn't have any noticeable effect on any normal system, but if there are ways I can rewrite the VQL do only parse the files once, for both the combined and flattened results, that would of course by much nicer. And perhaps something belongs in notebook suggestions? I see lots of potential in this artifact, but for starters, I'd like it to work and be useful for importing in other artifacts. Working with inrelease-path and signed-by options in future enhancements may be useful. I find it a bit tricky to get an overview of all the supported features of artifact YAML, so feel free to point out how I may improve the documentation, ways to export functions (without including all "private" helper functions), and how to improve the current three–query solution. --------- Co-authored-by: Mike Cohen <mike@velocidex.com>
# Linux.Debian.AptSources The Linux.Debian.AptSources artifact has a number of limitations, all of which the changes in this pull request aims to resolve. The new artifact also provides parsers that can be imported in other artifacts for inspecting every single field in these configuration files. ## Resolved issues The artifact does not work on current Debian-based systems. One reason is an increased usage of the option "signed-by", used to manually specify the gpg key file location. The original regex only supports the option "arch" or no options. The main reason is the that Release files have been replaced with InRelease files. The format is similar, just with a gpg key embedded at the end of the file. Note that these keys often contain a gpg header "Version: […] that confuses the original VQL. **Fixes and enhancements**: - Support for sources files (multi-line [deb822 format](https://manpages.debian.org/bookworm/apt/sources.list.5.en.html#DEB822-STYLE_FORMAT)) - Support options other than "arch" in list files - Support for any valid usage of whitespace in list files - Support for comments in (at the end of) list files - Support for protocols other than http/https in list files - Support for multiple values in fields/options (arch, lang, targets, components) - Parse InRelease files as well as Release files, and prevent the "Version: GnuPG" header line from interfering with the "Version" field - Group the query results by cache file, preventing duplicate entries without records After improving the list file parser, I also added a parser for the new file format, likely to replace the one-line list file format in the future: deb822. This file format allows each entry to have multiple values (deb + deb-src, multiple URIs and options), and each file may contain multiple entries separated by blank lines. GPG keys may be embedded, and the format is very lenient on whitespace usage and allowing multi-line values. The parser should handle any kind of strange, but valid, formatting, like ``` Key: Value Key2 : Value Key3: # Comment # Comment Value ``` and so on. Due to limitations in the go regex library (no support for look-arounds) and the complexity of multi-value fields and options, the parser VQL is complex and split in a number of functions. Columns are dynamic in order to support every known (and future/unknown) options. This has pros and cons, but it is future-proof. Every known option name is normalised to the names referred to in the man page, which are camel-cased (pascal-cased) and plural. ## Considerations The main motivation behind improving this artifact was to have a working apt sources parser. Since this artifact is part of velociraptor and not the exchange (velociraptor-docs), I've tried to make the parsers handle everything I can throw at it. I am happy to include tests, but before I do that, I would like to get some feedback. I kept the original query, but I also included two others. I feel that a table including every parsed field, like all options, is handy. Likewise, a flattened version is also very useful to work on in a notebook. This results in two rounds of parsing sources files (due to how flatting happens at an early stage), and the third query parses cache files as well, like before. This shouldn't have any noticeable effect on any normal system, but if there are ways I can rewrite the VQL do only parse the files once, for both the combined and flattened results, that would of course by much nicer. And perhaps something belongs in notebook suggestions? I see lots of potential in this artifact, but for starters, I'd like it to work and be useful for importing in other artifacts. Working with inrelease-path and signed-by options in future enhancements may be useful. I find it a bit tricky to get an overview of all the supported features of artifact YAML, so feel free to point out how I may improve the documentation, ways to export functions (without including all "private" helper functions), and how to improve the current three–query solution. --------- Co-authored-by: Mike Cohen <mike@velocidex.com>
#2851 was merged a bit too early. My mistake, I should have marked it as a draft. A misplaced `WHERE URIs` prevented deb822 parsing to work altogether. This is now fixed, and I've included some sources files to the tests, along with another lists file that includes two common options. I've added these new files to the tests Sources and SourcesFlattened, but not SourcesCacheFiles. I've also fixed the VQL suggestions after realising that `get()` doesn't work in WHERE as I assumed it would (see discussion [here](https://discord.com/channels/624244632552734750/865024982399320114/1147470384359284746) with a detailed explanation from Mike).
#2851 was merged a bit too early. My mistake, I should have marked it as a draft. A misplaced `WHERE URIs` prevented deb822 parsing to work altogether. This is now fixed, and I've included some sources files to the tests, along with another lists file that includes two common options. I've added these new files to the tests Sources and SourcesFlattened, but not SourcesCacheFiles. I've also fixed the VQL suggestions after realising that `get()` doesn't work in WHERE as I assumed it would (see discussion [here](https://discord.com/channels/624244632552734750/865024982399320114/1147470384359284746) with a detailed explanation from Mike).
#2851 was merged a bit too early. My mistake, I should have marked it as a draft. A misplaced `WHERE URIs` prevented deb822 parsing to work altogether. This is now fixed, and I've included some sources files to the tests, along with another lists file that includes two common options. I've added these new files to the tests Sources and SourcesFlattened, but not SourcesCacheFiles. I've also fixed the VQL suggestions after realising that `get()` doesn't work in WHERE as I assumed it would (see discussion [here](https://discord.com/channels/624244632552734750/865024982399320114/1147470384359284746) with a detailed explanation from Mike).
Linux.Debian.AptSources
The Linux.Debian.AptSources artifact has a number of limitations, all of which
the changes in this pull request aims to resolve. The new artifact also provides
parsers that can be imported in other artifacts for inspecting every single
field in these configuration files.
Resolved issues
The artifact does not work on current Debian-based systems. One reason is an
increased usage of the option "signed-by", used to manually specify the gpg key
file location. The original regex only supports the option "arch" or no
options. The main reason is the that Release files have been replaced with
InRelease files. The format is similar, just with a gpg key embedded at the end
of the file. Note that these keys often contain a gpg header "Version: […] that
confuses the original VQL.
Fixes and enhancements:
GnuPG" header line from interfering with the "Version" field
records
After improving the list file parser, I also added a parser for the new file
format, likely to replace the one-line list file format in the future: deb822.
This file format allows each entry to have multiple values (deb + deb-src,
multiple URIs and options), and each file may contain multiple entries
separated by blank lines. GPG keys may be embedded, and the format is very
lenient on whitespace usage and allowing multi-line values.
The parser should handle any kind of strange, but valid, formatting, like
and so on. Due to limitations in the go regex library (no support for
look-arounds) and the complexity of multi-value fields and options, the parser
VQL is complex and split in a number of functions.
Columns are dynamic in order to support every known (and future/unknown)
options. This has pros and cons, but it is future-proof. Every known option
name is normalised to the names referred to in the man page, which are
camel-cased (pascal-cased) and plural.
Considerations
The main motivation behind improving this artifact was to have a working apt
sources parser. Since this artifact is part of velociraptor and not the
exchange (velociraptor-docs), I've tried to make the parsers handle everything
I can throw at it. I am happy to include tests, but before I do that, I would
like to get some feedback.
I kept the original query, but I also included two others. I feel that a table
including every parsed field, like all options, is handy. Likewise, a flattened
version is also very useful to work on in a notebook. This results in two
rounds of parsing sources files (due to how flatting happens at an early
stage), and the third query parses cache files as well, like before. This
shouldn't have any noticeable effect on any normal system, but if there are
ways I can rewrite the VQL do only parse the files once, for both the combined
and flattened results, that would of course by much nicer. And perhaps
something belongs in notebook suggestions?
I see lots of potential in this artifact, but for starters, I'd like it to
work and be useful for importing in other artifacts. Working with
inrelease-path and signed-by options in future enhancements may be useful.
I find it a bit tricky to get an overview of all the supported features of
artifact YAML, so feel free to point out how I may improve the documentation,
ways to export functions (without including all "private" helper functions),
and how to improve the current three–query solution.