-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kolide ATC] Construct KATC tables and add support for Firefox extension data #1763
[Kolide ATC] Construct KATC tables and add support for Firefox extension data #1763
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not absorbed all of this. It looks neat. Early feedback
ee/katc/structured_clone.go
Outdated
tagEndOfKeys uint32 = 0xffff0013 | ||
) | ||
|
||
// structuredCloneDeserialize deserializes a JS object that has been stored in IndexedDB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it called structuredCloneDeserialize
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Named based on the mozilla references below -- could name based on some combination of firefox
+ indexeddb values
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the context of indexdb it makes sense to use the mozilla names. But in the context of a DSL for KATC it's much less clear. Something along your handwave there feels better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly, the difficulty in naming here for me is that this function and the corresponding one in ee/indexeddb/values.go
both implement StructuredDeserialize, which is a great, well-known name. But the functions are not interchangeable -- this one only works for Firefox data, and the ee/indexeddb
one only works for Chrome data. Maybe we just want something like structuredDeserializeFirefox
here, since that would be more immediately understandable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went with deserializeFirefox
ee/katc/config.go
Outdated
} | ||
|
||
type katcTableConfig struct { | ||
Type katcTableType `json:"type"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type
implies something about the type of table, but I think this refers to the source? I'm also not sold on Path
or Columns
, they feel specific to sqlite
.
I wonder if we can take a page from databases and treat this as a connect url string. sqlite:path
or similar. Each source type would need to define it's own schema there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think source
is a better name than type
! I'll rename.
I think path
should always apply -- e.g. path to JSON file, or indexeddb file, or sqlite file. I'm still waffling a bunch on columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe filepath
is better/clearer? I am also not opposed to doing a connect url string approach, that feels like it could be tidy 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if it was a network call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could overload path
. It's an arbitrary string parsed by source. 🤷 mostly it comes down to whether you'd like source
and path
or source://path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More thoughts --
I like the idea of replacing path
with a connection_string
or dsn
. That does get a little weird with supporting wildcards -- we'd have to 1) parse the connection string to extract the filepath portion, then 2) glob based on that part, then 3) re-construct the connection string to use it. But we're doing the second two steps anyway, so might as well?
I think I like keeping source in a separate field regardless of what we decide for ^, since it makes config unmarshalling simpler and more self-contained -- but I will think on it more and see how it feels after updating path
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-construct the connection string to use it.
Do we really need that? I think it'll depend a lot on the type
in question. Like, imagine:
type: sqlite
dsn: file:///Users/%/...?options
If the dsn starts with file, URL parse it and hand it to glob. But I don't think we need to reconstruct, because I think the options are parsed by our katc type, not the underlying implementation. (practically speaking, maybe they're passed along, but I think that's going to be type specific)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we'd do this, but I can imagine a curl style thing. https://host/path?validate=false
or something, that validate might signal to the type thing to disable tls validation, but it's not going to be part of the remote URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed type
to source
-- path
to dsn
to come later once I sort through some of the other parts!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've ended up with source_type
(for sqlite
, leveldb
, etc) and source
(filepath with wildcard to db, network call, etc). I think this is a lot clearer in the code too, e.g. with the sourceData
struct.
Perhaps this is a silly suggestion, but do we want to consider the capability to copy/read/delete for DB's that are locked while open? Perhaps an optional boolean column eg. |
We should explore that use case. There may be other sqlite options that let us open locked files. |
@FritzX6 not a silly suggestion! I had thought about keeping that in the implementation details instead of the config (e.g. for leveldb+indexeddb we always want to copy the db, we shouldn't allow configuration otherwise) -- maybe there's a stronger way to require that. I will look into the |
…unmatched sources
db390cf
to
9868c16
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
0499728
to
8926369
Compare
8926369
to
0c00c9e
Compare
Builds on #1761
This PR adds config parsing for KATC tables and constructing those tables.
As a proof-of-concept, it implements one table type and a couple data processing steps, with the end result of supporting querying Firefox extension data (sqlite file + data is compressed via snappy + data is serialized with StructuredClone).
I think will be easiest to review the changes in this order:
pkg/osquery/table/table.go
-- this is the entrypoint to theee/katc
package, and touches the changes made in the previous PRee/katc/config.go
-- this is the parsing and construction of the KATC tablesee/katc/table.go
-- this is the generation of results when a query is run against a KATC tableee/katc/sqlite.go
-- this is querying a specific backend (in this case, a sqlite database) on behalf of a KATC table (this function is called byee/katc/table.go
)ee/katc/snappy.go
-- this is a fairly simple row transform step (called byee/katc/table.go
on the sqlite results)ee/katc/deserialize_firefox.go
-- this is a significantly more complicated row transform step (called byee/katc/table.go
on the sqlite results that have already been snappy decompressed); ultimately it's deserializing a JS object into a Golang mapAn example KATC config:
Work TBD in separate PRs: