Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Kolide ATC] Construct KATC tables and add support for Firefox extension data #1763

Merged
merged 24 commits into from
Jul 3, 2024

Conversation

RebeccaMahany
Copy link
Contributor

@RebeccaMahany RebeccaMahany commented Jun 26, 2024

Builds on #1761

This PR adds config parsing for KATC tables and constructing those tables.

As a proof-of-concept, it implements one table type and a couple data processing steps, with the end result of supporting querying Firefox extension data (sqlite file + data is compressed via snappy + data is serialized with StructuredClone).

I think will be easiest to review the changes in this order:

  1. pkg/osquery/table/table.go -- this is the entrypoint to the ee/katc package, and touches the changes made in the previous PR
  2. ee/katc/config.go -- this is the parsing and construction of the KATC tables
  3. ee/katc/table.go -- this is the generation of results when a query is run against a KATC table
  4. ee/katc/sqlite.go -- this is querying a specific backend (in this case, a sqlite database) on behalf of a KATC table (this function is called by ee/katc/table.go)
  5. ee/katc/snappy.go -- this is a fairly simple row transform step (called by ee/katc/table.go on the sqlite results)
  6. ee/katc/deserialize_firefox.go -- this is a significantly more complicated row transform step (called by ee/katc/table.go on the sqlite results that have already been snappy decompressed); ultimately it's deserializing a JS object into a Golang map

An example KATC config:

{
    "my_example_table": {
        "source_type": "sqlite",
        "source": "/some/%/path/to/db.sqlite",
        "platform": "darwin",
        "columns": ["someKeyInsideCompressedSerializedJSONObject"],
        "query": "SELECT data FROM object_data JOIN object_store ON (object_data.object_store_id = object_store.id) WHERE object_store.name=\"some-test-table\";",
        "row_transform_steps": ["snappy", "structured_clone"]
    }
}

Work TBD in separate PRs:

Copy link
Contributor

@directionless directionless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not absorbed all of this. It looks neat. Early feedback

tagEndOfKeys uint32 = 0xffff0013
)

// structuredCloneDeserialize deserializes a JS object that has been stored in IndexedDB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it called structuredCloneDeserialize ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Named based on the mozilla references below -- could name based on some combination of firefox + indexeddb values instead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the context of indexdb it makes sense to use the mozilla names. But in the context of a DSL for KATC it's much less clear. Something along your handwave there feels better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly, the difficulty in naming here for me is that this function and the corresponding one in ee/indexeddb/values.go both implement StructuredDeserialize, which is a great, well-known name. But the functions are not interchangeable -- this one only works for Firefox data, and the ee/indexeddb one only works for Chrome data. Maybe we just want something like structuredDeserializeFirefox here, since that would be more immediately understandable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with deserializeFirefox

}

type katcTableConfig struct {
Type katcTableType `json:"type"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type implies something about the type of table, but I think this refers to the source? I'm also not sold on Path or Columns, they feel specific to sqlite.

I wonder if we can take a page from databases and treat this as a connect url string. sqlite:path or similar. Each source type would need to define it's own schema there.

Copy link
Contributor Author

@RebeccaMahany RebeccaMahany Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think source is a better name than type! I'll rename.

I think path should always apply -- e.g. path to JSON file, or indexeddb file, or sqlite file. I'm still waffling a bunch on columns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe filepath is better/clearer? I am also not opposed to doing a connect url string approach, that feels like it could be tidy 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it was a network call?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could overload path. It's an arbitrary string parsed by source. 🤷 mostly it comes down to whether you'd like source and path or source://path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More thoughts --

I like the idea of replacing path with a connection_string or dsn. That does get a little weird with supporting wildcards -- we'd have to 1) parse the connection string to extract the filepath portion, then 2) glob based on that part, then 3) re-construct the connection string to use it. But we're doing the second two steps anyway, so might as well?

I think I like keeping source in a separate field regardless of what we decide for ^, since it makes config unmarshalling simpler and more self-contained -- but I will think on it more and see how it feels after updating path!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-construct the connection string to use it.

Do we really need that? I think it'll depend a lot on the type in question. Like, imagine:

type: sqlite
dsn: file:///Users/%/...?options

If the dsn starts with file, URL parse it and hand it to glob. But I don't think we need to reconstruct, because I think the options are parsed by our katc type, not the underlying implementation. (practically speaking, maybe they're passed along, but I think that's going to be type specific)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we'd do this, but I can imagine a curl style thing. https://host/path?validate=false or something, that validate might signal to the type thing to disable tls validation, but it's not going to be part of the remote URL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed type to source -- path to dsn to come later once I sort through some of the other parts!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've ended up with source_type (for sqlite, leveldb, etc) and source (filepath with wildcard to db, network call, etc). I think this is a lot clearer in the code too, e.g. with the sourceData struct.

ee/katc/config.go Outdated Show resolved Hide resolved
ee/katc/sqlite.go Outdated Show resolved Hide resolved
ee/katc/sqlite.go Outdated Show resolved Hide resolved
ee/katc/sqlite.go Show resolved Hide resolved
ee/katc/sqlite.go Show resolved Hide resolved
@FritzX6
Copy link
Contributor

FritzX6 commented Jun 27, 2024

Perhaps this is a silly suggestion, but do we want to consider the capability to copy/read/delete for DB's that are locked while open? Perhaps an optional boolean column eg. read_from_copy that puts the DB in /tmp or elsewhere?

@directionless
Copy link
Contributor

Perhaps this is a silly suggestion, but do we want to consider the capability to copy/read/delete for DB's that are locked while open? Perhaps an optional boolean column eg. read_from_copy that puts the DB in /tmp or elsewhere?

We should explore that use case. There may be other sqlite options that let us open locked files. immutable specifically

@RebeccaMahany
Copy link
Contributor Author

RebeccaMahany commented Jun 27, 2024

@FritzX6 not a silly suggestion! I had thought about keeping that in the implementation details instead of the config (e.g. for leveldb+indexeddb we always want to copy the db, we shouldn't allow configuration otherwise) -- maybe there's a stronger way to require that. I will look into the immutable suggestion from seph also.

@RebeccaMahany RebeccaMahany changed the title [Kolide ATC] Construct KATC tables and add support for sqlite+snappy+structured clone [Kolide ATC] Construct KATC tables and add support for Firefox extension data Jun 28, 2024
@RebeccaMahany RebeccaMahany marked this pull request as ready for review July 1, 2024 15:24
ee/katc/table.go Outdated Show resolved Hide resolved
ee/katc/table.go Outdated Show resolved Hide resolved
Copy link
Contributor

@zackattack01 zackattack01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

Copy link
Contributor

@James-Pickett James-Pickett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@CLAassistant
Copy link

CLAassistant commented Jul 2, 2024

CLA assistant check
All committers have signed the CLA.

@RebeccaMahany RebeccaMahany added this pull request to the merge queue Jul 3, 2024
Merged via the queue into kolide:main with commit d678648 Jul 3, 2024
29 checks passed
@RebeccaMahany RebeccaMahany deleted the becca/katc-construct branch July 3, 2024 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants