
[Tracking] Opening multiple event sources in the same Falco instance #2074

Closed
@jasondellaluce

Description

Motivation

The plugin system allows Falco to open new kinds of event sources that go beyond the historical syscall use case. Recently, this has been leveraged to port the k8s audit log event source to a plugin (see: https://github.com/falcosecurity/plugins/tree/master/plugins/k8saudit, and #1952). One of the core limitations of the plugin system implementation in the libraries is that a given Falco instance is capable of opening only one event source. In the example above, this implies that a single Falco instance is not able to ingest both syscalls and k8s audit logs together; today this can only be accomplished by deploying two distinct Falco instances, one for each event source.

Feature Requirements

  • (R1) A single Falco instance should be able to open more than one event source at once, in parallel
  • (R2) There should be feature parity and performance parity between having 2+ sources active in parallel in a single Falco instance and having 2+ single-source Falco instances with the same event sources

Proposed Solution

Release Goals

To be defined. This is out of reach for Falco 0.32.1; the target is Falco 0.33.0.

Terminology

  • Capture Mode: A configuration of sinsp inspectors that reads events from a trace file
  • Live Mode: A configuration of sinsp inspectors that reads events from one of the supported live sources (kmod, ebpf, gvisor, plugin); see the sketch below
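
For illustration only, a minimal sketch of the two modes, assuming the open_savefile/open_kmod helpers exposed by current libsinsp (method names and include paths vary across libs versions):

```cpp
#include <string>
#include <libsinsp/sinsp.h> // older libs versions use "sinsp.h" instead

// Capture Mode: the inspector replays events from a trace file
void open_capture_mode(sinsp& inspector, const std::string& trace_file)
{
    inspector.open_savefile(trace_file);
}

// Live Mode: the inspector reads from one of the supported live engines
// (kmod shown here; ebpf, gvisor, and plugin have analogous open_* methods)
void open_live_mode(sinsp& inspector)
{
    inspector.open_kmod();
}
```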

Design

  • (D1) The feature is implemented in Falco only, and mostly affects only the codebase of falcosecurity/falco. Both libsinsp and libscap will keep working in single-source mode
  • (D2) Falco manages multiple sinsp inspector instances, at most one per thread
  • (D3) Falco manages one or more instances of sinsp inspectors
    • If the # of inspectors is 1, everything runs in the main thread just like now
    • If the # of inspectors is 2+, each inspector runs in its own separate thread (see (R1)). The whole event data path happens in parallel within each thread (event production, data enrichment, event-rule matching, and output formatting); see the threading sketch after this list
  • (D4) If in capture mode, Falco runs only 1 inspector, configured to read events from a trace file
  • (D5) If in live mode, Falco runs 1 inspector for each active event source
    • If an event source terminates due to EOF being reached, Falco waits for the other event sources to terminate too
    • If an event source terminates with an error, Falco forces the termination of all the other event sources
  • (D6) There is 1 instance of the Falco Rule Engine (just like now), and we leverage/enforce thread-safety guarantees to make sure it is safe and non-blocking for different threads to perform event-rule matching
  • (D7) There is 1 instance of the Falco Output Engine (just like now), and we leverage/enforce thread-safety guarantees to make sure it is safe for different threads to send alerts when an event-rule match is found
    • Non-blocking guarantees are less of a concern here, because the number of alerts is orders of magnitude lower than the number of events
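
As a rough illustration of (D3), (D5), and (D7), the threading sketch below uses hypothetical placeholder types (inspector_ctx, process_source, and the commented-out output_engine call are illustrative only, not actual Falco code): each source runs its whole event data path in its own thread, while the shared Output Engine is accessed under a lock.

```cpp
#include <atomic>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// One context per event source; in real Falco this would wrap a sinsp
// inspector plus its filterchecks and formatters.
struct inspector_ctx
{
    std::string source_name;
};

std::mutex g_output_mutex;           // (D7): serializes alert emission
std::atomic<bool> g_terminate{false};

// Per-source event loop: production, enrichment, rule matching, and
// output formatting all happen inside this thread (see (D3)).
void process_source(inspector_ctx& ctx)
{
    // placeholder loop standing in for "read events until EOF or error"
    for(int n = 0; n < 3 && !g_terminate; n++)
    {
        bool matched = false;        // placeholder for Rule Engine result
        if(matched)
        {
            // (D7): the shared Output Engine is accessed under a lock
            std::lock_guard<std::mutex> lk(g_output_mutex);
            // output_engine.handle_alert(ctx.source_name, ...);
        }
    }
    // (D5): on error we would set g_terminate to stop the other sources
}

int main()
{
    std::vector<inspector_ctx> sources = {{"syscall"}, {"k8s_audit"}};

    if(sources.size() == 1)
    {
        process_source(sources[0]);  // single source: run in main thread
    }
    else
    {
        std::vector<std::thread> workers;
        for(auto& src : sources)     // (D3): one thread per inspector
        {
            workers.emplace_back(process_source, std::ref(src));
        }
        for(auto& t : workers)
        {
            t.join();                // (D5): wait for every source to end
        }
    }
    return 0;
}
```

With a single source the loop runs directly in the main thread, preserving today's behavior; with 2+ sources each inspector gets its own worker thread and Falco joins them all before exiting.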

Technical Limitations of the Design

  • (L1) There cannot be 2+ event sources with the same name active at the same time
    • This would defeat the thread-safety guarantees of the Rule Engine, which are based on the notion of event source partitioning (see the sketch after this list)
    • Potential Workarounds (for the future, just in case):
      • Have more than one instance of the Rule Engine to handle the increased event source cardinality. For example, the second Rule Engine instance would cover all the second event source replicas, the third Rule Engine instance would handle the third replicas, and so on
      • Make the Rule Engine thread-safe without the event source <-> thread 1-1 mapping assumption. This is hardly achievable, because it would imply making the whole filtercheck system of libsinsp thread-safe too. Another naive solution would be to create one mutex per event source to protect access to the Rule Engine. In both scenarios, this would be hard to manage and performance would be sub-optimal
      • Have one Rule Engine for each source, which could become harder to manage. For example, rule files would need to be loaded by all the rule engines, which makes the initialization phase and hot-reloading slower too. However, this is something we can consider for the future.
  • (L2) Filterchecks cannot be shared across different event sources to guarantee thread-safety in the Rule Engine. The direct implication is that if a plugin with extractor capability is compatible with 2+ active event sources (e.g. json can extract from both aws_cloudtrail and k8s_audit), we need to create and initialize two different instances of the plugin (1 for each inspector)
    • Practically, this means that a given plugin instance will always extract fields coming from the same event source (i.e., subsequent calls to plugin_extract_fields will never receive events from two distinct event sources for the same initialized plugin state)
    • This limitation can actually be turned into a by-design feature, because doing the contrary would violate (R2)
    • Potential Workarounds (for the future, just in case):
      • Make field extraction thread-safe (hardly doable, see the points in (L1))
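
To make (L1) more concrete, here is a sketch of the partitioning assumption behind the Rule Engine's thread-safety, using hypothetical types (rule_engine and source_ruleset are illustrative, not the real Falco classes): each source name owns exactly one ruleset, used by exactly one thread, so duplicate source names would break the 1-1 mapping.

```cpp
#include <cassert>
#include <map>
#include <string>

struct source_ruleset { /* compiled rules + filterchecks for one source */ };

class rule_engine
{
public:
    // Each source name maps to exactly one ruleset, used by exactly one
    // thread. No locking is needed as long as this 1-1 mapping holds.
    source_ruleset& ruleset_for(const std::string& source)
    {
        return m_rulesets[source];
    }

    // (L1): registering the same source name twice would mean two threads
    // sharing one ruleset, breaking the thread-safety guarantee.
    bool register_source(const std::string& source)
    {
        return m_rulesets.emplace(source, source_ruleset{}).second;
    }

private:
    std::map<std::string, source_ruleset> m_rulesets;
};

int main()
{
    rule_engine engine;
    assert(engine.register_source("syscall"));
    assert(engine.register_source("k8s_audit"));
    assert(!engine.register_source("k8s_audit")); // duplicate is rejected
    return 0;
}
```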

Technical Blockers

This is the list of things we must work on to make this initiative happen.

Nice to Have

Linked Discussions
