Audit log #1071

Open
Tracked by #3858
kamilkisiela opened this issue Jan 19, 2023 · 3 comments
Labels
enhancement New feature or request that adds new things or value to Hive

Comments

@kamilkisiela
Collaborator

kamilkisiela commented Jan 19, 2023

Background

Audit logs provide a full record of all user activities and system events, scoped under each organization.

Having this kind of log will allow organizations to monitor all actions (and background actions) for security and compliance purposes.

Implementation

To achieve this kind of logging, we need an easy-to-use technical mechanism that integrates with the crucial flows.
Some of the flows might be triggered by a user (for example, through the CLI), by a machine (for example, CI/CD actions or the GraphQL gateway), or by a background job (for example, the Hive purge process).

Events

To get started with this task, we'll need the following events to be fully covered (also see Compliance section):

  • User is being invited to an org (when? who sent the invite? who is the invitee?)
  • User joined an org (when? approved? what was the referrer for the request?)
  • User tried to join an org with an expired/invalid invite (when? can we correlate to an invite?)
  • User has created a project/target (when? what was the project/target name?)
  • Admin created a new role
  • Admin assigned a new role to a user
  • Admin removed a user
  • Admin transferred ownership
  • CI made a schema check (when? what target/project?)
  • User did a schema publish (when? what target/project?)
  • Hive background job deleted old schema
  • Changes to project/org/target settings.
  • Changes to schema policy settings under an org or a project.

Storage

Because we expect the audit log to contain a lot of records and want to give users time-series views of that data, we want to use ClickHouse.

Compliance

To be compliant with requirements, we need to allow admins to:

  • View the list of events
  • Export to CSV based on a time range.

The following are nice-to-have, but can help with compliance:

  • Filter events by date/time
  • Filter by event type
  • Filter by actor
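
As a rough illustration of the export requirement, here is a minimal TypeScript sketch of a CSV export over a time range with the optional filters. The `AuditEventRecord` shape and the `exportToCsv` name are assumptions made up for this example, not existing Hive code.

```typescript
// Hypothetical record shape; the real audit log will likely carry more fields.
interface AuditEventRecord {
  timestamp: Date;
  actor: string; // user email, CI token name, or background job name
  eventType: string; // e.g. "USER_INVITED"
  details: string; // free-form description
}

interface ExportFilter {
  from: Date; // inclusive start of the time range
  to: Date; // exclusive end of the time range
  eventType?: string; // optional nice-to-have filters
  actor?: string;
}

// Quote a field when it contains a comma, a double quote, or a line break.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

function exportToCsv(events: AuditEventRecord[], filter: ExportFilter): string {
  const fromMs = filter.from.getTime();
  const toMs = filter.to.getTime();
  const rows = events
    .filter((e) => e.timestamp.getTime() >= fromMs && e.timestamp.getTime() < toMs)
    .filter((e) => filter.eventType === undefined || e.eventType === filter.eventType)
    .filter((e) => filter.actor === undefined || e.actor === filter.actor)
    .map((e) =>
      [e.timestamp.toISOString(), e.actor, e.eventType, e.details].map(csvField).join(","),
    );
  return ["timestamp,actor,event_type,details", ...rows].join("\n");
}
```

In a real implementation the filtering would probably happen in the database query rather than in memory; the sketch only shows the filter semantics and the CSV quoting.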

Technical Design

API

The API needs to expose the list of events, including the actor, the date and time, and any other significant information specific to the event type.

To achieve this, we can use an approach similar to the ActivityLog implementation (a GraphQL interface with implementing types).

Access to this part of the API must be limited to the organization's admins.
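
To make the interface-plus-implementing-types idea concrete, here is a small TypeScript sketch that mirrors it with a base interface and a tagged union. All type and field names (`AuditEventBase`, `UserInvitedEvent`, `SchemaPublishedEvent`, `describeEvent`) are hypothetical and only illustrate the shape; the actual GraphQL schema may look different.

```typescript
// Common fields, playing the role of the GraphQL interface.
interface AuditEventBase {
  id: string;
  eventTime: string; // ISO 8601 timestamp
  actor: { kind: "user" | "machine" | "background-job"; name: string };
}

// Each event type adds its own fields, like an implementing GraphQL type.
interface UserInvitedEvent extends AuditEventBase {
  type: "USER_INVITED";
  inviteeEmail: string;
}

interface SchemaPublishedEvent extends AuditEventBase {
  type: "SCHEMA_PUBLISHED";
  project: string;
  target: string;
}

type AuditEvent = UserInvitedEvent | SchemaPublishedEvent;

// The `type` tag lets the compiler narrow to the event-specific fields.
function describeEvent(event: AuditEvent): string {
  switch (event.type) {
    case "USER_INVITED":
      return `${event.actor.name} invited ${event.inviteeEmail}`;
    case "SCHEMA_PUBLISHED":
      return `${event.actor.name} published a schema to ${event.project}/${event.target}`;
  }
}
```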

UI

We can begin with a paginated list of recent events, and allow filtering based on event type and/or date/time range. By default, the list should show the N (30?) last events.
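
The listing behaviour described above could be sketched roughly as follows, assuming a default page size of 30 and an in-memory array standing in for the real data source. `LoggedEvent`, `ListOptions`, and `listRecentEvents` are hypothetical names for this example.

```typescript
interface LoggedEvent {
  time: number; // Unix epoch milliseconds
  eventType: string;
}

interface ListOptions {
  eventType?: string;
  from?: number; // inclusive, epoch ms
  to?: number; // exclusive, epoch ms
  limit?: number; // page size; defaults to 30
  offset?: number; // number of matching events to skip
}

function listRecentEvents(events: LoggedEvent[], options: ListOptions = {}): LoggedEvent[] {
  const { eventType, from = -Infinity, to = Infinity, limit = 30, offset = 0 } = options;
  return events
    // filter() returns a new array, so the caller's array is not reordered by sort()
    .filter((e) => (eventType === undefined || e.eventType === eventType) && e.time >= from && e.time < to)
    .sort((a, b) => b.time - a.time) // newest first
    .slice(offset, offset + limit);
}
```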

Definition of Done

Based on the definition above, this task can be split into multiple sub-tasks/standalone PRs:

  1. Storage: deals with recording of the logs and actions.
  2. API: Expose the data through the GraphQL API.
  3. UI: Show it to admins.
@n1ru4l n1ru4l added the enhancement New feature or request that adds new things or value to Hive label Nov 6, 2023
@theguild-bot theguild-bot mentioned this issue Jan 24, 2024
92 tasks
@capaj
Contributor

capaj commented Jun 17, 2024

I will compile a brief proposal for this. It will mostly be about listing all the important CRUD actions that happen inside these managers (see image) and manually inserting a row for each of them into a single ClickHouse table.

@capaj
Contributor

capaj commented Jun 18, 2024

I wrote a brief proposal here: https://github.com/kamilkisiela/graphql-hive/blob/audit-log-wip/docs/proposals/proposal-audit-log.md

I also opened it as a draft PR, in case you want to comment on specific lines in the Markdown doc: #4990

Let me know your feedback @kamilkisiela @n1ru4l

@TuvalSimha
Collaborator

TuvalSimha commented Aug 25, 2024

Audit logs Workflow

Step 1: DB Implementation

  1. Which DB is the best fit for this task: ClickHouse, Postgres, or something else?
  2. Each organization ID will get its own audit log table.
  3. The DB table will look like this:
CREATE TABLE audit_log (
  timestamp DateTime('UTC') CODEC(DoubleDelta, LZ4),
  user_id LowCardinality(String) CODEC(ZSTD(1)),
  user_email String,
  organization_id LowCardinality(String) CODEC(ZSTD(1)),
  project_id LowCardinality(String) CODEC(ZSTD(1)),
  project_name String,
  target_id LowCardinality(String) CODEC(ZSTD(1)),
  target_name String,
  schema_version_id LowCardinality(String) CODEC(ZSTD(1)),
  event_action LowCardinality(String) CODEC(ZSTD(1)),
  -- The JSON type may need to be enabled explicitly, depending on the ClickHouse version
  event_details JSON,
  event_human_readable String,
  INDEX idx_user_id user_id TYPE set(0) GRANULARITY 64,
  INDEX idx_user_email user_email TYPE set(0) GRANULARITY 64
) ENGINE = MergeTree()
ORDER BY (timestamp, organization_id, project_id, target_id)
TTL timestamp + INTERVAL 2 YEAR;
  4. TTL: 24 months?
  5. AuditLogEvent type

type AuditLogEvent = {
  user?: {
    id: string;
    email: string;
  };
  organizationId?: string | null;
  projectId?: string | null;
  targetId?: string | null;
  schemaVersionId?: string | null;
  eventAction: AuditLogEventAction;
  details: Record<string, any>;
}
  6. AuditLogEventAction type
    This is a great opportunity to add more actions. Does anyone have suggestions?
enum AuditLogEventAction {
  // USER
  USER_INVITED,
  USER_JOINED,
  USER_REMOVED,
  EXPIRED_INVITE_HIT,

  // ORGANIZATION
  ORGANIZATION_SETTINGS_UPDATED,
  ORG_TRANSFERRED,

  // PROJECT
  PROJECT_CREATED,
  PROJECT_SETTINGS_UPDATED,
  PROJECT_DELETED,

  // TARGET
  TARGET_CREATED,
  TARGET_SETTINGS_UPDATED,
  TARGET_DELETED,

  // SCHEMA
  SCHEMA_POLICY_SETTINGS_UPDATED,
  SCHEMA_CHECKED,
  SCHEMA_PUBLISH,
  SCHEMA_DELETED,

  // ROLE
  ROLE_CREATED,
  ROLE_ASSIGNED,
  ROLE_DELETED,
}
  7. Create one class for the DB with a factory function that appends a row for each action. Depending on the DB, a call should look like this:
await this.auditLog.logAuditEvent({
  userId: '690ae6ae-30e7-4e6c-8114-97e50e41aee5',
  organizationId: 'da2dbbf8-6c03-4abf-964d-8a2d949da5cb',
  action: 'joinOrganization',
});
  8. Should we create a table for export metadata, as Jiri suggested?
CREATE TABLE audit_log_export (
  id UUID PRIMARY KEY,
  url TEXT,
  filters JSON,
  expires_at TIMESTAMP WITH TIME ZONE,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT now()
);
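
To tie Step 1 together, here is a minimal TypeScript sketch of how such a class could turn an `AuditLogEvent` into a row for the `audit_log` table. It follows the `AuditLogEvent` shape above rather than the shorter call example. `RowWriter` is a hypothetical stand-in for whichever database client ends up being used, `toAuditLogRow` is an illustrative helper, and only a subset of actions and columns is shown.

```typescript
type AuditLogEventAction = "USER_INVITED" | "USER_JOINED" | "PROJECT_CREATED"; // subset, for brevity

interface AuditLogEvent {
  user?: { id: string; email: string };
  organizationId?: string | null;
  projectId?: string | null;
  targetId?: string | null;
  schemaVersionId?: string | null;
  eventAction: AuditLogEventAction;
  details: Record<string, unknown>;
}

// One row of the audit_log table (only the columns derivable from the event).
interface AuditLogRow {
  timestamp: string; // "YYYY-MM-DD hh:mm:ss" in UTC
  user_id: string;
  user_email: string;
  organization_id: string;
  project_id: string;
  target_id: string;
  schema_version_id: string;
  event_action: string;
  event_details: string; // JSON-encoded
}

// Pure mapping from an event to a row; missing IDs become empty strings.
function toAuditLogRow(event: AuditLogEvent, now: Date): AuditLogRow {
  return {
    timestamp: now.toISOString().slice(0, 19).replace("T", " "),
    user_id: event.user?.id ?? "",
    user_email: event.user?.email ?? "",
    organization_id: event.organizationId ?? "",
    project_id: event.projectId ?? "",
    target_id: event.targetId ?? "",
    schema_version_id: event.schemaVersionId ?? "",
    event_action: event.eventAction,
    event_details: JSON.stringify(event.details),
  };
}

// Hypothetical abstraction over the database client (assumption, not a real API).
type RowWriter = (table: string, row: AuditLogRow) => Promise<void>;

class AuditLog {
  private readonly write: RowWriter;

  constructor(write: RowWriter) {
    this.write = write;
  }

  async logAuditEvent(event: AuditLogEvent): Promise<void> {
    await this.write("audit_log", toAuditLogRow(event, new Date()));
  }
}
```

Keeping the mapping in a pure function makes it easy to unit-test without a database, and the injected writer leaves the ClickHouse-or-Postgres question from item 1 open.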

Step 2: GraphQL API

  1. Create a GraphQL schema with filters, pagination, and log export.

  2. Filters:

    startDate: Date
    endDate: Date
    userId: String
    userEmail: String
    projectId: String
    targetId: String
    eventAction: String
  3. After the DB is ready, the GraphQL API will expose the data and resolve it.
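
One way a resolver could translate the filters above into a query is sketched below in TypeScript. It assumes the `audit_log` table from Step 1 and ClickHouse's `{name:Type}` query parameter syntax, so that values are passed separately instead of being concatenated into the SQL string. `buildAuditLogQuery` and `BuiltQuery` are hypothetical names, and how the parameters are handed to the client is left out.

```typescript
interface AuditLogFilter {
  startDate?: Date;
  endDate?: Date;
  userId?: string;
  userEmail?: string;
  projectId?: string;
  targetId?: string;
  eventAction?: string;
}

interface BuiltQuery {
  sql: string;
  params: Record<string, string | number>;
}

// Format a Date as "YYYY-MM-DD hh:mm:ss" in UTC.
function toDateTimeString(date: Date): string {
  return date.toISOString().slice(0, 19).replace("T", " ");
}

function buildAuditLogQuery(organizationId: string, filter: AuditLogFilter, limit = 50, offset = 0): BuiltQuery {
  // Always scope the query to one organization.
  const conditions = ["organization_id = {organizationId:String}"];
  const params: Record<string, string | number> = { organizationId };

  const addEquals = (column: string, name: string, value: string | undefined): void => {
    if (value === undefined) return;
    conditions.push(`${column} = {${name}:String}`);
    params[name] = value;
  };
  addEquals("user_id", "userId", filter.userId);
  addEquals("user_email", "userEmail", filter.userEmail);
  addEquals("project_id", "projectId", filter.projectId);
  addEquals("target_id", "targetId", filter.targetId);
  addEquals("event_action", "eventAction", filter.eventAction);

  if (filter.startDate) {
    conditions.push("timestamp >= {startDate:DateTime}");
    params.startDate = toDateTimeString(filter.startDate);
  }
  if (filter.endDate) {
    conditions.push("timestamp < {endDate:DateTime}");
    params.endDate = toDateTimeString(filter.endDate);
  }

  params.limit = limit;
  params.offset = offset;
  const sql =
    `SELECT * FROM audit_log WHERE ${conditions.join(" AND ")} ` +
    "ORDER BY timestamp DESC LIMIT {limit:UInt32} OFFSET {offset:UInt32}";
  return { sql, params };
}
```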

Step 3: Audit logs table

  1. Shadcn table
  2. Pagination: display 50 logs per page
  3. Display filters
  4. Export button
  5. Display only to admins, as a tab in the org menu at the path /audit-logs
