SysGuardd Architecture
Overview
SysGuardd is a runtime enforcement daemon for Linux hosts and Kubernetes nodes. It combines kernel-level event capture with deterministic policy decisions and active mitigation.
Core objective:
- Observe process execution attempts in near real time.
- Decide if execution is allowed using explicit policy.
- Stop unauthorized binaries before they can run.
System Diagram
flowchart LR
A[Process Exec Attempt] --> B[eBPF Probe Layer]
B --> C[Lockless Ring Buffer]
C --> D[User-Space Event Ingestion]
D --> E[Policy Engine]
E -->|allow - severity:info| F[Audit Logger JSON Output]
E -->|deny| G[Mitigation Engine SIGKILL]
G --> F
F --> H[SIEM or Log Pipeline]
E -->|deny - severity:warning/critical| I[Alert Dispatcher]
G -->|enforce_failure - severity:critical| I
I -->|dedupe + rate-limit| J[Webhook Slack/Teams]
High-Level Data Flow
- A process launch event (for example execve) is intercepted through eBPF.
- Event metadata is emitted to user space through a lockless ring buffer.
- A monotonic event_id is assigned and severity is computed for the event.
- A policy engine evaluates the executable and context against active rules.
- If unauthorized and in enforce mode, SysGuardd sends a termination signal (SIGKILL).
- Every event is logged as a structured JSON audit record including event_id, severity, node_id, and policy_version.
- Deny and enforce-failure events are queued to the Alert Dispatcher without blocking the event loop.
- The Alert Dispatcher applies deduplication and rate limiting before forwarding to a webhook endpoint.
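The steps above can be sketched as a single user-space handler. This is a minimal illustration, not SysGuardd's actual code: the names `ExecEvent` and `handle_event`, the record keys `action`, `exe`, and `pid`, and the injected callables (`is_allowed`, `kill`, `log`, `alert`) are all hypothetical stand-ins for the policy engine, mitigation engine, audit logger, and alert queue.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecEvent:
    """Hypothetical, reduced view of an exec event after ingestion."""
    pid: int
    exe: str


# Monotonic counter; unique only for the lifetime of this process.
_event_ids = itertools.count(1)


def handle_event(event, is_allowed, kill, log, alert, enforce):
    """Run one event through decide -> mitigate -> log -> alert."""
    event_id = format(next(_event_ids), "x")  # hex counter
    allowed = is_allowed(event)

    action = "allow"
    if not allowed:
        action = "deny"
        if enforce:
            try:
                kill(event.pid)  # stand-in for sending SIGKILL
            except OSError:
                action = "action_error"  # mitigation failed

    # Severity follows the mode: info for allow, warning for a
    # monitor-mode deny, critical for an enforce-mode deny or failure.
    if allowed:
        severity = "info"
    elif enforce:
        severity = "critical"
    else:
        severity = "warning"

    record = {
        "event_id": event_id,
        "severity": severity,
        "action": action,
        "exe": event.exe,
        "pid": event.pid,
    }
    log(record)        # every event is audited
    if not allowed:
        alert(record)  # only denies and failures are queued for alerting
    return record
```

Passing the stages in as callables keeps the ordering of the flow visible and lets each stage be replaced by a test double.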
Main Components
1) Kernel Probe Layer
Responsibilities:
- Attach eBPF programs at process execution boundaries.
- Capture minimal, high-value metadata needed for policy checks.
Captured fields (target set):
- PID
- PPID
- Executable path
- Command arguments (truncated by safe limits)
- Timestamp
Design considerations:
- Keep probe logic small and verifiable.
- Avoid expensive operations inside kernel space.
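To make the captured field set concrete, the sketch below decodes a fixed-size record on the user-space side. The wire layout (little-endian, 256-byte path, 128-byte argument area) and the names `RECORD`, `ExecEvent`, and `decode_event` are assumptions chosen for illustration; the real record format is not specified in this document.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: pid (u32), ppid (u32), timestamp in ns (u64),
# NUL-padded executable path (256 bytes), NUL-separated args (128 bytes).
RECORD = struct.Struct("<IIQ256s128s")


@dataclass(frozen=True)
class ExecEvent:
    pid: int
    ppid: int
    timestamp_ns: int
    exe: str
    args: list


def decode_event(raw: bytes) -> ExecEvent:
    """Decode one fixed-size record into an ExecEvent."""
    pid, ppid, ts, exe, args = RECORD.unpack(raw)
    # The path ends at the first NUL; the rest of the field is padding.
    exe_text = exe.split(b"\0", 1)[0].decode("utf-8", "replace")
    # Arguments are NUL-separated; trailing padding is stripped first.
    argv = [a.decode("utf-8", "replace")
            for a in args.rstrip(b"\0").split(b"\0") if a]
    return ExecEvent(pid, ppid, ts, exe_text, argv)
```

Fixed-size fields give the truncation behavior for free: anything longer than the field is cut at the limit, so the probe never needs variable-length copies.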
2) Transport Layer
Responsibilities:
- Move event records from kernel to user space with low overhead.
Mechanism:
- Memory-mapped lockless ring buffer.
Design considerations:
- Preserve ordering where possible.
- Handle backpressure with bounded drops and metrics.
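The backpressure policy (bounded drops plus a metric) can be modeled in a few lines. This is a plain user-space model, not a lockless or memory-mapped buffer, and the class name `BoundedEventBuffer` and its `dropped` counter are hypothetical; it assumes new events are rejected when the buffer is full.

```python
from collections import deque


class BoundedEventBuffer:
    """FIFO buffer that rejects new events when full and counts the drops."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._items = deque()
        self._capacity = capacity
        self.dropped = 0  # exposed as a metric

    def push(self, event) -> bool:
        """Append an event; return False and count a drop if full."""
        if len(self._items) >= self._capacity:
            self.dropped += 1
            return False
        self._items.append(event)
        return True

    def drain(self):
        """Yield buffered events in arrival order."""
        while self._items:
            yield self._items.popleft()
```

Rejecting the newest event keeps the ordering of everything already buffered intact, which matches the goal of preserving ordering where possible.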
3) Policy Engine
Responsibilities:
- Evaluate process events against enforcement policy.
- Return allow or deny decisions deterministically.
Policy model (initial):
- Explicit deny list for executable paths and patterns.
- Optional parent-process constraints.
- Default action defined per deployment mode.
Performance target:
- O(1)-style lookups for common policy checks.
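One way to picture this policy model is a hash set for exact paths (the constant-time common case), a short pattern list as a fallback, and a per-executable set of disallowed parents. The `Policy` fields and the `decide` function below are hypothetical, and glob matching via `fnmatchcase` is an assumed pattern syntax rather than the documented one.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    denied_paths: frozenset = frozenset()   # exact paths, O(1) membership
    denied_patterns: tuple = ()             # glob patterns, scanned in order
    # exe path -> parent exe paths under which that exe is denied
    denied_parents: dict = field(default_factory=dict)
    default_allow: bool = True              # default action for this mode


def decide(policy: Policy, exe: str, parent_exe=None) -> bool:
    """Return True to allow, False to deny. Same inputs, same answer."""
    if exe in policy.denied_paths:
        return False
    if any(fnmatchcase(exe, pattern) for pattern in policy.denied_patterns):
        return False
    blocked_parents = policy.denied_parents.get(exe)
    if blocked_parents and parent_exe in blocked_parents:
        return False
    return policy.default_allow
```

Checking the exact-path set first means the pattern scan only runs for executables that are not listed explicitly.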
4) Mitigation Engine
Responsibilities:
- Apply enforcement action when a deny decision is returned.
Action model (initial):
- Send SIGKILL to blocked process as primary kill path.
- Emit a mandatory audit event for every mitigation action.
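A minimal sketch of this action model, assuming the kill is issued from user space with `os.kill`: the function name `mitigate`, the record keys, and the injectable `kill` parameter are hypothetical. The `action_error` label is borrowed from the failure-mode description in this document.

```python
import os
import signal


def mitigate(pid: int, audit, kill=os.kill) -> dict:
    """Send SIGKILL to pid and always emit an audit record."""
    error = None
    try:
        kill(pid, signal.SIGKILL)
        action = "killed"
    except OSError as exc:
        # Covers ProcessLookupError (already exited) and PermissionError.
        action = "action_error"
        error = str(exc)
    record = {"pid": pid, "action": action, "error": error}
    audit(record)  # mandatory, whether or not the kill succeeded
    return record
```

Building the audit record outside the `try` block is what makes it mandatory: a failed kill still produces exactly one record.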
5) Telemetry and Integrations
Responsibilities:
- Publish normalized events for observability and incident response.
Output targets:
- Local JSON logs (structured, one event per line)
- gRPC event stream for SIEM or cluster control-plane integrations (planned)
Normalized audit event fields:
- event_id — monotonic hex counter, unique per daemon run; correlates log lines with alerts
- severity — info (allow), warning (deny in monitor mode), critical (deny in enforce mode or mitigation failure)
- node_id — host identity; defaults to system hostname, overridable via --node-id or Kubernetes Downward API
- policy_version — optional tag for tracing which policy version produced the decision
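As an illustration of these fields, the sketch below serializes one event as a single JSON line. The function name `audit_line` and the extra `exe` key are hypothetical; the exact schema and key order of the real output are not specified here.

```python
import json
import socket


def audit_line(event_id: int, severity: str, exe: str,
               node_id=None, policy_version=None) -> str:
    """Serialize one audit event as a single JSON line."""
    record = {
        "event_id": format(event_id, "x"),           # counter rendered as hex
        "severity": severity,                        # info / warning / critical
        "node_id": node_id or socket.gethostname(),  # default: system hostname
        "exe": exe,
    }
    if policy_version is not None:  # optional tag, omitted when unset
        record["policy_version"] = policy_version
    return json.dumps(record, sort_keys=True)
```

Because `json.dumps` escapes embedded newlines, each record stays on one line, which keeps the log safe to split line by line downstream.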
6) Alert Dispatcher
Responsibilities:
- Deliver security alert events to external notification endpoints in real time.
- Protect the event loop from any blocking I/O.
Design:
- Runs on a dedicated background worker thread.
- Receives events via a bounded in-process queue (max 128 entries); oldest entry dropped on overflow.
- Applies per-key deduplication (exe + severity) within a configurable window to suppress repeated alerts.
- Applies token-bucket rate limiting (configurable per-minute cap) to prevent alert storms.
- Posts Slack/Teams-compatible JSON payloads via HTTP or HTTPS POST.
- Delivery failures are logged to stderr; the daemon runtime is never interrupted.
- Instantiated only when --alert-enabled is passed; zero overhead when disabled.
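The design above can be approximated as follows. This is a sketch under stated assumptions, not the daemon's implementation: `AlertDispatcher`, its parameters, and the alert keys `exe` and `severity` are hypothetical, and the `send` callable stands in for the HTTP POST so the example stays free of network code. A `deque` with `maxlen` provides the drop-oldest behavior.

```python
import sys
import threading
import time
from collections import deque


class AlertDispatcher:
    def __init__(self, send, max_queue=128, dedupe_window=60.0,
                 per_minute=10, clock=time.monotonic):
        self._send = send
        self._queue = deque(maxlen=max_queue)  # append() evicts the oldest
        self._cond = threading.Condition()
        self._window = dedupe_window
        self._last_sent = {}                   # (exe, severity) -> send time
        self._capacity = float(per_minute)     # token bucket size
        self._rate = per_minute / 60.0         # tokens refilled per second
        self._tokens = float(per_minute)
        self._clock = clock
        self._refilled = clock()
        self._stopped = False
        self._worker = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._worker.start()

    def submit(self, alert):
        """Called from the event loop; performs no I/O."""
        with self._cond:
            self._queue.append(alert)
            self._cond.notify()

    def should_send(self, alert) -> bool:
        now = self._clock()
        key = (alert["exe"], alert["severity"])
        last = self._last_sent.get(key)
        if last is not None and now - last < self._window:
            return False                       # duplicate inside the window
        elapsed = now - self._refilled
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._refilled = now
        if self._tokens < 1.0:
            return False                       # rate limit exceeded
        self._tokens -= 1.0
        self._last_sent[key] = now
        return True

    def _run(self):
        while True:
            with self._cond:
                while not self._queue and not self._stopped:
                    self._cond.wait()
                if not self._queue:
                    return                     # stopped and fully drained
                alert = self._queue.popleft()
            if self.should_send(alert):
                try:
                    self._send(alert)
                except Exception as exc:       # delivery failure is not fatal
                    print(f"alert delivery failed: {exc}", file=sys.stderr)

    def stop(self):
        with self._cond:
            self._stopped = True
            self._cond.notify()
        if self._worker.is_alive():
            self._worker.join()
```

Deduplication is checked before the token bucket so that a suppressed duplicate does not consume rate-limit budget.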
Trust Boundaries
- Kernel boundary: untrusted process behavior, trusted verified eBPF program.
- User-space daemon boundary: trusted policy and mitigation logic.
- External integrations boundary: authenticated transport and least-privilege credentials.
Failure Modes and Safety Defaults
- Policy service unavailable: fail closed in enforce mode, fail open in monitor mode.
- Ring buffer overflow: count drops, expose metrics, and alert.
- Mitigation failure: logged as action_error; severity escalated to critical; alert dispatched if enabled.
- Webhook delivery failure: logged to stderr; daemon continues without interruption.
- Alert queue full: oldest queued alert dropped silently; event loop never blocked.
- Alert rate limit exceeded: excess alerts dropped; resets after one minute.
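The first default in this list (behavior when the policy source is unavailable) reduces to a small fallback wrapper. The exception type `PolicyUnavailable`, the function `decide_with_fallback`, and the `fail_closed` flag are hypothetical names; which deployment mode sets the flag is left to the caller.

```python
class PolicyUnavailable(Exception):
    """Hypothetical error raised when the policy source cannot be reached."""


def decide_with_fallback(lookup, exe: str, fail_closed: bool) -> bool:
    """Return True to allow, False to deny, with a safe default on outage."""
    try:
        return lookup(exe)
    except PolicyUnavailable:
        # Fail closed: deny everything. Fail open: allow everything.
        return not fail_closed
```

Keeping the fallback in one place makes the safety default explicit and easy to audit, rather than scattering it across error handlers.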
Deployment Topologies
Linux Host Mode
- One SysGuardd daemon per host.
- Local policy file or remote policy endpoint.
Kubernetes Node Mode
- DaemonSet deployment per node.
- Node-level visibility for workloads running on that node.
- Cluster-level telemetry aggregation.
Security Notes
- Run with minimum privileges required for eBPF loading and signaling.
- Sign and verify policy bundles where possible.
- Protect telemetry channels with TLS/mTLS.
Future Enhancements
- Policy-as-code with versioned bundles.
- Allow-list mode for high-trust environments.
- Behavioral heuristics layered on deterministic policies.
- Monitoring API (/health, /status, /events, /stats) with token authentication.
- Read-only web dashboard for live event feed and blocked-executable trends.
- gRPC telemetry stream for SIEM integration.
- eBPF CO-RE portability optimization.