Peer Authentication
Varta’s observer trusts the kernel, not the wire. Two layers of defence in depth ensure that process identity cannot be spoofed by anything that can reach the Unix Domain Socket.
Layer 1: socket file permissions (--socket-mode)
After bind(2), the observer chmods the socket file to 0600 by
default (owner read and write only). Only processes running under the
same UID as the observer can connect(2) to the socket.
| Flag | Default | Format | Behaviour |
|---|---|---|---|
| --socket-mode | 0600 | Octal (e.g. 0660) | File mode applied via chmod(2) after bind. Pass 0660 to allow group access. |
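As a rough illustration of how an octal `--socket-mode` value could be parsed and applied, the sketch below uses only the Rust standard library. The helper names (`parse_socket_mode`, `owner_only`, `apply_socket_mode`) are hypothetical and are not taken from the varta-watch source; the real flag parser may behave differently.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Parse an octal mode string such as "0600" or "660" into permission bits.
/// Hypothetical helper: rejects anything outside the 0o777 permission range.
fn parse_socket_mode(s: &str) -> Result<u32, String> {
    let mode = u32::from_str_radix(s, 8)
        .map_err(|e| format!("invalid octal mode {s:?}: {e}"))?;
    if mode > 0o777 {
        return Err(format!("mode {s:?} has bits outside 0o777"));
    }
    Ok(mode)
}

/// True when the mode grants nothing to group or other (0600 or stricter).
fn owner_only(mode: u32) -> bool {
    mode & 0o077 == 0
}

/// Apply the mode to an already-bound socket path, i.e. chmod(2) after bind(2).
fn apply_socket_mode(path: &Path, mode: u32) -> io::Result<()> {
    fs::set_permissions(path, fs::Permissions::from_mode(mode))
}
```

With the default `0600`, `owner_only` holds; with `0660` it does not, which is the case where group members can also connect.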
Layer 2: kernel credential verification
Linux
The observer sets SO_PASSCRED on the socket after binding. Every
recvmsg(2) call then receives a SCM_CREDENTIALS ancillary message
containing a struct ucred { pid, uid, gid } populated by the kernel.
The observer compares ucred.pid against frame.pid from the VLP wire
format. If they disagree the frame is silently dropped and
varta_frame_auth_failures_total is incremented. The ucred.uid
field is implicitly trusted by Layer 1 (--socket-mode 0600 already
restricts access to the owning UID), but could be checked as a
fail-safe if a permission bypass is ever discovered.
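The comparison described above can be sketched as a pure function over the kernel-supplied credentials. This is a simplified model, not the observer's code: `PeerCred`, `AuthOutcome`, `AuthCounters` and `check_frame` are illustrative names, and the real implementation obtains the credentials from the `SCM_CREDENTIALS` control message.

```rust
/// Kernel-attested credentials, mirroring the fields of Linux `struct ucred`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PeerCred {
    pid: u32,
    uid: u32,
    gid: u32,
}

/// Result of checking one frame (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum AuthOutcome {
    Accepted,
    PidMismatch { claimed: u32, attested: u32 },
}

/// Stand-in for the varta_frame_auth_failures_total counter.
#[derive(Debug, Default)]
struct AuthCounters {
    frame_auth_failures_total: u64,
}

/// Accept the frame only if the PID on the wire equals the PID the kernel reported.
/// On mismatch the caller drops the frame; the counter is bumped here.
fn check_frame(frame_pid: u32, cred: PeerCred, counters: &mut AuthCounters) -> AuthOutcome {
    if frame_pid == cred.pid {
        AuthOutcome::Accepted
    } else {
        counters.frame_auth_failures_total += 1;
        AuthOutcome::PidMismatch { claimed: frame_pid, attested: cred.pid }
    }
}
```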
macOS
On macOS, the observer first attempts getsockopt(LOCAL_PEERTOKEN)
immediately after each recvmsg(2). LOCAL_PEERTOKEN returns an
audit_token_t containing the sender’s PID, UID, GID, and audit
information. Because the observer is single-threaded and calls
getsockopt immediately after recvmsg, no other datagram can arrive
between the two syscalls.
When LOCAL_PEERTOKEN succeeds, the observer performs the same PID +
UID verification as on Linux. When it fails (e.g. on older macOS
versions or unconnected SOCK_DGRAM where the kernel doesn’t expose
per-datagram credentials), the observer falls back to two separate
getsockopt calls:
- LOCAL_PEERPID (0x0002): returns the peer’s PID directly.
- LOCAL_PEERCRED (0x0001): returns a struct xucred with the peer’s UID in cr_uid.
If the fallback also fails, the observer falls back to the sentinel
PID 0 — relying on --socket-mode 0600 as the primary defence.
FreeBSD, DragonFly BSD, NetBSD
On FreeBSD-family platforms, the observer sets LOCAL_CREDS on the
socket (value 0x0002 on FreeBSD/DragonFly, 0x0001 on NetBSD). Every
recvmsg(2) then receives a SCM_CREDS ancillary message containing a
struct cmsgcred { cmcred_pid, cmcred_uid, cmcred_euid, cmcred_gid, ... }
populated by the kernel. The observer extracts cmcred_pid and
cmcred_euid and performs the same PID + UID verification as on Linux.
The ancillary buffer is sized at 256 bytes — sufficient for the 84-byte
cmsgcred with generous headroom for future kernel extensions.
Note: On platforms other than Linux, macOS, FreeBSD, DragonFly, and NetBSD (OpenBSD, Solaris, illumos, etc.),
varta-watch emits a startup warning via stderr: "per-datagram PID verification is unavailable. The only defence is --socket-mode (default 0600); any process under the same UID can impersonate any PID." This is by design: the kernel does not expose per-datagram peer credentials for unconnected SOCK_DGRAM on these platforms. Containers that run multiple processes under the same UID should be aware of this limitation.
UDP transport authentication
For network-based agents that emit beats over UDP, the trust model is
cryptographic, not kernel-attested. UDP has no peer-credential
mechanism on any platform — recvmsg(2) cannot tell the observer who
sent a datagram, only where it claims to be from. Varta therefore
requires authentication at the AEAD layer, and refuses to bind an
unauthenticated UDP listener without two layers of explicit opt-in.
Compile-time features (crates/varta-watch/Cargo.toml)
| Cargo feature | What it enables | Production posture |
|---|---|---|
| secure-udp | SecureUdpListener (ChaCha20-Poly1305 AEAD + per-sender replay) | Recommended |
| unsafe-plaintext-udp | UdpListener (no authentication) | Forbidden in production |
| udp-core | Internal: shared UDP socket wiring | (transitive) |
A build that does not include unsafe-plaintext-udp cannot link the
plaintext path at all. Passing --udp-port without keys to such a build
hard-errors at startup; there is no warn-and-continue path.
Runtime selection rules
When --udp-port is set, the observer chooses exactly one listener:
- If --features secure-udp is compiled in and --key-file / --master-key-file resolve to a usable key, bind SecureUdpListener.
- Otherwise, only the plaintext path remains. It is bound only if both --features unsafe-plaintext-udp is compiled in and --i-accept-plaintext-udp was passed on the command line.
- Any other configuration is a hard error (InvalidInput).
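The selection rules can be modelled as a small decision function. In this sketch the compile-time features are represented as booleans so the logic can be exercised in one build; `UdpConfig`, `ListenerChoice`, `SelectError` and `choose_listener` are assumed names for illustration only.

```rust
/// Which listener gets bound (illustrative stand-ins for the real listener types).
#[derive(Debug, PartialEq, Eq)]
enum ListenerChoice {
    SecureUdp,
    PlaintextUdp,
}

/// Stand-in for the InvalidInput hard error.
#[derive(Debug, PartialEq, Eq)]
enum SelectError {
    InvalidInput(&'static str),
}

/// Inputs to the decision. The two `*_compiled` fields stand in for
/// `cfg!(feature = "secure-udp")` and `cfg!(feature = "unsafe-plaintext-udp")`.
struct UdpConfig {
    secure_udp_compiled: bool,
    plaintext_compiled: bool,
    key_available: bool,
    accept_plaintext: bool,
}

/// Pick exactly one listener, or fail; there is no warn-and-continue branch.
fn choose_listener(cfg: &UdpConfig) -> Result<ListenerChoice, SelectError> {
    if cfg.secure_udp_compiled && cfg.key_available {
        return Ok(ListenerChoice::SecureUdp);
    }
    if cfg.plaintext_compiled && cfg.accept_plaintext {
        return Ok(ListenerChoice::PlaintextUdp);
    }
    Err(SelectError::InvalidInput(
        "--udp-port requires a usable key or an explicit plaintext opt-in",
    ))
}
```

Note that the plaintext branch needs both the feature and the flag; either one alone falls through to the error.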
When the plaintext path is taken, a high-visibility varta_warn! is
emitted at startup naming the bound address, so the choice appears in
SIEM / syslog logs:
UDP on <addr> is running WITHOUT authentication (--i-accept-plaintext-udp).
Any device with network reach to this port can inject heartbeats, suppress
stall detection, or trigger false recovery commands. NOT for production /
safety-critical use.
--i-accept-plaintext-udp is intentionally verbose: an operator who
types it is making an explicit statement that this build is for
development or testing, not for a hospital VLAN.
Why no kernel-level UDP credentials
Unix Domain Sockets carry SCM_CREDENTIALS / LOCAL_PEERTOKEN /
SCM_CREDS per-datagram. UDP carries none of those. Even on a single
host where --udp-bind-addr 127.0.0.1 is used, any local process can
send to that port — there is no equivalent of --socket-mode 0600 for
network sockets. AEAD is the only durable defence.
Recovery eligibility and transport-origin gating
Recovery commands (--recovery-cmd / --recovery-exec and the *-file
variants) take the stalled agent’s frame.pid and substitute it into
the spawned process (kill -9 {pid}, systemctl restart agent@{pid}.service,
etc.). That makes recovery a privileged action that targets an arbitrary
process by id — and means the wire-level frame.pid must be tied back to
the real sending process, not just to whoever holds an AEAD key.
The trust invariant
A recovery command MUST NEVER fire for a pid whose beat lifetime is not kernel-attested. In practice that means:
| Transport | Kernel-attested? | Recovery-eligible by default? |
|---|---|---|
| UDS | Yes — SO_PASSCRED / LOCAL_PEERTOKEN / SCM_CREDS | Yes |
| Plaintext UDP | No — peer_pid is always 0 | No |
| Secure UDP | No — frame is cryptographically authenticated but the kernel does not attest the sending process; a holder of the AEAD key (or a per-agent key derived from a leaked master key) can forge a beat for any pid | No |
Internally each beat is tagged with a BeatOrigin
(KernelAttested vs NetworkUnverified). The tracker pins the origin on
the slot’s first beat and rejects subsequent beats from a different
origin as Event::OriginConflict (counter:
varta_origin_conflict_total). First-origin-wins prevents an attacker on
an untrusted transport from “tainting” a slot that legitimately belongs to
a kernel-attested agent.
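A minimal sketch of first-origin-wins pinning follows. It keeps one pinned origin per pid in a `HashMap`; the `Tracker` layout and the fields carried by `Event` are assumptions made for illustration, and only the names `BeatOrigin`, `KernelAttested`, `NetworkUnverified` and `OriginConflict` come from the text above.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BeatOrigin {
    KernelAttested,
    NetworkUnverified,
}

/// Simplified event type; the payload fields are hypothetical.
#[derive(Debug, PartialEq, Eq)]
enum Event {
    Beat { pid: u32 },
    OriginConflict { pid: u32, pinned: BeatOrigin, got: BeatOrigin },
}

/// Toy tracker: one pinned origin per pid plus a conflict counter.
#[derive(Default)]
struct Tracker {
    slots: HashMap<u32, BeatOrigin>,
    origin_conflict_total: u64,
}

impl Tracker {
    /// The first beat for a pid pins its origin; later beats must match it.
    fn on_beat(&mut self, pid: u32, origin: BeatOrigin) -> Event {
        let pinned = *self.slots.entry(pid).or_insert(origin);
        if pinned == origin {
            Event::Beat { pid }
        } else {
            self.origin_conflict_total += 1;
            Event::OriginConflict { pid, pinned, got: origin }
        }
    }
}
```

Because the pinned value is never overwritten, a later UDP beat for a pid that first appeared over UDS is rejected rather than downgrading the slot.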
Two-layer enforcement
- Startup hard-error. If any --recovery-cmd / --recovery-cmd-file / --recovery-exec / --recovery-exec-file is configured and --udp-port is set, the daemon refuses to start with ConfigError::RecoveryRequiresAuthenticatedTransport. Operators must pass --i-accept-recovery-on-unauthenticated-transport to proceed. The flag is verbose by design (matches the --i-accept-<risk> convention) and shows up in cargo tree / startup banners.
- Runtime origin gate. Even with the accept flag, Recovery::on_stall refuses to spawn the recovery command when the stalled slot’s pinned origin is NetworkUnverified. The refusal returns the typed RecoveryOutcome::RefusedUnauthenticatedSource { pid }, increments varta_recovery_refused_total{reason="unauthenticated_transport"}, and emits a structured refused record into the recovery audit log (--recovery-audit-file). To enable UDP-origin recovery the operator must construct the Recovery with with_allow_unauthenticated_source(true), a second, conscious choice on top of the startup flag.
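The two gates can be sketched together as follows. This is a simplified model under stated assumptions: the function `check_startup`, the `WouldSpawn` variant and the struct fields are invented for illustration, no process is actually spawned, and audit logging is omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BeatOrigin {
    KernelAttested,
    NetworkUnverified,
}

#[derive(Debug, PartialEq, Eq)]
enum ConfigError {
    RecoveryRequiresAuthenticatedTransport,
}

/// Gate 1 (startup): recovery plus UDP is refused unless the accept flag was passed.
fn check_startup(
    recovery_configured: bool,
    udp_port: Option<u16>,
    accept_flag: bool,
) -> Result<(), ConfigError> {
    if recovery_configured && udp_port.is_some() && !accept_flag {
        return Err(ConfigError::RecoveryRequiresAuthenticatedTransport);
    }
    Ok(())
}

#[derive(Debug, PartialEq, Eq)]
enum RecoveryOutcome {
    /// Placeholder for "the command would be spawned" (hypothetical variant).
    WouldSpawn { pid: u32 },
    RefusedUnauthenticatedSource { pid: u32 },
}

#[derive(Default)]
struct Recovery {
    allow_unauthenticated_source: bool,
    /// Stand-in for varta_recovery_refused_total{reason="unauthenticated_transport"}.
    refused_total: u64,
}

impl Recovery {
    fn with_allow_unauthenticated_source(mut self, allow: bool) -> Self {
        self.allow_unauthenticated_source = allow;
        self
    }

    /// Gate 2 (runtime): refuse network-origin slots unless explicitly allowed.
    fn on_stall(&mut self, pid: u32, origin: BeatOrigin) -> RecoveryOutcome {
        if origin == BeatOrigin::NetworkUnverified && !self.allow_unauthenticated_source {
            self.refused_total += 1;
            return RecoveryOutcome::RefusedUnauthenticatedSource { pid };
        }
        RecoveryOutcome::WouldSpawn { pid }
    }
}
```

The startup flag alone never flips `allow_unauthenticated_source`, which is why passing the first gate does not imply passing the second.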
Why secure-UDP isn’t enough
The secure-UDP master-key mode binds frame.pid to the 4-byte PID prefix
in iv_random[0..4] and derives a per-agent key from the master key.
That is a useful cryptographic binding for the UDP threat model — a
holder of a single derived agent key cannot forge frames for other
pids. But the binding lives at the protocol layer, not at the kernel
layer:
- A leak of the shared key lets anyone forge any pid.
- A leak of the master key lets anyone derive any agent key.
- A leak of any per-agent key still lets that agent forge its own pid to misbehave (e.g. stop sending → trigger recovery against its own pid during legitimate maintenance windows).
Kernel attestation has no such failure mode: the kernel knows which
process owns the socket fd, and that knowledge cannot be forged by any
amount of key material. This is why Varta classifies all UDP variants
(plain and secure) as NetworkUnverified for the recovery-eligibility
decision.
Recovery command authentication boundary
--recovery-cmd (inline shell) and --recovery-cmd-file (file-based
shell) both spawn /bin/sh -c <template> with the observer’s full
process authority. In a safety-critical deployment a recovery template
like systemctl restart {service} or kill -9 {pid} can terminate
unrelated production processes if the template body is mis-edited or if
shell metacharacters appear unexpectedly.
To prevent accidental shell-mode deployment, shell mode requires
--i-accept-shell-risk at runtime. Without that flag, startup
hard-errors with a message that recommends --recovery-exec (which
calls execvp(2) directly — no shell, no metacharacter interpretation,
no injection surface). This applies to both the inline and file-based
forms; the shell-injection risk is identical regardless of where the
template comes from.
--recovery-exec and --recovery-exec-file do not require an
accept flag — they are the default-safe path.
Prometheus /metrics endpoint exposure
The /metrics endpoint is HTTP/1.0 with mandatory bearer-token
authentication. When --prom-addr is set, --prom-token-file is
required: the observer refuses to start without it. Every scrape must
send Authorization: Bearer <hex> where <hex> is the lowercase 64-character
hex encoding of the file’s 32 random bytes (the format produced by
openssl rand -hex 32). Missing or wrong tokens get
HTTP/1.0 401 Unauthorized and bump varta_prom_auth_failures_total.
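To make the token format concrete, the sketch below hex-encodes 32 bytes and checks that a header value has the `Bearer <hex>` shape. Both helpers are hypothetical; they only validate the format, whereas the comparison against the expected token is a separate, constant-time step in the exporter.

```rust
/// Lowercase hex encoding of a 32-byte token; always 64 characters long.
fn to_hex(bytes: &[u8; 32]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Extract the token from an `Authorization` header value such as
/// "Bearer <hex>". Returns None unless the token is exactly 64 lowercase
/// hex characters. Format check only; no secret comparison happens here.
fn bearer_token(header_value: &str) -> Option<&str> {
    let token = header_value.strip_prefix("Bearer ")?;
    let well_formed = token.len() == 64
        && token.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    well_formed.then_some(token)
}
```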
The token file is loaded through the same hardened validator that
guards --key-file (see “Secret-file validation” below): regular file,
no symlinks, owned by the observer UID, mode 0o600 or stricter,
opened with O_NOFOLLOW.
The endpoint also retains four DoS-protection layers from earlier work, so that a hostile scraper cannot exhaust file descriptors or starve the observer’s poll loop even before the auth check runs:
- Serve budget: at most PROM_MAX_CONNECTIONS_PER_SERVE=8 accepted connections per outer poll tick, and a 100 ms wall-clock deadline.
- Drain budget: after the serve budget is exhausted, an additional PROM_MAX_DRAIN_PER_SERVE=50 connections may be accepted and immediately closed, so the kernel accept queue does not back up.
- Per-source-IP token bucket: every accepted connection (in both serve and drain phases) decrements a per-IP token bucket sized by --prom-rate-limit-burst (default 10) and refilled at --prom-rate-limit-per-sec (default 5). Connections from an IP whose bucket is empty are closed without serving and counted as varta_prom_connections_dropped_total{reason="rate_limit"}.
- Per-IP table cap: the per-IP map is bounded to 1024 entries; when full, stale entries (no activity in 60 s) are evicted first, then if necessary the oldest entry is force-evicted and counted as varta_prom_connections_dropped_total{reason="ip_table_full"}.
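The per-source-IP token bucket in the third item can be sketched as below, using the documented defaults (burst 10, refill 5 per second). The `RateLimiter` type is an assumption for illustration; the table cap and eviction from the fourth item are deliberately left out, and the caller supplies `now` so the behaviour is deterministic.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::Instant;

/// One bucket per source IP: current tokens and the time of the last update.
struct Bucket {
    tokens: f64,
    last: Instant,
}

/// Hypothetical limiter; omits the 1024-entry cap and stale-entry eviction.
struct RateLimiter {
    burst: f64,
    per_sec: f64,
    buckets: HashMap<IpAddr, Bucket>,
}

impl RateLimiter {
    fn new(burst: u32, per_sec: u32) -> Self {
        Self { burst: f64::from(burst), per_sec: f64::from(per_sec), buckets: HashMap::new() }
    }

    /// Refill according to elapsed time (capped at `burst`), then try to take one token.
    /// Returns false when the bucket is empty, i.e. the connection should be closed.
    fn allow(&mut self, ip: IpAddr, now: Instant) -> bool {
        let (burst, per_sec) = (self.burst, self.per_sec);
        let bucket = self.buckets.entry(ip).or_insert(Bucket { tokens: burst, last: now });
        let elapsed = now.saturating_duration_since(bucket.last).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * per_sec).min(burst);
        bucket.last = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

A new IP starts with a full bucket, so the first ten connections pass immediately and later ones are limited to the refill rate.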
Token comparison is constant-time
The exporter compares the presented and expected tokens via
varta_vlp::ct_eq — the same constant-time XOR-and-OR routine that
guards Poly1305 tag verification. This prevents byte-by-byte timing
oracles from leaking the prefix of the token to a remote scraper.
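The XOR-and-OR pattern mentioned above typically looks like the sketch below. This is a generic illustration of the technique, not the source of `varta_vlp::ct_eq`; in particular, how the real routine handles unequal lengths is not stated here, and an optimising compiler gives no formal constant-time guarantee for plain Rust code.

```rust
/// Compare two byte strings without an early exit on the first differing byte.
/// Every byte pair is XORed and the results are OR-accumulated, so the loop
/// does the same work regardless of where (or whether) the inputs differ.
/// Illustrative only: the length check leaks length, which is public here.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```

A naive `a == b` may return as soon as a byte differs, which is what lets an attacker recover a token prefix one byte at a time from response timing.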
Bind-address recommendation
The bearer token is the authoritative authentication boundary. Loopback
bind (127.0.0.1:<port> or [::1]:<port>) behind a reverse proxy
remains the recommended posture for defense in depth, but is no longer
the only defense. The observer still emits a startup varta_warn!
whenever the bound address is non-loopback, so the exposure is visible
in audit logs.
Prometheus scrape config
The standard authorization: block injects the bearer token verbatim:
scrape_configs:
  - job_name: 'varta'
    static_configs:
      - targets: ['varta-host:9100']
    authorization:
      type: Bearer
      credentials_file: /etc/prometheus/varta-prom.token
The credentials_file should be the same content as
--prom-token-file on the observer; Prometheus reads it with the same
0600-or-stricter expectation.
Secret-file validation
Every file containing key material — --key-file, --accepted-key-file,
--master-key-file, and the new --prom-token-file — flows through
validate_secret_file in varta-watch/src/config.rs. The validator
enforces:
- The path is not a symlink (symlink_metadata + is_symlink).
- The path resolves to a regular file (not a directory, FIFO, block/char device, etc.).
- The mode is 0o600 or stricter (mode & 0o077 == 0).
- The file is owned by the observer’s UID (kernel-attested via stat.uid, not derived from the env).
- The file is opened with O_NOFOLLOW to close the TOCTOU window between the metadata check and the read.
A failure on any of these aborts startup with a typed ConfigError
naming the failing constraint (insecure permissions ..., must not be a symlink, owned by uid X, expected uid Y, etc.).
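The metadata checks listed above can be sketched with the standard library alone. This is not `validate_secret_file` itself: `SecretFileError`, `check_secret_metadata` and `validate_secret_path` are assumed names, and the final `O_NOFOLLOW` open is omitted because the flag's value is platform-specific and would normally come from the `libc` crate.

```rust
use std::fs;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical error type naming the failing constraint.
#[derive(Debug, PartialEq, Eq)]
enum SecretFileError {
    Symlink,
    NotRegularFile,
    InsecurePermissions { mode: u32 },
    WrongOwner { found: u32, expected: u32 },
    Io(String),
}

/// Pure check over already-gathered metadata, in the order listed above.
fn check_secret_metadata(
    is_symlink: bool,
    is_regular: bool,
    mode: u32,
    owner_uid: u32,
    expected_uid: u32,
) -> Result<(), SecretFileError> {
    if is_symlink {
        return Err(SecretFileError::Symlink);
    }
    if !is_regular {
        return Err(SecretFileError::NotRegularFile);
    }
    if mode & 0o077 != 0 {
        return Err(SecretFileError::InsecurePermissions { mode });
    }
    if owner_uid != expected_uid {
        return Err(SecretFileError::WrongOwner { found: owner_uid, expected: expected_uid });
    }
    Ok(())
}

/// Gather metadata without following symlinks, then apply the checks.
fn validate_secret_path(path: &Path, expected_uid: u32) -> Result<(), SecretFileError> {
    let md = fs::symlink_metadata(path).map_err(|e| SecretFileError::Io(e.to_string()))?;
    let ft = md.file_type();
    check_secret_metadata(ft.is_symlink(), ft.is_file(), md.mode() & 0o777, md.uid(), expected_uid)
}
```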
Why environment-variable keys are gone
Earlier releases offered --key-env <NAME> as a key-source fallback.
That flag is removed. Passing it now returns
ConfigError::RemovedFlag with an inline migration hint pointing at
--key-file. The motivation:
- On Linux, /proc/<pid>/environ is readable by any process running under the same UID; a peer with a UDS connection to the observer (which already has UID-restricted access) can read the master key out of the observer’s own environment.
- In containers, docker inspect <container> exposes every environment variable to anyone with read access to the Docker socket, typically all members of the docker group, which is often a superset of the in-container UID.
- systemd-journald captures process environment on demand for crash reports; an env-var key ends up in /var/log/journal indefinitely.
File-based keys avoid all three exposures and slot into the same ownership/permission model as TLS private keys, SSH host keys, and any other long-lived secret an operator already knows how to manage.
The Key type in varta_vlp::crypto also lost its Copy derive and
gained a Drop impl that volatile-zeros the secret bytes before the
memory holding them is released, closing a small but real leak
surface in core dumps and ASLR-defeated speculative reads.
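A volatile-zeroing `Drop` can be sketched as follows. This is an illustration of the general pattern, not the definition in `varta_vlp::crypto`; the `wipe` method is an assumption added so the effect can be observed, and the type deliberately derives neither `Clone` nor `Copy`.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Illustrative 32-byte secret holder (no Clone, no Copy).
struct Key([u8; 32]);

impl Key {
    /// Overwrite the secret with zeros using volatile writes, which the
    /// optimiser may not elide even though the value is about to go away.
    fn wipe(&mut self) {
        for b in self.0.iter_mut() {
            // SAFETY: `b` is a valid, aligned, exclusive reference to a u8.
            unsafe { ptr::write_volatile(b as *mut u8, 0) };
        }
        // Keep the compiler from reordering later accesses before the wipe.
        compiler_fence(Ordering::SeqCst);
    }
}

impl Drop for Key {
    fn drop(&mut self) {
        self.wipe();
    }
}
```

Removing `Copy` matters because a `Copy` type cannot implement `Drop`, and implicit copies would leave unwiped duplicates behind.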
Shutdown grace and systemd
--shutdown-grace-ms (default 5000, minimum 100) bounds the time
Recovery::drop blocks waiting for outstanding recovery children to
exit after issuing SIGKILL during shutdown. Children that outlive the
grace are abandoned to PID 1 for reaping; the observer process exits
either way, so the bound on shutdown latency is deterministic.
In a systemd unit, TimeoutStopSec must be at least
shutdown_grace_ms + 2 s (roughly: grace + reap margin) to ensure
that systemd does not SIGKILL the observer mid-grace and leak an
unreaped recovery child:
[Service]
Environment=VARTA_SHUTDOWN_GRACE_MS=5000
ExecStart=/usr/local/bin/varta-watch --shutdown-grace-ms ${VARTA_SHUTDOWN_GRACE_MS} ...
TimeoutStopSec=7s
KillMode=mixed
KillMode=mixed is recommended: systemd sends SIGTERM to the main
observer process only; the observer then runs its own Drop sequence to
kill+reap any recovery children it had spawned. This is what the
shutdown-grace tunable is designed around.
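The `TimeoutStopSec` rule is simple arithmetic; as a sketch, a hypothetical helper can round grace plus the 2 s margin up to whole seconds:

```rust
/// Smallest whole-second TimeoutStopSec that satisfies
/// `shutdown_grace_ms + 2 s`. Hypothetical helper, not part of varta-watch.
fn min_timeout_stop_sec(shutdown_grace_ms: u64) -> u64 {
    const REAP_MARGIN_MS: u64 = 2_000;
    // Round up so a fractional second never shortens the bound.
    (shutdown_grace_ms + REAP_MARGIN_MS + 999) / 1_000
}
```

For the default grace of 5000 ms this gives 7, matching the `TimeoutStopSec=7s` in the unit above.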
Recovery command environment isolation
When --recovery-env KEY=VALUE is specified (repeatable), the recovery
child process runs with a sanitized environment:
- The child’s environment is cleared entirely.
- PATH is set to /usr/bin:/bin (sufficient to locate common tools).
- Only the explicitly-listed KEY=VALUE pairs are exported.
Without --recovery-env, the child inherits the observer’s full
environment (backward compatible). This flag provides defense-in-depth
against environment-variable-based injection vectors (e.g. a malicious
LD_PRELOAD or IFS in the observer’s environment that could affect
/bin/sh -c behaviour).
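With `std::process::Command`, a sanitized child environment of this kind can be built as in the sketch below. The function `recovery_command` and its parameters are assumptions for illustration; the command is only constructed here, not spawned.

```rust
use std::process::Command;

/// Build a command whose environment contains only PATH plus the
/// explicitly listed pairs. Nothing is inherited from the parent.
fn recovery_command(program: &str, args: &[&str], env: &[(&str, &str)]) -> Command {
    let mut cmd = Command::new(program);
    cmd.args(args);
    cmd.env_clear(); // drop LD_PRELOAD, IFS, and everything else inherited
    cmd.env("PATH", "/usr/bin:/bin");
    cmd.envs(env.iter().copied());
    cmd
}
```

Calling `.status()` or `.spawn()` on the returned value would run the child with exactly that environment.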
Shell-mode recovery is gated by --i-accept-shell-risk at startup
(see the “Recovery command authentication boundary” section above).
When the flag is set, the observer still emits a single audit-trail
varta_warn! at startup so that the choice is captured in any SIEM /
syslog ingest alongside the other startup banners.
Template safety
The {pid} substitution in --recovery-cmd is safe regardless of the
authentication outcome. A u32 PID formatted as a decimal string
contains only the characters 0–9 and can never carry shell
metacharacters (;, |, &, $, `, etc.).
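The argument can be checked mechanically. The sketch below substitutes a `u32` into a template and verifies that the substituted text is digits only; `render_template` and `is_digits_only` are illustrative helpers, not the observer's template engine.

```rust
/// Replace every `{pid}` placeholder with the decimal form of the PID.
fn render_template(template: &str, pid: u32) -> String {
    template.replace("{pid}", &pid.to_string())
}

/// True when the string is non-empty and consists only of ASCII digits,
/// which rules out shell metacharacters such as ; | & $ and backticks.
fn is_digits_only(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}
```

This covers only the substituted value; the rest of the template is still interpreted by the shell in `--recovery-cmd` mode.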
Metrics
| Metric | Type | Description |
|---|---|---|
| varta_frame_auth_failures_total | counter | Incremented every time a frame’s claimed PID does not match the kernel-verified sender PID (Linux only). |
| varta_beats_total{pid="..."} | counter | Per-PID total of accepted beats (only incremented after authentication passes). |
| varta_prom_connections_dropped_total{reason="..."} | counter | /metrics connections accepted but closed before serving. Reasons: drain (serve budget exhausted), rate_limit (per-IP token bucket empty), ip_table_full (per-IP state map force-evicted). |
| varta_prom_auth_failures_total | counter | /metrics scrapes that arrived without Authorization: Bearer <hex> or with a wrong token. Always emitted on every scrape (even at zero), so absent() alert rules stay green-on-green until the first incident. |
| varta_recovery_refused_total{reason="..."} | counter | Recovery commands NOT spawned because of a structural safety gate. Only reason currently defined: unauthenticated_transport (stalled slot’s pinned origin was NetworkUnverified and the operator did not enable UDP-origin recovery). Emitted at zero on every scrape. |
| varta_origin_conflict_total | counter | Beats dropped because the slot’s pinned transport origin disagreed with the beat’s origin (first-origin-wins). Non-zero values indicate either operator misconfiguration (same pid emitted from two transports) or an active spoofing attempt. |
Trust model summary
Process ── connect(2) to UDS ──┐
                               ├─ [FAIL] Kernel blocks (Layer 1: --socket-mode 0600, wrong UID)
                               ├─ [PASS] Layer 2: SO_PASSCRED → ucred.pid (Linux)
                               │         Layer 2: LOCAL_PEERTOKEN → audit_token.pid (macOS, best-effort)
                               │         Layer 2: LOCAL_CREDS → cmsgcred.pid (FreeBSD, DragonFly, NetBSD)
                               │    ├─ [PID MISMATCH] → Drop frame + bump counter
                               │    ├─ [UID MISMATCH] → Drop frame as IoError
                               │    └─ [PID MATCH + UID MATCH] →
                               ↓
                     [SUCCESS] Observer trusts the PID → tracks,
                               surfaces stalls, triggers --recovery-cmd
                               with {pid} substitution.
The trust boundary is the kernel: a frame is only accepted if the kernel
attests that the sending process’s PID matches the one encoded in the
VLP frame and that the sending process runs under the observer’s UID.
On Linux this is enforced per-datagram via SO_PASSCRED; on macOS via
getsockopt(LOCAL_PEERTOKEN) with LOCAL_PEERPID/LOCAL_PEERCRED fallback;
on FreeBSD / DragonFly / NetBSD via LOCAL_CREDS + SCM_CREDS. Platforms
without kernel-level credential passing fall back to --socket-mode 0600.
Security limitations
No forward secrecy
The KDF derives per-agent and per-epoch keys from a single master key. If an agent key is compromised, the epoch keys for past epochs can be re-derived, so frames captured during those epochs can be decrypted. True forward secrecy requires bidirectional ephemeral key exchange (e.g. X25519), which is incompatible with the connectionless, one-way heartbeat model.
When the master key is rotated, all agents must be updated atomically.
The observer reads the master key once at startup from --master-key-file. To
rotate keys, restart the observer with the new master key file. SIGHUP-based
hot-reload is planned for a future release.
Panic-hook entropy policy (secure UDP)
install_panic_handler_secure_udp reads 8 bytes of cryptographic entropy at
install time (getrandom(2) on Linux, getentropy(3) on macOS/BSD, falling
back to /dev/urandom). The IV is pre-computed once so that no file I/O
occurs inside the panic handler itself (async-signal-safety).
Fail-closed default: if all entropy sources fail — common in chrooted
environments without a mounted /dev — the function returns
Err(PanicInstallError::EntropyUnavailable) and the hook is NOT registered.
This prevents a panic-time Critical frame from reusing a deterministic IV
under the same AEAD key, which would be a catastrophic nonce-reuse failure.
Degraded-entropy opt-in: use
install_panic_handler_secure_udp_accept_degraded_entropy to fall back to a
non-cryptographic IV derived from PID, TID, monotonic time, and a counter
(SipHash-2-4). This always succeeds but accepts nonce-reuse risk if the
process panics more than once. The verbose function name is intentional
structural enforcement matching the project’s --i-accept-<risk> convention.
Little-endian only
The VLP wire format uses little-endian integer encoding natively.
Protocol correctness depends on the host being little-endian (all tier-1
targets — x86_64 and aarch64 — satisfy this). Building on a big-endian
host is a compile error. See book/src/architecture/vlp-frame.md for design
rationale.
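A compile-time endianness guard of the kind described can be expressed with `cfg` and `compile_error!`. The sketch below is an assumption about how such a guard might look, paired with explicit little-endian helpers; the function names are hypothetical and no actual VLP field layout is implied.

```rust
// Refuse to build on big-endian hosts (illustrative guard, wording assumed).
#[cfg(target_endian = "big")]
compile_error!("VLP requires a little-endian host");

/// Encode a 32-bit value as little-endian bytes.
fn encode_u32_le(value: u32) -> [u8; 4] {
    value.to_le_bytes()
}

/// Decode a 32-bit value from little-endian bytes.
fn decode_u32_le(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}
```

The explicit `to_le_bytes` / `from_le_bytes` calls are endian-independent; the guard matters for code paths that reinterpret the frame's memory directly.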
Panic-hook key lifetime — accepted residual
The secure-UDP panic handler (install_panic_handler_secure_udp,
install_panic_handler_secure_udp_accept_degraded_entropy) captures a Key
by move into a Box<dyn Fn> registered via std::panic::set_hook. The Box
is the single owner of the captured Key for the lifetime of the
process — Key is !Clone (see crates/varta-vlp/src/crypto/mod.rs), so
no duplicate of the secret bytes can exist anywhere else in the address
space.
The !Clone invariant pins the count of in-memory copies to one. The
remaining concern is the lifetime of that one copy on process exit:
- Normal hook replacement (std::panic::take_hook): the prior Box is dropped, the captured Key’s ZeroizeOnDrop fires, and the 32 secret bytes are wiped before the heap page is returned to the allocator. OK.
- panic = "unwind" profile, normal process exit: the panic-hook Box is leaked by the runtime; Drop is not called on registry-held objects at exit. The captured Key bytes persist in heap memory until the kernel reclaims the page. Linux does not zero pages on reclaim by default; a page is zeroed only when it is next handed to a userspace process.
- panic = "abort" profile: the panic hook still runs before the process aborts, and set_hook still owns the Box, so the residual is the same as in the normal-exit case. Additionally, no Drop runs anywhere during abort().
This residual is accepted: there is no async-signal-safe mechanism
that can reliably wipe a heap-resident secret at process exit. atexit
handlers do not run on abort(), are not async-signal-safe, and race the
panic hook firing. mlock / memfd_secret cannot prevent the kernel
from copying the page during scheduler context switches or core dumps.
The minimum-surface design is to keep the captured Key alive in a
single Box and treat the OS process boundary as the security boundary:
inspecting the memory of a live process requires ptrace or
/proc/<pid>/mem privileges, at which point all in-memory secrets in
any design are accessible.
Cross-references
- Safety profiles: compile-time feature gating for dangerous recovery paths; production-safe build verification recipe
- Observer liveness: defending against varta-watch itself crashing or hanging
- VLP transports: transport-level trust classification and BeatOrigin semantics