evaluation-calibration-core
Source: GitHub repo (ARCHITECTURE.md). Produces evidence artifacts from PacketV2 traces.
Data flow
PacketV2 traces -> read_packets() -> compute_metrics() -> build_report() -> Report
Metrics computation
Computed from PacketV2 traces: action distribution, guard trigger rates, latency percentiles, and invariant pass rates.
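A minimal sketch of how three of these metrics could be aggregated. It works on plain dicts rather than real PacketV2 objects, and the field names `action`, `latency_ms`, and `guard_triggered`, as well as the function names, are assumptions made for illustration. The nearest-rank percentile method is also a choice made here, not necessarily the one the repo uses.

```python
import math
from collections import Counter
from typing import Iterable, Mapping, Sequence


def nearest_rank(sorted_values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty sequence."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def sketch_metrics(packets: Iterable[Mapping]) -> dict:
    """Aggregate metrics from packet-like dicts (hypothetical field names)."""
    packets = list(packets)
    if not packets:
        return {}  # nothing to aggregate

    total = len(packets)
    actions = Counter(p["action"] for p in packets)
    latencies = sorted(p["latency_ms"] for p in packets)
    guard_hits = sum(bool(p.get("guard_triggered")) for p in packets)

    return {
        # Share of packets per action value
        "action_distribution": {a: n / total for a, n in actions.items()},
        # Fraction of packets where a guard fired
        "guard_trigger_rate": guard_hits / total,
        "latency_ms": {
            "p50": nearest_rank(latencies, 50),
            "p95": nearest_rank(latencies, 95),
        },
    }
```

Invariant pass rates are left out here; they would be the per-check fraction of packets that pass each invariant.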
Invariant checks
- Contract closure: Proposal.action and FinalDecision.action must both be members of the Action enum
- Confidence clamp: Proposal.confidence must lie in [0, 1]
- Fail-closed: if mismatch has deny flags, allowed must be False
- Packet version: PacketV2.schema_version must be present
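The four checks above can be sketched as one function over a packet-like dict. The `Action` members, the nested keys (`proposal`, `final_decision`, `mismatch`, `deny_flags`), and the placement of `allowed` under `final_decision` are all assumptions for illustration; the real PacketV2 layout may differ.

```python
from enum import Enum
from typing import Mapping


class Action(Enum):
    # Hypothetical members; the real Action enum lives in the schema package.
    ACT = "ACT"
    HOLD = "HOLD"


def check_invariants(packet: Mapping) -> dict:
    """Return a pass/fail flag per invariant for one packet-like dict."""
    valid_actions = {a.value for a in Action}
    proposal = packet.get("proposal", {})
    final = packet.get("final_decision", {})
    deny_flags = packet.get("mismatch", {}).get("deny_flags")
    confidence = proposal.get("confidence")

    return {
        "contract_closure": (
            proposal.get("action") in valid_actions
            and final.get("action") in valid_actions
        ),
        "confidence_clamp": (
            isinstance(confidence, (int, float)) and 0.0 <= confidence <= 1.0
        ),
        # Any deny flag requires allowed to be exactly False.
        "fail_closed": not deny_flags or final.get("allowed") is False,
        "packet_version": bool(packet.get("schema_version")),
    }
```

A missing field counts as a failed check in this sketch, which keeps the checker itself fail-closed.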
Contracts
- Input: PacketV2 traces (e.g. JSONL)
- Output: Metrics dict, Report (JSON + Markdown)
- Uses: decision_schema.packet_v2.PacketV2
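As an illustration of the input contract, here is a minimal sketch of reading JSONL traces into dicts and rejecting unsupported schema versions. The function name `iter_packets`, the `SUPPORTED_VERSIONS` set, and its value are hypothetical; the real reader parses into PacketV2 objects.

```python
import json
from typing import Iterable, Iterator

# Hypothetical set of accepted versions, for illustration only.
SUPPORTED_VERSIONS = {"2"}


def iter_packets(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-blank JSONL line, validating schema_version."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        version = record.get("schema_version")
        if version not in SUPPORTED_VERSIONS:
            raise ValueError(
                f"line {lineno}: unsupported schema_version {version!r}"
            )
        yield record
```

Because the function accepts any iterable of strings, it works the same way on an open file handle as on an in-memory list of lines.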
Components
1. Packet Reader (eval_calibration_core/io/packet_reader.py)
Class: PacketReader — Reads PacketV2 from JSONL; validates schema compatibility.
2. Metrics Computation (eval_calibration_core/metrics/compute.py)
Function: compute_metrics(packets: Iterable[PacketV2]) -> dict — Action distribution, guard trigger rates, latency percentiles, invariant verification.
3. Report Generation (eval_calibration_core/report.py)
Function: build_report(packets: Iterable[PacketV2]) -> Report — JSON + Markdown report with metrics and invariant checks.
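To make the "JSON + Markdown" output concrete, here is a sketch of a report container that renders both forms. `ReportSketch` and its fields are hypothetical stand-ins; the actual Report class may be structured differently.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ReportSketch:
    """Hypothetical stand-in for Report: metrics plus invariant results."""

    metrics: dict = field(default_factory=dict)
    invariants: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # sort_keys keeps the serialized output stable across runs
        payload = {"metrics": self.metrics, "invariants": self.invariants}
        return json.dumps(payload, sort_keys=True, indent=2)

    def to_markdown(self) -> str:
        lines = ["# Evaluation report", "", "## Metrics"]
        lines += [f"- {name}: {value}" for name, value in sorted(self.metrics.items())]
        lines += ["", "## Invariants"]
        lines += [
            f"- {name}: {'PASS' if ok else 'FAIL'}"
            for name, ok in sorted(self.invariants.items())
        ]
        return "\n".join(lines)
```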
Safety invariants
- Fail-closed: On errors, return empty metrics or safe defaults
- Deterministic: Same inputs → same outputs
- Schema validation: Reject incompatible PacketV2 versions
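The first two properties can be sketched with a small wrapper and a stable serializer. Both helper names are hypothetical, and catching every `Exception` is a simplification for the sketch; a real implementation would probably log the error and catch narrower types.

```python
import json
from typing import Callable, Iterable, Mapping


def fail_closed_metrics(
    compute: Callable[[list], dict], packets: Iterable[Mapping]
) -> dict:
    """Run compute(packets); on any error return empty metrics instead of raising."""
    try:
        return compute(list(packets))
    except Exception:
        return {}  # safe default: no metrics rather than partial ones


def stable_json(payload: dict) -> str:
    """Serialize with sorted keys so equal inputs give byte-identical output."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))
```

Sorting keys removes dict insertion order as a source of differences, which is one common way to keep serialized reports deterministic.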
Import
from eval_calibration_core.report.builder import build_report