Data classification (Phase 5 S19)

Catalog of every event field Agentic SpendGuard records, with its data class. Operators consult this when configuring tenant_data_policy.export_redaction_field_paths and when running compliance reviews.

| Class | Description | Default redaction at export | Default retention |
|---|---|---|---|
| metadata | Identity, structural, never user content | NEVER redacted | full audit window |
| pricing | Pricing freeze tuple, model id, token counts | NEVER redacted (billing evidence) | full audit window |
| decision | matched_rule_ids, reason_codes, contract version | NEVER redacted | full audit window |
| prompt | User prompt or LLM input/output text | redacted by default at export | tenant.prompt_retention_days |
| provider_raw | Verbatim provider API response | redacted by default at export | tenant.provider_raw_retention_days |
| pii | User identity (email, name) when in payload | redacted by default at export | tenant.prompt_retention_days |
| provider_secret | Webhook signing keys, API tokens (in error logs only) | NEVER stored in events | n/a |

| Field path | Class | Notes |
|---|---|---|
| audit_outbox_id | metadata | UUID v7 |
| tenant_id | metadata | |
| decision_id | metadata | |
| audit_decision_event_id | metadata | |
| event_type | metadata | |
| cloudevent_payload->>'specversion' | metadata | |
| cloudevent_payload->>'type' | metadata | |
| cloudevent_payload->>'source' | metadata | |
| cloudevent_payload->>'id' | metadata | |
| cloudevent_payload->>'time' | metadata | |
| cloudevent_payload->'data' | prompt | Raw decision/outcome content; may include user prompt |
| cloudevent_payload->'data'->>'snapshot_hash' | metadata | hex hash, not content |
| cloudevent_payload->>'producer_id' | metadata | |
| cloudevent_payload->>'producer_sequence' | metadata | |
| cloudevent_payload_signature | metadata | Ed25519 signature bytes |
| signing_key_id | metadata | |
| producer_sequence | metadata | |
| idempotency_key | metadata | Hash of business intent; not content |
| recorded_at | metadata | |

Once prompt_retention_days has elapsed, the S19 redaction sweeper sets cloudevent_payload->'data' to a marker JSONB ({"_redacted": true, "redacted_at": "..."}) and stores the SHA-256 hash of the original bytes in a separate cloudevent_payload->'_data_sha256_hex' field. The audit chain hash stays valid because the producer_signature was computed over the ORIGINAL bytes; verifiers re-derive the canonical bytes from the hash kept in the redacted form plus the remaining metadata.
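The redaction step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the sweeper's actual code: the function names are hypothetical, and the canonical byte form (UTF-8 JSON with sorted keys and no extra whitespace) is assumed here because this page does not define it.

```python
import hashlib
import json
from datetime import datetime, timezone


def canonical_bytes(data):
    # ASSUMPTION: canonical form is UTF-8 JSON, sorted keys, compact separators.
    return json.dumps(
        data, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def redact_payload(payload, now=None):
    """Return a copy of a cloudevent payload with 'data' replaced by the marker."""
    if "data" not in payload:
        return payload  # nothing to redact
    data = payload["data"]
    if isinstance(data, dict) and data.get("_redacted") is True:
        return payload  # already redacted, so the operation is idempotent
    now = now or datetime.now(timezone.utc)
    redacted = dict(payload)  # shallow copy; the caller's payload is not mutated
    # Keep the hash of the original bytes next to the marker.
    redacted["_data_sha256_hex"] = hashlib.sha256(canonical_bytes(data)).hexdigest()
    redacted["data"] = {"_redacted": True, "redacted_at": now.isoformat()}
    return redacted
```

Returning a copy rather than mutating in place keeps the original payload available to the caller, for example to verify the hash before the row is updated.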

| Field path | Class | Notes |
|---|---|---|
| record_id | metadata | |
| provider | metadata | |
| tenant_id | metadata | |
| provider_event_id | metadata | |
| model_id | metadata | |
| prompt_tokens | pricing | Token count, not text |
| completion_tokens | pricing | |
| cost_micros_usd | pricing | |
| raw_payload | provider_raw | Verbatim provider API response; may include text |

Once provider_raw_retention_days has elapsed, the S19 redaction sweeper clears raw_payload (sets it to {"_redacted": true, ...}) and retains only the structured fields above for billing forensics.

All fields are class metadata or pricing; no prompt or provider raw content lands in these tables. Rows are NEVER redacted or deleted (S19 invariant; a trigger enforces it).

| Field path | Class | Notes |
|---|---|---|
| decision_context | mixed | Contains pricing tuple (metadata) + matched rules (decision); may include data echo (prompt) |
| requested_effect | metadata | Projected claims; no prompt content |
| resolution_reason | metadata | Operator-supplied text (typically business reason) |

If decision_context.data echoes prompt bytes, the export redaction policy applies the same way as for audit_outbox.cloudevent_payload->'data'.

Set tenant prompt retention to 0 (store hashes only)

-- Keep hashes only: prompt data for this tenant becomes due for redaction immediately.
UPDATE tenant_data_policy
   SET prompt_retention_days = 0,
       updated_by = 'me@example.com'
 WHERE tenant_id = '...';

On its next pass, the retention sweeper (S19-followup) finds every audit row for this tenant whose cloudevent_payload->'data' field is non-empty and redacts it in place. New events from that tenant are redacted at write time (application-level enforcement).
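The selection rule the sweeper applies on each pass can be expressed as a small predicate. The sketch below is a hypothetical helper, not code from the service; it assumes that rows already carrying the redaction marker are skipped, which the text above does not state explicitly.

```python
from datetime import datetime, timedelta, timezone


def is_due_for_redaction(recorded_at, retention_days, data, now):
    """True when a row's data field should be redacted on this sweeper pass."""
    if not data:
        return False  # NULL or empty data: nothing to redact
    if isinstance(data, dict) and data.get("_redacted") is True:
        return False  # ASSUMPTION: rows holding the marker are left alone
    # With retention_days = 0 every non-empty row is due right away.
    return recorded_at + timedelta(days=retention_days) <= now
```

With prompt_retention_days set to 0, the predicate is true for any row that still holds content, which matches the "store hashes only" behavior described here.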

-- Tombstone the tenant and record who did it and why.
UPDATE tenant_data_policy
   SET tombstoned = TRUE,
       tombstoned_at = clock_timestamp(),
       tombstoned_by = 'me@example.com',
       tombstoned_reason = 'customer offboarded'
 WHERE tenant_id = '...';

A trigger enforces that the tombstone is one-way: once set, it cannot be reverted. A tombstoned tenant's audit chain stays queryable.
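As a rough model of the tombstone rules, the Python sketch below shows the one-way transition check alongside the kind of write-path guard an application service might apply. It is not the migration's trigger code; TenantPolicy, TombstoneError, check_policy_update, and ensure_writable are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


class TombstoneError(Exception):
    """Raised when an operation conflicts with a tenant's tombstone."""


@dataclass(frozen=True)
class TenantPolicy:
    # Hypothetical in-memory view of one tenant_data_policy row.
    tenant_id: str
    tombstoned: bool = False


def check_policy_update(old: TenantPolicy, new: TenantPolicy) -> TenantPolicy:
    """Allow any update except clearing a tombstone that is already set."""
    if old.tombstoned and not new.tombstoned:
        raise TombstoneError(f"tenant {old.tenant_id}: tombstone cannot be reverted")
    return new


def ensure_writable(policy: TenantPolicy) -> None:
    """Write-path guard: refuse new events for a tombstoned tenant."""
    if policy.tombstoned:
        raise TombstoneError(f"tenant {policy.tenant_id}: writes rejected")
```

Reads are deliberately not guarded in this sketch, since the audit chain of a tombstoned tenant is meant to stay queryable.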

-- Sweeper activity over the last 30 days, grouped by sweep kind.
SELECT sweep_kind, count(*), sum(rows_redacted)
  FROM retention_sweeper_log
 WHERE started_at > now() - interval '30 days'
 GROUP BY 1;

The DB-layer triggers in migration 0028 reject DELETE on:

  • audit_outbox
  • audit_outbox_global_keys
  • ledger_transactions
  • ledger_entries

The retention sweeper UPDATEs (redacts in place) but never DELETEs. Even a SUPERUSER would have to disable the triggers explicitly to remove rows, and that action would be visible in pg_audit logs.

  1. The retention sweeper service is not yet shipped. The schema is in place; the background worker that scans audit_outbox and provider_usage_records and applies the redaction is the next piece of work.
  2. Application-level write-time redaction (when prompt_retention_days = 0) requires the sidecar and webhook_receiver code paths to consult tenant_data_policy before writing the data field.
  3. Export endpoint redaction (S9) needs to consult tenant_data_policy.export_redaction_field_paths and strip those JSONB paths before each JSONL line goes out.
  4. Application-level tombstone enforcement: sidecar, webhook_receiver, and control_plane MUST check tenant_data_policy.tombstoned and reject writes for that tenant. Defense in depth: existing rows stay queryable.
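For item 3, the export-side stripping might look roughly like the sketch below. The storage format of export_redaction_field_paths is not specified on this page, so the dotted-string form used here (for example "cloudevent_payload.data") is an assumption, as are the function names. Whether the export leaves a marker behind is also unspecified; this sketch simply removes the key.

```python
import copy
import json


def strip_paths(record, field_paths):
    """Return a deep copy of record with each configured path removed."""
    out = copy.deepcopy(record)
    for path in field_paths:
        keys = path.split(".")  # ASSUMPTION: paths are dotted key strings
        node = out
        for key in keys[:-1]:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break  # path does not exist in this record; skip it
        else:
            if isinstance(node, dict):
                node.pop(keys[-1], None)
    return out


def to_jsonl_line(record, field_paths):
    """Serialize one export record as a JSONL line after stripping paths."""
    return json.dumps(strip_paths(record, field_paths), sort_keys=True) + "\n"
```

Paths that do not exist in a given record are ignored, so one policy can be applied across event types with different shapes.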