
Data contract

TokenShift has no behaviour-changing control plane. Customers configure the binary at install time via the local enrollment manifest deployed by their MDM; PointFive only ingests data and controls which release version is offered to a given client. See Installation & distribution for the install model.

Every record is a typed JSON document, hybrid-encrypted on the device against PointFive’s public key (pinned in the binary at build time). Anything not enumerated below is dropped at the wire boundary, fail-closed.


TokenShift emits exactly five typed record types (signals) and nothing else.

| # | Signal | Min tier | Fires when |
|---|--------|----------|------------|
| 1 | tool_invocation | 1 | The agent makes a tool call. |
| 2 | recovery_retrieved | 1 | A developer pulls back the original output that was compressed away. |
| 3 | session_summary | 1 | An agent session ends. |
| 4 | compression_sample | 4 | Triple-gated, off by default: a sampled, redacted before/after pair for rule tuning. |
| 5 | client_state | 1 | Periodically (at most once per 24h per client_id): version and install metadata. |

tool_invocation — one per tool call, regardless of whether TokenShift compressed it. Carries tool kind, model, activity bucket, project (raw directory name by default; can be hashed), a stable command_hash, original token count, and an outcome enum. At Tier 2+ also carries redacted command_shape and rewritten_command_shape. No raw command bytes ship at any tier.

recovery_retrieved — fires when a developer pulls back the original output via the recovery command. Carries only a join key, the recovering session, and an age in seconds. No content. No file paths.

session_summary — one best-effort record per session at session end. Carries end_reason, turn count, peak context-window usage, and small denormalized aggregates (tokens in, tokens saved, per-outcome counts).

compression_sample — a redacted before/after pair for PointFive rule tuning. The only signal that carries actual output content — and the most tightly gated record in the contract. Off by default. Three independent gates must all be true: tenant has enabled Tier 4, the matched rule is on the per-rule sample allowlist, and a random draw lands inside the configured sampling percentage. Carries no user_id (defense in depth). Hard-capped retention: 7 days.
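The triple gate can be sketched as three independent boolean checks that must all pass. This is a minimal illustration, not TokenShift's implementation: `SampleConfig`, its field names, and `should_emit_sample` are hypothetical, and the sketch uses a single sampling percentage where the real configuration is per rule.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleConfig:
    """Hypothetical sampling settings; field names are illustrative only."""
    effective_tier: int = 2                    # min(user_choice, admin_cap)
    sample_allowlist: frozenset = frozenset()  # empty: no rule may be sampled
    sample_rate_percent: float = 0.0           # 0: no random draw can succeed


def should_emit_sample(cfg, rule_id, rng=random.random):
    """Return True only when all three gates pass."""
    tier_ok = cfg.effective_tier >= 4                  # gate 1: Tier 4 enabled
    rule_ok = rule_id in cfg.sample_allowlist          # gate 2: rule allowlisted
    draw_ok = rng() * 100.0 < cfg.sample_rate_percent  # gate 3: random draw
    return tier_ok and rule_ok and draw_ok
```

With the defaults (Tier 2, empty allowlist, 0% rate) every gate fails, so a tenant has to change all three settings before any sample can be emitted.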

client_state — periodic meta-telemetry about the binary itself: version, OS, architecture, install method, enrollment age, and whether the startup self-check passed. Rate-limited to at most once per 24 hours per client_id. No PII, no per-tool-call detail. This is how PointFive and admins see fleet hygiene.
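The 24-hour rate limit amounts to a timestamp comparison per client_id. A minimal sketch, assuming the client keeps the time of its last client_state emit as epoch seconds; the function and constant names are hypothetical.

```python
import time

CLIENT_STATE_MIN_INTERVAL_S = 24 * 60 * 60  # at most once per 24 hours


def may_emit_client_state(last_sent_at, now=None):
    """Decide whether a client_state record may be emitted.

    last_sent_at: epoch seconds of the previous emit for this client_id,
    or None if nothing has been sent yet.
    """
    if now is None:
        now = time.time()
    if last_sent_at is None:
        return True
    return now - last_sent_at >= CLIENT_STATE_MIN_INTERVAL_S
```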


The tier is a single integer, set by user choice and capped from above by install-time config. The effective tier on the wire is min(user_choice, admin_cap). Each higher tier strictly adds to what the tier below it sends.

| Tier | What leaves the laptop |
|------|------------------------|
| 0 | Nothing. Local stats only. |
| 1 | tool_invocation (no shape fields), recovery_retrieved, session_summary, client_state. |
| 2 | Tier 1 + command_shape and rewritten_command_shape. |
| 4 | Tier 2 + compression_sample. Triple-gated; off by default even when permitted. |
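The tier rules can be expressed as a small lookup plus the min() formula. The sketch below takes the signal names, minimum tiers, and shape field names from this page; the function names are hypothetical, and the additional gates on compression_sample are left out.

```python
# Minimum tier per signal, as listed in the signal table.
MIN_TIER = {
    "tool_invocation": 1,
    "recovery_retrieved": 1,
    "session_summary": 1,
    "compression_sample": 4,
    "client_state": 1,
}
SHAPE_FIELDS = ("command_shape", "rewritten_command_shape")
SHAPE_MIN_TIER = 2


def effective_tier(user_choice, admin_cap):
    """The tier on the wire is the lower of the user's choice and the admin cap."""
    return min(user_choice, admin_cap)


def may_emit(signal, tier):
    """Unknown signals and signals above the current tier are never emitted."""
    minimum = MIN_TIER.get(signal)
    return minimum is not None and tier >= minimum


def apply_tier(record, tier):
    """Return a copy of the record without shape fields when below Tier 2."""
    if tier >= SHAPE_MIN_TIER:
        return dict(record)
    return {k: v for k, v in record.items() if k not in SHAPE_FIELDS}
```

For example, a user who picks Tier 4 under an admin cap of 2 ends up at `effective_tier(4, 2) == 2`, so `may_emit("compression_sample", 2)` is false.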

Default tier when enrolled: 2 (patterns). tokenshift enroll --tenant-id=<id> caps and emits at Tier 2 unless overridden with --tier-cap or local telemetry config.


The table below lists every knob a tenant admin sets when deploying TokenShift via MDM. There is no runtime config push: changes require regenerating the manifest from the PointFive portal and redeploying via MDM. There are no user-level overrides.

| Control | Default | Effect |
|---------|---------|--------|
| Tier cap | Tier 2 | Maximum tier any client in the tenant can emit. |
| Project identifier mode | Raw directory name | Or hashed directory name, or hashed git remote URL. |
| Recovery cache enabled | On | Off disables the local short-lived cache. |
| Sample allowlist | Empty | Per-rule list of which rules may be sampled at Tier 4. Empty = no samples ever emitted. |
| Sample rate | 0% | Per-rule sampling probability used at Tier 4. |
| Retention extensions | Defaults below | Negotiated per tenant for four of the five signals. compression_sample is hard-capped regardless. |

  • tenant_id — customer organization, from the enrollment manifest.

  • client_id — random UUID created on first run and persisted under ~/.tokenshift/. Reused across runs, survives upgrades and re-enrollment. One per OS user account.

  • user_id — pseudonymous user identifier derived locally as a one-way hash of the developer’s email under a per-tenant key:

    git config --global user.email
    (fallback: $TOKENSHIFT_USER_EMAIL; otherwise user_id = null)
        │
        ▼
    email ─── one-way hash, keyed with user_id_hmac_key ──▶ user_id (ships on the wire)

    user_id_hmac_key is a random secret, generated at enrollment, that lives in the manifest.
    The email is discarded immediately: never persisted, never sent.

    Same email + same tenant key → identical user_id on every machine that developer uses, enabling cross-machine joins. user_id is deliberately absent from compression_sample and client_state.

  • project — raw directory name by default; install-time config may switch to hashed directory name or hashed git remote URL.

  • session_id, invocation_id, recovery_id — opaque UUIDs used as join keys between signals.

Raw email, hostnames, OS usernames, IPs, MACs, and raw file paths are never captured and have no code path that would attach them.
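The user_id derivation described above can be sketched with a keyed hash. The key name user_id_hmac_key suggests an HMAC, but the hash function (SHA-256 here), the hex encoding, and the lowercase/trim normalization are assumptions, as are the function names.

```python
import hashlib
import hmac
import os
import subprocess


def read_email():
    """Look up the email: git global config first, then the env fallback."""
    try:
        result = subprocess.run(
            ["git", "config", "--global", "user.email"],
            capture_output=True, text=True, check=False,
        )
        email = result.stdout.strip()
    except OSError:  # git is not installed or cannot be executed
        email = ""
    return email or os.environ.get("TOKENSHIFT_USER_EMAIL") or None


def derive_user_id(email, user_id_hmac_key):
    """Return a pseudonymous user_id, or None when no email is available.

    user_id_hmac_key is the per-tenant secret (bytes) from the manifest.
    """
    if not email:
        return None
    normalized = email.strip().lower()  # assumption: normalize before hashing
    digest = hmac.new(user_id_hmac_key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

Because the key is shared across a tenant's machines, the same email yields the same user_id everywhere inside that tenant, while a different tenant key yields an unrelated value.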


Shape fields (command_shape, rewritten_command_shape, and the two halves of compression_sample) pass through a redactor that strips known secret formats and unknown command-argument values before the record ships. A redactor warning drops the whole record — never partial.
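To make the all-or-nothing behaviour concrete, here is a rough sketch of a shape redactor. It is not TokenShift's redactor: the two secret patterns, the placeholder strings, the flag heuristics, and the choice to treat an unparseable command as a warning are all illustrative assumptions.

```python
import re
import shlex

# Illustrative secret formats only; a real redactor would know many more.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS access key ID format
    re.compile(r"ghp_[A-Za-z0-9]{36}"),  # GitHub personal access token format
]
FLAG_RE = re.compile(r"^(-[A-Za-z]|--[A-Za-z][A-Za-z0-9-]*)$")
FLAG_VALUE_RE = re.compile(r"^(--[A-Za-z][A-Za-z0-9-]*)=")


class RedactorWarning(Exception):
    """Raised when the redactor cannot vouch for its output."""


def redact_shape(command):
    """Keep the command name and bare flags; replace every argument value."""
    try:
        tokens = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        raise RedactorWarning("command could not be parsed") from exc
    shape = []
    for i, tok in enumerate(tokens):
        if any(p.search(tok) for p in SECRET_PATTERNS):
            shape.append("<secret>")
        elif i == 0:
            shape.append(tok.rsplit("/", 1)[-1])  # drop any directory prefix
        elif FLAG_RE.match(tok):
            shape.append(tok)
        elif FLAG_VALUE_RE.match(tok):
            shape.append(FLAG_VALUE_RE.match(tok).group(1) + "=<arg>")
        else:
            shape.append("<arg>")
    return " ".join(shape)


def build_record(record, command):
    """Attach command_shape, or drop the whole record on a redactor warning."""
    try:
        shape = redact_shape(command)
    except RedactorWarning:
        return None  # whole record dropped, never a partial emit
    return {**record, "command_shape": shape}
```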


Anything that doesn’t go right ends in a drop, never a raw emit.

| Trigger | Action |
|---------|--------|
| Unknown attribute on a record | Stripped before send. |
| Unknown signal name | Record dropped. |
| Tier below the signal’s minimum | Not emitted. |
| Redactor warning on a shape field | Whole record dropped. |
| Sample allowlist or sample-rate miss | Sample not emitted. |
| Local outbox overflow | Oldest records dropped first; developer’s command never blocked. |
| Network unavailable | Records queued locally until network returns. |
| Enrollment manifest absent or unparseable | Binary refuses to send and surfaces the error. |
| Encryption fails | Record dropped. |
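The outbox rows in the table above (overflow drops the oldest record, network loss keeps records queued) can be sketched with a bounded deque. The class name, capacity parameter, and `send` callback are hypothetical.

```python
from collections import deque


class Outbox:
    """Bounded local queue: enqueue never blocks, overflow evicts the oldest."""

    def __init__(self, max_records):
        self._queue = deque(maxlen=max_records)
        self.dropped = 0

    def __len__(self):
        return len(self._queue)

    def enqueue(self, record):
        if len(self._queue) == self._queue.maxlen:
            self.dropped += 1  # deque evicts the leftmost (oldest) item on append
        self._queue.append(record)

    def drain(self, send):
        """Send oldest first; stop at the first failure and keep the rest queued."""
        while self._queue:
            if not send(self._queue[0]):
                break  # e.g. network unavailable: retry on a later drain
            self._queue.popleft()
```

A record is removed only after `send` reports success, so a failed attempt leaves the queue unchanged for the next drain.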

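| Signal | Default retention | Tenant-extensible? |
|--------|-------------------|--------------------|
| tool_invocation | 90 days | Yes |
| recovery_retrieved | 90 days | Yes |
| session_summary | 90 days | Yes |
| client_state | 90 days | Yes |
| compression_sample | 7 days | No (hard cap). |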

The 7-day hard cap on compression_sample is non-negotiable. Samples are the only signal that carries redacted output content; the rule-tuning use case doesn’t benefit from longer retention, and every extra day is unnecessary blast radius.
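The retention rules reduce to a lookup with one exception. In this sketch the defaults come from the retention table; the function name is hypothetical, and treating an extension as "never shorter than the default" is an assumption.

```python
DEFAULT_RETENTION_DAYS = {
    "tool_invocation": 90,
    "recovery_retrieved": 90,
    "session_summary": 90,
    "client_state": 90,
    "compression_sample": 7,
}
HARD_CAPPED = frozenset({"compression_sample"})


def retention_days(signal, tenant_extension_days=None):
    """Effective retention for a signal, ignoring extensions on capped signals."""
    default = DEFAULT_RETENTION_DAYS[signal]
    if signal in HARD_CAPPED or tenant_extension_days is None:
        return default
    return max(default, tenant_extension_days)  # assumption: extensions only lengthen
```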


What the contract explicitly does not send

The wire never carries any of the following. None are on the allowlist; no code path attaches them. An automated test asserts only allowlisted keys appear on the wire — a change that adds any of these fails the build before merge.

  1. Bash stdout / stderr content, in any form (including hashes).
  2. Tool inputs: file contents, web content, MCP tool-call results.
  3. Agent transcripts, prompts, messages, message hashes.
  4. Environment variables — names or values.
  5. Raw file paths.
  6. Hostnames, OS usernames, real names, IP addresses, MAC addresses.
  7. Free-text feedback / notes / comments. None exist by design.
  8. Rule contents or rule diffs — only the rule’s identifier.
  9. Stack traces, panic messages, crash report bodies.
  10. Branch names, commit messages, git refs.
  11. Raw email. Read once locally to compute user_id, then discarded.
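The allowlist check described above can be sketched as a filter at the wire boundary plus an assertion a test could run. The per-signal key sets are illustrative and cover only two signals; `age_seconds` and `turn_count` are hypothetical field names, not the real schema.

```python
# Illustrative allowlist; the real schema is defined by the contract.
ALLOWLIST = {
    "recovery_retrieved": frozenset(
        {"tenant_id", "client_id", "user_id", "recovery_id", "session_id", "age_seconds"}
    ),
    "session_summary": frozenset(
        {"tenant_id", "client_id", "user_id", "session_id", "end_reason", "turn_count"}
    ),
}


def filter_for_wire(signal, record):
    """Drop unknown signals entirely; strip unknown attributes from known ones."""
    allowed = ALLOWLIST.get(signal)
    if allowed is None:
        return None
    return {k: v for k, v in record.items() if k in allowed}


def assert_only_allowlisted(signal, wire_record):
    """Test helper: fail if any key outside the allowlist reached the wire."""
    extra = set(wire_record) - ALLOWLIST[signal]
    assert not extra, f"non-allowlisted keys on the wire: {sorted(extra)}"
```

A record that arrives with, say, a `hostname` key leaves `filter_for_wire` without it, and a build-time test calling `assert_only_allowlisted` on every emitted record would fail if a change tried to add such a key to the output.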

  • Allowlist — Explicit list of what is permitted; everything else is denied.
  • Enrollment manifest — JSON file generated by the PointFive portal and deployed via MDM. Carries tenant_id, bearer_token, user_id_hmac_key, and install-time config.
  • Fail-closed — On any error, drop the data rather than emit it raw.
  • MDM — Mobile Device Management / endpoint management system.
  • Signal — A named, schema’d record type on the wire (one of the five above).
  • Tier — Level of data sharing (0/1/2/4). Capped from above by install-time config.