Data contract
TokenShift has no behaviour-changing control plane. Customers configure the binary at install time via the local enrollment manifest deployed by their MDM; PointFive only ingests data and controls which release version is offered to a given client. See Installation & distribution for the install model.
Every record is a typed JSON document, hybrid-encrypted on the device against PointFive’s public key (pinned in the binary at build time). Anything not enumerated below is dropped at the wire boundary, fail-closed.
Signal index
TokenShift emits exactly five typed records and nothing else.
| # | Signal | Min tier | Fires when |
|---|---|---|---|
| 1 | tool_invocation | 1 | The agent makes a tool call. |
| 2 | recovery_retrieved | 1 | A developer pulls back the original output that was compressed away. |
| 3 | session_summary | 1 | An agent session ends. |
| 4 | compression_sample | 4 | Triple-gated, off by default — sampled redacted before/after pair for rule tuning. |
| 5 | client_state | 1 | Periodically (≤ once / 24h per machine) — version + install metadata. |
tool_invocation — one per tool call, regardless of whether TokenShift compressed it. Carries tool kind, model, activity bucket, project (raw directory name by default; can be hashed), a stable command_hash, original token count, and an outcome enum. At Tier 2+ also carries redacted command_shape and rewritten_command_shape. No raw command bytes ship at any tier.
recovery_retrieved — fires when a developer pulls back the original output via the recovery command. Carries only a join key, the recovering session, and an age in seconds. No content. No file paths.
session_summary — one best-effort record per session at session end. Carries end_reason, turn count, peak context-window usage, and small denormalized aggregates (tokens in, tokens saved, per-outcome counts).
compression_sample — a redacted before/after pair for PointFive rule tuning. The only signal that carries actual output content — and the most tightly gated record in the contract. Off by default. Three independent gates must all be true: tenant has enabled Tier 4, the matched rule is on the per-rule sample allowlist, and a random draw lands inside the configured sampling percentage. Carries no user_id (defense in depth). Hard-capped retention: 7 days.
client_state — periodic meta-telemetry about the binary itself: version, OS, architecture, install method, enrollment age, and whether the startup self-check passed. Rate-limited to at most once per 24 hours per client_id. No PII, no per-tool-call detail. This is how PointFive and admins see fleet hygiene.
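The three gates on compression_sample can be pictured as a single conjunction. The following is a minimal sketch, not PointFive's implementation: the function name, the parameter names, and the representation of the sampling percentage as a fraction between 0 and 1 are all assumptions made for illustration.

```python
import random
from typing import Callable, Set


def should_emit_sample(
    effective_tier: int,
    rule_id: str,
    sample_allowlist: Set[str],
    sample_rate: float,
    draw: Callable[[], float] = random.random,
) -> bool:
    """Return True only when all three independent gates pass."""
    # Gate 1: Tier 4 is enabled (modelled here as an effective tier of 4 or more).
    tier_gate = effective_tier >= 4
    # Gate 2: the matched rule is on the per-rule sample allowlist.
    allowlist_gate = rule_id in sample_allowlist
    # Gate 3: a random draw in [0, 1) lands inside the sampling fraction.
    rate_gate = draw() < sample_rate
    return tier_gate and allowlist_gate and rate_gate
```

With the documented defaults (empty allowlist, 0% rate), this sketch never returns True, which matches the "off by default" posture. The draw parameter exists only so the random gate can be pinned in tests.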
The tier is a single integer set by user choice and capped from above by install-time config. Effective tier on the wire is min(user_choice, admin_cap). Each higher tier strictly adds to the one below it.
| Tier | What leaves the laptop |
|---|---|
| 0 | Nothing. Local stats only. |
| 1 | tool_invocation (no shape fields), recovery_retrieved, session_summary, client_state. |
| 2 | Tier 1 + command_shape and rewritten_command_shape. |
| 4 | Tier 2 + compression_sample. Triple-gated; off by default even when permitted. |
Default tier when enrolled: 2 (patterns). tokenshift enroll --tenant-id=<id> caps and emits at Tier 2 unless overridden with --tier-cap or local telemetry config.
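The tier rules above reduce to a few lines of arithmetic. This is an illustrative sketch only: the minimum tiers are copied from the signal index table, while the function names are hypothetical and not part of the TokenShift binary.

```python
from typing import List

# Minimum tier per signal, copied from the signal index table.
MIN_TIER = {
    "tool_invocation": 1,
    "recovery_retrieved": 1,
    "session_summary": 1,
    "compression_sample": 4,
    "client_state": 1,
}


def effective_tier(user_choice: int, admin_cap: int) -> int:
    """The tier on the wire is the lower of the user's choice and the admin cap."""
    return min(user_choice, admin_cap)


def permitted_signals(tier: int) -> List[str]:
    """Signals whose minimum tier is met at the given effective tier."""
    return sorted(name for name, min_tier in MIN_TIER.items() if tier >= min_tier)


def include_shape_fields(tier: int) -> bool:
    """command_shape and rewritten_command_shape are attached at Tier 2 and above."""
    return tier >= 2
```

For example, a user who selects Tier 4 under an admin cap of 2 gets `effective_tier(4, 2) == 2`, so compression_sample is not among the permitted signals. At Tier 4 the signal is only permitted; whether a sample is actually emitted still depends on the allowlist and sampling gates.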
Install-time configuration
These are the knobs a tenant admin sets when deploying TokenShift via MDM. There is no runtime config push: changes require regenerating the manifest from the PointFive portal and redeploying via MDM. Users cannot override these controls; a user can only choose a tier at or below the cap.
| Control | Default | Effect |
|---|---|---|
| Tier cap | Tier 2 | Maximum tier any client in the tenant can emit. |
| Project identifier mode | Raw directory name | Or hashed directory name, or hashed git remote URL. |
| Recovery cache enabled | On | Off disables the local short-lived cache. |
| Sample allowlist | Empty | Per-rule list of which rules may be sampled at Tier 4. Empty = no samples ever emitted. |
| Sample rate | 0% | Per-rule sampling probability used at Tier 4. |
| Retention extensions | Defaults below | Negotiated per tenant for four of the five signals. compression_sample is hard-capped regardless. |
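To make the controls concrete, here is one way a client could load and validate them. The manifest schema is not given in this document, so every key name below (tier_cap, project_mode, recovery_cache_enabled, sample_allowlist, sample_rate), the mode names, and the flat layout are hypothetical. The sketch also simplifies the per-rule sample rate to a single probability. Only the defaults and the set of valid tiers come from the tables above.

```python
import json
from dataclasses import dataclass
from typing import Tuple

VALID_TIERS = (0, 1, 2, 4)
# Hypothetical names for the three project identifier modes.
PROJECT_MODES = ("raw_dir", "hashed_dir", "hashed_git_remote")


class ManifestError(Exception):
    """Manifest absent or unparseable: the caller refuses to send."""


@dataclass(frozen=True)
class InstallConfig:
    tier_cap: int = 2
    project_mode: str = "raw_dir"
    recovery_cache_enabled: bool = True
    sample_allowlist: Tuple[str, ...] = ()
    sample_rate: float = 0.0


def parse_install_config(manifest_text: str) -> InstallConfig:
    """Parse install-time config, falling back to the documented defaults."""
    try:
        raw = json.loads(manifest_text)
    except json.JSONDecodeError as exc:
        raise ManifestError("manifest is not valid JSON") from exc
    if not isinstance(raw, dict):
        raise ManifestError("manifest must be a JSON object")
    allowlist = raw.get("sample_allowlist", [])
    if not isinstance(allowlist, list):
        raise ManifestError("sample_allowlist must be a list of rule ids")
    cfg = InstallConfig(
        tier_cap=raw.get("tier_cap", 2),
        project_mode=raw.get("project_mode", "raw_dir"),
        recovery_cache_enabled=raw.get("recovery_cache_enabled", True),
        sample_allowlist=tuple(allowlist),
        sample_rate=raw.get("sample_rate", 0.0),
    )
    if type(cfg.tier_cap) is not int or cfg.tier_cap not in VALID_TIERS:
        raise ManifestError("tier_cap must be one of 0, 1, 2, 4")
    if cfg.project_mode not in PROJECT_MODES:
        raise ManifestError("unknown project_mode")
    if not isinstance(cfg.sample_rate, (int, float)) or not 0.0 <= cfg.sample_rate <= 1.0:
        raise ManifestError("sample_rate must be a probability in [0, 1]")
    return cfg
```

Raising on any malformed value, rather than silently substituting a default, keeps the sketch in line with the fail-closed rule that an unparseable manifest stops the binary from sending.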
Identifiers
- tenant_id: customer organization, from the enrollment manifest.
- client_id: random UUID created on first run and persisted under ~/.tokenshift/. Reused across runs, survives upgrades and re-enrollment. One per OS user account.
- user_id: pseudonymous user identifier derived locally as a one-way hash of the developer’s email under a per-tenant key, user_id_hmac_key (a random secret generated at enrollment that lives in the manifest). The email is read from git config --global user.email (fallback: $TOKENSHIFT_USER_EMAIL; otherwise user_id = null) and discarded immediately after hashing; it is never persisted and never sent. Only user_id ships on the wire. Same email + same tenant key → identical user_id on every machine that developer uses, enabling cross-machine joins. user_id is deliberately absent from compression_sample and client_state.
- project: raw directory name by default; install-time config may switch to hashed directory name or hashed git remote URL.
- session_id, invocation_id, recovery_id: opaque UUIDs used as join keys between signals.
Raw email, hostnames, OS usernames, IPs, MACs, and raw file paths are never captured and have no code path that would attach them.
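The user_id derivation can be sketched with a keyed hash. The document names the key user_id_hmac_key but does not state the hash function or any email normalization, so HMAC-SHA-256, the hex encoding, and the trim-and-lowercase step below are assumptions, and derive_user_id is a hypothetical name.

```python
import hashlib
import hmac
from typing import Optional


def derive_user_id(email: Optional[str], hmac_key: bytes) -> Optional[str]:
    """Derive a pseudonymous user_id from an email under a per-tenant key."""
    if not email:
        # No email available from git config or the fallback variable.
        return None
    # Assumption: normalize so the same address hashes identically everywhere.
    normalized = email.strip().lower()
    # Assumption: HMAC-SHA-256, hex-encoded. The raw email is not retained.
    return hmac.new(hmac_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the key is per tenant, the same email produces the same user_id on every machine within a tenant but an unrelated value under another tenant's key.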
The redactor
Shape fields (command_shape, rewritten_command_shape, and the two halves of compression_sample) pass through a redactor that strips known secret formats and unknown command-argument values before the record ships. A redactor warning drops the whole record, never part of it.
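A toy version of this behaviour is sketched below. The real redaction rules are not specified here, so the two secret patterns, the `<arg>` placeholder, and the function names are illustrative assumptions. The sketch also treats every argument value as unknown, which is stricter than the redactor described above.

```python
import re
import shlex
from typing import Any, Dict, Optional

# Illustrative secret formats only; a real redactor would carry many more.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"ghp_[A-Za-z0-9]{36}"),
]


class RedactorWarning(Exception):
    """Raised when the redactor cannot vouch for its output."""


def redact_command(command: str) -> str:
    """Reduce a command to its shape: program name, flag names, placeholders."""
    try:
        tokens = shlex.split(command)
    except ValueError as exc:
        raise RedactorWarning("unparseable command") from exc
    if not tokens:
        raise RedactorWarning("empty command")
    shape = [tokens[0]]
    for token in tokens[1:]:
        if token.startswith("-"):
            # Keep the flag name, replace any inline value.
            flag, sep, _value = token.partition("=")
            shape.append(flag + ("=<arg>" if sep else ""))
        else:
            shape.append("<arg>")
    result = " ".join(shape)
    # Final check: if anything secret-shaped survived, warn instead of emitting.
    if any(pattern.search(result) for pattern in SECRET_PATTERNS):
        raise RedactorWarning("secret-like text survived redaction")
    return result


def attach_shape(record: Dict[str, Any], command: str) -> Optional[Dict[str, Any]]:
    """Return the record with command_shape, or None to drop the whole record."""
    try:
        return {**record, "command_shape": redact_command(command)}
    except RedactorWarning:
        return None
```

For instance, `redact_command("curl -H 'Authorization: Bearer abc' --data=secret https://x")` yields `curl -H <arg> --data=<arg> <arg>`, while a command with an unbalanced quote causes attach_shape to return None rather than a partially redacted record.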
Fail-closed posture
Anything that doesn’t go right ends in a drop, never a raw emit.
| Trigger | Action |
|---|---|
| Unknown attribute on a record | Stripped before send. |
| Unknown signal name | Record dropped. |
| Tier below the signal’s minimum | Not emitted. |
| Redactor warning on a shape field | Whole record dropped. |
| Sample allowlist or sample-rate miss | Sample not emitted. |
| Local outbox overflow | Oldest records dropped first; developer’s command never blocked. |
| Network unavailable | Records queued locally until network returns. |
| Enrollment manifest absent or unparseable | Binary refuses to send and surfaces the error. |
| Encryption fails | Record dropped. |
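Three rows of the table (unknown attribute, unknown signal, outbox overflow) can be illustrated in a few lines. This is a sketch under stated assumptions: the allowlist below is abbreviated and its attribute names are hypothetical, and a bounded deque merely stands in for whatever the local outbox actually is.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Optional

# Hypothetical, abbreviated allowlist: signal name -> permitted attribute names.
WIRE_ALLOWLIST = {
    "recovery_retrieved": {"recovery_id", "session_id", "age_seconds"},
    "client_state": {"client_id", "version", "os", "arch"},
}


def filter_for_wire(signal: str, record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Drop unknown signals; strip unknown attributes from known ones."""
    allowed = WIRE_ALLOWLIST.get(signal)
    if allowed is None:
        return None  # unknown signal name: the record is dropped
    return {key: value for key, value in record.items() if key in allowed}


class Outbox:
    """Bounded queue that evicts the oldest record instead of blocking."""

    def __init__(self, capacity: int) -> None:
        self._queue: Deque[Dict[str, Any]] = deque(maxlen=capacity)

    def enqueue(self, record: Dict[str, Any]) -> None:
        # deque(maxlen=...) discards from the left when full, so this never blocks.
        self._queue.append(record)

    def pending(self) -> List[Dict[str, Any]]:
        return list(self._queue)
```

The key design point is that every failure path returns less data, never more: the filter can only remove keys, and the outbox can only lose records.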
Retention
| Signal | Default retention | Tenant-extensible? |
|---|---|---|
| tool_invocation | 90 days | Yes |
| recovery_retrieved | 90 days | Yes |
| session_summary | 90 days | Yes |
| client_state | 90 days | Yes |
| compression_sample | 7 days | No (hard cap) |
The 7-day hard cap on compression_sample is non-negotiable. Samples are the only signal that carries redacted output content; the rule-tuning use case doesn’t benefit from longer retention, and every extra day is unnecessary blast radius.
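The retention rule can be expressed as a small lookup. The defaults come from the table; the function name, the shape of the tenant extension mapping, and the assumption that an extension can only lengthen retention are illustrative.

```python
from typing import Dict, Optional

# Defaults from the retention table, in days.
DEFAULT_RETENTION_DAYS = {
    "tool_invocation": 90,
    "recovery_retrieved": 90,
    "session_summary": 90,
    "client_state": 90,
    "compression_sample": 7,
}
HARD_CAPPED = {"compression_sample"}


def retention_days(signal: str, tenant_extensions: Optional[Dict[str, int]] = None) -> int:
    """Resolve retention for a signal, ignoring extensions on hard-capped signals."""
    default = DEFAULT_RETENTION_DAYS[signal]
    if signal in HARD_CAPPED:
        return default
    extension = (tenant_extensions or {}).get(signal, default)
    # Assumption: a negotiated extension never shortens the default.
    return max(default, extension)
```

A tenant mapping such as `{"tool_invocation": 365, "compression_sample": 365}` would lengthen tool_invocation to 365 days while compression_sample stays at 7.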
What the contract explicitly does not send
The wire never carries any of the following. None are on the allowlist; no code path attaches them. An automated test asserts that only allowlisted keys appear on the wire, so a change that adds any of these fails the build before merge.
- Bash stdout / stderr content, in any form (including hashes).
- Tool inputs: file contents, web content, MCP tool-call results.
- Agent transcripts, prompts, messages, message hashes.
- Environment variables — names or values.
- Raw file paths.
- Hostnames, OS usernames, real names, IP addresses, MAC addresses.
- Free-text feedback / notes / comments. None exist by design.
- Rule contents or rule diffs — only the rule’s identifier.
- Stack traces, panic messages, crash report bodies.
- Branch names, commit messages, git refs.
- Raw email. Read once locally to compute user_id, then discarded.
Glossary
- Allowlist: Explicit list of what is permitted; everything else is denied.
- Enrollment manifest: JSON file generated by the PointFive portal and deployed via MDM. Carries tenant_id, bearer_token, user_id_hmac_key, and install-time config.
- Fail-closed: On any error, drop the data rather than emit it raw.
- MDM: Mobile Device Management / endpoint management system.
- Signal: A named, schema’d record type on the wire (one of the five above).
- Tier: Level of data sharing (0/1/2/4). Capped from above by install-time config.