architecture
Three compression axes
All three fire together to achieve 80%. Each is independently useful.
01INPUT COMPRESSION
1A Grep Before Read
Locate first, read targeted lines — never full-file
1B offset + limit
Cap reads by file size: 100–500 lines → max 80 lines
1C Structural Scan
find/grep map under 50 tokens; drill only what matters
1D Session Read-Cache
Track every read; never re-read the same file twice
1F Parallel Batching
All independent reads in one message — always
−30–40% input
02OUTPUT COMPRESSION
2A No Preamble
"I'll help you..." banned. First word is the answer.
2B No Trailing Summary
User can see the diff. Skip the recap.
2C Length Budget
Fact: 3 sentences. Code fix: +1 line. No more.
2D Format Compression
Prose → tables. Paragraphs → bullets.
2F Terse Confirmation
"Done." beats a 2-sentence summary every time.
−50–70% output
03CONTEXT HYGIENE
3A Memory-First
Check memory before any research. A hit costs ~0 tokens.
3B Progressive Disclosure
L1 answer default; expand only when asked.
3C Delegate + Compress
Agent explores; main loop synthesizes in 5 bullets.
3D Context Checkpoint
At 50% context fill: prune non-load-bearing items.
AUTO 60% → SLIM-COMPACT
Auto-activates at 60% context fill. No prompt needed.
−40–60% context
modes
Three operating modes
SLIM-NANO
1-sentence max · answer from knowledge only · no tool calls
/token-engine nanoSLIM-COMPACTDEFAULT
All axes enforced · ≤5 sentences · grep-first mandatory
/token-engineSLIM-STRUCTURED
Axes 1+3 enforced · code output unrestricted · complex features
/token-engine structuredenforcement
8 banned anti-patterns
SLIM detects and eliminates these. Seeing them means the engine is off.
✕ 01The Announcement — "I'll now look at your code to understand…"
✕ 02The Apology Prefix — "I apologize for the confusion, but…"
✕ 03The Full-File Read — opening 400 lines to find a 5-line function
✕ 04The Re-Read — reading the same file you read 3 messages ago
✕ 05The Recap — restating what you just did in the last paragraph
✕ 06The Option Buffet — offering 4 approaches when you have one clear best
✕ 07The Qualifier Chain — "Generally speaking, in most cases, often…"
✕ 08The Tool Parade — sequential calls that could run in parallel
the SLIM invariant
"A compressed answer must be observationally equivalent to the full answer. If compressing would lose a fact the user needs, expand that fact. Compress everything around it."