Every AI company is bleeding tokens. Not metaphorically, literally. Context windows are the new compute budget, and most teams have no idea how fast they're burning through them. A RAG pipeline that retrieves 20 documents? 30,000 tokens minimum. An agentic loop with tool calls, prior conversation, and system prompt? You're at 50,000 before the model has said a word. The bill arrives at the end of the month and finance emails you asking what "Anthropic API" is.
Compresr (YC W2026) thinks it has the fix. Four EPFL researchers, including a CEO who wrote his PhD specifically on LLM context compression, built an API that compresses what goes into the context window without losing what actually matters. The pitch is clean: same answers, fewer tokens, lower latency, smaller bills. Drop in their SDK or stand up their open-source proxy, and the rest just works.
