April 20, 2026•1 min read•from Machine Learning

Open-source single-GPU reproductions of Cartridges and STILL for neural KV-cache compaction [P]

I implemented two recent ideas for long-context inference / KV-cache compaction and open-sourced both reproductions:

Cartridges: https://github.com/shreyansh26/cartridges
STILL: https://github.com/shreyansh26/STILL-Towards-Infinite-Context-Windows

The goal was to make the ideas easy to inspect and run, with benchmark code and readable implementations instead of just paper/blog summaries.

Broadly:

Cartridges reproduces corpus-specific compressed KV caches
STILL reproduces reusable neural KV-cache compaction
The STILL repo also compares against full-context inference, truncation, and Cartridges
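To see why compaction matters, here is a minimal sketch of the standard KV-cache memory arithmetic plus a truncation baseline. It is not code from either repo. The model dimensions (32 layers, 8 KV heads, head dim 128, fp16) are hypothetical and chosen only for illustration. "Truncation" is assumed here to mean keeping only the most recent cached positions; the repos may define it differently.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    # Hypothetical model dimensions for illustration only; not taken from either repo.
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int  # 2 for fp16/bf16


def kv_cache_bytes(shape: ModelShape, num_tokens: int) -> int:
    """Bytes needed to cache keys and values for num_tokens positions (batch size 1)."""
    # Factor of 2: every layer stores one key tensor and one value tensor.
    per_token = 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim * shape.bytes_per_elem
    return per_token * num_tokens


def truncate_positions(positions: list[int], budget: int) -> list[int]:
    """Assumed truncation baseline: keep only the most recent `budget` cached positions."""
    if budget <= 0:
        return []
    # Slicing from the end keeps the newest tokens; returns everything if budget >= len.
    return positions[-budget:]


if __name__ == "__main__":
    shape = ModelShape(num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_elem=2)
    full = kv_cache_bytes(shape, 131_072)   # 128K-token context
    small = kv_cache_bytes(shape, 2_048)    # hypothetical fixed-size compacted cache
    print(f"full context: {full / 2**30:.2f} GiB")
    print(f"2,048-slot cache: {small / 2**20:.1f} MiB")
    print(f"ratio: {full / small:.1f}x")
    print(truncate_positions(list(range(10)), 4))
```

With these assumed dimensions each token costs 128 KiB, so a 128K-token context needs 16 GiB of KV cache while a 2,048-slot cache needs 256 MiB. Truncation reaches the small budget by discarding old tokens outright; the compaction methods above instead aim to fit the same budget while preserving more of what the long context contained.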

Original paper and blog post:

Cartridges: https://arxiv.org/abs/2506.06266
STILL: https://www.baseten.co/research/towards-infinite-context-windows-neural-kv-cache-compaction/

These should be useful if you're interested in long-context inference, memory compression, or the practical systems tradeoffs of KV-cache reuse.

submitted by /u/shreyansh26
