1 min read · from Machine Learning

Open-source single-GPU reproductions of Cartridges and STILL for neural KV-cache compaction [P]

I implemented two recent ideas for long-context inference and KV-cache compaction, Cartridges and STILL, and open-sourced both reproductions.

The goal was to make the ideas easy to inspect and run, with benchmark code and readable implementations instead of just paper/blog summaries.

Broadly:

  • Cartridges reproduces corpus-specific compressed KV caches
  • STILL reproduces reusable neural KV-cache compaction
  • the STILL repo also compares against full-context inference, truncation, and Cartridges
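For readers new to the comparison, here is a toy sketch of what the full-context and truncation baselines mean at the level of a single attention head. It is not taken from either repo: the function names (`attention`, `truncate_cache`, `cache_floats`), the sizes, and the random data are all made up for illustration, and real KV caches span many layers and heads. It only shows that dropping older cache entries shrinks memory while shifting the attention output.

```python
import math
import random


def attention(query, keys, values):
    """Single-head softmax attention of one query over cached keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[j] for w, v in zip(weights, values)) / total
        for j in range(len(values[0]))
    ]


def truncate_cache(keys, values, budget):
    """Truncation baseline: keep only the most recent `budget` cache entries."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return keys[-budget:], values[-budget:]


def cache_floats(keys, values):
    """Number of floats stored in the cache (a stand-in for memory use)."""
    return sum(len(k) for k in keys) + sum(len(v) for v in values)


# Hypothetical toy sizes, chosen only for illustration.
rng = random.Random(0)
dim, context_len, budget = 8, 64, 16
keys = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(context_len)]
values = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(context_len)]
query = [rng.gauss(0, 1) for _ in range(dim)]

# Full-context baseline: attend over every cached entry.
full_out = attention(query, keys, values)

# Truncation baseline: attend over the last `budget` entries only.
short_keys, short_values = truncate_cache(keys, values, budget)
trunc_out = attention(query, short_keys, short_values)

error = math.sqrt(sum((a - b) ** 2 for a, b in zip(full_out, trunc_out)))
print(f"cache size: {cache_floats(keys, values)} -> "
      f"{cache_floats(short_keys, short_values)} floats")
print(f"L2 gap between full and truncated output: {error:.4f}")
```

Learned compaction methods aim for the same kind of memory saving with a smaller gap to full-context output than plain truncation; this sketch does not attempt to model how either method achieves that.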

Here are the original papers / blogs -

This should be useful if you’re interested in long-context inference, memory compression, or practical systems tradeoffs around KV-cache reuse.

submitted by /u/shreyansh26

