Releases: NVIDIA/kvpress
v0.5.0
v0.4.3
v0.4.2
v0.4.1
✨ New Features
- KVzapPress - a fast approximation of KVzip for prefill and decoding compression (https://arxiv.org/abs/2601.07891). Comes with KVzap training and evaluation utilities (#171)
- ThresholdPress - adaptive compression using score thresholds instead of fixed compression ratios (#171)
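The difference between a fixed compression ratio and a score threshold can be sketched in plain Python. This is a toy illustration, not kvpress code: `keep_by_ratio` and `keep_by_threshold` are hypothetical helpers, and `scores` stands in for whatever per-token importance score a press computes.

```python
"""Toy comparison of two KV-pruning policies over per-token importance scores.

Hypothetical sketch: `scores` stands in for the importance score a press
assigns to each cached token. This is not the kvpress implementation.
"""


def keep_by_ratio(scores: list[float], compression_ratio: float) -> list[int]:
    """Drop a fixed fraction of tokens, keeping the highest-scoring ones."""
    n_keep = len(scores) - int(len(scores) * compression_ratio)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def keep_by_threshold(scores: list[float], threshold: float) -> list[int]:
    """Keep every token whose score clears the threshold; the ratio adapts."""
    return [i for i, s in enumerate(scores) if s >= threshold]


if __name__ == "__main__":
    # Two hypothetical contexts: one with a few important tokens, one dense.
    sparse = [0.9, 0.1, 0.05, 0.02, 0.8, 0.01]
    dense = [0.7, 0.6, 0.8, 0.65, 0.9, 0.75]
    for name, scores in [("sparse", sparse), ("dense", dense)]:
        print(name, keep_by_ratio(scores, 0.5), keep_by_threshold(scores, 0.5))
```

With the same threshold, the sparse context is pruned harder than a 50% ratio allows, while the dense context keeps every token; this is the adaptive behavior the ThresholdPress entry describes.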
📈 Improvements
- Update KVzipPress with improvements and evaluation registry support (#172)
- Rename compress-question to query-aware in evaluation config (#168)
- Refactor ObservedAttentionPress for cleaner implementation (#166)
- Add leaderboard generation script (#171)
🐛 Bug Fixes
- Fix empty context handling in pipeline (#165)
v0.4.0
🚀 Release v0.4.0
✨ New Features
- CURPress - Value-Guided KV Compression for LLMs via Approximated CUR Decomposition (#150)
- CompactorPress - Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores (#143)
- Decoding Press Functionality - Support for KV cache compression during the decoding phase (#139)
- AIME25 & Math500 Benchmarks - New evaluation datasets for mathematical reasoning tasks (#142)
- post_init_from_model Hook - Add model-specific initialization support in BasePress (#163)
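CompactorPress is described as scoring tokens with approximate leverage scores. As a reference point, the sketch below computes exact statistical leverage scores of a keys matrix with NumPy via a thin SVD. How CompactorPress approximates and calibrates these scores is not shown, and `leverage_scores` is a hypothetical helper, not part of the kvpress API.

```python
import numpy as np


def leverage_scores(keys: np.ndarray) -> np.ndarray:
    """Exact statistical leverage score of each row (token) of a keys matrix.

    keys has shape (n_tokens, head_dim). Row i's score is the squared norm of
    row i of U in the thin SVD keys = U @ diag(s) @ Vt, keeping only the
    numerically non-zero singular values. Scores lie in [0, 1] and sum to
    rank(keys).
    """
    keys = np.asarray(keys, dtype=np.float64)
    u, s, _ = np.linalg.svd(keys, full_matrices=False)
    # Same rank tolerance convention as np.linalg.matrix_rank.
    tol = s.max() * max(keys.shape) * np.finfo(keys.dtype).eps
    rank = int(np.count_nonzero(s > tol))
    return np.sum(u[:, :rank] ** 2, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keys = rng.standard_normal((16, 4))  # 16 hypothetical tokens, head_dim 4
    scores = leverage_scores(keys)
    # Indices of the 8 highest-leverage tokens, i.e. the ones a leverage-based
    # policy would keep at a 50% budget.
    keep = np.sort(np.argsort(scores)[-8:])
    print(scores.sum(), keep)
```

A token whose key points in a direction few other keys share gets a score near 1, while tokens in a crowded direction split their leverage; that is why leverage is a natural query-agnostic importance signal.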
📈 Improvements
- Moved tests to GPU for faster CI execution (#132)
- Improved needle-in-haystack test coverage (#133)
- Updated README and documentation for clarity (#162)
- Enhanced docstrings throughout the codebase (#159)
- Updated decoding notebook with latest examples (#156)
- Code cleanup: moved utilities, cleaned imports (#160)
🐛 Bug Fixes
- Fixed LongBench-v2 benchmark evaluation (#161)
- Fixed KVzipPress access to past_key_values
- Fixed ComposedPress behavior (#148)
- Fixed import issues (#144)
📦 Installation
pip install kvpress==0.4.0
📚 Full Changelog
v0.3.0
What's Changed
- refactor: optimized covariance transform in ExpectedAttentionPress by @neuralsorcerer in #111
- fix ruler integration tests by @maxjeblick in #113
- fix typo by @neuralsorcerer in #116
- Add needle in haystack test by @alessiodevoto in #121
- fix masked_key_indices by @maxjeblick in #122
- Add copy-pr-bot settings by @maxjeblick in #123
- Add Github runner by @maxjeblick in #124
- evaluation README.md command error and logging error #127 by @wzp-0815 in #128
- add gpu runner by @maxjeblick in #125
- Upgrade expected attention with support for more models by @alessiodevoto in #126
- Add Expected Attention with Stats by @alessiodevoto in #120
- ⚠️ Transformers compatibility by @maxjeblick in #115. This is a breaking change: the KV caching machinery changed in HF transformers, and we adjusted KVPress accordingly.
New Contributors
- @neuralsorcerer made their first contribution in #111
- @wzp-0815 made their first contribution in #128
Full Changelog: v0.2.10...v0.3.0
v0.2.10
v0.2.9
What's Changed
- Refactor evaluation by @alessiodevoto in #96
- Fix QFilters and DuoAttention when used with wrapper presses by @alessiodevoto in #97
- Add HuggingFace leaderboard by @alessiodevoto in #98
- Fix links in benchmarks directory by @alessiodevoto in #101
- Add KVzipPress by @Janghyun1230 in #93
- Test head-wise compression by @alessiodevoto in #103
- run backbone model only for prefill by @giulio98 in #100
- Transformers compatibility + evaluation by @alessiodevoto in #105
Full Changelog: v0.2.8...v0.2.9
v0.2.8
What's Changed
🐛 Bug Fixes
- Fix failing tests by @maxjeblick in #94
Reverts changes to CriticalKVPress made in #90 that caused the press to initialize incorrectly. The PR also fixes some test logic.
Full Changelog: v0.2.7...v0.2.8
v0.2.7
What's Changed
🐛 Bug Fixes
- Fix FinchPress for Qwen models family by @alessiodevoto in #82
Resolved compatibility issues with Qwen model architecture in FinchPress compression
✨ New Features
- Add KeyDiffPress and BlockPress by @figuremout in #86
Introduces new compression methods based on key difference analysis
- Fix for Qwen with Yarn by @giulio98 in #85
Enable YaRN scaling in FinchPress and KeyRerotationPress
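A minimal sketch of key-difference scoring, assuming the formulation from the KeyDiff paper in which each cached key is compared with the mean key of the context and the least similar keys are retained. The helpers `cosine`, `keydiff_scores`, and `evict` are hypothetical and not part of kvpress; BlockPress is not illustrated.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keydiff_scores(keys: list[list[float]]) -> list[float]:
    """Negative cosine similarity to the mean key: higher means more distinctive."""
    dim = len(keys[0])
    anchor = [sum(k[j] for k in keys) / len(keys) for j in range(dim)]
    return [-cosine(k, anchor) for k in keys]


def evict(keys: list[list[float]], n_keep: int) -> list[int]:
    """Return the sorted indices of the n_keep most distinctive keys."""
    scores = keydiff_scores(keys)
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    # Three near-duplicate keys and one outlier (hypothetical 2-d keys).
    keys = [[1.0, 0.0], [1.0, 0.1], [1.0, -0.1], [0.0, 1.0]]
    print(evict(keys, 1))
```

Because the score depends only on the keys, it needs no attention weights, which makes this style of eviction compatible with fused attention kernels that never materialize the attention matrix.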
📚 Documentation & Maintenance
- Improve documentation by @maxjeblick in #90
Add docstrings to all presses, with their corresponding parameters and paper references.
- Add @alessiodevoto to authors by @maxjeblick in #92 🚀
Full Changelog: v0.2.6...v0.2.7