Conversation
Signed-off-by: Janghyun1230 <kimjanghyun1230@gmail.com>
Signed-off-by: Janghyun1230 <kimjanghyun1230@gmail.com>
Signed-off-by: Janghyun1230 <kimjanghyun1230@gmail.com>
Signed-off-by: Janghyun1230 <kimjanghyun1230@gmail.com>
|
Thanks a lot for your PR, the results look very promising!
Feel free to discuss these proposed changes here |
Signed-off-by: Janghyun1230 <kimjanghyun1230@gmail.com>
cd1e722 to
2799cab
Compare
Signed-off-by: Janghyun1230 <kimjanghyun1230@gmail.com>
Signed-off-by: Janghyun1230 <kimjanghyun1230@gmail.com>
|
Following your guidelines, I've moved the modifications in I ran some tests and confirmed that the current version maintains performance. (The prefilling and compression time has slightly increased.) Please review and let me know if you have any further suggestions! |
|
Thanks a lot for the quick updates! |
maxjeblick
left a comment
There was a problem hiding this comment.
Thanks a lot for the extensive refactoring of the press!
I've tested the press on ruler4k benchmark, and the results look very nice!
We will add your press to our benchmark, once it is merged.
For the press to be merged, I kindly ask to
- Add a warning in the press post init method, informing the user that the press uses multiple forward passes.
- The press implementation will benefit from being refactored in several places. I left some comments in the code; there are also other places where refactoring can help. Please also add some more comments/docstrings that help users.
Signed-off-by: Janghyun1230 <kimjanghyun1230@gmail.com>
Signed-off-by: Janghyun1230 <kimjanghyun1230@gmail.com>
Signed-off-by: Janghyun1230 <kimjanghyun1230@gmail.com>
Signed-off-by: Janghyun1230 <kimjanghyun1230@gmail.com>
|
I appreciate your detailed feedback! Your comments improve the clarity and interpretability of the code. I've incorporated all of your suggestions, leaving comments on some specific points. One thing I'd like to mention is that the current This incompatibility raises an error in |
maxjeblick
left a comment
There was a problem hiding this comment.
Thanks a lot for the refactoring, code looks good!
One thing I'd like to mention is that the current KVzipPress implementation is not compatible with ComposedPress. This is due to that KVzipPress follows a slightly different logic and adopts fake compression as AdaKV, which is also incompatible with ComposedPress.
Ok, thanks for the notice. For now, you can add an if statement in the test, skipping that combination. I'll merge the PR once test pass.
Signed-off-by: Janghyun1230 <kimjanghyun1230@gmail.com>
|
Thank you for the review! I've added a statement in the |
PR description
Hi! I've tried to add KVzip, a recent work on query-agnostic KV cache eviction.
KVzip achieves near-lossless compression at eviction ratios of up to 80% on RULER-4k with LLaMA3.1-8B (evaluated using the evaluation.py script from this repository). I've uploaded the result json files on Drive.
KVzip introduces compression overhead (2× prefilling time, with negligible memory overhead). The original KVzip repository also provides a version without compression overhead at the cost of performance, using DuoAttention-style head-level eviction.
I tried to make minimal changes to this repository, but I had to make some additions in pipeline.py. I follow the fake compression strategy from AdaKV in this repository, whereas the original KVzip repository provides optimized code that improves decoding speed by 2×.
Please review and let me know if there are any issues or if everything looks fine. Truly appreciate your great repository!
Checklist
make test)make style, on errors try fix withmake format)git commit -smypress_press.pyis in thepressesdirectoryMyPressis in__init__.pyREADME.mdis updated with a 1 liner about the new press in the Available presses sectiondefault_presseslist intests/default_presses.py