Saved State on Disk is very large #1303
Closed
Description
Is your feature request related to a problem? Please describe.
When using `Llama.save_state`, the size on disk is very large, partially because `Llama.scores` is always of size `(n_ctx, n_vocab)`, even when the number of tokens actually used is much less than this.
Describe the solution you'd like
- Only store the number of tokens used by the model (`self.n_tokens`).
- If possible, re-use logits from the `llama_context` struct instead of serializing `scores`.
  - These didn't match when I loaded both and compared, though I didn't have logit processor callbacks.
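The first point can be sketched in NumPy. This is an illustrative example only, not llama-cpp-python's actual implementation; the dimensions and variable names are made up. The idea is to serialize only the first `n_tokens` rows of `scores` and re-expand to the full buffer shape on load:

```python
import numpy as np

# Made-up dimensions for illustration; real values come from the model.
n_ctx, n_vocab = 512, 4096
n_tokens = 12  # tokens actually evaluated so far

scores = np.zeros((n_ctx, n_vocab), dtype=np.float32)

# Only the first n_tokens rows carry meaningful logits, so save just those.
saved_scores = scores[:n_tokens].copy()

# On load, re-expand to the full buffer shape the sampling logic expects.
restored = np.zeros((n_ctx, n_vocab), dtype=np.float32)
restored[:n_tokens] = saved_scores

print(saved_scores.nbytes, scores.nbytes)
```

With a large `n_ctx` and a short prompt, the saved array is a small fraction of the full buffer's size.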
Describe alternatives you've considered
- Somehow compressing `LlamaState`
- Re-using logits from C++ state saving functions and eliminating `scores` on the Python model
  - Looks like this is needed for sampling logic
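The compression alternative could look like the hedged sketch below. The state dict layout here is hypothetical; the point is that the unused rows of `scores` are all zeros, so a general-purpose compressor shrinks them to almost nothing:

```python
import gzip
import pickle

import numpy as np

# Hypothetical state layout for illustration; made-up dimensions.
state = {
    "n_tokens": 12,
    "scores": np.zeros((512, 4096), dtype=np.float32),
}

raw = pickle.dumps(state)
compressed = gzip.compress(raw)

print(len(raw), len(compressed))
```

The trade-off is extra CPU time on save/load, whereas truncating to `n_tokens` rows avoids writing the dead space in the first place.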
Partially fixed by #1296