Saved State on Disk is very large #1303
Closed
Description
Is your feature request related to a problem? Please describe.
When using `Llama.save_state`, the size on disk is very large, partially because `Llama.scores` is always of size `(n_ctx, n_vocab)`, even when the number of tokens actually used is much less than this.
Describe the solution you'd like
- Only store the number of tokens used by the model (`self.n_tokens`).
- If possible, re-use logits from the `llama_context` struct instead of serializing `scores`.
  - These didn't match when I loaded both and compared, though I didn't have logit processor callbacks.
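The first point can be sketched in NumPy. This is an illustrative example only, not llama-cpp-python's actual implementation; the dimensions and variable names are made up. The idea is to serialize only the first `n_tokens` rows of `scores` and re-expand to the full buffer shape on load:

```python
import numpy as np

# Made-up dimensions for illustration; real values come from the model.
n_ctx, n_vocab = 512, 4096
n_tokens = 12  # tokens actually evaluated so far

scores = np.zeros((n_ctx, n_vocab), dtype=np.float32)

# Only the first n_tokens rows carry meaningful logits, so save just those.
saved_scores = scores[:n_tokens].copy()

# On load, re-expand to the full buffer shape the sampling logic expects.
restored = np.zeros((n_ctx, n_vocab), dtype=np.float32)
restored[:n_tokens] = saved_scores

print(saved_scores.nbytes, scores.nbytes)
```

With a large `n_ctx` and a short prompt, the saved array is a small fraction of the full buffer's size.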
Describe alternatives you've considered
- Somehow compressing `LlamaState`
- Re-using logits from C++ state saving functions and eliminating `scores` on the Python model
  - Looks like this is needed for sampling logic
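The compression alternative could look like the hedged sketch below. The state dict layout here is hypothetical; the point is that the unused rows of `scores` are all zeros, so a general-purpose compressor shrinks them to almost nothing:

```python
import gzip
import pickle

import numpy as np

# Hypothetical state layout for illustration; made-up dimensions.
state = {
    "n_tokens": 12,
    "scores": np.zeros((512, 4096), dtype=np.float32),
}

raw = pickle.dumps(state)
compressed = gzip.compress(raw)

print(len(raw), len(compressed))
```

The trade-off is extra CPU time on save/load, whereas truncating to `n_tokens` rows avoids writing the dead space in the first place.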
Partially fixed by #1296