We can also install the bitsandbytes acceleration library via `pip install bitsandbytes`, then simply add a quantization configuration to perform int8 or int4 inference (if you need to further reduce the memory allocated at runtime, installing FlashAttention is also recommended):
```python
import sys
import torch
from hf_mini.utils import input_wrapper
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
```
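The rest of the original snippet is truncated here. Below is a minimal sketch of how these imports are typically wired together with a `BitsAndBytesConfig` for int4 inference; the model id, the `input_wrapper` keyword arguments, and the prompt text are assumptions for illustration rather than the exact original example.

```python
import torch
from hf_mini.utils import input_wrapper
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed checkpoint id for illustration; substitute the model you are actually using.
model_id = "aiXcoder/aixcoder-7b-base"

# int4 (NF4) quantization; set load_in_8bit=True instead for int8 inference.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    # Optional: requires `pip install flash-attn`; further reduces runtime memory use.
    attn_implementation="flash_attention_2",
)

# The input_wrapper arguments below are assumed purely for illustration.
text = input_wrapper(code_string="# write a quick sort function", later_code="\n", path="example.py")
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```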