
Upgrade llama.cpp to most recent release (b1645) so it supports mixtral #33

Merged
kherud merged 12 commits into kherud:master from cestella:upgrade_llama_cpp_b1645 on Dec 19, 2023
Conversation

@cestella (Contributor) commented Dec 17, 2023

This PR does the following:

  • Upgrades llama.cpp to its most recent release, b1645, which enables the library to load Mixtral models
  • Slightly adjusts the testing:
    • Moves the only test so it runs as a Maven integration test
    • Removes the hard-coded model expectation; when run via Maven, the model is downloaded if it does not already exist

After this PR, you can run the integration tests via Maven like so:
mvn verify -Dmodel.home=$HOME/llm/models
Note:

  1. $HOME/llm/models is the model home directory. This is where the integration test model will be stored.
  2. The only assumption is that the directory exists (otherwise it errors out with an appropriate message). If the model (a version of Mistral 7B) is not already present in the directory, it will be downloaded, as sketched below
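
For illustration, here is a minimal sketch of that download-if-missing behavior; the class and method names are hypothetical, not the exact code in this PR:

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of the integration test's model resolution.
public final class ModelFetcher {

    // Fails if modelHome does not exist; downloads the model only if missing.
    public static Path ensureModel(String modelHome, String modelName, String url)
            throws IOException {
        Path home = Paths.get(modelHome);
        if (!Files.isDirectory(home)) {
            throw new IllegalStateException("model.home is not a directory: " + home);
        }
        Path model = home.resolve(modelName);
        if (!Files.exists(model)) {
            try (InputStream in = new URL(url).openStream()) {
                Files.copy(in, model, StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return model;
    }
}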

Examples

In addition to refactoring the unit test, I reran the examples, which I modified so they can be run via Maven:

# Run the MainExample with the codellama model.  Note: the model must exist.
mvn exec:java -Dexec.mainClass="examples.MainExample" -Dmodel.home="/Users/cstella/llm/models" -Dmodel.name="codellama-13b.Q5_K_M.gguf"

examples.GrammarExample

mvn exec:java -Dexec.mainClass="examples.GrammarExample" -Dmodel.home="/Users/cstella/llm/models" -Dmodel.name="codellama-13b.Q5_K_M.gguf"
...
root ::= root_4
root_1 ::= expr [=] term [<U+000A>]
expr ::= term expr_6
term ::= [0-9]
root_4 ::= root_1 root_4 | root_1
expr_5 ::= [-+*/] term
expr_6 ::= expr_5 expr_6 |
4-3-1-2-3-4-1-1-1-1-1-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-...

This looks roughly like what I see when I run the same example from master:

4-3-1-2-3-4-3-3-4-5-6-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3 ...

examples.InfillExample

With the new code I see:

def remove_non_ascii(s: str) -> str:
    """ Removes non-ASCII characters from a string """
    # Remove all non-ascii characters.
    result = b''.join(
        c for c in s if 0x20 <= ord(c) < 0x7F).decode() <EOT>
    return result

With master I see:

def remove_non_ascii(s: str) -> str:
    """ remove non-ASCII characters from a string """
    try:
        result = s.encode("latin1", "replace").decode()
    except AttributeError:
        result = str(s).encode("latin1", "replace").decode() <EOT>
    return result

examples.MainExample

I made a couple of modifications to both the old version and the new version to compare:

  • I changed the AntiPrompt to "User:" so it actually completes the statement
  • I expanded the system prompt to include a greeting

With this PR I see:

User: Hello Llama
Llama: Hello.  How may I help you today?
User: Hi can you write a python function that computes the nth fibonacci number?
Llama:
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

User:

With master I see:

This is a conversation between User and Llama, a friendly chatbot.
Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.

User: Hello Llama
Llama: Hello.  How may I help you today?
User: Hi can you write a function that computes the nth fibonnaci number in python?
Llama:  Sure, here is my code:
def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return fib(n-1) + fib(n-2)

}

@Test
public void testTokenization() {
    String prompt = "Hello, world!";
    int[] encoded = model.encode(prompt);
    Assert.assertArrayEquals(new int[]{15043, 29892, 3186, 29991}, encoded);
@cestella (Contributor, Author):

The token IDs are different because the model is different. I figured a sufficient test is that encode/decode can be round-tripped.
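
A minimal sketch of such a round-trip test, assuming the library exposes a decode counterpart to encode:

@Test
public void testTokenizationRoundTrip() {
    String prompt = "Hello, world!";
    int[] encoded = model.encode(prompt);
    // Instead of asserting hard-coded token IDs (which change with the model),
    // require only that decoding the encoding recovers the original prompt.
    String decoded = model.decode(encoded);
    Assert.assertEquals(prompt, decoded);
}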

@@ -157,14 +171,13 @@ public void testCompleteGrammar() {
@Test
public void testEmbedding() {
    float[] embedding = model.embed(prefix);
    Assert.assertEquals(5120, embedding.length);
@cestella (Contributor, Author):

CodeLlama 13B and Mistral 7B have different model dimensions, so the embeddings have different lengths.
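
One way to keep that assertion model-agnostic would be to ask the model for its own embedding size rather than hard-coding it. A sketch, where getEmbeddingSize() is a hypothetical accessor, not necessarily part of the library's API:

@Test
public void testEmbedding() {
    float[] embedding = model.embed(prefix);
    // 5120 for CodeLlama 13B, 4096 for Mistral 7B; querying the model
    // avoids baking either number into the test.
    Assert.assertEquals(model.getEmbeddingSize(), embedding.length);
}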

<junit.version>4.13.1</junit.version>
<test.plugin.version>3.2.3</test.plugin.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<integration.test.model>mistral-7b-instruct-v0.2.Q5_K_S.gguf</integration.test.model>
@cestella (Contributor, Author):

I decided to rework this to use Mistral 7B, since it's smaller and Apache-licensed.

@cestella cestella changed the title Upgrade llama.cpp to most recent release (b1645) Upgrade llama.cpp to most recent release (b1645) so it supports mixtral Dec 17, 2023
  jsize len = string.size();
  jbyteArray bytes = env->NewByteArray(len);
- env->SetByteArrayRegion(bytes, 0, len, (jbyte *)string.c_str());
+ env->SetByteArrayRegion(bytes, 0, len, reinterpret_cast<const jbyte *>(string.c_str()));
@cestella (Contributor, Author):

This avoids a compiler warning (the C-style cast silently discards const).


- String modelPath = "/run/media/konstantin/Seagate/models/llama2/llama-2-13b-chat/ggml-model-q4_0.gguf";
          .setAntiPrompt("User:");
+ String modelName = System.getProperty("model.name");
@cestella (Contributor, Author):

This enables us to run the examples directly via Maven rather than hard-coding the model path. For instance, you can run this with the CodeLlama model like so:
mvn exec:java -Dexec.mainClass="examples.MainExample" -Dmodel.home="/Users/cstella/llm/models" -Dmodel.name="codellama-13b.Q5_K_M.gguf"

I'm not sure if you would like this documented in the README.
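
For reference, the property-based lookup amounts to something like the following sketch; the error message is illustrative, not the exact code in the PR:

// Resolve the model from system properties instead of a hard-coded path.
String modelHome = System.getProperty("model.home");
String modelName = System.getProperty("model.name");
if (modelHome == null || modelName == null) {
    throw new IllegalArgumentException(
            "Run with -Dmodel.home=<dir> and -Dmodel.name=<file>");
}
String modelPath = java.nio.file.Paths.get(modelHome, modelName).toString();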

@kherud (Owner) commented Dec 18, 2023

Awesome work, thank you! It's 12 am here; I will review everything tomorrow and merge if everything is fine!

@kherud kherud merged commit f961ebb into kherud:master Dec 19, 2023
@kherud (Owner) commented Dec 19, 2023

Thanks again @cestella, great work! Just some notes:

  • Maybe we should default model.home to target/models if it's not set. Since the model is downloaded anyway, this would simplify getting started.
  • For running the integration tests, the lowest quantisation q2_k instead of q5_k_s should suffice, right? This would reduce the necessary download size.

@cestella (Contributor, Author) replied:

Absolutely!

  • Regarding defaulting model.home to target/models: you might instead create a models directory at the top level and default to that. mvn clean deletes the whole target directory, so with target/models you would end up re-downloading the model a lot. (A sketch follows after this list.)
  • No problem moving to q2_k. I did notice that the unit test results differed slightly when I tried other quants.
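
A default that survives mvn clean could be as simple as this sketch (the top-level "models" directory name is illustrative, not a decision made in this thread):

// Hypothetical helper: default model.home to a top-level "models" directory,
// which survives "mvn clean" (target/ is wiped and would force re-downloads).
static java.nio.file.Path resolveModelHome() throws java.io.IOException {
    java.nio.file.Path home =
            java.nio.file.Paths.get(System.getProperty("model.home", "models"));
    java.nio.file.Files.createDirectories(home);  // no-op if it already exists
    return home;
}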
