
Upgrade llama.cpp to most recent release (b1645) so it supports mixtral #33

Merged
kherud merged 12 commits into kherud:master from cestella:upgrade_llama_cpp_b1645 on Dec 19, 2023
Conversation

@cestella (Contributor) commented Dec 17, 2023

This PR does the following:

  • Upgrades llama.cpp to its most recent release, b1645, which enables the library to load Mixtral models
  • Slightly adjusts the testing:
    • Moves the only test so it runs as a Maven integration test
    • Removes the hard-coded model expectation; when run via Maven, the model is downloaded if it does not already exist

After this PR, you can run the integration tests via Maven like so:
mvn verify -Dmodel.home=$HOME/llm/models
Note:

  1. $HOME/llm/models is the model home directory. This is where the integration test model will be stored.
  2. The only assumption is that the directory exists (otherwise it errors out with an appropriate message). If the model (a version of Mistral 7B) is not already present in the directory, it will be downloaded, as sketched below
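
For illustration, here is a minimal sketch of that download-if-missing behavior; the class and method names are hypothetical, not the exact code in this PR:

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of the integration test's model resolution.
public final class ModelFetcher {

    // Fails if modelHome does not exist; downloads the model only if missing.
    public static Path ensureModel(String modelHome, String modelName, String url)
            throws IOException {
        Path home = Paths.get(modelHome);
        if (!Files.isDirectory(home)) {
            throw new IllegalStateException("model.home is not a directory: " + home);
        }
        Path model = home.resolve(modelName);
        if (!Files.exists(model)) {
            try (InputStream in = new URL(url).openStream()) {
                Files.copy(in, model, StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return model;
    }
}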

Examples

In addition to refactoring the unit test, I reran the examples, which I modified so they can be run via Maven:

# Run the MainExample with the codellama model.  Note: the model must exist.
mvn exec:java -Dexec.mainClass="examples.MainExample" -Dmodel.home="/Users/cstella/llm/models" -Dmodel.name="codellama-13b.Q5_K_M.gguf"

examples.GrammarExample

mvn exec:java -Dexec.mainClass="examples.GrammarExample" -Dmodel.home="/Users/cstella/llm/models" -Dmodel.name="codellama-13b.Q5_K_M.gguf"
...
root ::= root_4
root_1 ::= expr [=] term [<U+000A>]
expr ::= term expr_6
term ::= [0-9]
root_4 ::= root_1 root_4 | root_1
expr_5 ::= [-+*/] term
expr_6 ::= expr_5 expr_6 |
4-3-1-2-3-4-1-1-1-1-1-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-...

This looks roughly like what I see when I run the same example from master:

4-3-1-2-3-4-3-3-4-5-6-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-1-2-3-4-5-6-7-8-9-0-1-2-3 ...

examples.InfillExample

With the new code I see:

def remove_non_ascii(s: str) -> str:
    """ Removes non-ASCII characters from a string """
    # Remove all non-ascii characters.
    result = b''.join(
        c for c in s if 0x20 <= ord(c) < 0x7F).decode() <EOT>
    return result

With master I see:

def remove_non_ascii(s: str) -> str:
    """ remove non-ASCII characters from a string """
    try:
        result = s.encode("latin1", "replace").decode()
    except AttributeError:
        result = str(s).encode("latin1", "replace").decode() <EOT>
    return result

examples.MainExample

I made a couple of modifications to both the old version and the new version to compare:

  • I changed the AntiPrompt to "User:" so it actually completes the statement
  • I expanded the system prompt to include a greeting

With this PR I see:

User: Hello Llama
Llama: Hello.  How may I help you today?
User: Hi can you write a python function that computes the nth fibonacci number?
Llama:
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

User:

With master I see:

This is a conversation between User and Llama, a friendly chatbot.
Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.

User: Hello Llama
Llama: Hello.  How may I help you today?
User: Hi can you write a function that computes the nth fibonnaci number in python?
Llama:  Sure, here is my code:
def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return fib(n-1) + fib(n-2)

}

@Test
public void testTokenization() {
    String prompt = "Hello, world!";
    int[] encoded = model.encode(prompt);
    Assert.assertArrayEquals(new int[]{15043, 29892, 3186, 29991}, encoded);
@cestella (Contributor, Author):

The token IDs are different because the model is different. I figured a sufficient test is that encode/decode can be round-tripped.
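
A minimal sketch of such a round-trip test, assuming the library exposes a decode counterpart to encode:

@Test
public void testTokenizationRoundTrip() {
    String prompt = "Hello, world!";
    int[] encoded = model.encode(prompt);
    // Instead of asserting hard-coded token IDs (which change with the model),
    // require only that decoding the encoding recovers the original prompt.
    String decoded = model.decode(encoded);
    Assert.assertEquals(prompt, decoded);
}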

@@ -157,14 +171,13 @@ public void testCompleteGrammar() {
@Test
public void testEmbedding() {
    float[] embedding = model.embed(prefix);
    Assert.assertEquals(5120, embedding.length);
@cestella (Contributor, Author):

CodeLlama 13B and Mistral 7B have different model dimensions, so the embeddings have different lengths.
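
One way to keep that assertion model-agnostic would be to ask the model for its own embedding size rather than hard-coding it. A sketch, where getEmbeddingSize() is a hypothetical accessor, not necessarily part of the library's API:

@Test
public void testEmbedding() {
    float[] embedding = model.embed(prefix);
    // 5120 for CodeLlama 13B, 4096 for Mistral 7B; querying the model
    // avoids baking either number into the test.
    Assert.assertEquals(model.getEmbeddingSize(), embedding.length);
}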

<junit.version>4.13.1</junit.version>
<test.plugin.version>3.2.3</test.plugin.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<integration.test.model>mistral-7b-instruct-v0.2.Q5_K_S.gguf</integration.test.model>
@cestella (Contributor, Author):

I decided to rework this to use Mistral 7B, since it's smaller and Apache-licensed.

@cestella cestella changed the title Upgrade llama.cpp to most recent release (b1645) Upgrade llama.cpp to most recent release (b1645) so it supports mixtral Dec 17, 2023
  jsize len = string.size();
  jbyteArray bytes = env->NewByteArray(len);
- env->SetByteArrayRegion(bytes, 0, len, (jbyte *)string.c_str());
+ env->SetByteArrayRegion(bytes, 0, len, reinterpret_cast<const jbyte *>(string.c_str()));
@cestella (Contributor, Author):

This avoids a compiler warning (the C-style cast silently discards const).


- String modelPath = "/run/media/konstantin/Seagate/models/llama2/llama-2-13b-chat/ggml-model-q4_0.gguf";
          .setAntiPrompt("User:");
+ String modelName = System.getProperty("model.name");
@cestella (Contributor, Author):

This enables us to run the examples directly via Maven rather than hard-coding the model path. For instance, you can run this with the CodeLlama model like so:
mvn exec:java -Dexec.mainClass="examples.MainExample" -Dmodel.home="/Users/cstella/llm/models" -Dmodel.name="codellama-13b.Q5_K_M.gguf"

I'm not sure if you would like this documented in the README.
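
For reference, the property-based lookup amounts to something like the following sketch; the error message is illustrative, not the exact code in the PR:

// Resolve the model from system properties instead of a hard-coded path.
String modelHome = System.getProperty("model.home");
String modelName = System.getProperty("model.name");
if (modelHome == null || modelName == null) {
    throw new IllegalArgumentException(
            "Run with -Dmodel.home=<dir> and -Dmodel.name=<file>");
}
String modelPath = java.nio.file.Paths.get(modelHome, modelName).toString();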

@kherud (Owner) commented Dec 18, 2023

Awesome work, thank you! It's 12 am here; I will review everything tomorrow and merge if everything is fine!

@kherud kherud merged commit f961ebb into kherud:master Dec 19, 2023
@kherud (Owner) commented Dec 19, 2023

Thanks again @cestella, great work! Just some notes:

  • Maybe we should default model.home to target/models if it's not set. Since the model is downloaded anyway, this would simplify getting started.
  • For running the integration tests, the lowest quantisation q2_k instead of q5_k_s should suffice, right? This would reduce the necessary download size.

@cestella (Contributor, Author) replied:

Absolutely!

  • Regarding defaulting model.home to target/models: you might instead create a models directory at the top level and default to that. mvn clean deletes the whole target directory, so with target/models you would end up re-downloading the model a lot. (A sketch follows after this list.)
  • No problem moving to q2_k. I did notice that the unit test results differed slightly when I tried other quants.
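
A default that survives mvn clean could be as simple as this sketch (the top-level "models" directory name is illustrative, not a decision made in this thread):

// Hypothetical helper: default model.home to a top-level "models" directory,
// which survives "mvn clean" (target/ is wiped and would force re-downloads).
static java.nio.file.Path resolveModelHome() throws java.io.IOException {
    java.nio.file.Path home =
            java.nio.file.Paths.get(System.getProperty("model.home", "models"));
    java.nio.file.Files.createDirectories(home);  // no-op if it already exists
    return home;
}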
