Dynamic Arrays in LLVM - Declaring a constant/global

Question

I want to model dynamic arrays. This is the plan I've come up with: there will be a base struct for all my arrays, containing a vtable pointer and the runtime size of the array:

%anyarray_base = type {
  %other_stuff,
  i64            ; runtime size
}
%bytearray = type { %anyarray_base, [0 x i8] }

This works for arrays created fully at runtime. I malloc memory for the %anyarray_base plus the size of the "payload". I can access the data in that [0 x i8] using getelementptr just fine.

The problem I have is constants. Very concrete case: my program has a constant string "Hello, World!", and I want to create a constant in my LLVM module to hold that string. So I'd write

@myConstantString = global %bytearray {
  %anyarray_base {
    @other_stuff,    ; constant misc data about the array
    i64 13           ; array size, 13 bytes
  },
  [13 x i8] c"Hello, World!"  ; the actual literal from the source code
}

llvm-as doesn't accept this:

error: element 1 of struct initializer doesn't match struct element type
(points at the global %bytearray constant)

I'm clearly missing some understanding on how LLVM works. Please help me build Hello World in my toy language :)

arnt · Accepted Answer · 2024-02-10 15:46:47Z

What you need is types, lots of types! That adds complexity, which you should confine to the smallest possible part of your code.

You need one type for each array size that you will use as a constant. If your code uses string constants of length 0, 1, 2, 5, 10 and 15, you need six string types. These will typically be in a map from int to type, maintained by a small module with just two public functions:

One function creates a constant array (such as a string) and returns a pointer to it. That function's purpose is to encapsulate five of the six types. If the string is "hello", it asks the LLVM Module for a global of your "five-byte string" type and returns a pointer to that.
The other function returns the string type. It always returns the "zero-byte string" type.
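
As a rough sketch, this is what the first function might emit for the "Hello, World!" constant from the question: a size-specific sibling type used only for the initializer, next to the generic %bytearray. The body of %other_stuff, the name @vtable.bytearray and the ".13" suffix are made up for illustration (the question leaves them open), and the syntax assumes an LLVM version with opaque pointers (ptr).

```llvm
; Assumption: the question does not say what %other_stuff contains,
; so a single pointer stands in for it here.
%other_stuff   = type { ptr }
%anyarray_base = type { %other_stuff, i64 }

; The one type that most of the compiler sees.
%bytearray    = type { %anyarray_base, [0 x i8] }
; Size-specific sibling, known only to the small constant-creating module.
%bytearray.13 = type { %anyarray_base, [13 x i8] }

; Hypothetical placeholder for the "constant misc data about the array".
@vtable.bytearray = external constant i8

; The initializer now matches the global's type element by element.
@myConstantString = constant %bytearray.13 {
  %anyarray_base { %other_stuff { ptr @vtable.bytearray }, i64 13 },
  [13 x i8] c"Hello, World!"
}
```

With opaque pointers, @myConstantString is a plain ptr, so it can be handed to code that expects a %bytearray without any cast.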

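Most of the code you write uses only one type for strings, and will getelementptr beyond the end of that type's [0 x i8] member. That is well-defined in LLVM IR, provided the resulting address stays inside the underlying allocation. Only the one small module sees the many string types.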
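
A minimal sketch of that generic access path, assuming the same stand-in definition of %other_stuff as a single pointer. The function names @bytearray_size and @bytearray_get are hypothetical. Both work through the %bytearray view only, regardless of which sized type the object was created with.

```llvm
; Assumption: %other_stuff is a one-pointer placeholder, as the question
; does not define it.
%other_stuff   = type { ptr }
%anyarray_base = type { %other_stuff, i64 }
%bytearray     = type { %anyarray_base, [0 x i8] }

; Read the runtime size stored in the header (field 1 of %anyarray_base).
define i64 @bytearray_size(ptr %arr) {
  %p = getelementptr inbounds %bytearray, ptr %arr, i64 0, i32 0, i32 1
  %n = load i64, ptr %p
  ret i64 %n
}

; Read byte %i of the payload. The index goes past the static bound of
; [0 x i8]; the caller must keep %i below the runtime size so the address
; stays inside the allocated object.
define i8 @bytearray_get(ptr %arr, i64 %i) {
  %p = getelementptr inbounds %bytearray, ptr %arr, i64 0, i32 1, i64 %i
  %b = load i8, ptr %p
  ret i8 %b
}
```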

This is easily generalised to any array of constant-sized elements, and leads to pleasant, simple code when you use it with LLVM.

6 Comments

arnt Over a year ago

One additional comment: Make sure the offsets match. I don't remember the name of the relevant LLVM layout. Just write a unit test that checks that the offset of the first string byte is equal for all of your string structs.

marstato Over a year ago

Thanks a ton!! It seemed strange to me at first; but yep, doing that "unsafe" cast but covering my ass for it with the unit test is a great idea!

arnt Over a year ago

Covering... you'll need it. LLVM contains some clever code that notices that the total size of the struct will be 32/64/144 bytes including padding, and will distribute padding so as to place individual fields for ease of access. That can cause individual fields to have different offsets depending on the fields after them.

arnt Over a year ago

Maybe I should give an example. Suppose a struct contains an i8 and an i12, and LLVM concludes that on this architecture, the struct will have size 8 bytes. LLVM can then say: "hm, if I put the i8 on offset 0 and the i12 on offset 4 bytes, the code might be a bit faster than if I put all the padding at the end". I'm not going to admit that this happened to me, but this optimisation can completely ruin the day of a careless, sloppy, naïve programmer. Test early and often.

marstato Over a year ago

True, LLVM going nuts with optimization would set me up for a very bad day™. This means a unit test is not enough, though! I need to abort compilation if, using the actual target/datalayout, the offsets don't work out. I'm already using getelementptr very strictly to stay safe, but that doesn't work in this case.
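
The offset check discussed in these comments could be sketched in IR itself, using the null-pointer getelementptr idiom for "offsetof". The type bodies and the function name @payload_offsets_match are assumptions for illustration. The comparison is only meaningful if the module carries the datalayout of the actual target.

```llvm
; Assumption: %other_stuff is a one-pointer placeholder, as the question
; does not define it.
%other_stuff   = type { ptr }
%anyarray_base = type { %other_stuff, i64 }
%bytearray     = type { %anyarray_base, [0 x i8] }
%bytearray.13  = type { %anyarray_base, [13 x i8] }

; Returns true if the payload starts at the same byte offset in the
; generic type and in the 13-byte type. A getelementptr on a null base
; yields the field's offset as an address.
define i1 @payload_offsets_match() {
  %same = icmp eq i64 ptrtoint (ptr getelementptr (%bytearray, ptr null, i32 0, i32 1) to i64),
                      ptrtoint (ptr getelementptr (%bytearray.13, ptr null, i32 0, i32 1) to i64)
  ret i1 %same
}
```

A frontend that wants to abort compilation on a mismatch could instead compare the offsets at IR-construction time through the C++ API, using DataLayout::getStructLayout and StructLayout::getElementOffset on each string type.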
