Skip to content

[WIP] Migrate yajl to yyjson#6534

Draft
orestisfl wants to merge 29 commits into
i3:nextfrom
orestisfl:yyjson
Draft

[WIP] Migrate yajl to yyjson#6534
orestisfl wants to merge 29 commits into
i3:nextfrom
orestisfl:yyjson

Conversation

@orestisfl
Copy link
Copy Markdown
Member

@orestisfl orestisfl commented Nov 29, 2025

Closes #6257

First full draft. Heavily assisted by LLMs, opening the PR for initial review. Significant reviewing is still needed both from me and reviewers.

Using the benchmark script from: #6257 (comment)

before:

i3 version 4.24-27-gb1e99d8f © 2009 Michael Stapelberg and contributors
nodes= 10 min= 0.07 median= 0.08 max= 0.28 avg= 0.08
nodes= 12 min= 0.09 median= 0.09 max= 0.34 avg= 0.10
nodes= 16 min= 0.11 median= 0.12 max= 0.36 avg= 0.13
nodes= 24 min= 0.16 median= 0.18 max= 0.58 avg= 0.20
nodes= 40 min= 0.25 median= 0.26 max= 0.53 avg= 0.27
nodes= 72 min= 0.46 median= 0.51 max= 0.83 avg= 0.53
nodes=136 min= 0.80 median= 0.91 max=16.96 avg= 1.18
nodes=264 min= 1.46 median= 1.65 max=15.32 avg= 1.97
python /tmp/i3-benchmark-get-tree.py 2> /dev/null  19,44s user 1,70s system 99% cpu 21,241 total

after:

i3 version 4.24-47-gbaedde18+ © 2009 Michael Stapelberg and contributors
nodes= 10 min= 0.04 median= 0.05 max= 0.27 avg= 0.06
nodes= 12 min= 0.05 median= 0.06 max= 0.76 avg= 0.09
nodes= 16 min= 0.06 median= 0.07 max= 0.25 avg= 0.07
nodes= 24 min= 0.08 median= 0.14 max= 0.69 avg= 0.16
nodes= 40 min= 0.13 median= 0.14 max= 0.42 avg= 0.15
nodes= 72 min= 0.21 median= 0.23 max= 0.64 avg= 0.26
nodes=136 min= 0.32 median= 0.46 max=14.75 avg= 0.74
nodes=264 min= 0.63 median= 0.79 max=17.16 avg= 1.14
python /tmp/i3-benchmark-get-tree.py 2> /dev/null  18,59s user 1,86s system 99% cpu 20,493 total

more benchmarks are needed.

- Migrate load_layout.c from yajl to yyjson for JSON parsing
- Migrate i3-msg/main.c from yajl to yyjson
- Add yyjson dependency to meson.build
- Use DOM-style parsing with yyjson which is simpler than SAX callbacks

Part of issue i3#6257: Migrate away from yajl
- Migrate display_version.c from yajl to yyjson
- Use DOM-style parsing instead of SAX callbacks
- Simplifies the code significantly

Part of issue i3#6257: Migrate away from yajl
- Migrate parse_json_header.c from yajl to yyjson
- Add yyjson dependency to i3bar build
- Add 120s timeout to test runner to prevent hung tests

Part of issue i3#6257: Migrate away from yajl
Major migration of JSON generation from yajl to yyjson:
- ipc.c: All IPC handlers now use yyjson for JSON generation
- commands.c: Command results use yyjson via ysuccess/yerror macros
- commands_parser.c: Parser context now uses yyjson_mut_doc
- util.c: store_restart_layout uses dump_node with yyjson
- con.c, workspace.c: Workspace events now use yyjson directly

Changed function signatures:
- dump_node: now takes yyjson_mut_doc* and returns yyjson_mut_val*
- parse_command: now takes yyjson_mut_doc* instead of yajl_gen

Removed ipc_marshal_workspace_event since it returned yajl_gen.
The workspace event JSON is now built inline where needed.

Part of issue i3#6257: Migrate away from yajl
Migrate all i3bar JSON parsing from yajl to yyjson:
- i3bar/src/mode.c: Mode change event parsing
- i3bar/src/outputs.c: Outputs reply parsing
- i3bar/src/workspaces.c: Workspaces reply parsing
- i3bar/src/config.c: Bar configuration parsing
- i3bar/src/child.c: Status line input parsing and click event generation

Part of issue i3#6257: Migrate away from yajl
- Remove yajl from meson.build dependencies
- Delete include/yajl_utils.h (no longer needed)
- Clean up unused yajl include in config_parser.h

Completes issue i3#6257: Migrate away from yajl

yajl is no longer actively maintained and has CVEs. This migration
replaces all yajl usage with yyjson, a faster DOM-based JSON library.
- Update DEPENDS to list yyjson instead of yajl
- Update debian/control to use libyyjson-dev
- Remove yajl mentions from comments in code
- Update docs/layout-saving to remove libyajl example

Part of issue i3#6257: Migrate away from yajl
yyjson is configured with YYJSON_READ_ALLOW_TRAILING_COMMAS which
intentionally accepts trailing commas in JSON. This is more user-friendly
for manually edited layout files. Update test descriptions to reflect
this behavior change.
yyjson_mut_obj_add_str() stores a pointer without copying, so if the
string is freed before serialization, the JSON output is corrupted.

Use yyjson_mut_obj_add_strcpy() for strings that are freed after being
added to the JSON document:
- yerror macro in commands.c: 'message' is freed after sasprintf
- commands_parser.c: 'position' is freed after being added

This fixes corrupted error messages like 'L      match: invalid con_id'
instead of 'Invalid match: invalid con_id'.
Introduce yyjson_utils.h with:
- Custom allocator using i3's smalloc/srealloc (which abort on OOM)
- json_write() wrapper that uses the i3 allocator and asserts non-NULL

This simplifies ipc.c by:
- Removing manual NULL checks after yyjson_mut_write()
- Using a consistent json_write() function throughout

Since smalloc/srealloc abort on allocation failure, json_write()
is guaranteed to return a valid pointer, making the assert a
defense-in-depth check rather than error handling.
- Move json_utils from header-only to libi3/json_utils.c
- Add json_new() and json_write() declarations to libi3.h
- Add __attribute__((returns_nonnull)) to safe allocation wrappers
  (smalloc, scalloc, srealloc, sstrdup, sstrndup)
- Fix use-after-free in commands_parser.c error handling:
  yyjson_mut_obj_add_str() stores pointers without copying, but the
  'input' and 'error' strings were freed before json_write() was called.
  Changed to yyjson_mut_obj_add_strcpy() which copies the strings.
- Add error reporting to json_write() for debugging write failures
yyjson handles numeric formatting consistently regardless of locale,
so the setlocale(LC_NUMERIC) calls around JSON generation are no longer
needed.

Also updates con.c, workspace.c, and util.c to use the json_new() and
json_write() wrappers for consistency.

https://ibireme.github.io/yyjson/doc/doxygen/html/api.html#locale-independence
Since we already validate the JSON document before traversing it, we can
use the unsafe_ variants of yyjson functions which skip redundant type
checks. Also add a NULL check at the start of traverse_and_invoke_callbacks
for safety.
- Fix infinite JSON array protocol: skip leading '[' (array opener),
  ',' (subsequent updates), and ']' (shutdown)
- Use YYJSON_READ_STOP_WHEN_DONE to accept trailing content
- Add get_string() helper to simplify string field parsing
- Use unsafe_ yyjson variants after type checks for performance
- Use json_new()/json_write() for click event generation
- Inline add_json_result helper in commands_parser.c
- Use json_new() in commands_parser test main()
- get_buffer: return char* instead of unsigned char*
- read_json_input: take const char* parameter
- copy_statusline: take const struct statusline_head*
- isempty, is_alive: add const to parameters
- Use isspace() instead of manual whitespace checks
- get_string: also reject empty strings
- Move variable declarations closer to use
Replace yyjson_read() calls with yyjson_read_opts() and use
yyjson_read_err to provide detailed error messages including
the error description and position in the input.

Updated files:
- i3-msg/main.c: parse_reply, parse_config_reply
- i3bar/src/child.c: read_json_input
- i3bar/src/config.c: parse_config_json, parse_get_first_i3bar_config
- i3bar/src/mode.c: parse_mode_json
- i3bar/src/outputs.c: parse_outputs_json
- i3bar/src/workspaces.c: parse_workspaces_json
- src/display_version.c: version parsing
- src/ipc.c: subscribe, sync handlers
- Simplify JSON field access patterns in i3bar config parsing
- Use unsafe_yyjson_get_* variants after type checks for better performance
- Consolidate DLOG statements for related config options
- Remove redundant yyjson_val declarations where inline access suffices

The unsafe_ variants skip type validation which is safe after explicit
yyjson_is_* checks, providing a minor performance improvement.
Add compact macros using token pasting (##) for JSON field access:
- json_get(obj, key, type): required field with type check
- json_get_obj(obj, key): required object field
- json_opt(obj, key, type, def): optional field with default
- json_opt_val(obj, key, type): optional field returning yyjson_val*

These macros expand to yyjson_is_##type and unsafe_yyjson_get_##type,
providing type-safe JSON parsing with clear error messages on mismatch.
Since the JSON comes from i3, type mismatches indicate bugs.
Add a simple json_opt(obj, key, type, def) macro that returns the
default value if the key is missing or has the wrong type. This
reduces verbose get+check+extract patterns to single-line assignments.

Applied to: workspaces.c, outputs.c, mode.c, parse_json_header.c
Each non-empty line in a change/bugfix file now becomes a separate
bullet point. This allows grouping related changes (like the yyjson
migration) in a single file.

Also fix inconsistent 2-space indentation in get_number().
@orestisfl orestisfl requested a review from stapelberg November 29, 2025 17:46
@orestisfl orestisfl self-assigned this Nov 29, 2025
@orestisfl orestisfl changed the title [WIP] Migrate yyjson [WIP] Migrate yajl to yyjson Nov 29, 2025
@orestisfl orestisfl mentioned this pull request Nov 29, 2025
2 tasks
@orestisfl
Copy link
Copy Markdown
Member Author

@stapelberg this is quite impossible to review like this. I can split it into multiple PRs and we could target a different branch before merging into next.

@stapelberg
Copy link
Copy Markdown
Member

Which LLM and prompts did you use for this (if you still have them)?

In general, just to be prudent, I wouldn’t want to merge an LLM-authored PR for something as intricate as the JSON parsing.

However, we can still use LLMs to prototype this migration and get a better feel for things.

I notice that the PR contains patterns like:

        yyjson_val *hr_val = yyjson_obj_get(root, "human_readable");
        if (hr_val && yyjson_is_str(hr_val)) {
            human_readable_version = sstrdup(unsafe_yyjson_get_str(hr_val));
        }

But the yyjson docs specifically states that the library performs null checks: https://ibireme.github.io/yyjson/doc/doxygen/html/api.html#null-check

So we could avoid the unsafe accessor and explicit type-checking:

        const char* hr = yyjson_get_str(yyjson_obj_get(root, "human_readable"));
        if (hr) {
            human_readable_version = sstrdup(hr);
        }

Sometimes, the LLM seems to inline functions…?! See ipc_marshal_workspace_event for example, whose call was replaced with direct yyjson calls.

Instead of having the LLM make the code changes, I would be more interested in having the LLM produce rewrite rules (like, sed commands, or a script, or something similar) that can be deterministically applied, reviewed and changed. Have you tried that approach?

@orestisfl
Copy link
Copy Markdown
Member Author

Which LLM and prompts did you use for this (if you still have them)?

LLM is Claude Opus 4.5. It was quite a long back and forth.

I notice that the PR contains patterns like:

Yup, I noticed that. It was probably my own mis-use of the libray in next...orestisfl:i3:yyjson-tree_append_json. I used that as the "seed" of the refactor and copied my pattern. I'll be correcting the use if we plan to make this effort ready for review but I was running out of my time budget and wanted to push this draft for discussion.

Sometimes, the LLM seems to inline functions…?! See ipc_marshal_workspace_event for example, whose call was replaced with direct yyjson calls.

Yes, noticed that 👍

In general, just to be prudent, I wouldn’t want to merge an LLM-authored PR for something as intricate as the JSON parsing.
...
Instead of having the LLM make the code changes, I would be more interested in having the LLM produce rewrite rules (like, sed commands, or a script, or something similar) that can be deterministically applied, reviewed and changed. Have you tried that approach?

Some thoughts:

  1. yajl uses callbacks, yyjson uses DOM traversal. We can adapt yyjson to use callbacks but I feel that would be a missed opportunity as our callback-based code is quite difficult to understand and can lead to hard-to-fix/reason bugs: Assertion failure when appending layout #4335. That said, my initial branch next...orestisfl:i3:yyjson-tree_append_json did maintain the callback-based code so it's definitely possible to continue with that and have a smaller diff as a result.
  2. The APIs for writing json are quite similar. We could adjust our macros y,ystr macros and leave the rest of the code largely intact apart from allocating/freeing.
  3. Perfecting a script like that is extremely difficult because allocating and freeing is context dependent. We could still have a script that gives us 90% correct code, git commit the result to ease reviewing and then work on fixing the code base manually on subsequent commits.
  4. This PR is not LLM-authored, I am still the main author here, the LLM is the tool. I steered the direction of the refactor, caught issues, gave directions to amend them, validated behavior and manually reviewed all files. The draft status of this PR indicates that the complete code is not ready to be reviewed. A ready-for-review PR should be reviewed like any human code.

Addendum: optimization notes

For the DOM-based API, there are optimizations we could be using like instead of the "linear search times" e.g. yyjson_obj_get use yyjson_obj_iter_get

@stapelberg
Copy link
Copy Markdown
Member

Thanks for the details!

  1. yajl uses callbacks, yyjson uses DOM traversal. We can adapt yyjson to use callbacks but I feel that would be a missed opportunity as our callback-based code is quite difficult to understand and can lead to hard-to-fix/reason bugs: Assertion failure when appending layout #4335. That said, my initial branch next...orestisfl:i3:yyjson-tree_append_json did maintain the callback-based code so it's definitely possible to continue with that and have a smaller diff as a result.

Agreed, we should switch to DOM-based traversal instead of callbacks.

  1. The APIs for writing json are quite similar. We could adjust our macros y,ystr macros and leave the rest of the code largely intact apart from allocating/freeing.

Sounds very good. Starting with a minimal diff (e.g. changing only the write path with minimal changes) and then expanding from there is generally a good strategy.

  1. Perfecting a script like that is extremely difficult because allocating and freeing is context dependent. We could still have a script that gives us 90% correct code, git commit the result to ease reviewing and then work on fixing the code base manually on subsequent commits.

Agreed. No need to make the translation perfect, as long as it removes most of the tedious steps of the refactoring.

Ideally, we could have git commits that are entirely deterministic tool applications, and then separate commits for where we manually fixed up things.

  1. This PR is not LLM-authored, I am still the main author here, the LLM is the tool. I steered the direction of the refactor, caught issues, gave directions to amend them, validated behavior and manually reviewed all files. The draft status of this PR indicates that the complete code is not ready to be reviewed. A ready-for-review PR should be reviewed like any human code.

OK. As long as you review the entire diff you’re sending, that’s okay with me.

That said, I think we should be transparent about LLM usage. Adding a line like “Assisted by Claude Opus 4.5, but every line is human-reviewed.” would be sufficient IMO.

Addendum: optimization notes

For the DOM-based API, there are optimizations we could be using like instead of the "linear search times" e.g. yyjson_obj_get use yyjson_obj_iter_get

That’s fair, but we can move performance optimizations into a follow-up step to reduce scope of this already-large refactor :)

@orestisfl
Copy link
Copy Markdown
Member Author

Another point of incompatibility is between the write APIs: yajl_gen_string is used both for keys and values:

Generate a string value or map key from str.

I've also tried to create a compatibility layer in C only. That could be something we can use as well as a base. Tests pass, no further manual testing done (e.g. i3bar is broken without handling streaming json): 8a08011

@stapelberg
Copy link
Copy Markdown
Member

Now that v4.25 is released, we can resume work on this.

I've also tried to create a compatibility layer in C only.

Yeah, I like that idea. We could also change our existing code to be easier to migrate, e.g. by using two separate names for keys/values instead of directly using yajl_gen_string for both.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migrate away from yajl

2 participants