[WIP] Migrate yajl to yyjson#6534
Conversation
- Migrate load_layout.c from yajl to yyjson for JSON parsing - Migrate i3-msg/main.c from yajl to yyjson - Add yyjson dependency to meson.build - Use DOM-style parsing with yyjson which is simpler than SAX callbacks Part of issue i3#6257: Migrate away from yajl
- Migrate display_version.c from yajl to yyjson - Use DOM-style parsing instead of SAX callbacks - Simplifies the code significantly Part of issue i3#6257: Migrate away from yajl
- Migrate parse_json_header.c from yajl to yyjson - Add yyjson dependency to i3bar build - Add 120s timeout to test runner to prevent hung tests Part of issue i3#6257: Migrate away from yajl
Major migration of JSON generation from yajl to yyjson: - ipc.c: All IPC handlers now use yyjson for JSON generation - commands.c: Command results use yyjson via ysuccess/yerror macros - commands_parser.c: Parser context now uses yyjson_mut_doc - util.c: store_restart_layout uses dump_node with yyjson - con.c, workspace.c: Workspace events now use yyjson directly Changed function signatures: - dump_node: now takes yyjson_mut_doc* and returns yyjson_mut_val* - parse_command: now takes yyjson_mut_doc* instead of yajl_gen Removed ipc_marshal_workspace_event since it returned yajl_gen. The workspace event JSON is now built inline where needed. Part of issue i3#6257: Migrate away from yajl
Migrate all i3bar JSON parsing from yajl to yyjson: - i3bar/src/mode.c: Mode change event parsing - i3bar/src/outputs.c: Outputs reply parsing - i3bar/src/workspaces.c: Workspaces reply parsing - i3bar/src/config.c: Bar configuration parsing - i3bar/src/child.c: Status line input parsing and click event generation Part of issue i3#6257: Migrate away from yajl
- Remove yajl from meson.build dependencies - Delete include/yajl_utils.h (no longer needed) - Clean up unused yajl include in config_parser.h Completes issue i3#6257: Migrate away from yajl yajl is no longer actively maintained and has CVEs. This migration replaces all yajl usage with yyjson, a faster DOM-based JSON library.
- Update DEPENDS to list yyjson instead of yajl - Update debian/control to use libyyjson-dev - Remove yajl mentions from comments in code - Update docs/layout-saving to remove libyajl example Part of issue i3#6257: Migrate away from yajl
yyjson is configured with YYJSON_READ_ALLOW_TRAILING_COMMAS which intentionally accepts trailing commas in JSON. This is more user-friendly for manually edited layout files. Update test descriptions to reflect this behavior change.
yyjson_mut_obj_add_str() stores a pointer without copying, so if the string is freed before serialization, the JSON output is corrupted. Use yyjson_mut_obj_add_strcpy() for strings that are freed after being added to the JSON document: - yerror macro in commands.c: 'message' is freed after sasprintf - commands_parser.c: 'position' is freed after being added This fixes corrupted error messages like 'L match: invalid con_id' instead of 'Invalid match: invalid con_id'.
Introduce yyjson_utils.h with: - Custom allocator using i3's smalloc/srealloc (which abort on OOM) - json_write() wrapper that uses the i3 allocator and asserts non-NULL This simplifies ipc.c by: - Removing manual NULL checks after yyjson_mut_write() - Using a consistent json_write() function throughout Since smalloc/srealloc abort on allocation failure, json_write() is guaranteed to return a valid pointer, making the assert a defense-in-depth check rather than error handling.
- Move json_utils from header-only to libi3/json_utils.c - Add json_new() and json_write() declarations to libi3.h - Add __attribute__((returns_nonnull)) to safe allocation wrappers (smalloc, scalloc, srealloc, sstrdup, sstrndup) - Fix use-after-free in commands_parser.c error handling: yyjson_mut_obj_add_str() stores pointers without copying, but the 'input' and 'error' strings were freed before json_write() was called. Changed to yyjson_mut_obj_add_strcpy() which copies the strings. - Add error reporting to json_write() for debugging write failures
yyjson handles numeric formatting consistently regardless of locale, so the setlocale(LC_NUMERIC) calls around JSON generation are no longer needed. Also updates con.c, workspace.c, and util.c to use the json_new() and json_write() wrappers for consistency. https://ibireme.github.io/yyjson/doc/doxygen/html/api.html#locale-independence
Since we already validate the JSON document before traversing it, we can use the unsafe_ variants of yyjson functions which skip redundant type checks. Also add a NULL check at the start of traverse_and_invoke_callbacks for safety.
- Fix infinite JSON array protocol: skip leading '[' (array opener), ',' (subsequent updates), and ']' (shutdown) - Use YYJSON_READ_STOP_WHEN_DONE to accept trailing content - Add get_string() helper to simplify string field parsing - Use unsafe_ yyjson variants after type checks for performance - Use json_new()/json_write() for click event generation - Inline add_json_result helper in commands_parser.c - Use json_new() in commands_parser test main()
- get_buffer: return char* instead of unsigned char* - read_json_input: take const char* parameter - copy_statusline: take const struct statusline_head* - isempty, is_alive: add const to parameters - Use isspace() instead of manual whitespace checks - get_string: also reject empty strings - Move variable declarations closer to use
Replace yyjson_read() calls with yyjson_read_opts() and use yyjson_read_err to provide detailed error messages including the error description and position in the input. Updated files: - i3-msg/main.c: parse_reply, parse_config_reply - i3bar/src/child.c: read_json_input - i3bar/src/config.c: parse_config_json, parse_get_first_i3bar_config - i3bar/src/mode.c: parse_mode_json - i3bar/src/outputs.c: parse_outputs_json - i3bar/src/workspaces.c: parse_workspaces_json - src/display_version.c: version parsing - src/ipc.c: subscribe, sync handlers
- Simplify JSON field access patterns in i3bar config parsing - Use unsafe_yyjson_get_* variants after type checks for better performance - Consolidate DLOG statements for related config options - Remove redundant yyjson_val declarations where inline access suffices The unsafe_ variants skip type validation which is safe after explicit yyjson_is_* checks, providing a minor performance improvement.
Add compact macros using token pasting (##) for JSON field access: - json_get(obj, key, type): required field with type check - json_get_obj(obj, key): required object field - json_opt(obj, key, type, def): optional field with default - json_opt_val(obj, key, type): optional field returning yyjson_val* These macros expand to yyjson_is_##type and unsafe_yyjson_get_##type, providing type-safe JSON parsing with clear error messages on mismatch. Since the JSON comes from i3, type mismatches indicate bugs.
Add a simple json_opt(obj, key, type, def) macro that returns the default value if the key is missing or has the wrong type. This reduces verbose get+check+extract patterns to single-line assignments. Applied to: workspaces.c, outputs.c, mode.c, parse_json_header.c
Each non-empty line in a change/bugfix file now becomes a separate bullet point. This allows grouping related changes (like the yyjson migration) in a single file. Also fix inconsistent 2-space indentation in get_number().
|
@stapelberg this is quite impossible to review like this. I can split it into multiple PRs and we could target a different branch before merging into next. |
|
Which LLM and prompts did you use for this (if you still have them)? In general, just to be prudent, I wouldn’t want to merge an LLM-authored PR for something as intricate as the JSON parsing. However, we can still use LLMs to prototype this migration and get a better feel for things. I notice that the PR contains patterns like: yyjson_val *hr_val = yyjson_obj_get(root, "human_readable");
if (hr_val && yyjson_is_str(hr_val)) {
human_readable_version = sstrdup(unsafe_yyjson_get_str(hr_val));
}But the yyjson docs specifically states that the library performs null checks: https://ibireme.github.io/yyjson/doc/doxygen/html/api.html#null-check So we could avoid the unsafe accessor and explicit type-checking: const char* hr = yyjson_get_str(yyjson_obj_get(root, "human_readable"));
if (hr) {
human_readable_version = sstrdup(hr);
}Sometimes, the LLM seems to inline functions…?! See Instead of having the LLM make the code changes, I would be more interested in having the LLM produce rewrite rules (like, |
LLM is Claude Opus 4.5. It was quite a long back and forth.
Yup, I noticed that. It was probably my own mis-use of the libray in next...orestisfl:i3:yyjson-tree_append_json. I used that as the "seed" of the refactor and copied my pattern. I'll be correcting the use if we plan to make this effort ready for review but I was running out of my time budget and wanted to push this draft for discussion.
Yes, noticed that 👍
Some thoughts:
Addendum: optimization notesFor the DOM-based API, there are optimizations we could be using like instead of the "linear search times" e.g. yyjson_obj_get use yyjson_obj_iter_get |
|
Thanks for the details!
Agreed, we should switch to DOM-based traversal instead of callbacks.
Sounds very good. Starting with a minimal diff (e.g. changing only the write path with minimal changes) and then expanding from there is generally a good strategy.
Agreed. No need to make the translation perfect, as long as it removes most of the tedious steps of the refactoring. Ideally, we could have git commits that are entirely deterministic tool applications, and then separate commits for where we manually fixed up things.
OK. As long as you review the entire diff you’re sending, that’s okay with me. That said, I think we should be transparent about LLM usage. Adding a line like “Assisted by Claude Opus 4.5, but every line is human-reviewed.” would be sufficient IMO.
That’s fair, but we can move performance optimizations into a follow-up step to reduce scope of this already-large refactor :) |
|
Another point of incompatibility is between the write APIs: yajl_gen_string is used both for keys and values:
I've also tried to create a compatibility layer in C only. That could be something we can use as well as a base. Tests pass, no further manual testing done (e.g. i3bar is broken without handling streaming json): 8a08011 |
|
Now that v4.25 is released, we can resume work on this.
Yeah, I like that idea. We could also change our existing code to be easier to migrate, e.g. by using two separate names for keys/values instead of directly using yajl_gen_string for both. |
Closes #6257
First full draft. Heavily assisted by LLMs, opening the PR for initial review. Significant reviewing is still needed both from me and reviewers.
Using the benchmark script from: #6257 (comment)
before:
after:
more benchmarks are needed.