RTC: Fix cursor index sync with rich text formatting#76418
RTC: Fix cursor index sync with rich text formatting#76418alecgeatches merged 22 commits intotrunkfrom
Conversation
…versions between text indexes
|
@dmsnell I'd love your review on this approach! It's not exactly what you described in #76058 (comment), but it uses marker characters in a similar way. Thanks! |
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message. To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
|
Size Change: +337 B (0%) Total Size: 8.75 MB
ℹ️ View Unchanged
|
|
Flaky tests detected in 9fefe8c. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/23207323696
|
| ) { | ||
| safeIndex = lastAmp; | ||
| } | ||
| } |
There was a problem hiding this comment.
this function is a bit confusing to me, so I think I might be misunderstanding. it looks like it’s recreating the same kind of string-based HTML parsing we had before; with the assumption that < opens a tag and > closes a tag.
it also appears to traverse backwards through the HTML which is even more fraught because some text genuinely represents HTML non-text nodes in some context while not in others, something that cannot be revealed when walking backwards.
There was a problem hiding this comment.
Thank you for reviewing, I'd love to improve this. This is intended to correct when the given htmlIndex is within a tag. Given:
some <strong>words</strong> test
If we receive htmlIndex = 9, then our marker would fall within the <strong> tag:
some <str\uFFFDong>words</strong> test
When we parse HTML later with create(), a marker in the HTML content itself could break parsing if it's mid-tag. The goal is to avoid inserting our marker character in a place that would break either the HTML tag or entities e.g. &am\uFFFDp;.
But you're right, this edge-case workaround still bakes in the assumption that we can per-character parse HTML without context. Ideally htmlIndex is never within a tag though. Maybe we can just ignore that case, since we're getting htmlIndex from another user's richTextOffsetToHtmlIndex() that should be correctly anchored in HTML. We also might be able to return some default value if create() returns invalid markup.
Open to any ideas if you have them!
There was a problem hiding this comment.
if we’re using the marker-based approach, how are we ending up inside a tag?
the problem with using < and > like this is that is gives hardly any guarantee that it will slide in front of or behind a tag. it’s also incredibly likely that it will move markers that were already in a text node into a tag.
many definitely try to parse HTML this way. we have forty years of prior art…
There was a problem hiding this comment.
if we’re using the marker-based approach, how are we ending up inside a tag?
We're receiving these markers from other peers with possibly different CRDTs, so this was an attempt to add some edge-case detection. You're correct though, this doesn't catch "edge cases" as much as it will probably make problems with regular content to a greater degree.
Thank you for the wisdom here, it's really helpful. I'm going to modify htmlIndexToRichTextOffset() not to use this method for tag detection. At very least we can remove manual parsing and just fall back to some default, or maybe we can look at the output of create() to determine if something unexpected happened.
| // Candidate markers for insertion. We try each in order and pick the first | ||
| // one that does not already appear in the text, so existing content cannot | ||
| // collide with the marker we search for after parsing. | ||
| const MARKER_CANDIDATES = [ '\uFFFD', '\uFFFE', '\uFFFF' ] as const; |
There was a problem hiding this comment.
this approach should work, but this is a curious set of characters to use, particularly U+FFFD which serves in so many common roles.
as non-characters, U+FFFE and U+FFFF should work, but also we can try any of the private-use area characters, the easiest being those starting at U+E000
for what reasons did you choose these characters?
There was a problem hiding this comment.
\uFFFD is just the classic replacement character, but I wanted to add some backups in the unlikely case real content contained it. Happy to do whatever here though, would you recommend U+E000-U+E002?
There was a problem hiding this comment.
the private-use characters are for special purposes like this, contained within a system. they intentionally have no meaning and should never have meaning between systems.
U+FFFD has a very specific universal meaning and a lot of texts will contain it, oftentimes by accident (because, while it should be reserved for display-only, many tools end up replacing invalid byte sequences with it anyway)
|
@dmsnell I pushed up some changes to use private-use entities and remove the unhelpful manual HTML parsing. Currently I thought of a potential way to detect invalid HTML. We could parse the HTML once before adding the character and once after, and if the length isn't Please let me know if you have any more feedback here, it's been appreciated. |
| CURSOR_REDRAW_INTERVAL_MS | ||
| ); | ||
| return () => clearInterval( interval ); | ||
| }, [ cursors.length, rerenderCursorsAfterDelay ] ); |
There was a problem hiding this comment.
we should be careful with interval functions, especially if there’s any potential for long compute time. the problem is that they can stack up in the even queue and if the system experiences high load, it may not be able to run them fast enough to clear it out.
another available approach is to set a mean time between runs instead of schedulings
let timerHandle;
const runner = () => {
rerenderCursorsAfterDelay();
timerHandle = setTimeout( runner, CURSOR_REDRAW_INTERVAL_MS );
};
return () => clearTimeout( timerHandle );the benefit here is natural deferral of these updates when the editor is laggy and decreased pressure on the main thread. the risk is that if there’s an exception in that execution context then the timer won’t be reset. this is easy to fix by creating a light-weight scheduler on setInterval() whose only purpose is to ensure that the timer is set if it hasn’t been reset.
let timerHandle;
setInterval( () => {
if ( timerHandle ) {
return;
}
timerHandler = setTimeout( () => {
timerHandle = null;
runner();
}, 0 );
}, CURSOR_REDRAW_INTERVAL_MS );and should anyone find this intermingling ugly, there are ways to separate the recursion.
const setDelayedInterval( runner, CURSOR_REDRAW_INTERVAL_MS );There was a problem hiding this comment.
Cool pattern, and makes sense to me. Implemented in 648b72a.
There was a problem hiding this comment.
just noting that this function is already debounced (currently by 500ms)
| * @param text The string to check for existing marker characters. | ||
| */ | ||
| function pickMarker( text: string ): string | null { | ||
| for ( const candidate of MARKER_CANDIDATES ) { |
There was a problem hiding this comment.
just a considerably less-significant thought, but at this point is there an incredible need to predefined these? if we pick a candidate from the private-use area, of which we are almost certainly able to find one (typically the first try), we could just spin on the loop. is a count of three important?
for ( let marker = 0xE000; marker <= 0xF8FF; marker++ ) {
if ( ! text.includes( marker ) ) {
return marker;
}
}
return null;will it matter to us if we have distant markers for the start and end of the selection? (anchor and focus, of which the “start” can be after the “end” in the document for backwards selections).
if so, this candidate selection can test for pairs and there are simple schemes like even/odd to determine anchor and focus/start and end. a RegExp could even be faster, but probably isn’t.
There was a problem hiding this comment.
There's no reason that we have exactly 3, used this pattern in 5a6d535 but with a limit of 0x10 guesses.
There was a problem hiding this comment.
nice. you noticed and accounted for the defect in my code snippet. 👏
There was a problem hiding this comment.
I figured '\uE000' wasn't exactly the same as 0xE000 but I did have to test to find out.
…drawing operations if the main thread is busy
| * Like setInterval but measures the delay between the end of one run and the | ||
| * start of the next, so callbacks never stack up when the main thread is busy. |
There was a problem hiding this comment.
it does not appear to do this
There was a problem hiding this comment.
but i understand it as written, maybe just update this docblock for clarity
There was a problem hiding this comment.
I can see how "measures the delay" might not convey what I intended. I tried another go in 7ddd2fc!
| timerHandle = setTimeout( runner, delayMs ); | ||
| }; | ||
|
|
||
| // Restart the runner if an exception killed it |
There was a problem hiding this comment.
if we're worried about an exception in the callback, why not try / catch?
There was a problem hiding this comment.
This appears to work the same, with a bit less code: 0dd859f. Good idea.
There was a problem hiding this comment.
Also removed the console.log in 9fefe8c.
| // changes that don't trigger awareness state updates (e.g. a collaborator | ||
| // applying formatting shifts text but the cursor's logical position is | ||
| // unchanged). Only active when remote cursors are visible. | ||
| const CURSOR_REDRAW_INTERVAL_MS = 10_000; |
There was a problem hiding this comment.
Minor: Worth pulling this outside the component to line 12?
chriszarate
left a comment
There was a problem hiding this comment.
Tested, working very well
* Add failing e2e test for user cursor position after formatting * Use htmlIndexToRichTextOffset() & richTextOffsetToHtmlIndex() for conversions between text indexes * Update tests with new API * Add unit and e2e tests for index conversions * Add occassional cursor redraw * Fix tests failing due to missing redux mocks * Remove explicit page timeout * Pass a bool instead of a DOM element to avoid serialization error in test * Poll until cursor position changes in awareness test * Use private-use characters for markers * Remove within-tag detection in htmlIndexToRichTextOffset() * Add pinning tests for incorrect htmlIndex behavior * Use setTimeout wrapper instead of setInterval to avoid running heavy drawing operations if the main thread is busy * Move setDelayedInterval() to separate file * Use a loop to find an unused character instead of 3 hardcoded values * Update setDelayedInterval() docblock * Use try/catch instead of a guard interval * Move CURSOR_REDRAW_INTERVAL_MS constant to top of file, minor formatting changes * Remove console log on failure Co-authored-by: alecgeatches <alecgeatches@git.wordpress.org> Co-authored-by: dmsnell <dmsnell@git.wordpress.org> Co-authored-by: chriszarate <czarate@git.wordpress.org> Co-authored-by: ingeniumed <ingeniumed@git.wordpress.org>
|
I just cherry-picked this PR to the wp/7.0 branch to get it included in the next release: d5570fd |
This updates the pinned hash from the `gutenberg` from `8c78d87453509661a9f28f978ba2c242d515563b` to `487a096a9782ba6110a7686d7b4b2d0c55ed1b06`. The following changes are included: - Disables anchor support for the Page Break block. (WordPress/gutenberg#76434) - WP Admin: Update Connectors screen footer text for consistency. (WordPress/gutenberg#76382) - E2E Tests: Add coverage for AI plugin callout banner on Connectors page (WordPress/gutenberg#76432) - Update sync docs (WordPress/gutenberg#75972) - RTC: Add preference for collaborator notifications (WordPress/gutenberg#76460) - Fix "should undo bold" flaky test (WordPress/gutenberg#76464) - TimePicker: Clamp month day to valid day for month (WordPress/gutenberg#76400) - RTC: Fix error when entity record doesn't have 'meta' property (WordPress/gutenberg#76311) - Navigation: Update close button size. (WordPress/gutenberg#76482) - TemplateContentPanel: fix useSelect warning (WordPress/gutenberg#76421) - DataViews: Add spinner in `DataViewsLayout` in initial load of data (WordPress/gutenberg#76486) (WordPress/gutenberg#76490) - RTC: Fix TypeError in areEditorStatesEqual when selection is undefined (WordPress/gutenberg#76163) - Page/Post Content Focus Mode: Fix insertion into Post Content block (WordPress/gutenberg#76477) - Revisions: use useSubRegistry={false} to fix global store selectors (WordPress/gutenberg#76152) (WordPress/gutenberg#76522) - Fix RTL styling on Connectors, Font Library, and boot-based admin pages (WordPress/gutenberg#76496) - RTC: Auto-register custom taxonomy rest_base values for CRDT sync (WordPress/gutenberg#75983) - RTC: Add a limit for the default provider (WordPress/gutenberg#76437) - Fix RTL styling on AI plugin callout banner (WordPress/gutenberg#76497) - Add command palette trigger button to admin bar (WordPress/gutenberg#75757) - Block Bindings: Remove source items constrained by enums (WordPress/gutenberg#76200) - HTML Block: Remove "unsaved changes" check (WordPress/gutenberg#76086) - CI: Don't build release notes during plugin build workflow for WP Core sync (WordPress/gutenberg#76398) (WordPress/gutenberg#76578) - CI: Simplify strategy matrix in Build Gutenberg Plugin Zip workflow (WordPress/gutenberg#76435) (WordPress/gutenberg#76538) - Media: Add hooks and extension points for client-side media processing (WordPress/gutenberg#74913) - RTC: Fix list sidebar reset during real-time collaboration (WordPress/gutenberg#76025) - RTC: Fix CRDT serialization of nested RichText attributes (WordPress/gutenberg#76597) - RTC: Remove post list lock icon and replace user-specific lock text (WordPress/gutenberg#76322) - Fix HEIC upload error handling and sub-size format (WordPress/gutenberg#76514) - RTC: Fix cursor index sync with rich text formatting (WordPress/gutenberg#76418) - RTC: Allow filtering of `SyncConnectionModal` (WordPress/gutenberg#76554) - RTC: Implement front-end peer limits (WordPress/gutenberg#76565) - Navigation overlay close button may be displayed twice (WordPress/gutenberg#76585) - Site Editor > Templates: fix author filter (WordPress/gutenberg#76625) - Revisions: Show changed block attributes in inspector sidebar (WordPress/gutenberg#76550) - Fix IS_GUTENBERG_PLUGIN env var override in build config (WordPress/gutenberg#76605) - Real Time Collaboration: Introduce filters for the polling intervals. (WordPress/gutenberg#76518) - RTC: Fix RichTextData deserialization (WordPress/gutenberg#76607) - Cross Origin Isolation: Remove `img` from the list of elements that get mutated (WordPress/gutenberg#76618) - RTC: Scroll to collaborator on click (WordPress/gutenberg#76561) - Update changelog link for pull request 11276 (WordPress/gutenberg#76638) - Fix backport changelog filename (WordPress/gutenberg#76651) - Build: Skip non-minified build for WASM-inlined workers (WordPress/gutenberg#76615) - RTC: Change RTC option name (WordPress/gutenberg#76643) - BlockListBlock: fix crash when selectedProps are null (WordPress/gutenberg#75826) - Build: Fix vips worker 404 when SCRIPT_DEBUG is true (WordPress/gutenberg#76657) - useMediaQuery: support in-iframe queries via new `WindowContext` (WordPress/gutenberg#76446) (WordPress/gutenberg#76660) - Refactor admin-ui Page component to use @wordpress/theme tokens and @wordpress/ui layout primitive (WordPress/gutenberg#75963) - Connectors: Improve accessibility (WordPress/gutenberg#76456) - Build: Remove unused JXL WASM module from vips worker (WordPress/gutenberg#76639) - Connectors: fix button size (WordPress/gutenberg#76582) - Compose: Implement useCopyToClipboard and useCopyOnClick with native clipboard API (WordPress/gutenberg#75723) (WordPress/gutenberg#76663) - Connectors: Fetch specific plugin instead of all plugins (WordPress/gutenberg#76594) - Revisions: Add Meta fields diff panel to document sidebar (WordPress/gutenberg#76341) - Loosen client-side media processing requirements (WordPress/gutenberg#76616) - Reduce the added halo for selected block. (WordPress/gutenberg#76619) - Connectors: Add unregisterConnector and upsert support (WordPress/gutenberg#76541) A full list of changes can be found on GitHub: https://github.com/WordPress/gutenberg/compare/8c78d87453509661a9f28f978ba2c242d515563b…487a096a9782ba6110a7686d7b4b2d0c55ed1b06. Log created with: git log --reverse --format="- %s" 8c78d87453509661a9f28f978ba2c242d515563b..487a096a9782ba6110a7686d7b4b2d0c55ed1b06 | sed 's|#\([0-9][0-9]*\)|https://github.com/WordPress/gutenberg/pull/\1|g; /github\.com\/WordPress\/gutenberg\/pull/!d' | pbcopy See #64595. git-svn-id: https://develop.svn.wordpress.org/trunk@62063 602fd350-edb4-49c9-b593-d223f7449a82
This updates the pinned hash from the `gutenberg` from `8c78d87453509661a9f28f978ba2c242d515563b` to `487a096a9782ba6110a7686d7b4b2d0c55ed1b06`. The following changes are included: - Disables anchor support for the Page Break block. (WordPress/gutenberg#76434) - WP Admin: Update Connectors screen footer text for consistency. (WordPress/gutenberg#76382) - E2E Tests: Add coverage for AI plugin callout banner on Connectors page (WordPress/gutenberg#76432) - Update sync docs (WordPress/gutenberg#75972) - RTC: Add preference for collaborator notifications (WordPress/gutenberg#76460) - Fix "should undo bold" flaky test (WordPress/gutenberg#76464) - TimePicker: Clamp month day to valid day for month (WordPress/gutenberg#76400) - RTC: Fix error when entity record doesn't have 'meta' property (WordPress/gutenberg#76311) - Navigation: Update close button size. (WordPress/gutenberg#76482) - TemplateContentPanel: fix useSelect warning (WordPress/gutenberg#76421) - DataViews: Add spinner in `DataViewsLayout` in initial load of data (WordPress/gutenberg#76486) (WordPress/gutenberg#76490) - RTC: Fix TypeError in areEditorStatesEqual when selection is undefined (WordPress/gutenberg#76163) - Page/Post Content Focus Mode: Fix insertion into Post Content block (WordPress/gutenberg#76477) - Revisions: use useSubRegistry={false} to fix global store selectors (WordPress/gutenberg#76152) (WordPress/gutenberg#76522) - Fix RTL styling on Connectors, Font Library, and boot-based admin pages (WordPress/gutenberg#76496) - RTC: Auto-register custom taxonomy rest_base values for CRDT sync (WordPress/gutenberg#75983) - RTC: Add a limit for the default provider (WordPress/gutenberg#76437) - Fix RTL styling on AI plugin callout banner (WordPress/gutenberg#76497) - Add command palette trigger button to admin bar (WordPress/gutenberg#75757) - Block Bindings: Remove source items constrained by enums (WordPress/gutenberg#76200) - HTML Block: Remove "unsaved changes" check (WordPress/gutenberg#76086) - CI: Don't build release notes during plugin build workflow for WP Core sync (WordPress/gutenberg#76398) (WordPress/gutenberg#76578) - CI: Simplify strategy matrix in Build Gutenberg Plugin Zip workflow (WordPress/gutenberg#76435) (WordPress/gutenberg#76538) - Media: Add hooks and extension points for client-side media processing (WordPress/gutenberg#74913) - RTC: Fix list sidebar reset during real-time collaboration (WordPress/gutenberg#76025) - RTC: Fix CRDT serialization of nested RichText attributes (WordPress/gutenberg#76597) - RTC: Remove post list lock icon and replace user-specific lock text (WordPress/gutenberg#76322) - Fix HEIC upload error handling and sub-size format (WordPress/gutenberg#76514) - RTC: Fix cursor index sync with rich text formatting (WordPress/gutenberg#76418) - RTC: Allow filtering of `SyncConnectionModal` (WordPress/gutenberg#76554) - RTC: Implement front-end peer limits (WordPress/gutenberg#76565) - Navigation overlay close button may be displayed twice (WordPress/gutenberg#76585) - Site Editor > Templates: fix author filter (WordPress/gutenberg#76625) - Revisions: Show changed block attributes in inspector sidebar (WordPress/gutenberg#76550) - Fix IS_GUTENBERG_PLUGIN env var override in build config (WordPress/gutenberg#76605) - Real Time Collaboration: Introduce filters for the polling intervals. (WordPress/gutenberg#76518) - RTC: Fix RichTextData deserialization (WordPress/gutenberg#76607) - Cross Origin Isolation: Remove `img` from the list of elements that get mutated (WordPress/gutenberg#76618) - RTC: Scroll to collaborator on click (WordPress/gutenberg#76561) - Update changelog link for pull request 11276 (WordPress/gutenberg#76638) - Fix backport changelog filename (WordPress/gutenberg#76651) - Build: Skip non-minified build for WASM-inlined workers (WordPress/gutenberg#76615) - RTC: Change RTC option name (WordPress/gutenberg#76643) - BlockListBlock: fix crash when selectedProps are null (WordPress/gutenberg#75826) - Build: Fix vips worker 404 when SCRIPT_DEBUG is true (WordPress/gutenberg#76657) - useMediaQuery: support in-iframe queries via new `WindowContext` (WordPress/gutenberg#76446) (WordPress/gutenberg#76660) - Refactor admin-ui Page component to use @wordpress/theme tokens and @wordpress/ui layout primitive (WordPress/gutenberg#75963) - Connectors: Improve accessibility (WordPress/gutenberg#76456) - Build: Remove unused JXL WASM module from vips worker (WordPress/gutenberg#76639) - Connectors: fix button size (WordPress/gutenberg#76582) - Compose: Implement useCopyToClipboard and useCopyOnClick with native clipboard API (WordPress/gutenberg#75723) (WordPress/gutenberg#76663) - Connectors: Fetch specific plugin instead of all plugins (WordPress/gutenberg#76594) - Revisions: Add Meta fields diff panel to document sidebar (WordPress/gutenberg#76341) - Loosen client-side media processing requirements (WordPress/gutenberg#76616) - Reduce the added halo for selected block. (WordPress/gutenberg#76619) - Connectors: Add unregisterConnector and upsert support (WordPress/gutenberg#76541) A full list of changes can be found on GitHub: https://github.com/WordPress/gutenberg/compare/8c78d87453509661a9f28f978ba2c242d515563b…487a096a9782ba6110a7686d7b4b2d0c55ed1b06. Log created with: git log --reverse --format="- %s" 8c78d87453509661a9f28f978ba2c242d515563b..487a096a9782ba6110a7686d7b4b2d0c55ed1b06 | sed 's|#\([0-9][0-9]*\)|https://github.com/WordPress/gutenberg/pull/\1|g; /github\.com\/WordPress\/gutenberg\/pull/!d' | pbcopy See #64595. Built from https://develop.svn.wordpress.org/trunk@62063 git-svn-id: http://core.svn.wordpress.org/trunk@61345 1a063a9b-81f0-0310-95a4-ce76da25c4cd
What?
Closes #76057
Currently when two users are editing in the same rich text area, adding formatting causes the other user's cursor to jump:
cursor-jumps-with-formatting.mov
This is because we retrieve a user's selection from
getSelectionStart(), which returns their logical selection within text, not including formatting. However this selection offset is incorrectly anchored to the HTMLY.Textused by rich text attributes. That means there's a disconnect when formatting is applied. When text is bolded,<strong>/</strong>tags are added within the HTML, causing cursors later in text to jump 17 characters, and italics<em>/</em>cause a 9-character jump due to the mismatch in indexes.In this branch we correctly translate between selection (rich text) offsets and actual HTML offsets:
cursor-formatting-fixed.mov
Why?
The CRDT layer stores block content as raw HTML in
Y.Text, where a relative position index counts every character including HTML. The block editor uses rich-text offsets that count only visible text characters. These two coordinate systems were being used interchangeably, which worked by accident when no formatting changes occurred. However, when a remote user added or removed formatting, the indexes went out of sync and caused weird jumping behavior. Combined with typing in the same rich text field, this can break interaction within text if any users start to use HTML formatting.How?
The two main functions added to switch between text index and HTML indexes are
htmlIndexToRichTextOffset()andrichTextOffsetToHtmlIndex(). These go through Gutenberg's@wordpress/rich-textparser and avoids hand-rolling HTML tag counting. A marker character is inserted at the target position and its location is found after parsing/serializing, which gives us the index in the other coordinate system.These conversions are applied at four boundary points where the mismatch occurs: two write paths (
block-selection-history.tsandcrdt-user-selections.ts) where rich-text offsets are stored into Y.Text positions, and two read paths (crdt-selection.tsandpost-editor-awareness.ts) whereY.Textpositions are resolved back into rich-text offsets. The field was also renamed fromtextIndextorichTextOffsetfor clarity on what "index" refers to.I also added a periodic cursor redraw to deal with cases where formatting changes the width of previous characters but doesn't actually affect another user's relative position:
cursor-redraw.mov
Above, the visible user bolds a line of text, which causes another user's cursor to briefly appear in the incorrect location. Rather than listen for every formatting change and attempting a redraw, this solution allows cursors to catch-up without much performance overhead. This will also handle cases where editor content changes in other unexpected ways and allow RTC cursors and selections to automatically catch-up after a short time.
Testing Instructions
Vivamus diam purus, volutpat id auctor nec, vehicula nec eros.Tom & Jerryor<>) and verify cursors behave correctly when formatting is applied near the entity.