Opened 4 months ago
Last modified 3 months ago
#63769 new defect (bug)
Pasting from MS Word can convert abbreviations to ordered lists
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | Awaiting Review | Priority: | normal |
| Severity: | normal | Version: | |
| Component: | TinyMCE | Keywords: | has-patch |
| Focuses: | javascript | Cc: | |
Description
Hi folks - long-time bug in classic editor I keep forgetting about until it happens (every time). Rewrite bug with certain text strings when copy-pasting text edited in MS Word.
Basically there is a funky thing when using classic editor, when copying clean text from an MS Word (docx) file (by clean, I mean with any oddball encoding like superscripts, double spaces, double line breaks, etc. removed) into the post editor...
and if the first paragraph starts with, oh say, a dateline of ST. LOUIS, Mo., July 31, 2025 --- blah.... the editor rewrites the ST. LOUIS as <ol><li>LOUIS, Mo. ...etc. ... </li></ol> where the entire first paragraph is wrapped in ol and the ST. is removed.
This happens with another string we use sometimes, which I think is also ST. -- basically 'ST.' being rewritten as an <ol></ol> wrapper, for no apparent reason.
Not world ending, but happens every time for us.
Attachments (2)
Change History (10)
#2
@
4 months ago
- Version 6.8.2 deleted
As a workaround, you could expand the toolbar to two lines and activate the 'Paste as text' button.
#3
@
4 months ago
Ah, yes - that was the other one I was thinking of specifically
ST. LOUIS, MO
FT. LAUDERDALE, FL
(etc.)
We have "datelines" at the start of first paragraph for each news story.
We compose in MS Word and then clean up; it's faster because we can have live links, bolded sub-heads for topics (no h2/h3, just bold face), and sometimes a disclaimer in italic at the bottom, without having to do that manually in the MCE.
I am not sure the 'paste as text' keeps the inline links, proper line breaks, bold and whatnot. I seem to recall we used to do that years back, but other than this weirdo ol list conversion, MS Word modern docx imports when done from CLEAN docs, work perfectly.
Seems like converting to an ordered list would make more sense *only* when a paragraph has
- Tacos are not toes
Which it does, actually.
This one has been bugging me for years, and I keep totally forgetting about it as I only run into it a couple of times a year, it seems. More so lately.
This ticket was mentioned in PR #9398 on WordPress/wordpress-develop by @sabernhardt.
4 months ago
#4
- Keywords has-patch added; needs-patch removed
Removes patterns of Roman numerals and letters from isNumericList() to avoid inappropriately converting them to ordered lists when pasting paragraphs from MS Word.
Numbers at the beginning of paragraphs would still become ordered lists.
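The behavior described for the patch can be sketched roughly as below. This is only an illustration, not the patched TinyMCE code: the function name `isNumericListAfterPatch` and the exact regular expression are assumptions based on the description above.

```javascript
// Hypothetical sketch of the narrowed check: only a leading number followed
// by a dot and a space (regular or non-breaking) counts as a list marker.
// The name and the regular expression are assumptions, not the patch itself.
function isNumericListAfterPatch(text) {
  // Ignore leading spaces and non-breaking spaces before testing.
  const trimmed = text.replace(/^[\u00a0 ]+/, '');
  return /^[0-9]+\.[ \u00a0]/.test(trimmed);
}

console.log(isNumericListAfterPatch('1. Tacos are not toes'));          // true
console.log(isNumericListAfterPatch('ST. LOUIS, Mo., July 31, 2025'));  // false
console.log(isNumericListAfterPatch('I. M. Pei designed the building')); // false
```

With a check like this, datelines and initials stay as paragraphs, while numbered paragraphs would still be candidates for conversion.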
#5
@
4 months ago
- Summary changed from Long time bug in classic editor to Pasting from MS Word can convert abbreviations to ordered lists
I removed the Roman numerals in addition to other letters because:
- The most common and appropriate numeral to convert would be I. (one), but that would not fit if the paragraph starts with a name such as I. M. Pei.
- Other initials with IVXLCDM would have similar issues.
- M. and MM. are French abbreviations for Monsieur and Messieurs ("M. Dupont", "MM. Dupont et Durand").
- The version in WordPress did not get the update to support longer numerals (iii, vii, xii, xiii, etc.).
Even though TinyMCE Paste dropped the entire filter, I kept the pattern for numbers.
#7
@
3 months ago
happens every time for me using classic editor, latest WP, first paragraph pasted in from MS Word where first paragraph starts with something like ST. LOUIS, Mo., blah blah.
e.g.,
ST. ALBANS, W.Va.,
will turn into
<ol><li>ALBANS, W.Va., ... </li></ol>
#8
@
3 months ago
Steps to reproduce:
- Install and activate Classic Editor or a similar plugin. If you want to use Playground, you can add the plugin name in the URL: https://playground.wordpress.net/?plugin=classic-editor
- Add a new post.
- Open a document such as 63769.docx in Microsoft Word.
- Copy full lines (paragraphs), either individually or as a group, and paste them into the Visual tab of the editor in WordPress. (Do not use the "Paste as text" option.)
- Check whether each pasted paragraph is converted to a list item within an ordered list.
This could go back as early as WordPress 4.0, when r29458 updated to TinyMCE 4.1.3. I reproduced it in WP 6.8.2 (with Classic Editor 1.6.7) and in WP 4.8.25.
The regular expressions for converting pasted content to an ordered list can match any of the examples below if they are at the beginning of a paragraph copied from a Word document. (The pattern is one English letter, or two in the same letter case, followed by either a dot or closing parenthesis and then a space.)
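Strings mentioned earlier in this ticket can be checked against a rough approximation of that letter pattern. This is a sketch under stated assumptions: the two regular expressions are written from the description in the paragraph above rather than copied from the TinyMCE source, and `looksLikeLetteredListItem` is a hypothetical helper name.

```javascript
// Assumed approximations of the letter patterns described above: one or two
// letters in the same case, then a dot or closing parenthesis, then a space
// (regular or non-breaking). The real TinyMCE source may differ in detail.
const letterPatterns = [
  /^[a-z]{1,2}[.)][ \u00a0]/, // one or two lowercase letters
  /^[A-Z]{1,2}[.)][ \u00a0]/, // one or two uppercase letters
];

// Hypothetical helper: true when the start of a paragraph would be read
// as a lettered list marker by patterns like the ones above.
function looksLikeLetteredListItem(text) {
  const trimmed = text.replace(/^[\u00a0 ]+/, '');
  return letterPatterns.some((pattern) => pattern.test(trimmed));
}

const samples = [
  'ST. LOUIS, Mo., July 31, 2025',
  'FT. LAUDERDALE, FL',
  'I. M. Pei',
  'MM. Dupont et Durand',
  'St. Louis, Mo.', // mixed case, so neither pattern applies
];

for (const sample of samples) {
  console.log(looksLikeLetteredListItem(sample), sample);
}
```

Only the mixed-case "St." sample falls outside both patterns, which is consistent with the all-caps datelines being the ones that get converted.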
Later in TinyMCE development, TINY-7368 fixed the Roman numeral detection before TINY-7493 removed WordFilter.js entirely.
Changes to the WordPress version of TinyMCE should be limited, but the [a-z] and [A-Z] matches might be inappropriate more often than they are helpful.