Fixes two problems related to emojis in PDF import.

  1. UTF-8 conversion.

    If a "ToUnicode" table is included in an OpenType font in a PDF file, one can find the Unicode code point that corresponds to a given glyph (or group of glyphs). This is often the only way one can reconstruct text from a PDF (which might contain only glyphs and glyph positions). For most emoji, the code points are outside the "Basic Multilingual Plane" (code points that can be encoded by four or fewer hexadecimal digits) and are located in the "Supplementary Multilingual Plane", a.k.a. "Plane 1". Code points in "Plane 1" are represented by a hexadecimal number of the form "1xxxx", where 'x' is any hex digit.

    Inkscape's PDF import code takes a Unicode code point and converts it to its UTF-8 representation. This code assumes that the code point can be represented by a gunichar2, which is typedef'ed from a guint16. The glib function g_utf16_to_utf8 is then used for the conversion. This is incorrect: a single guint16 can only represent a code point of at most four hex digits, and not even all of those values (the range 0xD800 to 0xDFFF is reserved for surrogates, which indicate that a pair of 16-bit values is being used to encode a code point outside the basic plane).

    We already use std::wstring_convert<> and std::codecvt<> earlier in the same function to build up a string to store the original text in the 'aria-label' attribute. I changed the code to reuse that result. Note that these are deprecated and will be removed in C++26, so we'll eventually need to find a different solution.

  2. Empty paths.

    Emoji fonts usually have color. There are four competing methods of embedding color font data in an OpenType font. Two of these use bitmaps. If a font has only bitmap glyph data then there is no vector data to build a path. If there is no path, Inkscape's current code returns 'nullptr' instead of a pointer to a Path node. This triggers an assert. Simply removing the assert leads to other problems down the line. The simplest solution is to return a Path node with an empty "d" attribute. This also allows one to store the original text in the "aria-label" attribute of the Path node.

    In the future, we should be able to render bitmaps from fonts in the same way that we render SVG OpenType fonts (which caches the glyphs as bitmaps).
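
To illustrate item 1, here is a minimal, self-contained sketch of why one 16-bit unit is not enough for a Plane 1 code point. It is not the code in this merge request: the function names `code_point_to_utf16` and `code_point_to_utf8` are made up for the example, and plain C++ is used instead of glib or std::wstring_convert<>.

```cpp
#include <string>

// Hypothetical helper: encode one code point as UTF-16.
// Assumes cp is a valid Unicode scalar value (<= 0x10FFFF, not a surrogate).
std::u16string code_point_to_utf16(char32_t cp)
{
    std::u16string out;
    if (cp < 0x10000) {
        // Basic plane: a single 16-bit unit is enough.
        out += static_cast<char16_t>(cp);
    } else {
        // Outside the basic plane: split into a surrogate pair.
        cp -= 0x10000;
        out += static_cast<char16_t>(0xD800 + (cp >> 10));   // high surrogate
        out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF)); // low surrogate
    }
    return out;
}

// Hypothetical helper: encode one code point as UTF-8.
// Returns an empty string for surrogates and values above 0x10FFFF.
std::string code_point_to_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return out; // a lone surrogate is not a valid code point
        }
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+1F600 (a Plane 1 emoji), the first function yields the two units 0xD83D 0xDE00 and the second yields the four bytes F0 9F 98 80. If the same value were truncated to 16 bits, it would become U+F600, a private-use code point whose UTF-8 form is the three bytes EF 98 80, so the emoji would be lost.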

Unfixed problems:

If the "Noto-Color-Emoji" font is present, it will be used for rendering emoji even if "Noto-Emoji" or "Symbola" is specified. It will also be used as a fallback font for rendering emoji. "Noto-Color-Emoji" has no "glyf" table and thus lacks vectorized paths. This leads to empty paths. It would be good to block the use of "Noto-Color-Emoji" in this case. It's not clear how easy it would be to do this.

What is even stranger is that the terminal will not show an Emoji if "Noto-Color-Emoji" is not installed even though the glyph that is shown is not from "Noto-Color-Emoji"! (It's from "Symbola".)

There appears to be incorrect logic in SvgBuilder::_flushTextPath(). If the style changes inside the function, the existing node is replaced, effectively orphaning the previous node. Whether this actually happens in PDF input is unknown.

Fixes #5235 (closed)
