
Implement TeX's fraction and script alignment #31046

Open
QuLogic wants to merge 14 commits into matplotlib:text-overhaul from QuLogic:22852/mathtext-vertical-align


@QuLogic
Member

@QuLogic QuLogic commented Jan 28, 2026

PR summary

This is a rebase of #22852 by @tfpf. However, since we are already planning to refresh the test images, this reverts the change that moved many mathtext images to SVG only (and I believe it also fixes some tests that were duplicated by incorrect conflict resolution). Now the only test change is to a duplicated nested-\frac test, which becomes a nested-\dfrac test.

This is based on #30059 plus all the current test image changes, so you can review this change in the second-last commit and the image changes from only this PR in the last commit. I believe it does a fairly good job of fixing the fraction bar alignment issue that came up in #30059.

I have reviewed the fraction and sub/super script implementation with reference to the TeX book, but have not yet finished reviewing the font constants.

Fixes #18086
Fixes #18389
Fixes #22172

PR checklist

num_shift_up = consts.num2 * x_height
den_shift_down = consts.denom2 * x_height
min_clr = rule
delta = rule / 2
Member Author

Compared to #22852, I added the intermediate delta variable to align a bit more closely with the TeX book algorithm.

Contributor

I find it surprisingly difficult to map this code onto Knuth's description in sections 743-748. Perhaps it could be made clearer by keeping as many of Knuth's variable names as possible (even though I certainly understand that some of them are being inlined)?

Contributor

Do you mean the calculations of num_clr and den_clr just below these lines? The apparent difference in the logic arises from the fact that TeX and Mathtext are calculating different things: while TeX determines the amount to independently move the numerator and denominator by, Mathtext calculates the space above and below the fraction line to pack into a Vlist.

For instance, delta1 is a temporary variable whose only purpose is to ask whether the space between the descender of the numerator and the top of the fraction line is less than the minimum clearance, which is the same as checking whether delta1 > 0.

Pascal presumably did not have min and max functions, which may explain the gymnastics used in the book. My opinion is that deviating from the current code might obscure the meaning: if I look at num_shift_up - cnum.depth - axis_height - delta, I instinctively read it as, 'Go up to the baseline of the numerator, come down to its descender, and subtract the height of the axis and half the fraction line's vertical thickness,' which yields the space between the descender of the numerator and the fraction line.
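To make the equivalence concrete, here is a minimal Python sketch of the fraction-line case as we read it from sections 743-748 of *TeX: the Program*. Knuth's names (shift_up, shift_down, clr, delta, delta1, delta2) are kept; the two function names are hypothetical and not part of Matplotlib, and lengths are in arbitrary units. It checks that "raise the numerator by delta1 (lower the denominator by delta2) only if positive" leaves the same gaps around the rule as the max(..., min_clr) form in this PR.

```python
def knuth_fraction_gaps(shift_up, shift_down, num_depth, den_height,
                        axis_height, thickness, clr):
    """Knuth style: adjust the shifts, then measure the gaps around the rule."""
    delta = thickness / 2
    delta1 = clr - ((shift_up - num_depth) - (axis_height + delta))
    delta2 = clr - ((axis_height - delta) - (den_height - shift_down))
    if delta1 > 0:  # numerator too close to the rule: raise it
        shift_up += delta1
    if delta2 > 0:  # denominator too close to the rule: lower it
        shift_down += delta2
    num_gap = (shift_up - num_depth) - (axis_height + delta)
    den_gap = (axis_height - delta) - (den_height - shift_down)
    return num_gap, den_gap


def mathtext_fraction_gaps(num_shift_up, den_shift_down, num_depth, den_height,
                           axis_height, rule, min_clr):
    """PR style: compute the spaces packed above and below the rule directly."""
    delta = rule / 2
    num_clr = max(num_shift_up - num_depth - axis_height - delta, min_clr)
    den_clr = max(axis_height - delta + den_shift_down - den_height, min_clr)
    return num_clr, den_clr


rule = 0.25
common = dict(num_depth=0.5, den_height=2.5, axis_height=1.0)
# Display style: the minimum clearance is 3 * rule in both formulations.
print(knuth_fraction_gaps(3.0, 2.0, thickness=rule, clr=3 * rule, **common))
print(mathtext_fraction_gaps(3.0, 2.0, rule=rule, min_clr=3 * rule, **common))
# both print (1.375, 0.75)
```

In this example the numerator already clears the rule, while the denominator gap is clamped up to the display-style clearance of 3 * rule; the clamping is exactly what `max` expresses.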

Contributor

Thank you, the clarification is very helpful. Perhaps the following

diff --git i/lib/matplotlib/_mathtext.py w/lib/matplotlib/_mathtext.py
index 3ab5aaa8be..04e0f90ab4 100644
--- i/lib/matplotlib/_mathtext.py
+++ w/lib/matplotlib/_mathtext.py
@@ -2692,14 +2692,14 @@ class Parser:
             if style is self._MathStyle.DISPLAYSTYLE:
                 num_shift_up = consts.num1 * x_height
                 den_shift_down = consts.denom1 * x_height
-                min_clr = 3 * rule
+                clr = 3 * rule  # The minimum clearance.
             else:
                 num_shift_up = consts.num2 * x_height
                 den_shift_down = consts.denom2 * x_height
-                min_clr = rule
+                clr = rule  # The minimum clearance.
             delta = rule / 2
-            num_clr = max(num_shift_up - cnum.depth - axis_height - delta, min_clr)
-            den_clr = max(axis_height - delta + den_shift_down - cden.height, min_clr)
+            num_clr = max((num_shift_up - cnum.depth) - (axis_height + delta), clr)
+            den_clr = max((axis_height - delta) - (cden.height - den_shift_down), clr)
             vlist = Vlist([cnum,                # numerator
                            Vbox(0, num_clr),    # space
                            Hrule(state, rule),  # rule

(and ditto for the case without a fraction rule) would be clearer, in that it follows Knuth's variable names and groupings more directly, while keeping the higher level of abstraction provided by Python?
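For the case without a fraction rule, our reading of Knuth is that TeX raises the numerator and lowers the denominator by equal amounts whenever the gap between them is below clr (7 times the default rule thickness in display style, 3 times otherwise). A hypothetical sketch of that, next to the single-gap form a Vlist-based layout could use; the function names are illustrative only and TeX's integer `half()` is replaced by float division.

```python
def knuth_stack_shifts(shift_up, shift_down, num_depth, den_height, clr):
    """No fraction line: split any clearance shortfall equally between the parts."""
    gap = (shift_up - num_depth) - (den_height - shift_down)
    delta = (clr - gap) / 2  # TeX uses half() on scaled integers; floats here
    if delta > 0:
        shift_up += delta
        shift_down += delta
    return shift_up, shift_down


def stack_gap(shift_up, shift_down, num_depth, den_height, clr):
    """Clearance style: the single space a Vlist would put between the parts."""
    return max((shift_up - num_depth) - (den_height - shift_down), clr)


rule = 0.25
clr = 7 * rule  # display-style minimum clearance for the no-rule case
up, down = knuth_stack_shifts(2.0, 1.0, 0.5, 2.0, clr)
print(up, down)  # 2.625 1.625
print((up - 0.5) - (2.0 - down), stack_gap(2.0, 1.0, 0.5, 2.0, clr))  # 1.75 1.75
```

Both forms end with the same final gap; they differ only in whether the result is expressed as shifts of the two parts or as a space between them.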

@QuLogic
Member Author

QuLogic commented Jan 28, 2026

WRT the original #22852 (comment):

  • r'$\sqrt{1+\sqrt{1+\sqrt{1+\sqrt{1+\sqrt{1+\sqrt{1+\sqrt{1+x}}}}}}}$' (There are neither fractions nor subscripts nor superscripts. And yet, this image failed the image comparison test.)

Also unsure about this one; it appears that one square root sign is sized a bit differently?

  • r"$f'\quad f'''(x)\quad ''/\mathrm{yr}$" (Looks like apostrophes affected by the superscript positioning.)

Seems to be the case, but apostrophes are handled by the subsuper method, so this seems okay.

  • test_operator_space (In this check_figures_equal test, the 's' of 'cos' is displaced slightly compared to the reference image, even though 'co' is placed identically. This is strange, because operator kerning should have remained unchanged.)

This is no longer failing; I think this may be one of the rounding issues for the initial character found (and fixed) in #30059.

@QuLogic QuLogic requested a review from anntzer January 28, 2026 09:46
@QuLogic
Member Author

QuLogic commented Jan 28, 2026

TODO: Since the height of fractions is a little bigger, I think I may need to tweak the AutoHeightChar tests to ensure that all sizes are correctly included.

@QuLogic
Member Author

QuLogic commented Jan 29, 2026

I've gone through the sub/super script changes and they seem fine as well, other than a couple of known shortcuts.

I'm only uncertain about the constants; they may have only been chosen to minimize image changes instead of matching TeX. For CM, it was also calculated with our current hinting settings and not the defaults that have been switched to in the text-overhaul branch.

Member

@tacaswell tacaswell left a comment

Modulo rebasing and shortening the constants.

@tacaswell tacaswell moved this from Waiting for other PR to Ready for Review in Font and text overhaul Jan 29, 2026
@QuLogic QuLogic force-pushed the 22852/mathtext-vertical-align branch from b8b8063 to b6c10a0 Compare January 29, 2026 23:21
@QuLogic QuLogic force-pushed the 22852/mathtext-vertical-align branch from b6c10a0 to 75c11c0 Compare January 29, 2026 23:32
@QuLogic
Member Author

QuLogic commented Jan 31, 2026

While looking at the TeX algorithms, I found out about tftopl, which can show some metrics from TFM files. For Computer Modern, I looked at the output for cmsy10:

(FAMILY CMSY)
(FACE O 352)
(CODINGSCHEME TEX MATH SYMBOLS)
(DESIGNSIZE R 10.0)
(COMMENT DESIGNSIZE IS IN POINTS)
(COMMENT OTHER SIZES ARE MULTIPLES OF DESIGNSIZE)
(CHECKSUM O 4110426232)
(FONTDIMEN
   (SLANT R 0.25)
   (SPACE R 0.0)
   (STRETCH R 0.0)
   (SHRINK R 0.0)
   (XHEIGHT R 0.430555)
   (QUAD R 1.000003)
   (EXTRASPACE R 0.0)
   (NUM1 R 0.676508)
   (NUM2 R 0.393732)
   (NUM3 R 0.443731)
   (DENOM1 R 0.685951)
   (DENOM2 R 0.344841)
   (SUP1 R 0.412892)
   (SUP2 R 0.362892)
   (SUP3 R 0.288889)
   (SUB1 R 0.15)
   (SUB2 R 0.247217)
   (SUPDROP R 0.386108)
   (SUBDROP R 0.05)
   (DELIM1 R 2.389999)
   (DELIM2 R 1.01)
   (AXISHEIGHT R 0.25)
   )

Comparing the values, we have:

metric   current         TFM        TFM / x-height
supdrop  0.354296875     0.386108   0.896768125
subdrop  0.354296875     0.05       0.116129182
sup1     0.79716796875   0.412892   0.958976205
sub1     0.354296875     0.15       0.348387546
sub2     0.5314453125    0.247217   0.57418216
num1     1.5             0.676508   1.571246415
num2     1.5             0.393732   0.914475503
num3     1.5             0.443731   1.030602362
denom1   1.6             0.685951   1.593178572
denom2   1.2             0.344841   0.80092206

Remember that we multiply everything by the x-height, so in the last column I've divided the TFM values by the x-height. Also, we use subdrop for both subscripts and superscripts, so the same current value appears in both the supdrop and subdrop rows.
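A short sketch of how the last column can be reproduced from the tftopl output above: each FONTDIMEN value is divided by XHEIGHT (0.430555). The dictionary just copies the printed values; the helper name is hypothetical.

```python
# FONTDIMEN values printed by tftopl for cmsy10 (in design-size units).
CMSY10 = {
    "xheight": 0.430555,
    "supdrop": 0.386108, "subdrop": 0.05,
    "sup1": 0.412892, "sub1": 0.15, "sub2": 0.247217,
    "num1": 0.676508, "num2": 0.393732, "num3": 0.443731,
    "denom1": 0.685951, "denom2": 0.344841,
}


def relative_to_xheight(fontdimen):
    """Express each parameter as a multiple of the x-height, as mathtext expects."""
    x_height = fontdimen["xheight"]
    return {name: value / x_height
            for name, value in fontdimen.items() if name != "xheight"}


for name, ratio in relative_to_xheight(CMSY10).items():
    print(f"{name:8s} {ratio:.9f}")
```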

So, the numbers we have for sub1 and sub2 are fairly close. supdrop / subdrop are quite different from what we have, and sup1 is also a bit bigger, but if we change the code to apply each separately, then it's mostly that superscripts drop by another pixel or so. If we make this change, about 100 test images are affected.
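A simplified sketch of what applying supdrop and subdrop separately means, assuming TeX's make_scripts logic for a box nucleus combined with mathtext's convention of multiplying each constant by the x-height. It only covers the initial drops and the sup1/sub1/sub2 minimums in non-cramped display style; TeX's further x-height clearance checks, the sub/super collision adjustment, and the fact that TeX reads the drops from the script-size font are all omitted. The function name and dict keys are hypothetical.

```python
def initial_script_shifts(nucleus_height, nucleus_depth, x_height, consts,
                          has_sup=True, has_sub=False):
    """Return (shift_up, shift_down) for scripts on a box nucleus (display style)."""
    # Separate drops: superscripts hang from the top, subscripts from the bottom.
    shift_up = nucleus_height - consts["supdrop"] * x_height
    shift_down = nucleus_depth + consts["subdrop"] * x_height
    if has_sup:
        # Non-cramped display style uses sup1 as the minimum raise.
        shift_up = max(shift_up, consts["sup1"] * x_height)
    if has_sub:
        # A lone subscript uses sub1; with a superscript present, sub2.
        minimum = consts["sub2"] if has_sup else consts["sub1"]
        shift_down = max(shift_down, minimum * x_height)
    return shift_up, shift_down


# TFM-derived values from the table above (rounded), relative to x-height.
cm = {"supdrop": 0.896768, "subdrop": 0.116129, "sup1": 0.958976,
      "sub1": 0.348388, "sub2": 0.574182}
# Current behaviour: one shared drop constant for both scripts.
shared = dict(cm, supdrop=0.354296875, subdrop=0.354296875)

# A nucleus two x-heights tall with a superscript: the separate supdrop
# places the superscript lower than the shared constant does.
print(initial_script_shifts(2.0, 0.0, 1.0, shared))
print(initial_script_shifts(2.0, 0.0, 1.0, cm))
```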

For the fraction metrics (num1, num2, num3, denom1, denom2) there is quite a bit of difference in some of them. If we apply those, it affects about 70 test images; mostly, fractions "close up" a bit, which is closer to how TeX does it.

Before we had this (with usetex=True on the right):
Figure_1
and with these updated constants we have:
Figure_2
It's still a tiny bit off, but I think that's because we aren't the ones rendering the usetex text; with #30039 it's even closer.

I haven't found any corresponding TFM files with the same metrics in them for the other fonts we have. I believe modern LaTeX can synthesize these directly from the font, and I do see some embedded MATH tables in there, but I haven't worked out the conversions for those yet.
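One possible starting point for those conversions, as a hedged sketch: the OpenType MATH table's MathConstants has fields that are commonly mapped onto TeX's math-symbol fontdimens. The mapping below follows the usual LuaTeX-style correspondence as far as we know and should be treated as an assumption. In fontTools, the constants are reachable as `font["MATH"].table.MathConstants`, and the fields used here are MathValueRecords whose `.Value` is in font units; the dict and helper names are hypothetical.

```python
# Assumed correspondence between TeX fontdimen names and OpenType MATH
# MathConstants fields (verify against the OpenType MATH specification).
TEX_TO_OPENTYPE = {
    "num1": "FractionNumeratorDisplayStyleShiftUp",
    "num2": "FractionNumeratorShiftUp",
    "denom1": "FractionDenominatorDisplayStyleShiftDown",
    "denom2": "FractionDenominatorShiftDown",
    "sup1": "SuperscriptShiftUp",
    "sup3": "SuperscriptShiftUpCramped",
    "sub1": "SubscriptShiftDown",
    "supdrop": "SuperscriptBaselineDropMax",
    "subdrop": "SubscriptBaselineDropMin",
    "axis_height": "AxisHeight",
}


def math_constants_over_xheight(math_constants, x_height):
    """Divide the mapped MathValueRecord values (font units) by the x-height."""
    return {tex: getattr(math_constants, ot).Value / x_height
            for tex, ot in TEX_TO_OPENTYPE.items()}


# Usage with fontTools (the font path is an assumption):
# from fontTools.ttLib import TTFont
# font = TTFont("SomeMathFont.otf")
# ratios = math_constants_over_xheight(font["MATH"].table.MathConstants,
#                                      font["OS/2"].sxHeight)
```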

I've pushed the changes to the Computer Modern constants as two commits for ease of review.

@tfpf
Contributor

tfpf commented Jan 31, 2026

> This extra +/- rule is odd compared to TeX, but it is explained in the original PR

Vlists containing Hrules don't render vertical spaces correctly. The code for reproduction given in #23763 still works!

> • test_operator_space (In this check_figures_equal test, the 's' of 'cos' is displaced slightly compared to the reference image, even though 'co' is placed identically. This is strange, because operator kerning should have remained unchanged.)
>
> This is no longer failing; I think this may be one of the rounding issues for the initial character found (and fixed) in

I reported this as a bug back then (#23474). Changing the script rendering logic is what exposed it. After merging a fix (#23482), the test passed even with the updated logic.

> I'm only uncertain about the constants; they may have only been chosen to minimize image changes instead of matching TeX.

🎯💯

> supdrop / subdrop are quite different from what we have, and sup1 is also a bit bigger, but if we change the code to apply each separately, then it's mostly that superscripts drop by another pixel or so. If we make this change, about 100 test images are affected.
>
> For the fraction metrics (num1, num2, num3, denom1, denom2) there is quite a bit of difference in some of them.

I remember being puzzled that using the same constants (from either Computer Modern or Latin Modern; I can't remember which one I had checked) didn't yield neatly aligned denominators. The key appears to be the multiplication by the x-height, which didn't occur to me then.

sup1 = 0.79716796875
sub1 = 0.354296875
sub2 = 0.5314453125
supdrop = 0.386108 / 0.430555
Contributor

@anntzer anntzer Jan 31, 2026

You can get more accurate values (effectively a fixed-point representation) from the TFM file itself, as tftopl prints values after scaling and drops some decimal places; e.g., for cmsy10 I get
slant=262144, space=0, space_stretch=0, space_shrink=0, x_height=451470, quad=1048579, extra_space=0, num1=709370, num2=412858, num3=465286, denom1=719272, denom2=361592, sup1=432949, sup2=380520, sup3=302922, sub1=157286, sub2=259226, supdrop=404864, subdrop=52429, delim1=2506096, delim2=1059062, axis_height=262144
(values need to be divided by 2**20).

I extended the Tfm class to read these values at anntzer@a360989 if you want to try your hand at it (we don't need to decide now whether we actually want to integrate this functionality into the Tfm class).
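For reference, these parameters can also be read with only the standard library. This is a minimal, hypothetical sketch (not the Tfm class from the linked commit), assuming the TFM layout documented with tftopl: twelve 16-bit lengths, then the header, char_info, width, height, depth, italic, lig_kern, kern and exten arrays, and finally the param array of signed fix_words with 20 fractional bits. Since all parameters share the 2**-20 scale, ratios such as supdrop / x_height can be taken directly on the raw integers.

```python
import struct

# Parameter names of a TeX math symbols font (22 params), in TFM order.
MATHSY_PARAMS = [
    "slant", "space", "space_stretch", "space_shrink", "x_height", "quad",
    "extra_space", "num1", "num2", "num3", "denom1", "denom2",
    "sup1", "sup2", "sup3", "sub1", "sub2", "supdrop", "subdrop",
    "delim1", "delim2", "axis_height",
]


def read_tfm_params(data: bytes) -> dict:
    """Return raw fix_word parameters (units of 2**-20) keyed by name."""
    # The first 6 words hold twelve big-endian unsigned 16-bit lengths.
    (lf, lh, bc, ec, nw, nh, nd, ni,
     nl, nk, ne, n_params) = struct.unpack(">12H", data[:24])
    if len(data) < 4 * lf:
        raise ValueError("truncated TFM data")
    # Skip the header, char_info, width/height/depth/italic, lig_kern,
    # kern and exten arrays; the param array comes last.
    words_before = 6 + lh + (ec - bc + 1) + nw + nh + nd + ni + nl + nk + ne
    start = 4 * words_before
    raw = struct.unpack(f">{n_params}i", data[start:start + 4 * n_params])
    # zip() stops early if the font has fewer than 22 parameters.
    return dict(zip(MATHSY_PARAMS, raw))


def fix_word_to_float(value: int) -> float:
    """Convert a signed fix_word (20 fractional bits) to a float."""
    return value / 2**20


# Usage (the file path is an assumption):
# with open("cmsy10.tfm", "rb") as f:
#     params = read_tfm_params(f.read())
# ratio = params["supdrop"] / params["x_height"]  # the 2**20 scale cancels
```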

Note that this also allows reading the actual definitions of 1em and 1ex ("quad" and "x_height"), which should be better than the current approach of trying to guess the values (see @tfpf's comment, the linked threads (#23474 (comment)), and https://tex.stackexchange.com/a/98139):

> I reported this as a bug back then (#23474). Changing the script rendering logic is what exposed it. After merging a fix (#23482), the test passed even with the updated logic.

Perhaps this is also worth fixing properly?

Member Author

Ah, that's very useful, and might be something to implement along with what's needed for #31048. For now, I've just put the values you read directly into the file. They're very close to the tftopl values and only affect one image.

For quad / x_height, I can take a look, but we can also make that change independently from this one, I think?

Contributor

Sure (though it'll probably again break all images, so it has to be done in the text-overhaul branch too).

Member Author

BTW, I just noticed that fontTools has a TFM parser; I'm not sure how extensive it is.

Contributor

From a very quick look, it seems complete; you can run python -mfontTools.tfmLib foo.tfm and, in particular, the params above will be printed (again, they are scaled by 2**20, but all the decimal places are given, so you can just multiply them back).

@anntzer
Contributor

anntzer commented Jan 31, 2026

> Vlists containing Hrules don't render vertical spaces correctly. The code for reproduction given in #23763 still works!

It appears to me that this specific issue has been fixed by #30059; can you confirm? (I haven't looked carefully yet at how this interacts with the fraction rendering.)

@anntzer
Contributor

anntzer commented Feb 1, 2026

re: the +/-rule issue. Empirically it looks like the +/-rule can be removed by changing how Rules are being rendered; i.e. the following patch appears to work:

diff --git i/lib/matplotlib/_mathtext.py w/lib/matplotlib/_mathtext.py
index f45e1044bc..bfe3f3ec29 100644
--- i/lib/matplotlib/_mathtext.py
+++ w/lib/matplotlib/_mathtext.py
@@ -1424,7 +1424,7 @@ class Rule(Box):

     def render(self, output: Output,  # type: ignore[override]
                x: float, y: float, w: float, h: float) -> None:
-        self.fontset.render_rect_filled(output, x, y, x + w, y + h)
+        self.fontset.render_rect_filled(output, x, y - h, x + w, y)


 class Hrule(Rule):
@@ -2687,9 +2687,9 @@ class Parser:
             den_clr = max(axis_height - delta + den_shift_down - cden.height, min_clr)
             # Possible bug in fraction rendering. See GitHub PR 22852 comments.
             vlist = Vlist([cnum,                     # numerator
-                           Vbox(0, num_clr - rule),  # space
+                           Vbox(0, num_clr),  # space
                            Hrule(state, rule),       # rule
-                           Vbox(0, den_clr + rule),  # space
+                           Vbox(0, den_clr),  # space
                            cden                      # denominator
                            ])
             vlist.shift_amount = cden.height + den_clr + delta - axis_height

It seems plausible to me that we mixed up the definition of boxes at some point (especially given the mix of upward and downward y's, which always confuses me), though I haven't actually tracked that down.

Edit: I convinced myself that the patch is correct. At the call site

cur_v += rule_height
p.render(output,
         cur_h + off_h, cur_v + off_v,
         rule_width, rule_height)
the box should clearly go from cur_v + off_v to cur_v + off_v - rule_height (this is why cur_v is shifted by + rule_height just before; also at that point some print debugging indicates that y's go downwards), so Rule.render should indeed call render_rect_filled from y - h to y.
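A toy model of that argument (hypothetical code, not mathtext's actual shipping routine): items in a vertical list have only a height, y grows downward, and cur_v is advanced past the rule before it is drawn, as at the quoted call site. Painting from y to y + h then puts the rule one thickness below the slot reserved for it, and the Vbox(0, num_clr - rule) / Vbox(0, den_clr + rule) workaround moves that slot up by exactly the same amount. So the old renderer with the workaround paints the same span as the patched renderer without it. Depths are ignored.

```python
def rule_span(heights, rule_index, fixed):
    """Return the (top, bottom) y-range actually painted for the rule.

    Toy model: y grows downward and cur_v is advanced by each item's height;
    for the rule this happens before drawing, as at the quoted call site.
    """
    cur_v = 0.0
    for i, h in enumerate(heights):
        cur_v += h
        if i == rule_index:
            # Old Rule.render painted y..y+h; the patched one paints y-h..y.
            return (cur_v - h, cur_v) if fixed else (cur_v, cur_v + h)
    raise IndexError("rule_index is past the end of the list")


num, num_clr, rule, den_clr, den = 10.0, 2.0, 0.5, 3.0, 8.0
# Old renderer with the "- rule" / "+ rule" workaround in the Vlist.
old = rule_span([num, num_clr - rule, rule, den_clr + rule, den], 2, fixed=False)
# Patched renderer with the plain clearances.
new = rule_span([num, num_clr, rule, den_clr, den], 2, fixed=True)
print(old, new)  # (12.0, 12.5) (12.0, 12.5)
```

Without the workaround, the old renderer would paint (12.5, 13.0), i.e. one rule thickness lower, which is consistent with rects shifting by about a pixel once the patch is applied.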

@QuLogic
Member Author

QuLogic commented Feb 3, 2026

Happy to see you're still around @tfpf and thanks for answering any questions we have.

> re: the +/-rule issue. Empirically it looks like the +/-rule can be removed by changing how Rules are being rendered; i.e. the following patch appears to work:

If I make this change, it affects 65 test images. Most of them contain not fractions but \overline and square roots. In the case of heavily nested roots (image 53), it actually looks better aligned. There are several tips of the ⎷ that are about a pixel above the corresponding bar. I'd need to check the overlines. Since this almost always shifts rects up by one pixel (after snapping, presumably), I guess this is the origin of the incorrect fraction bar?

@anntzer
Contributor

anntzer commented Feb 3, 2026

> Happy to see you're still around @tfpf and thanks for answering any questions we have.

Yes, I didn't mention this, but: @tfpf: I'm sorry your (nice!) work didn't go in last time; it turned out we needed an even bigger overhaul of many parts to fix all the issues. Thanks for coming back to help. I'll also take this opportunity to thank @QuLogic for keeping all the moving parts together and taking care of the more-or-less complete patches I keep throwing around 😅

> re: the +/-rule issue. Empirically it looks like the +/-rule can be removed by changing how Rules are being rendered; i.e. the following patch appears to work:
>
> If I make this change, it affects 65 test images. Most of them contain not fractions but \overline and square roots. In the case of heavily nested roots (image 53), it actually looks better aligned. There are several tips of the ⎷ that are about a pixel above the corresponding bar. I'd need to check the overlines. Since this almost always shifts rects up by one pixel (after snapping, presumably), I guess this is the origin of the incorrect fraction bar?

I suspect so? Not that I looked.

@QuLogic QuLogic force-pushed the 22852/mathtext-vertical-align branch from b2afc68 to df479e9 Compare February 3, 2026 10:21
Contributor

@anntzer anntzer left a comment

Various small points remain (reorder/rewrite the num/den clearance calculations to follow Knuth more closely, whether to get x-height before or after shrinking, x-height/axis-height detection) but I'll let @QuLogic decide whether he wants to do that in this PR (self-merge is fine I think?) or open separate issues to track them.

@anntzer
Contributor

anntzer commented Feb 3, 2026

By the way, https://www.tug.org/TUGboat/tb27-1/tb86jackowski.pdf has some nice figures explaining fraction layout, and https://www.tug.org/~vieth/papers/bachotex2008/math-font-paper.pdf some more discussion on inferring font parameters (though neither the x-height nor the axis-height...).

@QuLogic
Member Author

QuLogic commented Feb 4, 2026

According to The LaTeX Font Catalogue, DejaVu Sans does not seem to provide math support in LaTeX, so I'm a bit uncertain how to compare our math constants with it. Opening the MATH table with fontTools and picking out the values described in https://ntg.nl/maps/38/03.pdf, it seems that almost all of them are 0 and thus useless.

Instead, I found values in FontForge's TeX tables on a somewhat hidden settings page; dividing them by the x-height from that same page (1120), we get:

metric   current   TeX table   TeX / x-height
supdrop  0.4       790.527     0.705827679
subdrop  0.4       102.4       0.091428571
sup1     0.7       845.824     0.7552
sub1     0.3       307.199     0.274284821
sub2     0.5       632.832     0.565028571
num1     1.4       1529.86     1.365946429
num2     1.5       868.352     0.775314286
num3     1.3       970.752     0.866742857
denom1   1.3       1548.29     1.382401786
denom2   1.1       768         0.685714286

supdrop/subdrop are quite different, but that's because they used to be one constant. num2/num3/denom2 all shrank quite a bit, but the same thing happened with Computer Modern above.

Before, we had:
DejaVu Sans before
and now we get:
DejaVu Sans after

Note that the right side of the figure uses usetex=True with \usepackage{arev} in the preamble. Both fonts descend from Bitstream Vera Sans, but since DejaVu Sans never had math support I'm not sure they're comparable, so take the comparison with a grain of salt, especially since the x isn't really drawn in the same style.

DejaVu Serif has the same metrics, except that its x-height is 1063, so I won't repeat the table. Here are just the before and after images:
DejaVuSerif before
DejaVuSerif after

@anntzer
Contributor

anntzer commented Feb 4, 2026

There is a DejaVu math font (https://www.gust.org.pl/projects/e-foundry/tex-gyre-dejavu-math / https://ctan.org/pkg/tex-gyre-math-dejavu?lang=en), but it's only a serif font :/ Not sure how well the constants would compare?

@QuLogic
Member Author

QuLogic commented Feb 4, 2026

Well, we do have DejaVu Serif as well with almost the same metrics; only the xheight is different at 1063 instead of 1120 (updated above with those images as well.) But it looks like the TeX Gyre fonts are for unicode-math and would require the other LaTeX engines to plug in to our usetex setup, too.

@llohse

llohse commented Feb 4, 2026

> Well, we do have DejaVu Serif as well with almost the same metrics; only the xheight is different at 1063 instead of 1120 (updated above with those images as well.) But it looks like the TeX Gyre fonts are for unicode-math and would require the other LaTeX engines to plug in to our usetex setup, too.

#31064 makes it possible to use the TeX Gyre fonts from within usetex. It currently uses the default "constants" rather than setting them dynamically, but in principle one could read them from the OTF MATH table.

@QuLogic
Member Author

QuLogic commented Feb 4, 2026

For STIX, I again took the TeX tables and divided by the x-height (450):

metric   current   TeX table   TeX / x-height
supdrop  0.4       386         0.857777778
subdrop  0.4       50.0002     0.111111556
sup1     0.8       413         0.917777778
sub1     0.3       150         0.333333333
sub2     0.6       309         0.686666667
num1     1.6       747         1.66
num2     1.6       424         0.942222222
num3     1.6       474         1.053333333
denom1   1.6       756         1.68
denom2   1.1       375         0.833333333

It's largely different in the same places as before, I think.
Before:
STIX before
After:
STIX after

And STIX Sans uses the same font, so the final metrics are the same, but its previous metrics were slightly different:

metric   current   TeX table   TeX / x-height
supdrop  0.4       386         0.857777778
subdrop  0.4       50.0002     0.111111556
sup1     0.8       413         0.917777778
sub1     0.3       150         0.333333333
sub2     0.5       309         0.686666667
num1     1.5       747         1.66
num2     1.5       424         0.942222222
num3     1.5       474         1.053333333
denom1   1.5       756         1.68
denom2   1.1       375         0.833333333

Before:
STIX Sans before
After:
STIX Sans after

QuLogic and others added 14 commits February 4, 2026 19:55
As described in *TeX: the Program* by Don Knuth.

New font constants are set to the nearest integral multiples of 0.1 for
which numerators and denominators containing normal text do not have to
be shifted beyond their default shift amounts at font size 30 in display
and text styles. To better process superscripts and subscripts, the
x-height is now always calculated instead of being retrieved from the
font table (which was the case for Computer Modern); the affected font
constants have been changed.

A duplicate test was also fixed in the process.

These values are taken from `cmsy10.tfm`, and divided by the x-height in
that output to match the scale used in Matplotlib.

Also, split the `supdrop`/`subdrop` constants to match with TeX's
algorithm.

These values are taken from `cmsy10.tfm`, and divided by the x-height in
that output to match the scale used in Matplotlib.
@QuLogic QuLogic force-pushed the 22852/mathtext-vertical-align branch from 39afc75 to a77b250 Compare February 5, 2026 02:18

Projects

Status: Ready for Review

5 participants