-
-
Notifications
You must be signed in to change notification settings - Fork 102
refactor(types): Code cleaning and LDoc comments on node types #2256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| end | ||
|
|
||
| nodetypes.zerovglue = pl.class(nodetypes.vglue) | ||
| --- A zerovglue is a standard vglue with zero height and depth. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This zerovglue class is dubious
- Only used in
tests/class/testsidenote... - But it's not having a `.type = "zerovglue",
- so it's really equivalent to
SILE.types.node.vglue(...)(it doesn't even an an init to ensure it's not called with dimension parameters that wouldn't make it non-zero) - a
is_zerovgluewon't be defined, yet the baseboxuses it (and that's the only occurrence in the code) - we don't have a companion zeroglue for horizontal mode ;)
- so it's really equivalent to
Likely not needed at all --> Suggestion = removal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tentatively agree with immediate deprecation and eventual removal on one condition: zerohbox can go away with it. Conceptually they seem the same even if one is more utilized than the other, they just break the immediate sibling relationship of other nodes by stuffing something in the queue that later processing can short-circuit and not bother looking deeper. If this is a valid use case in horizontal mode, it should be valid in vertical mode too. The processing logic isn't that different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, zerohglue is not really used for anything.
Unfortunately zerohbox is used, and I haven't yet passed the point I understand why it is needed. (Sometimes, just to kill parindent, for instance... So something else is deeply broken as far as concepts are concerned, but we are not yet there... (And I would tend to think that vertical processing is a very different beast. We don't need to kill all monsters on the same day, says Grandpa (returning from the Land of Maths, he ought to know that some beasts are slow to slay properly 🤣 )
| -- FIXME TODO | ||
| -- Here we cannot do: | ||
| -- nodetypes.vkern.type = "vkern" | ||
| -- vkern.type = "vkern" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I really want to know what "explicit"/"discardable" mean semantically.
A proper vkern makes sense the same way we have a proper kern.
(I'll add my findings and a bit of discussion below later EDIT added in another comment)
| -- and "height" is always in the "pageDirection" | ||
| -- | ||
| -- @treturn SILE.types.length The width or height of the box, depending on the orientation. | ||
| function box:lineContribution () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never understood this despite multiple attempts.
- "mistfit" is a weird concept
- When this method is used in
hbox:scaledWidththe second argument is always self.width.
Franky incomprehensible, one can never know what is the expected code (all the more with such self-targeting calls toSU.rationWidth, another very obscure concept everywhere...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Misfit seems like a case of bad naming. This is only used for CJK tate, and I assume means an element whose internal contents and hence dimensioning are rotated 90 degrees from the current flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we renamed this property from misfit to orthogonal or crossways?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| -- | ||
| -- @type migrating | ||
| -- @usage | ||
| -- SILE.types.node.migrating({ material = ... }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class can be passed anything (no constructor to enforce anything), and is thus very loosely defined.
What material is supposed to be is also unclear (a list of vboxes? insertions-specific nodes)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- typesetter.pushBack passes a list of 1 vbox (?)
- insertions use a vbox-derived SILE.types.node.insertion (which is not defined here)
This has code smell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
smell stink.
Is it possible this should be at attribute some other node type? All nodes perhaps, or only on a special wrapper node type that we're missing kind of like your parbox—basically just a nesting structure by which to wrap another set of sequential nodes that would otherwise have been in the queue at the same level?
| end | ||
|
|
||
| getmetatable(nodetypes.unshaped).__index = function (_, _) | ||
| getmetatable(unshaped).__index = function (_, _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this?
The commented code is not helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. It appears to only be used internally for dubious use cases: passing auto generated spaces and hyphens? Maybe because those things are being generated before the surrounding content is shaped so the context isn't ready yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea. Would be tempted to have it removed, and see where SILE breaks, if it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does. A vast majority of our regression suite outputs blank space if I remove this node type and adjust anywhere unshaped pushes happen to push an hbox node. Notably much of the BiDi segmenting falls apart, your italic correction code would be removed, etc. Somehow most of our nodes are starting out as unshaped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow ^^
types/node.lua
Outdated
| -- The depth of the vbox is the depth of the last box. | ||
| -- | ||
| -- @tparam box|table box A box or a list of boxes to add to the vbox. | ||
| function vbox:append (box) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the parameter is sometimes a single box, sometimes a vboxlist. Unclear API?
It could take both, but the check below is that if it is a box, it can be unboxed. Lot's of undocumented assumptions.
- "insertions" use a vboxlist (material to split) or a box (via its own vbox-derived class).
- If not mistaken, the only other use case is
pagebuilder.collateVboxeswith a vboxlist. This method by the way has nothing to do with the pagebuilder. It's a fairly dumb convenience, and see above comment on the constructor.
| --- Constructor | ||
| -- | ||
| -- @tparam table spec A table with the properties of the vbox. | ||
| function vbox:_init (spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember having suffered here. As with all of these classes, one can pass almost anything and the box class makes it a field. So if called with SILE.types.node.vbox({ nodes = ... } then the nodes will be set after the parent class super call. Ok... and it's indeed used this way with lines in the typesetter.
- So why does pagebuilder.collateVboxes uses an empty constructor and an explicit append()?
- ... Oh wait, the depth/height logic then differs (here, the max, but with append it's "the box height is the height is the total height of the content, minus the depth of the last box."
If the discrepancy is legit (?) then some explanation is needed, 'cause again one can't make any sense of it currently.
| -- @type zerohbox | ||
| local zerohbox = pl.class(hbox) | ||
| zerohbox.type = "zerohbox" | ||
| zerohbox.value = { glyph = 0 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondered what this standalone use of glyph in this module here actually is...
I think it comes from the pango shaper (regex-searched for occurrences, so maybe I haven't been exhaustive): It's shaper:shapeToken does return a "glyph" (vs. value.glyphString etc. with harfbuzz).
But I've a serious concern here. Nodes ought to be shaper-agnostic, at least at some level. Or is every potential shaper implementation going to put its crap in the nodes? And how can we expect using them correctly in packages etc. then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly why I've been nursing along the Pango shaper at various points just to verify that our shaper API is sane. I don't know anything about this particular case, but in order to be shaper agnostic our nodes do need enough information attached to please any reasonably sane shaper, no?
If we were to drop down to just one shaper our API surface area would be smaller, but SILE itself would be far less robust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SILE itself would be far less robust.
Hmm. I am not convinced the current state of art is "robust" in any sense I can give to that word. ^^
| -- Derived from `hbox`. | ||
| -- | ||
| -- Properties is_zerohbox (and convenience is_zero) and is_box are true. | ||
| -- Note that is_hbox is NOT true: zerohbox are used in a specific context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"in a specific context" = I wrote this here but I have no idea when a zerohbox is used and should differ from an hbox with 0-dimensions !!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And no specific constructor, so if passed with dimensions etc. it's a zerohbox with dimensions. Maybe a zerohbox is not a hbox with zero width, height and depth, after all, but something wholly alien. Who knows what was the intent of the Architect.
| local nnode = pl.class(hbox) | ||
| nnode.type = "nnode" | ||
| nnode.language = "" | ||
| nnode.pal = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me die in utter pain 🤣 What is this? Ahhh might have found it. The Pango shaper again! (See another comment relating to it.) OMG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the next review we should do is the shapers, to see if the API can be made more generic, no?
If we can't avoid enriching/annotating (n)nodes with shaper-specific stuff, at least we ought to put them in a subobject. nnode.pal --> nnode._shaper_specific.pango.pal or anything similar so one knows that the data is somewhat internal and tied to a shaper... for reasons...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there even builds of SILE with the pango shaper? 'Cause keeping a software maintainable is also the take, sometimes, the hard but necessary decision to cut some branches in the tree, if they never blossomed.
I am not even sure it is useful as an "example" of anything, since it probably breaks everywhere it can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I periodically test with Pango yes. For a couple reasons, first to keep the existing libtexpdf outputter API sane. If changes I make there don't readily apply to the Cairo outputter then the API shape might be wrong. I do periodically check. Also, Pango can be used with a GTK canvas for live previews of the state and ever incremental output. I've played with this some with an eventual GUI project in mind.
I just made some small tweaks from recent changes so the Cairo/Pango combo will work in v0.15.11 should work again — with the caveat that I could only get the Pango font loader to work on an ancient version of Pango. It returns no glyphs when pained with current versions of Pango and I don't have time to debug that right now.
| -- The notion of parent is used by the hyphenation logic and discretionaries. | ||
| -- | ||
| -- @treturn table A list of nnodes representing the shaped text. | ||
| function unshaped:shape () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In relation to that discussion
Unless mistaken, that the only method (that is, besides SILE.outputter in output routines, a topic worth discussing separately) when we invoke an external (global) object currently from a node, the SILE.shaper. The later will eventually need the language.
|
|
||
| --- Constructor | ||
| -- | ||
| -- @tparam table spec A table with the properties of the unshaped node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again those "undefined" spec.
The only meaningful use is SILE.types.node.unshaped({ text = "string", options = { ... }})
We should make it clear what all these nodes take as contructor argument. Again I understand we can pass any key, and they'll end up in the instance via the generic box... But it's not clear what to expect, and it's certainly not a good object-oriented design, if any node can be abused in so many ways.
|
|
||
| function unshaped:_init (spec) | ||
| self:super(spec) | ||
| self.width = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above. The only reason we may have to clear self.width is if a width was passed to the super constructor. The box class is too clever for its own.
| box.width = nil | ||
| box.misfit = false | ||
| box.explicit = false | ||
| box.discardable = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To read last 🤣
So we have this concept of explicit/discardable.
The explicit property is only set "directly" from the calling code (direct affectation, or sometimes passed to the constructor).
Two booleans = four combinations...
Let's try to see what we have:
| Node | discardable | explicit |
|---|---|---|
| normal glue | true | false |
| explicit glue | false | true |
| kern | false | false |
| normal vglue | true | false |
| explicit vglue | false | true |
| vkern | false | false |
| penalty | true | false |
| other box | false | false |
| (insertion) | true | false |
Though 4 combinations, but (true, true) is not possible apparently, so 3 combinations only.
I thought I knew my old (TeX-derived) box and glue model, but can't find something similar in the TeXBook, unless it has a very different name. What does it mean and imply?
Why are smallskip/medskip etc. "explicit" for instance?
In the code base, discardable nodes are sometimes skipped. Sounds logical.
Sometimes explicit nodes are skipped, regardless of being discardable or not. I don't get the concept here. What I am sure of is that parts of the code (e.g. the typesetter) does things depending on these properties, and other parts (e.g. the pagebuilder) don't even test exactly the same thing. Either this is very clever, or we are very lucky. I don't want to rely on some uncertain luck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes explicit nodes are skipped
E.g. at the top of a page.
Hence the necessary TeX-like weird consequence: having to \hbox{}\par\skip[...] (or vfill, etc.). Which is not without side-effect, depending on the linespacing strategy (as the empty hbox ends in a vbox line of its own, indeed of 0-height/depth, but tell that to something grid-based for instance...)
What I want to say here is that the desperately obscure design is where I failed at all my attempts of proper page vertical spacing, new page builder strategies and grid-baseline strategies). Might be a skill issue, I can hear it. Might be it's just so hard to guess what this things are supposed to do.... that after all, I am right to feel desparate too, vs. a solution that kinda worked, despite all its \expandafter \csname \z@ ugly constructs. 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The explicit property is only set "directly" from the calling code (direct affectation, or sometimes passed to the constructor).
Or... in script, \skip[discardable=false] = make me an explicit glue. I'm totally lost, if one boolean parameter is enough, then why two boolean properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An update from reading more of the code from the page builder. It would seem:
- discardable vglues are removed at the top and bottom of a frame
- explicit vglues are removed at the top of a frame but kept at bottom
I've not checked yet whether horizontal glues work in a similar way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And another update, reading the TeXBook ch. 14 p. 95: "The last four types (glue, kern, penalty, and math items) are called discardable, since they may change or disappear at a line break (...)'. The math items here refers to math-on/math-off kerns, let's not discuss these here... The point is that in TeX-lingvo, a kern is "discardable". But not in SILE -- our kern has "discardable false".
An implication is that the "dirtry tricks" (TeXBook app. D) for "hanging punctuation" (p. 395) do not work in SILE. Trying to implement a "naive" punctuation protrusion based on that trick (albeit not so naively, but adding "clever" kerns in the Unicode node breaker)... Of course, one could tweak computeLineRatio() with a hack that would be working...
But that's a bad hack and the devil in the the detail -- let's say (I might be wrong but) SILE doesn't seem to follow the exact rules from the TeXbook here either for the Knuth-Plass line breaking.
Anyhow, the conclusion would then be that SILE's notion of "discardable" node here is therefore also different from that of TeX. Which, as usual in such a situation, might be legit or not... But with no explanation and discussion of the original intents, it's real hard to understand the expectations and the rationale behind all these stubble differences.
I keep running into dead ends in my investigation.
Do you think it would make sense for us to revisit the underlying concepts entirely, and suggest a rebuild from scratch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! But while the TeXbook mentions kerns as being discardable (ch. 14), but only consecutive kerns are ignored (app. D). So they are sort of non-discardable, unless consecutive. Er. Knuth's wording is rather opaque here.
69fea71 to
82a0370
Compare
|
There are some obvious wins in commits here already, and just as obvious areas that still need attention. Unfortunately I don't think all the open questions are going to get resolved before I have a chance to cut a release, likely tomorrow. I'll likely cherry pick some bits from here so they make the cut for v0.15.11, but keep this PR with discussion running. |
82a0370 to
a320c1d
Compare
|
BTW the misfit→orthogonal commit here isn't because that idea is finalized (and it it were to be it probably needs a regression shim) but just because I needed something to keep this PR and review comment chains alive after doing a partial merge of the branch locally. |

FWIW @alerque Some attempt at documenting and slightly reshuffling the code in
SILE.types.node...WIP as doing this, many comments popped up, I'll annotate the PR later, there's a lot of code smell that I'd like to sort out "file by file"