@Omikhleia
Member

@Omikhleia Omikhleia commented Jul 13, 2025

Work in progress - progressively back-porting upstream some things from my (private) fork of the bibliography components.

An Overview Of The Changes So Far...

(Slightly) Breaking changes

  • The legacy bibliography implementation is removed.

    The CSL-based bibliography engine, introduced in SILE 0.15.7 at the end of Nov. 2024, foreshadowed the eventual deprecation of the earlier home-grown legacy implementation of a subset of the Chicago author-date style (a very limited subset, and wrong in places).

    It's now the only one supported, so the code base can be simplified and extended with new features (see below).

    This also encompasses the removal of the i18n keys (in language files) that were used by the legacy implementation.

Features

  • \printbibliography supports new cool options:

    • filter: A space separated list of filters to apply to the bibliography, when cited is false.

      It allows filtering by entry type (ex. type-book or not-type-book), by keywords (ex. keyword-linguistics), or issue date (ex. issued-2020 for entries issued in 2020, or issued-2023-2025 for entries issued between 2023 and 2025).

      Entry types and keywords are to be understood in terms of CSL here (not BibTeX).

      Besides those simple built-in filters (which, in my experience at least, cover most use cases), a very low-level (and somewhat internal) Lua API is provided to allow users to implement their own named filters.

      (These filters are allowed only when cited is false because they are meant to filter the complete bibliography. Filtering cited entries would make tracking and resetting more complex, for little gain in normal use cases, in my opinion, at least for now, with things still in flux.)

    • related: A boolean option to include "related" entries in the bibliography (in an indented block under the main entries).

      With a Bib(La)TeX bibliography, it corresponds to entries which have a related field (a comma-separated list of entry keys). So, say, if you have a book B and an article A (e.g. a review, commentary, etc.) with a related field containing the key of B, A will be listed nested under B in the bibliography (whether or not A is itself cited).

  • By default, ISBNs and ISSNs in bibliographies can now break at dashes, which is useful for justification and line breaking (as these elements are quite long).

    This can be disabled by setting the breakISBN option to false in the bibliography options.

  • Bib(La)TeX support improvements:

    • The ids field for citation key aliasing is now supported.

    • Several fields (ids, keywords, related, and xdata) are interpreted as lists of comma-separated values, as they should be.

  • \nocite elements are now accepted in \cites constructs for grouped citation, e.g., \cites{\cite[key1]\nocite[key2]}. It's a convenience, allowing one to "skip" entries in grouped citations, while keeping them in the bibliography of cited works, without much rewriting of that block.

Fixes

  • Sorting is more consistent in its attempt at a strict weak ordering, avoiding random exceptions.

  • Superscript "folding" is now applied to ordinal numbers in bibliographies, as it was already done for other terms.

  • State is better handled across multiple bibliography runs, avoiding issues with author substitution (e.g., dashes) in per-chapter bibliographies (or when the new "related" references are enabled).

  • Smart straight quotes transformation plays better with italic text (when this extension is enabled, which is the default).

  • Rework how sorting and substitutions are done in bibliographies. This is an attempt at fixing #2283 ("Bibliography CSL 'substitute' element does not always work as expected"), but something still seems amiss with certain styles. The fix may be only partial, either because I misinterpreted the specification or because of a real bug; I haven't been able to pinpoint it, despite hours of attempts 😿. It does, however, repair something that I broke in SILE 0.15.13, and the code should be much better as now refactored.

Other changes

  • Welcome the new CslProcessor class.

    It's a big refactor, splitting most of the CSL processing logic out of the bibtex SILE package, which now just implements the command "layer" for SILE as a typesetting engine, delegating the work to that processor for the bibliography processing.

    This also allows using the CSL processing logic outside of SILE's regular processing flow (e.g., using SILE as a "Lua-on-steroids" toolkit with ICU, etc. but without going through the typesetter, PDF output, etc.) [1]

  • Improved in-code documentation, with better LDoc annotations.

Closes #2126

Closes #2085

Regarding #2283 - the changes here might be a partial / imperfect fix, as noted above, but the code makes more sense now.

Footnotes

  1. For instance, I am using it to generate HTML versions of my bibliographies, soon to be released. It's still a bit rough, but it paves the way for better abstractions.

…sform

Edge case: an article title for, say, a review, may contain a book title
in italic (About _reviewed book_ by John Doe) and that book title may
start with double quotes ("Something": Other).
It is rare, but it does happen: some authors start with such quotations.
With a style using quotes=true, quotes are added and the existing quotes
have to be shifted (“About _‘Something’: Other_ by John Doe”).
@Omikhleia Omikhleia self-assigned this Jul 13, 2025
@Omikhleia Omikhleia requested a review from alerque as a code owner July 13, 2025 09:31
@Omikhleia Omikhleia marked this pull request as draft July 13, 2025 09:31
…files

Fields 'ids', 'keywords', 'related' and 'xdata' can consist of a single
value, or multiple comma-separated values.
We now always resolve them to a list, for easier use afterwards.
We also check at entry resolution that the 'related' field points to
existing keys, and warn if it does not.
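
The resolution step described in this commit message can be illustrated with a minimal Python sketch. It is not the actual Lua code from this PR: the names `split_list` and `resolve_entries`, and the dict-of-dicts representation of entries, are assumptions made for illustration only.

```python
# Hypothetical sketch: normalize list-like fields and check 'related' targets.
LIST_FIELDS = ("ids", "keywords", "related", "xdata")

def split_list(value):
    """Split a comma-separated field value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]

def resolve_entries(entries):
    """Return (resolved entries, warnings) for a {key: {field: value}} mapping."""
    resolved = {}
    for key, fields in entries.items():
        fields = dict(fields)  # copy, so the input is left untouched
        for name in LIST_FIELDS:
            if name in fields:
                # A single value becomes a one-item list.
                fields[name] = split_list(fields[name])
        resolved[key] = fields
    warnings = []
    for key, fields in resolved.items():
        for target in fields.get("related", []):
            if target not in resolved:
                warnings.append(f"Entry '{key}' relates to unknown key '{target}'")
    return resolved, warnings
```

Downstream code can then iterate over these fields without checking first whether it received a single string or several values.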
BREAKING CHANGE: The new CSL-based bibliography engine was introduced in
SILE 0.15.7 at the end of Nov. 2024, foreshadowing the eventual deprecation
of the earlier home-grown legacy implementation of (a subset of) the Chicago
author-date style.
It's now removed, so we can move forward, simplifying the code base and
extending the package with new features.
The bibtex.style setting is however kept, so as not to break documents
or modules that were setting it (to 'csl' or 'chicago'), but it does
nothing now, and the CSL implementation is always used.
It started as a refactor, but ends up as a feature:
Bibliography processing is now available in the CslProcessor class.
The latter may be used outside of SILE's regular processing flow
(that is, using SILE as a Lua-on-steroids toolkit with ICU, etc. but
without going through the typesetter etc. - for instance, I am using
it to generate HTML versions of my bibliographies. It's still rough,
but paves the way for better abstractions).
The package itself now just implements commands for SILE as a
typesetting engine, delegating to that processor and just containing
the very specific code layer for processing in documents.
Some simple built-in filters are provided, making it possible to filter the
complete bibliography by entry type, keywords or issue date.

N.B. Some code was also refactored, and in-code LDoc comments and annotations
have notably been improved.
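
To make the naming scheme of the built-in filters concrete, here is a small Python sketch that parses names such as type-book, not-type-book, keyword-linguistics, issued-2020 and issued-2023-2025. It is an illustration only, not the Lua API from this PR: the entry layout (`type`, `keywords`, `year`), the function names, and the choice to require all listed filters to match are assumptions.

```python
import re

def make_filter(name):
    """Turn a filter name into a predicate over a (hypothetical) entry dict."""
    m = re.fullmatch(r"(not-)?type-(.+)", name)
    if m:
        negate, csl_type = bool(m.group(1)), m.group(2)
        return lambda entry: (entry.get("type") == csl_type) != negate
    m = re.fullmatch(r"keyword-(.+)", name)
    if m:
        keyword = m.group(1)
        return lambda entry: keyword in entry.get("keywords", [])
    m = re.fullmatch(r"issued-(\d{4})(?:-(\d{4}))?", name)
    if m:
        first = int(m.group(1))
        last = int(m.group(2) or m.group(1))  # single year: first == last
        return lambda entry: entry.get("year") is not None and first <= entry["year"] <= last
    raise ValueError(f"Unknown filter: {name}")

def apply_filters(entries, spec):
    """Keep entries matching every filter of a space-separated list (assumed AND)."""
    filters = [make_filter(name) for name in spec.split()]
    return [entry for entry in entries if all(f(entry) for f in filters)]
```

For example, `apply_filters(entries, "type-book issued-2023-2025")` would keep only the books issued between 2023 and 2025 under these assumptions.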
…locks

Introducing <bibParagraph> wrapping entries, instead of a <par> in-between.
The way it's implemented does not change anything as far as SILE processing
is concerned, but makes it easier for other output processors (e.g. to HTML,
where one would want explicit start-end constructs to mark paragraphs and
translate them to anything suitable, <p> or <div>, with appropriate CSS).
… entries

It's fairly interesting to be able to list related entries in an indented
block under the main entries: typically reviews, translations, etc.
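
As an illustration of the nesting rule (an entry whose related field names another entry is listed under that entry), here is a minimal Python sketch. It is not the code from this PR; `nest_related` and the dict layout are hypothetical, and the related field is assumed to be already resolved to a list of keys.

```python
def nest_related(entries):
    """Map each main entry key to the keys of the entries nested under it."""
    children = {}
    for key, fields in entries.items():
        for target in fields.get("related", []):
            if target in entries:  # unknown targets are ignored in this sketch
                children.setdefault(target, []).append(key)
    return children

# A review (A) pointing at the book it reviews (B) ends up nested under B.
bibliography = {"B": {"type": "book"}, "A": {"type": "review", "related": ["B"]}}
nested = nest_related(bibliography)
```

A renderer would then print each main entry followed by an indented block holding the entries listed for it in `nested`.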
@Omikhleia Omikhleia force-pushed the csl-july-2025 branch 2 times, most recently from 5858d0b to 5188da2 Compare July 26, 2025 22:10
Introduces a push/pop stack to preserve state across multiple
bibliography invocations.

This avoids rare issues with author substitution (e.g., dashes)
in per-chapter bibliographies (e.g. same authors end one and start
the next), or more frequent issues when "related" references are
enabled and disrupt sequences due to their "nested" behavior.

There weren't many internal variables (mainly current authors) to save
and restore, and save the day. But using a stack is cleaner and more
future-proof.
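
The idea can be sketched in a few lines of Python. This is not the implementation from this PR: the class, its single saved variable, and the `---` placeholder are assumptions chosen to show why saving and restoring the state matters for author substitution.

```python
class SubstitutionState:
    """Tracks the previously rendered authors, with a save/restore stack."""

    def __init__(self):
        self._stack = []
        self.previous_authors = None

    def push(self):
        # Save the current state and start afresh (new bibliography or nested block).
        self._stack.append(self.previous_authors)
        self.previous_authors = None

    def pop(self):
        # Restore the state as it was before the matching push.
        self.previous_authors = self._stack.pop()

def render_authors(state, authors, dash="---"):
    """Replace authors by a dash when they repeat those of the previous entry."""
    shown = dash if authors == state.previous_authors else authors
    state.previous_authors = authors
    return shown

state = SubstitutionState()
first = render_authors(state, "Doe, J.")
state.push()                                # enter a nested "related" block
nested = render_authors(state, "Roe, R.")
state.pop()                                 # back to the main sequence
second = render_authors(state, "Doe, J.")   # still substituted by the dash
```

Without the push and pop around the nested block, the second main entry would compare itself against the nested entry's authors and lose its substitution.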
…iographies

We had the "superfolding" on terms implemented in the CSL engine class, but
not on ordinal numbers (whose terms were left unprocessed).
Code refactored to move that logic earlier, into the CSL locale class, so that
it is now applied to all term values.
…rder

There were cases where depending on citation style the sorting function
could lead to identical keys, and not provide a strict weak ordering,
resulting in a random exception.
The code is also slightly refactored and simplified, now that PR 2105 has been
merged (as of SILE 0.15.6).
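
One common way to keep a comparison consistent when two entries produce identical sort keys is to fall back on a final tie-breaker, such as the original position. The Python sketch below shows that technique only; it is not the Lua code from this PR, and the entry layout (`id`, `sort_keys`, `position`) is hypothetical. Python's `sorted` is stable anyway, so the tie-breaker here simply mirrors what a comparator for an unstable sort would need.

```python
from functools import cmp_to_key

def compare_entries(a, b):
    """Three-way comparison: negative if a sorts first, positive if b does."""
    for key_a, key_b in zip(a["sort_keys"], b["sort_keys"]):
        if key_a != key_b:
            return -1 if key_a < key_b else 1
    # All keys identical: fall back on the original position, so that two
    # distinct entries never compare inconsistently.
    return a["position"] - b["position"]

def sort_entries(entries):
    """Return entry ids in sorted order, ties broken by input order."""
    indexed = [dict(entry, position=i) for i, entry in enumerate(entries)]
    ordered = sorted(indexed, key=cmp_to_key(compare_entries))
    return [entry["id"] for entry in ordered]
```

With this fallback, the comparison only returns zero when an entry is compared with itself.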
…liographies

Enabled by default:
These fields might be problematic for justification, and the default line breaking
algorithm may not recognize these dashes as word boundaries when occurring in the
middle of the number.
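
As a rough illustration of the idea, the Python sketch below marks each dash inside such a number as a legal break point by appending a zero-width space (U+200B). This is an assumption made for the example; how the package actually inserts break opportunities in SILE is not shown here, and `breakable_number` is a hypothetical name.

```python
import re

ZWSP = "\u200b"  # zero-width space, used here as a break opportunity

def breakable_number(number):
    """Allow a line break after each dash that sits between two characters."""
    return re.sub(r"(?<=\w)-(?=\w)", "-" + ZWSP, number)

isbn = breakable_number("978-2-01-234567-8")
```

The visible text is unchanged; only the places where a line may end are added.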
Correct bib-box-for-indent support (used with styles such as ieee.csl, with
second field alignment).
…ographies

In the commits in PR 2230, that went in SILE 0.15.13, I tried to fix an issue
regarding the suppression of empty macros (impacting substitutes in names).
It turned out to cause other issues (2283) and badly break some previously
working styles (Chicago, MLA) with some entries...
This is an attempt at repairing them. But something still eludes me in the
CSL specifications, and while it does seem to work now with the styles we
previously tested, using the new Chicago styles recently updated in the CSL
repository shows that something is still amiss. These updated styles have
a lot more nested macros with conditionals than previously, and I haven't
pinpointed where the misinterpretation or raw bug could lie.
Yet, the code as now refactored makes more sense, with fewer side effects
and better scoping, so at least it's worth having, albeit perhaps as a partial
fix...
@Omikhleia
Member Author

For the record:
My ETA for my work on this topic is around end of August 2025 (when I'll use the new features and fixes for a new edition of one of my books). I'm mostly done and could have moved the PR to "Ready", but I have a short break over the next two weeks, and will check this and other topics upon my return.

It allows one to skip entries in grouped citations, while keeping them in the
bibliography of cited works, without much rewriting.
Since the removal of the legacy bibliography implementation, these two
tests were deprecated.
One (bug-2054.bib) is clearly obsolete (checking a side-effect of fluent
in bibliography internationalization, but the new implementation uses
CSL).
The other leads to a wholly different output (really following Chicago
Author-Date now) but would have been hard to adapt, and was also fairly
incomplete. A different testing approach would be required, as we move
to a more layered and modular approach.
The newer CSL-based implementation uses, by definition, the terms and
rules from the bibliography style and locale.
@Omikhleia Omikhleia marked this pull request as ready for review August 21, 2025 18:48
@Omikhleia Omikhleia requested review from a team as code owners August 21, 2025 18:48
@Omikhleia Omikhleia added this to the v0.15.14 milestone Aug 21, 2025
@Omikhleia
Member Author

My ETA for my work on this topic is around end of August 2025

Main objectives achieved, and PR moved to Ready.

(I had two other very secondary objectives in my wish-list. I'll probably open dedicated issues for them. This PR here is already a nice baby.)

(Le Dragon de Brume will soon communicate regarding a book update, and an online web site. Both were made with SILE, and the code from this PR.)

I am suggesting this should go in 0.15.14, nearly one year after the work that eventually went in 0.15.7 was initiated. True, it's slightly breaking, assuming some people might have relied on the legacy bibtex implementation for anything serious... I cannot believe such users really exist; and even so, the update should be quite transparent, or at least easy and understandable.

Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

  • bibliography entry filtering strategy
  • Support ids field in bibtex files

2 participants