sclang: Unicode mathematical operators #7425

Draft
JordanHendersonMusic wants to merge 3 commits into supercollider:develop from JordanHendersonMusic:topic/unicode-operators

@JordanHendersonMusic JordanHendersonMusic commented Mar 20, 2026

Contributor

Just a draft right now to demonstrate the new lexer.

This PR depends on that one; the actual change here is about 5 lines of code.

Purpose and Motivation

Adds mathematical operators to the list of supported binary operators.

The sc code here is just an example. I don't intend it seriously, but it could stay?

The real conversation here is how to support Unicode. The ICU has a huge document on this subject, and it's really complicated. I suggest we selectively add bits as we need them, rather than trying to add everything!

For example, we probably want to forbid things like − (Unicode minus sign, U+2212) because it looks the same as - (normal minus sign).

One objection to this approach is that the codepoints added here are always binary operators. This means you can't write ∑(array); instead you'd have to write (∑)(array), array ∑ nil, or array ∑ (_ + _).

I've added the Mathematical Operators block and Supplemental Mathematical Operators block to the binary operator list, you can find these here: https://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode
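As a rough illustration of what "adding two blocks to the binary operator list" amounts to, here is a minimal Python sketch of a codepoint classifier. It is not the PR's C++ code: the function name and the deny-list are hypothetical, and excluding U+2212 only reflects the suggestion above about look-alike characters, not something the PR is known to do. The block ranges (U+2200 to U+22FF and U+2A00 to U+2AFF) are the standard Unicode ranges for these two blocks.

```python
# Hypothetical sketch: classify a single character as a Unicode binary operator.
MATH_OPERATORS = range(0x2200, 0x2300)               # Mathematical Operators block
SUPPLEMENTAL_MATH_OPERATORS = range(0x2A00, 0x2B00)  # Supplemental Mathematical Operators block

# Assumed deny-list of characters that look like ASCII operators.
CONFUSABLE = {0x2212}  # MINUS SIGN, visually the same as '-'


def is_unicode_binary_operator(ch: str) -> bool:
    """Return True if ch is in one of the two blocks and not deny-listed."""
    cp = ord(ch)
    if cp in CONFUSABLE:
        return False
    return cp in MATH_OPERATORS or cp in SUPPLEMENTAL_MATH_OPERATORS
```

A selective approach like this keeps the decision to a table lookup per codepoint, so further ranges or exclusions can be added one at a time.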

We may also wish to consider how the user types these. Julia's VS Code extension has a lovely Unicode popup when you type \, e.g. turning \sum into ∑. We might want to do the same in scide for \\ ?

This works as expected.

Note that ∋∀ is a single operator, not two.

// all less than 5
[1, 2, 3, 4] ∀ (_ < 5)

// any equal 2
[1, 2, 3, 4] ∃ ( _ === 2)

// includes 3
[1, 2, 3, 4] ∋ 3

// does not include 2
[1, 2, 3, 4] ∌ 2 
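
The note that ∋∀ is one operator rather than two suggests that a run of adjacent operator codepoints is read as a single token. The Python sketch below illustrates that idea only; it is a toy splitter with hypothetical names, it ignores ASCII operators and every other sclang token rule, and it is not how the PR's lexer is implemented.

```python
def is_operator_char(ch):
    """True for codepoints in the two mathematical operator blocks."""
    cp = ord(ch)
    return 0x2200 <= cp <= 0x22FF or 0x2A00 <= cp <= 0x2AFF


def split_operator_runs(source):
    """Split source into (kind, text) pairs.

    Consecutive operator codepoints are merged into one "operator" token;
    whitespace separates tokens and is dropped.
    """
    tokens = []
    i = 0
    while i < len(source):
        if is_operator_char(source[i]):
            j = i
            while j < len(source) and is_operator_char(source[j]):
                j += 1
            tokens.append(("operator", source[i:j]))
            i = j
        elif source[i].isspace():
            i += 1
        else:
            j = i
            while (j < len(source) and not source[j].isspace()
                   and not is_operator_char(source[j])):
                j += 1
            tokens.append(("other", source[i:j]))
            i = j
    return tokens
```

Under this rule, `a ∋∀ b` yields one operator token, while `a ∋ ∀ b` yields two because the space ends the run.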

Types of changes

  • New feature

To-do list

  • Code is tested
  • All tests are passing
  • Updated documentation
  • This PR is ready for review

This PR creates a new lexer, replacing the old one with an easier-to-read and more reusable alternative.

Fuzz testing against the old lexer is also provided and has been run for a while, until no new issues were found.
@JordanHendersonMusic JordanHendersonMusic added the comp: sclang sclang C++ implementation (primitives, etc.). for changes to class lib use "comp: class library" label Mar 20, 2026
@JordanHendersonMusic JordanHendersonMusic changed the title Unicode mathematical operatos sclang: Unicode mathematical operatos Mar 20, 2026
@JordanHendersonMusic JordanHendersonMusic changed the title sclang: Unicode mathematical operatos sclang: Unicode mathematical operators Mar 20, 2026
@telephon

Member

That would be lovely. This means that symbols and strings also support unicode?

@JordanHendersonMusic

JordanHendersonMusic commented Mar 20, 2026

Contributor Author

They already do! Symbols only when declared like 'abcd', not \abcd.

Strings work by gobbling everything until ' " ' is encountered, same with quoted symbols. I could add this to the slash symbol, but the logic there is already weird and complex so I'm reluctant to do so.

@telephon

Member

Are you sure? If you call

"∀x∃y"[2] == "∃"[0]
"∀x∃y"[2] == $∃

Do you get true?

@JordanHendersonMusic

JordanHendersonMusic commented Mar 21, 2026

Contributor Author

No, our string class isn't a text container, it's a byte container. Changing this would be a huge breaking change. What we could do is introduce a new string literal that turns into a text object, or make the old string syntax turn into a byte array. The former means people have to use this new string class everywhere; the latter means everyone needs to update their code. The latter is the 'correct' solution, the one taken by other languages like Python.

// Text literal
t"∀x∃y"[1] = "x"
"∀x∃y"[1] != "x"

// Byte literal
"∀x∃y"[1] = "x"
b"∀x∃y"[1] != "x"

That is a separate PR though and not directly related to this one, which is strictly about adding Unicode operators to the language. It would also involve including some form of Unicode library, and we'd have to deal with complex things like graphemes, multi-codepoint sequences, and normalisation forms. This PR is far smaller, only adding a little piece of Unicode to the language.

I meant you can store and concat unicode in symbols and strings, which is sufficient to do most things you'd want to do with selectors.
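
To make the byte-container versus text-container distinction concrete, here is a small Python illustration, since Python is the language cited above as having made this split. It shows only Python's `str` and `bytes` behaviour with UTF-8; it is an analogy, not a demonstration of sclang's String class.

```python
text = "∀x∃y"                # text container: indexed by code point
data = text.encode("utf-8")  # byte container: indexed by byte

# Four code points, but eight bytes: ∀ and ∃ each take three bytes in UTF-8.
code_point_count = len(text)
byte_count = len(data)

# Indexing the text gives whole characters.
third_char = text[2]         # "∃"

# Indexing the bytes lands inside the encoding of ∀ (E2 88 80).
third_byte = data[2]         # 0x80, the last byte of ∀
first_byte_of_exists = "∃".encode("utf-8")[0]  # 0xE2, the first byte of ∃
```

This is why a byte-indexed comparison like the one asked about cannot return true: index 2 falls inside the bytes of ∀, not on ∃.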

@JordanHendersonMusic

JordanHendersonMusic commented Mar 21, 2026

Contributor Author
$∃

That's impossible: $ is a char, and you need an arbitrary amount of space to store a grapheme.

Consider this thing, it's only one 'character' (grapheme)....

ḧ̶̶̴̷̶̵̸̴̷̷̶̸̷̵̷̸̷̷̸̶̷̸̵̶̷̸̨̢̢̡̨̧̡̢̢̢̧̡̢̢̨̧̨̧̢̢̢̨̨̧̢̧̧̛̛̟͚̯̳͍͔̜̞̯̭̙͓͓͍̹̤̱̘͕̮͎̳̰̜͍̗͍̬͎̰̝̟̫̞̱͕̟̺̺̜̟̞̤̝̜̥̼̳̟̬̲͖͓̪̠̖̼̗͈̦̤̳̝̪͔̦̗̠͙̺̰̥̹͎͉̩̺̯̳̟̭̥̠̱̱̬̥̻̲̖̯̼͓̬͕͖̼̮̣̬̠͍̖̬͇̮̭̭̞̳̪̜̞̪͉͔̩̩̺̙̗̼͓̲͙̩̪̩̬̠͔̱͉͕̪̳̲̥̟̺͍̙̠̱̝̗͖̠̜͙̰͙̦̙̼̹̖̮̜̹͍̘̠̱̼̗̺̟̰͚͕̹̪͕̹͕̝͍͎́̋̽͆́̈̅̑̌̌̊̄̍͗͒̀̋͛͐̆͆͑̃́̅̌̆́̂̊̆͛̓̀̄̀̔̉͑́̌͑̂̈́̿̌̂̊̈̈́̇̈́̃̋̉̀̋͗̈̏͂̍͆͑̆̎͐͂̈̽̍̌͌͒̏̓͌̓͒̾͊̓̒̈́̑̔̀̋̑̀̐̽͛̈̀͒͗̽͛̔̈́̉͋̈́͐́͛̉̓̈́͐͗̊̇̀̍͗͆̋̓́̈̌̐͛͊̃̅͊̔̄̿͋̅̈͛̈́̇̌͂̔̉͐͂͆̐̅̾̋͆͑̏̽͌̈͑̈́̋̽̅͆̓͆̽̓̊́̏̈́́̈̆̀̏͛̊̄̀̓̋̂́̊̇̽̓̂̄̽͐̓̽̚͘̕͘̕͘̚͘͘̚͘͘̚͜͜͜͜͜͜͜͠͠͠͝͝͝͠͝͝͠͝͝͝͝͠ͅͅͅͅͅ
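
A much smaller version of the same effect can be reproduced in Python. The sketch below builds one visible character from a base letter plus combining marks and shows that its size in code points and bytes grows with every mark. Counting code points whose canonical combining class is zero is only a crude stand-in for real grapheme segmentation (which follows the Unicode text segmentation rules); the helper name is hypothetical.

```python
import unicodedata

# 'h' followed by COMBINING DIAERESIS: two code points, one visible character.
decomposed = "h\u0308"
# NFC normalisation folds the pair into the precomposed U+1E27.
composed = unicodedata.normalize("NFC", decomposed)

# Stack more marks on the same base: still one visible character.
stacked = "h" + "\u0308" + "\u0336" * 3


def count_base_code_points(s):
    """Crude grapheme estimate: count code points that are not combining marks."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


stacked_code_points = len(stacked)              # 5
stacked_bytes = len(stacked.encode("utf-8"))    # 9
stacked_graphemes = count_base_code_points(stacked)  # 1
```

Since any number of marks can be stacked this way, no fixed-width char type can hold an arbitrary grapheme.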

@telephon

Member

Yes, this is what I thought! For this reason I once had the idea to introduce a nested string class, which can deal both with this and with something like quotation levels (strings in strings in strings, like arrays).

https://github.com/telephon/Strang

Just for reference.

@telephon

Member

I suppose that the lexer refactor commit is a preparatory step, independent of the new feature, right?

@JordanHendersonMusic

JordanHendersonMusic commented Mar 23, 2026

Contributor Author

Well, it makes all this stuff easier, along with preparing to fix up the lexer/parser communication, which is unsafe and, I'm pretty certain, wrong, because it's a lookahead parser and we mutate global state.

The new lexer works in codepoints, so it's trivial to do this kind of thing... And since all unicode use is currently royally broken, there is this nice unicode shaped hole to fill with goodies!!!

I'm also planning (in various degrees of doneness):

  • Making all current uses of unicode (outside strings and symbols) errors as these are all bugs right now.
  • String interpolation s"1 + 2 = {1+2}" - perhaps this could use unicode quotes?
  • DSLStrings proposal
  • New accidental type proposal using unicode, e.g. a#, perhaps using some kind of SMuFL integration?

@telephon

Member

Since this is a large change whose record will matter in the future, I'd suggest you split it up into two pull requests. I see no problem with either of them, but looking at 68007ba this seems absolutely simple, once all the rest is in place.

@JordanHendersonMusic

Contributor Author

#7394 already did that!

@telephon

Member

ok thanks!
