sclang/lexer: define all newline literals to be '\n', not the platform default by JordanHendersonMusic · Pull Request #7545 · supercollider/supercollider

JordanHendersonMusic · 2026-06-05T11:10:50Z

Previously there was a nasty bug where strings with newlines in them depended on the author's os and editor when written in the class library.

This led to this bug that existed on windows when the class was written by a user.

A {
  ^str { ^"meow
woof"}
}

A.str != "meow
woof";

This is because on windows the newline character is actually two, \r\n, but on unix, and when something is checked into a git repo, and in the sc ide's runtime evaluation mode, the newline character is '\n'.

As well as converting all \r\n into \n, this pr also converts all \r into \n, this is because ancient macs (before 2001) used to use \r as thier newline character and we still have some code that does that.

Further, we redefine that the only whitespace character the ascii literal $ can represent is the ascii space ' '. The error message for this looks as follows:

Lexing Error:
──────────────────────────────────────────────────────────────────────────────────
Error: /home/jordan/.repo/SuperCollider/SCClassLibrary/SCDoc/SCDoc.sc:1:1
──────────────────────────────────────────────────────────────────────────────────
    1 │ $
      ┆ ^ replace with $\n.
    2 │

This is done because previously the lexer would turn this runtime code $ into $ , as it would always insert a space at the end of the evaluated code (else it got stuck in an inifitine loop).

Fixes #7520
Fixes #7523
Fixes #7509

Types of changes

Documentation
Bug fix
New feature

To-do list

Code is tested
All tests are passing
Updated documentation
This PR is ready for review

JordanHendersonMusic · 2026-06-05T11:22:29Z

@dyfer what do you think about this? It converts \r and \r\n to \n, and clears up some ambiguity regarding whitespace in the $ literal so it is always the ascii space. I opted to introduce \0 as a new escape literal here for completeness. Technically this is a breaking change as previously \0 would have just turned into 0. Do you think that is alright?

If so, I'll go ahead and start updating the documentation.

Also, previously if you ended a runtime code evaluation with a $ it would turn into $ (that's the space right there). I've remove this and turned it into an error.

I also discovered the quoted symbols don't work with escape characters. Here is an interesting one! Not too sure what to do about that as it might be a larger breaking change and probably not worth it.

'\n' === 'n' // !!!

dyfer · 2026-06-05T17:08:50Z

Thanks @JordanHendersonMusic

Newline conversions look good to me from the functional standpoint (I haven't looked at the implementation).

One thing I would indeed want to consider further is introducing a new escape sequence \0. I'm sorry, but I don't fully understand the need for it. Can you give a complete code example and erroneous behavior where this issue transpires?

JordanHendersonMusic · 2026-06-05T17:47:04Z

Well I added it more for completeness in the lexer, particularly for ASCII literals. I suppose if I added a unicode escape literal instead you could just do "\u0000". This would provide all the functionality and only one potential place to break stuff.

... Actually after writing that out I think it's a better idea. Still it is potentially one breaking change, but means we ought not to need another one like this.

This can do the warning to error thing, so won't become available for a version or so.

I think I will also add a warning if a user attempts to use any non-supported escape character. Although the whitespace ones should stay as hard errors because they are confusing.

This way we can make all the escapes in quoted symbols (expect the newline) warnings too, because they do nothing.

By the way, the reason I was very hesitant to do the warning to error thing before is because previously we were talking about putting this in the lexer. Right now the lexer has no idea what a version is and should stay that way really. This stuff all goes in the compiler, so it's not an issue.

I really want to try and keep the bits that are pulled out of Lang/ to be version independent and as self contained as possible. The actual compiler can always be a bit of a mess to accommodate these oddities.

dyfer · 2026-06-05T18:00:42Z

Sounds good!

Well I added it more for completeness in the lexer, particularly for ASCII literals. I suppose if I added a unicode escape literal instead you could just do "\u0000". This would provide all the functionality and only one potential place to break stuff.

... Actually after writing that out I think it's a better idea. Still it is potentially one breaking change, but means we ought not to need another one like this.

Yes, that seems to be better, but still, why exactly do we need it?

In any case, a unicode escape sequence would be great, though that would be a breaking change, I think? (i.e. "\u" won't return "u") I'm not opposed, just making sure we all take that into account.

JordanHendersonMusic · 2026-06-05T18:08:19Z

why exactly do we need it?

Well as this pr makes it impossible to have \r, \v, \f in source code, and later I want to make the rest of the bizarre ancient ASCII crap illegal, the user might want a legitimate way to input these characters.

It is unlikely to come up in general sc use, but when I eventually rework the threading issue and revive the c API, stuff like this might come up.

…m default Previously there was a nasty bug where strings with newlines in them depended on the author's os and editor when written in the class library. This led to this bug that existed on windows when the class was written by a user. ``` A { ^str { ^"meow woof"} } A.str != "meow woof"; ``` This is because on windows the newline character is actually two, \r\n, but on unix, and when something is checked into a git repo, and in the sc ide's runtime evaluation mode, the newline character is '\n'. As well as converting all \r\n into \n, this pr also converts all \r into \n, this is because ancient macs (before 2001) used to use \r as thier newline character and we still have some code that does that. Further, we redefine that the only whitespace character the ascii literal `$ ` can represent is the ascii space ' '. The error message for this looks as follows: ``` Lexing Error: ────────────────────────────────────────────────────────────────────────────────── Error: /home/jordan/.repo/SuperCollider/SCClassLibrary/SCDoc/SCDoc.sc:1:1 ────────────────────────────────────────────────────────────────────────────────── 1 │ $ ┆ ^ replace with $\n. 2 │ ``` This is done because previously the lexer would turn this runtime code `$` into `$ `, as it would always insert a space at the end of the evaluated code (else it got stuck in an inifitine loop). Fixes supercollider#7520 Fixes supercollider#7523 Fixes supercollider#7509

JordanHendersonMusic · 2026-06-06T14:49:46Z

@dyfer I've left out the proposed \u to hopefully make reviewing easier. Otherwise, hopefully this is all good. It is the least disruptive way possible to do this, it could cause issue if the user has on purposed put a carriage return/vertical tab/form feed in their code. But this seems really unlikely.

dyfer

Thanks!

I can't review cpp part of this reliably, but I think the functional changes are good.

I have some comments regarding the help files.

dyfer · 2026-06-09T19:30:49Z


 description::
-An ASCII character represented as a signed 8-bit integer (-128 to 127).
+A Char represented as a signed 8-bit integer (-128 to 127).


So is it not an ASCII character? is it a Unicode character? "A Char" is probably least descriptive here.

No it isn't a unicode codepoint either, that's 21 bits. It is just a single byte. Half of which is valid ascii.

So maybe "A character represented as a signed 8-bit integer (-128 to 127)" ?

Actually I think it should be 'A Char is a signed 8-bit integer' ?

The class is called char and not all valid chars are characters. Really it should just be called Byte.

Char's identity was always a mystery to me.
Char is a signed 8-bit integer is pretty good, but it differs from a regular integer since it is most commonly represented as a character.
What else can it represent? When it's value lies outside of ASCII, isn't it essentially a flavor of extended ASCII?

dyfer · 2026-06-09T19:31:08Z

@@ -1,2 +1,2 @@
 class::Char
 summary::ASCII character


If it's not ASCII anymore (?) this should also be updated

It never was because it is a char.

... Sorry I realised that isn't useful! It's a whole byte but ASCII is only 7 bits. So it can represent invalid ASCII too.

dyfer · 2026-06-09T19:31:38Z

 together in strings that use these encodings.

-The SuperCollider IDE uses UTF-8 to decode and display strings.
+The SuperCollider IDE and UTF-8 to decode and display strings.


I don't understand this change...?

Oops! I rewrote this and decided against it!

dyfer · 2026-06-09T19:32:06Z

-12r4a.abc // wrong
-12r4a.ABC // works
-12r4A.ABC // better
+12r4a.ab // wrong
+12r4a.AB // works
+12r4A.AB // better


Why are these changed?

Because they were invalid. Try running the old stuff and you will get a parsing error because 'c' is an unexpected identitfier. I wrote an issue about this weird bit of syntax from radixs a while a ago.

It's unrelated but just a small thing.

dyfer · 2026-06-09T19:33:26Z

+code::
+\1234;
+\892342534;
+\3456meow // not illegal


not illegal?

dyfer · 2026-06-09T19:34:08Z

 See link::Classes/String:: for more information.

+note::
+From version 3.15 and onwards one can no longer have a whitespace literal (except the space) after the dollar sign.


I think this should be From version 3.15 onwards (i.e. remove "and")

JordanHendersonMusic added the comp: sclang sclang C++ implementation (primitives, etc.). for changes to class lib use "comp: class library" label Jun 5, 2026

JordanHendersonMusic force-pushed the topic/redefine-new-line branch from ac49e4b to b82b481 Compare June 6, 2026 13:58

JordanHendersonMusic force-pushed the topic/redefine-new-line branch from f656114 to 2443106 Compare June 6, 2026 14:49

dyfer reviewed Jun 9, 2026

View reviewed changes

JordanHendersonMusic commented Jun 9, 2026

View reviewed changes

Comment thread HelpSource/Classes/Char.schelp Outdated

Comment thread HelpSource/Classes/Char.schelp Outdated

Comment thread HelpSource/Reference/Literals.schelp Outdated

JordanHendersonMusic added 3 commits June 9, 2026 22:41

Update HelpSource/Classes/Char.schelp

b019ee5

Update HelpSource/Classes/Char.schelp

bd6f1e8

Update HelpSource/Reference/Literals.schelp

bf8d17c

Conversation

JordanHendersonMusic commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Types of changes

To-do list

Uh oh!

JordanHendersonMusic commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dyfer commented Jun 5, 2026

Uh oh!

JordanHendersonMusic commented Jun 5, 2026

Uh oh!

dyfer commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JordanHendersonMusic commented Jun 5, 2026

Uh oh!

JordanHendersonMusic commented Jun 6, 2026

Uh oh!

dyfer left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JordanHendersonMusic Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JordanHendersonMusic Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JordanHendersonMusic commented Jun 5, 2026 •

edited

Loading

JordanHendersonMusic commented Jun 5, 2026 •

edited

Loading

dyfer commented Jun 5, 2026 •

edited

Loading

JordanHendersonMusic Jun 9, 2026 •

edited

Loading

JordanHendersonMusic Jun 9, 2026 •

edited

Loading