
sclang/lexer: define all newline literals to be '\n', not the platform default #7545

Open
JordanHendersonMusic wants to merge 4 commits into supercollider:develop from JordanHendersonMusic:topic/redefine-new-line


@JordanHendersonMusic

@JordanHendersonMusic JordanHendersonMusic commented Jun 5, 2026

Contributor

Previously there was a nasty bug: when a string literal in the class library contained a newline, its contents depended on the author's OS and editor.

This led to the following bug on Windows when the class was written by a user.

A {
  *str { ^"meow
woof" }
}

// true on Windows when the class file was saved with \r\n line endings
A.str != "meow
woof";

This is because on Windows a newline is actually two characters, \r\n, whereas on Unix, in files checked into a git repo, and in the SC IDE's runtime evaluation mode, the newline is the single character \n.

As well as converting all \r\n into \n, this PR also converts every lone \r into \n, because ancient Macs (before 2001) used \r as their newline character and we still have some code that does that.
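The conversion described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `normalizeNewlines` is a hypothetical helper name, and applying it to a whole source buffer in one pass is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not the PR's actual code): rewrite every "\r\n" pair
// and every lone '\r' to a single '\n'.
std::string normalizeNewlines(std::string_view src) {
    std::string out;
    out.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\r') {
            // Skip the '\n' of a CRLF pair so it is not emitted twice.
            if (i + 1 < src.size() && src[i + 1] == '\n')
                ++i;
            out.push_back('\n');
        } else {
            out.push_back(src[i]);
        }
    }
    return out;
}
```

With this, `"meow\r\nwoof"`, `"meow\rwoof"` and `"meow\nwoof"` all come out as the same string, which is the property the `A.str` comparison above relies on.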

Further, the only whitespace character the ASCII literal $ can now represent is the ASCII space ' '; any other whitespace directly after the $ is a lexing error. The error message for this looks as follows:

Lexing Error:
──────────────────────────────────────────────────────────────────────────────────
Error: /home/jordan/.repo/SuperCollider/SCClassLibrary/SCDoc/SCDoc.sc:1:1
──────────────────────────────────────────────────────────────────────────────────
    1 │ $
      ┆ ^ replace with $\n.
    2 │

This is done because the lexer previously turned the runtime code $ into $ followed by a space: it always appended a space to the end of the evaluated code (otherwise it got stuck in an infinite loop).
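A minimal sketch of the rule for the $ literal, under stated assumptions: `CharLiteral` and `lexCharLiteral` are hypothetical names, `std::nullopt` stands in for wherever the real lexer would report an error, and only two escapes are spelled out.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type, for illustration only.
struct CharLiteral {
    char value;          // the byte the literal denotes
    std::size_t length;  // source bytes consumed, including the '$'
};

// Sketch: lex a `$x` literal starting at src[pos]. Returns std::nullopt
// where this sketch assumes the lexer would raise a lexing error.
std::optional<CharLiteral> lexCharLiteral(std::string_view src, std::size_t pos) {
    if (pos >= src.size() || src[pos] != '$')
        return std::nullopt;
    // A `$` at the very end of the input: no space is silently appended.
    if (pos + 1 >= src.size())
        return std::nullopt;
    const char c = src[pos + 1];
    // Raw whitespace other than the ASCII space is rejected.
    if (c == '\n' || c == '\r' || c == '\t' || c == '\v' || c == '\f')
        return std::nullopt;
    if (c == '\\') {
        // Escaped form such as `$\n`; a dangling backslash is an error.
        if (pos + 2 >= src.size())
            return std::nullopt;
        switch (src[pos + 2]) {
        case 'n': return CharLiteral { '\n', 3 };
        case 't': return CharLiteral { '\t', 3 };
        default: return CharLiteral { src[pos + 2], 3 };
        }
    }
    return CharLiteral { c, 2 };
}
```

Here a dollar sign followed by a real newline fails, while the escaped form `$\n` yields the newline byte, matching the "replace with $\n" hint in the error message.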

Fixes #7520
Fixes #7523
Fixes #7509

Types of changes

  • Documentation
  • Bug fix
  • New feature

To-do list

  • Code is tested
  • All tests are passing
  • Updated documentation
  • This PR is ready for review

@JordanHendersonMusic JordanHendersonMusic added the comp: sclang sclang C++ implementation (primitives, etc.). for changes to class lib use "comp: class library" label Jun 5, 2026
@JordanHendersonMusic

JordanHendersonMusic commented Jun 5, 2026

Contributor Author

@dyfer what do you think about this? It converts \r and \r\n to \n, and clears up some ambiguity regarding whitespace in the $ literal so it is always the ascii space. I opted to introduce \0 as a new escape literal here for completeness. Technically this is a breaking change as previously \0 would have just turned into 0. Do you think that is alright?

If so, I'll go ahead and start updating the documentation.

Also, previously if you ended a runtime code evaluation with a $ it would turn into $ (that's the space right there). I've removed this and turned it into an error.

I also discovered that quoted symbols don't work with escape characters. Here is an interesting one! Not too sure what to do about that as it might be a larger breaking change and probably not worth it.

'\n' === 'n' // !!!

@dyfer

dyfer commented Jun 5, 2026

Member

Thanks @JordanHendersonMusic

Newline conversions look good to me from the functional standpoint (I haven't looked at the implementation).

One thing I would indeed want to consider further is introducing a new escape sequence \0. I'm sorry, but I don't fully understand the need for it. Can you give a complete code example and erroneous behavior where this issue transpires?

@JordanHendersonMusic

Contributor Author

Well I added it more for completeness in the lexer, particularly for ASCII literals. I suppose if I added a unicode escape literal instead you could just do "\u0000". This would provide all the functionality and only one potential place to break stuff.

... Actually after writing that out I think it's a better idea. Still it is potentially one breaking change, but means we ought not to need another one like this.

This can go through the warning-then-error cycle, so it won't become available for a version or so.

I think I will also add a warning if a user attempts to use any unsupported escape character. The whitespace ones should stay as hard errors, though, because they are confusing.

This way we can make all the escapes in quoted symbols (except the newline) warnings too, because they do nothing.

By the way, the reason I was very hesitant to do the warning to error thing before is because previously we were talking about putting this in the lexer. Right now the lexer has no idea what a version is and should stay that way really. This stuff all goes in the compiler, so it's not an issue.

I really want to try and keep the bits that are pulled out of Lang/ to be version independent and as self contained as possible. The actual compiler can always be a bit of a mess to accommodate these oddities.

@dyfer

dyfer commented Jun 5, 2026

Member

Sounds good!

Well I added it more for completeness in the lexer, particularly for ASCII literals. I suppose if I added a unicode escape literal instead you could just do "\u0000". This would provide all the functionality and only one potential place to break stuff.

... Actually after writing that out I think it's a better idea. Still it is potentially one breaking change, but means we ought not to need another one like this.

Yes, that seems to be better, but still, why exactly do we need it?

In any case, a unicode escape sequence would be great, though that would be a breaking change, I think? (i.e. "\u" won't return "u") I'm not opposed, just making sure we all take that into account.
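To make the breaking-change concern concrete, here is a rough sketch of the escape handling discussed in this thread. It is not sclang's implementation: `unescape` and `Unescaped` are hypothetical names, the set of supported escapes shown is an assumption, and the warning only models the idea proposed above.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical result type: the decoded text plus any warnings raised.
struct Unescaped {
    std::string text;
    std::vector<std::string> warnings;
};

// Sketch: decode the body of a string literal. The supported escapes
// listed here are an assumption, not sclang's full set.
Unescaped unescape(std::string_view body) {
    Unescaped result;
    for (std::size_t i = 0; i < body.size(); ++i) {
        // Ordinary bytes and a trailing backslash are copied through.
        if (body[i] != '\\' || i + 1 == body.size()) {
            result.text.push_back(body[i]);
            continue;
        }
        const char e = body[++i];
        switch (e) {
        case 'n': result.text.push_back('\n'); break;
        case 't': result.text.push_back('\t'); break;
        case '\\': result.text.push_back('\\'); break;
        case '"': result.text.push_back('"'); break;
        default:
            // Behaviour described in the thread: the backslash is dropped,
            // so "\0" becomes "0" and "\u" becomes "u".
            result.text.push_back(e);
            result.warnings.push_back(std::string("unsupported escape \\") + e);
            break;
        }
    }
    return result;
}
```

Because an unknown escape currently decodes to the bare character, giving `\u` or `\0` a new meaning changes the value of any existing string that contains it, which is why a warning period before the change is attractive.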

@JordanHendersonMusic

Contributor Author

why exactly do we need it?

Well as this pr makes it impossible to have \r, \v, \f in source code, and later I want to make the rest of the bizarre ancient ASCII crap illegal, the user might want a legitimate way to input these characters.

It is unlikely to come up in general sc use, but when I eventually rework the threading issue and revive the c API, stuff like this might come up.

@JordanHendersonMusic

Contributor Author

@dyfer I've left out the proposed \u to hopefully make reviewing easier. Otherwise, hopefully this is all good. It is the least disruptive way possible to do this. It could cause issues if the user has put a carriage return/vertical tab/form feed in their code on purpose, but this seems really unlikely.

@dyfer dyfer left a comment

Member

Thanks!

I can't reliably review the cpp part of this, but I think the functional changes are good.

I have some comments regarding the help files.


description::
- An ASCII character represented as a signed 8-bit integer (-128 to 127).
+ A Char represented as a signed 8-bit integer (-128 to 127).

Member

So is it not an ASCII character? Is it a Unicode character? "A Char" is probably the least descriptive option here.

Contributor Author

No, it isn't a Unicode code point either; that's 21 bits. It is just a single byte, and only half of its possible values are valid ASCII.
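The "half" can be checked with a small sketch. It models a Char as `std::int8_t` purely for illustration; `isAscii` and `countAsciiValues` are hypothetical names, not part of sclang.

```cpp
#include <cstdint>

// Sketch: a Char modelled as a signed 8-bit integer (-128 to 127).
// ASCII covers the 7-bit range 0..127, i.e. the non-negative values.
constexpr bool isAscii(std::int8_t c) { return c >= 0; }

// Count how many of the 256 possible byte values are valid ASCII.
int countAsciiValues() {
    int count = 0;
    for (int v = -128; v <= 127; ++v)
        if (isAscii(static_cast<std::int8_t>(v)))
            ++count;
    return count;
}
```

`countAsciiValues()` returns 128, so exactly half of the 256 values a Char can hold fall outside ASCII.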

Member

So maybe "A character represented as a signed 8-bit integer (-128 to 127)" ?

Contributor Author

Actually I think it should be 'A Char is a signed 8-bit integer' ?

The class is called char and not all valid chars are characters. Really it should just be called Byte.

Member

Char's identity was always a mystery to me.
Char is a signed 8-bit integer is pretty good, but it differs from a regular integer since it is most commonly represented as a character.
What else can it represent? When its value lies outside of ASCII, isn't it essentially a flavor of extended ASCII?

@@ -1,2 +1,2 @@
class::Char
summary::ASCII character

Member

If it's not ASCII anymore (?) this should also be updated

@JordanHendersonMusic JordanHendersonMusic Jun 9, 2026

Contributor Author

It never was because it is a char.

... Sorry I realised that isn't useful! It's a whole byte but ASCII is only 7 bits. So it can represent invalid ASCII too.

Comment thread HelpSource/Classes/Char.schelp Outdated
together in strings that use these encodings.

- The SuperCollider IDE uses UTF-8 to decode and display strings.
+ The SuperCollider IDE and UTF-8 to decode and display strings.

Member

I don't understand this change...?

Contributor Author

Oops! I rewrote this and decided against it!

Comment on lines -89 to +91
- 12r4a.abc // wrong
- 12r4a.ABC // works
- 12r4A.ABC // better
+ 12r4a.ab // wrong
+ 12r4a.AB // works
+ 12r4A.AB // better

Member

Why are these changed?

@JordanHendersonMusic JordanHendersonMusic Jun 9, 2026

Contributor Author

Because they were invalid. Try running the old examples and you will get a parsing error because 'c' is an unexpected identifier. I wrote an issue about this weird bit of radix syntax a while ago.

It's unrelated but just a small thing.
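The arithmetic behind that parsing error can be sketched as below: in base 12 the digits run from 0 up to b (value 11), so c is not a digit and is read as the start of an identifier. `digitValue` and `isDigitInBase` are hypothetical helpers, and the sketch deliberately ignores sclang's separate upper/lowercase rule shown in the examples above.

```cpp
#include <cctype>

// Value of a radix digit: '0'-'9' -> 0-9, 'a'/'A' -> 10, 'b'/'B' -> 11, ...
// Returns -1 for anything that is not alphanumeric.
int digitValue(char ch) {
    const unsigned char u = static_cast<unsigned char>(ch);
    if (std::isdigit(u))
        return ch - '0';
    if (std::isalpha(u))
        return std::tolower(u) - 'a' + 10;
    return -1;
}

// A character is a digit in `base` only if its value is below the base.
bool isDigitInBase(char ch, int base) {
    const int v = digitValue(ch);
    return v >= 0 && v < base;
}
```

For base 12, `isDigitInBase('b', 12)` holds but `isDigitInBase('c', 12)` does not, since c would have the value 12.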

Comment thread HelpSource/Reference/Literals.schelp Outdated
code::
\1234;
\892342534;
\3456meow // not illegal

Member

not illegal?

Comment thread HelpSource/Classes/Char.schelp Outdated
See link::Classes/String:: for more information.

note::
From version 3.15 and onwards one can no longer have a whitespace literal (except the space) after the dollar sign.

Member

I think this should be From version 3.15 onwards (i.e. remove "and")
