sclang/lexer: define all newline literals to be '\n', not the platform default#7545
Conversation
|
@dyfer what do you think about this? It converts \r and \r\n to \n, and clears up some ambiguity regarding whitespace in the If so, I'll go ahead and start updating the documentation. Also, previously if you ended a runtime code evaluation with a I also discovered the quoted symbols don't work with escape characters. Here is an interesting one! Not too sure what to do about that as it might be a larger breaking change and probably not worth it. |
|
Thanks @JordanHendersonMusic Newline conversions look good to me from the functional standpoint (I haven't looked at the implementation). One thing I would indeed want to consider further is introducing a new escape sequence |
|
Well I added it more for completeness in the lexer, particularly for ASCII literals. I suppose if I added a unicode escape literal instead you could just do "\u0000". This would provide all the functionality and only one potential place to break stuff. ... Actually after writing that out I think it's a better idea. Still it is potentially one breaking change, but means we ought not to need another one like this. This can do the warning to error thing, so won't become available for a version or so. I think I will also add a warning if a user attempts to use any non-supported escape character. Although the whitespace ones should stay as hard errors because they are confusing. This way we can make all the escapes in quoted symbols (expect the newline) warnings too, because they do nothing. By the way, the reason I was very hesitant to do the warning to error thing before is because previously we were talking about putting this in the lexer. Right now the lexer has no idea what a version is and should stay that way really. This stuff all goes in the compiler, so it's not an issue. I really want to try and keep the bits that are pulled out of Lang/ to be version independent and as self contained as possible. The actual compiler can always be a bit of a mess to accommodate these oddities. |
|
Sounds good!
Yes, that seems to be better, but still, why exactly do we need it? In any case, a unicode escape sequence would be great, though that would be a breaking change, I think? (i.e. "\u" won't return "u") I'm not opposed, just making sure we all take that into account. |
Well as this pr makes it impossible to have \r, \v, \f in source code, and later I want to make the rest of the bizarre ancient ASCII crap illegal, the user might want a legitimate way to input these characters. It is unlikely to come up in general sc use, but when I eventually rework the threading issue and revive the c API, stuff like this might come up. |
ac49e4b to
b82b481
Compare
…m default
Previously there was a nasty bug where strings with newlines in them depended on the author's os and editor when written in the class library.
This led to this bug that existed on windows when the class was written by a user.
```
A {
^str { ^"meow
woof"}
}
A.str != "meow
woof";
```
This is because on windows the newline character is actually two, \r\n, but on unix, and when something is checked into a git repo, and in the sc ide's runtime evaluation mode, the newline character is '\n'.
As well as converting all \r\n into \n, this pr also converts all \r into \n, this is because ancient macs (before 2001) used to use \r as thier newline character and we still have some code that does that.
Further, we redefine that the only whitespace character the ascii literal `$ ` can represent is the ascii space ' '. The error message for this looks as follows:
```
Lexing Error:
──────────────────────────────────────────────────────────────────────────────────
Error: /home/jordan/.repo/SuperCollider/SCClassLibrary/SCDoc/SCDoc.sc:1:1
──────────────────────────────────────────────────────────────────────────────────
1 │ $
┆ ^ replace with $\n.
2 │
```
This is done because previously the lexer would turn this runtime code `$` into `$ `, as it would always insert a space at the end of the evaluated code (else it got stuck in an inifitine loop).
Fixes supercollider#7520
Fixes supercollider#7523
Fixes supercollider#7509
f656114 to
2443106
Compare
|
@dyfer I've left out the proposed \u to hopefully make reviewing easier. Otherwise, hopefully this is all good. It is the least disruptive way possible to do this, it could cause issue if the user has on purposed put a carriage return/vertical tab/form feed in their code. But this seems really unlikely. |
dyfer
left a comment
There was a problem hiding this comment.
Thanks!
I can't review cpp part of this reliably, but I think the functional changes are good.
I have some comments regarding the help files.
|
|
||
| description:: | ||
| An ASCII character represented as a signed 8-bit integer (-128 to 127). | ||
| A Char represented as a signed 8-bit integer (-128 to 127). |
There was a problem hiding this comment.
So is it not an ASCII character? is it a Unicode character? "A Char" is probably least descriptive here.
There was a problem hiding this comment.
No it isn't a unicode codepoint either, that's 21 bits. It is just a single byte. Half of which is valid ascii.
There was a problem hiding this comment.
So maybe "A character represented as a signed 8-bit integer (-128 to 127)" ?
There was a problem hiding this comment.
Actually I think it should be 'A Char is a signed 8-bit integer' ?
The class is called char and not all valid chars are characters. Really it should just be called Byte.
There was a problem hiding this comment.
Char's identity was always a mystery to me.
Char is a signed 8-bit integer is pretty good, but it differs from a regular integer since it is most commonly represented as a character.
What else can it represent? When it's value lies outside of ASCII, isn't it essentially a flavor of extended ASCII?
| @@ -1,2 +1,2 @@ | |||
| class::Char | |||
| summary::ASCII character | |||
There was a problem hiding this comment.
If it's not ASCII anymore (?) this should also be updated
There was a problem hiding this comment.
It never was because it is a char.
... Sorry I realised that isn't useful! It's a whole byte but ASCII is only 7 bits. So it can represent invalid ASCII too.
| together in strings that use these encodings. | ||
|
|
||
| The SuperCollider IDE uses UTF-8 to decode and display strings. | ||
| The SuperCollider IDE and UTF-8 to decode and display strings. |
There was a problem hiding this comment.
I don't understand this change...?
There was a problem hiding this comment.
Oops! I rewrote this and decided against it!
| 12r4a.abc // wrong | ||
| 12r4a.ABC // works | ||
| 12r4A.ABC // better | ||
| 12r4a.ab // wrong | ||
| 12r4a.AB // works | ||
| 12r4A.AB // better |
There was a problem hiding this comment.
Because they were invalid. Try running the old stuff and you will get a parsing error because 'c' is an unexpected identitfier. I wrote an issue about this weird bit of syntax from radixs a while a ago.
It's unrelated but just a small thing.
| code:: | ||
| \1234; | ||
| \892342534; | ||
| \3456meow // not illegal |
| See link::Classes/String:: for more information. | ||
|
|
||
| note:: | ||
| From version 3.15 and onwards one can no longer have a whitespace literal (except the space) after the dollar sign. |
There was a problem hiding this comment.
I think this should be From version 3.15 onwards (i.e. remove "and")
Previously there was a nasty bug where strings with newlines in them depended on the author's os and editor when written in the class library.
This led to this bug that existed on windows when the class was written by a user.
This is because on windows the newline character is actually two, \r\n, but on unix, and when something is checked into a git repo, and in the sc ide's runtime evaluation mode, the newline character is '\n'.
As well as converting all \r\n into \n, this pr also converts all \r into \n, this is because ancient macs (before 2001) used to use \r as thier newline character and we still have some code that does that.
Further, we redefine that the only whitespace character the ascii literal
$can represent is the ascii space ' '. The error message for this looks as follows:This is done because previously the lexer would turn this runtime code
$into$, as it would always insert a space at the end of the evaluated code (else it got stuck in an inifitine loop).Fixes #7520
Fixes #7523
Fixes #7509
Types of changes
To-do list