Escape Mechanisms in Identifiers #2810

@mehmetoguzderin

Today, the only way to spell a code point in WGSL, within identifiers or anywhere else, is direct input. One can use the escape mechanisms of a language that embeds WGSL in strings, such as JavaScript, but the WGSL spec itself has no notion of escaping. Character escaping is a valuable way to spell code points in source files because it reduces reliance on direct input and on visual recognition of the characters, which goes beyond an internationalization concern. Languages usually support escaping on several levels (identifiers, string literals, source, RegEx, etc.), but the level of immediate relevance in WGSL is identifiers, where we support XID identifier code points. The mechanism would apply to both the declaration and the use of identifier names.

Three ways of escaping in JavaScript:

  • \x??: The hex number must consist of exactly two hex digits, so valid values are in [00, FF] (inclusive range). This range covers ASCII (00 to 7F) and the Latin-1 Supplement (80 to FF).
  • \u????: The hex number must consist of exactly four hex digits, so valid values are in [0000, FFFF] (inclusive range). This range covers the Basic Multilingual Plane, which holds the more common code points. In JavaScript, each such escape denotes one UTF-16 code unit, so values in [D800, DFFF] are surrogates rather than characters.
  • \u{?...?}: The hex number must consist of one or more hex digits, where valid values are in [0, 10FFFF] (inclusive range). This range covers all Unicode code points.
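To make the three forms concrete, here is a minimal sketch of how a WGSL front end might decode them. The function name `decodeIdentifierEscapes` is hypothetical, and rejecting surrogate values is an assumption of this sketch, not something JavaScript does or the WGSL spec says.

```javascript
// Hypothetical helper: replaces \x??, \u???? and \u{?...?} with the code
// point they spell. Nothing here comes from the WGSL spec.
function decodeIdentifierEscapes(source) {
  const escape = /\\x([0-9a-fA-F]{2})|\\u([0-9a-fA-F]{4})|\\u\{([0-9a-fA-F]+)\}/g;
  return source.replace(escape, (match, x2, u4, braced) => {
    // Exactly one of the three capture groups is defined per match.
    const codePoint = parseInt(x2 ?? u4 ?? braced, 16);
    if (codePoint > 0x10ffff) {
      throw new RangeError(`escape ${match} is above U+10FFFF`);
    }
    // Assumption: surrogates are rejected instead of being paired up.
    if (codePoint >= 0xd800 && codePoint <= 0xdfff) {
      throw new RangeError(`escape ${match} is a surrogate`);
    }
    return String.fromCodePoint(codePoint);
  });
}
```

Text that does not match one of the three forms, such as `\xZZ`, passes through unchanged in this sketch; a real implementation would more likely report it as an error.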

Some languages, such as Rust, opt not to support \u???? and support only \x?? and \u{?...?}. However, I would argue that, for copy-paste compatibility between JavaScript string literals and WGSL source code, it is valuable to support all three JavaScript escaping methods. I have not been able to spot support for \x{?...?}.

A figurative RegEx follows. One sad thing is that we cannot embed the XID range restrictions for the code points that these escapes represent, since doing so would probably take an almost infinitely long RegEx string:

/(([_\p{XID_Start}]|(\\x[0-9a-fA-F][0-9a-fA-F])|(\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])|(\\u\{[0-9a-fA-F]+\}))([\p{XID_Continue}]|(\\x[0-9a-fA-F][0-9a-fA-F])|(\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])|(\\u\{[0-9a-fA-F]+\}))+)|([\p{XID_Start}]|(\\x[0-9a-fA-F][0-9a-fA-F])|(\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])|(\\u\{[0-9a-fA-F]+\}))/uy
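Since the pattern cannot check which code point an escape spells, one possible approach is two passes: match the raw spelling first, then decode it and re-check the result against the plain XID rule. The sketch below assumes that approach; `scanIdentifier`, `RAW_IDENT`, and `DECODED_IDENT` are hypothetical names, and the raw pattern is a compact rewrite of the RegEx above, not a change to it.

```javascript
// Escape alternatives shared by the start and continue positions.
const ESC = String.raw`\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}|\\u\{[0-9a-fA-F]+\}`;

// Raw spelling: (start)(continue)+ or a single non-underscore start.
// The sticky flag makes exec() match exactly at lastIndex.
const RAW_IDENT = new RegExp(
  String.raw`(?:[_\p{XID_Start}]|${ESC})(?:\p{XID_Continue}|${ESC})+|(?:\p{XID_Start}|${ESC})`,
  "uy"
);

// The same identifier rule with no escapes, applied after decoding.
const DECODED_IDENT = /^(?:[_\p{XID_Start}]\p{XID_Continue}+|\p{XID_Start})$/u;

function decode(raw) {
  return raw.replace(new RegExp(ESC.replace(/\[0-9a-fA-F\]/g, "[0-9a-fA-F]"), "g"), (m) => {
    // Strip the \x, \u or \u{ } wrapper and keep only the hex digits.
    const hex = m.replace(/^\\[xu]\{?|\}$/g, "");
    return String.fromCodePoint(parseInt(hex, 16)); // throws RangeError above U+10FFFF
  });
}

// Returns { name, end } for an identifier starting at `position`, or null.
function scanIdentifier(source, position) {
  RAW_IDENT.lastIndex = position;
  const match = RAW_IDENT.exec(source);
  if (match === null) return null;
  const name = decode(match[0]);
  if (!DECODED_IDENT.test(name)) {
    throw new SyntaxError(`escapes in ${match[0]} do not spell an XID identifier`);
  }
  return { name, end: RAW_IDENT.lastIndex };
}
```

With this split, `a\x2Db` passes the raw pattern but is rejected in the second pass, because `\x2D` decodes to a hyphen, which is not an XID_Continue code point.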

In anticipation of w3c/i18n-activity#1511 feedback, we think this feature can be a great post-V1 enhancement, since nothing in the WGSL spec clashes with it and there is already a way to input non-ASCII characters (direct input).


Labels: i18n-needs-resolution (Issue the Internationalization Group has raised and looks for a response on), wgsl (WebGPU Shading Language Issues)
