Prepare for BOM-less UTF-8 default character encoding with respect to $OutputEncoding and console code page #4681

@mklement0

Description

BOM-less UTF-8 character encoding is coming as the default for PowerShell Core on all platforms.

Two attendant changes are required:

  • Preference variable $OutputEncoding, which currently defaults to ASCII, must default to [System.Text.UTF8Encoding]::new() (UTF-8 with no BOM). Alternatively, and perhaps preferably, the variable should not be predefined at all, with PowerShell falling back to that encoding (the internally used default) when the variable is absent.

    • $OutputEncoding tells PowerShell what character encoding to use when sending output to external utilities.
  • Console / terminal character encoding:

    • On Windows, [Console]::InputEncoding and [Console]::OutputEncoding must both be set to [System.Text.UTF8Encoding]::new(), which is the equivalent of configuring a console window to use code page 65001 (UTF-8) or executing chcp 65001 before PowerShell is launched.

      • [Console]::OutputEncoding tells PowerShell what encoding to assume when reading output from external utilities.

      • On Windows, the Start Menu shortcut that is created during installation should be preconfigured to open a console window with code page 65001.

        • Conceivably, PowerShell should automatically switch to code page 65001 when it is launched from a console window with a different active code page (such as from cmd.exe). Note, however, that by default such a change remains in effect until the window is closed, even after exiting PowerShell and returning to cmd.exe; perhaps a warning could be issued on startup.
    • On Unix platforms with UTF-8-based locales, which are the norm these days, no action is required.

      • To be determined: How should the rare event of being invoked from a terminal with a different active character encoding be handled? Changing the encoding on the fly, as on Windows, is not guaranteed to work. Perhaps a warning on startup is sufficient.
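To check where a given session currently stands, the three settings discussed above can be inspected directly. The following is a minimal sketch, not part of the proposal itself; the variable names ($utf8NoBom, $utf8WithBom, $report, and the two length variables) are chosen here purely for illustration. It also contrasts the BOM-less encoding object with the static [System.Text.Encoding]::UTF8 instance, which does emit a BOM.

```powershell
# UTF-8 encoding object without a BOM: the parameterless constructor does not emit one.
$utf8NoBom = [System.Text.UTF8Encoding]::new()

# For contrast, the static UTF8 instance is configured to emit a BOM.
$utf8WithBom = [System.Text.Encoding]::UTF8

$noBomPreambleLength   = $utf8NoBom.GetPreamble().Length     # 0 bytes
$withBomPreambleLength = $utf8WithBom.GetPreamble().Length   # 3 bytes (EF BB BF)

# Report the encodings currently in effect for this session.
$report = [pscustomobject]@{
    OutputEncoding        = $OutputEncoding.WebName
    ConsoleInputEncoding  = [Console]::InputEncoding.WebName
    ConsoleOutputEncoding = [Console]::OutputEncoding.WebName
}
$report
```

If all three properties report utf-8, the session already behaves as proposed; any other value indicates a setting that the proposed defaults would change.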

Until the changes proposed above are implemented, the interim workaround to make a console window / terminal use UTF-8 consistently is the following command:

$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
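Because a changed console encoding outlives the PowerShell session on Windows, it may be preferable to apply the workaround only for a limited stretch of code and restore the earlier settings afterwards. The following is one possible sketch under that assumption; the names $previous, $utf8NoBom, and $activeWebName are hypothetical, and the placeholder inside the try block stands in for calls to external utilities.

```powershell
# Remember the encodings in effect before the change.
$previous = @{
    OutputEncoding = $OutputEncoding
    ConsoleInput   = [Console]::InputEncoding
    ConsoleOutput  = [Console]::OutputEncoding
}

$utf8NoBom = [System.Text.UTF8Encoding]::new()

try {
    # The workaround: BOM-less UTF-8 for sending to and reading from external utilities.
    $OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = $utf8NoBom

    # Call external utilities here; as a stand-in, record the active console encoding.
    $activeWebName = [Console]::OutputEncoding.WebName
}
finally {
    # Restore the earlier settings so the console window is left as it was found.
    $OutputEncoding           = $previous.OutputEncoding
    [Console]::InputEncoding  = $previous.ConsoleInput
    [Console]::OutputEncoding = $previous.ConsoleOutput
}
```

The try / finally pairing means the original encodings are restored even if an external utility call fails partway through.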

Environment data

PowerShell Core v6.0.0-beta.6

Metadata

Labels

Resolution-Fixed (the issue is fixed), WG-Engine (core PowerShell engine, interpreter, and runtime)