Description
Steps to reproduce
Invoke-RestMethod 'http://api.forismatic.com/api/1.0/?method=getQuote&format=json&lang=ru' -verbose
Expected behavior
It should detect the utf-8 encoding, and produce the same output as this:
$resp = Invoke-WebRequest 'http://api.forismatic.com/api/1.0/?method=getQuote&format=json&lang=ru'
$char = $resp.RawContentStream.ToArray()
$str = [Text.Encoding]::UTF8.GetString($char)
ConvertFrom-Json $str
i.e. something like this:
VERBOSE: GET http://api.forismatic.com/api/1.0/?method=getQuote&format=json with 0-byte payload
VERBOSE: received 536-byte response of content type application/json
VERBOSE: Content encoding: utf-8
quoteText : Именно внутренний диалог прижимает к земле людей в повседневной жизни. Мир для нас такой-то и такой-то или этакий и этакий лишь потому, что мы сами себе говорим о нем, что он такой-то и такой-то или этакий и этакий.
quoteAuthor : Карлос Кастанеда
senderName :
senderLink :
quoteLink : http://forismatic.com/ru/6309006412/
Actual behavior
It falls back to iso-8859-1 encoding and produces gobbledygook with a lot of Ð's in it. It also fails to report the byte count in the verbose output: the message reads "received -byte response" with no number.
VERBOSE: GET http://api.forismatic.com/api/1.0/?method=getQuote&format=json with 0-byte payload
VERBOSE: received -byte response of content type application/json
VERBOSE: Content encoding: iso-8859-1
quoteText : �о в��ком� п�ибежи�� об�а�а���� л�ди, м��им�е ���а�ом: к го�ам
и к ле�ам, к де�ев��м в �о�е, к г�обни�ам.
quoteAuthor : ��дда �а��ама
senderName :
senderLink :
quoteLink : http://forismatic.com/ru/804c7d14d9/
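The Ð-heavy output is what you would expect when UTF-8 bytes are decoded as iso-8859-1. A minimal offline sketch that reproduces the effect (the sample string is borrowed from the expected output above; the variable names are only for illustration):

```powershell
# Sample text from the expected output; any Cyrillic string shows the effect.
$text = 'Карлос Кастанеда'

# What the server sends: UTF-8 bytes.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($text)

# What the cmdlet does with them: decode as iso-8859-1 (one char per byte).
$latin1  = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$garbled = $latin1.GetString($utf8Bytes)
$garbled

# iso-8859-1 maps every byte to exactly one char, so the damage is reversible:
# re-encode as iso-8859-1 to get the original bytes back, then decode as UTF-8.
$recovered = [System.Text.Encoding]::UTF8.GetString($latin1.GetBytes($garbled))
$recovered
```

Most Cyrillic letters encode to a two-byte UTF-8 sequence starting with 0xD0 or 0xD1, which iso-8859-1 renders as Ð or Ñ, hence the pattern in the output.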
Discussion
When calling an HTTP endpoint that returns the header Content-Type: application/json, the WebCmdlets incorrectly default to iso-8859-1 rather than a Unicode encoding, and disregard the application/json RFC's simple rule for determining the content encoding.
- The JSON standard ECMA-404 (PDF) clearly states that JSON must be Unicode.
- The application/json RFC (in section 3) clearly indicates how the encoding should be determined from the first 4 bytes of the content.
NOTE: Please don't work around this by just defaulting to utf-8. I'm sure that 90% of the time, you could probably get away with that, but it's not actually correct, and the RFC implementation is trivial.
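For reference, a minimal sketch of that detection in PowerShell, assuming the RFC in question is RFC 4627, whose section 3 infers the encoding from the pattern of null bytes in the first four octets. Get-JsonEncoding is a hypothetical helper name, not part of the WebCmdlets, and the sketch does not handle byte order marks.

```powershell
function Get-JsonEncoding {
    param([byte[]] $Bytes)

    # Too short for the 4-byte pattern test: assume UTF-8 (an assumption of this sketch).
    if ($Bytes.Length -lt 4) {
        return [System.Text.Encoding]::UTF8
    }

    # Build a 4-character mask: '0' for a null byte, 'x' for anything else.
    $mask = -join ($Bytes[0..3] | ForEach-Object { if ($_ -eq 0) { '0' } else { 'x' } })

    switch ($mask) {
        # UTF32Encoding / UnicodeEncoding arguments: (bigEndian, byteOrderMark)
        '000x'  { return [System.Text.UTF32Encoding]::new($true, $false) }    # UTF-32BE
        'x000'  { return [System.Text.UTF32Encoding]::new($false, $false) }   # UTF-32LE
        '0x0x'  { return [System.Text.UnicodeEncoding]::new($true, $false) }  # UTF-16BE
        'x0x0'  { return [System.Text.UnicodeEncoding]::new($false, $false) } # UTF-16LE
        default { return [System.Text.Encoding]::UTF8 }                       # no nulls: UTF-8
    }
}

# Usage: encode a JSON document as UTF-16BE, then detect and decode it.
$json  = '{"quoteAuthor":"Карлос Кастанеда"}'
$bytes = [System.Text.Encoding]::BigEndianUnicode.GetBytes($json)
$enc   = Get-JsonEncoding $bytes
$enc.GetString($bytes) | ConvertFrom-Json
```

The rule works because, under that RFC, a JSON text starts with two ASCII characters, so the position of the nulls in the first four bytes identifies the encoding without a charset attribute.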
ALSO NOTE: The WebCmdlets do respect the ; charset=utf-8 attribute if it's present on the content-type header -- which makes sense, but isn't technically standards compliant for an application/* content-type, as far as I can tell.
To get started: ProcessResponse and TryGetEncoding
See also #5528 which was a specific instance of this problem. @lipkau was incorrectly convinced by early responders that the problem was in the webserver, but it's actually in PowerShell's cmdlets. If you invoke the rest API against the Atlassian wiki, you can see the problem happening in the Verbose stream:
$r = IRM $url -Credential $mycred -Authentication basic -Verbose
VERBOSE: GET https://powershell.atlassian.net/wiki/rest/api/content/13009245?expand=space,version with 0-byte payload
VERBOSE: received -byte response of content type application/json
VERBOSE: Content encoding: iso-8859-1
The content is actually correctly utf-8 encoded (as you could tell from the positions of the nulls in the first 4 bytes), and iso-8859-1 is never a valid encoding for application/json, period.
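To see what those first 4 bytes look like, here is a small offline sketch that encodes the same JSON text in each Unicode encoding and prints the leading bytes in hex. The sample JSON is made up for illustration.

```powershell
$json = '{"a":1}'

# Encodings to compare; GetBytes never emits a byte order mark.
$encodings = [ordered]@{
    'utf-8'    = [System.Text.Encoding]::UTF8
    'utf-16LE' = [System.Text.Encoding]::Unicode
    'utf-16BE' = [System.Text.Encoding]::BigEndianUnicode
    'utf-32LE' = [System.Text.Encoding]::UTF32
    'utf-32BE' = [System.Text.UTF32Encoding]::new($true, $false)
}

$rows = foreach ($name in $encodings.Keys) {
    $first4 = $encodings[$name].GetBytes($json)[0..3]
    $hex    = ($first4 | ForEach-Object { $_.ToString('X2') }) -join ' '
    '{0,-9} {1}' -f $name, $hex
}
$rows

# utf-8     7B 22 61 22
# utf-16LE  7B 00 22 00
# utf-16BE  00 7B 00 22
# utf-32LE  7B 00 00 00
# utf-32BE  00 00 00 7B
```

Only UTF-8 has no null bytes among the first four, which is why a response whose first four bytes are all non-null can be treated as utf-8.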
PS C:\Program Files\PowerShell\6.0.0-rc> $PSVersionTable
Name                           Value
----                           -----
PSVersion                      6.0.0-rc
PSEdition                      Core
GitCommitId                    v6.0.0-rc
OS                             Microsoft Windows 10.0.15063
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0