@@ -46,8 +46,8 @@ specification does not provide a mechanism for extending any aspect of encodings
 encoding in use, or on the way a given encoding is to be implemented. For instance, an attack was
 reported in 2011 where a <a>Shift_JIS</a> lead byte 0x82 was used to “mask” a 0x22 trail byte in a
 JSON resource of which an attacker could control some field. The producer did not see the problem
-even though this is an illegal byte combination. The consumer decoded it as a single U+FFFD and
-therefore changed the overall interpretation as U+0022 is an important delimiter. Decoders of
+even though this is an illegal byte combination. The consumer decoded it as a single U+FFFD (�) and
+therefore changed the overall interpretation as U+0022 (") is an important delimiter. Decoders of
 encodings that use multiple bytes for scalar values now require that in case of an illegal byte
 combination, a scalar value in the range U+0000 to U+007F, inclusive, cannot be “masked”. For the
 aforementioned sequence the output would be U+FFFD U+0022. (As an unfortunate exception to this, the
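To make the masking rule concrete, here is a simplified, hypothetical Python sketch of a Shift_JIS-style decoder loop. `decode_sjis_like` and its `lookup` parameter (a stand-in for the Shift_JIS index, mapping a pointer to a code point or `None`) are names introduced for illustration, and the private-use pointer range of the real decoder is not modeled. The point of interest is the final branch: when a two-byte sequence is illegal and the trail byte is ASCII, that byte is left in the input and decoded on its own.

```python
from typing import Callable, List, Optional

def decode_sjis_like(data: bytes, lookup: Callable[[int], Optional[int]]) -> str:
    """Simplified Shift_JIS-style decoder illustrating the "no masking" rule.

    `lookup` is a hypothetical stand-in for the Shift_JIS index
    (pointer -> code point, or None when the pointer is unmapped).
    """
    out: List[str] = []
    i = 0
    while i < len(data):
        byte = data[i]
        i += 1
        if byte <= 0x80:                       # ASCII byte or 0x80
            out.append(chr(byte))
        elif 0xA1 <= byte <= 0xDF:             # single-byte halfwidth katakana
            out.append(chr(0xFF61 - 0xA1 + byte))
        elif 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC:
            if i == len(data):                 # lead byte at end of input
                out.append("\ufffd")
                break
            lead, trail = byte, data[i]
            code_point = None
            if 0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFC:
                offset = 0x40 if trail < 0x7F else 0x41
                lead_offset = 0x81 if lead < 0xA0 else 0xC1
                code_point = lookup((lead - lead_offset) * 188 + trail - offset)
            if code_point is not None:
                out.append(chr(code_point))
                i += 1                         # lead and trail consumed together
            else:
                out.append("\ufffd")
                if trail >= 0x80:
                    i += 1                     # a non-ASCII trail byte is consumed
                # An ASCII trail byte is not consumed, so it cannot be "masked".
        else:
            out.append("\ufffd")               # bytes that are never valid lead bytes
    return "".join(out)
```

For example, `decode_sjis_like(b'"\x82"', lambda pointer: None)` returns `'"\ufffd"'`: the 0x22 following the illegal lead byte survives as a delimiter instead of being swallowed.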
@@ -786,9 +786,9 @@ duplicated.
 <a for=/>decoder</a> and one for its <a for=/>encoder</a>.
 
 <p>To find the pointers and their corresponding code points in an <a>index</a>,
-let <var>lines</var> be the result of splitting the resource's contents on U+000A.
-Then remove each item in <var>lines</var> that is the empty string or starts with U+0023.
-Then the pointers and their corresponding code points are found by splitting each item in <var>lines</var> on U+0009.
+let <var>lines</var> be the result of splitting the resource's contents on U+000A LF.
+Then remove each item in <var>lines</var> that is the empty string or starts with U+0023 (#).
+Then the pointers and their corresponding code points are found by splitting each item in <var>lines</var> on U+0009 TAB.
 The first subitem is the pointer (as a decimal number) and the second is the corresponding code point (as a hexadecimal number).
 Other subitems are not relevant.
 
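The index format described in this hunk is simple enough to parse in a few lines. A minimal Python sketch, assuming the resource's contents are already available as a string; `parse_index` is a hypothetical name, and the sample text only imitates the layout of the real index files:

```python
from typing import Dict

def parse_index(contents: str) -> Dict[int, int]:
    """Map each pointer (decimal) to its code point (hexadecimal)."""
    index: Dict[int, int] = {}
    for line in contents.split("\n"):               # split on U+000A LF
        if line == "" or line.startswith("#"):      # drop empty lines and comments
            continue
        fields = line.split("\t")                   # split on U+0009 TAB
        pointer = int(fields[0], 10)
        code_point = int(fields[1], 16)             # int() accepts a "0x" prefix with base 16
        index[pointer] = code_point                 # remaining fields are not relevant
    return index

sample = "# comment line\n\n0\t0x3000\t\u3000 (IDEOGRAPHIC SPACE)\n1\t0x3001\t\u3001\n"
print(parse_index(sample))  # {0: 12288, 1: 12289}
```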
@@ -1355,7 +1355,7 @@ interface mixin TextDecoderCommon {
  <ol>
  <li><p>Set <var>decoder</var>'s <a for=TextDecoderCommon>BOM seen</a> to true.
 
- <li><p>If <var>item</var> is U+FEFF, then <a for=iteration>continue</a>.
+ <li><p>If <var>item</var> is U+FEFF BOM, then <a for=iteration>continue</a>.
  </ol>
 
  <li><p>Append <var>item</var> to <var>output</var>.
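The BOM steps in this hunk drop at most one code point: the first one a decoder serializes, and only if it is U+FEFF. A rough Python model, assuming a decoder object carrying the encoding, `ignore BOM`, and `BOM seen` fields; the class and attribute names are hypothetical, and the guard on the encoding (UTF-8, UTF-16BE, or UTF-16LE) and on `ignore BOM` follows the enclosing step of the full specification rather than this hunk. Because `bom_seen` persists across calls, a U+FEFF arriving in a later streamed chunk is kept.

```python
from typing import Iterable, List

BOM_STRIPPING_ENCODINGS = {"UTF-8", "UTF-16BE", "UTF-16LE"}

class DecoderModel:
    """Hypothetical holder for the decoder fields used when serializing output."""

    def __init__(self, encoding: str, ignore_bom: bool = False) -> None:
        self.encoding = encoding
        self.ignore_bom = ignore_bom
        self.bom_seen = False

    def serialize(self, items: Iterable[str]) -> str:
        output: List[str] = []
        for item in items:
            if (self.encoding in BOM_STRIPPING_ENCODINGS
                    and not self.ignore_bom and not self.bom_seen):
                self.bom_seen = True          # only the very first item is inspected
                if item == "\ufeff":
                    continue                  # skip this one leading BOM
            output.append(item)
        return "".join(output)
```

For example, `DecoderModel("UTF-8").serialize("\ufeffa")` returns `"a"`, while `DecoderModel("UTF-8", ignore_bom=True)` passes the BOM through unchanged.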
@@ -2012,7 +2012,7 @@ constructor steps are:
  <p class=note>{{DOMString}}, as well as an <a for=/>I/O queue</a> of code units rather than scalar
  values, are used here so that a surrogate pair that is split between chunks can be reassembled into
  the appropriate scalar value. The behavior is otherwise identical to {{USVString}}. In particular,
- lone surrogates will be replaced with U+FFFD.
+ lone surrogates will be replaced with U+FFFD (�).
 
  <li><p>Let <var>output</var> be the <a for=/>I/O queue</a> of bytes « <a>end-of-queue</a> ».
 
@@ -2072,13 +2072,13 @@ constructor steps are:
 
  <li><p><a>Restore</a> <var>item</var> to <var>input</var>.
 
- <li><p>Return U+FFFD.
+ <li><p>Return U+FFFD (�).
  </ol>
 
  <li><p>If <var>item</var> is a <a for=/>leading surrogate</a>, then set <var>encoder</var>'s
  <a for=TextEncoderStream>leading surrogate</a> to <var>item</var> and return <a>continue</a>.
 
- <li><p>If <var>item</var> is a <a for=/>trailing surrogate</a>, then return U+FFFD.
+ <li><p>If <var>item</var> is a <a for=/>trailing surrogate</a>, then return U+FFFD (�).
 
  <li><p>Return <var>item</var>.
 </ol>
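The code-unit-to-scalar-value steps in the two hunks above can be modeled in Python for a stream whose chunks are lists of UTF-16 code units, which is what lets a surrogate pair split across two chunks be reassembled. The class and method names are hypothetical. The `flush` method assumes, per the flush step of the full specification, that a leading surrogate still pending at the end of the stream is emitted as the UTF-8 bytes of U+FFFD.

```python
from collections import deque
from typing import Deque, List, Optional

class EncoderStreamModel:
    """Hypothetical model of TextEncoderStream's handling of UTF-16 code units."""

    def __init__(self) -> None:
        self.leading_surrogate: Optional[int] = None

    def convert(self, item: int, queue: Deque[int]) -> Optional[int]:
        """Return a scalar value, or None where the algorithm returns "continue"."""
        if self.leading_surrogate is not None:
            lead = self.leading_surrogate
            self.leading_surrogate = None
            if 0xDC00 <= item <= 0xDFFF:             # trailing surrogate completes the pair
                return 0x10000 + ((lead - 0xD800) << 10) + (item - 0xDC00)
            queue.appendleft(item)                   # restore item to the input
            return 0xFFFD
        if 0xD800 <= item <= 0xDBFF:                 # leading surrogate: wait for the next unit
            self.leading_surrogate = item
            return None
        if 0xDC00 <= item <= 0xDFFF:                 # lone trailing surrogate
            return 0xFFFD
        return item

    def transform(self, chunk: List[int]) -> bytes:
        queue: Deque[int] = deque(chunk)
        out: List[str] = []
        while queue:
            scalar = self.convert(queue.popleft(), queue)
            if scalar is not None:
                out.append(chr(scalar))
        return "".join(out).encode("utf-8")

    def flush(self) -> bytes:
        if self.leading_surrogate is not None:       # unpaired leading surrogate at the end
            self.leading_surrogate = None
            return "\ufffd".encode("utf-8")
        return b""
```

Feeding `[0xD83D]` and then `[0xDE00]` as two separate chunks yields no bytes for the first chunk and the four UTF-8 bytes of U+1F600 for the second.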
@@ -2810,14 +2810,14 @@ and <var>byte</var>, runs these steps:
  <li><p>If <var>codePoint</var> is an <a>ASCII code point</a>, then return a byte whose value is
  <var>codePoint</var>.
 
- <li><p>If <var>codePoint</var> is U+00A5, then return byte 0x5C.
+ <li><p>If <var>codePoint</var> is U+00A5 (¥), then return byte 0x5C.
 
- <li><p>If <var>codePoint</var> is U+203E, then return byte 0x7E.
+ <li><p>If <var>codePoint</var> is U+203E (‾), then return byte 0x7E.
 
  <li><p>If <var>codePoint</var> is in the range U+FF61 to U+FF9F, inclusive, then return two bytes
  whose values are 0x8E and <var>codePoint</var> − 0xFF61 + 0xA1.
 
- <li><p>If <var>codePoint</var> is U+2212, then set it to U+FF0D.
+ <li><p>If <var>codePoint</var> is U+2212 (−), then set it to U+FF0D (－).
 
  <li>
  <p>Let <var>pointer</var> be the <a>index pointer</a> for <var>codePoint</var> in
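The EUC-JP encoder steps in this hunk translate directly into code. A minimal Python sketch: `euc_jp_encode_code_point` and its `jis0208_pointer` parameter (a stand-in for the index pointer lookup in index jis0208) are hypothetical, the final lead/trail arithmetic follows the remaining steps of the EUC-JP encoder in the full specification rather than anything shown here, and `None` stands for an error.

```python
from typing import Callable, Optional

def euc_jp_encode_code_point(code_point: int,
                             jis0208_pointer: Callable[[int], Optional[int]]) -> Optional[bytes]:
    """Encode one code point as EUC-JP; None signals an encoding error."""
    if code_point < 0x80:                       # ASCII code point
        return bytes([code_point])
    if code_point == 0x00A5:                    # ¥ YEN SIGN
        return b"\x5c"
    if code_point == 0x203E:                    # ‾ OVERLINE
        return b"\x7e"
    if 0xFF61 <= code_point <= 0xFF9F:          # halfwidth katakana use the 0x8E prefix
        return bytes([0x8E, code_point - 0xFF61 + 0xA1])
    if code_point == 0x2212:                    # − MINUS SIGN is encoded as － FULLWIDTH HYPHEN-MINUS
        code_point = 0xFF0D
    pointer = jis0208_pointer(code_point)
    if pointer is None:
        return None
    return bytes([pointer // 94 + 0xA1, pointer % 94 + 0xA1])
```

With a lookup that maps U+FF0D to pointer 60 (for instance `{0xFF0D: 60}.get`), U+2212 encodes to the bytes 0xA1 0xDD.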
@@ -2890,10 +2890,10 @@ and <var>byte</var>, runs these steps:
  <a>continue</a>.
 
  <dt>0x5C
- <dd><p>Set <a>ISO-2022-JP output</a> to false and return code point U+00A5.
+ <dd><p>Set <a>ISO-2022-JP output</a> to false and return code point U+00A5 (¥).
 
  <dt>0x7E
- <dd><p>Set <a>ISO-2022-JP output</a> to false and return code point U+203E.
+ <dd><p>Set <a>ISO-2022-JP output</a> to false and return code point U+203E (‾).
 
  <dt>0x00 to 0x7F, excluding 0x0E, 0x0F, 0x1B, 0x5C, and 0x7E
  <dd><p>Set <a>ISO-2022-JP output</a> to false and return a code point whose
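In the Roman state the decoder reads JIS X 0201 Roman, which differs from ASCII only at 0x5C and 0x7E. A hypothetical Python sketch of the per-byte cases visible in this hunk; the function name is an assumption, and neither the update of ISO-2022-JP output nor the escape-start transition triggered by 0x1B is modeled (the sketch raises instead).

```python
from typing import Optional

def roman_state_decode_byte(byte: int) -> Optional[str]:
    """Decode one byte in the ISO-2022-JP decoder's Roman state.

    Returns the decoded character, or None for an error.
    """
    if byte == 0x1B:
        raise NotImplementedError("0x1B starts an escape sequence; not modeled here")
    if byte == 0x5C:
        return "\u00a5"                 # ¥ instead of a backslash
    if byte == 0x7E:
        return "\u203e"                 # ‾ instead of a tilde
    if byte <= 0x7F and byte not in (0x0E, 0x0F):
        return chr(byte)                # every other ASCII byte maps to itself
    return None                         # 0x0E, 0x0F, and bytes above 0x7F are errors
```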
@@ -3084,9 +3084,9 @@ and <var>byte</var>, runs these steps:
  <li>
  <p>If <a>ISO-2022-JP encoder state</a> is <a lt="ISO-2022-JP encoder ASCII">ASCII</a> or
  <a lt="ISO-2022-JP encoder Roman">Roman</a>, and <var>codePoint</var> is U+000E, U+000F, or
- U+001B, then return <a>error</a> with U+FFFD.
+ U+001B, then return <a>error</a> with U+FFFD (�).
 
- <p class=note>This returns U+FFFD rather than <var>codePoint</var> to prevent attacks.
+ <p class=note>This returns U+FFFD (�) rather than <var>codePoint</var> to prevent attacks.
  <!-- https://github.com/whatwg/encoding/issues/15 -->
 
  <li><p>If <a>ISO-2022-JP encoder state</a> is <a lt="ISO-2022-JP encoder ASCII">ASCII</a> and
@@ -3095,29 +3095,29 @@ and <var>byte</var>, runs these steps:
 
  <li>
  <p>If <a>ISO-2022-JP encoder state</a> is <a lt="ISO-2022-JP encoder Roman">Roman</a> and
- <var>codePoint</var> is an <a>ASCII code point</a>, excluding U+005C and U+007E, or is U+00A5 or
- U+203E:
+ <var>codePoint</var> is an <a>ASCII code point</a>, excluding U+005C (\) and U+007E (~), or is
+ U+00A5 (¥) or U+203E (‾):
 
  <ol>
  <li><p>If <var>codePoint</var> is an <a>ASCII code point</a>, then return a byte whose value is
  <var>codePoint</var>.
 
- <li><p>If <var>codePoint</var> is U+00A5, then return byte 0x5C.
+ <li><p>If <var>codePoint</var> is U+00A5 (¥), then return byte 0x5C.
 
- <li><p>If <var>codePoint</var> is U+203E, then return byte 0x7E.
+ <li><p>If <var>codePoint</var> is U+203E (‾), then return byte 0x7E.
  </ol>
 
  <li><p>If <var>codePoint</var> is an <a>ASCII code point</a>, and <a>ISO-2022-JP encoder state</a>
  is not <a lt="ISO-2022-JP encoder ASCII">ASCII</a>, then <a>restore</a> <var>codePoint</var> to
  <var>ioQueue</var>, set <a>ISO-2022-JP encoder state</a> to
  <a lt="ISO-2022-JP encoder ASCII">ASCII</a>, and return three bytes 0x1B 0x28 0x42.
 
- <li><p>If <var>codePoint</var> is either U+00A5 or U+203E, and <a>ISO-2022-JP encoder state</a> is
- not <a lt="ISO-2022-JP encoder Roman">Roman</a>, then <a>restore</a> <var>codePoint</var> to
- <var>ioQueue</var>, set <a>ISO-2022-JP encoder state</a> to
+ <li><p>If <var>codePoint</var> is either U+00A5 (¥) or U+203E (‾), and
+ <a>ISO-2022-JP encoder state</a> is not <a lt="ISO-2022-JP encoder Roman">Roman</a>, then
+ <a>restore</a> <var>codePoint</var> to <var>ioQueue</var>, set <a>ISO-2022-JP encoder state</a> to
  <a lt="ISO-2022-JP encoder Roman">Roman</a>, and return three bytes 0x1B 0x28 0x4A.
 
- <li><p>If <var>codePoint</var> is U+2212, then set it to U+FF0D.
+ <li><p>If <var>codePoint</var> is U+2212 (−), then set it to U+FF0D (－).
 
  <li><p>If <var>codePoint</var> is in the range U+FF61 to U+FF9F, inclusive, then set it to the
  <a>index code point</a> for <var>codePoint</var> − 0xFF61 in
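The state switching in the ISO-2022-JP encoder steps of the two hunks above can be illustrated with a Python sketch restricted to the ASCII and Roman states. It is hypothetical and deliberately incomplete: `encode_ascii_roman` is an invented name, `None` stands for an error (including the U+000E, U+000F, and U+001B case, which the real encoder reports as an error with U+FFFD), and any code point that would require the jis0208 state is also reported as `None` rather than encoded. Restoring a code point to the queue is modeled by not advancing the input position, and the final switch back to ASCII follows the end-of-queue step of the full specification.

```python
from typing import Optional

ESC_TO_ASCII = b"\x1b\x28\x42"   # ESC ( B
ESC_TO_ROMAN = b"\x1b\x28\x4a"   # ESC ( J

def encode_ascii_roman(text: str) -> Optional[bytes]:
    """ISO-2022-JP encoding limited to the ASCII and Roman states; None means error."""
    state = "ascii"
    out = bytearray()
    i = 0
    while i < len(text):
        cp = ord(text[i])
        if cp in (0x0E, 0x0F, 0x1B):
            return None                        # never pass shift/escape controls through
        if state == "ascii" and cp < 0x80:
            out.append(cp)
            i += 1
        elif state == "roman" and ((cp < 0x80 and cp not in (0x5C, 0x7E))
                                   or cp in (0xA5, 0x203E)):
            out.append({0xA5: 0x5C, 0x203E: 0x7E}.get(cp, cp))
            i += 1
        elif cp < 0x80:                        # ASCII code point while not in ASCII state
            out += ESC_TO_ASCII
            state = "ascii"                    # i is not advanced: the code point is restored
        elif cp in (0xA5, 0x203E):             # ¥ or ‾ while not in Roman state
            out += ESC_TO_ROMAN
            state = "roman"
        else:
            return None                        # would need the jis0208 state (not modeled)
    if state != "ascii":
        out += ESC_TO_ASCII                    # end of input: return to ASCII
    return bytes(out)
```

For example, `encode_ascii_roman("a\u00a5b")` returns `b"a\x1b(J\\b\x1b(B"`: the yen sign switches to Roman, `b` stays in Roman because it is not U+005C or U+007E, and the end of input switches back to ASCII.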
@@ -3238,14 +3238,14 @@ and <var>byte</var>, runs these steps:
  <li><p>If <var>codePoint</var> is an <a>ASCII code point</a> or U+0080, then return a byte whose
  value is <var>codePoint</var>.
 
- <li><p>If <var>codePoint</var> is U+00A5, then return byte 0x5C.
+ <li><p>If <var>codePoint</var> is U+00A5 (¥), then return byte 0x5C.
 
- <li><p>If <var>codePoint</var> is U+203E, then return byte 0x7E.
+ <li><p>If <var>codePoint</var> is U+203E (‾), then return byte 0x7E.
 
  <li><p>If <var>codePoint</var> is in the range U+FF61 to U+FF9F, inclusive, then return a byte
  whose value is <var>codePoint</var> − 0xFF61 + 0xA1.
 
- <li><p>If <var>codePoint</var> is U+2212, then set it to U+FF0D.
+ <li><p>If <var>codePoint</var> is U+2212 (−), then set it to U+FF0D (－).
 
  <li><p>Let <var>pointer</var> be the <a>index Shift_JIS pointer</a> for <var>codePoint</var>.
 
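A minimal Python sketch of the Shift_JIS encoder steps in this hunk. `sjis_encode_code_point` and its `sjis_pointer` parameter (a stand-in for the index Shift_JIS pointer lookup) are hypothetical, the lead/trail arithmetic after the lookup follows the remaining steps of the Shift_JIS encoder in the full specification rather than this hunk, and `None` stands for an error.

```python
from typing import Callable, Optional

def sjis_encode_code_point(code_point: int,
                           sjis_pointer: Callable[[int], Optional[int]]) -> Optional[bytes]:
    """Encode one code point as Shift_JIS; None signals an encoding error."""
    if code_point <= 0x80:                      # ASCII code point or U+0080
        return bytes([code_point])
    if code_point == 0x00A5:                    # ¥
        return b"\x5c"
    if code_point == 0x203E:                    # ‾
        return b"\x7e"
    if 0xFF61 <= code_point <= 0xFF9F:          # halfwidth katakana are single bytes
        return bytes([code_point - 0xFF61 + 0xA1])
    if code_point == 0x2212:                    # − becomes －
        code_point = 0xFF0D
    pointer = sjis_pointer(code_point)
    if pointer is None:
        return None
    lead, trail = divmod(pointer, 188)          # 188 trail-byte slots per lead byte
    lead_offset = 0x81 if lead < 0x1F else 0xC1
    offset = 0x40 if trail < 0x3F else 0x41     # trail bytes skip 0x7F
    return bytes([lead + lead_offset, trail + offset])
```

With a lookup that maps U+FF0D to pointer 60, U+2212 encodes to the two bytes 0x81 0x7C.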