Commit 4d5f14f

committed
bit more
1 parent d74c8b9 commit 4d5f14f

1 file changed

+27
-27
lines changed

encoding.bs

Lines changed: 27 additions & 27 deletions
@@ -46,8 +46,8 @@ specification does not provide a mechanism for extending any aspect of encodings
 encoding in use, or on the way a given encoding is to be implemented. For instance, an attack was
 reported in 2011 where a <a>Shift_JIS</a> lead byte 0x82 was used to “mask” a 0x22 trail byte in a
 JSON resource of which an attacker could control some field. The producer did not see the problem
-even though this is an illegal byte combination. The consumer decoded it as a single U+FFFD and
-therefore changed the overall interpretation as U+0022 is an important delimiter. Decoders of
+even though this is an illegal byte combination. The consumer decoded it as a single U+FFFD (�) and
+therefore changed the overall interpretation as U+0022 (") is an important delimiter. Decoders of
 encodings that use multiple bytes for scalar values now require that in case of an illegal byte
 combination, a scalar value in the range U+0000 to U+007F, inclusive, cannot be “masked”. For the
 aforementioned sequence the output would be U+FFFD U+0022. (As an unfortunate exception to this, the
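
The "no masking" rule in the hunk above can be illustrated with a toy two-byte decoder. This is only a sketch, not the Shift_JIS decoder from the specification: `toy_decode` and `PLACEHOLDER` are hypothetical names, valid pairs are mapped to a placeholder because no index is loaded, and every non-ASCII byte outside the assumed lead ranges is treated as an error (real Shift_JIS does not do that).

```python
REPLACEMENT = "\ufffd"   # U+FFFD
PLACEHOLDER = "\u25a1"   # hypothetical stand-in for a real index lookup


def toy_decode(data: bytes) -> str:
    """Toy decoder: ASCII passes through; an invalid ASCII trail byte is never swallowed."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        i += 1
        if byte <= 0x7F:
            out.append(chr(byte))
            continue
        # Assumption: 0x81-0x9F and 0xE0-0xFC are the only lead bytes in this toy.
        if not (0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC):
            out.append(REPLACEMENT)
            continue
        if i == len(data):
            out.append(REPLACEMENT)  # lead byte at end of input
            break
        trail = data[i]
        i += 1
        if 0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFC:
            out.append(PLACEHOLDER)  # a real decoder would consult the index here
        else:
            out.append(REPLACEMENT)
            if trail <= 0x7F:
                i -= 1  # "restore" the ASCII byte so it is decoded on its own
    return "".join(out)
```

With this rule, the bytes 0x82 0x22 come out as U+FFFD followed by U+0022, so the quote that closes the JSON string survives.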
@@ -786,9 +786,9 @@ duplicated.
 <a for=/>decoder</a> and one for its <a for=/>encoder</a>.

 <p>To find the pointers and their corresponding code points in an <a>index</a>,
-let <var>lines</var> be the result of splitting the resource's contents on U+000A.
-Then remove each item in <var>lines</var> that is the empty string or starts with U+0023.
-Then the pointers and their corresponding code points are found by splitting each item in <var>lines</var> on U+0009.
+let <var>lines</var> be the result of splitting the resource's contents on U+000A LF.
+Then remove each item in <var>lines</var> that is the empty string or starts with U+0023 (#).
+Then the pointers and their corresponding code points are found by splitting each item in <var>lines</var> on U+0009 TAB.
 The first subitem is the pointer (as a decimal number) and the second is the corresponding code point (as a hexadecimal number).
 Other subitems are not relevant.
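
The index-parsing steps in this hunk translate fairly directly into code. The following is a minimal sketch; `parse_index` is a hypothetical helper name, and it assumes the resource has already been decoded to a string.

```python
from typing import Dict


def parse_index(contents: str) -> Dict[int, int]:
    """Map each pointer to its code point, following the steps quoted above."""
    # Split on U+000A LF.
    lines = contents.split("\n")
    # Drop empty lines and lines that start with U+0023 (#).
    lines = [line for line in lines if line != "" and not line.startswith("#")]
    table = {}
    for line in lines:
        # Split on U+0009 TAB; subitems after the second are not relevant.
        subitems = line.split("\t")
        pointer = int(subitems[0], 10)     # decimal number
        code_point = int(subitems[1], 16)  # hexadecimal number; a "0x" prefix is accepted
        table[pointer] = code_point
    return table
```

For example, `parse_index("# comment\n0\t0x0080\tnote\n")` yields `{0: 0x80}`.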

@@ -1355,7 +1355,7 @@ interface mixin TextDecoderCommon {
 <ol>
 <li><p>Set <var>decoder</var>'s <a for=TextDecoderCommon>BOM seen</a> to true.

-<li><p>If <var>item</var> is U+FEFF, then <a for=iteration>continue</a>.
+<li><p>If <var>item</var> is U+FEFF BOM, then <a for=iteration>continue</a>.
 </ol>

 <li><p>Append <var>item</var> to <var>output</var>.
@@ -2012,7 +2012,7 @@ constructor steps are:
 <p class=note>{{DOMString}}, as well as an <a for=/>I/O queue</a> of code units rather than scalar
 values, are used here so that a surrogate pair that is split between chunks can be reassembled into
 the appropriate scalar value. The behavior is otherwise identical to {{USVString}}. In particular,
-lone surrogates will be replaced with U+FFFD.
+lone surrogates will be replaced with U+FFFD (�).

 <li><p>Let <var>output</var> be the <a for=/>I/O queue</a> of bytes « <a>end-of-queue</a> ».

@@ -2072,13 +2072,13 @@ constructor steps are:

 <li><p><a>Restore</a> <var>item</var> to <var>input</var>.

-<li><p>Return U+FFFD.
+<li><p>Return U+FFFD (�).
 </ol>

 <li><p>If <var>item</var> is a <a for=/>leading surrogate</a>, then set <var>encoder</var>'s
 <a for=TextEncoderStream>leading surrogate</a> to <var>item</var> and return <a>continue</a>.

-<li><p>If <var>item</var> is a <a for=/>trailing surrogate</a>, then return U+FFFD.
+<li><p>If <var>item</var> is a <a for=/>trailing surrogate</a>, then return U+FFFD (�).

 <li><p>Return <var>item</var>.
 </ol>
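
The surrogate handling touched by this hunk can be sketched as a small stateful converter over UTF-16 code units. `SurrogateJoiner` and `feed` are hypothetical names. The step that combines a leading and trailing surrogate into a scalar value lies outside the lines shown in the hunk and is an assumption here, and flushing a pending leading surrogate at end of stream is omitted.

```python
class SurrogateJoiner:
    """Carries a pending leading surrogate across chunks of code units."""

    def __init__(self):
        self.leading = None  # pending leading surrogate, if any

    def feed(self, units):
        out = []
        i = 0
        while i < len(units):
            item = units[i]
            i += 1
            if self.leading is not None:
                lead = self.leading
                self.leading = None
                if 0xDC00 <= item <= 0xDFFF:
                    # Assumed step: combine the pair into one scalar value.
                    out.append(0x10000 + ((lead - 0xD800) << 10) + (item - 0xDC00))
                    continue
                i -= 1              # restore item to the input
                out.append(0xFFFD)  # the lone leading surrogate becomes U+FFFD
                continue
            if 0xD800 <= item <= 0xDBFF:
                self.leading = item  # wait for the next code unit
                continue
            if 0xDC00 <= item <= 0xDFFF:
                out.append(0xFFFD)   # lone trailing surrogate
                continue
            out.append(item)
        return out
```

Feeding `[0xD83D]` and then `[0xDE00]` in separate calls yields `[]` followed by `[0x1F600]`, which is the split-pair case the note in the specification describes.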
@@ -2810,14 +2810,14 @@ and <var>byte</var>, runs these steps:
 <li><p>If <var>codePoint</var> is an <a>ASCII code point</a>, then return a byte whose value is
 <var>codePoint</var>.

-<li><p>If <var>codePoint</var> is U+00A5, then return byte 0x5C.
+<li><p>If <var>codePoint</var> is U+00A5 (¥), then return byte 0x5C.

-<li><p>If <var>codePoint</var> is U+203E, then return byte 0x7E.
+<li><p>If <var>codePoint</var> is U+203E (‾), then return byte 0x7E.

 <li><p>If <var>codePoint</var> is in the range U+FF61 to U+FF9F, inclusive, then return two bytes
 whose values are 0x8E and <var>codePoint</var> &minus; 0xFF61 + 0xA1.

-<li><p>If <var>codePoint</var> is U+2212, then set it to U+FF0D.
+<li><p>If <var>codePoint</var> is U+2212 (−), then set it to U+FF0D (－).

 <li>
 <p>Let <var>pointer</var> be the <a>index pointer</a> for <var>codePoint</var> in
@@ -2890,10 +2890,10 @@ and <var>byte</var>, runs these steps:
 <a>continue</a>.

 <dt>0x5C
-<dd><p>Set <a>ISO-2022-JP output</a> to false and return code point U+00A5.
+<dd><p>Set <a>ISO-2022-JP output</a> to false and return code point U+00A5 (¥).

 <dt>0x7E
-<dd><p>Set <a>ISO-2022-JP output</a> to false and return code point U+203E.
+<dd><p>Set <a>ISO-2022-JP output</a> to false and return code point U+203E (‾).

 <dt>0x00 to 0x7F, excluding 0x0E, 0x0F, 0x1B, 0x5C, and 0x7E
 <dd><p>Set <a>ISO-2022-JP output</a> to false and return a code point whose
@@ -3084,9 +3084,9 @@ and <var>byte</var>, runs these steps:
 <li>
 <p>If <a>ISO-2022-JP encoder state</a> is <a lt="ISO-2022-JP encoder ASCII">ASCII</a> or
 <a lt="ISO-2022-JP encoder Roman">Roman</a>, and <var>codePoint</var> is U+000E, U+000F, or
-U+001B, then return <a>error</a> with U+FFFD.
+U+001B, then return <a>error</a> with U+FFFD (�).

-<p class=note>This returns U+FFFD rather than <var>codePoint</var> to prevent attacks.
+<p class=note>This returns U+FFFD (�) rather than <var>codePoint</var> to prevent attacks.
 <!-- https://github.com/whatwg/encoding/issues/15 -->

 <li><p>If <a>ISO-2022-JP encoder state</a> is <a lt="ISO-2022-JP encoder ASCII">ASCII</a> and
@@ -3095,29 +3095,29 @@ and <var>byte</var>, runs these steps:

 <li>
 <p>If <a>ISO-2022-JP encoder state</a> is <a lt="ISO-2022-JP encoder Roman">Roman</a> and
-<var>codePoint</var> is an <a>ASCII code point</a>, excluding U+005C and U+007E, or is U+00A5 or
-U+203E:
+<var>codePoint</var> is an <a>ASCII code point</a>, excluding U+005C (\) and U+007E (~), or is
+U+00A5 (¥) or U+203E (‾):

 <ol>
 <li><p>If <var>codePoint</var> is an <a>ASCII code point</a>, then return a byte whose value is
 <var>codePoint</var>.

-<li><p>If <var>codePoint</var> is U+00A5, then return byte 0x5C.
+<li><p>If <var>codePoint</var> is U+00A5 (¥), then return byte 0x5C.

-<li><p>If <var>codePoint</var> is U+203E, then return byte 0x7E.
+<li><p>If <var>codePoint</var> is U+203E (‾), then return byte 0x7E.
 </ol>

 <li><p>If <var>codePoint</var> is an <a>ASCII code point</a>, and <a>ISO-2022-JP encoder state</a>
 is not <a lt="ISO-2022-JP encoder ASCII">ASCII</a>, then <a>restore</a> <var>codePoint</var> to
 <var>ioQueue</var>, set <a>ISO-2022-JP encoder state</a> to
 <a lt="ISO-2022-JP encoder ASCII">ASCII</a>, and return three bytes 0x1B 0x28 0x42.

-<li><p>If <var>codePoint</var> is either U+00A5 or U+203E, and <a>ISO-2022-JP encoder state</a> is
-not <a lt="ISO-2022-JP encoder Roman">Roman</a>, then <a>restore</a> <var>codePoint</var> to
-<var>ioQueue</var>, set <a>ISO-2022-JP encoder state</a> to
+<li><p>If <var>codePoint</var> is either U+00A5 (¥) or U+203E (‾), and
+<a>ISO-2022-JP encoder state</a> is not <a lt="ISO-2022-JP encoder Roman">Roman</a>, then
+<a>restore</a> <var>codePoint</var> to <var>ioQueue</var>, set <a>ISO-2022-JP encoder state</a> to
 <a lt="ISO-2022-JP encoder Roman">Roman</a>, and return three bytes 0x1B 0x28 0x4A.

-<li><p>If <var>codePoint</var> is U+2212, then set it to U+FF0D.
+<li><p>If <var>codePoint</var> is U+2212 (−), then set it to U+FF0D (－).

 <li><p>If <var>codePoint</var> is in the range U+FF61 to U+FF9F, inclusive, then set it to the
 <a>index code point</a> for <var>codePoint</var> &minus; 0xFF61 in
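
The ASCII/Roman state switching in the hunk above can be sketched as follows. This covers only the steps visible in the hunk: `encode_ascii_roman` is a hypothetical name, the error case is modeled by raising `ValueError` rather than returning error with U+FFFD, code points that would need the jis0208 path raise `NotImplementedError`, and the end-of-input escape back to ASCII is left out.

```python
ASCII, ROMAN = "ASCII", "Roman"


def encode_ascii_roman(text: str) -> bytes:
    """Encode ASCII, U+00A5, and U+203E with ISO-2022-JP ASCII/Roman escapes."""
    state = ASCII
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        while True:  # loop again when a code point is "restored" after an escape
            if cp in (0x0E, 0x0F, 0x1B):
                # Stand-in for "return error with U+FFFD".
                raise ValueError("U+%04X is not encodable in this state" % cp)
            if state == ASCII and cp <= 0x7F:
                out.append(cp)
                break
            if state == ROMAN and (
                (cp <= 0x7F and cp not in (0x5C, 0x7E)) or cp in (0xA5, 0x203E)
            ):
                out.append(cp if cp <= 0x7F else (0x5C if cp == 0xA5 else 0x7E))
                break
            if cp <= 0x7F and state != ASCII:
                state = ASCII
                out += b"\x1b\x28\x42"  # escape to ASCII, then reprocess cp
                continue
            if cp in (0xA5, 0x203E) and state != ROMAN:
                state = ROMAN
                out += b"\x1b\x28\x4a"  # escape to Roman, then reprocess cp
                continue
            raise NotImplementedError("jis0208 path is not part of this sketch")
    return bytes(out)
```

Note how a backslash in the Roman state forces a switch back to ASCII, because byte 0x5C means U+00A5 (¥) while the Roman state is active.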
@@ -3238,14 +3238,14 @@ and <var>byte</var>, runs these steps:
 <li><p>If <var>codePoint</var> is an <a>ASCII code point</a> or U+0080, then return a byte whose
 value is <var>codePoint</var>.

-<li><p>If <var>codePoint</var> is U+00A5, then return byte 0x5C.
+<li><p>If <var>codePoint</var> is U+00A5 (¥), then return byte 0x5C.

-<li><p>If <var>codePoint</var> is U+203E, then return byte 0x7E.
+<li><p>If <var>codePoint</var> is U+203E (‾), then return byte 0x7E.

 <li><p>If <var>codePoint</var> is in the range U+FF61 to U+FF9F, inclusive, then return a byte
 whose value is <var>codePoint</var> &minus; 0xFF61 + 0xA1.

-<li><p>If <var>codePoint</var> is U+2212, then set it to U+FF0D.
+<li><p>If <var>codePoint</var> is U+2212 (−), then set it to U+FF0D (－).

 <li><p>Let <var>pointer</var> be the <a>index Shift_JIS pointer</a> for <var>codePoint</var>.
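
The special cases that this encoder handles before the index lookup can be sketched as below. `shift_jis_pre_index` is a hypothetical name, and the index lookup itself is not part of the sketch: the function returns either the encoded byte or `None` together with the (possibly remapped) code point a caller would look up.

```python
from typing import Optional, Tuple


def shift_jis_pre_index(code_point: int) -> Tuple[Optional[bytes], int]:
    """Apply the steps that precede the index lookup in the hunk above."""
    if code_point <= 0x7F or code_point == 0x80:
        return bytes([code_point]), code_point  # ASCII code point or U+0080
    if code_point == 0xA5:     # U+00A5 (¥)
        return b"\x5c", code_point
    if code_point == 0x203E:   # U+203E (‾)
        return b"\x7e", code_point
    if 0xFF61 <= code_point <= 0xFF9F:
        return bytes([code_point - 0xFF61 + 0xA1]), code_point
    if code_point == 0x2212:   # U+2212 (−) is remapped to U+FF0D (－)
        code_point = 0xFF0D
    return None, code_point    # caller continues with the index lookup
```

For instance, U+FF61 maps to byte 0xA1 and U+FF9F to byte 0xDF, while U+2212 comes back as `(None, 0xFF0D)`.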
