Magnus Westerlund
2011-04-29 17:00:48 UTC
Hi,
On request I have performed a transport directorate review of this
document. As the document is in WG last call I have included any issue I
have found in the document. I will note that I haven't followed the WG.
I assume some issues are highly debated and have gone through consensus
process. If that is the case indicate so and preferably some summary for
my information.
Regarding TSV-Dir reviews you may read more about those here:
http://trac.tools.ietf.org/area/tsv/trac/wiki/TSV-Directorate-Reviews
My Summary: Document not ready for publication multiple issues that
needs to be resolved.
Note, the issues are not ordered in any particular order. Thus small and
big issues may follow each other.
1. I have a question regarding the copyright boiler plate and the
acknowledgment section. If a large number of people contributed text to
the first version provided to IETF, does the WG have ensured that all
contributors of text have provided the rights to IETF, or does WHATWG
have such requirements on contributors that IETF has in fact received
all the copyrights it needs not only to publish a document but also
transfer it other organizations if needed?
2. Section 1.3 and 9. I really don't see how the Sec-WebSocket-Origin
header provides any security without additional mechanisms. Yes, a
trusted browser implementation will set it correctly. However, as non
browser or hacked browser can fake this header the server has no direct
trust in this value. To me it appears that one would need to make
additional bindings to some other contexts to be able to trust the value
of this header. Or am I missing some important part here?
If not, then shouldn't the security considerations and description be
expanded to explain how you can use this header to actually accomplish
some real security?
3. Section 1.3, Page 7,
For this header, the server has to take the value (as present in the
header, e.g. the base64-encoded [RFC4648] version minus leading and
trailing whitespace), and concatenate this with the GUID "258EAFA5-
E914-47DA-95CA-C5AB0DC85B11" in string form, which is unlikely to be
used by network endpoints that do not understand the WebSocket
protocol.
Can you please expand the "GUID" acronym and if this GUID conforms to
any particular format for GUIDs then can you provide a reference to that
format?
4. Section 1.3, Page 7,
"SHA-1 hash, base64-encoded, of this concatenation is
then returned in the server's handshake [FIPS.180-2.2002]."
I would suggest that you indicate how many bits of the hash you uses.
SHA-1 hashes are commonly truncated to shorter forms.
5. Section 1.4:
"The closing handshake is intended to replace the TCP closing
handshake (FIN/ACK), on the basis that the TCP closing handshake is
not always reliable end-to-end, especially in the presence of man-in-
the-middle proxies and other intermediaries."
As the closing handshake doesn't replace the TCP closing, it complements
or augments it to work over intermediaries I would suggest to change the
wording.
6. Section 1.5:
"and re-implements the closing handshake in-band."
Similar to the previous comment. I think "re-implements" is the wrong
word here, complements or something else. You are not in fact changing
the TCP stack, but it does sounds like it.
7. Section 1.9, 10:
I definitely think this specification should create an registry for
protocol identifiers. As the header likely allows for very long strings
there is no risk of every filling the registry so a RFC 5226 policy of
"First come, first served" appears suitable. Then one only needs to
define the amount of information one needs when requesting a
registration. Likely the template can be this simple:
Protocol ID string:
Reference: <if something exist>
Contact Person:
I do think that one should move quite a lot of the discussion in Section
1.9 into a more relevant normative section where the actual header is
defined.
8. Lack of formal specification of the following HTTP headers:
Sec-WebSocket-Protocol, Sec-WebSocket-Origin, Sec-WebSocket-Accept. They
all lack any format definition in the format of ABNF. In general I think
it would be good to actually give each of the HTTP headers its own
section with definition.
9. Section 2.1:
I am missing a reference to UTF-8 and the U+0041 notation. I guess STD63
would do the trick.
10. Section 2.1:
The usage of *bold* _underline_ and /italic/ for different purpose in
the document should be explained. Especially the /italic/ for variable
names is important. Please do remember that a number of people may read
the specification in programs that does not embelish the text where we
get to see the star, the under score and the forward slash.
11. Section 3.
I am missing an introduction to the section where it explains that
websocket defines its own two schemes that are usable in various
environments where one would like to indicate to where to open a websocket.
12. Section 3.
Why isn't the URI scheme definition present in this section? That would
help a lot before going into the parsing rules.
13. Section 3.1:
"These steps return either a /host/, a /port/, a
/resource name/, and a /secure/ flag, or they fail."
This is the first of many usage of "fail" to indicate some abort
procedure. I assume the one in section 7.1.4. However, that does not
seem to be consistent as parsing of websocket URI results inconclusive
results even prior to establishing any connection. I would recommend
that one check all the occurrences of "fail" and see if one can be more
specific on what is supposed to do. And in cases it is a common
procedure, please include a reference to that section that explains that
procedure.
14. Section 3.3 and 10.1 and 10.2:
o The /host/ MUST be ASCII-only (i.e. it MUST have been punycode-
encoded [RFC3492] already if necessary, and MUST NOT contain any
characters above U+007E).
o The /resource name/ string MUST be a non-empty string of
characters in the range U+0021 to U+007E and MUST start with a
U+002F SOLIDUS character (/).
For host is quite clear from the RFC 3986 ABNF that other characters
aren't allowed, is this a real-world problem that force on to make this
clear? Or are there any further limitation than what the RFC 3986 ABNF
has for host? In which case the ABNF in the URI scheme template should
be redefined.
The second one is a stricter than the ABNF in the RFC 3986 as
"path-abempty" allows for empty resource. Thus I think the ABNF in the
templace should be updated to reflect the actuall ABNF for the URI
scheme being defined.
15. section 4.1:
As such, in the absence of extensions negotiated during the opening
handshake (Section 5), all reserved bits MUST be 0 and reserved
opcode values MUST NOT be used.
I would recommend a clarification on what to do with reserved bits. My
preference is to say: MUST be set to 0 and ignored on reception. The
reason for this is that if one starts using them with some extensions
they will be separate from zero but still correctly ignored by a non
upgraded interpreter.
But in fact it may be that the above text should in fact be moved to the
definition of RSV1, RSV2, RSV3.
16. Section 4.2:
Opcode: 4 bits
Defines the interpretation of the payload data
I am missing a table for the value. As far as I understand the only
place they are present are in the ABNF.
17. Section 4.2: Payload length
The length of the payload: if 0-125, that is the payload length.
The text fails to tell in what units the payload length is provided in.
I guess it is bytes but it should be written.
18. Section 4.2: Payload Data, Extension Data and Application Data
all these headings says that they are "n" bytes. But in fact they are
n+m, n and m bytes respectively. Would it make sense to make this
clearer by using different integer variables?
19. The usage of ABNF to define binary protocols.
I don't think ABNF for binary protocols are a particular great idea as
it is difficult to express the field size of individual fields.
20. I have run the ABNF trhoug some parser and my own brain and have
found to following errors:
A.
frame-opcode = %x0 ; continuation frame
/ %x1 ; text frame
/ %x2 ; binary frame
/ %3-7 ; reserved for further non-control frames
Missing "x" before in / %3-7 that should be / %x3-7
B.
frame-masking-key = <4>( %0x00-FF ) ; present only if frame-masked
is 1<CR><LF>
The intention here is to have 4 bytes, but by using <4> you have in fact
made the 4 into a comment rather that part of the rule. It should be:
frame-masking-key = 4( %0x00-FF ) ; present only if frame-masked is 1
21. Section 4.3:
The masking key MUST be derived from a strong source of entropy, and
the masking key for a given frame MUST NOT make it simple for a
server to predict the masking key for a subsequent frame.
This appears to be a suitable place to reference RFC 4086
"Randomness Requirements for Security"
22. Section 4.3:
If I understand the masking correctly it for one particular frame it
uses the same mask over and over again over each 32-bits. As the frame
format supports message of size up to 2^63 I think I can perform a cache
poisining attack anyway. This as I will know the boundaries for where
each 32-bits will be used. Thus I create my message that I want to get
inserted into a cache. Then I ensure that this is 32-bits aligned. Then
I create a big message and hope it will not be fragmented is
particularly small frames. Then I guess on new mask values for each
repetition of the message. Depending on the frame sizes the underlying
implementation uses and the size of the message I need for poisining the
attack I can achieve a particular probablility of success. For a 256
byte message and frame sizes of 16 MiB would get a probability of
1/65535 to actually find the right masking key. And as the websocket
using web-app can have the websocket server to be in on the attack the
web-app can get feedback on the framing size structure the browser is
using to tailor the message sizes. Combining this with that the time it
takes to transfer a few gigabyte of data on the current Internet this is
an attack quite feasible.
On request I have performed a transport directorate review of this
document. As the document is in WG last call I have included any issue I
have found in the document. I will note that I haven't followed the WG.
I assume some issues are highly debated and have gone through consensus
process. If that is the case indicate so and preferably some summary for
my information.
Regarding TSV-Dir reviews you may read more about those here:
http://trac.tools.ietf.org/area/tsv/trac/wiki/TSV-Directorate-Reviews
My Summary: Document not ready for publication multiple issues that
needs to be resolved.
Note, the issues are not ordered in any particular order. Thus small and
big issues may follow each other.
1. I have a question regarding the copyright boiler plate and the
acknowledgment section. If a large number of people contributed text to
the first version provided to IETF, does the WG have ensured that all
contributors of text have provided the rights to IETF, or does WHATWG
have such requirements on contributors that IETF has in fact received
all the copyrights it needs not only to publish a document but also
transfer it other organizations if needed?
2. Section 1.3 and 9. I really don't see how the Sec-WebSocket-Origin
header provides any security without additional mechanisms. Yes, a
trusted browser implementation will set it correctly. However, as non
browser or hacked browser can fake this header the server has no direct
trust in this value. To me it appears that one would need to make
additional bindings to some other contexts to be able to trust the value
of this header. Or am I missing some important part here?
If not, then shouldn't the security considerations and description be
expanded to explain how you can use this header to actually accomplish
some real security?
3. Section 1.3, Page 7,
For this header, the server has to take the value (as present in the
header, e.g. the base64-encoded [RFC4648] version minus leading and
trailing whitespace), and concatenate this with the GUID "258EAFA5-
E914-47DA-95CA-C5AB0DC85B11" in string form, which is unlikely to be
used by network endpoints that do not understand the WebSocket
protocol.
Can you please expand the "GUID" acronym and if this GUID conforms to
any particular format for GUIDs then can you provide a reference to that
format?
4. Section 1.3, Page 7,
"SHA-1 hash, base64-encoded, of this concatenation is
then returned in the server's handshake [FIPS.180-2.2002]."
I would suggest that you indicate how many bits of the hash you uses.
SHA-1 hashes are commonly truncated to shorter forms.
5. Section 1.4:
"The closing handshake is intended to replace the TCP closing
handshake (FIN/ACK), on the basis that the TCP closing handshake is
not always reliable end-to-end, especially in the presence of man-in-
the-middle proxies and other intermediaries."
As the closing handshake doesn't replace the TCP closing, it complements
or augments it to work over intermediaries I would suggest to change the
wording.
6. Section 1.5:
"and re-implements the closing handshake in-band."
Similar to the previous comment. I think "re-implements" is the wrong
word here, complements or something else. You are not in fact changing
the TCP stack, but it does sounds like it.
7. Section 1.9, 10:
I definitely think this specification should create an registry for
protocol identifiers. As the header likely allows for very long strings
there is no risk of every filling the registry so a RFC 5226 policy of
"First come, first served" appears suitable. Then one only needs to
define the amount of information one needs when requesting a
registration. Likely the template can be this simple:
Protocol ID string:
Reference: <if something exist>
Contact Person:
I do think that one should move quite a lot of the discussion in Section
1.9 into a more relevant normative section where the actual header is
defined.
8. Lack of formal specification of the following HTTP headers:
Sec-WebSocket-Protocol, Sec-WebSocket-Origin, Sec-WebSocket-Accept. They
all lack any format definition in the format of ABNF. In general I think
it would be good to actually give each of the HTTP headers its own
section with definition.
9. Section 2.1:
I am missing a reference to UTF-8 and the U+0041 notation. I guess STD63
would do the trick.
10. Section 2.1:
The usage of *bold* _underline_ and /italic/ for different purpose in
the document should be explained. Especially the /italic/ for variable
names is important. Please do remember that a number of people may read
the specification in programs that does not embelish the text where we
get to see the star, the under score and the forward slash.
11. Section 3.
I am missing an introduction to the section where it explains that
websocket defines its own two schemes that are usable in various
environments where one would like to indicate to where to open a websocket.
12. Section 3.
Why isn't the URI scheme definition present in this section? That would
help a lot before going into the parsing rules.
13. Section 3.1:
"These steps return either a /host/, a /port/, a
/resource name/, and a /secure/ flag, or they fail."
This is the first of many usage of "fail" to indicate some abort
procedure. I assume the one in section 7.1.4. However, that does not
seem to be consistent as parsing of websocket URI results inconclusive
results even prior to establishing any connection. I would recommend
that one check all the occurrences of "fail" and see if one can be more
specific on what is supposed to do. And in cases it is a common
procedure, please include a reference to that section that explains that
procedure.
14. Section 3.3 and 10.1 and 10.2:
o The /host/ MUST be ASCII-only (i.e. it MUST have been punycode-
encoded [RFC3492] already if necessary, and MUST NOT contain any
characters above U+007E).
o The /resource name/ string MUST be a non-empty string of
characters in the range U+0021 to U+007E and MUST start with a
U+002F SOLIDUS character (/).
For host is quite clear from the RFC 3986 ABNF that other characters
aren't allowed, is this a real-world problem that force on to make this
clear? Or are there any further limitation than what the RFC 3986 ABNF
has for host? In which case the ABNF in the URI scheme template should
be redefined.
The second one is a stricter than the ABNF in the RFC 3986 as
"path-abempty" allows for empty resource. Thus I think the ABNF in the
templace should be updated to reflect the actuall ABNF for the URI
scheme being defined.
15. section 4.1:
As such, in the absence of extensions negotiated during the opening
handshake (Section 5), all reserved bits MUST be 0 and reserved
opcode values MUST NOT be used.
I would recommend a clarification on what to do with reserved bits. My
preference is to say: MUST be set to 0 and ignored on reception. The
reason for this is that if one starts using them with some extensions
they will be separate from zero but still correctly ignored by a non
upgraded interpreter.
But in fact it may be that the above text should in fact be moved to the
definition of RSV1, RSV2, RSV3.
16. Section 4.2:
Opcode: 4 bits
Defines the interpretation of the payload data
I am missing a table for the value. As far as I understand the only
place they are present are in the ABNF.
17. Section 4.2: Payload length
The length of the payload: if 0-125, that is the payload length.
The text fails to tell in what units the payload length is provided in.
I guess it is bytes but it should be written.
18. Section 4.2: Payload Data, Extension Data and Application Data
all these headings says that they are "n" bytes. But in fact they are
n+m, n and m bytes respectively. Would it make sense to make this
clearer by using different integer variables?
19. The usage of ABNF to define binary protocols.
I don't think ABNF for binary protocols are a particular great idea as
it is difficult to express the field size of individual fields.
20. I have run the ABNF trhoug some parser and my own brain and have
found to following errors:
A.
frame-opcode = %x0 ; continuation frame
/ %x1 ; text frame
/ %x2 ; binary frame
/ %3-7 ; reserved for further non-control frames
Missing "x" before in / %3-7 that should be / %x3-7
B.
frame-masking-key = <4>( %0x00-FF ) ; present only if frame-masked
is 1<CR><LF>
The intention here is to have 4 bytes, but by using <4> you have in fact
made the 4 into a comment rather that part of the rule. It should be:
frame-masking-key = 4( %0x00-FF ) ; present only if frame-masked is 1
21. Section 4.3:
The masking key MUST be derived from a strong source of entropy, and
the masking key for a given frame MUST NOT make it simple for a
server to predict the masking key for a subsequent frame.
This appears to be a suitable place to reference RFC 4086
"Randomness Requirements for Security"
22. Section 4.3:
If I understand the masking correctly it for one particular frame it
uses the same mask over and over again over each 32-bits. As the frame
format supports message of size up to 2^63 I think I can perform a cache
poisining attack anyway. This as I will know the boundaries for where
each 32-bits will be used. Thus I create my message that I want to get
inserted into a cache. Then I ensure that this is 32-bits aligned. Then
I create a big message and hope it will not be fragmented is
particularly small frames. Then I guess on new mask values for each
repetition of the message. Depending on the frame sizes the underlying
implementation uses and the size of the message I need for poisining the
attack I can achieve a particular probablility of success. For a 256
byte message and frame sizes of 16 MiB would get a probability of
1/65535 to actually find the right masking key. And as the websocket
using web-app can have the websocket server to be in on the attack the
web-app can get feedback on the framing size structure the browser is
using to tailor the message sizes. Combining this with that the time it
takes to transfer a few gigabyte of data on the current Internet this is
an attack quite feasible.