[hybi] TSV-Directorate review of draft-ietf-hybi-thewebsocketprotocol-07

Discussion:

Magnus Westerlund

2011-04-29 17:00:48 UTC

Hi,

On request I have performed a transport directorate review of this
document. As the document is in WG last call I have included any issue I
have found in the document. I will note that I haven't followed the WG.
I assume some issues are highly debated and have gone through consensus
process. If that is the case indicate so and preferably some summary for
my information.

Regarding TSV-Dir reviews you may read more about those here:
http://trac.tools.ietf.org/area/tsv/trac/wiki/TSV-Directorate-Reviews

My Summary: Document not ready for publication multiple issues that
needs to be resolved.

Note, the issues are not ordered in any particular order. Thus small and
big issues may follow each other.

1. I have a question regarding the copyright boiler plate and the
acknowledgment section. If a large number of people contributed text to
the first version provided to IETF, does the WG have ensured that all
contributors of text have provided the rights to IETF, or does WHATWG
have such requirements on contributors that IETF has in fact received
all the copyrights it needs not only to publish a document but also
transfer it other organizations if needed?

2. Section 1.3 and 9. I really don't see how the Sec-WebSocket-Origin
header provides any security without additional mechanisms. Yes, a
trusted browser implementation will set it correctly. However, as non
browser or hacked browser can fake this header the server has no direct
trust in this value. To me it appears that one would need to make
additional bindings to some other contexts to be able to trust the value
of this header. Or am I missing some important part here?

If not, then shouldn't the security considerations and description be
expanded to explain how you can use this header to actually accomplish
some real security?

3. Section 1.3, Page 7,
For this header, the server has to take the value (as present in the
header, e.g. the base64-encoded [RFC4648] version minus leading and
trailing whitespace), and concatenate this with the GUID "258EAFA5-
E914-47DA-95CA-C5AB0DC85B11" in string form, which is unlikely to be
used by network endpoints that do not understand the WebSocket
protocol.

Can you please expand the "GUID" acronym and if this GUID conforms to
any particular format for GUIDs then can you provide a reference to that
format?

4. Section 1.3, Page 7,
"SHA-1 hash, base64-encoded, of this concatenation is
then returned in the server's handshake [FIPS.180-2.2002]."

I would suggest that you indicate how many bits of the hash you uses.
SHA-1 hashes are commonly truncated to shorter forms.

5. Section 1.4:
"The closing handshake is intended to replace the TCP closing
handshake (FIN/ACK), on the basis that the TCP closing handshake is
not always reliable end-to-end, especially in the presence of man-in-
the-middle proxies and other intermediaries."

As the closing handshake doesn't replace the TCP closing, it complements
or augments it to work over intermediaries I would suggest to change the
wording.

6. Section 1.5:
"and re-implements the closing handshake in-band."

Similar to the previous comment. I think "re-implements" is the wrong
word here, complements or something else. You are not in fact changing
the TCP stack, but it does sounds like it.

7. Section 1.9, 10:

I definitely think this specification should create an registry for
protocol identifiers. As the header likely allows for very long strings
there is no risk of every filling the registry so a RFC 5226 policy of
"First come, first served" appears suitable. Then one only needs to
define the amount of information one needs when requesting a
registration. Likely the template can be this simple:

Protocol ID string:
Reference: <if something exist>
Contact Person:

I do think that one should move quite a lot of the discussion in Section
1.9 into a more relevant normative section where the actual header is
defined.

8. Lack of formal specification of the following HTTP headers:
Sec-WebSocket-Protocol, Sec-WebSocket-Origin, Sec-WebSocket-Accept. They
all lack any format definition in the format of ABNF. In general I think
it would be good to actually give each of the HTTP headers its own
section with definition.

9. Section 2.1:
I am missing a reference to UTF-8 and the U+0041 notation. I guess STD63
would do the trick.

10. Section 2.1:
The usage of *bold* _underline_ and /italic/ for different purpose in
the document should be explained. Especially the /italic/ for variable
names is important. Please do remember that a number of people may read
the specification in programs that does not embelish the text where we
get to see the star, the under score and the forward slash.

11. Section 3.

I am missing an introduction to the section where it explains that
websocket defines its own two schemes that are usable in various
environments where one would like to indicate to where to open a websocket.

12. Section 3.
Why isn't the URI scheme definition present in this section? That would
help a lot before going into the parsing rules.

13. Section 3.1:
"These steps return either a /host/, a /port/, a
/resource name/, and a /secure/ flag, or they fail."

This is the first of many usage of "fail" to indicate some abort
procedure. I assume the one in section 7.1.4. However, that does not
seem to be consistent as parsing of websocket URI results inconclusive
results even prior to establishing any connection. I would recommend
that one check all the occurrences of "fail" and see if one can be more
specific on what is supposed to do. And in cases it is a common
procedure, please include a reference to that section that explains that
procedure.

14. Section 3.3 and 10.1 and 10.2:

o The /host/ MUST be ASCII-only (i.e. it MUST have been punycode-
encoded [RFC3492] already if necessary, and MUST NOT contain any
characters above U+007E).

o The /resource name/ string MUST be a non-empty string of
characters in the range U+0021 to U+007E and MUST start with a
U+002F SOLIDUS character (/).

For host is quite clear from the RFC 3986 ABNF that other characters
aren't allowed, is this a real-world problem that force on to make this
clear? Or are there any further limitation than what the RFC 3986 ABNF
has for host? In which case the ABNF in the URI scheme template should
be redefined.

The second one is a stricter than the ABNF in the RFC 3986 as
"path-abempty" allows for empty resource. Thus I think the ABNF in the
templace should be updated to reflect the actuall ABNF for the URI
scheme being defined.

15. section 4.1:
As such, in the absence of extensions negotiated during the opening
handshake (Section 5), all reserved bits MUST be 0 and reserved
opcode values MUST NOT be used.

I would recommend a clarification on what to do with reserved bits. My
preference is to say: MUST be set to 0 and ignored on reception. The
reason for this is that if one starts using them with some extensions
they will be separate from zero but still correctly ignored by a non
upgraded interpreter.

But in fact it may be that the above text should in fact be moved to the
definition of RSV1, RSV2, RSV3.

16. Section 4.2:
Opcode: 4 bits

Defines the interpretation of the payload data

I am missing a table for the value. As far as I understand the only
place they are present are in the ABNF.

17. Section 4.2: Payload length
The length of the payload: if 0-125, that is the payload length.

The text fails to tell in what units the payload length is provided in.
I guess it is bytes but it should be written.

18. Section 4.2: Payload Data, Extension Data and Application Data
all these headings says that they are "n" bytes. But in fact they are
n+m, n and m bytes respectively. Would it make sense to make this
clearer by using different integer variables?

19. The usage of ABNF to define binary protocols.
I don't think ABNF for binary protocols are a particular great idea as
it is difficult to express the field size of individual fields.

20. I have run the ABNF trhoug some parser and my own brain and have
found to following errors:

A.
frame-opcode = %x0 ; continuation frame
/ %x1 ; text frame
/ %x2 ; binary frame
/ %3-7 ; reserved for further non-control frames

Missing "x" before in / %3-7 that should be / %x3-7

B.
frame-masking-key = <4>( %0x00-FF ) ; present only if frame-masked
is 1<CR><LF>

The intention here is to have 4 bytes, but by using <4> you have in fact
made the 4 into a comment rather that part of the rule. It should be:
frame-masking-key = 4( %0x00-FF ) ; present only if frame-masked is 1

21. Section 4.3:
The masking key MUST be derived from a strong source of entropy, and
the masking key for a given frame MUST NOT make it simple for a
server to predict the masking key for a subsequent frame.

This appears to be a suitable place to reference RFC 4086
"Randomness Requirements for Security"

22. Section 4.3:
If I understand the masking correctly it for one particular frame it
uses the same mask over and over again over each 32-bits. As the frame
format supports message of size up to 2^63 I think I can perform a cache
poisining attack anyway. This as I will know the boundaries for where
each 32-bits will be used. Thus I create my message that I want to get
inserted into a cache. Then I ensure that this is 32-bits aligned. Then
I create a big message and hope it will not be fragmented is
particularly small frames. Then I guess on new mask values for each
repetition of the message. Depending on the frame sizes the underlying
implementation uses and the size of the message I need for poisining the
attack I can achieve a particular probablility of success. For a 256
byte message and frame sizes of 16 MiB would get a probability of
1/65535 to actually find the right masking key. And as the websocket
using web-app can have the websocket server to be in on the attack the
web-app can get feedback on the framing size structure the browser is
using to tailor the message sizes. Combining this with that the time it
takes to transfer a few gigabyte of data on the current Internet this is
an attack quite feasible.

Willy Tarreau

2011-04-29 18:34:47 UTC

Permalink

Hi Magnus,

thank you for this deep review. I can't spend enough time to comment on
many of your points (in part because many are the results of very long
discussions), but there are a few short ones that can already be easily
addressed.

Post by Magnus Westerlund
2. Section 1.3 and 9. I really don't see how the Sec-WebSocket-Origin
header provides any security without additional mechanisms. Yes, a
trusted browser implementation will set it correctly. However, as non
browser or hacked browser can fake this header the server has no direct
trust in this value. To me it appears that one would need to make
additional bindings to some other contexts to be able to trust the value
of this header. Or am I missing some important part here?

Willy Tarreau

2011-05-02 08:55:34 UTC

Permalink

Hi,

On Mon, May 02, 2011 at 10:09:39AM +0200, Magnus Westerlund wrote:

Takeshi Yoshino

2011-05-12 12:18:51 UTC

Permalink

An endpoint MAY send a Ping message any time after the connection

established and before the connection is closed. NOTE: A ping
message may serve either as a keepalive, or to verify that the

remote

endpoint is still responsive.
So one allows to have multiple outstanding PING messages? That sounds

bit strange over a reliable in-order delivery channel such as TCP. And
if one does allow it should there be any rules for answering pings in
order one receives them?

Indeed, we should possibly remind implementors that it's useless to

send

a new ping as long as one is still pending.

I think there is two possible ways. Allow it and simply include a note
that it is mostly useless. Or make it clear that one is not allowed to
send new pings until one have received the pong.

Good point.

I'm posting my question about Ping/Pong -07 here since it's related to this
topic.

Now an endpoint may ignore some Pings (according to the sentence "In the
case multiple Pings have been received ...") in -07. What is this for? It's
very strange that we have this and the requirement about body "The
message bodies (i.e. both ..." together.

I thought the purpose of echoing back the same body is to allow endpoints to
check if a certain Ping is really acked by Pong with the same body (maybe
sequence number or something). But if an endpoint sends Ping "ABC" and then
sends "DEF" without waiting for Pong for "ABC", Pong for "ABC" may not be
sent by the other peer. This cannot be realized. We need to disallow two or
more pending Ping at the same time as Willy says to address this point, too.

Even with this fix, we still have problem. Now an endpoint can send
unsolicited Pongs (according to the text "A Pong message MAY be sent
unsolicited."). We cannot distinguish unsolicited one from one in reply to
Ping. Here, it's not possible to check if Ping "ABC" is pending or not since
the spec doesn't prohibit the other peer sending unsolicited Pong "ABC". How
should we deal with this?

Takeshi

Greg Wilkins

2011-05-12 13:25:52 UTC

Permalink

unsolicited."). We cannot distinguish unsolicited one from one in reply to
Ping. Here, it's not possible to check if Ping "ABC" is pending or not since
the spec doesn't prohibit the other peer sending unsolicited Pong "ABC". How
should we deal with this?

This is all over a reliable transport, so if we send Ping "ABC", then
receive Pong "XYZ", we know that the pong was unsolicited and not a
corrupted "ABC". Thus we have to keep waiting for the Pong "ABC".
Obviously there is a chance that a unsolicited pong matches a ping,
but some careful design of payloads can easily fix that.

cheers

Takeshi Yoshino

2011-05-12 18:36:58 UTC

Permalink

Post by Greg Wilkins
Obviously there is a chance that a unsolicited pong matches a ping,
but some careful design of payloads can easily fix that.

I see. Coordinating body value assignment so that unsolicited one and reply
one don't conflict each other (e.g. "ABC" and "XYZ" as you said) is one
possible solution.

This should be discussed well when we add some keep alive feature to browser
that sends Ping to the server. If we prepare some API for JavaScript to tell
the browser what string to use for Ping, service designers may coordinate
assignment by themselves. If we don't, we should reserve some string to be
used for Ping/unsolicited Pong by browsers to avoid collision.

----

Isn't it simpler to reserve some string, say "NOOP" (or "\x00" to save
bytes), for Ping body and require endpoints to ignore such Ping rather than
diverting Pong for heartbeat?

Thanks,
Takeshi

Greg Wilkins

2011-05-12 22:29:24 UTC

Permalink

Takeshi,

what I have seen before is the use of sequence numbers where the
client side uses even and the server side uses odd, so can tell by the
seq number in a pong where it was generated and thus tell if it is a
response or unsolicited.

cheers

Post by Takeshi Yoshino

Obviously there is a chance that a unsolicited pong matches a ping,
but some careful design of payloads can easily fix that.

I see. Coordinating body value assignment so that unsolicited one and reply
one don't conflict each other (e.g. "ABC" and "XYZ" as you said) is one
possible solution.
This should be discussed well when we add some keep alive feature to browser
that sends Ping to the server. If we prepare some API for JavaScript to tell
the browser what string to use for Ping, service designers may coordinate
assignment by themselves. If we don't, we should reserve some string to be
used for Ping/unsolicited Pong by browsers to avoid collision.
----
Isn't it simpler to reserve some string, say "NOOP" (or "\x00" to save
bytes), for Ping body and require endpoints to ignore such Ping rather than
diverting Pong for heartbeat?
Thanks,
Takeshi

Magnus Westerlund

2011-05-02 08:09:39 UTC

Permalink

Hi,

Please see inline for comments on your answers. I have dropped the ones
that are fine.

Post by Willy Tarreau
Hi Magnus,
thank you for this deep review. I can't spend enough time to comment on
many of your points (in part because many are the results of very long
discussions), but there are a few short ones that can already be easily
addressed.

Patrick McManus

2011-04-29 20:31:20 UTC

Permalink

Hi Magnus - thanks for all the effort you put forward into reviewing the
draft!

For the purposes of the Sec-WebSocket-Origin header, the untrusted party
is the author of the javascript that is executing inside the client. The
client itself of course has privs to contact the server and fake
whatever it wants, it is trying to figure out if it should extend the
communication privilege to the author of the javascript. This header
lets the client and server collaborate on a CORS-like model to constrain
that 3rd party.

-Patrick

Julian Reschke

2011-04-30 13:44:12 UTC

Permalink

Post by Magnus Westerlund
...
1. I have a question regarding the copyright boiler plate and the
acknowledgment section. If a large number of people contributed text to
the first version provided to IETF, does the WG have ensured that all
contributors of text have provided the rights to IETF, or does WHATWG
have such requirements on contributors that IETF has in fact received
all the copyrights it needs not only to publish a document but also
transfer it other organizations if needed?
...

I think the important question is whether there are parts left are
actually identical to what was once brought into this WG. I'm not sure
about that.

Post by Magnus Westerlund
"Kaazing basically invented the protocol as it stands now." - Ian Hickson, Google & HTML5 Author

So it might be a good idea to check with these guys what they think
about the current status.

Best regards, Julian

Iñaki Baz Castillo

2011-04-30 15:18:43 UTC

Permalink

41. Section 9.
The security consideration section does not discuss client
authentication. Is that because this is supposed to happen in the HTTP
GET with the upgrade headers. there is some reference to cookies etc.
But shouldn't there be more on this? So that a server can ensure the
client identity?

Cookies are retrieved from a web page, so if we assume that the
websocket server will be able to validate the client provided cookie
then we are wrong (the websocket server could be a separate server).

Anyhow, this revision of the draft finally states that:

Any status code other than 101 indicates that the WebSocket handshake
has not completed, and that the semantics of HTTP still apply. The
headers follow the status code.

So the websocket server is free to reply a 401 to ask for HTTP
Basic/Digest authentication.

--
Iñaki Baz Castillo
<***@aliax.net>

Willy Tarreau

2011-04-30 15:21:33 UTC

Permalink

Post by IÃ±aki Baz Castillo

Not only that, but cookies are used by infrastructure components too
in order to maintain persistence (association between a client and a
server).

Regards,
Willy

Magnus Westerlund

2011-05-02 08:14:31 UTC

Permalink

Post by Willy Tarreau

Post by IÃ±aki Baz Castillo

Not only that, but cookies are used by infrastructure components too
in order to maintain persistence (association between a client and a
server).

This is far from my area of real expertise, however I think the
specification are going to need to point to the client authentication
mechanism that can be used for websocket. This as authentication is one
of these security mechanisms that one very commonly need.

Cheers

Magnus Westerlund

----------------------------------------------------------------------
Multimedia Technologies, Ericsson Research EAB/TVM
----------------------------------------------------------------------
Ericsson AB | Phone +46 10 7148287
Färögatan 6 | Mobile +46 73 0949079
SE-164 80 Stockholm, Sweden| mailto: ***@ericsson.com
----------------------------------------------------------------------