Discussion:
Reducing the constraints on the HTTP part of WebSockets
Ian Hickson
2010-02-23 01:25:49 UTC
I've been going through the e-mail from the past few months trying to
figure out a way to take into account all the feedback. However, I've run
into a problem, and would like some feedback from server-side developers
to determine how to proceed.

First some of the relevant background:

* Currently WebSockets has a weak HTTP-to-WebSocket cross-protocol attack
protection in the form of requiring that the first few bytes of the
connection match a particular set of bytes. This is weak, because the
server can just ignore it (and thus be vulnerable); it would be better to
have something that the server has to do to prove it read the handshake,
and to prove that the handshake it received was not something that could
be faked by a client speaking another protocol (e.g. a client doing an XHR
request or a scripted <form> post).

* It has been suggested that the limitations on the order of headers
cause serious problems with existing HTTP stacks.

* It has also been suggested that using GET with an Upgrade is harder to
implement on existing HTTP stacks than CONNECT would be.

Given this, I've been examining what we could do by, for instance, adding
some unpredictable values to the headers that the server must process in
order to prove that the handshake was read, as well as considering what
the results would be of using CONNECT instead of Upgrade:, and of moving
the bulk of the handshake into the post-HTTP part of the connection rather
than in the HTTP headers themselves.

Unfortunately, I'm finding it very difficult to come up with a design that
is compatible with HTTP, fits HTTP stacks, and is secure.

For example, if we make the start of the client-to-server handshake be:

CONNECT host:port HTTP/1.1 [CRLF]
Sec-WebSocket-Unique-Id: 123456789 [CRLF]
[CRLF]
etc...

...where other headers are ignored (allowing intermediaries to fiddle with
the headers as they might), then amateur server-side implementors are very
likely to just scan for the string "Sec-WebSocket-Unique-Id", making them
vulnerable to something like:

GET /Sec-WebSocket-Unique-Id:123456789 HTTP/1.1 [CRLF]
etc...

...or some such -- there are plenty of ways to insert data into the
headers, even if you can't actually insert the header itself.
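The difference between the naive scan and a proper parse can be sketched in Python (both checks and the exact header handling are illustrative, not from any spec):

```python
# A naive server that merely scans the raw bytes for the header name
# will "find" it even when an attacker has smuggled the string into a
# GET path, exactly as in the attack above.

def naive_server_check(raw_request):
    """The amateur implementation: a substring scan over the raw bytes."""
    return "Sec-WebSocket-Unique-Id" in raw_request

def strict_server_check(raw_request):
    """Parse the request line first: only a CONNECT qualifies, and the
    value must appear as an actual header line, not inside the path."""
    lines = raw_request.split("\r\n")
    if not lines[0].startswith("CONNECT "):
        return False
    return any(l.startswith("Sec-WebSocket-Unique-Id:") for l in lines[1:])

attack = "GET /Sec-WebSocket-Unique-Id:123456789 HTTP/1.1\r\n\r\n"
real = ("CONNECT host:port HTTP/1.1\r\n"
        "Sec-WebSocket-Unique-Id: 123456789\r\n\r\n")
```

The naive check accepts the attack request; the strict one rejects it while accepting the genuine handshake.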

If we can't find a solution for this, then too bad, but I'd really like to
find a solution that is safe. The requirement is basically that the
unpredictable part come before the first bit of an HTTP connection that
the attacker has any control over, e.g. the method:

WS123456789 host:port HTTP/1.1 [CRLF]
[CRLF]
etc...

Here, the method actually contains the unpredictable key. The idea here is
that the key starts before the first character of a GET's path, so there's
nothing the attacker can do to insert the key in the relevant part of the
payload. We can then make the processing that the server must do with this
be something that makes no sense with text and cannot be performed with a
zero (e.g. convert it to a number and then perform a numeric operation on
the number, like divide another number by this number), which would ensure
that the server really checks that it got the handshake before continuing.
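To make the requirement concrete, here is a minimal Python sketch of the kind of check described above; the "WS" method prefix, the challenge value, and the particular numeric operation are all illustrative, not part of any spec:

```python
# The key is embedded in the method token itself, before any byte an
# attacker controls. The server must convert it to a number and use it
# in an operation that fails on text (int() raises) and on zero
# (division raises), proving the handshake was actually read.

def extract_key(request_line):
    """Pull the key out of a line like 'WS123456789 host:port HTTP/1.1'
    and prove it was processed numerically."""
    method = request_line.split(" ", 1)[0]
    if not method.startswith("WS"):
        raise ValueError("not a WebSocket handshake")
    key = int(method[2:])        # raises ValueError on non-numeric text
    challenge = 987654321        # an arbitrary server-chosen number
    proof = challenge // key     # raises ZeroDivisionError if key == 0
    return key, proof
```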

This might work, but it leads to the aforementioned question: Can existing
server-side HTTP stacks handle this like a CONNECT easily, or is this line
of thought a non-starter? Any feedback on this would be most welcome.
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Greg Wilkins
2010-02-23 02:59:38 UTC
Ian,

thanks for looking at this issue.... comments inline.
Post by Ian Hickson
I've been going through the e-mail from the past few months trying to
figure out a way to take into account all the feedback. However, I've run
into a problem, and would like some feedback from server-side developers
to determine how to proceed.
* Currently WebSockets has a weak HTTP-to-WebSocket cross-protocol attack
protection in the form of requiring that the first few bytes of the
connection match a particular set of bytes. This is weak, because the
server can just ignore it (and thus be vulnerable); it would be better to
have something that the server has to do to prove it read the handshake
and to prove that the handshake it received was not something that could
be faked by a client speaking another protocol (e.g. a client doing an XHR
request or a scripted <form> post).
Isn't sending a 101 sufficient evidence that the server has received
the upgrade request and processed it correctly? Sending 101 is
not something that many servers do currently, so it is a moderately
strong indication that a server is happy to accept websocket (or
is seriously vulnerable to an injection attack, such that the
status code response can be attacked).

I think 101 is a good indication and gives moderate protection.
Having said that, I'm not opposed to the unique-ID idea previously
suggested (see below).
Post by Ian Hickson
* It has been suggested that the limitations on the order of headers
cause serious problems with existing HTTP stacks.
"serious problems" is perhaps overstating it. I would say that the
ordering limitation requires changes to code that is entirely unrelated
to websocket and that may even be out of the control of the websocket
implementers. Thus it causes serious jurisdictional problems for projects
and products rather than any insurmountable technical problem. But
these are still significant problems.
Post by Ian Hickson
* It has also been suggested that using GET with an Upgrade is harder to
implement on existing HTTP stacks than CONNECT would be.
Where did this suggestion come from?

CONNECT is a method that is intended to be used for opening proxied
connections. Servers will either not support it or have it wired up
to some proxy code. Subverting this for a protocol upgrade is
highly undesirable.

The GET with upgrade header is very much the appropriate mechanism to
use. Servers will either already support Upgrade - in which case ws
should be easy, or they will not - in which case they can simply
implement the mechanism as specified by RFC2616.

To use CONNECT for purposes other than proxying would certainly
cause me a lot of grief. I'd have to put websocket-specific
code into the *MANY* smart proxies that have been implemented on
top of Jetty. This is seriously difficult!

For me, subverting the standard handling of CONNECT is a more serious
transgression of the RFC than requiring an HTTP header order.
Post by Ian Hickson
Given this, I've been examining what we could do by, for instance, adding
some unpredictable values to the headers that the server must process in
order to prove that the handshake was read, as well as considering what
the results would be of using CONNECT instead of Upgrade:, and of moving
the bulk of the handshake into the post-HTTP part of the connection rather
than in the HTTP headers themselves.
Unfortunately, I'm finding it very difficult to come up with a design that
is compatible with HTTP, fits HTTP stacks, and is secure.
I don't see why any radical change is needed.

The GET with Upgrade is in HTTP precisely to handle upgrades to new
protocols like websocket. Why not just use it as specified in RFC2616
and not place any additional restrictions on header ordering or exact
bytes for response lines and headers.

If there is some security vulnerability in the Upgrade mechanism,
then it really should be taken up with HTTPbis so that it can be
closed generally and not just specifically for websocket.

In the short term, the nonce idea does appear to give a reasonable
level of protection. The browser can insert a unique ID
in the upgrade request:

GET /some/url HTTP/1.1
Host: injectableBunny.com:8080
Sec-Websocket-Id: 1234567890

then expect the upgrade response to contain the Id:

HTTP/1.1 101 Switching Protocols
Sec-Websocket-Id: 1234567890

Any attacking javascript code cannot know the unique ID added
by the browser, so even if it was able to fashion a URL that
injected a fake 101 response, it would not know what ID
to inject.
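Greg's scheme can be sketched from the browser's side in a few lines of Python (helper names are illustrative; the header name is taken from the example above):

```python
import secrets

def make_upgrade_request(host, path):
    """Build the upgrade request with a nonce that page scripts cannot
    predict, since it is generated after the URL is formed."""
    nonce = str(secrets.randbelow(10**10))
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Sec-Websocket-Id: {nonce}\r\n\r\n"
    )
    return nonce, request

def response_accepts(nonce, response):
    """Accept the 101 only if it echoes the browser-chosen nonce."""
    lines = response.split("\r\n")
    if not lines[0].startswith("HTTP/1.1 101"):
        return False
    return f"Sec-Websocket-Id: {nonce}" in lines[1:]

nonce, req = make_upgrade_request("injectableBunny.com:8080", "/some/url")
good = ("HTTP/1.1 101 Switching Protocols\r\n"
        f"Sec-Websocket-Id: {nonce}\r\n\r\n")
forged = ("HTTP/1.1 101 Switching Protocols\r\n"
          "Sec-Websocket-Id: X\r\n\r\n")
```

An injected response fails the check precisely because the attacker cannot know what nonce the browser generated.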

How is this vulnerable?

In order to succeed, the attacker must be able to:

1) Open a websocket with a URL that contains the entire injection.
2) Find a server vulnerable to injection SO BADLY that the
101 status code can be injected at the start of the message,
along with the reason (if that is to be checked).
3) Know the unique ID to inject into the response.

1) is feasible.
2) is incredibly unlikely - and if such an injectable server existed, then
browsers would probably be vulnerable to all sorts of other attacks via it.
3) is *impossible*. The ID has not even been generated when the URL is
formed. Only if the ID were predictable could this happen - and now we
are talking about multiple significant vulnerabilities.


regards
Jamie Lokier
2010-02-27 20:42:49 UTC
Post by Greg Wilkins
CONNECT is a method that is intended to be used for opening proxied
connections. Servers will either not support it or have it wired up
to some proxy code. Subverting this for a protocol upgrade is
highly undesirable.
The GET with upgrade header is very much the appropriate mechanism to
use. Servers will either already support Upgrade - in which case ws
should be easy, or they will not - in which case they can simply
implement the mechanism as specified by RFC2616.
To use CONNECT for purposes other than proxying would certainly
cause me a lot of grief. I'd have to put in websocket specific
code into the *MANY* smart proxies that have been implemented on
top of Jetty. This is seriously difficult!
I suspect the main difficulty with using GET is that it's not been
used before (has it?), and so I expect many deployed (unchangeable)
HTTP proxies and a fair number of servers to simply break on seeing
it, by wrongly using the same framing as a normal GET request.

Whereas CONNECT is widely used and implemented properly almost
everywhere. (I have still seen servers which don't, and treat it as
having the same framing as other methods, though!).
Post by Greg Wilkins
For me, subverting the standard handling of CONNECT is a more serious
transgression of the RFC than requiring an HTTP header order.
It's not subversion if you think of it as CONNECTing via the HTTP
server to a separate WebSocket server. I have other services which do
this, e.g. for serving rsync on the same port as HTTP. It works and
is used in the standard way, by specifying the target to CONNECT to.
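The CONNECT-style handoff described here can be sketched in Python (the target host and port are hypothetical; only the request and response shapes follow RFC 2616, which reserves CONNECT for tunnelling):

```python
def connect_request(target, port):
    """The client asks the front-end server to splice it through to a
    named service; after a 2xx response, raw protocol bytes follow."""
    return (
        f"CONNECT {target}:{port} HTTP/1.1\r\n"
        f"Host: {target}:{port}\r\n\r\n"
    )

def tunnel_established(status_line):
    """Per the CONNECT convention, any 2xx status opens the tunnel."""
    parts = status_line.split(" ", 2)
    return parts[0].startswith("HTTP/") and parts[1].startswith("2")

req = connect_request("rsync.example.com", 873)
```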

-- Jamie
Greg Wilkins
2010-02-28 06:36:25 UTC
Jamie,

I'm not opposed to the use of CONNECT to bypass local
proxies.

i.e. CONNECT is used to open a connection and then
a GET with Upgrade is sent over the resulting connection.

What I don't like is the use of CONNECT as a replacement
for GET with Upgrade.

I.e. I don't want to see a CONNECT issued to get past
a local firewall, then on that connection a CONNECT
issued to establish the websocket connection. The
problem with this is that the second CONNECT may
actually be handled as a proxy CONNECT by intermediaries at
the server end.

I don't think firewalls and load balancers are any
more able to handle incoming CONNECT methods than
they are to handle incoming Upgrades. Both may
need updates, but I think CONNECT has a greater
chance of being mishandled by existing
intermediaries and servers.

cheers
Post by Jamie Lokier
Post by Greg Wilkins
CONNECT is a method that is intended to be used for opening proxied
connections. Servers will either not support it or have it wired up
to some proxy code. Subverting this for a protocol upgrade is
highly undesirable.
The GET with upgrade header is very much the appropriate mechanism to
use. Servers will either already support Upgrade - in which case ws
should be easy, or they will not - in which case they can simply
implement the mechanism as specified by RFC2616.
To use CONNECT for purposes other than proxying would certainly
cause me a lot of grief. I'd have to put in websocket specific
code into the *MANY* smart proxies that have been implemented on
top of Jetty. This is seriously difficult!
I suspect the main difficulty with using GET is that it's not been
used before (has it?), and so I expect many deployed (unchangeable)
HTTP proxies and a fair number of servers to simply break on seeing
it, by wrongly using the same framing as a normal GET request.
Whereas CONNECT is widely used and implemented properly almost
everywhere. (I have still seen servers which don't, and treat it as
having the same framing as other methods, though!).
Post by Greg Wilkins
For me, subverting the standard handling of CONNECT is a more serious
transgression of the RFC than requiring an HTTP header order.
It's not subversion if you think of it as CONNECTing via the HTTP
server to a separate WebSocket server. I have other services which do
this, e.g. for serving rsync on the same port as HTTP. It works and
is used in the standard way, by specifying the target to CONNECT to.
-- Jamie
Maciej Stachowiak
2010-02-28 06:42:53 UTC
Post by Greg Wilkins
Jamie,
I'm not opposed to the use of CONNECT to bypass local
proxies.
ie CONNECT is used to open a connection and then
a GET with Upgrade is sent over the resulting connection.
What I don't like is the use of CONNECT as a replacement
for GET with Upgrade.
Ie I don't want to see a CONNECT issued to get past
a local firewall, then on that connection a CONNECT
issued to establish the websocket connection. The
problem with this is that the second connect may
actually be handled as a CONNECT by intermediaries at
the server end.
I don't think firewalls and loadbalancers are any
more able to handle incoming CONNECT methods than
they are to handle incoming Upgrades. Both may
need updates, but I think CONNECT has a greater
chance of being mishandled by existing
intermediaries and servers.
I think Ian's hypothesis is that issuing a CONNECT is more likely to
fail cleanly with unaware intermediaries than a GET with Upgrade;
he's mentioned having some data that the latter sometimes appears
to sort of work but then fails mysteriously.

Perhaps there is some data gathering we can do to see which works
better, or if there is yet another approach. It seems to me like this
is a question best answered by data, not a priori reasoning.

Regards,
Maciej
Justin Erenkrantz
2010-02-28 07:49:18 UTC
Perhaps there is some data gathering we can do to see which works better, or
if there is yet another approach. It seems to me like this is a question
best answered by data, not a priori reasoning.
I can tell you that the design headaches for httpd introduced by
relying solely upon CONNECT rather than */Upgrade are going to be
pretty brutal. CONNECT is handled by mod_proxy_connect and mod_proxy.
It's simply not a part of the core and trying to 'override' the
meaning of CONNECT in this way is just odd. CONNECT means "I want to
talk to another TCP port and I want to use you as an intermediary" not
"I want to switch protocols on the same port."

Again, I think it is crucial to adopt a requirement that, until the
upgrade is completed, we do not impose additional restrictions or
meaning not present in any HTTP specifications or interoperability
guidelines. -- justin
Maciej Stachowiak
2010-02-28 18:20:12 UTC
Post by Justin Erenkrantz
Perhaps there is some data gathering we can do to see which works better, or
if there is yet another approach. It seems to me like this is a question
best answered by data, not a priori reasoning.
I can tell you that the design headaches for httpd introduced by
relying solely upon CONNECT rather than */Upgrade are going to be
pretty brutal. CONNECT is handled by mod_proxy_connect and mod_proxy.
It's simply not a part of the core and trying to 'override' the
meaning of CONNECT in this way is just odd. CONNECT means "I want to
talk to another TCP port and I want to use you as an intermediary" not
"I want to switch protocols on the same port."
I'm not very familiar with the Web server you're working with (Apache
I assume?). But I would assume that if something is not part of the
core, but rather handled by a pluggable module, then it would be
easier, not harder, to handle it differently. I would assume you could
substitute a different module.

Regards,
Maciej
Justin Erenkrantz
2010-02-28 18:30:35 UTC
I'm not very familiar with the Web server you're working with (Apache I
assume?).
Yes, Apache HTTP Server.
But I would assume that if something is not part of the core, but
rather handled by a pluggable module, then it would be easier, not harder,
to handle it differently. I would assume you could substitute a different
module.
No, it's not. The core protocols (like HTTP, SMTP, NNTP, FTP, etc.)
are not really governed by a module, but rather by input/output
filters. (It's technically sort of a module, but those aren't
"modules" in the sense of mod_python, PHP, CGI, etc.) Apache (like
most HTTP servers I have dealt with) abstracts out the wire-level
protocol - so handlers (most 3rd party modules) do not need to know
about the details of HTTP. But, you can't really chain "handlers"
together...so redefining CONNECT to either be a passthrough or an
internal protocol switch is just weird. So, handling a WS upgrade as
a CONNECT - but to do so as a *loopback* - is at completely the
wrong level of abstraction from an HTTP server architectural
perspective. By the time httpd gets into processing CONNECT, we've
already entered the codepath for proxy-land - not changing
protocols-land.

I suspect other HTTP servers may have similar design clashes here, but
I'm sure those developers can speak up about that. -- justin
Jamie Lokier
2010-03-01 03:23:20 UTC
Post by Justin Erenkrantz
But I would assume that if something is not part of the core, but
rather handled by a pluggable module, then it would be easier, not harder,
to handle it differently. I would assume you could substitute a different
module.
[regarding Apache http server]
Post by Justin Erenkrantz
No, it's not. The core protocols (like HTTP, SMTP, NNTP, FTP, etc.)
are not really governed by a module, but rather by input/output
filters. (It's technically sort of a module, but those aren't
"modules" in the sense of mod_python, PHP, CGI, etc.) Apache (like
most HTTP servers I have dealt with) abstracts out the wire-level
protocol - so handlers (most 3rd party modules) do not need to know
about the details of HTTP. But, you can't really chain "handlers"
together...so redefining CONNECT to either be a passthrough or an
internal protocol switch is just weird. So, handling a WS upgrade as
a CONNECT - but to do so as a *loopback* - is at completely the
wrong level of abstraction from an HTTP server architectural
perspective. By the time httpd gets into processing CONNECT, we've
already entered the codepath for proxy-land - not changing
protocols-land.
I suspect other HTTP servers may have similar design clashes here, but
I'm sure those developers can speak up about that.
All true, no doubt, but doesn't GET + Upgrade: meet the same design clash?

A new HTTP method (plus Upgrade header) seems less likely to tickle
browser and (unfortunately) proxy issues to me than using GET.

-- Jamie
Greg Wilkins
2010-03-01 08:24:53 UTC
Post by Jamie Lokier
All true, no doubt, but doesn't GET + Upgrade: meet the same design clash?
A new HTTP method (plus Upgrade header) seems less likely to tickle
browser and (unfortunately) proxy issues to me than using GET.
I think the point is that the handling should be tied to the Upgrade
header rather than adding new meanings to existing methods.

I don't have a big problem with having a new method, other than to
note that I've seen many implementations break with unexpected
method names. Even Servlet 2.5 security constraints are constrained
by the XSLT to only work with standard method names.

But please don't reuse CONNECT in a new/strange way.

The problem with using CONNECT, is that it is more often than
not handled completely differently to GET, POST, PUT, HEAD.

The latter methods are not normally used to switch handling;
instead, the URL provided with them is used to pick a handler,
which then interprets the method. So the URL can
be used to route the request to a handler that knows
websocket is supported at that URL and can handle
the Upgrade header (and perhaps check the method).


But with CONNECT, it is the method that determines the handling
and thus it is often handled differently in a server. There
is no URL to select a handler - which may interpret websocket
stuff.

It's the wrong verb to be using.

Perhaps POST is the better method to use - as at least that
will not be retried by HTTP layers.

cheers
Jamie Lokier
2010-03-01 08:41:45 UTC
Post by Greg Wilkins
The problem with using CONNECT, is that it is more often than
not handled completely differently to GET, POST, PUT, HEAD.
The latter methods are not normally used to switch handling,
instead the URL provided with them is used to pick a handler,
which then will interpret the method. So the URL can
be used to route the request to a handler that will
know websocket is supported at that URL and it can handle
the Upgrade header (and perhaps check the method).
A request handler is usually different from the component which does
transport framing, i.e. chunking the response body, parsing the
request body according to Content-Length.

For example with CGI handlers invoked from Apache httpd, they cannot
process Upgrade: can they? For non-Upgrade requests, Apache itself
will be chunking the response body output from the CGI handler, and
looking for the end of the request body if the headers indicate one. I
would be very surprised if Apache works with Upgrade: and a CGI
handler the way we would need it to - and the same for most other servers.

More concerning are proxies - those which don't mangle headers enough
to block the connection, but are still looking for HTTP message
framing and don't check for the Upgrade: header - because that header
has never been used before (has it?). Or if they do, it's a code path
that nobody ever tested before.
Post by Greg Wilkins
But with CONNECT, it is the method that determines the handling
and thus it is often handled differently in a server. There
is no URL to select a handler - which may interpret websocket
stuff.
It's the wrong verb to be using.
I agree that it's the wrong verb from an HTTP perspective.

I just wonder which choice is likely to trigger bugs - especially I'm
concerned about proxies passing the handshake and mangling subsequent
data.
Post by Greg Wilkins
Perhaps POST is the better method to use - as at least that
will not be retried by HTTP layers.
A few years ago, I did some tests sending a POST and reading back the
response body at the same time as sending the request body. By not
specifying a length, or by making it large, you could actually get a
two-way connection. This was without an Upgrade: header.
Unfortunately it wasn't reliable in all clients that I tried. But I
didn't notice any problem with proxies or servers (not that I tried
many).

-- Jamie
Greg Wilkins
2010-03-01 10:01:25 UTC
Post by Jamie Lokier
Post by Greg Wilkins
The problem with using CONNECT, is that it is more often than
not handled completely differently to GET, POST, PUT, HEAD.
The latter methods are not normally used to switch handling,
instead the URL provided with them is used to pick a handler,
which then will interpret the method. So the URL can
be used to route the request to a handler that will
know websocket is supported at that URL and it can handle
the Upgrade header (and perhaps check the method).
A request handler is usually different from the component which does
transport framing, i.e. chunking the response body, parsing the
request body according to Content-Length.
In the jetty handling of websocket, we treat the Upgrade request
like any other request and route it via authentication,
authorization, and filters, eventually to a servlet.

It is only within the servlet context that a servlet
has all the information required to instantiate a websocket
endpoint (which is an application entity):

+ an authenticated/authorized HTTP upgrade request
+ the application context initialized (with applicable
class loaders and JEE security realms etc)
+ other application objects and data created as part
of the prior interaction with the website
+ the sources of the server-generated events that
are to be sent (e.g. JMS endpoints)


So the servlet can analyse the request, decide if it wants to
accept the upgrade and, if so, create the WebSocket endpoint
and send a 101 response.

The 101 response goes back out via the normal servlet container
path to the HTTP engine and is logged and flushed normally.

This flushing may even be held up by a busy IO layer
and completed asynchronously at a later time.
It is only when the 101 response is completely flushed that
the jetty server triggers its upgrade handling and looks
for the new connection handler (the WebSocket endpoint
created by the application in this case).
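The flow above can be sketched in Python (the class and method names are illustrative, not Jetty's actual API):

```python
# The upgrade request travels the normal HTTP path; only after the 101
# has been written does the server swap in the application-created
# endpoint as the connection's new handler.

class Connection:
    def __init__(self):
        self.handler = self.http_handler   # normal HTTP path at first
        self.sent = []

    def http_handler(self, request):
        if "Upgrade: WebSocket" in request:
            self.sent.append("HTTP/1.1 101 Switching Protocols\r\n\r\n")
            # The 101 goes out via the normal response path; only once
            # it is flushed is the WebSocket endpoint installed.
            self.handler = self.websocket_handler
            return "upgraded"
        self.sent.append("HTTP/1.1 200 OK\r\n\r\n")
        return "http"

    def websocket_handler(self, frame):
        # From here on, bytes on the connection are WebSocket frames.
        return f"frame:{frame}"

conn = Connection()
conn.handler("GET /chat HTTP/1.1\r\nUpgrade: WebSocket\r\n\r\n")
```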


For a Java web server, this is entirely the correct approach.
The upgrade request is an HTTP request and should be
handled as such.

The endpoint that we wish to connect the websocket with
is an application entity - and the Jetty HTTP layer has
no idea how to create such an application endpoint (and
no access to the classloaders and other contexts needed
to do so).

Of course other servers may have entirely different
application architectures.

But the point that I'm making is that a websocket
upgrade request is an HTTP request, and it is not unreasonable
for a server to handle it as such - and only start
websocket processing after the HTTP 101 response is
sent.

Any requirement to handle HTTP upgrade in unusual,
non-standard ways is going to cause grief for containers
such as Jetty, which have a reasonable desire to
handle HTTP as HTTP up until the 101 is sent.
Post by Jamie Lokier
For example with CGI handlers invoked from Apache httpd, they cannot
process Upgrade: can they?
They certainly can - and they can indicate their acceptance by
sending a 101. It is what happens after the 101 that is tricky.
Post by Jamie Lokier
For non-Upgrade requests, Apache itself
will be chunking the response body output from the CGI handler, and
looking for the end of the request body if the headers indicate one. I
would be very surprised if Apache works with Upgrade: and a CGI
handler the way we would need it to - and the same for most other servers.
Obviously Apache will need to be updated to respect 101 responses and
allow the connection to be handled differently AFTER the 101 response
is sent.
Post by Jamie Lokier
More concerning are proxies - those which don't mangle headers enough
to block the connection, but are still looking for HTTP message
framing and don't check for the Upgrade: header - because that header
has never been used before (has it?). Or if they do, it's a code path
that nobody ever tested before.
Upgrade is not well utilized, but I expect we will see more of it
in the future - e.g. to upgrade HTTP to HTTPS. Also, things like SPDY
are using this approach.

So Jetty (for example) had half-hearted upgrade support, and now
we have good upgrade support. We can use this for upgrades
to SSL, SPDY, BWTP, etc.

I expect many other pieces of web infrastructure will be the same.
Once they support Upgrade for one of these reasons, they will
utilize that support for the other use-cases.

I REALLY REALLY REALLY REALLY don't want to have to implement
something special for every new use-case.
Post by Jamie Lokier
Post by Greg Wilkins
But with CONNECT, it is the method that determines the handling
and thus it is often handled differently in a server. There
is no URL to select a handler - which may interpret websocket
stuff.
It's the wrong verb to be using.
I agree that it's the wrong verb from a HTTP perspective.
I just wonder which choice is likely to trigger bugs - especially I'm
concerned about proxies passing the handshake and mangling subsequent
data.
I'm a bit cautious about GET - because it can be retried.
Using something like POST would probably be better.

Using something like WSTP would probably trigger bugs...
but maybe only in the cases where we want to fail fast?

I believe that using CONNECT will also generate bugs
and incorrectly trigger proxy actions. It will also
generate a long series of complaints from me :)


cheers
Maciej Stachowiak
2010-03-01 10:39:47 UTC
Post by Greg Wilkins
Post by Jamie Lokier
Post by Greg Wilkins
But with CONNECT, it is the method that determines the handling
and thus it is often handled differently in a server. There
is no URL to select a handler - which may interpret websocket
stuff.
It's the wrong verb to be using.
I agree that it's the wrong verb from a HTTP perspective.
I just wonder which choice is likely to trigger bugs - especially I'm
concerned about proxies passing the handshake and mangling subsequent
data.
I'm a bit cautious about GET - because it can be retried.
Using something like POST would probably be better.
Using something like WSTP would probably trigger bugs...
but maybe only in the cases where we want to fail fast?
I believe that using CONNECT will also generate bugs
and incorrectly trigger proxy actions.
Let's be agnostic about it in the requirements document (I hope we
have rough consensus on that), and determine which approach generates
more bugs using empirical methods for purposes of the spec.
Post by Greg Wilkins
It will also generate a long series of complaints from me :)
At this point, I'm not sure there is a plausible design for the
WebSocket protocol that avoids this problem.

Regards,
Maciej
Greg Wilkins
2010-03-01 11:48:30 UTC
Post by Greg Wilkins
I believe that using CONNECT will also generate bugs
and incorrectly trigger proxy actions.
Let's be agnostic about it in the requirements document (I hope we have
rough consensus on that), and determine which approach generates more
bugs using empirical methods for purposes of the spec.
If we agree on the requirement text that says something
along the lines of "possible and practical to use existing HTTP
code", then I think that captures the concerns
about the use of CONNECT.
Post by Greg Wilkins
It will also generate a long series of complaints from me :)
At this point, I'm not sure there is a plausible design for the
WebSocket protocol that avoids this problem.
Gee - no smiley?

Note, in this case I'm arguing in support of the current design
of websocket.


cheers
Scott Ferguson
2010-03-01 17:19:47 UTC
Post by Greg Wilkins
In the jetty handling of websocket, we treat the Upgrade request
like any other request and route it via authentication,
authorization, filters eventually to a servlet.
...
Post by Greg Wilkins
For a java web server, this is entirely the correct approach.
The upgrade request is a HTTP request and should be
handled as such.
Resin's approach is essentially the same. The initial request is a
normal Java servlet request, letting applications use their full
framework/security/etc infrastructure to handle/dispatch the request. We
upgrade the request based on the application servlet's direction.

Once the web socket handler is launched, the processing is different, of
course. But by that point, web sockets just looks like an
<InputStream,OutputStream> pair with associated threading/event issues.

-- Scott
Justin Erenkrantz
2010-03-01 18:45:07 UTC
Post by Greg Wilkins
In the jetty handling of websocket, we treat the Upgrade request
like any other request and route it via authentication,
authorization, filters eventually to a servlet.
...
Post by Greg Wilkins
For a Java web server, this is entirely the correct approach.
The upgrade request is an HTTP request and should be
handled as such.
Post by Scott Ferguson
Resin's approach is essentially the same. The initial request is a normal
Java servlet request, letting applications use their full
framework/security/etc infrastructure to handle/dispatch the request. We
upgrade the request based on the application servlet's direction.
Once the web socket handler is launched the processing is different, of
course. But by that point, web sockets just looks like an
<InputStream,OutputStream> pair with associated threading/event issues.
By and large, even though it's implemented in C, I would expect Apache
httpd to be similar - once the 101 is accepted and sent by the server,
all HTTP filters would be flushed and replaced with WebSocket filters
- which would abstract out the framing and other
useless-to-application bits. -- justin
Justin Erenkrantz
2010-02-28 07:42:21 UTC
Permalink
Post by Greg Wilkins
What I don't like is the use of CONNECT as a replacement
for GET with Upgrade.
+1.

My concern here is that CONNECT is typically implemented as a proxy
hand-off to a raw TCP connection on another server. I would envision
that most servers would want to handle the new protocol directly
rather than act as a dumb proxy. So, the code paths on the servers
that would handle GET/Upgrade versus CONNECT are likely to be
completely different and orthogonal here. -- justin
Pieter Hintjens
2010-02-23 07:03:52 UTC
Permalink
Post by Ian Hickson
Unfortunately, I'm finding it very difficult to come up with a design that
is compatible with HTTP, fits HTTP stacks, and is secure.
Ian, I don't understand what you're saying here.

Are you saying that HTTP is insecure? (I assume that HTTP is
compatible with HTTP, and fits HTTP stacks).

If there is an insecurity in HTTP, I'd think that is something that
should be demonstrated and proven, and then fixed generally.

Or are you saying that HTTP cannot be fixed, should not be fixed, or
simply won't be fixed?

Could you clarify this please, because your proposal to take over
CONNECT seems to rest on a major assumption that you've not quite
stated.

Thanks,
-
Pieter
Tim Bray
2010-02-23 07:12:12 UTC
Permalink
Post by Ian Hickson
Unfortunately, I'm finding it very difficult to come up with a design that
is compatible with HTTP, fits HTTP stacks, and is secure.
Sorry, I'm missing something. Could you outline the nature of the
attack that's worrying you in a little more detail? I assume TLS is
in effect, and my impression is that dealing with various flavors of
man-in-the-middle attack in the context of HTTP/TLS is something
that's reasonably well-understood.

I tend to agree with Greg Wilkins that GET + Upgrade feels like a
better fit than CONNECT with the strata of specs & software that are
already in place. -T
Ian Hickson
2010-02-23 07:18:31 UTC
Permalink
Post by Tim Bray
Post by Ian Hickson
Unfortunately, I'm finding it very difficult to come up with a design that
is compatible with HTTP, fits HTTP stacks, and is secure.
Sorry, I'm missing something. Could you outline the nature of the
attack that's worrying you in a little more detail?
I was referring to the cross-protocol attacks described by Maciej in an
e-mail earlier this month:

http://www.ietf.org/mail-archive/web/hybi/current/msg01198.html
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Greg Wilkins
2010-02-23 07:32:40 UTC
Permalink
Post by Ian Hickson
Post by Tim Bray
Post by Ian Hickson
Unfortunately, I'm finding it very difficult to come up with a design that
is compatible with HTTP, fits HTTP stacks, and is secure.
Sorry, I'm missing something. Could you outline the nature of the
attack that's worrying you in a little more detail?
I was referring to the cross-protocol attacks described by Maciej in an
http://www.ietf.org/mail-archive/web/hybi/current/msg01198.html
Ian,

there has been a moderately warm response to the suggestion made
in that post for a nonce (or ID or similar).

So I don't see why there is need to move away from the standard
Upgrade mechanism. Indeed that conversation was motivated by
the suggestion that we should make the Upgrade request/response
a totally standard upgrade conversation, without limitations
on header ordering or response reason strings.

I believe the nonce or unique ID proposal completely handles
attack 1) in Maciej's email because an attacker cannot know
what the nonce or ID is, so they cannot form an injecting
websocket request.

I do not believe that 2) is actually a security issue.
It just says that you should authenticate and authorize
requests/connections before doing anything with them that
you might regret.

Because the websocket handshake is almost standard HTTP, it can
already take advantage of many existing authentication
mechanisms: BASIC, DIGEST, OPENID, OAUTH etc.
This is a huge benefit of using standard HTTP for the
handshake.


regards
Maciej Stachowiak
2010-02-23 07:54:10 UTC
Permalink
Post by Greg Wilkins
Post by Ian Hickson
Post by Tim Bray
Post by Ian Hickson
Unfortunately, I'm finding it very difficult to come up with a design that
is compatible with HTTP, fits HTTP stacks, and is secure.
Sorry, I'm missing something. Could you outline the nature of the
attack that's worrying you in a little more detail?
I was referring to the cross-protocol attacks described by Maciej in an
http://www.ietf.org/mail-archive/web/hybi/current/msg01198.html
Ian,
there has been a moderately warm response to the suggestion made
in that post for a nonce (or ID or similar).
Ian and I asked a security expert to review that proposal and he
suggested a more robust handshake protocol which ended up looking
nothing at all like HTTP. I believe Ian is trying to find a happy
medium that is more robust security-wise. I'll ask him if we can post
his exact proposal, but it's probably not suitable for use as-is
because it completely breaks any semblance of HTTP compatibility.

I suspect besides the security issues, the existing handshake might
not be good enough at failing hard when sent through unaware proxies.
Post by Greg Wilkins
So I don't see why there is need to move away from the standard
Upgrade mechanism. Indeed that conversation was motivated by
the suggestion that we should make the Upgrade request/response
a totally standard upgrade conversation, without limitations
on header ordering or response reason strings.
I believe the nonce or unique ID proposal completely handles
attack 1) in Maciej's email because an attacker cannot know
what the nonce or ID is, so they cannot form an injecting
websocket request.
I do not believe that 2) is actually a security issue.
It just says that you should authenticate and authorize
requests/connections before doing anything with them that
you might regret.
It is definitely a security issue. It's likely to be a more serious
issue in practice than (1). The spec says you do not need to fully
check the handshake for correctness. If you look only at the
credential-bearing parts of the handshake request, then you can be
exploited via XMLHttpRequest from a browser.
Post by Greg Wilkins
Because the websocket handshake is almost standard HTTP, it can
already take advantage of many existing authentication
mechanisms: BASIC, DIGEST, OPENID, OAUTH etc.
This is a huge benefit of using standard HTTP for the
handshake.
These won't necessarily help. The threat model is that client JS in
the browser uses cross-site XHR to send you an HTTP request with
convincing-looking headers (possibly including full credentials) and a
body that consists of well-formed WebSocket messages. If the server
does not check the correctness of the handshake, then an attacker can
violate integrity by connecting with XHR in cases where an attempt to
connect with the WebSocket API would have been denied by client-side
checks: the WebSocket client would refuse before sending any messages,
but XHR enforces no such check.

At minimum: the spec needs to require checking correctness of the
handshake, if the server accepts any incoming messages and may perform
a side effect in response.

Better: the spec should *always* require that servers check well-
formedness of the client handshake, because the potential security
risk outweighs the minor benefit from slightly simplifying things for
a WebSocket server that only ever sends outgoing messages and does not
read incoming ones.

Ideally: the handshake should be designed so that generating the
correct handshake response essentially requires you to check handshake
correctness, or at the very least sets things up so that is the server
developer's path of least resistance. That is what we are going for
with the design.

I note that the risk Ian raised can be mitigated even for regexp-based
processing by including a start-of-line assertion in one's regexp.
Putting the nonce in the method name doesn't seem to materially help
because if you can't be bothered to check start-of-line, how can we be
sure you'd check start-of-transmission?
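The anchoring point can be sketched in Python (the requests and names
here are illustrative, not from the thread). A bare unanchored scan is
satisfied by a key-shaped line anywhere in the request, e.g. in a POST
body; a start-of-line assertion restricted to the header block is not.
Note that, per the start-of-transmission caveat above, anchoring alone
is not enough: the body must be cut off before scanning.

```python
import re

# Legitimate handshake: the key is a real header line.
real = (
    "GET /chat HTTP/1.1\r\n"
    "Host: example.net\r\n"
    "Sec-WebSocket-Key1: 4 17 9\r\n"
    "\r\n"
)

# Hostile request: no key header at all, but the attacker-controlled
# POST body contains a key-shaped line.
fake = (
    "POST /chat HTTP/1.1\r\n"
    "Host: example.net\r\n"
    "Content-Length: 27\r\n"
    "\r\n"
    "Sec-WebSocket-Key1: 1 2 3\r\n"
)

# Unanchored scan: fooled by both requests.
loose = re.compile(r"Sec-WebSocket-Key1: ([^\r\n]+)")

def extract_key(request):
    # Start-of-line assertion (re.M), applied only to the header block;
    # a body line would also sit at a line start, so the body is
    # discarded before matching.
    head = request.split("\r\n\r\n", 1)[0]
    m = re.search(r"^Sec-WebSocket-Key1: ([^\r\n]+)", head, re.M)
    return m.group(1) if m else None
```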

Regards,
Maciej
Greg Wilkins
2010-02-23 13:32:09 UTC
Permalink
Post by Maciej Stachowiak
Ian and I asked a security expert to review that proposal and he
suggested a more robust handshake protocol which ended up looking
nothing at all like HTTP. I believe Ian is trying to find a happy medium
that is more robust security-wise. I'll ask him if we can post his exact
proposal, but it's probably not suitable for use as-is because it
completely breaks any semblance of HTTP compatibility.
Rather than his proposal, I'd really like to see his analysis of what
the vulnerability of the unique-id proposal is.

I simply do not understand how an attacker can inject a nonce
that has not yet been generated.

regards
Greg Wilkins
2010-02-23 23:02:45 UTC
Permalink
Post by Maciej Stachowiak
Post by Greg Wilkins
I do not believe that 2) is actually a security issue.
It just says that you should authenticate and authorize
requests/connections before doing anything with them that
you might regret.
It is definitely a security issue. It's likely to be a more serious
issue in practice than (1). The spec says you do not need to fully check
the handshake for correctness. If you look only at the
credential-bearing parts of the handshake request, then you can be
exploited via XMLHttpRequest from a browser.
The more I think about this one, the more I don't understand the
problem.

Hostile javascript code can fake a websocket request using XMLHttpRequest.

But why would it do that? Why would it not just use the real websocket
API? It's shipping in Chrome now, will soon be in FF and Opera, and
eventually in the majority of browsers.

So if a server accepts a legal-looking websocket handshake and message
and then proceeds to do something without sufficient authorization, then
that is simply a bug, and using XMLHttpRequest gives no attack vector
that cannot be achieved just by using the websocket API.

Why make a request that merely looks like a websocket request, when you
have the API to send a real one?


regards
Jamie Lokier
2010-02-27 20:37:03 UTC
Permalink
Post by Greg Wilkins
Hostile javascript code can fake a websocket server using XMLHttpRequest.
But why would it do that? why would it just not use the real websocket
API. It's being shipped in Chrome now, and will soon be in FF and opera
then eventually the majority browsers.
Because with XHR the browser still thinks it's an HTTP connection, but
the server does not, so the browser will reuse the connection for
subsequent GET requests to the same proxy address (including
server-cluster proxies). That allows fake responses to unrelated
subsequent requests, as well as collecting the request data such as cookies.

-- Jamie
Ian Hickson
2010-02-25 01:34:33 UTC
Permalink
Post by Maciej Stachowiak
I note that the risk Ian raised can be mitigated even for regexp-based
processing by including a start-of-line assertion in one's regexp.
Putting the nonce in the method name doesn't seem to materially help
because if you can't be bothered to check start-of-line, how can we be
sure you'd check start-of-transmission?
If we can make the entire method be unpredictable (yet still
recognisable), then it is unlikely that implementations will do anything
but simply reading the first few bytes.

Currently what I'm looking at is a slight modification of the current
handshake, with the following components:

GET [resource] HTTP/1.1 [CRLF]
Upgrade: WebSocket [CRLF]
Connection: Upgrade [CRLF]
Host: [host]:[port] [CRLF]
Cookie: ... [CRLF]
Origin: [origin] [CRLF]
Sec-WebSocket-Key1: [aaa] [CRLF]
Sec-WebSocket-Key2: [bbbb] [CRLF]
Sec-WebSocket-Key3: [cc] [CRLF]
Sec-WebSocket-Protocol: .... [CRLF]
[any headers introduced by intermediaries] [CRLF]
[CRLF]
[yyyyyy]

...where [aaa], [bbbb], [cc], and [yyyyyy] are used in some way by the server to
prove that the handshake was received and read.

If we just use headers, then it's IMHO too easy to write a server that
implements the handshake incorrectly. For example, one could (in Perl) get
the three keys using code like:

m/Sec-WebSocket-Key1: ([0-9a-f])/ # XXX SECURITY BUG!

...which would be vulnerable to stuffing fake keys into the path (which
comes first, in the GET line).
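A direct Python port makes the failure concrete (the request below is
illustrative; the path is one smuggling avenue, and an attacker-controlled
POST entity body is another). The scan succeeds even though the request
carries no genuine key header, and the single-character class silently
truncates whatever it finds:

```python
import re

# Port of the buggy pattern: unanchored, and it captures only ONE
# character of the key.
buggy = re.compile(r"Sec-WebSocket-Key1: ([0-9a-f])")

# No real Sec-WebSocket-Key1 header anywhere; the key-shaped text sits
# in a POST entity body that the attacker fully controls.
request = (
    "POST / HTTP/1.1\r\n"
    "Host: victim.example\r\n"
    "Content-Length: 28\r\n"
    "\r\n"
    "Sec-WebSocket-Key1: abc123\r\n"
)

m = buggy.search(request)   # matches despite the missing header
```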

I'd really like to be able to use methods other than GET, so that the
handshake would look like this:

WS813 /path HTTP/1.1
Sec-WebSocket-Key: 419287
...other headers...

58137

...where the three random numbers have to be combined in some special way
(e.g. MD5) and sent back by the server. By putting something before the
path, we prevent the path from being usable to smuggle data in.
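The combining rule is left open here ("in some special way (e.g. MD5)"),
so the following is only a sketch under the assumption that the three
decimal strings are simply concatenated and hashed; the function name
and the rule itself are hypothetical:

```python
import hashlib

def ws_response_token(method_digits, header_key, body_key):
    # Hypothetical rule: concatenate the decimal strings from the
    # method (WS813 -> "813"), the Sec-WebSocket-Key header, and the
    # entity body, then return the MD5 hex digest.
    combined = method_digits + header_key + body_key
    return hashlib.md5(combined.encode("ascii")).hexdigest()

# The handshake above: WS813, Sec-WebSocket-Key: 419287, body 58137.
token = ws_response_token("813", "419287", "58137")
```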

As far as I can tell, there are only a few places you can smuggle data in
using the tools available to attackers: you can change the method to one
of a limited set (GET and POST, maybe HEAD), you can change the path, you
can change the Host: header, you can insert specific numbers using the
Content-Length header, and, if using POST, you can affect the entity body.
(On a same-host attack you can do much more, like setting arbitrary
headers that don't start with Sec-, setting Cookies, using arbitrary
methods, etc, but I don't think we need to worry about same-host attacks,
since if the attacker can run scripts on your domain, you're already
vulnerable to more traditional XSS attacks.)

As far as I can tell, the one thing you _can't_ do is send a space
character. If we can't change the method, then maybe we could make the key
part have some sort of semantically-important space character. That would
prevent the path from being used to smuggle in the data. However, this
seems to go against HTTP's semantics. Is that a problem? The "Cookie"
header does seem to rely on runs of SPs not being affected, but is that
reliable?

If it is, we could just make the key be something like:

GET [resource] HTTP/1.1
...
Sec-WebSocket-Key: " 34 3 10 "
...
123456

...where the server, to prove it received this, has to take the digits in
the key, interpreted as a single decimal number with the spaces removed,
then divide it by the number of spaces in the quoted key, and concatenate
that to the number in the entity body, then give the MD5 sum of that. (The
division is important to prevent people from tricking servers by making
the server think that _zero_ spaces were sent, on the assumption that if
the server isn't looking that closely at the handshake, they also won't be
expecting to have to check for division-by-zero.)

So in the case above, it would be the MD5 sum of 34310 divided by five,
concatenated with 123456.
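A sketch of the server-side computation (assuming integer division and
plain decimal concatenation before hashing, neither of which is pinned
down above; note the key as archived, " 34 3 10 ", needs five spaces for
the worked example to come out, so a run of spaces was presumably
collapsed somewhere, and the key below restores one):

```python
import hashlib

def challenge_response(quoted_key, body_number):
    inner = quoted_key.strip('"')
    # Digits with the spaces removed, read as one decimal number.
    number = int("".join(ch for ch in inner if ch.isdigit()))
    spaces = inner.count(" ")
    if spaces == 0:
        # The deliberate division-by-zero trap: a smuggled key without
        # spaces must blow up rather than be silently accepted.
        raise ValueError("key contains no spaces")
    # Divide, concatenate the entity-body number, MD5 the result.
    return hashlib.md5(
        ("%d%s" % (number // spaces, body_number)).encode("ascii")
    ).hexdigest()

# Worked example: key digits 34310, five spaces, body 123456
# -> MD5 of "6862123456".
resp = challenge_response('" 34 3  10 "', "123456")
```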

It's not pretty, but does it work?
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Greg Wilkins
2010-02-25 04:14:06 UTC
Permalink
Ian,

I really think you are getting too concerned about poor
server implementations. If a server does not even check
for basic HTTP header correctness, then there are probably
innumerable security errors that could result.

If this sort of thing really is a security problem, then
why is Google rolling out the current websocket impl in
Chrome now?

Having said that, I think your proposal of requiring spaces
to exist in the websocket header values is both legal HTTP
and would protect against the poor server implementations
that you describe. If you want to be paranoid, then that
approach has merit.

By requiring a handshake that is not HTTP, you are just
encouraging implementations to NOT use existing HTTP libraries
and thus to be more likely to make security errors of
the type that you are describing.

Surely we would be better off just saying that a websocket
server MUST check that the handshake request is a legal
HTTP request. Compliance tests could then be written to
try illegal requests like you suggest.


regards
Post by Ian Hickson
Post by Maciej Stachowiak
I note that the risk Ian raised can be mitigated even for regexp-based
processing by including a start-of-line assertion in one's regexp.
Putting the nonce in the method name doesn't seem to materially help
because if you can't be bothered to check start-of-line, how can we be
sure you'd check start-of-transmission?
If we can make the entire method be unpredictable (yet still
recognisable), then it is unlikely that implementations will do anything
but simply reading the first few bytes.
Currently what I'm looking at is a slight modification of the current
handshake, with the following components:
GET [resource] HTTP/1.1 [CRLF]
Upgrade: WebSocket [CRLF]
Connection: Upgrade [CRLF]
Host: [host]:[port] [CRLF]
Cookie: ... [CRLF]
Origin: [origin] [CRLF]
Sec-WebSocket-Key1: [aaa] [CRLF]
Sec-WebSocket-Key2: [bbbb] [CRLF]
Sec-WebSocket-Key3: [cc] [CRLF]
Sec-WebSocket-Protocol: .... [CRLF]
[any headers introduced by intermediaries] [CRLF]
[CRLF]
[yyyyyy]
...where [aa], [bb], [cc], and [yy] are used in some way by the server to
prove that the handshake was received and read.
If we just use headers, then it's IMHO too easy to write a server that
implements the handshake incorrectly. For example, one could (in Perl) get
the three keys using code like:
m/Sec-WebSocket-Key1: ([0-9a-f])/ # XXX SECURITY BUG!
...which would be vulnerable to stuffing fake keys into the path (which
comes first, in the GET line).
I'd really like to be able to use methods other than GET, so that the
handshake would look like this:
WS813 /path HTTP/1.1
Sec-WebSocket-Key: 419287
...other headers...
58137
...where the three random numbers have to be combined in some special way
(e.g. MD5) and sent back by the server. By putting something before the
path, we prevent the path from being usable to smuggle data in.
As far as I can tell, there are only a few places you can smuggle data in
using the tools available to attackers: you can change the method to one
of a limited set (GET and POST, maybe HEAD), you can change the path, you
can change the Host: header, you can insert specific numbers using the
Content-Length header, and, if using POST, you can affect the entity body.
(On a same-host attack you can do much more, like setting arbitrary
headers that don't start with Sec-, setting Cookies, using arbitrary
methods, etc, but I don't think we need to worry about same-host attacks,
since if the attacker can run scripts on your domain, you're already
vulnerable to more traditional XSS attacks.)
As far as I can tell, the one thing you _can't_ do is send a space
character. If we can't change the method, then maybe we could make the key
part have some sort of semantically-important space character. That would
prevent the path from being used to smuggle in the data. However, this
seems to go against HTTP's semantics. Is that a problem? The "Cookie"
header does seem to rely on runs of SPs not being affected, but is that
reliable?
GET [resource] HTTP/1.1
...
Sec-WebSocket-Key: " 34 3 10 "
...
123456
...where the server, to prove it received this, has to take the digits in
the key, interpreted as a single decimal number with the spaces removed,
then divide it by the number of spaces in the quoted key, and concatenate
that to the number in the entity body, then give the MD5 sum of that. (The
division is important to prevent people from tricking servers by making
the server think that _zero_ spaces were sent, on the assumption that if
the server isn't looking that closely at the handshake, they also won't be
expecting to have to check for division-by-zero.)
So in the case above, it would be the MD5 sum of 34310 divided by five,
concatenated with 123456.
It's not pretty, but does it work?
Roberto Peon
2010-02-25 07:03:03 UTC
Permalink
I believe that it is impossible to protect users from stupid server writers.
In this particular case, for instance, were one using a stupid
string-scanning parser, one *should* scan for:
"\nkey: value"
instead of
"key: value"

Server writers should be trusted to frame the header key/value pairs
properly and scan through them. If they are not able or willing to do this,
they're... fools-- HTTP *requires* this for proper framing of messages.
Doing simple string scanning for phrases, instead of proper header framing
and examination is likely to mis-detect or otherwise get the response
framing incorrect. This creates the off-by-one request-response mismatch
security nightmare where the wrong user gets the wrong response
(particularly bad when cookies are being set).
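Proper framing of the header key/value pairs is not much code; a minimal
illustrative sketch (with header continuations folded to a space, as
noted below, and the entity body split off before any header lookup):

```python
def parse_request(raw):
    # Split the header block off at the first blank line; anything
    # after it is entity body and must never be scanned for headers.
    head, sep, body = raw.partition("\r\n\r\n")
    if not sep:
        raise ValueError("header block not terminated")
    lines = head.split("\r\n")
    headers = {}
    last = None
    for line in lines[1:]:              # lines[0] is the request line
        if line.startswith((" ", "\t")):
            # Continuation line: folded into the previous value as a space.
            if last is None:
                raise ValueError("continuation before any header")
            headers[last] += " " + line.strip()
            continue
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError("malformed header line: %r" % line)
        last = name.strip().lower()
        headers[last] = value.strip()
    return lines[0], headers, body
```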

Adding additional complexity to already complex servers is more likely to
cause problems in the servers whose implementors are actually trying to do
the right things. I'd rather help those trying to do the right thing and let
the fools do whatever foolish thing they will do anyway.

I hope that this is a dead horse I'm whipping. :)

In any case, your second (most recent) proposal seems like it could work
without being too onerous for servers.
(I'm referring to something like: Sec-WebSocket-Key: " 34 3 10 ")
Note that header continuations would have to be treated as spaces, etc. Thus
depending on your level of paranoia, you could also do:
Sec-WebSocket-Key: 34 3\r\n
  10\r\n

Note that requiring a compute intensive operation at the server gives DoS
attackers additional avenues of attack, so you should be careful about how
much you're requiring the servers to do (it is far better for the client to
do any heavy lifting, despite asymmetries in client capacities, assuming DoS
attacks occur)

-=R
Post by Greg Wilkins
Ian,
I really think you are getting too concerned about poor
server implementations. If a server does not even check
for basic HTTP header correctness, then there are probably
innumerable security errors that could result.
If this sort of thing really is a security problem, then
why is Google rolling out the current websocket impl in
Chrome now?
Having said that, I think your proposal of requiring spaces
to exist in the websocket header values is both legal HTTP
and would protect against the poor server implementations
that you describe. If you want to be paranoid, then that
approach has merit.
By requiring a handshake that is not HTTP, you are just
encouraging implementations to NOT use existing HTTP libraries
and thus to be more likely to make security errors of
the type that you are describing.
Surely we would be better off just saying that a websocket
server MUST check that the handshake request is a legal
HTTP request. Compliance tests could then be written to
try illegal requests like you suggest.
regards
Post by Ian Hickson
Post by Maciej Stachowiak
I note that the risk Ian raised can be mitigated even for regexp-based
processing by including a start-of-line assertion in one's regexp.
Putting the nonce in the method name doesn't seem to materially help
because if you can't be bothered to check start-of-line, how can we be
sure you'd check start-of-transmission?
If we can make the entire method be unpredictable (yet still
recognisable), then it is unlikely that implementations will do anything
but simply reading the first few bytes.
Currently what I'm looking at is a slight modification of the current
handshake, with the following components:
GET [resource] HTTP/1.1 [CRLF]
Upgrade: WebSocket [CRLF]
Connection: Upgrade [CRLF]
Host: [host]:[port] [CRLF]
Cookie: ... [CRLF]
Origin: [origin] [CRLF]
Sec-WebSocket-Key1: [aaa] [CRLF]
Sec-WebSocket-Key2: [bbbb] [CRLF]
Sec-WebSocket-Key3: [cc] [CRLF]
Sec-WebSocket-Protocol: .... [CRLF]
[any headers introduced by intermediaries] [CRLF]
[CRLF]
[yyyyyy]
...where [aaa], [bbbb], [cc], and [yyyyyy] are used in some way by the server to
prove that the handshake was received and read.
If we just use headers, then it's IMHO too easy to write a server that
implements the handshake incorrectly. For example, one could (in Perl) get
the three keys using code like:
m/Sec-WebSocket-Key1: ([0-9a-f])/ # XXX SECURITY BUG!
...which would be vulnerable to stuffing fake keys into the path (which
comes first, in the GET line).
I'd really like to be able to use methods other than GET, so that the
handshake would look like this:
WS813 /path HTTP/1.1
Sec-WebSocket-Key: 419287
...other headers...
58137
...where the three random numbers have to be combined in some special way
(e.g. MD5) and sent back by the server. By putting something before the
path, we prevent the path from being usable to smuggle data in.
As far as I can tell, there are only a few places you can smuggle data in
using the tools available to attackers: you can change the method to one
of a limited set (GET and POST, maybe HEAD), you can change the path, you
can change the Host: header, you can insert specific numbers using the
Content-Length header, and, if using POST, you can affect the entity body.
(On a same-host attack you can do much more, like setting arbitrary
headers that don't start with Sec-, setting Cookies, using arbitrary
methods, etc, but I don't think we need to worry about same-host attacks,
since if the attacker can run scripts on your domain, you're already
vulnerable to more traditional XSS attacks.)
As far as I can tell, the one thing you _can't_ do is send a space
character. If we can't change the method, then maybe we could make the key
part have some sort of semantically-important space character. That would
prevent the path from being used to smuggle in the data. However, this
seems to go against HTTP's semantics. Is that a problem? The "Cookie"
header does seem to rely on runs of SPs not being affected, but is that
reliable?
GET [resource] HTTP/1.1
...
Sec-WebSocket-Key: " 34 3 10 "
...
123456
...where the server, to prove it received this, has to take the digits in
the key, interpreted as a single decimal number with the spaces removed,
then divide it by the number of spaces in the quoted key, and concatenate
that to the number in the entity body, then give the MD5 sum of that. (The
division is important to prevent people from tricking servers by making
the server think that _zero_ spaces were sent, on the assumption that if
the server isn't looking that closely at the handshake, they also won't be
expecting to have to check for division-by-zero.)
So in the case above, it would be the MD5 sum of 34310 divided by five,
concatenated with 123456.
It's not pretty, but does it work?
_______________________________________________
hybi mailing list
https://www.ietf.org/mailman/listinfo/hybi
Dave Cridland
2010-02-25 10:22:01 UTC
Permalink
Post by Ian Hickson
Post by Maciej Stachowiak
I note that the risk Ian raised can be mitigated even for regexp-based
processing by including a start-of-line assertion in one's regexp.
Putting the nonce in the method name doesn't seem to materially help
because if you can't be bothered to check start-of-line, how can we be
sure you'd check start-of-transmission?
If we can make the entire method be unpredictable (yet still
recognisable), then it is unlikely that implementations will do anything
but simply reading the first few bytes.
This sentiment seems to come up rather a lot - I think we need to
make the general assumption that implementors know what they're
doing, because we've written a specification to tell them.

Handling the handshake is something that (one assumes) the webserver
implementors will be doing, and at some point you have to assume that
people are going to do at least some things right, and our job should
really be limited to ensuring that what those things are is well
defined.

So assuming that a server implementor will check the method, upgrade
header, etc - what's the threat then?

Dave.
--
Dave Cridland - mailto:***@cridland.net - xmpp:***@dave.cridland.net
- acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
- http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
Ian Hickson
2010-02-25 10:52:50 UTC
Permalink
Post by Ian Hickson
If we can make the entire method be unpredictable (yet still
recognisable), then it is unlikely that implementations will do
anything but simply reading the first few bytes.
This sentiment seems to come up rather a lot - I think we need to make
the general assumption that implementors know what they're doing,
because we've written a specification to tell them.
The target audience of the Web Socket draft (which may or may not be the
same target audience as the draft produced by this working group will be
written for) is Web authors of the caliber of those who write CGI scripts
or HTML+JS pages today.

If we look at how well HTML+JS authors have succeeded at following the HTML
specifications, the results aren't encouraging. Depending on how strict
you are in defining conformance errors, at least 70% to 95% of pages on
the Web today are syntactically invalid HTML.

Therefore I do not for a minute think that we can rely on such authors to
implement things just because the spec says to. This is why the Web Socket
protocol is designed to be as fool-proof as possible.

This doesn't mean it's inappropriate for more professional deployments, of
course. It just means that we have to consider more than just the
professional programmers when designing the protocol. It's easier to take
a protocol designed for amateurs and add features that professionals
desire (e.g. compression) than it is to take a protocol designed for
professionals and make it idiot-proof.


(Again, though, this isn't supposed to be a statement of what the group's
goals should be. If the working group wants to work on something that only
targets professionals, that's fine -- it's just not what I'm working on.)
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Dave Cridland
2010-02-25 11:03:02 UTC
Permalink
Post by Ian Hickson
Post by Dave Cridland
Post by Ian Hickson
If we can make the entire method be unpredictable (yet still
recognisable), then it is unlikely that implementations will do
anything but simply reading the first few bytes.
This sentiment seems to come up rather a lot - I think we need to make
the general assumption that implementors know what they're doing,
because we've written a specification to tell them.
The target audience of the Web Socket draft (which may or may not be the
same target audience as the draft produced by this working group will be
written for) is Web authors of the caliber of those who write CGI scripts
or HTML+JS pages today.
I don't see why.

I understand that the folks that will be handling the actual
WebSocket itself will be closer to the general public (especially at
first, when they'll more than likely have to write framing libraries,
hence my continued argument that this should be much simpler).

But handling the initial HTTP upgrade handshake will surely be done
by the people who write IIS and Apache, and I think we can safely
assume that these people are likely to be on this list, and reading
the specifications carefully.

Dave.
--
Dave Cridland - mailto:***@cridland.net - xmpp:***@dave.cridland.net
- acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
- http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
Ian Hickson
2010-02-25 11:35:58 UTC
Permalink
Post by Dave Cridland
Post by Ian Hickson
The target audience of the Web Socket draft (which may or may not be
the same target audience as the draft produced by this working group
will be written for) is Web authors of the caliber of those who write
CGI scripts or HTML+JS pages today.
I don't see why.
Because the more people can use this, the better the Web can be. Web
Sockets should be usable by someone with a weekend of hacking, without
them having to get their shared host provider to install anything. This is
just like CGIs today.
Post by Dave Cridland
But handling the initial HTTP upgrade handshake will surely be done by
the people who write IIS and Apache, and I think we can safely assume
that these people are likely to be on this list, and reading the
specifications carefully.
I do not expect most deployments of WebSockets to involve an HTTP server
at all. There's really no need, except if you're in a situation where you
happen to only have one IP address and you want everything to work over
port 443 (since that's the most likely to work given firewalls). I think
it's worth handling that case, but I don't think it'll be the common case
in terms of number of deployments. It might be the common case in terms of
number of connections, though, since most deployments will see very few
users, just like most Web pages today see very few users; qv. the long tail.


Again, though, that's just what I personally want to work on. I'm not
saying that the working group has to target this segment. It may be that
the working group is not interested in this segment at all.
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Dave Cridland
2010-02-25 12:32:01 UTC
Permalink
Post by Ian Hickson
Post by Dave Cridland
Post by Ian Hickson
The target audience of the Web Socket draft (which may or may not be
the same target audience as the draft produced by this working group
will be written for) is Web authors of the caliber of those who write
CGI scripts or HTML+JS pages today.
I don't see why.
Because the more people can use this, the better the Web can be.
Yes, true. With these groundbreaking asynchronous connection-oriented
virtual streams, it can even be almost as good as the actual Internet.
Post by Ian Hickson
Web Sockets should be usable by someone with a weekend of hacking,
without them having to get their shared host provider to install
anything. This is just like CGIs today.
CGIs, today (and PHP, and whatever else) still need the framework to
support them (ie, a webserver that supports CGI, PHP, etc). On a
shared host provider, you typically don't get to install servers, you
pick one that supports what you need - CGI, PHP, or in this case
WebSocket servlets. Even really quite creative people rarely
reimplement HTTP to "do web stuff", so I fail to see why you think
this will radically change with WebSockets.
Post by Ian Hickson
Post by Dave Cridland
But handling the initial HTTP upgrade handshake will surely be done by
the people who write IIS and Apache, and I think we can safely assume
that these people are likely to be on this list, and reading the
specifications carefully.
I do not expect most deployments of WebSockets to involve an HTTP server
at all. There's really no need, except if you're in a situation where you
happen to only have one IP address and you want everything to work over
port 443 (since that's the most likely to work given firewalls). I think
it's worth handling that case, but I don't think it'll be the common case
in terms of number of deployments. It might be the common case in terms of
number of connections, though, since most deployments will see very few
users, just like most Web pages today see very few users; qv. the long tail.
Okay, so for the sake of example I'll pretend to agree that the
average weekend hacker will willingly implement their own HTTP
framework rather than just grab some existing one. I think you're
bordering on delusional, here, but I'll go along with this.

What you seem to be attempting to do is make the protocol
sufficiently hard to implement that you'll either put people off
doing so, or else force them somehow into doing it "right", such that
the resulting weekend hacking project is secure against various XSS
style attacks and other potential hazards.

The thing is, I think people will be just fine with knowing that if
they knock something together without reading the spec - particularly
the Security Considerations section - then it might turn out to be
insecure in some respects. I think you're basically safe from people
hunting you down with pitchforks and burning torches, screaming at
you because they got something working, but it isn't secure against
some XSS attack.

On the other hand, people will complain - and indeed, I am complaining
- that you're making quick hack deployment needlessly onerous by making
implementors hurl about strange crypto and bizarre NIH framing.

Dave.
--
Dave Cridland - mailto:***@cridland.net - xmpp:***@dave.cridland.net
- acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
- http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
Justin Erenkrantz
2010-02-28 03:29:39 UTC
Permalink
CGIs, today (and PHP, and whatever else) still need the framework to support
them (ie, a webserver that supports CGI, PHP, etc). On a shared host
provider, you typically don't get to install servers, you pick one that
supports what you need - CGI, PHP, or in this case WebSocket servlets. Even
really quite creative people rarely reimplement HTTP to "do web stuff", so I
fail to see why you think this will radically change with WebSockets.
While it did not originate with me but with another Apache HTTP
Server developer, it has been suggested to me that Apache httpd should
- by default - actively block CGIs which attempt to implement "custom"
wire-level protocols on top of HTTP - like WS. Any custom protocol
implementation sitting inside of a CGI is liable to be error-prone and
to have devastating side-effects on the scaling properties of the
server. If you are a mass vhost provider and
suddenly all of your network communication must be funneled through a
variety of long-lived CGIs (of differing origins!), there is going to
be some immense pain inflicted on the server. So, the addition of
such a protective feature would likely receive serious consideration
from myself and other members of ***@httpd.

As I've maintained before, I'm still hopeful that sanity will prevail
and we can correct the fundamental failings in the latest WS drafts.
-- justin
KOMATSU Kensaku
2010-02-25 12:59:27 UTC
Permalink
Hi Hixie.

I'm a newcomer here, and I'm an (HTML + JS, today) developer building a
websocket pipelining demonstration (http://bloga.jp/koma/ws/pipelinetest.html :
sorry, it's in Japanese; my friend "Makoto Inoue" has also commented on
this demonstration on the Kaazing blog:
http://www.kaazing.com/blog/?p=310).

From your perspective, since you are so skilled, it may seem easy to
implement a websocket server and client, and you may expect amateur
programmers to be able to implement it too.

But from my perspective, it is hard for an amateur programmer to implement.

For example, Fujishima (the pywebsocket developer, as you know) is an
excellent developer. And another friend of mine (in Japan), who is trying
to develop a websocket server, is working very hard just to catch up with
the spec. My argument is that it's not as easy to implement as CGI.

So don't be so nervous about this topic, and please make an excellent
specification (without worrying so much about careless CGI programmers).

IMO, we want to build a new generation of web services with excellent websockets.
Post by Ian Hickson
Post by Dave Cridland
Post by Ian Hickson
The target audience of the Web Socket draft (which may or may not be
the same target audience as the draft produced by this working group
will be written for) is Web authors of the caliber of those who write
CGI scripts or HTML+JS pages today.
I don't see why.
Because the more people can use this, the better the Web can be. Web
Sockets should be usable by someone with a weekend of hacking, without
them having to get their shared host provider to install anything. This is
just like CGIs today.
Post by Dave Cridland
But handling the initial HTTP upgrade handshake will surely be done by
the people who write IIS and Apache, and I think we can safely assume
that these people are likely to be on this list, and reading the
specifications carefully.
I do not expect most deployments of WebSockets to involve an HTTP server
at all. There's really no need, except if you're in a situation where you
happen to only have one IP address and you want everything to work over
port 443 (since that's the most likely to work given firewalls). I think
it's worth handling that case, but I don't think it'll be the common case
in terms of number of deployments. It might be the common case in terms of
number of connections, though, since most deployments will see very few
users, just like most Web pages today see very few users; qv. the long tail.
Again, though, that's just what I personally want to work on. I'm not
saying that the working group has to target this segment. It may be that
the working group is not interested in this segment at all.
--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
_______________________________________________
hybi mailing list
https://www.ietf.org/mailman/listinfo/hybi
Scott Ferguson
2010-02-25 17:01:41 UTC
Permalink
Post by Ian Hickson
Post by Dave Cridland
Post by Ian Hickson
The target audience of the Web Socket draft (which may or may not be
the same target audience as the draft produced by this working group
will be written for) is Web authors of the caliber of those who write
CGI scripts or HTML+JS pages today.
I don't see why.
Because the more people can use this, the better the Web can be. Web
Sockets should be usable by someone with a weekend of hacking, without
them having to get their shared host provider to install anything. This is
just like CGIs today.
CGIs don't handle HTTP, they run behind a web server like Apache which
handles the network connections, HTTP parsing, threading, configuration,
etc.
Post by Ian Hickson
Post by Dave Cridland
But handling the initial HTTP upgrade handshake will surely be done by
the people who write IIS and Apache, and I think we can safely assume
that these people are likely to be on this list, and reading the
specifications carefully.
We are on the list and we're commenting, but we're being ignored :)
Post by Ian Hickson
I do not expect most deployments of WebSockets to involve an HTTP server
at all. There's really no need, except if you're in a situation where you
happen to only have one IP address and you want everything to work over
port 443 (since that's the most likely to work given firewalls). I think
it's worth handling that case, but I don't think it'll be the common case
in terms of number of deployments. It might be the common case in terms of
number of connections, though, since most deployments will see very few
users, just like most Web pages today see very few users; qv. the long tail.
That's crazy. App developers don't waste their time parsing low-level
bytes from the network or managing sockets. And if their ISP lets them
run arbitrary programs listening to arbitrary ports, they can certainly
download a server that does the same thing instead of writing their own.

I'm not certain you understand the complexity of a WebSocket
application. Not parsing. Parsing is trivial. Grammars are trivial.

But the threading and synchronization issues are not trivial at all. On
the server side, not only do you need the trivial parsing of the wire
bytes, but you need to organize the threads and APIs so the application
is safe, simple and maintainable. Along with resolving the higher-level
issues that Greg brought up like fragmenting and ordering (i.e. large
messages not freezing the system.) And the boring low-level details like
dealing with sockets that you need to get right. Someone writing their
own server needs to create those APIs, and resolve those issues, and
they're non-obvious APIs.

And yet you're assuming that someone who can solve those issues cannot
parse a simple regular grammar, or even read a grammar. It's crazy.

-- Scott
Post by Ian Hickson
Again, though, that's just what I personally want to work on. I'm not
saying that the working group has to target this segment. It may be that
the working group is not interested in this segment at all.
Justin Erenkrantz
2010-02-28 03:43:11 UTC
Permalink
That's crazy. App developers don't waste their time parsing low-level bytes
from the network or managing sockets. And if their ISP lets them run
arbitrary programs listening to arbitrary ports, they can certainly download
a server that does the same thing instead of writing their own.
I'm not certain you understand the complexity of a WebSocket application.
 Not parsing. Parsing is trivial. Grammars are trivial.
But the threading and synchronization issues are not trivial at all. On the
server side, not only do you need the trivial parsing of the wire bytes, but
you need to organize the threads and APIs so the application is safe, simple
and maintainable. Along with resolving the higher-level issues that Greg
brought up like fragmenting and ordering (i.e. large messages not freezing
the system.) And the boring low-level details like dealing with sockets that
you need to get right. Someone writing their own server needs to create
those APIs, and resolve those issues, and they're non-obvious APIs.
And yet you're assuming that someone who can solve those issues cannot parse
a simple regular grammar, or even read a grammar. It's crazy.
+1 to all of the above. -- justin
Greg Wilkins
2010-02-27 08:29:21 UTC
Permalink
Ian,

I think your argument that websocket should be easy to implement
by application developers was probably a lot more valid before
websocket was moved to share port 80 with HTTP.

Now that it is sharing a port with HTTP, I believe the
vast majority of users will be working with HTTP servers
that provide websocket support (just as they provide
CGI or servlet support).

There may be a few that will write stand-alone servers
that are unrelated to HTTP servers - but in this case,
the majority of them will almost certainly use a
websockets library written for their language of choice (eg
a websocket lib for perl can crack open 8888 directly
and listen as a stand-alone web socket server).

So to bring this back to the parallel conversations on
requirements: I think what you are saying is that the
websocket protocol should not require an HTTP server - and
I think we are all mostly in agreement with that, and I've
proposed the requirement that:

A web socket server MUST support only those parts of HTTP that are
necessary for a websocket connection to be securely established.


I think you also imply another requirement:

Keep it Simple Stupid!


I really hope that this does not actually need to be
explicitly stated.

We want a solution that allows bidirectional connections to
be established securely with no unnecessary complications.
If the resulting protocol is something that a dyslexic
semi-illiterate perl hacker can work with over the
weekend - then great! But I don't think we should compromise
any real security or interoperability requirements to
specifically target such developers.


cheers
Justin Erenkrantz
2010-02-28 03:40:36 UTC
Permalink
Post by Ian Hickson
I'd really like to be able to use methods other than GET, so that the
handshake could look something like:
  WS813 /path HTTP/1.1
  Sec-WebSocket-Key: 419287
  ...other headers...
  58137
I believe that having the method name dynamic in an attempt to provide
security is going to actually achieve the exact opposite: it will
expose the server to more security risks.

For example, one of the security mechanisms revolving around
protection in Apache httpd use the "LimitExcept" syntax for denying
access to particular regions of the server namespace (it whitelists
the methods that are allowed). If we now rely upon method names being
random, then those checks become fundamentally worthless. -- justin
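To make the LimitExcept point concrete, here is a hypothetical httpd.conf
fragment (the path and method list are invented for illustration); the
directive works by enumerating method names, which is exactly what a
randomized method defeats:

```apacheconf
# Hypothetical configuration: the listed methods are allowed,
# and every other method on /app is denied.
<Location "/app">
    <LimitExcept GET POST OPTIONS>
        Require all denied
    </LimitExcept>
</Location>
```

A randomized method such as WS813 can never appear in that whitelist, so
an administrator would have to either deny the handshake outright or drop
the method-based check altogether.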
Graham Klyne
2010-02-23 07:50:55 UTC
Permalink
Hmmm... it seems to me the key claim is "If the attacker can trick a
non-WebSocket server into echoing back chosen text (for example through
something in the URL part of the request), then they could make it give what
appears to be a valid WebSocket handshake response. This could result in
unauthorized access."

Yet, by my reckoning, this can only be a problem if the Javascript code is
trusted, which I don't think can ever completely be the case. That is, if the
server is vulnerable to exposing unauthorized access though this mechanism, it's
simply vulnerable. Period. All the information available to the Javascript
code could also be available to a hand-crafted HTTP client script that could
participate in whatever handshake one chooses to design.

I think this discussion is about trying to create security fences in the wrong
place.

To the extent that a browser is trusted to handle client credentials, it seems
to me that the sandboxing techniques discussed on HTML5 list are better targeted
to dealing with spoofing issues.

(I can't claim my analysis is perfect, and I'm well prepared to be corrected,
but I think that since a security issue has been raised it's appropriate to
question if it's being addressed in a sensible way.)

#g
--
Post by Ian Hickson
Post by Tim Bray
Post by Ian Hickson
Unfortunately, I'm finding it very difficult to come up with a design that
is compatible with HTTP, fits HTTP stacks, and is secure.
Sorry, I'm missing something. Could you outline the nature of the
attack that's worrying you in a little more detail?
I was referring to the cross-protocol attacks described by Maciej in an
http://www.ietf.org/mail-archive/web/hybi/current/msg01198.html
Maciej Stachowiak
2010-02-23 08:52:39 UTC
Permalink
Post by Graham Klyne
Hmmm... it seems to me the key claim is "If the attacker can trick a
non-WebSocket server into echoing back chosen text (for example through
something in the URL part of the request), then they could make it give what
appears to be a valid WebSocket handshake response. This could
result in
unauthorized access."
Yet, by my reckoning, this can only be a problem if the Javascript code is
trusted, which I don't think can ever completely be the case.
JavaScript code is definitely not trusted. That's kind of the point.
Though browsers will reliably tell you what Origin it came from, so
you can decide if you actually trust the person who served it.
Post by Graham Klyne
That is, if the server is vulnerable to exposing unauthorized access
though this mechanism, it's
simply vulnerable. Period. All the information available to the Javascript
code could also be available to a hand-crafted HTTP client script that could
participate in whatever handshake one chooses to design.
Hand-crafted scripts lack three important properties of an in-browser
cross-protocol attack:

1) They cannot act with the victim user's automatically added
credentials (Cookies, HTTP auth, client-side certs, etc).
2) They cannot use the browser's potentially privileged network
position to access resources behind firewalls.
3) They do not provide an automatic platform for distributed attacks
(which can take the form of DDOS, brute force to break in, or simply
covering the attacker's tracks).

That's why browsers have the same-origin policy even though of course
a server could receive any content at all from a handcrafted script
run by the attacker.
Post by Graham Klyne
I think this discussion is about trying to create security fences in the wrong
place.
To the extent that a browser is trusted to handle client
credentials, it seems
to me that the sandboxing techniques discussed on HTML5 list are better targeted
to dealing with spoofing issues.
I'm pretty familiar with HTML5 sandboxed iframes (WebKit has an early
experimental implementation) and I do not see how they are applicable
to solving this problem.
Post by Graham Klyne
(I can't claim my analysis is perfect, and I'm well prepared to be
corrected, but I think that since a security issue has been raised
it's appropriate to question if it's being addressed in a sensible
way.)
Perhaps we are not explaining the threat models clearly enough. I'm
not sure what further explanation would help though.

Regards,
Maciej
Graham Klyne
2010-02-23 10:11:22 UTC
Permalink
Post by Maciej Stachowiak
Hand-crafted scripts lack three important properties of an in-browser
1) They cannot act with the victim user's automatically added
credentials (Cookies, HTTP auth, client-side certs, etc).
2) They cannot use the browser's potentially privileged network position
to access resources behind firewalls.
3) They do not provide an automatic platform for distributed attacks
(which can take the form of DDOS, brute force to break in, or simply
covering the attacker's tracks).
That's why browsers have the same-origin policy even though of course a
server could receive any content at all from a handcrafted script run by
the attacker.
[...]
Post by Maciej Stachowiak
Perhaps we are not explaining the threat models clearly enough. I'm not
sure what further explanation would help though.
Your 3 points above help some, but I'm still not clear:

(1) This seems to be the key concern, and I thought (maybe incorrectly) that
this was the kind of threat that sandboxing was intended to combat (this based
on reading the MSR paper Ian cited a little while ago on the HTML5 list - maybe
current sandbox implementations don't fit within the model presented there).

(2) In what sense does a browser have a "potentially privileged network
position" that is not available to a hand-crafted script?

(3) It seems to me that this becomes a problem only when browsers are either
(a) compromised, in which case all bets are off, or
(b) running permanently and uninterrupted and able to receive or poll for
incoming instructions, which I can see is a possibility. It makes me wonder if
the trend towards browser-as-operating system is wise, as, in consideration of
Ross Anderson's comments about security as "programming Satan's computer", I'm
pretty sure there are an awful lot of devilish details to combat. Considering
how hard it has proven to adequately secure operating system platforms that
have been conceived from the ground up to provide some level of security, I have
doubts about how effectively this can be retrofitted in interoperable fashion to
the range of browsers out there.

But I suppose this is drifting rather off-topic.

#g
John Fallows
2010-02-23 07:40:03 UTC
Permalink
Ian,

Using the Upgrade header with 101 status code seems a much cleaner usage of
the HTTP specification than attempting to re-purpose HTTP CONNECT for the
WebSocket handshake.

Relaxation of the strict ordering constraint on HTTP headers in the existing
WebSocket handshake is also preferable, especially since using a nonce would
seem to overcome any lingering security concerns regarding injection
attacks, as you describe.

-1 HTTP CONNECT
+1 Sec-WebSocket-Unique-Id (possibly renamed)

Kind Regards,
John Fallows
Post by Ian Hickson
I've been going through the e-mail from the past few months trying to
figure out a way to take into account all the feedback. However, I've run
into a problem, and would like some feedback from server-side developers
to determine how to proceed.
* Currently WebSockets has a weak HTTP-to-WebSocket cross-protocol attack
protection in the form of requiring that the first few bytes of the
connection match a particular set of bytes. This is weak, because the
server can just ignore it (and thus be vulnerable); it would be better to
have something that the server has to do to prove it read the handshake
and to prove that the handshake received was not something that could
be faked by a client speaking another protocol (e.g. a client doing an XHR
request or a scripted <form> post).
* It has been suggested that the limitations on the order of headers
causes serious problems with existing HTTP stacks.
* It has also been suggested that using GET with an Upgrade is harder to
implement on existing HTTP stacks than CONNECT would be.
Given this, I've been examining what we could do by, for instance, adding
some unpredictable values to the headers that the server must process in
order to prove that the handshake was read, as well as considering what
the results would be of using CONNECT instead of Upgrade:, and of moving
the bulk of the handshake into the post-HTTP part of the connection rather
than in the HTTP headers themselves.
Unfortunately, I'm finding it very difficult to come up with a design that
is compatible with HTTP, fits HTTP stacks, and is secure.
CONNECT host:port HTTP/1.1 [CRLF]
Sec-WebSocket-Unique-Id: 123456789 [CRLF]
[CRLF]
etc...
...where other headers are ignored (allowing intermediaries to fiddle with
the headers as they might), then amateur server-side implementors are very
likely to just scan for the string "Sec-WebSocket-Unique-Id", making them
vulnerable to forged requests such as:
GET /Sec-WebSocket-Unique-Id:123456789 HTTP/1.1 [CRLF]
etc...
...or some such -- there are plenty of ways to insert data into the
headers, even if you can't actually insert the header itself.
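The shortcut described above can be sketched in a few lines (an
illustrative toy, not real server code; the requests are made up):

```python
def naive_handshake_check(raw_request: bytes) -> bool:
    # A careless server "validates" the handshake by scanning anywhere
    # in the request for the header name -- the shortcut warned about.
    return b"Sec-WebSocket-Unique-Id" in raw_request

# A genuine CONNECT-style handshake:
genuine = (b"CONNECT example.com:80 HTTP/1.1\r\n"
           b"Sec-WebSocket-Unique-Id: 123456789\r\n\r\n")

# An attacker-controlled XHR can't set that header, but it CAN put the
# same bytes into the request path:
forged = b"GET /Sec-WebSocket-Unique-Id:123456789 HTTP/1.1\r\n\r\n"

print(naive_handshake_check(genuine))  # True
print(naive_handshake_check(forged))   # True -- the check is defeated
```

Both requests pass, which is why ignoring the rest of the headers makes
the substring scan worthless as proof that the handshake was read.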
If we can't find a solution for this, then too bad, but I'd really like to
find a solution that is safe. The requirement is basically that the
unpredictable part come before the first bit of an HTTP connection that
an attacker could control, e.g.:
WS123456789 host:port HTTP/1.1 [CRLF]
[CRLF]
etc...
Here, the method actually contains the unpredictable key. The idea here is
that the key starts before the first character of a GET's path, so there's
nothing the attacker can do to insert the key in the relevant part of the
payload. We can then make the processing that the server must do with this
be something that makes no sense with text and cannot be performed with a
zero (e.g. convert it to a number and then perform a numeric operation on
the number, like divide another number by this number), which would ensure
that the server really checks that it got the handshake before continuing.
This might work, but it leads to the aforementioned question: Can existing
server-side HTTP stacks handle this like a CONNECT easily, or is this line
of thought a non-starter? Any feedback on this would be most welcome.
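A minimal sketch of the server-side processing this proposal asks for
(the divisor constant and helper name are my own choices, purely for
illustration): the key lives in the method token, before any byte an
attacker can control, and the server must perform a real numeric
operation on it rather than echoing text.

```python
def parse_ws_method(request_line: str) -> int:
    """Extract the unpredictable key from a 'WS<digits>' method token."""
    method, _, _ = request_line.split(" ", 2)
    if not method.startswith("WS") or not method[2:].isdigit():
        raise ValueError("not a WebSocket handshake")
    key = int(method[2:])
    # Force a numeric operation that makes no sense on text and fails on
    # zero, so a server that never actually parsed the key cannot
    # accidentally produce the right answer.
    return 1_000_000_000 // key

print(parse_ws_method("WS123456789 host:port HTTP/1.1"))  # 8
```

A server that merely pattern-matched the request line as a string would
have no way to produce the quotient, which is the point of the scheme.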
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
--
|< Kaazing Corporation >|<
John Fallows | CTO | +1.650.960.8148
888 Villa St, Ste 410 | Mountain View, CA 94041, USA
Vladimir Katardjiev
2010-02-23 07:52:50 UTC
Permalink
Heya,

Perhaps a different angle would be of use in this consideration. If I understand correctly, the "cross-protocol attacks" in question refer to (ab)using HTTP request objects (XHR, CORS, et al) to falsely upgrade to WebSockets. So how about the WebSocket handshake violates the nature of those objects instead, since they're designed for a single request/response pair?

A four-part handshake (HTTP Request/HTTP Response/WebSocket Request/WebSocket Response) where every step of the way needs to read data from the previous step (i.e. WS Resp needs data from WS Req, which needs data from HTTP Resp, which needs data from HTTP Req) should violate the request/response model in a way that makes sense for a correctly established Upgrade (so no HTTP violation) but there shouldn't be a way (from within a browser/HTTP environment) to send an appropriately framed WebSocket request based on the HTTP response.

This way there wouldn't be any need to perform any manipulation on the HTTP part of the handshake itself, and the WebSocket part can be designed to fit any requirements. Of course, the downside is another round-trip, plus added complexity. (This latter point is somewhat negated though given that to reliably pass proxies I had to add TLS on the server anyway. In comparison, a second round of handshake looks like a walk in the park)
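The chained four-part handshake could be modelled roughly as follows (an
illustrative sketch, not from any draft; the use of SHA-256 and the step
labels are my own assumptions). Each step's token is derived from every
prior message, so no step can be precomputed before the previous one is
read:

```python
import hashlib
import os

def digest(*parts: bytes) -> bytes:
    # Derive a step token from everything seen so far.
    return hashlib.sha256(b"|".join(parts)).hexdigest().encode()

# Step 1: HTTP request carries a fresh client nonce.
http_req = os.urandom(16).hex().encode()
# Step 2: HTTP response must depend on the request.
http_resp = digest(b"resp", http_req)
# Step 3: first WebSocket-framed message depends on both HTTP messages.
ws_req = digest(b"ws-req", http_req, http_resp)
# Step 4: server's WebSocket reply depends on all three prior steps.
ws_resp = digest(b"ws-resp", http_req, http_resp, ws_req)

# The server verifies step 3 before completing the upgrade:
assert ws_req == digest(b"ws-req", http_req, http_resp)
```

An XHR-style attacker must emit its whole request before reading any
response, so it can never produce a valid step-3 message.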

Vladimir

-----Original Message-----
From: hybi-***@ietf.org [mailto:hybi-***@ietf.org] On Behalf Of Ian Hickson
Sent: den 23 februari 2010 02:26
To: ***@ietf.org
Subject: [hybi] Reducing the constraints on the HTTP part of WebSockets

[...]

Given this, I've been examining what we could do by, for instance, adding some unpredictable values to the headers that the server must process in order to prove that the handshake was read, as well as considering what the results would be of using CONNECT instead of Upgrade:, and of moving the bulk of the handshake into the post-HTTP part of the connection rather than in the HTTP headers themselves.

Unfortunately, I'm finding it very difficult to come up with a design that is compatible with HTTP, fits HTTP stacks, and is secure.

[...]
Maciej Stachowiak
2010-02-23 08:00:40 UTC
Permalink
Post by Vladimir Katardjiev
Heya,
Perhaps a different angle would be of use in this consideration. If
I understand correctly, the "cross-protocol attacks" in question
refer to (ab)using HTTP request objects (XHR, CORS, et al) to
falsely upgrade to WebSockets. So how about the WebSocket handshake
violates the nature of those objects instead (they're designed for a
single request/response pair)
A four-part handshake (HTTP Request/HTTP Response/WebSocket Request/
WebSocket Response) where every step of the way needs to read data
from the previous step (i.e. WS Resp needs data from WS Req, which
needs data from HTTP Resp, which needs data from HTTP Req) should
violate the request/response model in a way that makes sense for a
correctly established Upgrade (so no HTTP violation) but there
shouldn't be a way (from within a browser/HTTP environment) to send
an appropriately framed WebSocket request based on the HTTP response.
I think the key here is "every step of the way needs to read data from
the previous step" (presumably with the assumption that none of this
data is predictable up-front to client-side JS, which is the potential
attacker in these scenarios). Given that, I am not sure a four-way
handshake buys you much over two-way.

If everything in the HTTP Request and HTTP Response could be predicted
by client-side JS for instance, then you are hosed because it can
precompute the WebSocketRequest. But if those two messages are not
predictable, then I am not sure the extra round trip buys you much.

Note also that it's easier for the client-side JS attacker to control
the part that would appear to be an HTTP body in a cross-protocol
attack than the part that would appear to be the HTTP request header.

Regards,
Maciej
Vladimir Katardjiev
2010-02-23 08:37:35 UTC
Permalink
I think the key here is "every step of the way needs to read data from the previous step" (presumably with the assumption that none of this data is predictable up-front to client-side JS, which is the potential attacker in these scenarios). Given that, I am not sure a four-way handshake buys you much over two-way.
If everything in the HTTP Request and HTTP Response could be predicted by client-side JS for instance, then you are hosed because it can precompute the WebSocketRequest. But if those two messages are not predictable, then I am not sure the extra round trip buys you much.
Well, maybe it'll be clearer if we separate the attack vectors. There are basically two scenarios: XHR to WebSocket server, and WebSocket to arbitrary server.

In a scenario where the client attacker uses XHR/CORS/... to make a request, the entire request can be considered compromised, so breaking the connection is the server's responsibility. Here, the server sending back an unpredictable reply is pointless, because the client doesn't need to read it (it's compromised, after all: XHR won't do WebSocket nonce checks, and why would the attacker?). By adding a client response to a server challenge, the WebSocket server can protect itself against these attacks, because it too can send a nonce-value it expects a reply to. In this case, the second round-trip makes such an action possible (otherwise the server has no data it can actually predict and check).

In a scenario where the client attacker uses WebSocket to connect to, e.g., a mail server, sending what amounts to a request body is (or should be) impossible before the browser has relinquished control of the object to the JS code. In this case, sending an additional request/response pair is outside the attacker's control and would likely break the connection, as the WebSocket message is malformed for virtually all intents and purposes. It would also establish that any and all intermediaries are capable of transferring WebSocket frames (or break if they aren't).

Vladimir
Maciej Stachowiak
2010-02-23 08:55:59 UTC
Permalink
Post by Vladimir Katardjiev
Post by Maciej Stachowiak
I think the key here is "every step of the way needs to read data
from the previous step" (presumably with the assumption that none
of this data is predictable up-front to client-side JS, which is
the potential attacker in these scenarios). Given that, I am not
sure a four-way handshake buys you much over two-way.
If everything in the HTTP Request and HTTP Response could be
predicted by client-side JS for instance, then you are hosed
because it can precompute the WebSocketRequest. But if those two
messages are not predictable, then I am not sure the extra round
trip buys you much.
Well, maybe it'll be clearer if we separate the attack vectors.
There are basically two scenarios: XHR to WebSocket server, and
WebSocket to arbitrary server.
In a scenario where the client attacker uses XHR/CORS/... to make a
request. In this case, the entire request can be considered
compromised, so the responsibility of breaking the connection is the
server's responsibility.
Not exactly the entire request, because there is a limit on what kinds
of messages can be sent with cross-site XHR+CORS or with XDR. We can
rely on using something that can't be added to a request by client-
side JS.
Post by Vladimir Katardjiev
In this case, the server sending back an unpredictable reply is
pointless, because the client doesn't need to read it (it's
compromised, after all: XHR won't do WebSocket nonce checks, and why
would the attacker?). By adding a client response to a server
challenge, the WebSocket server can protect itself against these
attacks, because it too can send a nonce-value it expects a reply
to. In this case, the second round-trip makes such an action
possible (otherwise the server has no data it can actually predict
and check).
Indeed, but in the XHR case the server can simply look for something
that could not be sent via cross-site XHR (such as various blacklisted
header fields).
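Maciej's alternative can be sketched as a simple header check (hypothetical helper name): rather than an extra round trip, the server requires headers that browsers forbid script from attaching to a cross-site XHR or form post, such as Upgrade and Connection.

```python
def looks_like_websocket_handshake(headers: dict) -> bool:
    # Browsers refuse to let page script set "Upgrade" or "Connection"
    # on an XHR request, so their presence distinguishes a browser-driven
    # WebSocket handshake from a scripted cross-site request.
    h = {k.lower(): v.lower() for k, v in headers.items()}
    return h.get("upgrade") == "websocket" and "upgrade" in h.get("connection", "")

assert looks_like_websocket_handshake(
    {"Upgrade": "WebSocket", "Connection": "Upgrade"})
assert not looks_like_websocket_handshake(
    {"Content-Type": "text/plain"})  # what a scripted XHR or <form> post could send
```

This protects the server against XHR-style attacks without any additional round trip, though it does nothing for the WebSocket-to-other-protocol direction.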
Post by Vladimir Katardjiev
In a scenario where the client attacker uses WebSocket to connect
to, e.g. a mail server, sending what amounts to a request body is
(or should be) impossible before the browser has relinquished
control of the object to the JS code. In this case, sending an
additional request/response pair is outside the attacker's control
and would likely break the connection, as the WebSocket message is
malformed for virtually all intents and purposes. It also would
establish that any and all intermediaries are capable of
transferring WebSocket frames (or break if they aren't)
Really the key in this case is that the server needs to echo back
something that can't easily be predicted or trivially constructed from
the pieces of the client handshake. If you have that I don't think
extra round trips help much.
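One way to make the echo non-trivial to construct is for the server to return a one-way transform of the client's nonce rather than the nonce itself; a sketch (the constant below is illustrative, not any specified value):

```python
import base64
import hashlib

# Illustrative protocol constant; a real design would fix one in the spec.
MAGIC = "example-websocket-magic"

def derived_echo(client_nonce: str) -> str:
    # The server echoes SHA-256(nonce + MAGIC) rather than the raw nonce,
    # so a server that merely reflects request bytes (or an attacker who
    # controls an unrelated server's output) cannot produce it by accident.
    digest = hashlib.sha256((client_nonce + MAGIC).encode()).digest()
    return base64.b64encode(digest).decode()

assert derived_echo("abc") != "abc"
assert derived_echo("abc") == derived_echo("abc")  # deterministic, so the client can verify it
```

The client recomputes the same value and aborts on mismatch; since the transform is deterministic, no extra round trip is needed.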

Regards,
Maciej
Dave Cridland
2010-02-23 14:04:39 UTC
Permalink
Post by Ian Hickson
I've been going through the e-mail from the past few months trying to
figure out a way to take into account all the feedback. However, I've run
into a problem, and would like some feedback from server-side
developers
to determine how to proceed.
Even after reviewing the responses on the list, I'm not (yet)
entirely clear on the form of the attack, but assuming that a rogue
bit of scripting could in principle form any valid HTTP request, and
follow it with arbitrary data, then we have an amusing conundrum.

What you're suggesting is that in order to defeat this, the HTTP
request needs to have some feature unavailable to HTTP, and yet the
HTTP request/response has to remain clean, standard, HTTP in order to
work with the existing stacks. I understand, finally, the reasoning
behind the fixed-form that the negotiation takes in the current
draft, but I agree with your implicit suggestion that this isn't a
good solution.

What about a three-way handshake, with client-speaks-first on the
WebSocket itself? Something akin to:

** HTTP **
Client: "Give me a websocket session to foo." [HTTP request, probably
GET with Upgrade]
Server: "Okay. Your magic token is 'bar'" [HTTP response, followed by
switch to WS]
** WS **
Client: "My magic token was 'bar'". [WS special message type]
[... server validates WS session and the subprotocol begins ...]
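The three-way exchange above can be simulated in a few lines (a toy model, not a real network stack; the token header and message names are made up):

```python
import secrets

def server_http_response():
    # Step 2: server answers the upgrade request with a fresh magic token.
    token = secrets.token_hex(8)
    response = ("HTTP/1.1 101 Switching Protocols\r\n"
                f"X-Magic-Token: {token}\r\n\r\n")
    return token, response

def server_validate_first_ws_message(expected_token: str, ws_message: str) -> bool:
    # Step 3: the first WebSocket message must echo the token; an attacker
    # who could not read the HTTP response cannot forge it.
    return ws_message == f"TOKEN {expected_token}"

token, response = server_http_response()
assert "101" in response
assert server_validate_first_ws_message(token, f"TOKEN {token}")
assert not server_validate_first_ws_message(token, "TOKEN deadbeef")
```

Because the token only travels server-to-client after the upgrade is accepted, a client that pipelines optimistically pays at most the half-RTT Dave mentions.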

As a throwaway suggestion, it might even be useful to have, as part
of this "session startup message", a SASL negotiation.

With or without the introduction of SASL, this would add a
one-half-RTT to the WS setup, which may or may not impact the
subprotocol itself (since clients can assume success and pipeline
through).

With SASL - which could be ANONYMOUS of course - the round-trip count would increase,
but it allows techniques like channel binding to be deployed at the
best place. I'm not expecting much support for the notion of building
SASL into it, though.

Dave.
--
Dave Cridland - mailto:***@cridland.net - xmpp:***@dave.cridland.net
- acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
- http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
Willy Tarreau
2010-02-23 12:11:31 UTC
Permalink
Hello Greg,

I hope you don't mind me joining the discussion.
Post by Greg Wilkins
Post by Ian Hickson
* It has also been suggested that using GET with an Upgrade is harder to
implement on existing HTTP stacks than CONNECT would be.
Where did this suggestion come from?
CONNECT is a method that is intended to be used for opening proxied
connections. Servers will either not support it or have it wired up
to some proxy code. Subverting this for a protocol upgrade is
highly undesirable.
The GET with upgrade header is very much the appropriate mechanism to
use. Servers will either already support Upgrade - in which case ws
should be easy, or they will not - in which case they can simply
implement the mechanism as specified by RFC2616.
I was one of those suggesting to Ian to use CONNECT as a complement
to the existing method. While this can look strange at first, it is
not that stupid because of existing products: a wide variety of
HTTP gateways (reverse-proxies, load balancers, ...) are compatible
with proxies. Typically, most if not all HTTP load balancers can be
installed in front of HTTP proxies. As such, they already *do* support
the CONNECT method without doing anything specific. Some commercial
products such as Alteon, as well as some open source software such as
Pound and Haproxy come to mind.

However, most of these products will not handle the specific case of
the 101 status code. The reason is that 1) nobody uses it right now,
and 2) the way it's defined in RFC2616 lets the reader think that if
they don't explicitly need to implement it, they can process it as a
100 response:

RFC2616 page 40 #6.1 :
"HTTP applications are not required
to understand the meaning of all registered status codes, though such
understanding is obviously desirable. However, applications MUST
understand the class of any status code, as indicated by the first
digit, and treat any unrecognized response as being equivalent to the
x00 status code of that class"

But the 100 is just an informational response which can safely be
ignored, leaving the gateway waiting for the next non-1xx response.
Until very recently (discussions with Ian in fact), haproxy did not
support 101 and considered it as 100. Pound does not support 101
either. And to the best of my knowledge, Alteon does not support it
either. Some products will simply consider that the 101 precedes
another response (=100), others (products grown from HTTP/1.0) will
take it as any definitive status code and will expect a new HTTP
request after the first GET.
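The RFC2616 fallback rule Willy quotes is easy to state in code, and shows exactly why an unaware gateway degrades 101 to 100 (the set of "known" codes here is just an example of what a given product might recognize):

```python
def effective_status(code: int, known=frozenset({100, 200, 204, 301, 302, 304, 400, 404, 500})) -> int:
    # RFC2616 #6.1: treat any unrecognized response as the x00 status code
    # of its class, as indicated by the first digit.
    return code if code in known else (code // 100) * 100

assert effective_status(101) == 100  # gateway ignores it and keeps waiting for a "real" response
assert effective_status(404) == 404  # recognized codes pass through unchanged
assert effective_status(418) == 400  # unknown 4xx degrades to 400
```

So a spec-conforming but 101-unaware intermediary ends up parked waiting for the next non-1xx response, exactly the failure mode described above.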

Concerning possible issues with the CONNECT method when handled by
an origin server, it was already suggested in RFC2817 that an origin
server might handle it by itself :

RFC2817 page 6 #5.3 :
"An origin server which receives a CONNECT request for itself MAY
respond with a 2xx status code to indicate that a connection is
established."

Also, some servers support being used as proxies (e.g. Apache),
so making them support the CONNECT method would probably not be a
big deal.

At first when I discussed with Ian, I really was in favor of using
CONNECT *instead of* the 101+upgrade scheme. Now after some discussion
with him as well as some deeper reading of various docs, I think it
would be very nice to mimmic what was done for TLSv1 (RFC2817) and
suggest that both methods could be supported. Some it might be easier
for some servers to implement the CONNECT method, while for other
ones the 101 might be better. However, when having to deal with large
infrastructures, it is likely that the CONNECT will be able to pass
through multiple equipment without being degraded, while the GET+101
may be stopped by some IPS, firewalls, load balancers, accelerators
or web application firewalls.

I'm not saying that the 101 is not in the spec, it's just that the
spec is a bit fuzzy on how it may be handled if unused and has made
it possible for many devices not to specifically implement it due
to lack of users.

I don't know what your opinion is on this.

Best regards,
Willy
