Monday 27 March 2017

Scaling Socket.io to multiple Heroku dynos -- how?

I'm trying to scale Socket.IO to multiple dynos on Heroku. I have determined that I need to use a sticky-session module (for multiple processes), enable session affinity on Heroku, and use the socket.io-redis adapter plus the socket.io-emitter module so that events can be emitted to clients connected to another dyno.

I was pretty confident about sticking to this approach until I read this: http://ift.tt/2nZlilL. The article says that when using Socket.IO, all of the following info is saved:

- Handshake data, which includes ALL request headers, query strings, IP address information, URLs, and any custom data you've added during authorization.
- IDs of all connections that are open, connected, and even closed.
- Room names and each ID that has joined the room.

Apparently, none of this is saved only in Redis: "All this data will be synced through pub/sub to every connected Socket.IO server. So if you have 2 node processes and they both use Socket.IO stores, they will both have all the data of all connections in their own process memory."

This causes the following problem: "...it can actually be used as an attack vector against Socket.IO servers which are connected using the Socket.IO stores. The attacker only needs to create a script that initiates a handshake request with your server which connects with the longest query string allowed and with as many custom HTTP headers as possible, and you could easily blow up a server in a matter of seconds, because all this data is serialized using JSON.stringify, which will start blocking the event loop for all the serialization. When you receive such a large packet it also needs to be parsed again by JSON.parse, and eventually it will be stored in the V8 heap of your node's process. This will eventually cause it to explode out of memory due to the V8 heap limitation of 1.7 GB, and your whole cluster will be FUBAR."

Does anyone know if these are legitimate problems with using Socket.IO on multiple dynos? If so, have they been fixed in a subsequent update?

One solution I am considering is to implement similar functionality myself. Every dyno of mine would be subscribed to Redis using a unique channel ID. Redis would also store a mapping of client ID to dyno ID, so I can figure out which dyno a client is connected to. If a dyno wants to send a message to a client connected to another dyno, it would look up which dyno that client is connected to (using Redis), then publish the event on the corresponding Redis channel. This way, each socket's connection info/overhead is saved on just one dyno, with the whole thing tied together by Redis pub/sub. (Rough sketches of both approaches follow below.)

Any advice would be greatly appreciated!
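For concreteness, here is a minimal sketch of the standard setup described at the top, assuming Socket.IO 1.x, the socket.io-redis adapter, socket.io-emitter, and a REDIS_URL config var (as set by Heroku Redis); treat it as illustrative rather than definitive:

    // server.js -- runs on every web dyno (session affinity enabled via
    // `heroku features:enable http-session-affinity`)
    var server = require('http').createServer();
    var io = require('socket.io')(server);
    var redisAdapter = require('socket.io-redis');

    // The adapter relays broadcasts between dynos over Redis pub/sub.
    io.adapter(redisAdapter(process.env.REDIS_URL));

    io.on('connection', function (socket) {
      socket.on('chat message', function (msg) {
        io.emit('chat message', msg); // reaches clients on all dynos
      });
    });

    server.listen(process.env.PORT || 3000);

    // worker.js -- a process with no socket server can still emit events
    var redisClient = require('redis').createClient(process.env.REDIS_URL);
    var emitter = require('socket.io-emitter')(redisClient);
    emitter.emit('announcement', 'deploy finished');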
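And here is a rough sketch of the do-it-yourself routing idea from the last paragraph. The DYNO env var is set by Heroku (e.g. "web.1"); the "clients" hash and the "dyno-channel:" prefix are names invented here for illustration:

    // A per-dyno channel plus a client-to-dyno map in Redis.
    var redis = require('redis');
    var pub = redis.createClient(process.env.REDIS_URL);
    var sub = redis.createClient(process.env.REDIS_URL); // subscriber needs its own connection
    var dynoId = process.env.DYNO; // e.g. "web.1"

    var server = require('http').createServer();
    var io = require('socket.io')(server);
    server.listen(process.env.PORT || 3000);

    // Each dyno listens only on its own channel, so per-connection state
    // never has to be replicated to the other dynos.
    sub.subscribe('dyno-channel:' + dynoId);
    sub.on('message', function (channel, payload) {
      var msg = JSON.parse(payload);
      var socket = io.sockets.connected[msg.clientId];
      if (socket) socket.emit(msg.event, msg.data);
    });

    io.on('connection', function (socket) {
      pub.hset('clients', socket.id, dynoId); // record where this client lives
      socket.on('disconnect', function () {
        pub.hdel('clients', socket.id);
      });
    });

    // Emit to a client that may be connected to any dyno.
    function sendToClient(clientId, event, data) {
      pub.hget('clients', clientId, function (err, targetDyno) {
        if (err || !targetDyno) return;
        pub.publish('dyno-channel:' + targetDyno,
                    JSON.stringify({ clientId: clientId, event: event, data: data }));
      });
    }

Note that this only covers targeted cross-dyno emits; broadcasts and rooms would still need their own bookkeeping.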

Submitted March 27, 2017 at 09:15AM by STOP_SCREAMING_AT_ME
