I have a bit of a weird problem with a Node.js application deployed inside a Docker container on AWS. While load testing it with Gatling (a load testing suite), it stops servicing requests after a short period at high load and looks like it's 'freezing'. This doesn't happen when testing locally.

So far I've tried:

- Looking for all blocking code and trying to develop my way out of the problem by re-engineering things. I found a load of things to improve, but no 'aha' moments.
- Using V8 tick logs to identify how much time the CPU spends doing various things. No smoking guns found.
- Using heap dumps to look for memory leaks and check object usage. Nothing obvious there.
- Looking at garbage collection logs and output for long garbage collection pauses. No luck.

These pauses look like they last 2000-3000 milliseconds, by which time our server healthchecks decide that the underlying application is not working and remove it from the load balancer. The healthchecks just request a /healthcheck route on the Express application and ensure there is a response.

So - without knowing 'what' is causing the problem, are there any other ways to at least identify why the Node process might be in an idle/paused state and not doing anything?

It's weird that it can take massive amounts of load in a local environment but, for non-obvious reasons, behaves differently in a deployed environment and gives no indication why. Obviously there are RAM/CPU/resource differences, but I'd like to at least have Node.js tell me why it's decided to give up the ghost temporarily.
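
For reference, one way to at least confirm whether the event loop itself is stalling during those pauses is to measure timer drift: schedule a repeating timer and log how late each tick fires. The following is only a minimal sketch, not something from the application itself. The function names (hrtimeToMs, computeLagMs, startLagMonitor) and the 100 ms interval and 200 ms threshold are assumptions chosen for illustration; only process.hrtime, setInterval, clearInterval and unref are real Node.js APIs.

```javascript
// Sketch of an event-loop lag monitor based on timer drift.
// All names and default values here are illustrative assumptions.

// Convert a process.hrtime() tuple [seconds, nanoseconds] to milliseconds.
function hrtimeToMs(tuple) {
  return tuple[0] * 1e3 + tuple[1] / 1e6;
}

// Lag is how much later than expected the timer fired (never negative).
function computeLagMs(elapsedMs, intervalMs) {
  return Math.max(0, elapsedMs - intervalMs);
}

// Calls onLag(lagMs) whenever a tick fires more than thresholdMs late.
// Returns a function that stops the monitor.
function startLagMonitor(onLag, intervalMs = 100, thresholdMs = 200) {
  let last = process.hrtime();
  const timer = setInterval(() => {
    // Time elapsed since the previous tick, measured with a monotonic clock.
    const elapsedMs = hrtimeToMs(process.hrtime(last));
    const lagMs = computeLagMs(elapsedMs, intervalMs);
    if (lagMs > thresholdMs) {
      onLag(lagMs);
    }
    last = process.hrtime();
  }, intervalMs);
  // Do not let the monitor keep the process alive on its own.
  timer.unref();
  return () => clearInterval(timer);
}
```

Starting it at boot with something like startLagMonitor((lag) => console.warn('event loop lag', lag)) would log a line each time a tick is late by more than the threshold. If the 2000-3000 ms pauses show up as lag, something is blocking or starving the event loop. If they don't, the stall is more likely outside the process, for example at the container, network or load balancer level. The sketch only detects a stall; it does not say what caused it.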
Submitted September 28, 2017 at 09:33AM by voxcast