Tuesday, 26 May 2020

CPU-bound or I/O-bound approach with clustering in a high-stress environment?

Hi! I'm building a web app and we are facing major performance issues: high CPU load and stress. The app uses Express, Handlebars, Passport, Sequelize with MariaDB, node-redis with Redis, and some other modules. We have an i9-9900K (8 cores/16 threads), and we run pm2 in cluster mode with 16 instances, one per thread. That should be a powerful machine, but some bottleneck I can't find drives the CPU to 80%-90%, or even 100%, when 1,000 or 2,000 requests arrive within about 50 seconds. Profiling hasn't helped me, neither has StackOverflow, and I don't know how to handle it anymore; I've been trying to fix this CPU load for over a week. Some people on StackOverflow agree with me that the app does a lot of data processing, but that the processing is really simple and shouldn't stress the CPU this much.

Throughout the app I chose this approach:

Get data from Redis (or query the database if it isn't cached) -> JSON.parse -> filter the JSON array with Array.filter and other common array operations to get the specific data.

We need very specific data, and we need it often: at least 10 of these operations per request, on small to medium arrays of objects (from 6 entries up to 30k entries), ranging from Array.filter to Array.sort, Array.some and Array.find.

This happens LOTS of times, and sometimes we have to handle 3,000 concurrent requests. The result is high CPU load and latency, even though I managed to reduce the event loop latency a bit by inspecting it with the blocked-at package. Profiling also suggests that a lot of processing time is spent on garbage collection, which looks like a memory leak to me.

People always say that Node is I/O-bound rather than CPU-bound because of the event loop. So I thought this may not be the best approach for this kind of data, and that it would be better to avoid lots of array operations and just run SQL queries, which are I/O-bound rather than CPU-bound. Is that right?
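A minimal sketch of the pipeline described above, to make the per-request work concrete. It makes some assumptions. A `Map` stands in for Redis and holds JSON strings the way a GET/SET pair would. `loadFromDatabase` is a hypothetical stand-in for the Sequelize query. The `Item` shape and the particular filters are invented for illustration, since the post doesn't show its data or code.

```typescript
// Hypothetical record shape; the post does not describe its data model.
interface Item {
  id: number;
  category: string;
  price: number;
  active: boolean;
}

// Assumption: a Map stands in for Redis, holding JSON strings the way a
// GET/SET pair would.
const fakeRedis = new Map<string, string>();

// Hypothetical stand-in for the Sequelize/MariaDB query run on a cache miss.
function loadFromDatabase(): Item[] {
  const items: Item[] = [];
  for (let i = 0; i < 30000; i++) {
    items.push({ id: i, category: `c${i % 50}`, price: i % 1000, active: i % 3 !== 0 });
  }
  return items;
}

// Cache-aside read: return the cached JSON string, or load and cache it.
function getCachedJson(key: string): string {
  let json = fakeRedis.get(key);
  if (json === undefined) {
    json = JSON.stringify(loadFromDatabase());
    fakeRedis.set(key, json);
  }
  return json;
}

// One request in the style the post describes. Every call re-parses the
// full array (allocating 30k new objects) and then scans it several times.
function handleRequest(category: string, id: number) {
  const items: Item[] = JSON.parse(getCachedJson("items"));
  const inCategory = items.filter((it) => it.category === category && it.active); // O(n)
  const cheapest = [...inCategory].sort((a, b) => a.price - b.price).slice(0, 10); // O(k log k)
  const hasExpensive = items.some((it) => it.price > 990); // O(n) worst case
  const target = items.find((it) => it.id === id); // O(n) worst case
  return { count: inCategory.length, cheapest, hasExpensive, target };
}
```

For scale, take the post's own upper bounds in the worst case, where every operation runs over the largest array. Ten linear passes over 30,000 entries is 300,000 element visits per request. At 3,000 concurrent requests that comes to 900 million visits, before counting the JSON.parse of the same 30k objects on every request. Each parse also allocates a fresh set of objects that become garbage once the request ends. High GC time can come from this kind of allocation churn even without a leak, which retains memory instead; which one applies here is not something the post establishes.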
Should I use more database queries and caching instead of doing simple queries and handling large data with array operations? Won't lots of database queries stress my CPU anyway? Would it be better to run 3-4 non-cached queries per request, or 10-15 array operations per request on cached query results?

Thanks for your help, and sorry if this is a stupid question, but I really don't know what to do anymore.
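On the cached-data side of that trade-off there is another common option, which the post does not mention. The idea is to parse the cached JSON once per worker process and build `Map` indexes, so per-request lookups no longer rescan and re-sort the whole array. This is a minimal sketch under stated assumptions. `version` is a hypothetical number that changes whenever the cached data changes; how to maintain it (for example, next to the Redis key) is left open. `Item` and the specific lookups are invented for illustration.

```typescript
// Hypothetical record shape; the post does not describe its data model.
interface Item {
  id: number;
  category: string;
  price: number;
  active: boolean;
}

interface Indexed {
  version: number;
  byId: Map<number, Item>;
  activeByCategory: Map<string, Item[]>; // each list pre-sorted by price
  maxPrice: number;
}

// Per-process memo: each pm2 worker holds its own copy.
let memo: Indexed | undefined;

// One O(n) pass plus per-category sorts, paid once per data version
// instead of on every request.
function buildIndex(items: Item[], version: number): Indexed {
  const byId = new Map<number, Item>();
  const activeByCategory = new Map<string, Item[]>();
  let maxPrice = -Infinity;
  for (const it of items) {
    byId.set(it.id, it);
    if (it.price > maxPrice) maxPrice = it.price;
    if (!it.active) continue;
    const list = activeByCategory.get(it.category);
    if (list) list.push(it);
    else activeByCategory.set(it.category, [it]);
  }
  activeByCategory.forEach((list) => list.sort((a, b) => a.price - b.price));
  return { version, byId, activeByCategory, maxPrice };
}

// `version` is hypothetical: a number that changes whenever the cached data
// changes. Only a version change triggers a new JSON.parse and rebuild.
function getIndex(json: string, version: number): Indexed {
  if (memo === undefined || memo.version !== version) {
    memo = buildIndex(JSON.parse(json) as Item[], version);
  }
  return memo;
}

// Answers per-request questions via Map lookups and pre-sorted lists
// instead of full-array scans. The returned objects are shared across
// requests, so handlers must treat them as read-only.
function handleRequestIndexed(json: string, version: number, category: string, id: number) {
  const idx = getIndex(json, version);
  const inCategory = idx.activeByCategory.get(category) ?? [];
  return {
    count: inCategory.length,
    cheapest: inCategory.slice(0, 10),
    hasExpensive: idx.maxPrice > 990,
    target: idx.byId.get(id),
  };
}
```

The cost moves from every request to every data change. Each pm2 instance keeps its own parsed copy and index in memory (16 copies with the setup described), and the indexes only help for lookups known in advance. Whether this beats pushing the filtering into SQL depends on the actual queries and data.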

Submitted May 26, 2020 at 02:55PM by DanielVip3
