There are two C++ processes, one thread in each process. The thread handles network traffic (Diameter) from 32 incoming TCP connections, parses it, and forwards the split-out messages via 32 outgoing TCP connections. Let's call this C++ process a DiameterFE.
If only one DiameterFE process is running, it can handle 70 000 messages/sec.
If two DiameterFE processes are running, they handle 35 000 messages/sec each, i.e. the same 70 000 messages/sec in total.
Why don't they scale? What is the bottleneck?

Details: There are 32 clients (Seagull) and 32 servers (Seagull) for each Diameter Front End process, running on separate hosts.
A dedicated host is used for these two processes: 2 × E5-2670 @ 2.60 GHz CPUs × 8 cores/socket × 2 HW threads/core = 32 hardware threads in total.
10 Gbit/s network. The average Diameter message size is 700 bytes.
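(For scale: 70 000 messages/sec × 700 bytes ≈ 49 MB/s ≈ 0.4 Gbit/s per direction, so the 10 Gbit/s link itself should be far from saturated.)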

It looks like only Cpu0 handles the network traffic (58.7% si). Do I have to explicitly assign different network queues to different CPUs?
The first process (PID 7615) uses 89.0% CPU and runs on Cpu0.
The second process (PID 59349) uses 70.8% CPU and runs on Cpu8.
On the other hand, Cpu0 is loaded at 95.2% = 9.7% us + 26.8% sy + 58.7% si,
whereas Cpu8 is loaded at only 70.3% = 14.8% us + 55.5% sy.

It looks like Cpu0 is also doing work for the second process. There is a very high softirq load, and only on Cpu0 (58.7%). Why?
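Regarding the question above about assigning network queues to CPUs: when all receive interrupts land on one CPU, Receive Packet Steering (RPS) can spread the softirq processing across several CPUs. Below is a minimal sketch, not a verified fix for this setup: it assumes the kernel exposes the RPS knob, that the interface is eth0 with a single RX queue rx-0, and it simply writes a CPU bitmask to the sysfs file (the same thing an echo into that file would do):

// Sketch (assumed interface eth0, queue rx-0): enable RPS by writing a CPU
// bitmask to sysfs so receive softirq work may run on CPUs 0-15.
// Must be run as root; adjust the mask and paths to your hardware.
#include <fstream>
#include <iostream>

int main() {
    std::ofstream rps("/sys/class/net/eth0/queues/rx-0/rps_cpus");
    if (!rps) {
        std::cerr << "cannot open rps_cpus (wrong interface, or not root?)\n";
        return 1;
    }
    rps << "ffff" << std::flush;   // bitmask: CPUs 0-15 may process this queue
    return rps ? 0 : 1;
}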

Here is the top output with the "1" key pressed (per-CPU breakdown):

top - 15:31:55 up 3 days,  9:28,  5 users,  load average: 0.08, 0.20, 0.47
Tasks: 973 total,   3 running, 970 sleeping,   0 stopped,   0 zombie
Cpu0  :  9.7%us, 26.8%sy,  0.0%ni,  4.8%id,  0.0%wa,  0.0%hi, 58.7%si,  0.0%st
...
Cpu8  : 14.8%us, 55.5%sy,  0.0%ni, 29.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
...
Cpu31 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  396762772k total,  5471576k used, 391291196k free,   354920k buffers
Swap:  1048568k total,        0k used,  1048568k free,  2164532k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                      
 7615 test1     20   0 18720 2120 1388 R 89.0  0.0  52:35.76 diameterfe
59349 test1     20   0 18712 2112 1388 R 70.8  0.0 121:02.37 diameterfe                                      
  610 root      20   0 36080 1364 1112 S  2.6  0.0 126:45.58 plymouthd                                      
 3064 root      20   0 10960  788  432 S  0.3  0.0   2:13.35 irqbalance                                      
16891 root      20   0 15700 2076 1004 R  0.3  0.0   0:01.09 top                                      
    1 root      20   0 19364 1540 1232 S  0.0  0.0   0:05.20 init                                      
...

1 Answer


The fix for this issue was to upgrade the kernel to 2.6.32-431.20.3.el6.x86_64.
After that, network interrupts and message queues are distributed among different CPUs.
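As a complementary measure (not part of the fix above), each DiameterFE process can also be pinned to its own CPU so its worker thread does not compete with the CPU that services the network softirqs. A minimal sketch using sched_setaffinity; the CPU number 8 is just an example:

// Sketch: restrict the calling process to a single CPU (CPU 8 here is an
// arbitrary example) so it stays off the CPU that handles network softirqs.
// Compile with g++ (which defines _GNU_SOURCE, needed for the CPU_* macros).
#include <sched.h>
#include <cstdio>

int main() {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(8, &set);                                     // allow only CPU 8
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {   // 0 = this process
        std::perror("sched_setaffinity");
        return 1;
    }
    std::printf("now running on CPU %d\n", sched_getcpu());
    return 0;
}

The same effect can be achieved without code changes by starting the process under taskset, e.g. taskset -c 8 ./diameterfe.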
