There are two C++ processes, one thread in each process. The thread handles network traffic (Diameter) from 32 incoming TCP connections, parses it, and forwards the split-out messages via 32 outgoing TCP connections. Let's call this C++ process a DiameterFE.
If only one DiameterFE process is running, it can handle 70 000 messages/sec.
If two DiameterFE processes are running, they handle 35 000 messages/sec each, i.e. the same 70 000 messages/sec in total.
Why don't they scale? What is the bottleneck?

Details: There are 32 clients (Seagull) and 32 servers (Seagull) for each Diameter Front End process, running on separate hosts.
A dedicated host is used for these two processes: 2 × E5-2670 @ 2.60 GHz CPUs × 8 cores/socket × 2 HW threads/core = 32 hardware threads in total.
10 Gbit/s network. The average Diameter message size is 700 bytes.
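(For scale: 70 000 messages/sec × 700 bytes ≈ 49 MB/s ≈ 0.4 Gbit/s per direction, so the 10 Gbit/s link itself should be far from saturated.)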

It looks like only Cpu0 handles the network traffic (58.7% si). Do I have to explicitly assign different network queues to different CPUs?
The first process (PID 7615) uses 89.0% CPU and runs on Cpu0.
The second process (PID 59349) uses 70.8% CPU and runs on Cpu8.
On the other hand, Cpu0 is loaded at 95.2% = 9.7% us + 26.8% sy + 58.7% si,
whereas Cpu8 is loaded at only 70.3% = 14.8% us + 55.5% sy.

It looks like Cpu0 is also doing work for the second process. There is a very high softirq load, and only on Cpu0 (58.7%). Why?
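Regarding the question above about assigning network queues to CPUs: when all receive interrupts land on one CPU, Receive Packet Steering (RPS) can spread the softirq processing across several CPUs. Below is a minimal sketch, not a verified fix for this setup: it assumes the kernel exposes the RPS knob, that the interface is eth0 with a single RX queue rx-0, and it simply writes a CPU bitmask to the sysfs file (the same thing an echo into that file would do):

// Sketch (assumed interface eth0, queue rx-0): enable RPS by writing a CPU
// bitmask to sysfs so receive softirq work may run on CPUs 0-15.
// Must be run as root; adjust the mask and paths to your hardware.
#include <fstream>
#include <iostream>

int main() {
    std::ofstream rps("/sys/class/net/eth0/queues/rx-0/rps_cpus");
    if (!rps) {
        std::cerr << "cannot open rps_cpus (wrong interface, or not root?)\n";
        return 1;
    }
    rps << "ffff" << std::flush;   // bitmask: CPUs 0-15 may process this queue
    return rps ? 0 : 1;
}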

Here is the top output with the "1" key pressed (per-CPU breakdown):

top - 15:31:55 up 3 days,  9:28,  5 users,  load average: 0.08, 0.20, 0.47
Tasks: 973 total,   3 running, 970 sleeping,   0 stopped,   0 zombie
Cpu0  :  9.7%us, 26.8%sy,  0.0%ni,  4.8%id,  0.0%wa,  0.0%hi, 58.7%si,  0.0%st
...
Cpu8  : 14.8%us, 55.5%sy,  0.0%ni, 29.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
...
Cpu31 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  396762772k total,  5471576k used, 391291196k free,   354920k buffers
Swap:  1048568k total,        0k used,  1048568k free,  2164532k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                      
 7615 test1     20   0 18720 2120 1388 R 89.0  0.0  52:35.76 diameterfe
59349 test1     20   0 18712 2112 1388 R 70.8  0.0 121:02.37 diameterfe                                      
  610 root      20   0 36080 1364 1112 S  2.6  0.0 126:45.58 plymouthd                                      
 3064 root      20   0 10960  788  432 S  0.3  0.0   2:13.35 irqbalance                                      
16891 root      20   0 15700 2076 1004 R  0.3  0.0   0:01.09 top                                      
    1 root      20   0 19364 1540 1232 S  0.0  0.0   0:05.20 init                                      
...

1 Answer


The fix for this issue was to upgrade the kernel to 2.6.32-431.20.3.el6.x86_64.
After that, network interrupts and message queues are distributed among different CPUs.
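As a complementary measure (not part of the fix above), each DiameterFE process can also be pinned to its own CPU so its worker thread does not compete with the CPU that services the network softirqs. A minimal sketch using sched_setaffinity; the CPU number 8 is just an example:

// Sketch: restrict the calling process to a single CPU (CPU 8 here is an
// arbitrary example) so it stays off the CPU that handles network softirqs.
// Compile with g++ (which defines _GNU_SOURCE, needed for the CPU_* macros).
#include <sched.h>
#include <cstdio>

int main() {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(8, &set);                                     // allow only CPU 8
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {   // 0 = this process
        std::perror("sched_setaffinity");
        return 1;
    }
    std::printf("now running on CPU %d\n", sched_getcpu());
    return 0;
}

The same effect can be achieved without code changes by starting the process under taskset, e.g. taskset -c 8 ./diameterfe.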
