Problem description
perf top result about nested functions
We use perf top to show CPU usage. The result shows two functions:
samples    pcnt   function
-------    ----   ---------
...        ...    ....
12617.00   6.8%   func_outside
 8691.00   4.7%   func_inside
.....
In fact, these two functions are nested like this, and the nesting is always 1-to-1:
func_outside() {
    ....
    func_inside()
    ...
}
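Should I conclude that, in the perf top result, the 4.7% is already included in the 6.8%? In other words, if the cost of func_inside is excluded, is the cost of func_outside 2.1% (6.8 - 4.7)?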
Reference solution
Method 1:
Short Answer
No. Each percentage that is reported is for that specific function only, so the func_inside samples are not counted in func_outside.
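Applied to the numbers in the question, this means the 6.8% is already the cost of func_outside's own code, and the cost of func_outside together with its callee is closer to 6.8% + 4.7% = 11.5%. The following is a minimal sketch of that arithmetic, not something perf provides. The helper names inclusive_samples and self_fraction are hypothetical, and the sketch assumes that func_inside is reached only through func_outside and that func_outside calls nothing else of significance.

```c
#include <assert.h>

/* Samples for a caller plus one callee, ASSUMING the callee is reached
   only through that caller (as the question states for func_inside). */
long inclusive_samples(long caller_self, long callee_self) {
    assert(caller_self >= 0 && callee_self >= 0);
    return caller_self + callee_self;
}

/* Fraction of the caller's inclusive samples spent in the caller's own code. */
double self_fraction(long caller_self, long callee_self) {
    long total = inclusive_samples(caller_self, callee_self);
    assert(total > 0);
    return (double)caller_self / (double)total;
}
```

With the sample counts from the question, inclusive_samples(12617, 8691) gives 21308 samples, and self_fraction(12617, 8691) is roughly 0.59, so a little over half of the combined cost sits in func_outside itself.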
Details
The way perf works is that it periodically collects performance samples. By default, perf top simply checks which function is currently running and then adds that to the sample count for this function.
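The difference between counting only the running function and counting every function on the call stack can be sketched as follows. This is an illustrative model, not perf's implementation: struct sample, self_count and inclusive_count are hypothetical names, and each sample is assumed to carry a small call stack with the running function first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One hypothetical sample: the call stack when the profiler fired.
   frames[0] is the function that was executing; the rest are its callers. */
struct sample {
    const char *frames[4];
    size_t depth;
};

/* Self count: samples in which fn was the function actually executing.
   This models the default behavior described above. */
size_t self_count(const struct sample *s, size_t n, const char *fn) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        assert(s[i].depth > 0 && s[i].depth <= 4);
        if (strcmp(s[i].frames[0], fn) == 0)
            hits++;
    }
    return hits;
}

/* Inclusive count: samples in which fn appears anywhere on the stack. */
size_t inclusive_count(const struct sample *s, size_t n, const char *fn) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        assert(s[i].depth > 0 && s[i].depth <= 4);
        for (size_t d = 0; d < s[i].depth; d++) {
            if (strcmp(s[i].frames[d], fn) == 0) {
                hits++;
                break; /* count each sample at most once */
            }
        }
    }
    return hits;
}
```

For four samples with stacks inner/outer/main, inner/outer/main, outer/main and main, the self count for outer is 1 while its inclusive count is 3. The percentages in the question correspond to the first kind of count.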
I was pretty sure this is the case, but wanted to verify that this is how perf top displays the results, so I wrote a quick test program to test its behavior. This program has two functions of interest, outer and inner. The outer function calls inner in a loop, and the amount of work that inner does is controlled by an argument. When compiling, be sure to use -O0 to avoid inlining. The command line arguments control the ratio of work between the two functions.
Running with parameters ./a.out 1 1 1000000000 gives results:
49.20% a.out [.] outer
23.69% a.out [.] main
21.32% a.out [.] inner
Running with parameters ./a.out 1 10 1000000000 gives results:
66.06% a.out [.] inner
17.77% a.out [.] outer
9.50% a.out [.] main
Running with parameters ./a.out 1 100 1000000000 gives results:
88.53% a.out [.] inner
2.85% a.out [.] outer
1.09% a.out [.] main
If the count for inner were included in outer, then the runtime percentage for outer would always be at least as high as that for inner. But as these results show, that is not the case.
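The argument can be written as a small consistency check over the three runs reported above. This is only a sketch of the reasoning: struct run and consistent_with_inclusive are hypothetical names, and the percentages are the ones quoted from perf top.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Percentages reported by perf top for one run of the test program. */
struct run {
    double outer_pct;
    double inner_pct;
};

/* If inner's samples were also counted in outer, outer could never be
   reported lower than inner. Returns false once a run violates that. */
bool consistent_with_inclusive(const struct run *runs, size_t n) {
    assert(runs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (runs[i].outer_pct < runs[i].inner_pct)
            return false;
    }
    return true;
}
```

Fed the pairs (49.20, 21.32), (17.77, 66.06) and (2.85, 88.53), the check returns false, because the second and third runs report outer well below inner.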
The test program I used is below; it was compiled with gcc -O0 -g --std=c11 test.c.
#include <stdlib.h>
#include <stdio.h>

/* Does `count` iterations of work; this is the callee being profiled. */
long inner(int count) {
    long sum = 0;
    for (int i = 0; i < count; i++) {
        sum += i;
    }
    return sum;
}

/* Calls inner() `count_out` times; this is the caller being profiled. */
long outer(int count_out, int count_in) {
    long sum = 0;
    for (int i = 0; i < count_out; i++) {
        sum += inner(count_in);
    }
    return sum;
}

int main(int argc, char **argv) {
    if (argc < 4) {
        printf("Usage: %s <outer_cnt> <inner_cnt> <loop>\n", argv[0]);
        exit(-1);
    }
    int outer_cnt = atoi(argv[1]);
    int inner_cnt = atoi(argv[2]);
    int loops = atoi(argv[3]);
    long res = 0;
    for (int i = 0; i < loops; i++) {
        res += outer(outer_cnt, inner_cnt);
    }
    /* Print the result so the work has an observable effect. */
    printf("res is %ld\n", res);
    return 0;
}
(by user3334213, Gabriel Southern)