perf top result about nested functions


Problem description

We use perf top to show CPU usage. The result shows two functions:

samples    pcnt    function
-------    ----    ---------
...        ...     ....
12617.00   6.8%    func_outside
 8691.00   4.7%    func_inside
.....

In fact, these two functions are nested as shown below, and the nesting is always one-to-one:

func_outside() {
  ....
  func_inside() 
  ... 
}

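Should I read the perf top result as meaning that the 4.7% is already included in the 6.8%? In other words, if the cost of func_inside is excluded, is the cost of func_outside 2.1% (6.8 - 4.7)?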


Reference solution

Method 1:

Short Answer

No. Each percentage that is reported is for that specific function only, so the func_inside samples are not counted in func_outside.

Details

perf works by periodically collecting performance samples. By default, perf top simply checks which function is running when each sample is taken and adds that sample to the count for that function.

I was fairly sure this was the case, but I wanted to verify that perf top displays results this way, so I wrote a quick test program. The program has two functions of interest, outer and inner. The outer function calls inner in a loop, and the amount of work that inner does is controlled by an argument. When compiling, be sure to use -O0 to avoid inlining. The command-line arguments control the ratio of work between the two functions.

Running with parameters ./a.out 1 1 1000000000 gives results:

49.20%  a.out             [.] outer    
23.69%  a.out             [.] main    
21.32%  a.out             [.] inner    

Running with parameters ./a.out 1 10 1000000000 gives results:

66.06%  a.out             [.] inner    
17.77%  a.out             [.] outer    
 9.50%  a.out             [.] main    

Running with parameters ./a.out 1 100 1000000000 gives results:

88.53%  a.out             [.] inner    
 2.85%  a.out             [.] outer    
 1.09%  a.out             [.] main    

If the count for inner were included in outer, the runtime percentage for outer would always be at least as high as that for inner. As these results show, that is not the case.

The test program I used is below and was compiled with gcc -O0 -g --std=c11 test.c.

#include <stdlib.h>
#include <stdio.h>

long inner(int count) {
  long sum = 0;
  for(int i = 0; i < count; i++) {
    sum += i;
  }
  return sum;
}

long outer(int count_out, int count_in) {
  long sum = 0;
  for(int i = 0; i < count_out; i++) {
    sum += inner(count_in);
  }
  return sum;
}

int main(int argc, char **argv)  {
  if(argc < 4) {
    printf("Usage: %s <outer_cnt> <inner_cnt> <loop>\n",argv[0]);
    exit(EXIT_FAILURE);
  }

  int outer_cnt = atoi(argv[1]);
  int inner_cnt = atoi(argv[2]);
  int loops     = atoi(argv[3]);

  long res = 0;
  for(int i = 0; i < loops; i++) {
    res += outer(outer_cnt, inner_cnt);
  }

  printf("res is %ld\n", res);
  return 0;
}

(by user3334213, Gabriel Southern)

References

  1. perf top result about nested functions (CC BY-SA 2.5/3.0/4.0)

#perf #performance #linux





