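Make sense of numbers in perf stat output

Problem description

I have been trying to use perf to profile my running process, but I cannot make sense of some of the numbers perf prints. Here is the command I used and the output I got: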

$ sudo perf stat -x, -v -e branch-misses,cpu-cycles,cache-misses sleep 1
Using CPUID GenuineIntel-6-55-4
branch-misses: 7751 444665 444665
cpu-cycles: 1212296 444665 444665
cache-misses: 4902 444665 444665
7751,,branch-misses,444665,100.00,,
1212296,,cpu-cycles,444665,100.00,,
4902,,cache-misses,444665,100.00,,

What does the number "444665" represent?

Reference solution

Method 1:

The -x format of perf stat is described in the perf-stat man page, in the section CSV FORMAT. Here is a fragment of that man page, with the optional columns left out:

CSV FORMAT

       With -x, perf stat is able to output a not-quite-CSV format output
       Commas in the output are not put into "". To make it easy to parse it
       is recommended to use a different character like -x \;

       The fields are in this order:
       ·   counter value
       ·   unit of the counter value or empty
       ·   event name
       ·   run time of counter
       ·   percentage of measurement time the counter was running

       Additional metrics may be printed with all earlier fields being
       empty.

So each line of your output contains the counter value, an empty unit field, the event name, the run time of the counter, and the percentage of the measurement time during which the counter was active.
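As a rough illustration of that field order, here is a minimal Python sketch that splits the three CSV lines from the question into named fields. The names StatLine and parse_stat_line are hypothetical, chosen only for this example. The sketch assumes the counter value is always an integer, which does not hold when perf prints placeholders such as <not counted>.

```python
from typing import List, NamedTuple


class StatLine(NamedTuple):
    """One line of `perf stat -x,` output (first five fields only)."""
    value: int          # counter value
    unit: str           # unit of the counter value, or empty
    event: str          # event name
    run_time: int       # run time of the counter
    running_pct: float  # percentage of measurement time the counter ran


def parse_stat_line(line: str, sep: str = ",") -> StatLine:
    # Hypothetical helper: split on the separator given to -x and keep
    # the first five fields; optional trailing columns are ignored.
    fields = line.strip().split(sep)
    value, unit, event, run_time, pct = fields[:5]
    return StatLine(int(value), unit, event, int(run_time), float(pct))


# The three CSV lines from the question.
sample = """7751,,branch-misses,444665,100.00,,
1212296,,cpu-cycles,444665,100.00,,
4902,,cache-misses,444665,100.00,,"""

parsed: List[StatLine] = [parse_stat_line(l) for l in sample.splitlines()]
for p in parsed:
    print(p.event, p.value, p.run_time, p.running_pct)
```

The man page recommends a separator such as \; because commas inside fields are not quoted; with that separator you would call parse_stat_line(line, sep=";").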

I compared the output of these two commands (suggested by Peter Cordes in a comment):

perf stat awk 'BEGIN{for(i=0;i<10000000;i++){}}'
perf stat -x \; awk 'BEGIN{for(i=0;i<10000000;i++){}}'

From that comparison, I think the run time is the total time, in nanoseconds, during which the counter was active. When you run perf stat with a non-conflicting set of events, and there are enough hardware counters to count all of the requested events, the run time will be almost the total time the profiled program spent running on the CPU. (Example of an event set that is too large: perf stat -x , -e cycles,instructions,branches,branch-misses,cache-misses,cache-references,mem-loads,mem-stores awk 'BEGIN{for(i=0;i<10000000;i++){}}'. The run time will differ between these events because they are dynamically multiplexed during program execution; sleep 1 is too short for multiplexing to kick in.)
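To make the multiplexing point concrete, here is a small Python sketch. It assumes that the three numbers on each -v line in the question (for example "branch-misses: 7751 444665 444665") are the raw count, the time the event was enabled, and the time it was actually running on a hardware counter. That reading comes from how perf reports counters and is not stated in the man page fragment above. The function names and the multiplexed numbers are hypothetical.

```python
def running_percentage(enabled_ns: int, running_ns: int) -> float:
    """Share of the enabled time during which the counter was running."""
    if enabled_ns == 0:
        return 0.0
    return 100.0 * running_ns / enabled_ns


def scaled_count(raw: int, enabled_ns: int, running_ns: int) -> float:
    """Extrapolate a raw count to the full enabled time.

    Assumption: the event rate while the counter was scheduled is
    representative of the time it was not scheduled.
    """
    if running_ns == 0:
        return 0.0
    return raw * enabled_ns / running_ns


# Values from the question: enabled == running, so no multiplexing.
pct_question = running_percentage(444665, 444665)

# Hypothetical multiplexed event: scheduled for a quarter of the time.
pct_muxed = running_percentage(1_000_000, 250_000)
est_muxed = scaled_count(1_000, 1_000_000, 250_000)

print(pct_question, pct_muxed, est_muxed)
```

With the numbers from the question the percentage is 100, which matches the 100.00 in the CSV output; a value below 100 would indicate that the event shared a hardware counter with other events.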

For sleep 1, very little code actually runs on the CPU: just the libc startup code and the nanosleep syscall that sleeps for 1 second (check with strace sleep 1). So the 444665 in your output is in ns, which is about 444 microseconds, or 0.444 milliseconds, or 0.000444 seconds of libc startup for the sleep 1 process.
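The unit conversion above can be checked with a few lines of Python. The sketch assumes, as the answer does, that the run time is reported in nanoseconds; the 1-second wall time is the nominal sleep duration, not a measured value.

```python
run_time_ns = 444665        # run time column from the question
wall_time_ns = 1_000_000_000  # nominal duration of `sleep 1` (assumed)

run_time_us = run_time_ns / 1e3  # microseconds
run_time_ms = run_time_ns / 1e6  # milliseconds
run_time_s = run_time_ns / 1e9   # seconds

# Fraction of the nominal second that sleep spent on the CPU.
cpu_share_pct = 100.0 * run_time_ns / wall_time_ns

print(f"{run_time_us:.3f} us")
print(f"{run_time_ms:.6f} ms")
print(f"{run_time_s:.9f} s")
print(f"{cpu_share_pct:.4f} % of one second")
```

Under these assumptions the process was on the CPU for well under a tenth of a percent of the second it took to finish, which is why the counter values look so small.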

If you want to measure whole-system activity for one second, try adding the -a option of perf stat (profile all processes), optionally with -A to report events separately for each CPU core (or with -I 100 for periodic printing):

perf stat -a sleep 1
perf stat -Aa sleep 1
perf stat -a -x , sleep 1
perf stat -Aa -x , sleep 1

(by HUSONG LIU, osgx)

References

Make sense of numbers in perf stat output (CC BY-SA 2.5/3.0/4.0)
