問題描述
理解 perf stat 輸出中的數字 (Make sense of numbers in perf stat output)
我一直在嘗試使用 perf 來分析我的運行過程,但我無法理解 perf 輸出的一些數字,這是我使用的命令和我得到的輸出:
$ sudo perf stat ‑x, ‑v ‑e branch‑misses,cpu‑cycles,cache‑misses sleep 1
Using CPUID GenuineIntel‑6‑55‑4
branch‑misses: 7751 444665 444665
cpu‑cycles: 1212296 444665 444665
cache‑misses: 4902 444665 444665
7751,,branch‑misses,444665,100.00,,
1212296,,cpu‑cycles,444665,100.00,,
4902,,cache‑misses,444665,100.00,,
我可以知道數字“444665”發生了什麼事件 代表?
參考解法
方法 1:
‑x
format of perf stat
is described in man page of perf‑stat
, section CSV FORMAT. There is fragment of this man page without optional columns:
CSV FORMAT top
With ‑x, perf stat is able to output a not‑quite‑CSV format output
Commas in the output are not put into "". To make it easy to parse it
is recommended to use a different character like ‑x \;
The fields are in this order:
· counter value
· unit of the counter value or empty
· event name
· run time of counter
· percentage of measurement time the counter was running
Additional metrics may be printed with all earlier fields being
empty.
So, you have value of counter, empty unit of counter, event name, run time, percentage of counter being active (compared to program running time).
By comparing output of these two commands (recommended by Peter Cordes in comment)
perf stat awk 'BEGIN{for(i=0;i<10000000;i++){}}'
perf stat ‑x \; awk 'BEGIN{for(i=0;i<10000000;i++){}}'
I think than run time is nanoseconds for all time this counter was active. When you run perf stat
with non‑conflicting set of events, and there are enough hardware counters to count all required events, run time will be almost total time of profiled program being run on CPU. (Example of too large event set: perf stat ‑x , ‑e cycles,instructions,branches,branch‑misses,cache‑misses,cache‑references,mem‑loads,mem‑stores awk 'BEGIN{for(i=0;i<10000000;i++){}}'
‑ run time will be different for these events, because they were dynamically multiplexed during program execution; and sleep 1
will be too short to have multiplexing to activate.)
For sleep 1
there is very small amount of code to be active on CPU, it is just libc startup code and calling syscall nanosleep
for 1 second (check strace sleep 1
). So in your output 444665 is in ns or is just 444 microseconds or 0.444 milliseconds or 0.000444 seconds of libc startup for sleep 1 process.
If you want to measure whole system activity for one second, try adding ‑a
option of perf stat (profile all processes), optionally with ‑A
to separate events for cpu cores (or with ‑I 100
to have periodic printing):
perf stat ‑a sleep 1
perf stat ‑Aa sleep 1
perf stat ‑a ‑x , sleep 1
perf stat ‑Aa ‑x , sleep 1
(by HUSONG LIU、osgx)