Profile code performance
Profiling and performance tuning of CPU code
Separate debug information from the compiled executable
- Compile with debug information
gcc -O3 -g3 -o program.out program.c
GCC debug levels: -g1, -g2 (same as plain -g) or -g3 (also records macro definitions); GDB-specific variants: -ggdb1, -ggdb2, -ggdb3
- Extract the debug information from the executable
objcopy --only-keep-debug ./program.out ./program.out.debuginfo
- Strip the debug information from the executable
strip --strip-debug --strip-unneeded ./program.out
- Record the debug information file in the executable (a .gnu_debuglink section holding its file name and CRC) so that debuggers can find it:
objcopy --add-gnu-debuglink=./program.out.debuginfo ./program.out
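The whole process also works for optimized builds (e.g. -O3). GDB follows the debug link automatically when the .debuginfo file sits in the same directory as the executable.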
Profile memory usage
memusage ./program.out
Create a PNG graph of memory usage:
memusage --data=mem.dat ./program.out
memusagestat mem.dat mem.png
# or, with time instead of the number of allocation calls on the x axis
memusagestat -t mem.dat mem.png
Measure run time and peak memory with GNU time (call it as /usr/bin/time: the shell's built-in time keyword has no -v option)
/usr/bin/time -v ./program.out
Get performance stats
Report counters such as cache misses, page faults and branch mispredictions:
perf stat ./program.out
# Each -d adds more detail (it can be given up to three times)
perf stat -d -d -d ./program.out
# Repeat the run (here 3 times) and report the averages with their variation
perf stat -d -d -d -r 3 ./program.out