Profile code performance

Profiling and performance tuning of CPU code

Separate debug information from the compiled executable

  1. Compile with debug information:
gcc -O3 -g3 -o program.out program.c

GCC debug options: -g{1, 2, 3} or -ggdb{1, 2, 3}

  2. Extract the debug information from the executable:
objcopy --only-keep-debug ./program.out ./program.out.debuginfo
  3. Strip the debug information from the executable:
strip --strip-debug --strip-unneeded ./program.out
  4. Set the executable’s debug information location to the debug information file:
objcopy --add-gnu-debuglink=./program.out.debuginfo ./program.out

The whole process also works when the program is compiled with optimization flags such as -O3.
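
To confirm that the split worked, one option is to inspect the section headers of the stripped binary: it should contain a `.gnu_debuglink` section and no `.debug_info` section. The following is a minimal sketch, assuming binutils' `readelf` is installed; the helper name `check_debuglink` is hypothetical and not part of any tool.

```shell
# check_debuglink is a hypothetical helper name (an assumption, not a binutils tool).
# Exit status: 0 = debug info is split out, 1 = not split, 2 = readelf failed.
check_debuglink() {
    bin="$1"
    # -S lists section headers, -W avoids truncating long lines.
    sections="$(readelf -S -W "$bin")" || return 2
    # The link to the external debug file must be present.
    printf '%s\n' "$sections" | grep -q '\.gnu_debuglink' || return 1
    # The DWARF data itself must be gone from the stripped binary.
    ! printf '%s\n' "$sections" | grep -q '\.debug_info'
}

# Usage, after running the four steps:
#   check_debuglink ./program.out && echo "debug info is split out"
#   readelf --string-dump=.gnu_debuglink ./program.out   # prints the linked file name
```

A debugger such as gdb should then pick up `program.out.debuginfo` automatically as long as it stays next to the executable.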

Profile memory usage

memusage ./program.out

Create a PNG graph from the recorded data:

memusage --data=mem.dat ./program.out
memusagestat mem.dat mem.png
# or, with time (instead of the number of function calls) on the x-axis
memusagestat -t mem.dat mem.png

Measure using the time command

/usr/bin/time -v ./program.out
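
When only a few numbers are needed, GNU time can print them in a custom format instead of the full `-v` report. The sketch below assumes GNU time is installed at `/usr/bin/time` (the shell built-in `time` does not accept these options); the helper name `run_timed` and the file name `time.log` are arbitrary choices for illustration.

```shell
# run_timed is a hypothetical helper name; time.log is an arbitrary output file.
# Format specifiers (GNU time):
#   %e = elapsed wall-clock seconds   %U = user CPU seconds
#   %S = system CPU seconds           %M = maximum resident set size in KB
run_timed() {
    /usr/bin/time -f 'wall=%e user=%U sys=%S maxrss_kb=%M' -o time.log "$@"
}

# Usage:
#   run_timed ./program.out && cat time.log
```

Writing the result to a file with `-o` keeps the timing line separate from anything the program itself prints to stderr.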

Get performance stats

Get information including cache misses, page faults, branch mispredictions, etc.

perf stat ./program.out
# Each additional -d (up to three) gives more detailed statistics
perf stat -d -d -d ./program.out
# Repeat the run (e.g. 3 times); perf then reports the mean and the variation
perf stat -d -d -d -r 3 ./program.out
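
To focus on a handful of counters rather than the full detailed report, events can be selected explicitly with `-e`. This is a sketch under the assumption that the listed events are available on the machine; hardware counters are often reported as not supported inside virtual machines or containers. The names `PERF_EVENTS`, `perf_counters`, and `perf.txt` are hypothetical choices for this example.

```shell
# PERF_EVENTS, perf_counters, and perf.txt are illustrative names (assumptions).
# The event list covers cache behaviour, branch prediction, and page faults.
PERF_EVENTS="cache-references,cache-misses,branches,branch-misses,page-faults"

perf_counters() {
    # -r 3 repeats the run three times; -o writes the report to a file
    # instead of stderr; -- separates perf options from the command.
    perf stat -r 3 -e "$PERF_EVENTS" -o perf.txt -- "$@"
}

# Usage:
#   perf_counters ./program.out && cat perf.txt
```

Comparing `cache-misses` against `cache-references`, and `branch-misses` against `branches`, gives miss ratios that are easier to compare across runs than the raw counts.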