Install and configure Score-P and Scalasca

This post explains how to install, configure, and use Score-P and Scalasca on a Linux machine.

Intro to Score-P and Scalasca

Score-P is a measurement infrastructure for profiling and tracing parallel programs, and Scalasca is a tool for analyzing their performance. Scalasca uses Score-P to collect the performance data and then analyzes it to find bottlenecks. Score-P can be used without Scalasca, but Scalasca needs Score-P to work.

Read more about Scorep and Scalasca here and here.

Some of the configurations were taken from the Score-P and Scalasca documentation and from here. This cheatsheet is also informative.

Installing Score-P, Scalasca, and CubeGUI

Download Score-P from here.

# Install Score-P

# Download Score-P (for 8.1 version)
tar -xzf scorep-8.1.tar.gz
cd scorep-8.1

mkdir _build
cd _build
# Run ONE of the configure commands below.
../configure --prefix=/path/to/installation/folder
# For example, to install into the current build directory:
# ../configure --prefix=$PWD

# If you want to enable CUDA support:
# ../configure --prefix=/path/to/installation/folder --enable-cuda

make -j8
make install

Download Scalasca from here.

# Install Scalasca

# Download Scalasca (for 2.6.1 version)
tar -xzf scalasca-2.6.1.tar.gz
cd scalasca-2.6.1

mkdir _build
cd _build
../configure --prefix=/path/to/installation/folder

make -j8
make install

Download CubeBundle from here. It includes CubeGUI and CubeLib.

Attention: building CubeGUI requires qmake from Qt. I couldn't get it working on the cluster, so I installed CubeGUI on my local machine and open the output files there.

# Install Cube

# Download Cube (for 4.8.2 version)
tar -xzf CubeBundle-4.8.2.tar.gz
cd CubeBundle-4.8.2

mkdir _build
cd _build
../configure --prefix=/path/to/installation/folder

make -j8
make install

Update the PATH and LD_LIBRARY_PATH variables:

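export SCOREP_DIR=/path/to/installation/folder
export SCALASCA_DIR=/path/to/installation/folder
export CUBE_DIR=/path/to/installation/folder
# Executables are installed under <prefix>/bin and libraries under <prefix>/lib
export PATH=$SCOREP_DIR/bin:$SCALASCA_DIR/bin:$CUBE_DIR/bin:$PATH
export LD_LIBRARY_PATH=$SCOREP_DIR/lib:$SCALASCA_DIR/lib:$CUBE_DIR/lib:$LD_LIBRARY_PATH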
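
After updating the environment, it can help to confirm that the shell actually finds the tools. The following is a minimal sketch, not part of any of the packages: check_tools is a hypothetical helper, and the executable names are the ones I would expect from a default install, so adjust them if your installation differs.

```shell
# check_tools is a hypothetical helper: it reports which of the given
# commands cannot be found on PATH and returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names assumed from a default install of each package.
check_tools scorep scorep-score scalasca scan square \
    || echo "Some tools are missing: check PATH and the --prefix you used."
```

If a tool is reported as missing, the usual cause is that the bin directory under the chosen prefix is not on PATH in the current shell.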

Instrument the application

To instrument the application, prefix the compiler with scorep: replace mpicc with scorep mpicc and, for CUDA applications, nvcc with scorep nvcc. Use the same prefix for the link step.
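
One way to keep the instrumenter easy to switch on and off is to hold it in a variable. This is a minimal sketch assuming a plain shell build script: PREP, CC, and NVCC are illustrative variable names (not Score-P settings), and main.c and kernel.cu are placeholder files. The commands are only printed, so the snippet runs even where Score-P is not installed.

```shell
# PREP holds the instrumenter prefix; run with PREP="" for a normal build.
# PREP, CC and NVCC are illustrative variable names, not Score-P settings.
PREP="${PREP-scorep}"
CC="${PREP:+$PREP }mpicc"
NVCC="${PREP:+$PREP }nvcc"

# Dry run: print the commands instead of executing them.
# Remove the leading echo to actually compile.
# main.c and kernel.cu are placeholder file names.
echo $CC -O2 -c main.c -o main.o
echo $NVCC -O2 -c kernel.cu -o kernel.o
```

Running the script with PREP="" in the environment yields the uninstrumented commands, which is handy for comparing the two builds.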

On one cluster that I work with, I need to load the GCC module and unload the Intel modules before loading CUDA. As a result, at link time I saw an error like this:

ld: cannot find intel_fast_memcpy  ...

I had to link against the missing Intel libraries manually, so I added these options to LD_FLAGS:

LD_FLAGS = -L /cvmfs/ -lintlc -lirc

# for compilation:

Run the application and collect traces

I still get errors in some parts of the run, but traces are produced. I will update this post if I find a solution.

# export SCOREP_FILTERING_FILE=./filter.scorep

# scan cannot parse the mpirun options unless they are quoted
scan -q -t mpirun "--mca pml ucx -x UCX_TLS=sm,cuda_copy,cuda_ipc" \
    "--mca btl ^vader,tcp,openib" \
    "--mca coll ^hcoll,ahscuda" \
    -np 2 ./program.out 

# Score the measurement (square is short for scalasca -examine)
square -s ./scorep_run_trace
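
The score report helps decide which regions to filter out of the next measurement. Below is a minimal sketch of a filter file, assuming the report shows a few small, frequently called functions dominating the trace. The region names helper_small_function and get_index* are made up for illustration; the SCOREP_REGION_NAMES_BEGIN/END block and the EXCLUDE keyword follow the Score-P filter file format.

```shell
# Write a small Score-P filter file. The region names are placeholders;
# replace them with the functions that dominate your own score report.
cat > filter.scorep <<'EOF'
SCOREP_REGION_NAMES_BEGIN
  EXCLUDE helper_small_function get_index*
SCOREP_REGION_NAMES_END
EOF

# Point the measurement system at the filter for the next run.
export SCOREP_FILTERING_FILE=./filter.scorep
```

With the variable exported, rerunning the scan command should apply the filter at measurement time, without recompiling the application.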