Install and configure Scorep and Scalasca

Published:
3 minute read

This post explains how to install, configure, and use Scorep and Scalasca on a Linux machine.

Intro to Scorep and Scalasca

Scorep is a profiling tool for parallel programs, and Scalasca is a tool for analyzing the performance of parallel programs. Scalasca uses Scorep to collect the performance data and then analyzes the data to find the bottlenecks. Scorep can be used without Scalasca, but Scalasca needs Scorep to work.

Read more about Scorep and Scalasca here and here.

Some of the configurations were taken from scorep and scalasca documents and here. Also, this cheatsheet is informative.

Installing Scorep and Scalasca, CubeGUI

Download scorep from here.

# Install Scorep

# Download Scorep (for 8.1 version)
wget https://perftools.pages.jsc.fz-juelich.de/cicd/scorep/tags/scorep-8.1/scorep-8.1.tar.gz
tar -xzf scorep-8.1.tar.gz
cd scorep-8.1

mkdir _build
cd _build
../configure --prefix=/path/to/installation/folder
# for example:
../configure --prefix=$PWD

# If you want to enable CUDA support:
../configure --prefix=/path/to/installation/folder --enable-cuda \
 --with-libcudart=$CUDA_HOME

make -j8
make install

Download Scalasca from here.

# Install Scalasca

# Download Scalasca (for 2.6.1 version)
wget https://apps.fz-juelich.de/scalasca/releases/scalasca/2.6/dist/scalasca-2.6.1.tar.gz
tar -xzf scalasca-2.6.1.tar.gz
cd scalasca-2.6.1

mkdir _build
cd _build
../configure --prefix=/path/to/installation/folder

make -j8
make install

Download Cubebundle from here. It has CubeGUI and CubeLib.

Attention: CubeGUI needs qmake from Qt to be installed. I couldn’t manage to enable it on the cluster. So, I installed CubeGUI on my local machine to read the output files locally.

# Install Cube

# Download Cube (for 4.8.2 version)
wget http://apps.fz-juelich.de/scalasca/releases/cube/4.8/dist/CubeBundle-4.8.2.tar.gz
tar -xzf CubeBundle-4.8.2.tar.gz
cd CubeBundle-4.8.2

mkdir _build
cd _build
../configure --prefix=/path/to/installation/folder

make -j8
make install

Update the PATH and LD_LIBRARY_PATH variables:

export SCOREP_DIR=/path/to/installation/folder
export SCALASCA_DIR=/path/to/installation/folder
export CUBE_DIR=/path/to/installation/folder
export PATH=$SCOREP_DIR/bin:$SCALASCA_DIR/bin:$CUBE_DIR/bin:$PATH
export LD_LIBRARY_PATH=$SCOREP_DIR/lib:$SCALASCA_DIR/lib:$CUBE_DIR/lib:$LD_LIBRARY_PATH

Instrument the application

For compiling the application with Scorep, replace mpicc with scorep mpicc, and nvcc with scorep nvcc for CUDA applications.

On a specific cluster that I work with, I need to load GCC and unload Intel modules before loading CUDA. So, in the link time, I could see something like this:

ld: cannot find intel_fast_memcpy  ...

So, I had to link with the specific libraries manually. So, I added these options to LD_FLAGS:

LD_FLAGS = -L /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/intel/2020.1.217/lib/intel64/ -lintlc -lirc

# for compilation:
$(MPICXX) $(CXX_FLAGS) $(LD_FLAGS) -o $(EXECUTABLE) $(OBJECTS) $(LIBS)

Run the application and collect traces

I still get some errors for some part, but I can see some traces produced. I will update this if I find a solution.

export SCOREP_EXPERIMENT_DIRECTORY="$REPORTS_DIR/scorep_run_trace"
export SCOREP_ENABLE_PROFILING=true
export SCOREP_ENABLE_TRACING=true
export SCOREP_TOTAL_MEMORY=1000MB
# export SCOREP_FILTERING_FILE=./filter.scorep

# For mpirun options, scan cannot understand them unless you put them in quotes
scan -q -t mpirun "--mca pml ucx -x UCX_TLS=sm,cuda_copy,cuda_ipc" \
    "--mca btl ^vader,tcp,openib" \
    "--mca coll ^hcoll,ahscuda" \
    -np 2 ./program.out 

# for calculating the score
square -s ./scorep_run_trace