Task Graph Market

This page gathers a series of task graphs which can be given as input to starpu_replay for replaying real-world applications.

To get starpu_replay, one needs a version of StarPU configured with --enable-simgrid. One can then run the different task graph cases, as detailed below.

Dense Linear Algebra


Cholesky factorization from the StarPU source code, benchmarked on research platforms.


Dense linear algebra from the Chameleon project, benchmarked on research platforms.

How to run this

$ ./configure --enable-simgrid
$ make -C src
$ make -C tools
$ wget https://files.inria.fr/starpu/market/cholesky.tgz
$ tar xf cholesky.tgz
$ cd cholesky
$ STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec
Read task 14000... done.
Submitted task order 14000... done.
Executed task 11000... done.
9900.77 ms	1976.13 GF/s
$ STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 10x10/tasks.rec 2> /dev/null
298.476 ms	1121.6 GF/s
$ STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 20x20/tasks.rec 2> /dev/null
1443.13 ms	1751.45 GF/s
$ STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 30x30/tasks.rec 2> /dev/null
4357.02 ms	1915.96 GF/s
$ STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
9900.77 ms	1976.13 GF/s

Other scheduling algorithms can be set with STARPU_SCHED:

$ STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
9900.77 ms	1976.13 GF/s
$ STARPU_SCHED=dmdar STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
10506.7 ms	1862.16 GF/s
$ STARPU_SCHED=dmda STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
10510.7 ms	1861.45 GF/s
$ STARPU_SCHED=lws   STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
12403.5 ms	1577.4 GF/s

The scheduling algorithms can be tuned with e.g. STARPU_SCHED_BETA:

$ STARPU_SCHED_BETA=1 STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
9900.77 ms	1976.13 GF/s
$ STARPU_SCHED_BETA=2 STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
9895.55 ms	1977.17 GF/s
$ STARPU_SCHED_BETA=10 STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
10137.7 ms	1929.94 GF/s
$ STARPU_SCHED_BETA=100 STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
13660.1 ms	1432.29 GF/s
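
The per-value commands above can be wrapped in a small loop. The following is only a sketch: sweep_beta and the REPLAY override variable are hypothetical names introduced here, not part of StarPU, and the platform name and sampling directory are simply copied from the commands above.

```shell
# Sketch: sweep STARPU_SCHED_BETA for one scheduler and one task graph.
# REPLAY is a hypothetical override (not a StarPU setting); it defaults
# to the starpu_replay path used in the commands above.
sweep_beta() {
    sched=$1      # scheduler name, e.g. dmdas
    tasks=$2      # task graph file, e.g. 40x40/tasks.rec
    shift 2       # remaining arguments: the beta values to try
    for beta in "$@"; do
        result=$(STARPU_SCHED_BETA=$beta STARPU_SCHED=$sched \
                 STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling \
                 ${REPLAY:-$STARPU/tools/starpu_replay} "$tasks" 2> /dev/null)
        # One line per beta value: the value, then whatever the replay printed.
        printf '%s\t%s\n' "$beta" "$result"
    done
}
```

It would be called as, for example, sweep_beta dmdas 40x40/tasks.rec 1 2 10 100.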

The simulation itself is sequential, but you can run several of them in parallel:

( for size in 10 20 30 40 50 60 ; do
STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay ${size}x${size}/tasks.rec 2> /dev/null | sed -e "s/^/$size /" &
done ) | sort

How to generate static scheduling

The examples above use the StarPU dynamic schedulers. One can instead inject a static schedule by adding a sched.rec file to the replay.

The tasks.rec file follows the recutils format: paragraphs are separated by an empty line. Each paragraph represents a task to be executed and carries a lot of information, some of which comes from the native execution that was performed when recording the trace.

The performance of tasks on the different execution units can be obtained by running starpu_perfmodel_recdump:

$ STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling starpu_perfmodel_recdump

which first emits in a %rec: timing section a series of paragraphs, one per set of measurements made for the same kind of task on the same data size. Each paragraph contains:

Then the %rec: worker_count section describes the target platform, with one paragraph per kind of execution unit:

Then the %rec: memory_workers section describes the memory layout of the target platform, with one paragraph per memory node:

Worker IDs are numbered starting from 0, following the order of the paragraphs in the %rec: worker_count section.

A static schedule can then be expressed by producing a sched.rec file containing one paragraph per task. Each paragraph must contain a SubmitOrder field holding the submission identifier (as referenced in the SubmitOrder field of tasks.rec). The JobId is not used because StarPU may generate internal tasks, which changes job ids. The SubmitOrder, on the contrary, depends only on the application submission loop and is thus completely stable, which even makes it possible to inject the static schedule into a native execution of the real application.
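
Since every paragraph of sched.rec refers to a task through its SubmitOrder, it can be useful to check a hand-written schedule against tasks.rec before replaying it. The following is a minimal sketch, assuming that both files write the field as "SubmitOrder: N" on its own line; check_submit_orders is a hypothetical helper, not a StarPU tool.

```shell
# check_submit_orders is a hypothetical helper, not part of StarPU.
# It prints every SubmitOrder value used in the schedule file that does
# not appear in the task file, and returns non-zero if there is any.
check_submit_orders() {
    tasks=$1   # e.g. 40x40/tasks.rec
    sched=$2   # e.g. sched.rec
    missing=$(awk '
        $1 == "SubmitOrder:" {
            if (NR == FNR) known[$2] = 1          # first file: tasks.rec
            else if (!($2 in known)) print $2     # second file: sched.rec
        }' "$tasks" "$sched")
    if [ -n "$missing" ]; then
        # $missing is left unquoted on purpose: one output line per value.
        printf 'unknown SubmitOrder: %s\n' $missing
        return 1
    fi
}
```

It would be called as, for example, check_submit_orders 40x40/tasks.rec sched.rec.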

The paragraph can then optionally contain several kinds of scheduling directives, either to force task placement or to guide the StarPU dynamic scheduler:

A completely static schedule can be obtained by setting, for each task, both the SpecificWorker and the Workerorder fields, which respectively specify the worker the task shall run on and its ordering on that worker. For instance:

SubmitOrder: 0
SpecificWorker: 0
Workerorder: 2

SubmitOrder: 1
SpecificWorker: 1
Workerorder: 0

SubmitOrder: 2
SpecificWorker: 0
Workerorder: 1

will force tasks 0 and 2 to be executed on worker 0 and task 1 on worker 1, with task 2 executed before task 0.
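
Writing such paragraphs by hand quickly becomes tedious for thousands of tasks. As a rough sketch, the following generates a fully static round-robin schedule in the same syntax. gen_round_robin is a hypothetical helper, not part of StarPU. It assumes that every worker is able to run every task, which does not hold in general on a heterogeneous platform, and it numbers Workerorder from 1 on each worker, which is an assumption about how StarPU interprets the field.

```shell
# Sketch: emit a fully static round-robin schedule in sched.rec syntax.
# gen_round_robin is a hypothetical helper, not part of StarPU.
gen_round_robin() {
    ntasks=$1     # number of submitted tasks (SubmitOrder 0..ntasks-1)
    nworkers=$2   # number of workers (worker IDs 0..nworkers-1)
    i=0
    while [ "$i" -lt "$ntasks" ]; do
        printf 'SubmitOrder: %d\n' "$i"
        printf 'SpecificWorker: %d\n' $((i % nworkers))
        # Assumption: per-worker order is numbered from 1.
        printf 'Workerorder: %d\n' $((i / nworkers + 1))
        printf '\n'     # an empty line ends the paragraph
        i=$((i + 1))
    done
}
```

For example, gen_round_robin 14000 4 > sched.rec would write one paragraph per task, with each worker running its tasks in submission order.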

When the SpecificWorker field is set for a task, or its Workers field corresponds to only one memory node, StarPU will automatically prefetch the data during execution. One can however also set prefetches by hand in sched.rec by using a paragraph containing:

This makes it possible, for instance, to give no precise task scheduling hints at all and only provide data prefetch hints, which will probably guide the scheduler towards a given data placement.