Task Graph Market
This page gathers a series of task graphs which can be given as input to starpu_replay for replaying real-world applications. To get starpu_replay, one needs a version of StarPU configured with --enable-simgrid. One can then run the different task graph cases; see more details below.
Dense Linear Algebra
Cholesky
Cholesky factorization from the StarPU source code, benchmarked on research platforms.
Chameleon
Dense linear algebra from the Chameleon project, benchmarked on research platforms.
How to run this
- First install SimGrid. On Debian-based systems you can simply install the libsimgrid-dev and libboost-dev packages.
- Download the latest 1.3 branch nightly snapshot of StarPU.
- Compile it with SimGrid support enabled (no need to build it all, src/ and tools/ are enough):
cd $STARPU
./configure --enable-simgrid
make -C src
make -C tools
- Download one of the examples from the market above, for instance:
wget https://files.inria.fr/starpu/market/cholesky.tgz
tar xf cholesky.tgz
cd cholesky
- See its README file for some execution examples, and try them, for instance:
STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec
- Which yields:
Read task 14000... done.
Submitted task order 14000... done.
Executed task 11000... done.
9900.77 ms 1976.13 GF/s
- You can re-run it as many times as desired; the resulting performance will always be the same.
Other matrix sizes can be set with the different tasks.rec files:
$ STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 10x10/tasks.rec 2> /dev/null
298.476 ms 1121.6 GF/s
$ STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 20x20/tasks.rec 2> /dev/null
1443.13 ms 1751.45 GF/s
$ STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 30x30/tasks.rec 2> /dev/null
4357.02 ms 1915.96 GF/s
$ STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
9900.77 ms 1976.13 GF/s
Other scheduling algorithms can be set with STARPU_SCHED:
$ STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
9900.77 ms 1976.13 GF/s
$ STARPU_SCHED=dmdar STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
10506.7 ms 1862.16 GF/s
$ STARPU_SCHED=dmda STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
10510.7 ms 1861.45 GF/s
$ STARPU_SCHED=lws STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
12403.5 ms 1577.4 GF/s
The scheduling algorithms can be tuned with e.g. STARPU_SCHED_BETA:
$ STARPU_SCHED_BETA=1 STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
9900.77 ms 1976.13 GF/s
$ STARPU_SCHED_BETA=2 STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
9895.55 ms 1977.17 GF/s
$ STARPU_SCHED_BETA=10 STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
10137.7 ms 1929.94 GF/s
$ STARPU_SCHED_BETA=100 STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null
13660.1 ms 1432.29 GF/s
The simulation itself is sequential, but you can run several of them in parallel:
( for size in 10 20 30 40 50 60 ; do
STARPU_SCHED=dmdas STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay ${size}x${size}/tasks.rec 2> /dev/null | sed -e "s/^/$size /" &
done ) | sort
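Similarly, the scheduler comparison above can be run in parallel with a small shell loop; this is just a convenience sketch reusing the commands shown earlier:
( for sched in dmdas dmdar dmda lws ; do
STARPU_SCHED=$sched STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling $STARPU/tools/starpu_replay 40x40/tasks.rec 2> /dev/null | sed -e "s/^/$sched /" &
done ) | sort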
How to generate static scheduling
The examples above were using the StarPU dynamic schedulers. One can inject a static schedule by adding a sched.rec file into the play.
The tasks.rec file follows the recutils format: paragraphs are separated by an empty line. Each paragraph represents a task to be executed, with a lot of information, some of which comes from the native execution that was performed when the trace was recorded:
- The Model field identifies the performance model to be used (see more in the next paragraph).
- The JobId field uniquely identifies the task.
- The SubmitOrder field also uniquely identifies the task, but according to the task submission order, which is thus stable.
- The DependsOn field provides the list of identifiers of the tasks that this task depends on.
- The Priority field provides the priority set by the application (higher is more urgent).
- The WorkerId field provides the worker on which the task was executed when the trace was recorded.
- The MemoryNode field provides the corresponding memory node on which the task was executed.
- The SubmitTime field provides the time when the task was submitted by the application. The scheduler usually does not care about this.
- The StartTime and EndTime fields provide the times when the task was started and finished.
- The GFlop field provides the number of billions of floating-point operations performed by the task.
- The Parameters field provides a description of the task parameters.
- The Handles field provides the pointers of the task parameters. These can be used to relate the data input and output of tasks.
- The Modes field provides the access modes of the task parameters: (R)ead-only, (R)ead-and-(W)rite, or (W)rite-only.
- The Sizes field provides the sizes of the task parameters, in bytes.
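Since the file follows the recutils format, such records are easy to post-process. The following Python sketch parses blank-line-separated paragraphs into dictionaries; note that the sample record is hypothetical (the field names come from the list above, but the values are made up for illustration):

```python
# Minimal recutils-style parser for tasks.rec paragraphs.
# The sample record is hypothetical: field names match the description
# above, but the values are made-up illustrations, not a real trace.
sample = """\
Model: chol_model_22
JobId: 42
SubmitOrder: 17
DependsOn: 40 41
Priority: 1
WorkerId: 3
MemoryNode: 1
GFlop: 2.81
"""

def parse_rec(text):
    """Split recutils text into one dict per blank-line-separated paragraph."""
    records = []
    for para in text.strip().split("\n\n"):
        rec = {}
        for line in para.splitlines():
            key, _, value = line.partition(": ")
            rec[key] = value
        records.append(rec)
    return records

tasks = parse_rec(sample)
print(tasks[0]["Model"])      # chol_model_22
print(tasks[0]["DependsOn"])  # 40 41
```

Real tasks.rec files contain many such paragraphs; the same function returns one dictionary per task.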
The performance of tasks on the different execution units can be obtained by running starpu_perfmodel_recdump:
$ STARPU_HOSTNAME=mirage STARPU_PERF_MODEL_DIR=$PWD/sampling starpu_perfmodel_recdump
which first emits, in a %rec: timing section, a series of paragraphs, one per set of measurements made for the same kind of task on the same data size. Each paragraph contains:
- The Name field, which is the name of the performance model, as referenced by the Model field in a task paragraph.
- The Architecture field, which describes the architecture on which the set of measurements was made.
- The Footprint field, which describes the data description footprint, as referenced by the Footprint field in a task paragraph. It is roughly a summary of the task parameters' sizes.
- The Size field, which provides the total size of the task parameters, in bytes.
- The Flops field, which provides the number of floating-point operations that were performed by the task.
- The Mean field, which provides the average of the measurements in the set.
- The Stddev field, which provides the standard deviation of the measurements in the set.
- The Samples field, which provides the number of measurements that were made.
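A timing paragraph could for instance look as follows; all values here are made up for illustration, not actual measurements:
Name: chol_model_22
Architecture: cuda0
Footprint: 8bd3c706
Size: 7372800
Flops: 2817000000
Mean: 1423.5
Stddev: 12.7
Samples: 42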
Then the %rec: worker_count section describes the target platform, with one paragraph per kind of execution unit:
- The Architecture field provides the name of the type of execution unit, as referenced in the Architecture field of the paragraphs mentioned above.
- The NbWorkers field provides the number of workers of this type.
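For instance, a platform with 8 CPU cores and one GPU might be described as follows (hypothetical values):
Architecture: cpu
NbWorkers: 8

Architecture: cuda0
NbWorkers: 1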
Then the %rec: memory_workers section describes the memory layout of the target platform, with one paragraph per memory node:
- The MemoryNode field provides the memory node number.
- The Name field provides a user-friendly name for the memory node.
- The Size field provides the amount of available space in the memory node (-1 if it is considered unbounded).
- The Workers field provides the list of worker IDs using the memory node.
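A memory node paragraph could for instance look like this (the values are illustrative, not taken from a real platform):
MemoryNode: 1
Name: CUDA 0
Size: 6039339008
Workers: 8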
Worker IDs are numbered starting from 0, according to the order of the paragraphs in the %rec: worker_count section.
A static schedule can then be expressed by producing a sched.rec file containing one paragraph per task. Each paragraph must contain a SubmitOrder field holding the submission identifier (as referenced in the SubmitOrder field of tasks.rec). The JobId is not used because StarPU may generate internal tasks, which would shift the job IDs. The SubmitOrder, on the contrary, depends only on the application submission loop, and is thus completely stable, making it even possible to inject the static schedule into a native execution with the real application.
The paragraph can then optionally contain several kinds of scheduling directives, either to force task placement, or to guide the StarPU dynamic scheduler:
- Priority overrides the application-provided priority, and may be taken into account by a StarPU dynamic scheduler.
- SpecificWorker specifies the worker ID on which this task will be executed.
- Workers provides a list of workers on which the task is allowed to execute. This restricts the execution location without necessarily deciding it completely, e.g. to specify a given memory node or worker type.
- DependsOn provides a list of submission IDs of tasks that this task should be made to depend on, in addition to the dependencies set in tasks.rec. This allows injecting artificial dependencies into the task graph.
- Workerorder forces a specific ordering of tasks on a given worker (whose ID must also be set with SpecificWorker). The tasks will be executed in the given contiguous order, i.e. the worker will wait for the task with workerorder 1 to be submitted, then execute it, then wait for the task with workerorder 2 to be submitted, then execute it, etc. For a given worker, the Workerorder fields of the tasks executed on it thus have to be strictly contiguous, starting from 1.
A completely static schedule can for instance be set by providing, for each task, both the SpecificWorker and the Workerorder fields, thus specifying on which worker each task shall run, and its ordering on that worker. For instance:
SubmitOrder: 0
SpecificWorker: 0
Workerorder: 2

SubmitOrder: 1
SpecificWorker: 1
Workerorder: 1

SubmitOrder: 2
SpecificWorker: 0
Workerorder: 1
will force tasks 0 and 2 to be executed on worker 0 while task 1 is executed on worker 1, and task 2 will be executed before task 0.
When the SpecificWorker field is set for a task, or its Workers field corresponds to only one memory node, StarPU will automatically prefetch the data during execution. One can however also set prefetches by hand in sched.rec by using a paragraph containing:
- A Prefetch field, which specifies the submission ID of the task for which data should be prefetched.
- A MemoryNode field, which specifies the memory node on which the data should be prefetched.
- A Parameters field, which specifies the indexes of the task parameters that should be prefetched (currently only one at a time is supported).
- An optional DependsOn field, to make this prefetch wait for the tasks whose submission IDs are provided.
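For instance, a hypothetical prefetch paragraph could look like this (the IDs are purely illustrative):
Prefetch: 17
MemoryNode: 1
Parameters: 0
DependsOn: 15 16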
This for instance makes it possible not to specify precise task scheduling hints, but to provide data prefetch hints which will likely guide the scheduler toward a given data placement.