StarPU Micro Benchmarks

Tasks Overhead

  • This is the time to submit a task, from the main thread, with a tag dependency:

    tasks_overhead_per_task_submit.png

  • This is the time to submit a task, from the main thread, with one data dependency:

    tasks_overhead_per_task_submit_1.png

  • This is the time to execute an empty task, with a tag dependency:

    tasks_overhead_per_task_execution.png

  • This is the time to execute an empty task, with one data dependency:

    tasks_overhead_per_task_execution_1.png

  • This is the total time to submit & execute an empty task, with a tag dependency:

    tasks_overhead_per_task_submit_execution.png

  • This is the total time to submit & execute an empty task, with one data dependency:

    tasks_overhead_per_task_submit_execution_1.png
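The per-task figures above are presumably averages over many tasks. The single-threaded sketch below only illustrates that arithmetic under stated assumptions: it is not StarPU code, it ignores dependencies entirely, and `toy_task`, `toy_queue`, `toy_submit`, `toy_execute_all` and `measure_overhead` are hypothetical names. Submission time and execution time are measured separately and each is divided by the number of tasks; their sum corresponds to the "submit & execute" total.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical task descriptor: a function pointer and its argument. */
struct toy_task {
    void (*func)(void *arg);
    void *arg;
};

/* Hypothetical FIFO standing in for a runtime's pool of ready tasks. */
struct toy_queue {
    struct toy_task *tasks;
    size_t head, tail, capacity;
};

static void empty_kernel(void *arg) { (void)arg; }

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e9 + ts.tv_nsec;
}

/* "Submission": record the task in the queue. */
static void toy_submit(struct toy_queue *q, void (*func)(void *), void *arg)
{
    assert(q->tail < q->capacity);
    q->tasks[q->tail++] = (struct toy_task){ func, arg };
}

/* "Execution": pop every queued task and run it. */
static void toy_execute_all(struct toy_queue *q)
{
    while (q->head < q->tail) {
        struct toy_task t = q->tasks[q->head++];
        t.func(t.arg);
    }
}

/* Average per-task submission and execution times, in nanoseconds. */
static void measure_overhead(size_t ntasks, double *submit_ns, double *exec_ns)
{
    struct toy_queue q = { malloc(ntasks * sizeof(struct toy_task)), 0, 0, ntasks };
    assert(q.tasks != NULL);

    double t0 = now_ns();
    for (size_t i = 0; i < ntasks; i++)
        toy_submit(&q, empty_kernel, NULL);
    double t1 = now_ns();
    toy_execute_all(&q);
    double t2 = now_ns();

    *submit_ns = (t1 - t0) / ntasks;
    *exec_ns = (t2 - t1) / ntasks;
    free(q.tasks);
}
```

For example, `measure_overhead(100000, &s, &e)` fills `s` and `e` with nanoseconds per task. A real runtime does much more per submission (dependency tracking, pushing to a scheduler, waking workers), so absolute numbers from such a toy are not comparable to the plots.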

Synchronous Tasks Overhead

  • This is the total time to submit & execute a synchronous task:

    sync_tasks_overhead_per_task.png

  • This is the total time to submit & execute a synchronous task, with one data dependency:

    sync_tasks_overhead_per_task_1.png

Asynchronous Tasks Overhead

  • This is the total time to submit & execute an asynchronous task without dependencies:

    async_tasks_overhead_per_task.png

  • This is the total time to submit & execute an asynchronous task with one data dependency:

    async_tasks_overhead_per_task_1.png
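The gap between the synchronous and asynchronous measurements can be illustrated with a one-worker toy runtime built on POSIX threads. This is a hypothetical sketch, not StarPU code; in StarPU a synchronous task is, as far as we know, one whose `synchronous` field is set so that submission blocks until the task has run. In synchronous mode below, every submission is followed by a wait, so each task pays a full submit, wake-up and completion round trip; in asynchronous mode all tasks are submitted first and waited for once. `toy_runtime`, `rt_submit`, `rt_wait_all` and `per_task_seconds` are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical one-worker runtime. Tasks are empty, so they are only counted. */
struct toy_runtime {
    pthread_mutex_t lock;
    pthread_cond_t cond;   /* broadcast on submission, completion and stop */
    unsigned long submitted, done;
    bool stop;
    pthread_t worker;
};

static void *worker_main(void *p)
{
    struct toy_runtime *rt = p;
    pthread_mutex_lock(&rt->lock);
    for (;;) {
        while (rt->done == rt->submitted && !rt->stop)
            pthread_cond_wait(&rt->cond, &rt->lock);
        if (rt->done == rt->submitted)
            break;                          /* stop requested and nothing pending */
        rt->done++;                         /* "execute" one empty task */
        pthread_cond_broadcast(&rt->cond);  /* wake a waiting submitter */
    }
    pthread_mutex_unlock(&rt->lock);
    return NULL;
}

static void rt_start(struct toy_runtime *rt)
{
    pthread_mutex_init(&rt->lock, NULL);
    pthread_cond_init(&rt->cond, NULL);
    rt->submitted = rt->done = 0;
    rt->stop = false;
    int rc = pthread_create(&rt->worker, NULL, worker_main, rt);
    assert(rc == 0);
    (void)rc;
}

static void rt_submit(struct toy_runtime *rt)
{
    pthread_mutex_lock(&rt->lock);
    rt->submitted++;
    pthread_cond_broadcast(&rt->cond);
    pthread_mutex_unlock(&rt->lock);
}

/* Blocks until every submitted task has completed. */
static void rt_wait_all(struct toy_runtime *rt)
{
    pthread_mutex_lock(&rt->lock);
    while (rt->done != rt->submitted)
        pthread_cond_wait(&rt->cond, &rt->lock);
    pthread_mutex_unlock(&rt->lock);
}

static void rt_stop(struct toy_runtime *rt)
{
    pthread_mutex_lock(&rt->lock);
    rt->stop = true;
    pthread_cond_broadcast(&rt->cond);
    pthread_mutex_unlock(&rt->lock);
    pthread_join(rt->worker, NULL);
    pthread_cond_destroy(&rt->cond);
    pthread_mutex_destroy(&rt->lock);
}

static double now_s(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Average wall-clock time to submit and complete one empty task. */
static double per_task_seconds(unsigned long ntasks, bool synchronous)
{
    struct toy_runtime rt;
    rt_start(&rt);
    double t0 = now_s();
    for (unsigned long i = 0; i < ntasks; i++) {
        rt_submit(&rt);
        if (synchronous)
            rt_wait_all(&rt);  /* block until this task is done */
    }
    rt_wait_all(&rt);          /* asynchronous mode: wait once at the end */
    double t1 = now_s();
    rt_stop(&rt);
    return (t1 - t0) / ntasks;
}
```

Comparing `per_task_seconds(n, true)` with `per_task_seconds(n, false)` typically exposes the cost of the per-task handshake with the worker, which the asynchronous mode amortizes.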

Bandwidth Overhead

  • This shows the memcpy speed obtained by an increasing number of cores in parallel. The main reference is "alone", where the other cores are completely idle (no thread running). The secondary references are "nop" and "sync", where the other cores are respectively spinning on "rep; nop" (the x86 pause instruction) and actively polling a global termination variable. The other curves show the case where the other cores are running the scheduler, looking for tasks to perform.
  • There are two sets of curves. The lower ones are obtained by filling the system contiguously. The upper ones ("interleaved") are obtained by skipping every other core, thus spanning multiple NUMA nodes more quickly and achieving more bandwidth, but also leaving every other core idle, which may disturb the memcpy transfer.

bandwidth.svg
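The "alone" and "sync" references can be approximated with the sketch below, under simplifying assumptions: a single thread performs the memcpy (whereas the plots scale the number of copying cores), `nspin` background threads poll a global termination variable ("sync"; `nspin = 0` gives "alone"), the "nop" variant is omitted because "rep; nop" needs x86 inline assembly, and no thread is pinned. `pick_core` is a hypothetical helper that only shows which core index the contiguous and interleaved placements would assign to the n-th copier. None of these names come from StarPU.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Global termination variable polled by the "sync" background threads. */
static atomic_bool g_terminate;

/* "sync" reference: keep re-reading the termination flag until it is set. */
static void *sync_spinner(void *unused)
{
    (void)unused;
    while (!atomic_load_explicit(&g_terminate, memory_order_relaxed))
        ;
    return NULL;
}

/* Hypothetical helper: core index given to the rank-th copier under
 * contiguous (0, 1, 2, ...) or interleaved (0, 2, 4, ...) placement.
 * This sketch only computes it; it does not pin any thread. */
static int pick_core(int rank, bool interleaved)
{
    return interleaved ? 2 * rank : rank;
}

/* memcpy bandwidth in bytes/s of the calling thread while nspin threads
 * poll the termination flag (nspin == 0 corresponds to "alone"). */
static double memcpy_bandwidth(size_t bytes, int reps, int nspin)
{
    /* Calling through a volatile pointer keeps the compiler from eliding copies. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;
    char *src = malloc(bytes), *dst = malloc(bytes);
    pthread_t *spinners = malloc((size_t)(nspin > 0 ? nspin : 1) * sizeof *spinners);
    assert(src && dst && spinners);
    memset(src, 1, bytes);
    memset(dst, 0, bytes);  /* fault the pages in before timing */

    atomic_store(&g_terminate, false);
    for (int i = 0; i < nspin; i++)
        pthread_create(&spinners[i], NULL, sync_spinner, NULL);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; r++)
        copy(dst, src, bytes);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    atomic_store(&g_terminate, true);
    for (int i = 0; i < nspin; i++)
        pthread_join(spinners[i], NULL);

    double secs = (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(src);
    free(dst);
    free(spinners);
    return (double)bytes * reps / secs;
}
```

Comparing `memcpy_bandwidth(size, reps, 0)` with increasing `nspin` gives a rough, unpinned analogue of the "alone" versus "sync" curves; reproducing the contiguous and interleaved sets would additionally require pinning threads to the cores returned by `pick_core`.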

Task Size Overhead

  • This shows the speedup obtained when running small task sizes on 60 cores of a 64-core machine. The highest curve (in blue) is for 4096µs tasks, the next curve (in green) is for 2048µs tasks, the next curve (in purple) is for 1024µs tasks, etc.

tasks_size_overhead_total.png
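Assuming speedup is defined as total sequential work divided by wall-clock time, it can be measured as in the sketch below: `ntasks` tasks that busy-wait for `task_us` microseconds are shared among `nthreads` POSIX threads through a single atomic counter, which is a hypothetical stand-in for a real scheduler, not how StarPU dispatches tasks. The ideal speedup equals the number of threads (60 in the plots); as `task_us` shrinks, the fixed per-task cost of fetching work becomes a larger fraction of each task, and the speedup tends to drop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

static double now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

/* Hypothetical workload: ntasks busy-wait tasks of task_us microseconds each,
 * handed out through one shared atomic counter (a stand-in for a scheduler). */
struct workload {
    atomic_long next;
    long ntasks;
    double task_us;
};

static void busy_wait_us(double us)
{
    double start = now_us();
    while (now_us() - start < us)
        ;
}

static void *worker(void *p)
{
    struct workload *w = p;
    /* Each fetch_add claims one task index until all tasks are taken. */
    while (atomic_fetch_add(&w->next, 1) < w->ntasks)
        busy_wait_us(w->task_us);
    return NULL;
}

/* Speedup = sequential work (ntasks * task_us) / parallel wall-clock time. */
static double task_size_speedup(long ntasks, double task_us, int nthreads)
{
    struct workload w = { .ntasks = ntasks, .task_us = task_us };
    atomic_init(&w.next, 0);
    pthread_t tids[256];
    assert(nthreads > 0 && nthreads <= 256);

    double t0 = now_us();
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, &w);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    double wall = now_us() - t0;

    return ntasks * task_us / wall;
}
```

Calling `task_size_speedup` with a large and a small `task_us` at the same thread count mirrors, in spirit, the spread between the curves; because tasks busy-wait on wall-clock time, the thread count should not exceed the number of available cores for the result to be meaningful.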

  • eager
tasks_size_overhead_total_eager.png tasks_size_overhead_eager.png
  • modular-eager-prefetching
tasks_size_overhead_total_modular-eager-prefetching.png tasks_size_overhead_modular-eager-prefetching.png
  • modular-eager
tasks_size_overhead_total_modular-eager.png tasks_size_overhead_modular-eager.png
  • prio
tasks_size_overhead_total_prio.png tasks_size_overhead_prio.png
  • modular-prio-prefetching
tasks_size_overhead_total_modular-prio-prefetching.png tasks_size_overhead_modular-prio-prefetching.png
  • modular-prio
tasks_size_overhead_total_modular-prio.png tasks_size_overhead_modular-prio.png
  • modular-eager-prio
tasks_size_overhead_total_modular-eager-prio.png tasks_size_overhead_modular-eager-prio.png
  • ws
tasks_size_overhead_total_ws.png tasks_size_overhead_ws.png
  • modular-ws
tasks_size_overhead_total_modular-ws.png tasks_size_overhead_modular-ws.png
  • lws
tasks_size_overhead_total_lws.png tasks_size_overhead_lws.png
  • graph_test
tasks_size_overhead_total_graph_test.png tasks_size_overhead_graph_test.png
  • dm
tasks_size_overhead_total_dm.png tasks_size_overhead_dm.png
  • dmda
tasks_size_overhead_total_dmda.png tasks_size_overhead_dmda.png
  • dmdar
tasks_size_overhead_total_dmdar.png tasks_size_overhead_dmdar.png
  • dmdap
tasks_size_overhead_total_dmdap.png tasks_size_overhead_dmdap.png
  • dmdas
tasks_size_overhead_total_dmdas.png tasks_size_overhead_dmdas.png
  • modular-heft2
tasks_size_overhead_total_modular-heft2.png tasks_size_overhead_modular-heft2.png
  • modular-heft
tasks_size_overhead_total_modular-heft.png tasks_size_overhead_modular-heft.png
  • modular-heft-prio
tasks_size_overhead_total_modular-heft-prio.png tasks_size_overhead_modular-heft-prio.png
  • modular-heteroprio
tasks_size_overhead_total_modular-heteroprio.png tasks_size_overhead_modular-heteroprio.png
  • modular-gemm
tasks_size_overhead_total_modular-gemm.png tasks_size_overhead_modular-gemm.png
  • dmdasd
tasks_size_overhead_total_dmdasd.png tasks_size_overhead_dmdasd.png
  • heteroprio
tasks_size_overhead_total_heteroprio.png tasks_size_overhead_heteroprio.png
  • random
tasks_size_overhead_total_random.png tasks_size_overhead_random.png
  • modular-random
tasks_size_overhead_total_modular-random.png tasks_size_overhead_modular-random.png
  • modular-random-prefetching
tasks_size_overhead_total_modular-random-prefetching.png tasks_size_overhead_modular-random-prefetching.png
  • modular-random-prio
tasks_size_overhead_total_modular-random-prio.png tasks_size_overhead_modular-random-prio.png
  • modular-random-prio-prefetching
tasks_size_overhead_total_modular-random-prio-prefetching.png tasks_size_overhead_modular-random-prio-prefetching.png
  • peager
tasks_size_overhead_total_peager.png tasks_size_overhead_peager.png
  • pheft
tasks_size_overhead_total_pheft.png tasks_size_overhead_pheft.png
  • eager

tasks_size_overhead_total_2048_eager.png tasks_size_overhead_total_1024_eager.png

  • ws

tasks_size_overhead_total_512_ws.png tasks_size_overhead_total_256_ws.png

  • heft

tasks_size_overhead_total_2048_heft.png tasks_size_overhead_total_1024_heft.png

  • random

tasks_size_overhead_total_2048_random.png tasks_size_overhead_total_1024_random.png

  • misc

tasks_size_overhead_total_2048_misc.png tasks_size_overhead_total_1024_misc.png

Registering a Matrix as a Vector

matrix_as_vector_STARPU_CPU.png

matrix_as_vector_STARPU_CPU_size_1024.png matrix_as_vector_STARPU_CPU_size_2048.png matrix_as_vector_STARPU_CPU_size_4096.png matrix_as_vector_STARPU_CPU_size_8192.png matrix_as_vector_STARPU_CPU_size_16384.png matrix_as_vector_STARPU_CPU_size_32768.png matrix_as_vector_STARPU_CPU_size_65536.png matrix_as_vector_STARPU_CPU_size_131072.png matrix_as_vector_STARPU_CPU_size_262144.png matrix_as_vector_STARPU_CPU_size_524288.png matrix_as_vector_STARPU_CPU_size_1048576.png

Author: gitlab-starpu

Created: 2024-04-26 Fri 14:52
