Publications

A good overview is available in the following Research Report ‘StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines’.

How to cite StarPU
When citing StarPU in general, please reference ‘StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures’.

General Presentations

Samuel Thibault. On Runtime Systems for Task-based Programming on Heterogeneous Platforms. Habilitation à diriger des recherches, Université de Bordeaux, December 2018.
Cédric Augonnet. Scheduling Tasks over Multicore machines enhanced with Accelerators: a Runtime System’s Perspective. PhD thesis, Université de Bordeaux, December 2011.
Cédric Augonnet, Samuel Thibault, Raymond Namyst, and Pierre-André Wacrenier. StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures. CCPE - Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par 2009, 23:187-198, February 2011. [doi:10.1002/cpe.1631]
Cédric Augonnet, Samuel Thibault, and Raymond Namyst. StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines. Research Report RR-7240, INRIA, March 2010.
Cédric Augonnet. StarPU: un support exécutif unifié pour les architectures multicoeurs hétérogènes. In 19èmes Rencontres Francophones du Parallélisme, Toulouse, France, September 2009. Note: Best Paper Award.
Cédric Augonnet, Samuel Thibault, Raymond Namyst, and Pierre-André Wacrenier. StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures. In Euro-Par - 15th International Conference on Parallel Processing, volume 5704 of LNCS, Delft, The Netherlands, pages 863-874, August 2009. Springer. [doi:10.1007/978-3-642-03869-3_80]
Cédric Augonnet. Vers des supports d’exécution capables d’exploiter les machines multicoeurs hétérogènes. Master Thesis, Université de Bordeaux, June 2008.
Cédric Augonnet and Raymond Namyst. A unified runtime system for heterogeneous multicore architectures. In Proceedings of the International Euro-Par Workshops 2008, HPPC’08, volume 5415 of LNCS, Las Palmas de Gran Canaria, Spain, pages 174-183, August 2008. Springer. ISBN: 978-3-642-00954-9. [doi:10.1007/978-3-642-00955-6_22]

On Composability

Andra-Ecaterina Hugo. Composability of parallel codes on heterogeneous architectures. Ph.D Thesis, Université de Bordeaux, December 2014.
Andra Hugo, Abdou Guermouche, Pierre-André Wacrenier, and Raymond Namyst. Composing multiple StarPU applications over heterogeneous machines: A supervised approach. International Journal of High Performance Computing Applications, 28:285 - 300, February 2014. [doi:10.1177/1094342014527575]
A.-E Hugo, A Guermouche, P.-A Wacrenier, and R Namyst. A runtime approach to dynamic resource allocation for sparse direct solvers. In 43rd International Conference on Parallel Processing, Minneapolis, United States, September 2014. [doi:10.1109/ICPP.2014.57]
Andra Hugo. Le problème de la composition parallèle : une approche supervisée. In 21èmes Rencontres Francophones du Parallélisme (RenPar’21), Grenoble, France, January 2013.
Andra Hugo, Abdou Guermouche, Raymond Namyst, and Pierre-André Wacrenier. Composing multiple StarPU applications over heterogeneous machines: a supervised approach. In Third International Workshop on Accelerators and Hybrid Exascale Systems, Boston, USA, May 2013. [doi:10.1177/1094342014527575]
Andra Hugo. Composabilité de codes parallèles sur architectures hétérogènes. Master Thesis, Université de Bordeaux, June 2011.

On Parallel Tasks

Terry Cojean. Programmation of heterogeneous architectures using moldable tasks. Ph.D Thesis, Université de Bordeaux, March 2018.
Terry Cojean, Abdou Guermouche, Andra Hugo, Raymond Namyst, and Pierre-André Wacrenier. Resource aggregation for task-based Cholesky Factorization on top of modern architectures. Parallel Computing, 83:73-92, November 2016. Note: This paper is submitted for review to the Parallel Computing special issue for HCW and HeteroPar 16 workshops. [doi:10.1016/j.parco.2018.10.007]
Olivier Beaumont, Terry Cojean, Lionel Eyraud-Dubois, Abdou Guermouche, and Suraj Kumar. Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources. In International Conference on High Performance Computing, Data, and Analytics (HiPC), Hyderabad, India, December 2016. [doi:10.1109/HiPC.2016.045]
Terry Cojean. Exploiting Two-Level Parallelism by Aggregating Computing Resources in Task-Based Applications Over Accelerator-Based Machines. In SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP 2016), Paris, France, April 2016.
Terry Cojean. The StarPU Runtime System at Exascale? In RESPA workshop at SC16, Salt Lake City, Utah, United States, November 2016.
Terry Cojean, Abdou Guermouche, Andra Hugo, Raymond Namyst, and Pierre-André Wacrenier. Resource aggregation for task-based Cholesky Factorization on top of heterogeneous machines. In HeteroPar’2016 workshop of Euro-Par, Grenoble, France, August 2016.
Terry Cojean, Abdou Guermouche, Andra-Ecaterina Hugo, Raymond Namyst, and Pierre-André Wacrenier. Resource aggregation in task-based applications over accelerator-based multicore machines. In HeteroPar’2016 workshop of Euro-Par, Grenoble, France, August 2016.

On Recursive Tasks

Mathieu Faverge, Nathalie Furmento, Abdou Guermouche, Gwenolé Lucas, Raymond Namyst, Samuel Thibault, and Pierre-André Wacrenier. Programming Heterogeneous Architectures Using Hierarchical Tasks. Concurrency and Computation: Practice and Experience, 2023. [doi:10.1002/cpe.7811]
Samuel Thibault. Vector operations, tiled operations, distributed execution, task graphs, … What next? In 15th Joint Laboratory for Extreme Scale Computing (JLESC) Workshop, Talence, France, March 2023.
Mathieu Faverge, Nathalie Furmento, Abdou Guermouche, Gwenolé Lucas, Raymond Namyst, Samuel Thibault, and Pierre-André Wacrenier. Programming Heterogeneous Architectures Using Hierarchical Tasks. In HeteroPar 2022, Glasgow, United Kingdom, pages 12, August 2022. [doi:10.1007/978-3-031-31209-0_7]
Mathieu Faverge, Nathalie Furmento, Abdou Guermouche, Gwenolé Lucas, Samuel Thibault, and Pierre-André Wacrenier. Programmation des architectures hétérogènes à l’aide de tâches hiérarchiques. In COMPAS 2022 - Conférence francophone d’informatique en Parallélisme, Architecture et Système, Amiens, France, July 2022.
Mathieu Faverge, Nathalie Furmento, Gwenolé Lucas, Abdou Guermouche, Raymond Namyst, Samuel Thibault, and Pierre-André Wacrenier. Programming Heterogeneous Architectures Using Hierarchical Tasks. Research Report RR-9466, Inria Bordeaux Sud-Ouest, March 2022.
Arthur Chevalier. Critical resources management and scheduling under StarPU. Master Thesis, Université de Bordeaux, September 2017.
Terry Cojean. The StarPU Runtime System at Exascale? In RESPA workshop at SC16, Salt Lake City, Utah, United States, November 2016.

On Scheduling

Maxime Gonthier, Loris Marchal, and Samuel Thibault. Taming data locality for task scheduling under memory constraint in runtime systems. Future Generation Computer Systems, 2023. [doi:10.1016/j.future.2023.01.024]
Maxime Gonthier, Samuel Thibault, and Loris Marchal. Memory-Aware Scheduling of Tasks Sharing Data on Multiple GPUs with Dynamic Runtime Systems. In IPDPS 2022 - 36th IEEE International Parallel & Distributed Processing Symposium, Lyon, France, May 2022. IEEE. [doi:10.1109/IPDPS53621.2022.00073]
Maxime Gonthier, Loris Marchal, and Samuel Thibault. Locality-Aware Scheduling of Independent Tasks for Runtime Systems. In COLOC: 5th workshop on data locality - 7th International European Conference on Parallel and Distributed Computing Workshops, Lisbon, Portugal, August 2021. [doi:10.1007/978-3-031-06156-1_1]
Maxime Gonthier, Loris Marchal, and Samuel Thibault. Locality-Aware Scheduling of Independant Tasks for Runtime Systems. Research Report RR-9394, Inria, 2021.
Bérenger Bramas. Impact study of data locality on task-based applications through the Heteroprio scheduler. PeerJ Computer Science, May 2019. [doi:10.7717/peerj-cs.190]
Christophe Alias, Samuel Thibault, and Laure Gonnord. A Compiler Algorithm to Guide Runtime Scheduling. Research Report RR-9315, INRIA Grenoble ; INRIA Bordeaux, December 2019.
Suraj Kumar. Scheduling of Dense Linear Algebra Kernels on Heterogeneous Resources. PhD thesis, Université de Bordeaux, April 2017.
O. Beaumont, L. Eyraud-Dubois, and S. Kumar. Approximation Proofs of a Fast and Efficient List Scheduling Algorithm for Task-Based Runtime Systems on Multicores and GPUs. In 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 768-777, May 2017. [doi:10.1109/IPDPS.2017.71]
Emmanuel Agullo, Olivier Beaumont, Lionel Eyraud-Dubois, and Suraj Kumar. Are Static Schedules so Bad? A Case Study on Cholesky Factorization. In Proceedings of the 30th IEEE International Parallel & Distributed Processing Symposium, IPDPS’16, Chicago, IL, USA, May 2016. IEEE. [doi:10.1109/IPDPS.2016.90]
Johan Janzén, David Black-Schaffer, and Andra Hugo. Partitioning GPUs for Improved Scalability. In IEEE 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), October 2016. [doi:10.1109/SBAC-PAD.2016.14]
Emmanuel Agullo, Olivier Beaumont, Lionel Eyraud-Dubois, Julien Herrmann, Suraj Kumar, Loris Marchal, and Samuel Thibault. Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms. In HCW’2015 - Heterogeneity in Computing Workshop of IPDPS, Hyderabad, India, May 2015. [doi:10.1109/IPDPSW.2015.35]
Marc Sergent and Simon Archipoff. Modulariser les ordonnanceurs de tâches : une approche structurelle. In Compas’2014, Neuchâtel, Switzerland, April 2014.
Cédric Augonnet, Jérôme Clet-Ortega, Samuel Thibault, and Raymond Namyst. Data-Aware Task Scheduling on Multi-Accelerator based Platforms. In The 16th International Conference on Parallel and Distributed Systems (ICPADS), Shanghai, China, December 2010. [doi:10.1109/ICPADS.2010.129]

On Performance Visualization

Alexandre Denis, Emmanuel Jeannot, Philippe Swartvagher, and Samuel Thibault. Tracing task-based runtime systems: Feedbacks from the StarPU case. Concurrency and Computation: Practice and Experience, pp 24, October 2023. [doi:10.1002/cpe.7920]
Lucas Leandro Nesi, Vinicius Garcia Pinto, Lucas Mello Schnorr, and Arnaud Legrand. Summarizing task-based applications behavior over many nodes through progression clustering. In PDP 2023 - 31st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, Naples, Italy, pages 1-8, March 2023. [doi:10.1109/PDP59025.2023.00014]
Marcelo Cogo Miletto, Lucas Leandro Nesi, Lucas Mello Schnorr, and Arnaud Legrand. Performance Analysis of Irregular Task-Based Applications on Hybrid Platforms: Structure Matters. Future Generation Computer Systems, 135, October 2022. [doi:10.1016/j.future.2022.05.013]
Vinicius Garcia Pinto, Lucas Leandro Nesi, Marcelo Cogo Miletto, and Lucas Mello Schnorr. Providing In-depth Performance Analysis for Heterogeneous Task-based Applications with StarVZ. In 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021. [doi:10.1109/IPDPSW52791.2021.00013]
Lucas Leandro Nesi, Samuel Thibault, Luka Stanisic, and Lucas Mello Schnorr. Visual Performance Analysis of Memory Behavior in a Task-Based Runtime on Hybrid Platforms. In 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), Larnaca, Cyprus, pages 142-151, May 2019. IEEE. [doi:10.1109/CCGRID.2019.00025]
Vinicius Garcia Pinto, Lucas Mello Schnorr, Luka Stanisic, Arnaud Legrand, Samuel Thibault, and Vincent Danjean. A Visual Performance Analysis Framework for Task-based Parallel Applications running on Hybrid Clusters. CCPE - Concurrency and Computation: Practice and Experience, 30, April 2018. [doi:10.1002/cpe.4472]
Vinicius Garcia Pinto, Lucas Mello Schnorr, Arnaud Legrand, Samuel Thibault, Luka Stanisic, and Vincent Danjean. Detecção de Anomalias de Desempenho em Aplicações de Alto Desempenho baseadas em Tarefas em Clusters Híbridos. In WPerformance - 17o Workshop em Desempenho de Sistemas Computacionais e de Comunicação, Natal, Brazil, July 2018.
Vinicius Garcia Pinto, Luka Stanisic, Arnaud Legrand, Lucas Mello Schnorr, Samuel Thibault, and Vincent Danjean. Analyzing Dynamic Task-Based Applications on Hybrid Platforms: An Agile Scripting Approach. In VPA - 3rd Workshop on Visual Performance Analysis, Salt Lake City, USA, November 2016. Note: Held in conjunction with SC16. [doi:10.1109/VPA.2016.008]

On The C Extensions

Ludovic Courtès. C Language Extensions for Hybrid CPU/GPU Programming with StarPU. Research Report RR-8278, INRIA, April 2013.

On OpenMP Support on top of StarPU

Emmanuel Agullo, Olivier Aumage, Berenger Bramas, Olivier Coulaud, and Samuel Pitoiset. Bridging the gap between OpenMP and task-based runtime systems for the fast multipole method. IEEE Transactions on Parallel and Distributed Systems, April 2017. [doi:10.1109/TPDS.2017.2697857]
Emmanuel Agullo, Olivier Aumage, Berenger Bramas, Olivier Coulaud, and Samuel Pitoiset. Bridging the gap between OpenMP 4.0 and native runtime systems for the fast multipole method. Research Report RR-8953, Inria, March 2016.
Philippe Virouleau, Pierrick Brunet, François Broquedis, Nathalie Furmento, Samuel Thibault, Olivier Aumage, and Thierry Gautier. Evaluation of OpenMP Dependent Tasks with the KASTORS Benchmark Suite. In IWOMP2014 - 10th International Workshop on OpenMP, Salvador, Brazil, pages 16 - 29, September 2014. Springer. [doi:10.1007/978-3-319-11454-5_2]

On MPI Support

Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, Julien Herrmann, and Antoine Jego. Task-based parallel programming for scalable matrix product algorithms. ACM Transactions on Mathematical Software, 2023. [doi:10.1145/3583560]
Romain Lion. Réplication de données pour la tolérance aux pannes dans un support d’exécution distribué à base de tâches. Theses, Université de Bordeaux, December 2022.
Philippe Swartvagher. On the Interactions between HPC Task-based Runtime Systems and Communication Libraries. Theses, Université de Bordeaux, November 2022.
Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M Ciorba, Nathan Debardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N Gansterer, Luc Giraud, Dominik Göddeke, Marco Heisig, Fabienne Jézéquel, Nils Kohl, Sherry Xiaoye, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S Quintana-Ortí, Francesco Rizzi, Ulrich Rüde, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thönnes, Andreas Wagner, and Barbara Wohlmuth. Resiliency in numerical algorithm design for extreme scale simulations. International Journal of High Performance Computing Applications, September 2021. [doi:10.1177/10943420211055188]
Alexandre Denis, Emmanuel Jeannot, Philippe Swartvagher, and Samuel Thibault. Using Dynamic Broadcasts to improve Task-Based Runtime Performances. In Euro-Par - 26th International European Conference on Parallel and Distributed Computing, Warsaw, Poland, August 2020. Rzadca and Malawski, Springer. [doi:10.1007/978-3-030-57675-2_28]
Romain Lion and Samuel Thibault. From tasks graphs to asynchronous distributed checkpointing with local restart. In 2020 IEEE/ACM 10th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS), Atlanta, USA, November 2020. [doi:10.1109/FTXS51974.2020.00009]
Romain Lion. Tolérance aux pannes dans l’exécution distribuée de graphes de tâches. In Conférence d’informatique en Parallélisme, Architecture et Système, Anglet, France, June 2019.
Emmanuel Agullo, Olivier Aumage, Mathieu Faverge, Nathalie Furmento, Florent Pruvost, Marc Sergent, and Samuel Thibault. Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model. TPDS - IEEE Transactions on Parallel and Distributed Systems, December 2017. [doi:10.1109/TPDS.2017.2766064]
Marc Sergent. Scalability of a task-based runtime system for dense linear algebra applications. PhD thesis, Université de Bordeaux, December 2016.
Emmanuel Agullo, Olivier Aumage, Mathieu Faverge, Nathalie Furmento, Florent Pruvost, Marc Sergent, and Samuel Thibault. Harnessing clusters of hybrid nodes with a sequential task-based programming model. In 8th International Workshop on Parallel Matrix Algorithms and Applications, July 2014.
Cédric Augonnet, Olivier Aumage, Nathalie Furmento, Samuel Thibault, and Raymond Namyst. StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators. Research Report RR-8538, INRIA, May 2014.
Cédric Augonnet, Olivier Aumage, Nathalie Furmento, Raymond Namyst, and Samuel Thibault. StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators. In Siegfried Benkner Jesper Larsson Träff and Jack Dongarra, editors, EuroMPI 2012, volume 7490 of LNCS, September 2012. Springer. Note: Poster Session.

On Memory Control

Arthur Chevalier. Critical resources management and scheduling under StarPU. Master Thesis, Université de Bordeaux, September 2017.
Marc Sergent, David Goudin, Samuel Thibault, and Olivier Aumage. Controlling the Memory Subscription of Distributed Applications with a Task-Based Runtime System. In HIPS - 21st International Workshop on High-Level Parallel Programming Models and Supportive Environments, Chicago, USA, May 2016. [doi:10.1109/IPDPSW.2016.105]

On Performance Model Tuning

Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Luka Stanisic, and Samuel Thibault. Modeling Irregular Kernels of Task-based codes: Illustration with the Fast Multipole Method. Research Report RR-9036, INRIA Bordeaux, February 2017.
Cédric Augonnet, Samuel Thibault, and Raymond Namyst. Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures. In HPPC - Proceedings of the International Euro-Par Workshops, Highly Parallel Processing on a Chip, volume 6043 of LNCS, Delft, The Netherlands, pages 56-65, August 2009. Springer. [doi:10.1007/978-3-642-14122-5_9]

On The Simulation Support through SimGrid

Idriss Daoudi, Philippe Virouleau, Thierry Gautier, Samuel Thibault, and Olivier Aumage. sOMP: Simulating OpenMP Task-Based Applications with NUMA Effects. In IWOMP 2020 - 16th International Workshop on OpenMP, volume 12295 of LNCS, Austin, USA, September 2020. Springer. [doi:10.1007/978-3-030-58144-2_13]
Samuel Thibault, Luka Stanisic, and Arnaud Legrand. Faithful Performance Prediction of a Dynamic Task-based Runtime System, an Opportunity for Task Graph Scheduling. In SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP 2020), Seattle, USA, February 2020.
Luka Stanisic, Samuel Thibault, Arnaud Legrand, Brice Videau, and Jean-François Méhaut. Faithful Performance Prediction of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures. CCPE - Concurrency and Computation: Practice and Experience, pp 16, May 2015. [doi:10.1002/cpe.3555]
Luka Stanisic, Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, Arnaud Legrand, Florent Lopez, and Brice Videau. Fast and Accurate Simulation of Multithreaded Sparse Linear Algebra Solvers. In The 21st IEEE International Conference on Parallel and Distributed Systems, Melbourne, Australia, December 2015. [doi:10.1109/ICPADS.2015.67]
Luka Stanisic, Samuel Thibault, Arnaud Legrand, Brice Videau, and Jean-François Méhaut. Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures. In Euro-Par - 20th International Conference on Parallel Processing, Porto, Portugal, August 2014. Springer-Verlag. [doi:10.1007/978-3-319-09873-9_5]
Luka Stanisic, Samuel Thibault, Arnaud Legrand, Brice Videau, and Jean-François Méhaut. Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures. Research Report RR-8509, INRIA, March 2014.

On The Cell Support

Cédric Augonnet, Samuel Thibault, Raymond Namyst, and Maik Nijhuis. Exploiting the Cell/BE architecture with the StarPU unified runtime system. In SAMOS Workshop - International Workshop on Systems, Architectures, Modeling, and Simulation, volume 5657 of LNCS, Samos, Greece, July 2009. [doi:10.1007/978-3-642-03138-0_36]

Papers related to StarPU

Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, Julien Herrmann, and Antoine Jego. Task-Based Parallel Programming for Scalable Algorithms: application to Matrix Multiplication. Research Report 9461, Inria Bordeaux - Sud-Ouest, February 2022.
Alexandre Denis, Emmanuel Jeannot, and Philippe Swartvagher. Interferences between Communications and Computations in Distributed HPC Systems. In ICPP 2021 - 50th International Conference on Parallel Processing, Chicago / Virtual, United States, pages 11, August 2021. [doi:10.1145/3472456.3473516]
Elliott Slaughter, Wei Wu, Yuankun Fu, Legend Brandenburg, Nicolai Garcia, Wilhem Kautz, Emily Marx, Kaleb S. Morris, Qinglei Cao, George Bosilca, Seema Mirchandaney, Wonchan Lee, Sean Treichler, Patrick McCormick, and Alex Aiken. Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ‘20, 2020. IEEE Press. ISBN: 9781728199986. [doi:10.5555/3433701.3433783]
Peter Thoman, Kiril Dichev, Thomas Heller, Roman Iakymchuk, Xavier Aguilar, Khalid Hasanov, Philipp Gschwandtner, Pierre Lemarinier, Stefano Markidis, Herbert Jordan, and others. A taxonomy of task-based parallel programming technologies for high-performance computing. The Journal of Supercomputing, 74(4):1422-1434, 2018. [doi:10.1007/s11227-018-2238-4]
I. D. Mironescu and L. Vintan. Coloured Petri Net modelling of task scheduling on a heterogeneous computational node. In 2014 IEEE 10th International Conference on Intelligent Computer Communication and Processing (ICCP), pages 323-330, 2014. [doi:10.1109/ICCP.2014.6937016]

On Applications

Emmanuel Agullo, Olivier Coulaud, Alexandre Denis, Mathieu Faverge, Alain A. Franc, Jean-Marc Frigerio, Nathalie Furmento, Samuel Thibault, Adrien Guilbaud, Emmanuel Jeannot, Romain Peressoni, and Florent Pruvost. Task-based randomized singular value decomposition and multidimensional scaling. Research Report 9482, Inria Bordeaux - Sud Ouest ; Inrae - BioGeCo, September 2022.
Lazaros Papadopoulos, Dimitrios Soudris, Christoph Kessler, August Ernstsson, Johan Ahlqvist, Nikos Vasilas, Athanasios I Papadopoulos, Panos Seferlis, Charles Prouveur, Matthieu Haefele, Samuel Thibault, Athanasios Salamanis, Theodoros Ioakimidis, and Dionysios Kehagias. EXA2PRO: A Framework for High Development Productivity on Heterogeneous Computing Systems. IEEE Transactions on Parallel and Distributed Systems, August 2021. [doi:10.1109/TPDS.2021.3104257]
Rafael Alvares da Silva Lopes, Samuel Thibault, and Alba Cristina Magalhães Alves de Melo. MASA-StarPU: Parallel Sequence Comparison with Multiple Scheduling Policies and Pruning. In SBAC-PAD 2020 - IEEE 32nd International Symposium on Computer Architecture and High Performance Computing, Porto, Portugal, September 2020. [doi:10.1109/SBAC-PAD49847.2020.00039]
Georgios Tzanos, Vineet Soni, Charles Prouveur, Matthieu Haefele, Stavroula Zouzoula, Lazaros Papadopoulos, Samuel Thibault, Nicolas Vandenbergen, Dirk Pleiter, and Dimitrios Soudris. Applying StarPU runtime system to scientific applications: Experiences and lessons learned. In Parallel Optimization using/for Multi and Many-core High Performance Computing (POMCO), Barcelona, Spain, December 2020.
A. AlOnazi, H. Ltaief, D. Keyes, I. Said, and Samuel Thibault. Asynchronous Task-Based Execution of the Reverse Time Migration for the Oil and Gas Industry. In 2019 IEEE International Conference on Cluster Computing (CLUSTER), Albuquerque, USA, pages 1-11, September 2019. IEEE. [doi:10.1109/CLUSTER.2019.8891054]
Essadki, Mohamed, Jung, Jonathan, Larat, Adam, Pelletier, Milan, and Perrier, Vincent. A Task-Driven Implementation of a Simple Numerical Solver for Hyperbolic Conservation Laws. ESAIM: ProcS, 63:228-247, 2018. [doi:10.1051/proc/201863228]
Dimitrios Soudris, Lazaros Papadopoulos, Christoph W Kessler, Dionysios D Kehagias, Athanasios Papadopoulos, Panos Seferlis, Alexander Chatzigeorgiou, Apostolos Ampatzoglou, Samuel Thibault, Raymond Namyst, Dirk Pleiter, Georgi Gaydadjiev, Tobias Becker, and Matthieu Haefele. EXA2PRO programming environment. In SAMOS XVIII: Architectures, Modeling, and Simulation, Pythagorion, Greece, pages 202-209, July 2018. ACM. [doi:10.1145/3229631.3239369]
Jean Marie Couteyen Carpaye, Jean Roman, and Pierre Brenner. Design and Analysis of a Task-based Parallelization over a Runtime System of an Explicit Finite-Volume CFD Code with Adaptive Time Stepping. International Journal of Computational Science and Engineering, pp 1 - 22, 2017. [doi:10.1016/j.jocs.2017.03.008]
Olivier Aumage, Julien Bigot, Hélène Coullon, Christian Pérez, and Jérôme Richard. Combining Both a Component Model and a Task-Based Model for HPC Applications: A Feasibility Study on GYSELA. In 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), pages 635-644, 2017. [doi:10.1109/CCGRID.2017.88]
Emmanuel Agullo, Alfredo Buttari, Mikko Byckling, Abdou Guermouche, and Ian Masliah. Achieving high-performance with a sparse direct solver on Intel KNL. Research Report RR-9035, Inria Bordeaux Sud-Ouest ; CNRS-IRIT ; Intel corporation ; Université Bordeaux, February 2017.
Nolwenn Balin, Guillaume Sylvand, and Jérôme Robert. Fast methods applied to BEM solvers for acoustic propagation problems. In 22nd AIAA/CEAS Aeroacoustics Conference, pages 2712, 2016. [doi:10.2514/6.2016-2712]
Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Martin Khannouz, and Luka Stanisic. Task-based fast multipole method for clusters of multicore processors. Research Report RR-8970, Inria Bordeaux Sud-Ouest, October 2016.
E Agullo, L Giraud, A Guermouche, S Nakov, and Jean Roman. Task-based Conjugate Gradient: from multi-GPU towards heterogeneous architectures. Research Report 8912, Inria Bordeaux Sud-Ouest, May 2016.
Corentin Rossignon. A fine grain model programming for parallelization of sparse linear solver. PhD thesis, Université de Bordeaux, July 2015.
Víctor Martínez, David Michéa, Fabrice Dupros, Olivier Aumage, Samuel Thibault, Hideo Aochi, and Philippe Olivier Alexandre Navaux. Towards seismic wave modeling on heterogeneous many-core architectures using task-based runtime system. In SBAC-PAD - 27th International Symposium on Computer Architecture and High Performance Computing, Florianopolis, Brazil, October 2015. [doi:10.1109/SBAC-PAD.2015.33]
Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Eric Darve, Matthias Messner, and Toru Takahashi. Task-Based FMM for Multicore Architectures. SIAM Journal on Scientific Computing, 36(1):66-93, 2014. [doi:10.1137/130915662]
Sylvain Henry, Alexandre Denis, Denis Barthou, Marie-Christine Counilh, and Raymond Namyst. Toward OpenCL Automatic Multi-Device Support. In Fernando Silva, Ines Dutra, and Vitor Santos Costa, editors, Euro-Par 2014, Porto, Portugal, August 2014. Springer. [doi:10.1007/978-3-319-09873-9_65]
Xavier Lacoste, Mathieu Faverge, Pierre Ramet, Samuel Thibault, and George Bosilca. Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes. In HCW’2014 - Heterogeneity in Computing Workshop of IPDPS, Phoenix, USA, May 2014. IEEE. Note: RR-8446. [doi:10.1109/IPDPSW.2014.9]
Emmanuel Agullo, Berenger Bramas, Olivier Coulaud, Eric Darve, Matthias Messner, and Toru Takahashi. Task-based FMM for heterogeneous architectures. Research Report RR-8513, Inria Bordeaux - Sud-Ouest, April 2014.
Xavier Lacoste, Mathieu Faverge, Pierre Ramet, Samuel Thibault, and George Bosilca. Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes. Research Report RR-8446, INRIA, January 2014.
Emmanuel Agullo, Olivier Aumage, Mathieu Faverge, Nathalie Furmento, Florent Pruvost, Marc Sergent, and Samuel Thibault. Overview of Distributed Linear Algebra on Hybrid Nodes over the StarPU Runtime. SIAM Conference on Parallel Processing for Scientific Computing, February 2014.
Cyril Bordage. Ordonnancement dynamique, adapté aux architectures hétérogènes, de la méthode multipôle pour les équations de Maxwell, en électromagnétisme. PhD thesis, Université de Bordeaux, December 2013.
Sylvain Henry. Modèles de programmation et supports exécutifs pour architectures hétérogènes. PhD thesis, Université de Bordeaux, November 2013.
Sylvain Henry. ViperVM: a Runtime System for Parallel Functional High-Performance Computing on Heterogeneous Architectures. In 2nd Workshop on Functional High-Performance Computing (FHPC’13), Boston, USA, September 2013. [doi:10.1145/2502323.2502329]
Tetsuya Odajima, Taisuke Boku, Mitsuhisa Sato, Toshihiro Hanawa, Yuetsu Kodama, Raymond Namyst, Samuel Thibault, and Olivier Aumage. Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing. In ICA3PP-2013 - The 13th International Conference on Algorithms and Architectures for Parallel Processing, Vietri sul Mare, Italy, December 2013. [doi:10.1007/978-3-319-03889-6_7]
Satoshi Ohshima, Satoshi Katagiri, Kengo Nakajima, Samuel Thibault, and Raymond Namyst. Implementation of FEM Application on GPU with StarPU. In SIAM CSE13 - SIAM Conference on Computational Science and Engineering 2013, Boston, USA, February 2013. SIAM.
Corentin Rossignon. Optimisation du produit matrice-vecteur creux sur architecture GPU pour un simulateur de reservoir. In 21èmes Rencontres Francophones du Parallélisme (RenPar’21), Grenoble, France, January 2013.
Corentin Rossignon, Pascal Hénon, Olivier Aumage, and Samuel Thibault. A NUMA-aware fine grain parallelization framework for multi-core architecture. In PDSEC - 14th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing - 2013, Boston, USA, May 2013. [doi:10.1109/IPDPSW.2013.204]
Sylvain Henry, Denis Barthou, Alexandre Denis, Raymond Namyst, and Marie-Christine Counilh. SOCL: An OpenCL Implementation with Automatic Multi-Device Adaptation Support. Research Report RR-8346, INRIA, August 2013.
Sylvain Henry, Alexandre Denis, and Denis Barthou. Programmation unifiée multi-accélérateur OpenCL. Techniques et Sciences Informatiques, (8-9-10):1233-1249, 2012.
Sidi Ahmed Mahmoudi, Pierre Manneback, Cédric Augonnet, and Samuel Thibault. Traitements d’Images sur Architectures Parallèles et Hétérogènes. Technique et Science Informatiques, 31(8-10):1183-1203, 2012. [doi:10.3166/tsi.31.1183-1203]
Siegfried Benkner, Enes Bajrovic, Erich Marth, Martin Sandrieser, Raymond Namyst, and Samuel Thibault. High-Level Support for Pipeline Parallelism on Many-Core Architectures. In Euro-Par - 18th International Conference on Parallel Processing, Rhodes Island, Greece, August 2012. [doi:10.1007/978-3-642-32820-6_61]
Cyril Bordage. Parallelization on Heterogeneous Multicore and Multi-GPU Systems of the Fast Multipole Method for the Helmholtz Equation Using a Runtime System. In ADVCIMP12, ADVCOMP 2012, The Sixth International Conference on Advanced Engineering Computing and Applications in Sciences, Barcelona, Spain, pages 90-95, September 2012. IARIA.
Christoph Kessler, Usman Dastgeer, Samuel Thibault, Raymond Namyst, Andrew Richards, Uwe Dolinsky, Siegfried Benkner, Jesper Larsson Träff, and Sabri Pllana. Programmability and Performance Portability Aspects of Heterogeneous Multi-/Manycore Systems. In DATE - Design, Automation and Test in Europe, Dresden, Germany, March 2012. ISBN: 978-3-9810801-8-6. [doi:10.1109/DATE.2012.6176582]
Siegfried Benkner, Sabri Pllana, Jesper Larsson Träff, Philippas Tsigas, Uwe Dolinsky, Cédric Augonnet, Beverly Bachmayer, Christoph Kessler, David Moloney, and Vitaly Osipov. PEPPHER: Efficient and Productive Usage of Hybrid Computing Systems. IEEE Micro, 31(5):28-41, September 2011. ISSN: 0272-1732. [doi:10.1109/MM.2011.67]
Emmanuel Agullo, Cédric Augonnet, Jack Dongarra, Mathieu Faverge, Julien Langou, Hatem Ltaief, and Stanimire Tomov. LU factorization for accelerator-based systems. In 9th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 11), Sharm El-Sheikh, Egypt, June 2011. [doi:10.1109/AICCSA.2011.6126599]
Emmanuel Agullo, Cédric Augonnet, Jack Dongarra, Mathieu Faverge, Hatem Ltaief, Samuel Thibault, and Stanimire Tomov. QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators. In 25th IEEE International Parallel & Distributed Processing Symposium (IEEE IPDPS 2011), Anchorage, Alaska, USA, May 2011. [doi:10.1109/IPDPS.2011.90]
Siegfried Benkner, Sabri Pllana, Jesper Larsson Träff, Philippas Tsigas, Andrew Richards, Raymond Namyst, Beverly Bachmayer, Christoph Kessler, David Moloney, and Peter Sanders. The PEPPHER Approach to Programmability and Performance Portability for Heterogeneous many-core Architectures. In ParCo, Ghent, Belgium, August 2011.
Usman Dastgeer, Christoph Kessler, and Samuel Thibault. Flexible runtime support for efficient skeleton programming on hybrid systems. In ParCo - Proceedings of the International Conference on Parallel Computing, volume 22 of Advances of Parallel Computing, Gent, Belgium, pages 159-166, August 2011. [doi:10.3233/978-1-61499-041-3-159]
Sylvain Henry. Programmation multi-accélérateurs unifiée en OpenCL. In 20èmes Rencontres Francophones du Parallélisme (RenPar’20), Saint Malo, France, May 2011.
Sidi Ahmed Mahmoudi, Pierre Manneback, Cédric Augonnet, and Samuel Thibault. Détection optimale des coins et contours dans des bases d’images volumineuses sur architectures multicoeurs hétérogènes. In RenPar’20 - 20èmes Rencontres Francophones du Parallélisme, Saint-Malo, France, May 2011.
Emmanuel Agullo, Cédric Augonnet, Jack Dongarra, Hatem Ltaief, Raymond Namyst, Samuel Thibault, and Stanimire Tomov. A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs. In Wen-mei W. Hwu, editor, GPU Computing Gems, volume 2. Morgan Kaufmann, September 2010. [doi:10.1016/B978-0-12-385963-1.00034-4]
Emmanuel Agullo, Cédric Augonnet, Jack Dongarra, Hatem Ltaief, Raymond Namyst, Jean Roman, Samuel Thibault, and Stanimire Tomov. Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators. In SAAHPC - Symposium on Application Accelerators in High Performance Computing, Knoxville, USA, July 2010.
