APEX: Autonomic Performance Environment for eXascale

One of the key components of the US Department of Energy funded XPRESS project was a new approach to performance observation, measurement, analysis, and runtime decision making in order to optimize performance. The particular challenges of accurately measuring the performance characteristics of ParalleX [1] (e.g. HPX) applications, as well as other asynchronous multitasking runtime architectures, require a new approach to parallel performance observation. The traditional model of multiple operating system processes and threads observing themselves in a first-person manner while writing out performance profiles or traces for offline analysis will not adequately capture the full execution context, nor provide opportunities for runtime adaptation. The approach taken in the completed XPRESS project was a new performance measurement system called APEX (Autonomic Performance Environment for eXascale). APEX includes methods for information sharing between the layers of the software stack, from the hardware through operating and runtime systems, all the way to domain-specific or legacy applications. The performance measurement components incorporate relevant information across stack layers, merging third-person performance observation of node-level and global resources, remote processes, and both operating and runtime system threads. For a complete design description of APEX, see the publication "APEX: An Autonomic Performance Environment for eXascale" [3]. Since its original project, APEX has been extended to support many popular runtime systems [11].

In short, APEX is an introspection and runtime adaptation library for asynchronous multitasking runtime systems. However, APEX is not only useful for AMT/AMR runtimes running on future exascale systems; it can be used by any application that wants to perform runtime adaptation to deal with heterogeneous and/or variable environments.

Introspection

APEX provides an API for measuring actions within a runtime. The API includes methods for timer start/stop, as well as sampled counter values. APEX is designed to be integrated into a runtime, library, and/or application and to provide performance introspection for the purpose of runtime adaptation. While APEX can provide rudimentary post-mortem performance analysis measurement, there are many other performance measurement tools that perform that task more robustly (such as TAU, http://tau.uoregon.edu). That said, APEX includes an event listener that integrates with the TAU measurement system, so APEX events can be forwarded to TAU and collected in a TAU profile and/or trace to be used for post-mortem performance analysis.
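The timer start/stop and sampled-counter pattern described above can be illustrated with a minimal, self-contained sketch. This is not the APEX API: the `Introspector` class, its method names, and the `solve_step` and `queue_depth` labels are all hypothetical, chosen only to show how a runtime might record timings and counter samples in-process so that they remain available while the program is still running.

```python
import time
from collections import defaultdict


class Introspector:
    """Hypothetical stand-in for a measurement API; NOT the APEX interface."""

    def __init__(self):
        self.timers = defaultdict(list)    # timer name -> elapsed seconds per call
        self.counters = defaultdict(list)  # counter name -> sampled values

    def start(self, name):
        # Return a handle recording which timer was started and when.
        return (name, time.perf_counter())

    def stop(self, handle):
        # Close the timer identified by the handle and store the elapsed time.
        name, begin = handle
        elapsed = time.perf_counter() - begin
        self.timers[name].append(elapsed)
        return elapsed

    def sample_value(self, name, value):
        # Record one observation of a counter (queue length, bytes sent, ...).
        self.counters[name].append(float(value))

    def summary(self, name):
        # Number of observations, total, and mean for a timer or a counter.
        values = self.timers.get(name) or self.counters.get(name) or []
        total = sum(values)
        mean = total / len(values) if values else 0.0
        return {"calls": len(values), "total": total, "mean": mean}


introspector = Introspector()
for queue_depth in (4, 7, 1):
    handle = introspector.start("solve_step")
    sum(i * i for i in range(10_000))  # stand-in for real work
    introspector.stop(handle)
    introspector.sample_value("queue_depth", queue_depth)

timer_stats = introspector.summary("solve_step")
counter_stats = introspector.summary("queue_depth")
```

Because the measurements are kept in memory rather than only written out at exit, a policy running inside the same process can call `summary` at any point and act on the result.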

Runtime Adaptation

APEX provides a mechanism for dynamic runtime behavior, either for autotuning or for adaptation to a changing environment. The infrastructure that provides the adaptation is the Policy Engine, which executes policies either periodically or when triggered by events. The policies have access to the performance state as observed by the APEX introspection API. APEX has several built-in search strategies, including exhaustive, random, simulated annealing, and hill climbing. APEX is also integrated with Active Harmony (http://www.dyninst.org/harmony) to provide dynamic search using the Nelder-Mead algorithm.
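As a rough illustration of how an event-triggered policy and a hill-climbing search fit together, consider the sketch below. It is a simplified model under stated assumptions, not APEX's Policy Engine or its built-in search code: `PolicyEngine`, `HillClimber`, the `iteration_end` event, the `chunk` knob, and the synthetic cost function are all hypothetical. A periodic policy would be the same callable invoked from a timer instead of from an event.

```python
class PolicyEngine:
    """Hypothetical engine: runs registered policies when an event fires."""

    def __init__(self):
        self.policies = {}  # event name -> list of policy callables

    def register_policy(self, event, policy):
        self.policies.setdefault(event, []).append(policy)

    def trigger(self, event, state):
        for policy in self.policies.get(event, []):
            policy(state)


class HillClimber:
    """Hill climbing over one integer knob; lower measured cost is better."""

    def __init__(self, knob, low, high, step=1):
        self.knob, self.low, self.high, self.step = knob, low, high, step
        self.best_value = None
        self.best_cost = float("inf")
        self.direction = +1
        self.failures = 0      # consecutive moves that did not improve
        self.converged = False

    def __call__(self, state):
        if self.converged:
            return
        value, cost = state[self.knob], state["cost"]
        if cost < self.best_cost:
            # The last move helped: remember it and keep going the same way.
            self.best_value, self.best_cost = value, cost
            self.failures = 0
        else:
            # The last move did not help: try the other neighbor next.
            self.failures += 1
            self.direction = -self.direction
        if self.failures >= 2:
            # Both neighbors are no better, so settle on the best value seen.
            state[self.knob] = self.best_value
            self.converged = True
            return
        candidate = self.best_value + self.direction * self.step
        state[self.knob] = min(self.high, max(self.low, candidate))


def measured_cost(chunk):
    # Synthetic stand-in for an observed timing; its minimum is at chunk == 6.
    return (chunk - 6) ** 2 + 1


engine = PolicyEngine()
climber = HillClimber("chunk", low=1, high=16)
engine.register_policy("iteration_end", climber)

state = {"chunk": 2, "cost": None}
for _ in range(20):
    state["cost"] = measured_cost(state["chunk"])  # observe the current setting
    engine.trigger("iteration_end", state)         # let the policy adapt the knob
```

The policy only reads the observed state and writes a new knob value, so the application loop stays unaware of which search strategy is in use; swapping in a random or exhaustive search would mean registering a different callable.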

Citing APEX

Please use the following citation: https://doi.org/10.1109/ESPM256814.2022.00008

  1. Thomas Sterling, Daniel Kogler, Matthew Anderson, and Maciej Brodowicz. "SLOWER: A performance model for Exascale computing". Supercomputing Frontiers and Innovations, 1:42–57, September 2014. http://superfri.org/superfri/article/view/10
  2. Koniges, Alice, Jayashree Ajay Candadai, Hartmut Kaiser, Kevin Huck, Jeremy Kemp, Thomas Heller, Matthew Anderson et al. "HPX Applications and Performance Adaptation". No. SAND2015-8999C. Sandia National Lab.(SNL-NM), Albuquerque, NM (United States), 2015. https://www.osti.gov/servlets/purl/1332791
  3. Kevin A. Huck, Allan Porterfield, Nick Chaimov, Hartmut Kaiser, Allen D. Malony, Thomas Sterling, Rob Fowler. "An Autonomic Performance Environment for eXascale", Journal of Supercomputing Frontiers and Innovations, 2015. http://superfri.org/superfri/article/view/64
  4. Grubel, Patricia, Hartmut Kaiser, Kevin Huck, and Jeanine Cook. "Using intrinsic performance counters to assess efficiency in task-based parallel applications." In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1692-1701. IEEE, 2016. https://www.cs.uoregon.edu/research/paracomp/papers/ipdps16/hpcmaspa2016.pdf
  5. Bari, Md Abdullah Shahneous, Nicholas Chaimov, Abid M. Malik, Kevin A. Huck, Barbara Chapman, Allen D. Malony, and Osman Sarood. "Arcs: Adaptive runtime configuration selection for power-constrained openmp applications." In 2016 IEEE International Conference on Cluster Computing (CLUSTER), pp. 461-470. IEEE, 2016. https://www.cs.uoregon.edu/research/paracomp/papers/cluster16/arcs.pdf
  6. Tohid, R., Bibek Wagle, Shahrzad Shirzad, Patrick Diehl, Adrian Serio, Alireza Kheirkhahan, Parsa Amini et al. "Asynchronous execution of python code on task-based runtime systems." In 2018 IEEE/ACM 4th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), pp. 37-45. IEEE, 2018. http://hdc.cs.arizona.edu/papers/espm2_2018_phylanx.pdf
  7. Heller, Thomas, Bryce Adelstein Lelbach, Kevin A. Huck, John Biddiscombe, Patricia Grubel, Alice E. Koniges, Matthias Kretz et al. "Harnessing billions of tasks for a scalable portable hydrodynamic simulation of the merger of two stars." The International Journal of High Performance Computing Applications 33, no. 4 (2019): 699-715. https://journals.sagepub.com/doi/full/10.1177/1094342018819744
  8. Wagle, Bibek, Mohammad Alaul Haque Monil, Kevin Huck, Allen D. Malony, Adrian Serio, and Hartmut Kaiser. "Runtime adaptive task inlining on asynchronous multitasking runtime systems." In Proceedings of the 48th International Conference on Parallel Processing, pp. 1-10. 2019. https://dl.acm.org/doi/abs/10.1145/3337821.3337915
  9. Daiß, Gregor, Parsa Amini, John Biddiscombe, Patrick Diehl, Juhan Frank, Kevin Huck, Hartmut Kaiser, Dominic Marcello, David Pfander, and Dirk Pflüger. "From piz daint to the stars: simulation of stellar mergers using high-level abstractions." In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-37. 2019. https://arxiv.org/abs/1908.03121
  10. Steven R. Brandt, Alex Bigelow, Sayef Azad Sakin, Katy Williams, Katherine E. Isaacs, Kevin Huck, Rod Tohid, Bibek Wagle, Shahrzad Shirzad, and Hartmut Kaiser. 2020. "JetLag: An Interactive, Asynchronous Array Computing Environment". In Practice and Experience in Advanced Research Computing (PEARC '20). Association for Computing Machinery, New York, NY, USA, 8–12. DOI: https://doi.org/10.1145/3311790.3396657
  11. Kevin A. Huck, "Broad Performance Measurement Support for Asynchronous Multi-Tasking with APEX," 2022 IEEE/ACM 7th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), Dallas, TX, USA, 2022, pp. 20-29. https://doi.org/10.1109/ESPM256814.2022.00008