Inter-loop optimizations in RAJA using loop chains
| dc.contributor.author | Neth, B. | |
| dc.contributor.author | Scogland, T.R.W. | |
| dc.contributor.author | de Supinski, B.R. | |
| dc.contributor.author | Strout, M.M. | |
| dc.date.accessioned | 2021-06-24T23:50:56Z | |
| dc.date.available | 2021-06-24T23:50:56Z | |
| dc.date.issued | 2021-06 | |
| dc.identifier.citation | Neth, B., Scogland, T. R. W., de Supinski, B. R., & Strout, M. M. (2021). Inter-loop optimizations in RAJA using loop chains. Proceedings of the International Conference on Supercomputing, 1–12. | en_US |
| dc.identifier.uri | http://hdl.handle.net/10150/660339 | |
| dc.description.abstract | Typical parallelization approaches such as OpenMP and CUDA provide constructs for parallelizing and blocking for data locality for individual loops. By focusing on each loop separately, these approaches fail to leverage sources of data locality possible due to inter-loop data reuse. The loop chain abstraction provides a framework for reasoning about and applying inter-loop optimizations. In this work, we incorporate the loop chain abstraction into RAJA, a performance portability layer for high-performance computing applications. Using the loop-chain-extended RAJA, or RAJALC, developers can have the RAJA library apply loop transformations like loop fusion and overlapped tiling while maintaining the original structure of their programs. By introducing targeted symbolic evaluation capabilities, we can collect and cache data access information required to verify loop transformations. We evaluate the performance improvement and refactoring costs of our extension. Overall, our results demonstrate 85–98% of the performance improvements of hand-optimized kernels with dramatically fewer code changes. © 2021 Association for Computing Machinery. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Association for Computing Machinery | en_US |
| dc.rights | © 2021 Association for Computing Machinery. | en_US |
| dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en_US |
| dc.subject | C++ | en_US |
| dc.subject | Data locality | en_US |
| dc.subject | Loop chains | en_US |
| dc.subject | Performance portability | en_US |
| dc.subject | Polyhedral analysis | en_US |
| dc.subject | RAJA | en_US |
| dc.subject | Symbolic execution | en_US |
| dc.title | Inter-loop optimizations in RAJA using loop chains | en_US |
| dc.type | Article | en_US |
| dc.contributor.department | University of Arizona | en_US |
| dc.identifier.journal | Proceedings of the International Conference on Supercomputing | en_US |
| dc.description.note | Immediate access | en_US |
| dc.description.collectioninformation | This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu. | en_US |
| dc.eprint.version | Final accepted manuscript | en_US |
| refterms.dateFOA | 2021-06-24T23:50:57Z | |
