

dc.contributor.author: Venkat, Anand
dc.contributor.author: Mohammadi, Mahdi Soltan
dc.contributor.author: Park, Jongsoo
dc.contributor.author: Rong, Hongbo
dc.contributor.author: Barik, Rajkishore
dc.contributor.author: Strout, Michelle Mills
dc.contributor.author: Hall, Mary
dc.date.accessioned: 2018-12-07T22:17:52Z
dc.date.available: 2018-12-07T22:17:52Z
dc.date.issued: 2016
dc.identifier.citation: A. Venkat et al., "Automating Wavefront Parallelization for Sparse Matrix Computations," SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Salt Lake City, UT, 2016, pp. 480-491. doi: 10.1109/SC.2016.40 [en_US]
dc.identifier.issn: 978-1-4673-8815-3
dc.identifier.doi: 10.1109/SC.2016.40
dc.identifier.uri: http://hdl.handle.net/10150/631128
dc.description.abstract: This paper presents a compiler and runtime framework for parallelizing sparse matrix computations that have loop-carried dependences. Our approach automatically generates a runtime inspector to collect data dependence information and achieves wavefront parallelization of the computation, where iterations within a wavefront execute in parallel, and synchronization is required across wavefronts. A key contribution of this paper involves dependence simplification, which reduces the time and space overhead of the inspector. This is implemented within a polyhedral compiler framework, extended for sparse matrix codes. Results demonstrate the feasibility of using automatically-generated inspectors and executors to optimize ILU factorization and symmetric Gauss-Seidel relaxations, which are part of the Preconditioned Conjugate Gradient (PCG) computation. Our implementation achieves a median speedup of 2.97x on 12 cores over the reference sequential PCG implementation, significantly outperforms PCG parallelized using Intel's Math Kernel Library (MKL), and is within 6% of the median performance of manually-parallelized PCG. [en_US]
dc.description.sponsorship: Scientific Discovery through Advanced Computing (SciDAC) program - U.S. Department of Energy Office of Advanced Scientific Computing Research [DE-SC0006947]; NSF [CNS-1302663, CCF-1564074] [en_US]
dc.language.iso: en [en_US]
dc.publisher: IEEE [en_US]
dc.relation.url: http://ieeexplore.ieee.org/document/7877119/ [en_US]
dc.rights: Copyright © 2016, IEEE. [en_US]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.title: Automating Wavefront Parallelization for Sparse Matrix Computations [en_US]
dc.type: Article [en_US]
dc.contributor.department: Univ Arizona, Dept Comp Sci [en_US]
dc.identifier.journal: SC '16: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS [en_US]
dc.description.collectioninformation: This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu. [en_US]
dc.eprint.version: Final accepted manuscript [en_US]
dc.source.beginpage: 480
dc.source.endpage: 491
refterms.dateFOA: 2018-12-07T22:17:53Z
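
Illustration of the abstract's inspector/executor idea. The following is a minimal hand-written sketch, not the code generated by the paper's compiler framework. It assumes a sparse lower-triangular system L x = b stored in CSR form with a nonzero diagonal; the function names inspect_wavefronts and solve_by_wavefront and the sample matrix are hypothetical. The inspector reads the sparsity pattern at run time to group rows into wavefronts, and the executor processes wavefronts in order, with the rows inside one wavefront being independent of each other.

```python
# Hypothetical sketch of wavefront (level-set) scheduling for a sparse
# lower-triangular solve L x = b in CSR form. Names are illustrative only.

def inspect_wavefronts(row_ptr, col_idx):
    """Inspector: derive a wavefront number for each row from the sparsity pattern."""
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:  # row i reads x[j], so it must run after row j
                level[i] = max(level[i], level[j] + 1)
    wavefronts = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        wavefronts[lv].append(i)
    return wavefronts


def solve_by_wavefront(row_ptr, col_idx, vals, b, wavefronts):
    """Executor: wavefronts run in order; rows within one wavefront are independent."""
    x = [0.0] * len(b)
    for front in wavefronts:      # a barrier would sit between wavefronts
        for i in front:           # this loop is the one that could run in parallel
            acc, diag = b[i], None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = vals[k]
                else:
                    acc -= vals[k] * x[j]
            x[i] = acc / diag
    return x


# Sample 4x4 matrix (assumed for illustration):
# [[2, 0, 0, 0],
#  [0, 4, 0, 0],
#  [1, 1, 2, 0],
#  [0, 0, 1, 1]]
row_ptr = [0, 1, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 2, 3]
vals = [2.0, 4.0, 1.0, 1.0, 2.0, 1.0, 1.0]
b = [2.0, 4.0, 4.0, 3.0]

fronts = inspect_wavefronts(row_ptr, col_idx)
x = solve_by_wavefront(row_ptr, col_idx, vals, b, fronts)
```

In this sample, rows 0 and 1 have no dependences and share the first wavefront, row 2 depends on both, and row 3 depends on row 2. The sketch runs sequentially; a parallel version would distribute the inner loop over threads and synchronize between wavefronts.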


Files in this item

Name: Venkat2016.pdf
Size: 478.2Kb
Format: PDF
Description: Final Accepted Manuscript
