GPU-based and streaming-enabled implementation of pre-processing flow towards enhancing optical character recognition accuracy and efficiency
Name:
Gpu-based and streaming-enabled ...
Size:
3.390Mb
Format:
PDF
Description:
Final Accepted Manuscript
Affiliation
Department of Bioethics and Medical Humanism, College of Medicine-Phoenix, University of Arizona
Issue Date
2023-09-20
Publisher
Springer Science and Business Media LLC
Citation
Serhan, G., Parker, D., Dhruv, G., Alexander, F., & Ali, A. (2023). GPU-based and streaming-enabled implementation of pre-processing flow towards enhancing optical character recognition accuracy and efficiency. Cluster Computing, 1-13.
Journal
Cluster Computing
Rights
© 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
Collection Information
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
Abstract
Research has demonstrated that digital images can be pre-processed through operations such as scaling, rotation, and blurring to enhance the accuracy of optical character recognition (OCR) by emphasizing important features within the image. Our study employed the open-source Tesseract OCR and found that accuracy can be improved through pre-processing techniques including thresholding, rotation, rescaling, erosion, dilation, and noise removal, based on a dataset of 560 phone screen images. However, our CPU-based implementation of this process resulted in an average latency of 48.32 ms per image, which can hinder the processing of millions of images using OCR. To address this challenge, we parallelized the pre-processing flow on the Nvidia P100 GPU and executed it through a streaming approach, which reduced the latency to 0.825 ms and achieved a speedup factor of 58.6x compared to the serial execution. This implementation enables the use of a GPU-based OCR engine to handle multiple sources of data streams with large-scale workloads.
Note
12 month embargo; first published 20 September 2023
ISSN
1386-7857
EISSN
1573-7543
Version
Final accepted manuscript
Sponsors
National Science Foundation
DOI
10.1007/s10586-023-04137-0
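
Worked example (not part of the original record)
The abstract reports an average latency of 48.32 ms per image on the CPU and 0.825 ms on the GPU, and a speedup of 58.6x. The short Python sketch below only re-derives that ratio from the two quoted latencies and shows what they would imply for a large batch. The one-million-image workload and the helper names are illustrative assumptions, and the calculation assumes images are processed one after another at the quoted average latency, which the record itself does not state.

```python
# Latency figures quoted in the abstract, in milliseconds per image.
CPU_LATENCY_MS = 48.32
GPU_LATENCY_MS = 0.825


def speedup(baseline_ms: float, improved_ms: float) -> float:
    """Return the ratio of the baseline latency to the improved latency."""
    return baseline_ms / improved_ms


def total_seconds(latency_ms: float, num_images: int) -> float:
    """Return the time to process num_images back to back at a fixed latency."""
    return latency_ms * num_images / 1000.0


def images_per_second(latency_ms: float) -> float:
    """Return the throughput implied by a fixed per-image latency."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Hypothetical workload size, chosen only for illustration.
    num_images = 1_000_000

    ratio = speedup(CPU_LATENCY_MS, GPU_LATENCY_MS)
    cpu_hours = total_seconds(CPU_LATENCY_MS, num_images) / 3600.0
    gpu_minutes = total_seconds(GPU_LATENCY_MS, num_images) / 60.0

    print(f"speedup: {ratio:.1f}x")
    print(f"CPU: {cpu_hours:.2f} hours, {images_per_second(CPU_LATENCY_MS):.1f} images/s")
    print(f"GPU: {gpu_minutes:.2f} minutes, {images_per_second(GPU_LATENCY_MS):.1f} images/s")
```

Under these assumptions the ratio 48.32 / 0.825 rounds to the 58.6x stated in the abstract, and one million images would take roughly 13.4 hours at the CPU latency versus about 13.75 minutes at the GPU latency.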