Affiliation: Univ Arizona, Dept Comp Sci
Publisher: AMER LIBRARY ASSOC
Citation: Han, Y., & Wan, X. (2018). Digitization of Text Documents Using PDF/A. Information Technology and Libraries, 37(1), 52-64.
Rights: Copyright © 2018 Information Technology and Libraries
Collection Information: This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at email@example.com.
Abstract: The purpose of this article is to demonstrate a practical use case of PDF/A for digitization of text documents, following FADGI's recommendation of PDF/A as a preferred digitization file format. The authors demonstrate how to convert and combine TIFFs with associated metadata into a single PDF/A-2b file for a document. Using real-life examples and open source software, the authors show readers how to convert TIFF images, extract associated metadata and International Color Consortium (ICC) profiles, and validate the resulting files against the newly released PDF/A validator. The generated PDF/A file is a self-contained and self-described container that accommodates all the data from digitization of textual materials, including page-level metadata and ICC profiles. Providing theoretical analysis and empirical examples, the authors show that PDF/A has many advantages over the traditionally preferred file formats, TIFF and JPEG2000, for digitization of text documents.
Note: Open access journal
Version: Final published version