Aspects of digitization: The reading order in digitized old newspapers

Vol. 5, No. 2 (2013)
Autumn issue 2013

Abstract

A newspaper issue is filled with diverse information tied to its articles, which are supplemented by photographs or other images and surrounded by advertisements. A newspaper page contains a number of references and graphic elements that intuitively guide the reader to the continuation of an article or to an illustration. A digitized image preserves the layout of the page, but thanks to new technologies and metadata formats such as METS and ALTO we can go deeper and extract information from newspapers at the level of individual articles. This requires digitally "reshaping" the newspaper into individual zones and then creating a logical structure, so that an article can be presented as a separate document even though in the paper version it is surrounded by other articles and may span several pages. Experts in the Czech Republic have created a specification of METS and ALTO format profiles that captures the logical structure of newspapers, its binding to the full text, and the reading order of the segments that form an article. Experts from the Library of Congress, which maintains most library metadata standards, including METS and ALTO, are interested in our solution. We are now working on the official METS profile.
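
The idea of binding a logical article to page zones in reading order can be illustrated with a minimal Python sketch. It is not the Czech profile itself: the file IDs, block IDs, article label and sample text are invented for illustration, and the XML is heavily simplified. It assumes only the general METS pattern of a logical `div` whose `fptr`/`seq`/`area` elements point, in sequence, at ALTO `TextBlock` IDs.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
ALTO = "http://www.loc.gov/standards/alto/ns-v2#"
NS = {"m": METS, "a": ALTO}


def make_alto(blocks):
    """Build a tiny ALTO-like page from {block_id: text} (illustrative fixture only)."""
    parts = []
    for block_id, text in blocks.items():
        strings = "<SP/>".join(f'<String CONTENT="{w}"/>' for w in text.split())
        parts.append(
            f'<TextBlock ID="{block_id}"><TextLine>{strings}</TextLine></TextBlock>'
        )
    body = "".join(parts)
    return ET.fromstring(
        f'<alto xmlns="{ALTO}"><Layout><Page><PrintSpace>{body}'
        f"</PrintSpace></Page></Layout></alto>"
    )


# Hypothetical pages: the article is interrupted by an advertisement on page 1
# and continues on page 2, after a block that belongs to another article.
PAGES = {
    "ALTO_0001": make_alto({
        "P1_TB1": "Flood hits Brno",
        "P1_TB2": "Buy soap",
        "P1_TB3": "The river rose overnight.",
    }),
    "ALTO_0002": make_alto({
        "P2_TB1": "Weather report",
        "P2_TB2": "Repairs begin on Monday.",
    }),
}

# Hypothetical logical structure: one article whose segments are listed
# in reading order, each pointing at a zone (TextBlock) in an ALTO file.
METS_XML = f"""
<m:mets xmlns:m="{METS}">
  <m:structMap TYPE="LOGICAL">
    <m:div TYPE="ARTICLE" LABEL="Flood hits Brno">
      <m:fptr><m:seq>
        <m:area FILEID="ALTO_0001" BETYPE="IDREF" BEGIN="P1_TB1"/>
        <m:area FILEID="ALTO_0001" BETYPE="IDREF" BEGIN="P1_TB3"/>
        <m:area FILEID="ALTO_0002" BETYPE="IDREF" BEGIN="P2_TB2"/>
      </m:seq></m:fptr>
    </m:div>
  </m:structMap>
</m:mets>
"""


def block_text(page, block_id):
    """Return the words of one TextBlock, joined with single spaces."""
    block = page.find(f".//a:TextBlock[@ID='{block_id}']", NS)
    lines = []
    for line in block.findall("a:TextLine", NS):
        words = [s.get("CONTENT") for s in line.findall("a:String", NS)]
        lines.append(" ".join(words))
    return " ".join(lines)


def articles_in_reading_order(mets_root, pages):
    """Map each article label to its text, assembled segment by segment."""
    result = {}
    for div in mets_root.iterfind(".//m:div[@TYPE='ARTICLE']", NS):
        areas = div.findall("m:fptr/m:seq/m:area", NS)
        result[div.get("LABEL")] = "\n".join(
            block_text(pages[a.get("FILEID")], a.get("BEGIN")) for a in areas
        )
    return result


articles = articles_in_reading_order(ET.fromstring(METS_XML), PAGES)
print(articles["Flood hits Brno"])
```

In this sketch the advertisement and the unrelated block are skipped simply because no `area` in the article's sequence points to them; the reading order lives in the logical structure, not in the page layout.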


Pavla Švástová

Moravská zemská knihovna Brno


Keywords:
digitization of old newspapers; METS; ALTO; OCR; metadata
