Bloomsbury Publishing is a leading independent publishing house established in 1986. It has companies in London, New York, Sydney and Delhi. Its four divisions are: Bloomsbury Academic and Professional Division; Bloomsbury Information; Bloomsbury Adult Publishing; and Bloomsbury Children's Publishing.
Bloomsbury wanted to digitise Sir Winston Churchill’s papers, which were held on 35mm roll film archived at the Churchill Archives Centre in Cambridge.
They needed the roll films to be collected, stored securely while work was in progress, and digitised to an exact specification so that the images could be published online.
Size of Project:
Total Roll films: 1,399
Total Images: Approx. 1 million
Processing: Scanned greyscale, de-skewed, cropped, indexed, OCR
Period: 6 Months
The project required us to digitise Winston Churchill's papers from 35mm roll film and produce TIFF and PNG images at 300 and 150 dpi, together with XML metadata containing reference numbers. We also captured OCR data from each page to provide a full-text search facility.
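As a rough illustration of what an XML metadata record carrying a reference number might look like, the Python sketch below builds one record per scanned frame. The element and attribute names (`image`, `roll`, `frame`, `derivative`), the placeholder reference `REF-0001`, and the pairing of each format with a resolution are all assumptions made for the example; they are not the schema or values used on the project.

```python
import xml.etree.ElementTree as ET


def build_image_record(reference, roll, frame, derivatives):
    """Return an XML string describing one scanned frame.

    The schema here is hypothetical: `derivatives` is a list of
    (format, dpi) pairs supplied by the caller.
    """
    record = ET.Element("image", {"reference": reference})
    ET.SubElement(record, "roll").text = str(roll)
    ET.SubElement(record, "frame").text = str(frame)
    for fmt, dpi in derivatives:
        # One element per output file produced from this frame.
        ET.SubElement(record, "derivative", {"format": fmt, "dpi": str(dpi)})
    return ET.tostring(record, encoding="unicode")


# Placeholder values only; the format/dpi pairing is illustrative.
xml_record = build_image_record(
    "REF-0001", roll=1, frame=1, derivatives=[("tiff", 300), ("png", 150)]
)
print(xml_record)
```

Keeping the reference number as an attribute on the record makes it straightforward to link each image file back to its catalogue entry when the images are published online.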
Now that the project has been completed, we often receive requests to transcribe particular documents of interest. The transcription provides a text copy of the original, which is usually handwritten and difficult to read.
Transcriptions are returned as TEI XML, which contains formatting markup so that the text can be rendered as close to the original as possible, giving viewers of the content a text copy that is easy to read.
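To give a sense of the shape of such a file, here is a minimal Python sketch that wraps transcribed lines in a small TEI document. It uses the standard TEI namespace and common TEI elements (`teiHeader`, `text`, `body`, `p`, and `lb` for line breaks), but the function name, header wording, and sample lines are placeholders invented for the example; the markup used in real transcriptions is likely to be considerably richer.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def tei(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_transcription(title, source_note, lines):
    """Return a minimal TEI document holding one paragraph of lines."""
    root = ET.Element(tei("TEI"))

    # Minimal header: title, publication and source statements.
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Unpublished transcription."
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = source_note

    # Body: an <lb/> marks the start of each line of the original,
    # so the rendered text can follow the original line breaks.
    text = ET.SubElement(root, tei("text"))
    body = ET.SubElement(text, tei("body"))
    para = ET.SubElement(body, tei("p"))
    for line in lines:
        lb = ET.SubElement(para, tei("lb"))
        lb.tail = line

    return ET.tostring(root, encoding="unicode")


# Placeholder content, not text from the archive.
sample_lines = ["First line of the document", "Second line of the document"]
tei_xml = build_transcription(
    "Sample transcription", "Transcribed from a scanned page.", sample_lines
)
print(tei_xml)
```

Recording line breaks explicitly is one simple way to preserve the layout of a handwritten page; other features of the original, such as deletions or underlining, have their own TEI elements.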