It is self–evident that digital archives have transformed the landscape of historical research, especially of the eighteenth and nineteenth centuries, on which the majority of these resources are concentrated. The sheer digital accessibility of rare material, which in physical form can be locked away in the special collections of a single library, has helped to democratise the study of the past. In tandem with this, many digital archives permit word–searching of the content, opening up the material to a form of textual interrogation not possible in print.


In this gold–rush to digital, there is a danger of taking it all for granted. There is a common misconception that digitizing the archive of a newspaper is simply a case of scanning the pages and putting them on the internet. In this regard, digital archives are a victim of their own success; the best are so simple to use that they look equally simple to put together. Yet behind each digital newspaper archive is a mammoth project involving editorial selection, content processing conundrums and a wide variety of bespoke technical decisions. Jim Mussell has rightly argued that ‘these resources actually constitute a type of edition … Users must be able to analyze how a resource has been put together if they are to understand how the digital representation differs from whatever it republishes.’1 By presenting the methodology used in creating the Daily Mail Historical Archive, my aim here is to bring to users’ attention the history of this digital edition, the transformations that have taken place to the content, and an appreciation of the scale of the whole initiative.

How did we source the material?

The first headache encountered by any project to digitise an historic newspaper is deciding which edition to use. To those unfamiliar with media history it may come as a surprise to learn that there is no single edition of a daily newspaper. Historically, most papers have published, and continue to publish, multiple editions in one day, including late editions, regional editions and weekly editions. While the content will be broadly the same between the editions, there will be some differences in the selection of news stories, advertising, and even seemingly trivial details such as the masthead. Which, then, is the ‘authoritative’ edition to use for digitisation? Perhaps in an ideal world we would digitise all editions of the paper, so that researchers could see it in all its forms. Unsurprisingly, the costs of doing so make this a non–starter in most cases; digitising just one edition of the paper is expensive enough. But there is also the experience of the end user to consider. Does someone searching a newspaper archive really want their results swollen by multiple editions? No doubt there are scholars who would derive benefit from this, but our experience shows that newspaper archives are used by a broad church of the research community, including family historians, science departments, and schools. Keeping the archive straightforward and unconfusing is vital.


For content dating before the 1970s and the heyday of microfilming, most archives are digitised from the versions of the newspaper or periodical that were bound into annual or semi–annual volumes. These ‘library editions’ were produced by publishers precisely for preservation and collecting purposes. As Laurel Brake points out, these editions normally exclude ‘ephemeral, paratextual matter’ such as advertising wrappers, supplements, and in some cases even covers.2 Library editions are, by necessity, the versions of newspapers most widely consulted in print, but it is important to note that they are no more definitive than other editions. From the 1970s onwards, most major UK newspapers instigated microfilming programmes to preserve their papers for archival purposes. Although many continued to publish bound library editions, microfilming represented a shift in the preservation process, with loose editions being filmed on a monthly basis. At the same time, most newspapers microfilmed their back issues from the bound library editions, so that they had a complete corpus of the newspaper on microfilm.

In the case of the Daily Mail Historical Archive, it is this in–house microfilm, owned by Associated Newspapers, which we have used as the basis for digitization. The microfilm was made from the final London edition of the newspaper, so in the majority of cases this is what the user will be viewing. Occasionally, where the final edition of the day was not available, we have used earlier editions.
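The fallback rule described here can be sketched in a few lines of Python. This is an illustration only, not the project's actual workflow: the edition labels, the `EDITION_PREFERENCE` ordering and the `choose_edition` helper are all assumptions introduced for the example.

```python
from typing import Optional

# Hypothetical edition labels, most preferred first. The text says only that
# the final London edition was preferred, with earlier editions as a fallback.
EDITION_PREFERENCE = ["final", "late", "early"]


def choose_edition(available: set[str]) -> Optional[str]:
    """Return the most preferred edition that exists for a given day."""
    for label in EDITION_PREFERENCE:
        if label in available:
            return label
    return None  # no usable edition for this date


print(choose_edition({"early", "final"}))
print(choose_edition({"early"}))
```

Keeping the preference order in a single list means the rule can be read, and changed, in one place.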


Using the microfilm led to some problems caused by the medium. There were a number of issues, particularly in the early years of the paper, where the team determined that the filmed images were unfit for purpose: either the original source was torn or damaged and should never have been used for filming, or the issues had been filmed from tightly bound volumes, causing excessive curvature and obscured text. In most such cases, we replaced the images using reels from an alternative microfilm edition created separately by the British Library. It is worth stressing that microfilm is not innately an inferior medium for digitization. If the original filming was done well and from well–preserved original copies, even decades–old microfilm can produce surprisingly good digital images.


Working our way through the microfilm threw up some surprising anomalies too, evidence of selection decisions made by earlier editors. During the General Strike of 1926, Fleet Street did not put out any newspapers. On the microfilm, the Daily Mail Continental Edition, published in Paris, plugs the gap. We elected to keep this in the digital archive, rather than have no edition available for the period of the strike.


Where available, supplements have been included. Supplements are defined here as components of the paper that have separate pagination from the main part of the newspaper. In print, these sometimes appear in the middle of the paper. This makes sense in terms of packaging the newspaper up as a physical object, to make it compact and easily foldable. In the digital edition it does not make sense to replicate the exact place in which these supplementary materials appeared, as it renders the page sequences extremely confusing, and sometimes splits articles in two. Users cannot pull out the magazine and other supplements to read the main paper uninterrupted, as they would in the material world. We have therefore placed supplements after the paper.
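The re-sequencing described above might look like the following in Python. It is a minimal sketch under stated assumptions: the `Page` record, the `"main"` section label and the `place_supplements_last` helper are hypothetical names, not part of the archive's real data model.

```python
from dataclasses import dataclass


@dataclass
class Page:
    section: str  # hypothetical label: "main" or the name of a supplement
    number: int   # page number within its own section (separate pagination)


def place_supplements_last(pages: list[Page]) -> list[Page]:
    """Move supplement pages after the main paper, preserving print order within each group."""
    main = [p for p in pages if p.section == "main"]
    supplements = [p for p in pages if p.section != "main"]
    return main + supplements


# Print order: a magazine supplement bound into the middle of the paper.
printed = [
    Page("main", 1), Page("main", 2),
    Page("magazine", 1), Page("magazine", 2),
    Page("main", 3), Page("main", 4),
]
digital = place_supplements_last(printed)
print([(p.section, p.number) for p in digital])
```

Because each group keeps its original order, an article that ran across main pages 2 and 3 is no longer split by the supplement.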


The Weekend magazine, included as a Saturday supplement from 1992, has been filmed inconsistently, and as a result we do not have a complete run in the digital archive. For cost reasons, it has not been possible to fill the gaps by scanning the missing issues; regrettably, the primary focus of this project has had to be the main paper. There are other necessary omissions. There are Scottish and Irish editions of the paper, as well as the Continental Daily Mail and the short–lived weekly US edition published from 1944 to 1946. There was even a Braille edition. Ultimately it has not been feasible to include these. In a related vein, the weekly Mail on Sunday began publication in 1983. Although under the same ownership as its daily sister paper, the Mail on Sunday is a separate newspaper, with its own editorial and journalistic team. Given this, and that its history is not as long as that of the Daily Mail, we felt justified in excluding it from the scope of the project. If there is significant interest, the Mail on Sunday may be added to the archive as an upgrade in the future. Finally, Alfred Harmsworth created no fewer than 63 ‘dummy’ editions of the Daily Mail before its official launch on 4 May 1896, to test his ideas about layout, visual impact, and balance of content. Only some of these ‘dummy’ editions survive, and they were never released to the public, so they have not been included in what is intended to be an archive of the ‘official’ newspaper.

Why did we stop digitising the archive at 2004?

Determining the cut–off date for an archive of any newspaper still in print presents a dilemma. One person’s logical end–date is another researcher’s anguish that a particular news story falls after that date. Commercial realities dominate the decision–making process here. Although the approved budget for the project was generous (the final costs will be well into seven figures in pounds sterling), it was not unlimited. Every additional year that we opted to include added to the total page count, which in turn added to the content–processing costs. The problem became particularly acute from the 1990s onwards, when newspapers such as the Daily Mail exploded in size. It is noteworthy that the Daily Mail Historical Archive has the largest page count of any single–title newspaper archive that Gale has created.


Our original intention had been to digitise the first 100 years (1896–1996), but as the project went into pre–production we realised that, by ending four years short of the millennium, we were missing an opportunity to include a complete perspective of the twentieth century.


As it transpired, the main batch of Associated Newspapers microfilm stops in 2004, so the revised cut–off date presented itself, and additional budget was secured. Thereafter, full–text editions exist on various aggregator sites, although the user experience is not the same. If demand exists, it may be possible to incorporate the Daily Mail’s in–house PDFs of the post–2004 years at a later stage, but they cannot simply be uploaded. As with the digital files generated from the microfilm, they require the generation of XML files that describe the content structure and allow them to be a functional part of a digital archive.
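To give a sense of what such a structural description involves, here is a minimal Python sketch that writes one issue's article and page structure as XML. The element and attribute names (`issue`, `article`, `headline`, `page`) and the `build_issue_xml` helper are assumptions made for illustration; they are not the schema actually used for the archive.

```python
import xml.etree.ElementTree as ET


def build_issue_xml(date: str, articles: list[dict]) -> str:
    """Describe one issue: which articles it holds and which pages they occupy."""
    # Hypothetical schema: one <issue> holding <article> elements.
    issue = ET.Element("issue", {"title": "Daily Mail", "date": date})
    for art in articles:
        node = ET.SubElement(issue, "article", {"id": art["id"]})
        ET.SubElement(node, "headline").text = art["headline"]
        for page in art["pages"]:
            ET.SubElement(node, "page", {"number": str(page)})
    return ET.tostring(issue, encoding="unicode")


# Placeholder content, dated to the paper's launch day for illustration only.
xml_text = build_issue_xml(
    "1896-05-04",
    [{"id": "a1", "headline": "Placeholder headline", "pages": [1, 2]}],
)
print(xml_text)
```

A description of this kind is what lets a search result point to a specific article on a specific page rather than to an undifferentiated page image.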

[Image: Bound volumes of the Daily Mail Atlantic Edition, showing both wireless page volumes and the complete volume for the A boat in a state of disrepair.]
[Image: Scanning the Atlantic Edition]

The Atlantic Edition

Seemingly contradicting our stated policy of not including multiple editions, we have included the Atlantic Edition in the archive (see ‘The Daily Mail Atlantic Edition’). Printed at sea, these issues are extremely rare, and even the British Library does not hold copies. The only known set is held by Associated Newspapers itself. Long–neglected, the bound volumes have suffered rodent, water and other storage damage over the years, and are deteriorating fast. Preservation of this forgotten source for the history of the 1920s was therefore deemed by the team to be an important task for future generations.


The Atlantic Edition was scanned from the physical copies, as no microfilm exists. The bindings were removed to permit flat scanning, and the volumes were subsequently re–bound and put into deep vacuum storage to prevent further decay. Some issues were in poor condition or missing entirely, but unfortunately no replacement copies were available.


The organization of the Atlantic Edition presented its own challenges, as the system had been devised by an Associated Newspapers archivist several decades previously, who had taken the secret to the grave; it took time to crack the code.


Upon investigation, we ascertained that the Atlantic Editions were organized as:

  • A boats: the largest Cunard liners (Aquitania, Berengaria, Mauretania)
  • B boats: the smaller ships
  • wireless volumes


The archivist preserved a complete copy of each issue of the A boat, and a complete copy of each issue for the B boats. How the archivist selected which A boat and B boat to use as the preservation copy is unknown. It may have depended on what issues were to hand. A boats are usually the Berengaria or Aquitania, although there are some issues from the Mauretania.

The wireless volumes are more haphazard: some contain just the wireless pages from a given boat, with many repeats, and some contain issues from boats that were sailing at the same time as the A or B boat. They give the impression of being volumes into which everything else was ‘stuffed’.

Once we realised that many volumes contained repeats, we could have made a justifiable editorial decision to scan only the A volumes. However, we realised that some of the B ships were sailing at times when there was no A ship at sea. We therefore decided to scan both the A and B volumes. This means there are some dates when there is more than one Atlantic Edition available, allowing for interesting comparisons of how the respective typesetters presented the news. It may also highlight different advertising: passengers on B ships were probably not quite as wealthy as the A passengers.

Of the original 80 volumes (an estimated 300,000 pages), we have scanned 26 (approximately 40,000 pages). Scanning all the Atlantic Editions was not a realistic prospect, as it would have consumed a quarter of the project budget, but by scanning the A and B volumes we have at least been able to capture the two whole runs preserved by the original archivist.


Creating a digital newspaper archive is no small undertaking. Creating an archive with the careful content selection, appropriate imaging quality and richness of data required for the modern researcher is a taller task still.

Such projects are consequently a huge investment in time and resources, but by making our newspaper heritage more widely accessible and discoverable, they rescue the thoughts, words and deeds of past generations from crumbling, unnoticed, to dust. That seems a price worth paying.

1 Jim Mussell, ‘Teaching Nineteenth–Century Periodicals Using Digital Resources: Myths and Methods’, Victorian Periodicals Review, Vol. 45, No. 2, pp. 201–09.
2 Laurel Brake, ‘The Longevity of “Ephemera”: Library editions of nineteenth–century periodicals and newspapers’, Media History, Vol. 18, No. 1, February 2012, pp. 7–20.