An overview of digital publishing

Why are you reading this? That’s not a rhetorical question (you can even tell me right now on Twitter). I suspect it’s because you’ve begun dabbling in ebooks as a designer or production manager, or you’ve been asked to by your boss or a client. Or you’re simply curious about the ever-increasing volume of talk about ebooks, because you know sooner or later they’re going to be a part of your job.

We know ebooks are a very important part of the publishing landscape right now, especially as publishers try to recover a share of people’s attention from digital devices currently used mostly for music, video and games. But there is another, much more important process happening: ebooks are a by-product of a great human enterprise, the digitisation of literature. Digitised literature will be more easily shared and sold, and in time it will make education and storytelling cheaper and more abundant.

Now is only the beginning. Despite the incredible size and depth of the Internet as we know it, it’s still far, far younger than the world’s paper literature. Ebooks are the most apparent, easily monetised evidence of all our efforts to add the value and volume of paper literature to the great database of knowledge that is the Internet. Everything you learn today should be seen in that context. You are part of the greatest knowledge-curation process of all time.

The Internet we all want is one that easily grows into a more and more powerful, increasingly automated way to create and move large amounts of information, and one that helps us make a living in the process. To do that, we have to fill it with information that will be useful for a long time, and that can be easily found and manipulated by machines. (As with any database: rubbish in, rubbish out.) Every ebook is a piece of that database. It’s not only a freestanding product.

So, if you’re working with ebooks, you need to know how they fit into the Internet, both as consumer products and as an Internet technology.

Today, the ebook industry is most vibrant in the US, followed by the UK and Western Europe. This has been catalysed by more and more people buying dedicated ebook-reading devices, or e-ink ereaders, such as those from Amazon, Barnes & Noble and Kobo. E-ink has been important because reading on it is profoundly different from reading on a backlit computer or phone screen. E-ink’s text and images are not made of light, as on a backlit screen, but of thousands of microcapsules that rearrange with every page-refresh into the shapes of letters and pictures. There are usually about 150 to 200 capsules (think pixels or dots) in every inch of screen. Reading on an e-ink screen really is almost like reading on paper. This has made ebooks easier for paper-book lovers to adopt.

The most popular ereaders also make it easy to buy ebooks on the device, by connecting automatically to their parent company. Amazon’s best at this right now: you can browse the Amazon store, buy an ebook, and be reading it within about a minute, anytime, anywhere.

E-ink readers were just the first big win, though. Far more people are downloading ebooks on their phones and tablets, and reading them in apps from Amazon, Apple, Google and others. The screens may be backlit, but they’re easy to read on, and the expense of the device is easier to justify because it can do so much more, like email, visiting websites, playing games, and running myriad useful apps.

There isn’t space or time here to address many of the common questions and arguments about ereading. (‘Will ebooks replace print?’, ‘I can’t read a book on a small screen!’, ‘What about bookstores?’, etc.) I highly recommend an article by John Siracusa called ‘The Once and Future Ebook’. It’s a few years old now, but as brilliant as ever. Siracusa talks about the history of ebooks, and where and why they’re important.

What is an ebook?

What is an ebook? That’s easy to answer until you think hard about it. So, first, let’s not think hard about it: an ebook is a book you read on a screen. The author and publisher have distributed it as a digital file.

Now, let’s complicate things. Here’s a list of ebook features, starting with the most obvious, and then getting into some grey areas:

  1. Ebooks are books read on a screen.
  2. Ebooks are sometimes converted versions of paper books (created from scans or print PDFs), and sometimes they’re designed as ebooks from the start.
  3. Ebooks are stored and delivered in carefully chosen file formats (much like word processing is mostly in Word’s .doc format, and spreadsheeting mostly in Excel’s .xls format). This means they can be read offline – without having to constantly get info from a web server somewhere else.
  4. Ebooks depend on software to be opened and read (like a .doc file depends on a good Word-like program). The quality of this software determines the quality of the reading experience.
  5. Consumers expect ebooks to cost less than print – usually 50%–80% of the print book price. (Pricing is a whole other issue for a separate discussion.)
  6. Ebooks are found and downloaded from the Internet.
  7. Ebooks are sometimes read online, much like a live website. (The online/offline distinction is important for all kinds of reasons.)
  8. Ebooks sometimes include sound, video, or interactive forms, quizzes, or even games.

Those are some fundamentals, just to get us started.

Ebook formats

A file format is a way of storing information: a set of structures, intentions, computer languages and human languages that have been agreed on by a group of computer engineers and subject experts. Each file format stores information differently.

So, file formats are carefully designed by groups of people to achieve very particular aims. There have been many ebook formats, and over time some have proved more resilient than others.
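To make 'each file format stores information differently' a little more concrete, here is a rough Python sketch that guesses which of the three formats covered later in this overview (PDF, EPUB, MOBI) a file is in, purely from the bytes it contains. The function name sniff_ebook_format is my own invention for illustration, and the checks are deliberately simple: a sketch of the idea, not a robust detector.

```python
import io
import zipfile


def sniff_ebook_format(data: bytes) -> str:
    """Guess an ebook format from raw bytes: 'pdf', 'epub', 'mobi' or 'unknown'."""
    # PDF files begin with the ASCII marker "%PDF-".
    if data.startswith(b"%PDF-"):
        return "pdf"
    # MOBI files sit inside a Palm database header; bytes 60-67 read "BOOKMOBI".
    if data[60:68] == b"BOOKMOBI":
        return "mobi"
    # EPUB files are ZIP archives containing a "mimetype" entry with a fixed value.
    if data.startswith(b"PK\x03\x04"):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                if archive.read("mimetype").strip() == b"application/epub+zip":
                    return "epub"
        except (zipfile.BadZipFile, KeyError):
            # Not a readable ZIP, or a ZIP without a "mimetype" entry.
            pass
    return "unknown"


# Usage: pass in the bytes of a file, e.g. open("book.epub", "rb").read().
print(sniff_ebook_format(b"%PDF-1.4 example"))  # prints "pdf"
```

The point is only that each format announces itself in its own way, because each was designed by a different group with different aims.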

Standardisation

When we standardise something, we all agree to work with the same set of rules. For example, the rules of soccer have been standardised around the world so that anyone can play against anyone else without having to learn new rules. Standards can be official (written down in a formal document) or de facto (generally accepted conventions).

Only in the last few years has the publishing world settled on standards for ebooks.

There are established and emerging standards for file formats, purchasing processes, security, pricing, availability, and so on. The standards we care about most here are around file formats.

Those who come up with new file formats in the early days of a new industry try to use their formats to make loads of money, either by charging a licence fee to software companies for using that format, like MS Word, MP3 and GIF in the early days; or by forcing people to use only their software to open or edit that format. These formats are closed (proprietary), as opposed to being open (free for anyone to use).

Closed formats are those that are still locked down by their owners. No one else is allowed to produce files in those formats without the permission of their owners, who often charge a substantial licence fee. The owners do not publicly publish details (a specification) of how to make files in their formats.

Open formats are free for anyone to create files in, and the specification (how to build the files) is freely available to anyone.

Ultimately the most popular file formats end up being open and free to use. Ironically, the ones that start as open tend to catch on more slowly (e.g. OGG, an alternative to MP3), because they don’t have corporate marketing budgets behind them. While ebook file-format standards are settling, we still have to work with a combination of closed and open formats.

Sometimes, the most popular formats start as proprietary and then become open. The best example is PDF. PDF was owned by Adobe, who in 2008 decided to make it an open format, effectively donating it to the world, a move for which they deserve a lot of credit. They’ve done well to remain the premier PDF-editing software company since then.

Major formats

Among major ebook file formats, the main distinguishing features of each format include:

There have been dozens of ebook file formats in ebook stores and publishers’ repositories over the years. Today, there are only three worth learning about:

PDF

We all know PDF as a format for sharing static pages. If you embed fonts (the common default), what you create is exactly what your reader will see. PDF is directly analogous to the printed page. This makes it a nice, safe choice for designers. And it’s easy to create with existing tools (like InDesign, Acrobat Pro, Word and OpenOffice).

PDF is actually a package for various file formats, which gives it its power. It stores both vector and bitmap images, can embed subsets of fonts, contains XML metadata, and can contain flash video and interactive forms that communicate with a remote server. It’s far more powerful than most people realise. Produced well (for instance, tagged behind the scenes with structural information about the document), PDF can even be reflowed by some PDF readers (the text broken out of its paged format and flowed into the available screen area). And it can be easily navigated by reading software for the visually impaired. All these reasons should make it perfect for ebooks.

But it has a downside: it’s too easy, and therefore common, to produce PDFs that are badly created. From a technical point of view, they can be comprehended by humans, but not by machines, rendering them useless to software that tries to navigate or reflow them. So once something’s in PDF, it usually requires a part-manual process to convert it into anything else.

Not all ereading software opens PDF files, and much of the software that can does so badly or only partly (e.g. it doesn’t support PDF navigation).

EPUB

If you haven’t already, you really need to take a look at an EPUB ebook to get a feel for what it is. (A simple, free example is EBW’s ebook edition of John Siracusa’s article, ‘The Once and Future Ebook’.) You’ll need EPUB-reading software to open it. On whatever device you’re using, just search for ‘EPUB reader’ and you’ll find many options; just pick the most popular or highly rated for now.

Over time, you’re going to have to use several apps to see how different ebooks look in each one. This will be important for testing your ebooks before releasing them.

EPUB has been the fastest-growing format, and the one I’ll talk most about here. It’s an open standard developed by volunteer members of the International Digital Publishing Forum (IDPF). It’s young (officially published in 2007), so it has its technical teething issues, but on the whole it’s simple enough to (eventually) be easy to create, while being sophisticated enough to contain a wide variety of information, including embedded video and SVG (a format for storing vector artwork).

Essentially, EPUB is just a website in a .zip folder. If you took a static website’s files (HTML, CSS and images) and put them in a .zip file, you’d have something much like an EPUB file. EPUB has a few extra features (like the file that creates clickable navigation), but in essence that’s all an EPUB is: web pages in a .zip file. That makes it fairly easy for developers to create software that reads EPUB (because you just adapt existing web-browsing software) and software that saves documents as EPUB. That software is getting better and better (especially InDesign and Sigil, but more on that later). Now, as publishers, we have to learn how to use these new tools. We’re like experienced carpenters learning to work metal for the first time: we have to understand both new tools and new materials.
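If you'd like to see the 'website in a .zip file' idea for yourself, here is a minimal Python sketch. It packs one web page and one stylesheet into a ZIP archive in memory, then lists what's inside. The file names (OEBPS/chapter1.xhtml, OEBPS/style.css) and the function names are my own assumptions for illustration. The result is not a valid EPUB: a real one also needs the extra package and navigation files, which I've left out to keep the sketch short.

```python
import io
import zipfile

CHAPTER = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Chapter 1</title>
    <link rel="stylesheet" type="text/css" href="style.css"/>
  </head>
  <body><h1>Chapter 1</h1><p>Hello, reader.</p></body>
</html>
"""

STYLESHEET = "h1 { font-size: 1.5em; }\n"


def build_epub_like_zip() -> bytes:
    """Return the bytes of a ZIP archive laid out roughly like an EPUB."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # EPUB expects an uncompressed "mimetype" entry first in the archive.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        # The rest is ordinary web content: a page and its stylesheet.
        archive.writestr("OEBPS/chapter1.xhtml", CHAPTER,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/style.css", STYLESHEET,
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def list_contents(data: bytes) -> list[str]:
    """List the file names stored in a ZIP archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


for name in list_contents(build_epub_like_zip()):
    print(name)
```

You can run list_contents on the bytes of any DRM-free EPUB you have to hand, and you should see much the same kind of thing: HTML, CSS and images, plus a few housekeeping files.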

EPUB also has its inherent failings. As anyone who’s designed for the web will know, a format that stores its content much like a website inherits the problems of website design and development: that is, massive inconsistency in the quality of the product and of the software that reads and renders it.

MOBI

MOBI (also called PRC) is Amazon’s main format for ebooks distributed on Amazon Kindle. The format was originally developed by Mobipocket, an ebook retailer that Amazon acquired in 2005. If you’re publishing an ebook to the Kindle (e.g. through Amazon’s Digital Text Platform), you can upload a MOBI file confident that the Kindle will display it largely as you intended.

MOBI is actually based on the same predecessor formats as EPUB, so the two are very similar. This makes it very easy to convert between MOBI and EPUB. For converting, the best tool is the free, open-source Calibre.
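Calibre includes a command-line tool called ebook-convert, which takes an input file and an output file and works out the formats from the file extensions. Here is a small Python sketch of how you might call it from a script. The function names and the file names book.epub and book.mobi are placeholders of my own, and the sketch assumes Calibre is installed and ebook-convert is on your system path.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source: Path, target: Path) -> list[str]:
    """Build the ebook-convert command line for a source and target file."""
    return ["ebook-convert", str(source), str(target)]


def convert(source: Path, target: Path) -> None:
    """Run Calibre's ebook-convert; raises if the tool is missing or fails."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    subprocess.run(build_convert_command(source, target), check=True)


# Show the command that would be run for a hypothetical book.epub.
print(build_convert_command(Path("book.epub"), Path("book.mobi")))
```

Calling convert(Path("book.epub"), Path("book.mobi")) would then run the conversion. Always open the result in real reading software afterwards; an automated conversion is a starting point, not a finished ebook.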

Digital Rights Management (DRM)

Digital Rights Management is the bugbear of the content industries, from film to music to publishing. It makes for great speculative discussions, so find out more if you like a good fight at your next publishing conference.

All you need to know right now is that DRM is any technical measure that restricts what you can do with a file. Usually, DRM stops you from copying or printing an ebook – it’s the publisher’s or retailer’s attempt to slow piracy. It’s like putting the ebook in a lockbox. Amazon and Apple have their own DRM schemes, and almost everyone else uses Adobe’s. Ebooks locked by an Adobe Content Server can only be opened in software that supports Adobe DRM (such as Adobe Digital Editions), and only when that application has been registered with your Adobe username and password.

Note that while EPUB is an open file format, once it’s wrapped in DRM it essentially becomes proprietary (since the DRM component is proprietary). This causes a lot of confusion among publishers and consumers. People tend to associate ‘open’ with free (as in speech, and sometimes as in beer), and companies that use EPUB like to trumpet that they’re using an open standard. But as soon as they apply DRM to an EPUB file, there is nothing open about the file at all. For instance, in order to make software that opens Adobe-DRMed ebooks, you have to pay Adobe a huge licence fee: EPUB or PDF locked with Adobe DRM is effectively proprietary.

Is that a problem? Well, no one knows for sure: there just isn’t enough data comparing the value and effect of DRMed ebooks with DRM-free ebooks. The obvious downsides of proprietary formats are that:

Some would argue that the money to be made from proprietary formats enables or drives innovation and/or consumer adoption. You decide!