Indexes

Introduction

An index (the long list with the page numbers at the end of a book) is an important part of many non-fiction print books. (Many ebooks don’t include indexes, and instead rely on the ebook app’s search function.) Traditionally, indexes are created by professional indexers, who read the final page proofs and compile a list of entries and page numbers manually. They index not only words but entire concepts. For instance, a good indexer will include an entry for ‘democracy’ in an index if the concept is discussed, even if the word ‘democracy’ is never used. This is why good indexes cannot be generated by computers.

In this workflow, the challenge is in managing page numbers: our workflow allows you to reflow the content into new formats, so page numbers are never quite fixed. You might decide that a particular format’s pages are fixed enough to allow for manual page numbers in an index, but then you cannot reuse that index in another format.

So here we’ll describe how to create an index where the page numbers are dynamic. That is, the index is a list of concepts that point to a particular point in the book. The page numbers in the index (and clickable hyperlinks in an ebook version) are then generated on the fly whenever you output to PDF. This also means you can index early in the production process, such as during editing, rather than waiting for pages to be finalised.

Note: Be careful not to confuse an index in a book with the index page of a website. When talking about websites, the index page is the home page of a directory. So never give your reference index the file name index (e.g. index.md or index.html). To avoid confusion, in our code we refer to the reference-index when we mean a book’s index.

To create an index you have to do two things:

  1. In the text, tag the words that the index entries will point to.
  2. Create an index document with a list of entries.

How to tag indexed words

To tag a word in the text, we make it a link, and we give that link a class of indexed and a unique ID.

The link target should point to the index entry for that word. This way, clicking the word in the text will take you to the right place in the index. And clicking the page reference in the index will take you to the word in the text.

Here’s an example of the link in the text:

Late that night, [Bob](reference-index.html#bob-1){:.indexed #bob-1} realised the key was in his pocket.

And then when Bob appears later in the book:

Eventually, [Bob](reference-index.html#bob-2){:.indexed #bob-2} called her to confess.

Note that the ID must be unique for every instance of Bob.
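
Since duplicate IDs are easy to introduce by copying and pasting a tag, a small script can check for them. The following is only a sketch, not part of the workflow itself: the function names are hypothetical, and the pattern assumes every tag is written exactly as in the examples above, with the class first and the ID second.

```python
import re
from collections import Counter

# Assumption: tags follow the form shown above, e.g.
# [Bob](reference-index.html#bob-1){:.indexed #bob-1}
TAGGED_WORD = re.compile(
    r"\[(?P<text>[^\]]+)\]"                        # [Bob]
    r"\((?P<file>[^)#\s]+)#(?P<target>[^)\s]+)\)"  # (reference-index.html#bob-1)
    r"\{:\s*\.indexed\s+#(?P<id>[^\s}]+)\s*\}"     # {:.indexed #bob-1}
)


def find_tagged_words(markdown):
    """Return (text, target, id) for every tagged word in a markdown string."""
    return [(m["text"], m["target"], m["id"]) for m in TAGGED_WORD.finditer(markdown)]


def duplicate_ids(tagged):
    """Return the IDs that are used by more than one tagged word."""
    counts = Counter(tag_id for _, _, tag_id in tagged)
    return sorted(tag_id for tag_id, n in counts.items() if n > 1)
```

To check a whole book, you would concatenate the markdown of all its text files and pass the result to find_tagged_words, then pass that list to duplicate_ids. An empty result means every tagged ID is unique.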

How to create the index list

Then in the index itself, you create a list of entries. After each entry, you add links to each instance of that entry you’ve tagged in the text. And you give each link the ID that you’ve pointed to in your tagged word’s link target.

Tech tip: To make it easy for us to manage, we use the same ID for the tagged word and the index entry. You don’t have to use the same ID. For instance, your tagged word’s ID might be #text-bob-1 and your index entry’s link #index-bob-1. If, like us, you use the same ID in both cases, remember that your index must be in a separate file from your text, since IDs must be unique within each file.
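
To see how the two halves of a tag fit together when both use the same ID, here is a minimal sketch that builds the markdown for a tagged word and for its matching index link. The function names and parameters are hypothetical and only illustrate the pattern; you would normally type these tags by hand.

```python
def tagged_word(word, entry_id, index_file="reference-index.html"):
    """Markdown for a tagged word in the text, linking to the index."""
    return f"[{word}]({index_file}#{entry_id}){{:.indexed #{entry_id}}}"


def index_link(number, chapter_file, entry_id):
    """Markdown for the matching link in the index, linking back to the text."""
    return f"[{number}]({chapter_file}#{entry_id}){{:#{entry_id}}}"


print(tagged_word("Bob", "bob-1"))
print(index_link(1, "1.html", "bob-1"))
```

Each link carries the ID that the other one points to, which is what makes the links work in both directions.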

Here is what your markdown for the index might look like. Here we’ve included examples of sub-entries, and the tag to use for the entire list to style it as an index: {:.reference-index}.

* Alice
[1](1.html#alice-1){:#alice-1}
[2](3.html#alice-2){:#alice-2}
* Bob
[1](1.html#bob-1){:#bob-1}
[2](9.html#bob-2){:#bob-2}
* key
[1](4.html#key-1){:#key-1}
    - private
    [1](4.html#key-private-1){:#key-private-1}
    - public
    [1](4.html#key-public-1){:#key-public-1}
{:.reference-index}

In this example, we’ve used numbers as the link text. On screen, these will stay numbers, e.g.:

Alice 1, 2

The stylesheet adds the commas between entries (so you could globally replace them with semicolons or another separator). In print output (using PrinceXML), the stylesheet replaces those numbers with page references.
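
Because every link in the index has to match a tagged word in the text, it can help to check the two sides against each other before you output. The sketch below is one hypothetical way to do that, not a feature of the workflow. It assumes the tag and link forms shown in the examples above, and that you can supply each text file’s markdown keyed by its output file name (such as 1.html).

```python
import re

# [1](1.html#bob-1) in the index: capture the file and the fragment it points to.
INDEX_LINK = re.compile(r"\[[^\]]+\]\((?P<file>[^)#\s]+)#(?P<target>[^)\s]+)\)")
# {:.indexed #bob-1} in the text: capture the tagged word's ID.
TAGGED_ID = re.compile(r"\{:\s*\.indexed\s+#(?P<id>[^\s}]+)\s*\}")


def cross_check(index_markdown, text_by_file):
    """Compare index links with tagged words.

    text_by_file maps an output file name (e.g. "1.html") to the markdown
    source of that file. Returns two sorted lists of (file, id) pairs:
    dangling  - index links whose target ID is not tagged in the named file
    unindexed - tagged IDs in the text that no index link points to
    """
    tagged = {
        (name, m["id"])
        for name, markdown in text_by_file.items()
        for m in TAGGED_ID.finditer(markdown)
    }
    linked = {(m["file"], m["target"]) for m in INDEX_LINK.finditer(index_markdown)}
    return sorted(linked - tagged), sorted(tagged - linked)
```

If both lists come back empty, every index link has a tagged word to point to, and every tagged word appears in the index.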

Future development

We hope to find more concise ways to create indexes from tags in running text in future. We’d also like to align this work with the IDPF’s recommendations on indexes in ebooks.