Indexing your offline-capable pages with the Content Indexing API

Enabling service workers to tell browsers which pages work offline

Jeff Posnick

What is the Content Indexing API?

Using a progressive web app means having access to information people care about (images, videos, articles, and more) regardless of the current state of your network connection. Technologies like service workers, the Cache Storage API, and IndexedDB provide you with the building blocks for storing and serving data when folks interact directly with a PWA. But building a high-quality, offline-first PWA is only part of the story. If folks don't realize that a web app's content is available while they're offline, they won't take full advantage of the work you put into implementing that functionality.

This is a discovery problem: how can your PWA make users aware of its offline-capable content so that they can discover and view what's available? The Content Indexing API is a solution to this problem. The developer-facing portion of this solution is an extension to service workers, which allows developers to add URLs and metadata of offline-capable pages to a local index maintained by the browser. That enhancement is available in Chrome 84 and later.

Once the index is populated with content from your PWA, as well as any other installed PWAs, it will be surfaced by the browser as shown below.

First, select the Downloads menu item on Chrome's new tab page.

Media and articles that have been added to the index will be shown in the **Articles for You** section.

Additionally, Chrome can proactively recommend content when it detects that a user is offline.

The Content Indexing API is not an alternative way of caching content. It's a way of providing metadata about pages that are already cached by your service worker, so that the browser can surface those pages when folks are likely to want to view them. The Content Indexing API helps with discoverability of cached pages.

See it in action

The best way to get a feel for the Content Indexing API is to try a sample application.

1. Make sure that you're using a supported browser and platform. Currently, that's limited to Chrome 84 or later on Android. Go to about://version to see what version of Chrome you're running.
2. Visit https://contentindex.dev
3. Click the + button next to one or more of the items on the list.
4. (Optional) Disable your device's Wi-Fi and cellular data connection, or enable airplane mode to simulate taking your browser offline.
5. Choose Downloads from Chrome's menu, and switch to the Articles for You tab.
6. Browse through the content that you previously saved.

You can view the source of the sample application on GitHub.

Another sample application, a Scrapbook PWA, illustrates the use of the Content Indexing API with the Web Share Target API. The code demonstrates a technique for keeping the Content Indexing API in sync with items stored by a web app using the Cache Storage API.

Using the API

To use the API, your app must have a service worker and URLs that are navigable offline. If your web app does not currently have a service worker, the Workbox libraries can simplify creating one.

What type of URLs can be indexed as offline-capable?

The API supports indexing URLs corresponding to HTML documents. A URL for a cached media file, for example, can't be indexed directly. Instead, you need to provide a URL for a page that displays media, and which works offline.

A recommended pattern is to create a "viewer" HTML page that could accept the underlying media URL as a query parameter and then display the contents of the file, potentially with additional controls or content on the page.
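The viewer pattern could be sketched roughly as follows. The path `/viewer.html` and the query parameter name `src` are assumptions made for illustration, not names the API requires.

```javascript
// Hypothetical viewer page path and query parameter name; adjust to your app.
const VIEWER_PATH = '/viewer.html';
const MEDIA_PARAM = 'src';

// Builds the URL to index: the viewer page, with the media URL as a parameter.
function buildViewerUrl(mediaUrl, origin) {
  const viewerUrl = new URL(VIEWER_PATH, origin);
  viewerUrl.searchParams.set(MEDIA_PARAM, mediaUrl);
  // Return a same-origin relative URL (path plus query string).
  return viewerUrl.pathname + viewerUrl.search;
}

// Inside the viewer page: recover the media URL from the page's own URL.
function readMediaUrl(pageUrl) {
  return new URL(pageUrl).searchParams.get(MEDIA_PARAM);
}
```

In a page you might call `buildViewerUrl('/media/clip.mp4', location.origin)` when indexing, and `readMediaUrl(location.href)` inside the viewer to decide what to display. Both the viewer page and the media file still need to be cached by your service worker for this to work offline.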

Web apps can only add URLs to the content index that are under the scope of the current service worker. In other words, a web app could not add a URL belonging to a completely different domain into the content index.
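If you want to check a URL before trying to index it, a minimal pre-check might look like the sketch below. `isWithinScope` is a hypothetical helper; it compares against `registration.scope`, which is an absolute URL string, and the browser still applies its own checks.

```javascript
// Returns true if `url` resolves to a location under the service worker's scope.
// `scope` is the absolute URL exposed as registration.scope.
function isWithinScope(url, scope) {
  // Resolve relative URLs against the scope before comparing.
  const resolved = new URL(url, scope);
  return resolved.href.startsWith(scope);
}
```

For example, `isWithinScope('/articles/123', registration.scope)` could gate a call to `add()`.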

Overview

The Content Indexing API supports three operations: adding, listing, and removing metadata. These methods are exposed from a new property, index, that has been added to the ServiceWorkerRegistration interface.

The first step in indexing content is getting a reference to the current ServiceWorkerRegistration. Using navigator.serviceWorker.ready is the most straightforward way:

const registration = await navigator.serviceWorker.ready;

// Remember to feature-detect before using the API:
if ('index' in registration) {
  // Your Content Indexing API code goes here!
}

If you're making calls to the Content Indexing API from within a service worker, rather than inside a web page, you can refer to the ServiceWorkerRegistration directly via registration. It will already be defined as part of the ServiceWorkerGlobalScope.
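If the same code has to run in both contexts, one way to handle it is a small helper along these lines. The names `getRegistration` and `getContentIndex` are made up for this sketch; the only platform features assumed are `self.registration` in a service worker and `navigator.serviceWorker.ready` in a page.

```javascript
// Resolves to the ServiceWorkerRegistration from either context.
// Pass `self`, which exists in both windows and service workers.
async function getRegistration(globalScope) {
  if (globalScope.registration) {
    // Service worker: ServiceWorkerGlobalScope exposes `registration`.
    return globalScope.registration;
  }
  // Window: wait until a service worker is active for this page.
  return globalScope.navigator.serviceWorker.ready;
}

// Returns the content index object, or null when the API is unavailable.
function getContentIndex(registration) {
  return 'index' in registration ? registration.index : null;
}
```

A caller would then write `const index = getContentIndex(await getRegistration(self));` and skip the indexing step when `index` is `null`.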

Adding to the index

Use the add() method to index URLs and their associated metadata. It's up to you to choose when items are added to the index. You might want to add to the index in response to an input, like clicking a "save offline" button. Or you might add items automatically each time cached data is updated via a mechanism like periodic background sync.

await registration.index.add({
  // Required; set to something unique within your web app.
  id: 'article-123',

  // Required; url needs to be an offline-capable HTML page.
  url: '/articles/123',

  // Required; used in user-visible lists of content.
  title: 'Article title',

  // Required; used in user-visible lists of content.
  description: 'Amazing article about things!',

  // Required; used in user-visible lists of content.
  icons: [{
    src: '/img/article-123.png',
    sizes: '64x64',
    type: 'image/png',
  }],

  // Optional; valid categories are currently:
  // 'homepage', 'article', 'video', 'audio', or '' (default).
  category: 'article',
});

Adding an entry only affects the content index; it does not add anything to the cache.
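Because indexing and caching are separate steps, a "save offline" button handler might do both. The following is a minimal sketch under a few assumptions: the cache name `offline-articles` is made up, `cacheStorage` stands for the global `caches` object, and `entry` has the same shape as the object passed to add() above.

```javascript
// Hypothetical cache name; use whichever cache your service worker reads from.
const OFFLINE_CACHE = 'offline-articles';

// Caches the page, then records it in the content index.
async function saveForOffline(registration, cacheStorage, entry) {
  const cache = await cacheStorage.open(OFFLINE_CACHE);
  // Cache first, so the index never points at a page that isn't stored.
  await cache.add(entry.url);
  // Feature-detect: the page is still cached where the API is unavailable.
  if ('index' in registration) {
    await registration.index.add(entry);
  }
}
```

In a page this could be called as `saveForOffline(await navigator.serviceWorker.ready, caches, entry)`. A real app would likely also cache the page's subresources and icons; that part is omitted here.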

Edge case: Call `add()` from `window` context if your icons rely on a `fetch` handler

When you call add(), Chrome will make a request for each icon's URL to ensure that it has a copy of the icon to use when displaying a list of indexed content.

If you call add() from the window context (in other words, from your web page), this request will trigger a fetch event on your service worker.
If you call add() within your service worker (perhaps inside another event handler), the request will not trigger the service worker's fetch handler. The icons will be fetched directly, without any service worker involvement. Keep this in mind if your icons rely on your fetch handler, perhaps because they only exist in the local cache and not on the network. If they do, make sure that you only call add() from the window context.
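One way to make that constraint hard to violate by accident is a guard around the call. This is only a sketch: `isWindowContext` and `addWithCacheOnlyIcons` are hypothetical helpers, and the check relies on the fact that `document` exists in a page but not in a service worker.

```javascript
// True when running in a window (page) context rather than a worker.
function isWindowContext(globalScope) {
  return typeof globalScope.document !== 'undefined';
}

// Adds an entry whose icons are only reachable through the fetch handler.
// Refuses to run outside a window, where icon requests would bypass it.
async function addWithCacheOnlyIcons(globalScope, registration, entry) {
  if (!isWindowContext(globalScope)) {
    throw new Error('Call add() from a page when icons depend on the fetch handler.');
  }
  await registration.index.add(entry);
}
```

In practice you would pass `self` as `globalScope`.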

Listing the index's contents

The getAll() method returns a promise for an iterable list of indexed entries and their metadata. Returned entries will contain all of the data saved with add().

const entries = await registration.index.getAll();
for (const entry of entries) {
  // entry.id, entry.url, etc. are all exposed.
}
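As an illustration, the entries could be reduced to a plain summary for display in your own "saved items" list. `summarizeIndex` is a hypothetical helper, and the sketch assumes that the promise resolves to an array.

```javascript
// Returns a plain summary of what is currently indexed.
async function summarizeIndex(registration) {
  const entries = await registration.index.getAll();
  // Keep only the fields a simple list view would need.
  return entries.map(({id, title, category}) => ({id, title, category}));
}
```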

Removing items from the index

To remove an item from the index, call delete() with the id of the item to remove:

await registration.index.delete('article-123');

Calling delete() only affects the index. It does not delete anything from the cache.
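When your own UI removes an item, you will usually want to drop the cached copy as well. A minimal sketch, assuming the page was stored in a cache whose name you pass in as `cacheName`, and that `entry` carries the `id` and `url` used when it was added (`removeOfflineCopy` is a hypothetical helper):

```javascript
// Removes an item from the content index and from the cache.
// Resolves to true if a cached response was found and deleted.
async function removeOfflineCopy(registration, cacheStorage, cacheName, entry) {
  if ('index' in registration) {
    await registration.index.delete(entry.id);
  }
  const cache = await cacheStorage.open(cacheName);
  return cache.delete(entry.url);
}
```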

Handling a user delete event

When the browser displays the indexed content, it may include its own user interface with a Delete menu item, giving people a chance to indicate that they're done viewing previously indexed content. This is how the deletion interface looks in Chrome 80:

When someone selects that menu item, your web app's service worker will receive a contentdelete event. While handling this event is optional, it provides a chance for your service worker to "clean up" content, like locally cached media files, that someone has indicated they are done with.

You do not need to call registration.index.delete() inside your contentdelete handler; if the event has been fired, the relevant index deletion has already been performed by the browser.

self.addEventListener('contentdelete', (event) => {
  // event.id will correspond to the id value used
  // when the indexed content was added.
  // Use that value to determine what content, if any,
  // to delete from wherever your app stores it, usually
  // the Cache Storage API or perhaps IndexedDB.
});
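A filled-in handler might look something like the sketch below. It assumes a made-up convention in which the id `article-123` corresponds to the cached URL `/articles/123`, and a made-up cache name `offline-articles`; your app's mapping from ids to stored content will differ.

```javascript
// Hypothetical convention: the id 'article-123' maps to the URL '/articles/123'.
function urlForId(id) {
  return '/articles/' + id.replace(/^article-/, '');
}

// Removes the cached page that backed a deleted index entry.
async function cleanUpDeletedContent(cacheStorage, id) {
  const cache = await cacheStorage.open('offline-articles'); // hypothetical name
  return cache.delete(urlForId(id));
}

// Only register the listener when running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  self.addEventListener('contentdelete', (event) => {
    // Keep the service worker alive until the cleanup finishes.
    event.waitUntil(cleanUpDeletedContent(caches, event.id));
  });
}
```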

Note: The contentdelete event is only fired when the deletion happens due to interaction with the browser's built-in user interface. It is not fired when registration.index.delete() is called. If your web app triggers the index deletion using that API method, it should also clean up cached content at the same time.

Feedback about the API design

Is there something about the API that's awkward or doesn't work as expected? Or are there missing pieces that you need to implement your idea?

File an issue on the Content Indexing API explainer GitHub repo, or add your thoughts to an existing issue.

Problem with the implementation?

Did you find a bug with Chrome's implementation?

File a bug at https://new.crbug.com. Include as much detail as you can, simple instructions for reproducing, and set Components to Blink>ContentIndexing.

Planning to use the API?

Planning to use the Content Indexing API in your web app? Your public support helps Chrome prioritize features, and shows other browser vendors how critical it is to support them.

Send a tweet to @ChromiumDev using the hashtag #ContentIndexingAPI and details on where and how you're using it.

What are some security and privacy implications of content indexing?

Check out the answers provided in response to the W3C's Security and Privacy questionnaire. If you have further questions, please start a discussion via the project's GitHub repo.

Hero image by Maksym Kaharlytskyi on Unsplash.