PWA Series -- Cache Technology

PWA

2018-12-11

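# Preface

Earlier articles in this series introduced PWA and covered ServiceWorker and Web Push in turn. This article covers the storage mechanism in PWA: the Cache API. It introduces the API, its use cases, and working examples, and closes with a discussion of why we chose it.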

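# Background

HTML5 provided the Application Cache mechanism so that web applications could run offline. Application Cache had many flaws, however, and ended up as only a non-normative feature of the HTML5 specification. Firefox has marked Application Cache as deprecated and plans to remove the code that supports it. One of ServiceWorker's original goals was to replace Application Cache with a better offline experience, and over time ServiceWorker has clearly grown far beyond what Application Cache could do. Meanwhile, CacheStorage inside ServiceWorker has come to look more and more like Application Cache in purpose, while offering fine-grained control over storage. Fetch + Cache is already a good replacement for Application Cache. Even in browsers that do not support ServiceWorker (Safari, for example), support for Fetch and Cache is enough for good storage control, which is probably one reason Fetch and CacheStorage were split out of ServiceWorker. This article focuses on CacheStorage.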

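# Interface Overview

CacheStorage manages a set of Cache objects and provides a number of JS interfaces for working with them.

CacheStorage.open() returns the Cache instance with the given name, creating it if it does not exist yet.
CacheStorage.match() checks whether a given Request is a key in any Cache object in CacheStorage and, if so, resolves with the matching Response.
CacheStorage.has() checks whether a Cache object with the given name exists.
CacheStorage.keys() returns the names of all Cache objects in CacheStorage.
CacheStorage.delete() deletes the Cache object with the given name.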

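Cache provides storage management for cached Request / Response pairs.

CacheStorage.open() is how developers obtain a Cache instance; the instance's methods then manage the cached Request / Response pairs.
Cache.put() stores a Request / Response pair in the Cache.
Cache.add() fetches the Response for a Request and stores the Request / Response pair in the Cache. Note: this is equivalent to fetch(request) followed by Cache.put(request, response), except that add() rejects when the response status is not 2XX.
Cache.addAll() fetches the Responses for a list of Requests and stores all of the Request / Response pairs in the Cache.
Cache.keys() returns the list of all keys in the Cache, normally a list of Requests.
Cache.match() looks up the entry keyed by a Request and resolves with the first matching Response.
Cache.matchAll() looks up the entries keyed by a Request and resolves with all matching Responses.
Cache.delete() deletes the Cache entry keyed by a Request. Note that Cache entries do not expire; they can only be deleted explicitly.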
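To make the relationship between the two interfaces concrete, here is a minimal sketch that stores one entry and reads it back. The function name `roundTrip`, the cache name `demo-v1`, and the key `/hello.txt` are placeholders invented for this illustration; the `cacheStorage` parameter defaults to the global `caches` and exists only so the function can be exercised outside a browser.

```javascript
// Stores one entry and reads it back, touching both levels of the API.
// `response` is any Response-like object, e.g. new Response('hello').
async function roundTrip(response, cacheStorage = globalThis.caches) {
  const cache = await cacheStorage.open('demo-v1');     // CacheStorage -> Cache
  await cache.put('/hello.txt', response);              // Cache -> entry
  const hit = await cacheStorage.match('/hello.txt');   // searches every cache
  const text = hit ? await hit.text() : null;
  const names = await cacheStorage.keys();              // names of all caches
  const removed = await cacheStorage.delete('demo-v1'); // drops the whole cache
  return { text, names, removed };
}
```

In a page or ServiceWorker this could be called as `roundTrip(new Response('hello'))`.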

# Basic Usage

Checking browser support

CacheStorage is exposed in two places: one is bound to ServiceWorker, namely ServiceWorkerGlobalScope.caches; the other is global, namely Window.caches.
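A feature check along these lines could look like the sketch below. The helper name `supportsCacheStorage` is made up for this example; `scope` defaults to the current global object, which is the window in a page and the worker scope in a ServiceWorker.

```javascript
// Returns true when the given global scope exposes a usable CacheStorage.
function supportsCacheStorage(scope = globalThis) {
  return Boolean(
    scope && scope.caches && typeof scope.caches.open === 'function'
  );
}
```

A page might call `supportsCacheStorage()` once at startup and skip its caching code when the result is false.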
Creating a Cache in CacheStorage

Before operating on a Cache, you should normally call CacheStorage.open() to get the corresponding Cache instance.

caches.open(cacheName).then(function(cache) {
  //do something with your cache
});

Finding a Cache in CacheStorage

There are three ways to look up a Cache:

Using CacheStorage.keys()

caches.keys().then(function(keyList) {
  //do something with your keyList
});

Note: typically used to iterate over CacheStorage, for example to clear old caches when the ServiceWorker activates.
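The cleanup mentioned in the note can be sketched as follows. The function name `deleteOldCaches` and the cache name `static-v2` are assumptions for illustration, and the `cacheStorage` parameter defaults to the global `caches`.

```javascript
// Deletes every cache whose name is not listed in `keep`.
// Resolves with the names of the caches that were removed.
async function deleteOldCaches(keep, cacheStorage = globalThis.caches) {
  const names = await cacheStorage.keys();
  const stale = names.filter((name) => !keep.includes(name));
  await Promise.all(stale.map((name) => cacheStorage.delete(name)));
  return stale;
}

// In a ServiceWorker this would typically be wired up as:
// self.addEventListener('activate', (event) => {
//   event.waitUntil(deleteOldCaches(['static-v2']));
// });
```

Passing the promise to `event.waitUntil` keeps the ServiceWorker in the activating state until the old caches are gone.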

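Using CacheStorage.match()

caches.match(request, options).then(function(response) {
  // do something with the response (undefined if nothing matched)
});

Note: typically used to look up a request across all Caches in CacheStorage; if a matching Response is found, it can then be processed further.

Using CacheStorage.has()

caches.has(cacheName).then(function(exists) {
  // exists is true if a cache with this name exists
});

Deleting a Cache from CacheStorage

caches.delete(cacheName).then(function(deleted) {
  // deleted is true if the cache existed and has now been deleted
});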

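Adding to a Cache

There are three ways to add a Response to a Cache:

Using Cache.put()

caches.open(cacheName).then(function(cache) {
  return fetch(url).then(function(response) {
    if (!response.ok) {
      throw new TypeError('bad response status');
    }
    return cache.put(url, response);
  });
});

Note 1: put() overwrites any key/value pair already stored in the Cache under the same key.

Note 2: Cache.add/Cache.addAll do not cache responses whose status is not 2XX, which means they cannot cache opaque responses; Cache.put can cache any request/response pair, including opaque responses.

Note 3: In current implementations, Cache.add, Cache.addAll, and Cache.put resolve their Promise only after the response body has been written to disk. The latest specification, however, lets the browser resolve the Promise as soon as the entry is recorded in the database, even while the response body is still being received.

Note 4: Starting with Chrome 46, the Cache API only stores requests/responses from HTTPS origins.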
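Note 2 suggests the following pattern for cross-origin resources that can only be fetched in no-cors mode. This is a sketch, not a recommended recipe: the name `cacheNoCors` is invented here, and `fetchFn` defaults to the global `fetch` only so the function can be tested in isolation.

```javascript
// Fetches `url` in no-cors mode and stores the result with put().
// Resolves with the response type ('opaque' for a cross-origin resource).
async function cacheNoCors(url, cache, fetchFn = globalThis.fetch) {
  const response = await fetchFn(url, { mode: 'no-cors' });
  // An opaque response hides its status, so there is nothing to validate:
  // a failed request is stored just as readily as a successful one.
  await cache.put(url, response);
  return response.type;
}
```

Because the outcome cannot be inspected, entries cached this way are best limited to resources whose failure is harmless.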

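Using Cache.add()

cache.add(request).then(function() {
  // request has been added to the cache
});

Note 1: add() also overwrites any key/value pair already stored in the Cache under the same key. Note 2: Supported since Chrome 46.

Using Cache.addAll()

cache.addAll(requests).then(function() {
  // requests (an array of Request objects or URLs) have been added to the cache
});

Note 1: addAll() also overwrites any key/value pair already stored in the Cache under the same key, but entries within the same addAll() call cannot overwrite one another. Note 2: addAll() is well suited to populating the cache ahead of time when the ServiceWorker installs. Note 3: Supported since Chrome 46.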
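The install-time use of addAll() could be sketched like this. The function name `precache`, the cache name `static-v1`, and the URL list are illustrative assumptions; `cacheStorage` defaults to the global `caches`.

```javascript
// Opens (or creates) the named cache and stores every URL in one batch.
async function precache(cacheName, urls, cacheStorage = globalThis.caches) {
  const cache = await cacheStorage.open(cacheName);
  // addAll() is all-or-nothing: if any request fails, the returned
  // promise rejects and none of the entries are stored.
  await cache.addAll(urls);
}

// In a ServiceWorker this would typically be wired up as:
// self.addEventListener('install', (event) => {
//   event.waitUntil(precache('static-v1', ['/', '/app.js', '/app.css']));
// });
```

If the promise rejects, installation fails and the browser will retry the next time the ServiceWorker is registered.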

Querying a Cache

There are three ways to check whether a Request/Response is in a Cache:

Using Cache.keys()

cache.keys(request, options).then(function(keys) {
  // do something with your array of requests
});

Note: options controls how keys are matched. ignoreSearch: ignore the query string (everything after the ?) when matching; ignoreMethod: ignore the HTTP method, for example GET/POST; ignoreVary: ignore the Vary header.

Using Cache.match()

cache.match(request, options).then(function(response) {
  // do something with the response
});
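As a small illustration of the matching options described above, the sketch below looks up an entry while ignoring the query string. The helper name `matchIgnoringQuery` is hypothetical.

```javascript
// Looks up `url` while ignoring its query string, so a request for
// '/app.js?v=2' can be served by an entry stored under '/app.js'.
function matchIgnoringQuery(cache, url) {
  return cache.match(url, { ignoreSearch: true });
}
```

This can help when a cache-busting parameter is appended to URLs that are otherwise identical, at the cost of possibly serving an older version.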

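Using Cache.matchAll()

cache.matchAll(request, options).then(function(responses) {
  // do something with the response array
});

Note 1: Resolves with all matching Responses in the Cache, as an array of Response objects. Note 2: Supported since Chrome 47.

Deleting a Cache Entry

cache.delete(request, options).then(function(deleted) {
  // deleted is true if a matching entry was found and deleted
});

As the examples above show, the Cache API provides a rich set of interfaces for creating, querying, adding to, deleting from, and updating caches, which gives the page very complete control over caching.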

# Caching Strategies

Cache only

self.addEventListener('fetch', function(event) {
  // If a match isn't found in the cache, the response
  // will look like a connection error
  event.respondWith(caches.match(event.request));
});

Network only

self.addEventListener('fetch', function(event) {
  event.respondWith(fetch(event.request));
  // or simply don't call event.respondWith, which
  // will result in default browser behaviour
});

Cache first, falling back to the network

self.addEventListener('fetch', function(event) {
  event.respondWith(
    caches.match(event.request).then(function(response) {
      return response || fetch(event.request);
    })
  );
});

Cache and network race: use whichever responds first

// Promise.race is no good to us because it rejects if
// a promise rejects before fulfilling. Let's make a proper
// race function:
function promiseAny(promises) {
  return new Promise((resolve, reject) => {
    // make sure promises are all promises
    promises = promises.map(p => Promise.resolve(p));
    // resolve this promise as soon as one resolves
    promises.forEach(p => p.then(resolve));
    // reject if all promises reject
    promises.reduce((a, b) => a.catch(() => b))
      .catch(() => reject(Error("All failed")));
  });
};
 
self.addEventListener('fetch', function(event) {
  event.respondWith(
    promiseAny([
      caches.match(event.request),
      fetch(event.request)
    ])
  );
});
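One caveat with this race: caches.match() resolves with undefined rather than rejecting when nothing matches, so a fast cache miss can win and leave respondWith with no Response. A possible adjustment, assuming Promise.any is available in the target browsers, is to turn a miss into a rejection. The function name `raceCacheAndNetwork` is invented for this sketch, and the `cacheStorage` and `fetchFn` parameters default to the globals.

```javascript
// Resolves with whichever of cache or network produces a response first.
// A cache miss is converted into a rejection so that it cannot win.
function raceCacheAndNetwork(
  request,
  cacheStorage = globalThis.caches,
  fetchFn = globalThis.fetch
) {
  const fromCache = cacheStorage.match(request).then((response) => {
    if (!response) throw new Error('cache miss');
    return response;
  });
  const fromNetwork = fetchFn(request);
  // Promise.any fulfills with the first fulfilled promise and rejects
  // only when every promise has rejected.
  return Promise.any([fromCache, fromNetwork]);
}
```

Inside a fetch handler it would be used as `event.respondWith(raceCacheAndNetwork(event.request))`.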

Network first, falling back to the cache

self.addEventListener('fetch', function(event) {
  event.respondWith(
    fetch(event.request).catch(function() {
      return caches.match(event.request);
    })
  );
});

Cache first, then go to the network to update the cache (equivalent to revalidating after the fact)

Code in the page:
var networkDataReceived = false;

startSpinner();

// fetch fresh data
var networkUpdate = fetch('/data.json').then(function(response) {
  return response.json();
}).then(function(data) {
  networkDataReceived = true;
  updatePage(data);
});

// fetch cached data
caches.match('/data.json').then(function(response) {
  if (!response) throw Error("No data");
  return response.json();
}).then(function(data) {
  // don't overwrite newer network data
  if (!networkDataReceived) {
    updatePage(data);
  }
}).catch(function() {
  // we didn't get cached data, the network is our last hope:
  return networkUpdate;
}).catch(showErrorMessage).then(stopSpinner);

Code in the ServiceWorker:
self.addEventListener('fetch', function(event) {
  event.respondWith(
    caches.open('mysite-dynamic').then(function(cache) {
      return fetch(event.request).then(function(response) {
        // store a copy; a Response body can only be read once
        cache.put(event.request, response.clone());
        return response;
      });
    })
  );
});

Generic fallback: when neither the cache nor the network is available, serve a default page

self.addEventListener('fetch', function(event) {
  event.respondWith(
    // Try the cache
    caches.match(event.request).then(function(response) {
      // Fall back to network
      return response || fetch(event.request);
    }).catch(function() {
      // If both fail, show a generic fallback:
      return caches.match('/offline.html');
      // However, in reality you'd have many different
      // fallbacks, depending on URL & headers.
      // Eg, a fallback silhouette image for avatars.
    })
  );
});

Note: the examples above come from The Offline Cookbook.

The key to all of these caching strategies is FetchEvent.respondWith. It lets page JS supply any Response for a request, whether one returned by fetch or one produced from the local cache, which means page JS decides whether the current request is served from the network or from the cache.
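Because respondWith accepts any Response, the value does not have to come from fetch or from a cache at all; it can be built in JS. The sketch below illustrates that idea under stated assumptions: the names `offlineResponse` and `respond` and the message text are invented, and `cacheStorage` and `fetchFn` default to the globals.

```javascript
// Builds a Response entirely in JS, with no network or cache involved.
function offlineResponse() {
  return new Response('You appear to be offline.', {
    status: 503,
    statusText: 'Service Unavailable',
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}

// Network first, then cache, then the synthesized response.
async function respond(
  request,
  cacheStorage = globalThis.caches,
  fetchFn = globalThis.fetch
) {
  try {
    return await fetchFn(request);
  } catch (err) {
    const cached = await cacheStorage.match(request);
    return cached || offlineResponse();
  }
}
```

Inside a fetch handler it would be used as `event.respondWith(respond(event.request))`.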

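# Common Issues

Cross-origin issues

A page's static resources are usually served from a separate domain, which often introduces cross-origin problems. Ideally, the ServiceWorker's fetch request uses cors mode and the server returns an Access-Control-Allow-Origin header that permits cross-origin access. More often, though, the server cannot return that header for one reason or another, and the only option is no-cors mode, which yields opaque responses. Opaque responses are a nuisance: JS cannot read their status code, so we cannot tell whether the request succeeded or failed. Cross-origin problems are rarely easy to handle, and ServiceWorker offers no perfect solution for the one described here. Some possible approaches:

Have the separate domain register its own ServiceWorker, that is, use Foreign Fetch; see Background Sync.
Have the app client intercept the ServiceWorker's requests through ServiceWorkerController.shouldInterceptRequest and return a response that carries the Access-Control-Allow-Origin header. For example, push some static resources to the client ahead of time; when the page requests them through ServiceWorker fetch, the client intercepts the request and returns a response with the Access-Control-Allow-Origin header.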
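The limitation of opaque responses described above can be seen in a small sketch. The helper name `describeResponse` and its return strings are made up for this illustration.

```javascript
// Summarises what page JS can learn about a response.
function describeResponse(response) {
  if (response.type === 'opaque') {
    // status is always 0 and the body is unreadable, so there is
    // no way to tell success from failure.
    return 'opaque: outcome unknown';
  }
  return response.ok
    ? 'ok: ' + response.status
    : 'failed: ' + response.status;
}
```

A caching layer could use a check like this to decide whether a response is safe to store.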

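Cache capacity limits

The ServiceWorker specification does not define a capacity limit for the Cache API. Cache API storage is of the Temporary type, which means the browser may delete it at any time, for example under storage pressure. It shares the Temporary storage pool with AppCache, IndexedDB, WebSQL, and the File System API, but not with Local Storage or Session Storage.

Note 1: In Chromium, the ServiceWorker Script Cache quota across all domains is 250 MB, subject to certain policy algorithms. Note 2: In Chromium, each ServiceWorkerCache has a quota of 512 MB, subject to factors such as the device's storage capacity.

Reference: What's the size limit of Cache Storage for Service Worker?

Cache eviction

Eviction works much as it does for IndexedDB; see the IndexedDB browser storage limits and eviction criteria.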
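Where the browser implements the StorageManager API (an assumption; the article does not mention it), a page can ask how much of its quota is in use. The figures cover the whole origin, not the Cache API alone. The function name `storageUsage` is invented for this sketch, and `storageManager` defaults to `navigator.storage`.

```javascript
// Resolves with { usage, quota, percentUsed } in bytes and percent,
// or null when StorageManager.estimate() is unavailable.
async function storageUsage(
  storageManager = globalThis.navigator && globalThis.navigator.storage
) {
  if (!storageManager || typeof storageManager.estimate !== 'function') {
    return null;
  }
  const { usage, quota } = await storageManager.estimate();
  const percentUsed = quota ? (usage / quota) * 100 : 0;
  return { usage, quota, percentUsed };
}
```

The numbers are estimates reported by the browser and may be rounded or padded, so they are better suited to warnings than to exact accounting.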

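# Why Cache Matters

We already have plenty of storage mechanisms, such as AppCache, IndexedDB, WebSQL, the File System API, LocalStorage, and the HTTP Cache, so why add CacheStorage? With the existing caching mechanisms the front end has little control, and is often helpless when problems arise. CacheStorage aims to give the page low-level primitives for fine-grained manipulation of the request cache, in effect opening up HTTP Cache-level caching to the page. Combined with the Fetch API, it gives the page full control over requests, responses, and caching, a capability pages have long lacked.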
