I previously discussed LocalStorage load times, and had concluded that they weren’t too bad. However, when I got back from my New Year’s holiday in Patagonia, I started digging through my email backlog and found an interesting line of questioning from Gmail engineers, asking why Chrome’s first LocalStorage access was so slow in their use case, and providing data to show it. I, of course, knew why it should be slow: the entire DB is loaded from disk on the first access. But I previously had data showing it wasn’t so bad. What I realized then was that I had not been accounting for the size of the DB, so I added histograms to record LocalStorage DB sizes and load times by size.

Note that LocalStorage is cached in both the renderer and browser processes, and that I’m only recording a single sample each time LocalStorage is loaded into a process’s in-memory cache. This means that long-lived renderer processes (roughly speaking, tabs) will only get one sample recorded, even if they use LocalStorage heavily.
As can be seen, the vast majority of LocalStorage DBs are rather small (on the order of a few KB). That means the histograms in my previous post on LocalStorage load times primarily consisted of samples from small DBs. In this iteration, I separated out the load times into three buckets: size < 100KB, 100KB < size < 1MB, and 1MB < size < 5MB, and got the following results.
The short of it is that the long tail is terribly slow for large DBs. This has real implications for people considering LocalStorage as an application resource cache for performance reasons, since caching resources will probably noticeably increase the size of the LocalStorage DB; it also has some security implications. And just in case you forgot, the API is synchronous, so the renderer main thread is blocked while the browser loads LocalStorage into memory, likely at the worst possible time for performance: initial page load. Also note the jump at the end of the chart… that’s because my max bucket was capped at 10s, since I didn’t think many samples would exceed that. Unfortunately, I was wrong :(
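Since that blocking load is triggered by whichever LocalStorage access happens first, one mitigation is to deliberately take the hit off the initial-load critical path, for instance after the window’s load event. This is just a sketch of the idea, not something from Chrome or the post; the function and key names are illustrative:

```javascript
// Sketch: pay the one-time LocalStorage load cost off the critical path.
function warmLocalStorage() {
  try {
    // Any read forces the browser to pull the whole DB into memory,
    // so later reads during user interaction are cheap.
    localStorage.getItem('__warmup__');
  } catch (e) {
    // Storage may be unavailable (private browsing, non-browser env).
  }
}

// In a page, defer the warm-up until after onload work has settled:
if (typeof window !== 'undefined') {
  window.addEventListener('load', () => setTimeout(warmLocalStorage, 0));
}
```

The `setTimeout(…, 0)` is there so the warm-up read doesn’t compete with other work queued on the load event itself.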
In the end, as with all web performance techniques, you really should measure the impact in your own use case to make sure it is actually a performance win.
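For example, one simple way to see what the first access costs in your app is to time it before any other LocalStorage use on the page. This is a sketch under my own naming; `timeFirstAccess` and the stand-in storage object are hypothetical, and in a real page you would pass `window.localStorage`:

```javascript
// Time a storage read; the very first read on a page is the one that
// pays the disk-load cost described above. `storage` is a parameter so
// the helper can also run outside a browser.
function timeFirstAccess(storage, key, now = Date.now) {
  const start = now();
  const value = storage.getItem(key); // first touch triggers the full DB load
  const elapsedMs = now() - start;
  return { value, elapsedMs };
}

// Usage with a stand-in storage object (in a browser: window.localStorage):
const fakeStorage = { getItem: (k) => (k === 'theme' ? 'dark' : null) };
const { value, elapsedMs } = timeFirstAccess(fakeStorage, 'theme');
```

In a real page, `performance.now()` would be a better clock than `Date.now` since it offers sub-millisecond resolution.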
PS: I’ve published the full CDFs from the histogram data. Note that this data was gathered from Google Chrome 26 opted-in dev channel users over a span of 5–6 days in February. The results would likely differ somewhat for stable channel users (probably for the worse… dev channel users tend to have more advanced machines). Take the Mac and especially the Linux data with extra grains of salt, since their sample sizes are significantly smaller.