Using HTTPArchive and Chrome UX report to get Lighthouse score for top visited sites in India.

Paul Kinlan

As I mentioned in my previous post, I am starting to plan more Developer Relations work in India, and I want to get a better understanding of how users in India experience the web. In that post I used a very simple heuristic for determining whether a site was Indian: is it a ‘.in’ domain? I knew that this wasn’t the best approach, but it felt like a good first attempt.

What I really wanted was a way to understand the sites that users in India visit and then rank their scores by each site’s popularity.

Luckily, the Chrome UX Report has some of that data. It has a series of tables in BigQuery listing many of the top origins that users in India visit (the table is chrome-ux-report.country_in.20180; note the ‘_in’, which denotes the country). The Chrome UX Report holds much more data for each origin, such as the site’s aggregated speed as experienced by real users, but I only needed the URLs.
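If all you want is the list of origins, a much smaller query is enough. The sketch below reuses the `chrome-ux-report.country_in.201807` table from the main query further down (CrUX tables are named by year and month, so this is July 2018) and also computes `NET.REG_DOMAIN(origin)`, the same function that query uses to match each origin against an Alexa domain. The `LIMIT` is an arbitrary choice to keep the preview small.

```sql
#standardSQL
-- Preview the origins CrUX recorded for Chrome users in India (July 2018),
-- together with the registrable domain later used to join against Alexa data.
SELECT DISTINCT
  origin,
  NET.REG_DOMAIN(origin) AS domain
FROM
  `chrome-ux-report.country_in.201807`
ORDER BY
  origin
LIMIT 20
```

Seeing several origins share one `domain` value (for example a bare domain and its `www.` subdomain) explains why the joined results contain more rows than there are Alexa-ranked domains.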

By combining the Chrome UX Report data with the Alexa ranking table in HTTP Archive and the HTTP Archive Lighthouse scores mentioned previously, we can get a better picture of what users in India actually see.

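#standardSQL
-- Lighthouse scores (0 to 1) from the HTTP Archive mobile crawl of 2018-08-01,
-- restricted to origins visited by Chrome users in India (CrUX, July 2018)
-- and ordered by the Alexa rank of each origin's domain.
SELECT
  url, rank,
  JSON_EXTRACT(report, '$.categories.seo.score') AS seo_score,
  JSON_EXTRACT(report, '$.categories.pwa.score') AS pwa_score,
  JSON_EXTRACT(report, '$.categories.performance.score') AS speed_score,
  JSON_EXTRACT(report, '$.categories.accessibility.score') AS accessibility_score
FROM
  `httparchive.lighthouse.2018_08_01_mobile`
JOIN (
  -- Attach an Alexa rank to every CrUX origin by matching its registrable domain
  SELECT
    DISTINCT origin,
    Alexa_rank AS rank
  FROM
    `httparchive.urls.20170315`
  JOIN
    `chrome-ux-report.country_in.201807`
  ON
    NET.REG_DOMAIN(origin) = Alexa_domain) AS crux
-- Match the crawled home page URL (ending in '/') to the CrUX origin
ON
  url = CONCAT(origin, '/')
ORDER BY
  rank ASC, url ASC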

Running the above query returns a lot of data, too much for Google Sheets, so I analysed only roughly the top 16,000 origins (those whose domains rank up to about 7,000 in the Alexa rankings). Below is the aggregated data, without comment.

Top 7k

Score Range  SEO Score  PWA Score  Speed Score  A11Y Score
0            0          25         149          10
0.5          45         12253      7841         3925
0.7          1907       3609       2725         6498
0.8          1713       54         1188         2610
0.9          3016       30         1180         1788
1            9278       21         2283         1157
             0          0          0            0

Alexa Top 100

Score Range  SEO Score  PWA Score  Speed Score  A11Y Score
0            0          0          3            2
0.5          0          2279       1231         519
0.7          87         703        484          1348
0.8          199        0          198          587
0.9          375        0          261          302
1            2316       0          694          219
             0          0          0            0

Alexa Top 1000

Score Range  SEO Score  PWA Score  Speed Score  A11Y Score
0            0          1          19           2
0.5          16         5471       3517         1942
0.7          546        1867       1272         2941
0.8          757        9          507          1212
0.9          1077       16         567          719
1            4962       6          1241         550
             0          0          0            0
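Because the full result set is too large for a spreadsheet, one option is to do the bucketing in BigQuery itself. The sketch below wraps the main query in a CTE, extracts the performance score as a number with `JSON_EXTRACT_SCALAR` and `SAFE_CAST`, and counts origins per score range for a chosen Alexa rank cutoff. It assumes each range value is an inclusive upper bound (so 0.7 means above 0.5 and at most 0.7); the post does not say exactly how its ranges were defined, and because the tables came from a partial export, the counts may not match them exactly. Only the speed score is shown; the other categories would use their JSON paths from the main query.

```sql
#standardSQL
-- Sketch: count origins per speed-score range directly in BigQuery.
-- Assumption: each range label is an inclusive upper bound (0.7 = (0.5, 0.7]).
WITH scored AS (
  SELECT
    rank,
    SAFE_CAST(JSON_EXTRACT_SCALAR(report, '$.categories.performance.score') AS FLOAT64) AS speed_score
  FROM
    `httparchive.lighthouse.2018_08_01_mobile`
  JOIN (
    SELECT DISTINCT
      origin,
      Alexa_rank AS rank
    FROM
      `httparchive.urls.20170315`
    JOIN
      `chrome-ux-report.country_in.201807`
    ON
      NET.REG_DOMAIN(origin) = Alexa_domain) AS crux
  ON
    url = CONCAT(origin, '/')
)
SELECT
  CASE
    WHEN speed_score <= 0 THEN '0'
    WHEN speed_score <= 0.5 THEN '0.5'
    WHEN speed_score <= 0.7 THEN '0.7'
    WHEN speed_score <= 0.8 THEN '0.8'
    WHEN speed_score <= 0.9 THEN '0.9'
    ELSE '1'  -- scores above 0.9
  END AS score_range,
  COUNT(*) AS origins
FROM
  scored
WHERE
  speed_score IS NOT NULL  -- skip origins whose report has no performance score
  AND rank <= 1000         -- adjust the cutoff for the other tables
GROUP BY
  score_range
ORDER BY
  score_range
```

The range labels are strings, but they still sort in numeric order ('0' < '0.5' < … < '1'), so no extra ordering column is needed.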

I think the tools that developers and businesses now have can make a huge difference to our ability to make reasoned, principled decisions based on how users around the world actually experience the web. For me, this data provides a baseline that I can use to see whether our Developer Relations strategies influence the ecosystem in the long term.
