Tag: web scraping

Articles and experiments related to web scraping.

Training the Button detector ML model

I trained a machine learning model to differentiate between buttons and links on web pages. Using a dataset of ~3000 button images and ~4000 link images, I trained a convolutional neural network (CNN) with added noise for better generalization. Preprocessing included grayscale conversion, dataset diversification with multilingual sites, and image compression. The model performed well in initial tests, correctly classifying button-like and link-like elements. Next, I'll build a web app for easier testing and a Lighthouse audit for website analysis.

Read article

Puppeteer Go

I've created Puppeteer Go, a small JavaScript library to simplify the process of creating CLI utilities with Puppeteer. It handles the boilerplate of launching the browser, opening a tab, navigating to a URL, performing a specified action, and cleaning up. This post demonstrates its usage by taking multiple screenshots of elements on a page, inspired by Ire Aderinokun's work. Examples include capturing screenshots of h1 elements on my blog and feature blocks on caniuse.com.

Read article

Stay in the loop.

I'm trialing a newsletter. Join for monthly insights into web dev, Chrome, and the open web.

alternate_email

Get in touch

Open to chat about Chrome or Web development.

Book a consultation