web scraping

Training the Button detector ML model

January 6, 2023

I trained a machine learning model to differentiate between buttons and links on web pages. Using a dataset of ~3000 button images and ~4000 link images, I trained a convolutional neural network (CNN) with added noise for better generalization. Preprocessing included grayscale conversion and image compression, and I diversified the dataset with images from multilingual sites. In initial tests the model performed well, correctly classifying button-like and link-like elements. Next, I'll build a web app for easier testing and a Lighthouse audit for website analysis.
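The grayscale and noise steps can be sketched in a few lines. This is a minimal illustration, not the code used to train the model. It assumes images arrive as RGBA byte buffers (the layout of canvas `ImageData`) and uses the ITU-R BT.601 luma weights for grayscale. It also assumes the added noise is zero-mean Gaussian noise on the input pixels, drawn with the Box-Muller transform. The function names and the `sigma` parameter are hypothetical.

```typescript
// Convert an RGBA buffer (4 bytes per pixel, the layout of canvas ImageData)
// into one grayscale byte per pixel using ITU-R BT.601 luma weights.
function toGrayscale(rgba: Uint8ClampedArray): Uint8ClampedArray {
  const gray = new Uint8ClampedArray(Math.floor(rgba.length / 4));
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[4 * i];
    const g = rgba[4 * i + 1];
    const b = rgba[4 * i + 2];
    // The alpha byte (rgba[4 * i + 3]) is ignored.
    // Uint8ClampedArray rounds and clamps the stored value into 0..255.
    gray[i] = 0.299 * r + 0.587 * g + 0.114 * b;
  }
  return gray;
}

// One standard-normal sample via the Box-Muller transform.
// `rng` must return values in [0, 1), like Math.random.
function gaussian(rng: () => number): number {
  const u1 = 1 - rng(); // shift into (0, 1] so Math.log never sees 0
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Return a copy of a grayscale image with zero-mean Gaussian noise of
// standard deviation `sigma` added to every pixel (clamped to 0..255).
function addGaussianNoise(
  gray: Uint8ClampedArray,
  sigma: number,
  rng: () => number = Math.random,
): Uint8ClampedArray {
  const noisy = new Uint8ClampedArray(gray.length);
  for (let i = 0; i < gray.length; i++) {
    noisy[i] = gray[i] + sigma * gaussian(rng);
  }
  return noisy;
}
```

Passing a custom `rng` makes the noise reproducible, which is useful for tests. During training one would normally keep `Math.random` so each pass over an image sees different noise.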

Puppeteer go

January 10, 2020

A simple Node.js library for Puppeteer

Puppeteer Go

December 3, 2019

I've created Puppeteer Go, a small JavaScript library to simplify the process of creating CLI utilities with Puppeteer. It handles the boilerplate of launching the browser, opening a tab, navigating to a URL, performing a specified action, and cleaning up. This post demonstrates its usage by taking multiple screenshots of elements on a page, inspired by Ire Aderinokun's work. Examples include capturing screenshots of h1 elements on my blog and feature blocks on caniuse.com.
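The boilerplate that Puppeteer Go takes care of follows a launch, try, finally pattern. The sketch below is a hypothetical helper, not Puppeteer Go's actual API. `runWithPage`, `PageLike` and `BrowserLike` are illustrative names. The interfaces only mirror the small part of Puppeteer's `Browser` and `Page` API that the helper needs (`newPage`, `goto`, `close`), so the pattern can run without a real browser.

```typescript
// Minimal structural types: only the parts of Puppeteer's Page and Browser
// that this helper touches. A real Puppeteer Browser/Page should satisfy them.
interface PageLike {
  goto(url: string): Promise<unknown>;
}

interface BrowserLike<P extends PageLike> {
  newPage(): Promise<P>;
  close(): Promise<void>;
}

// Hypothetical helper, not Puppeteer Go's real API: launch a browser, open a
// tab, navigate to `url`, run the caller's action, then clean up.
async function runWithPage<P extends PageLike, T>(
  launch: () => Promise<BrowserLike<P>>,
  url: string,
  action: (page: P) => Promise<T>,
): Promise<T> {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await action(page);
  } finally {
    // Runs whether the action succeeded or threw, so the browser is always closed.
    await browser.close();
  }
}
```

With Puppeteer installed, the launcher could be `() => puppeteer.launch()` and the action something like `(page) => page.screenshot({ path: "page.png" })`. The `finally` block is the main design point: the browser is cleaned up even when navigation or the action fails.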

domcurl: curl + JavaScript

March 12, 2018

A curl-like utility that runs JavaScript

DOMCurl

February 15, 2018

Curl, but can run JavaScript
