Today I asked myself whether it is possible to detect online ads with AI. So I opened my web browser and deactivated the ad blocker to take some screenshots of ads. I thought I would do this for a couple of minutes every day and have enough data to train a model after a week or two. Maybe I am too used to browsing with an ad blocker, but I underestimated the amount of ads on the internet: after half an hour I had collected about 130 screenshots of ads. Another discovery: the less reputable and more lowbrow a website is, the more advertising it contains.
After downloading some other images and creating screenshots of normal news articles, I started a new notebook on Paperspace and uploaded my data. Let’s see if I can create something functional with it.
First I checked whether any files in my dataset were broken, for example because of a failed upload:
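The code for this check is not reproduced here. As a rough illustration, a check like this can be sketched with the standard library alone by testing whether each file starts and ends with the byte markers of a complete PNG or JPEG. The function names and the choice of formats are assumptions made for this example, and a real check would more likely try to decode each file with an image library.

```python
from pathlib import Path

# Byte markers of complete files (PNG signature / IEND chunk, JPEG SOI / EOI).
PNG_HEAD = b"\x89PNG\r\n\x1a\n"
PNG_TAIL = b"IEND\xaeB`\x82"
JPEG_HEAD = b"\xff\xd8"
JPEG_TAIL = b"\xff\xd9"


def looks_complete(data: bytes) -> bool:
    """Return True if the bytes start and end like a complete PNG or JPEG."""
    if data.startswith(PNG_HEAD):
        return data.endswith(PNG_TAIL)
    if data.startswith(JPEG_HEAD):
        # Heuristic: a valid JPEG with extra trailing bytes would be flagged too.
        return data.endswith(JPEG_TAIL)
    return False


def find_broken_images(folder):
    """Hypothetical helper: list image files that look truncated or corrupted."""
    broken = []
    for path in sorted(Path(folder).rglob("*")):
        if path.suffix.lower() in {".png", ".jpg", ".jpeg"}:
            if not looks_complete(path.read_bytes()):
                broken.append(path)
    return broken
```

An upload that was cut off half way usually loses the end marker, which is what this heuristic looks for. It does not catch damage in the middle of a file.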
All files were okay, so I created a new dataloader for my images and loaded them:
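To make the idea of a dataloader a bit more concrete: at its core it pairs every image with a label and holds some images back for validation. The sketch below assumes that each class lives in its own folder (for example a hypothetical `ads` and `other` folder) and that 20 percent of the images are used for validation; neither detail is confirmed above. A real dataloader would also resize the images and group them into batches.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def collect_labelled_images(root):
    """Return (path, label) pairs, using each image's parent folder name as label."""
    return [
        (path, path.parent.name)
        for path in sorted(Path(root).rglob("*"))
        if path.suffix.lower() in IMAGE_SUFFIXES
    ]


def split_train_valid(items, valid_fraction=0.2, seed=42):
    """Shuffle reproducibly and hold out a fraction of the items for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]
```

Fixing the seed keeps the validation set the same between runs, so error rates from different experiments stay comparable.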
A quick check of what my images look like:
One problem is that the images come in different sizes. To solve this, I decided to stretch all images to 256x256 pixels, ignoring their original aspect ratio.
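What "stretching" means can be shown on a tiny grid of pixel values. The sketch below is a plain-Python nearest-neighbour resize written for illustration only; the function name is made up, and an actual pipeline would use an image library with proper interpolation. The point is that width and height are scaled independently, so a wide banner and a tall skyscraper ad both end up square.

```python
def stretch_nearest(pixels, new_width, new_height):
    """Resize a 2D grid of pixel values with nearest-neighbour sampling.

    Width and height are scaled independently, so the aspect ratio is lost.
    """
    old_height = len(pixels)
    old_width = len(pixels[0])
    return [
        [
            pixels[y * old_height // new_height][x * old_width // new_width]
            for x in range(new_width)
        ]
        for y in range(new_height)
    ]


# A 4x2 "image" squeezed into a 2x2 square: every second column is dropped.
wide = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
print(stretch_nearest(wide, 2, 2))  # [[1, 3], [5, 7]]
```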
Now the model was ready to be trained. I decided to use the model “resnet34”, because it is relatively fast to train and accurate enough for testing purposes.
As a result, I got an error rate of about 30 percent. This is not really good, but acceptable considering the size of my data set.
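For reference, the error rate is simply the share of images whose predicted label differs from the true label. A minimal sketch, with made-up labels that only illustrate the arithmetic and are not taken from the experiment:

```python
def error_rate(predictions, targets):
    """Fraction of predictions that do not match the true labels."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("cannot compute an error rate on an empty set")
    wrong = sum(p != t for p, t in zip(predictions, targets))
    return wrong / len(targets)


# Hypothetical example: 3 of 10 images are misclassified.
true_labels = ["ad"] * 5 + ["other"] * 5
predicted = ["ad", "ad", "ad", "other", "ad", "other", "other", "ad", "ad", "other"]
print(error_rate(predicted, true_labels))  # 0.3
```

With a small dataset, a single misclassified image shifts this number noticeably, so an error rate measured on a few dozen images is only a rough estimate.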
One possible explanation for this high error rate is that, in addition to normal images of landscapes, animals, etc., I also included a few screenshots of news articles in the dataset. Since these are quite difficult to distinguish from advertisements, I suspected that most of the errors come from them.
This assumption can be confirmed if you look at the top losses:
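The idea behind "top losses" is to rank the images by how badly the model got them wrong. A minimal sketch, assuming the model outputs one probability per class and the loss is the usual cross-entropy (the negative log of the probability assigned to the true class). The file names and probabilities below are invented for illustration.

```python
import math


def top_losses(samples, k=5):
    """Return the k samples with the highest cross-entropy loss.

    samples: list of (name, true_label, {label: probability}) tuples.
    """
    scored = []
    for name, true_label, probs in samples:
        # Clamp to avoid log(0) when the true class got zero probability.
        p_true = max(probs.get(true_label, 0.0), 1e-12)
        predicted = max(probs, key=probs.get)
        scored.append((-math.log(p_true), name, true_label, predicted))
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:k]


# Hypothetical predictions for three images.
samples = [
    ("banner.png", "ad", {"ad": 0.9, "other": 0.1}),
    ("news_article.png", "other", {"ad": 0.8, "other": 0.2}),
    ("popup.png", "ad", {"ad": 0.6, "other": 0.4}),
]
for loss, name, actual, predicted in top_losses(samples, k=2):
    print(f"{name}: actual={actual}, predicted={predicted}, loss={loss:.2f}")
```

Images that the model misclassifies with high confidence end up at the top of this list, which makes it a quick way to see which kind of image causes the most trouble.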
So how could the model be improved? One approach would be to increase the size of the data set. It is very small and contains a lot of similar data, which means that the model cannot be trained properly.
So is it possible to detect advertisements using AI? Possibly. With a little more data and a little fine-tuning, the results would probably have been better. But at least it was a fun experiment.