How to Map 404 URLs at Scale with Sentence Embeddings via @hamletbatista
Reported today on Search Engine Journal
For the full article visit: http://tracking.feedpress.it/link/13962/13043927
One surefire way to help clients gain more SEO traffic is to redirect valuable URLs that end up in 404s to equivalent ones.
These URLs generally still get traffic, have valuable external links coming in, or both.
One lazy and ineffective approach to mapping 404 URLs is to redirect all of them to the home page or a dynamic search result. 🤦
For example, Smarthome 302 redirects non-existing pages to its home page. Google Search Console generally flags this type of redirect as a soft 404 error.
The correct approach is to map each one individually to an equivalent page if such a page exists.
However, this process can be very tedious, time-consuming, and expensive if you need to do it manually.
Oftentimes, you need to rely on the default internal search engine of the site, which is rarely any good.
In this column, we will learn how to automate this valuable technique using a neural matching approach.
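To make the matching idea concrete before getting into the steps, here is a minimal sketch of nearest-neighbor matching by cosine similarity. It is not the neural model itself: `toy_embed` is a hypothetical stand-in that counts the words in a URL path, so the example runs with the standard library alone. With real sentence embeddings you would swap in a function that returns the model's vector and compare those vectors the same way. The URLs and function names are made up for illustration.

```python
import math
import re
from collections import Counter


def url_to_tokens(url):
    """Drop the scheme and host, then split the rest into lowercase word tokens."""
    path = re.sub(r"^https?://[^/]+", "", url)
    return re.findall(r"[a-z0-9]+", path.lower())


def toy_embed(url):
    # STAND-IN (assumption): plain word counts, not a neural sentence embedding.
    return Counter(url_to_tokens(url))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best_match(broken_url, valid_urls, embed=toy_embed):
    """Return (score, url) for the valid URL most similar to the broken one."""
    broken_vec = embed(broken_url)
    scored = [(cosine(broken_vec, embed(url)), url) for url in valid_urls]
    return max(scored, key=lambda pair: pair[0])


# Hypothetical example data.
valid = [
    "https://www.example.com/lighting/smart-dimmer-switch",
    "https://www.example.com/thermostats/wifi-thermostat",
    "https://www.example.com/",
]
broken = "https://www.example.com/old/dimmer-switch-smart.html"

score, match = best_match(broken, valid)
```

Because an equivalent page does not always exist, it makes sense to accept a match only when its score clears a threshold you have checked by hand on a sample, and to leave the remaining 404s unmapped.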
Here is our plan of action:
Downloading URL Sets
There are many ways to get 404 URLs. You could run a website crawl, download 404s from Google or Bing Search Consoles, etc.
One of my favorite places to get 404 URLs is the Ahrefs Broken Backlinks tool, because it filters 404s down to pages with external links.
Google Search Console will likely have far more 404s to map, though. If you would rather map all 404s and have more than one thousand to download, you might want to consider using our Cloudflare app, which has no such limits.
You can export up to 100,000 URLs, or as many as you have when you connect it to Google Drive.
Next, you need a set of all valid website URLs, preferably canonical URLs.
One simple way to get such a list is to download the XML sitemap URLs.
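As a rough sketch of that step, the snippet below pulls the `<loc>` values out of a standard XML sitemap using only the Python standard library. The function names and the sample sitemap are illustrative assumptions, and the sketch does not handle sitemap index files or gzipped sitemaps.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap_urls(xml_text):
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


def fetch_sitemap_urls(sitemap_url):
    """Download a sitemap and parse it (hypothetical helper, needs network access)."""
    with urlopen(sitemap_url, timeout=30) as response:
        return parse_sitemap_urls(response.read())


# Hypothetical sample so the parser can be tried offline.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc> https://www.example.com/products/widget </loc></url>
</urlset>"""

valid_urls = parse_sitemap_urls(sample)
```

For a live site you would call `fetch_sitemap_urls` with the sitemap address, which is often listed in the site's robots.txt.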
If your client doesn't have XML sitemaps, you can perform a tradi