Horseman Crawler – why you NEED it.

THIS IS NOT MY PROJECT. I am just reporting on it. The creator is Chris Johnson aka @defaced!

I have literally been waiting for months to be allowed to talk about this. And the time has finally come.

May I introduce: Horseman.

Does this world need yet another crawler?

Yes. Yes, it does. We need a crawler that finally gives us more flexibility and more ways to work with the data we find during a crawl. Simply exporting data via custom extractions doesn't always do the trick.

While my personal go-to crawler for daily work is still ScreamingFrog, this new crawler has found its way into my daily work as well. Let me quickly show you an example of what I can do with Horseman that would take quite a few more steps with any other crawler.

Why is Horseman getting me all excited?

Horseman is not like any other crawler you know.

It gives you Chrome. Headless Chrome, to be exact. And with that, you get (JavaScript) access to every single crawled page and its DOM. And you can do stuff with it.

Imagine you want to check the sentiment of all the <h1> headings (or any [!] other element on the page) on your website.
My current way of doing this would be to get a list of the elements I need. In ScreamingFrog, I already have the list of h1 elements. If I wanted any other element, I would need to set up a custom extraction and then export that list.
This CSV list / Google Sheets data would then be fed into another piece of software, which I would first need to find or write, to get the sentiment of each text.

Custom Snippets FTW!

In Horseman, I get custom snippets. These snippets are JavaScript code that gets executed by the Horseman crawler. How does it do that?

You create a snippet (in a real Visual Studio Code editor!) that is as simple or as complex as you need it to be. Once it's enabled, Horseman executes your JavaScript snippet, takes whatever you tell it to return, and writes that into your crawl data table.

Let me show you the example I started above: the sentiment analysis.

Within my custom snippet, I import the sentiment Node.js module, run its analyze method on the textContent of the h1 element, and return the sentiment score.
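The snippet's code and Horseman's snippet API aren't shown here, so this is only a minimal sketch of the same logic under a few assumptions: the function name h1SentimentScore and its (doc, analyzer) parameters are hypothetical, the crawled page's DOM is assumed to be reachable as a standard Document, and the analyzer is passed in so the logic can be tried outside Horseman. In a real snippet, the analyzer would be created from the sentiment npm package (new Sentiment()), whose analyze() result includes a numeric score.

```javascript
// Hypothetical sketch: Horseman's actual snippet signature is an assumption here.
// In a real snippet, the analyzer would come from the `sentiment` npm package:
//   const Sentiment = require('sentiment');
//   const analyzer = new Sentiment();

/**
 * Returns the sentiment score of the page's first <h1>, or null if there is none.
 * @param {Document} doc - the crawled page's DOM (assumed to be available to the snippet)
 * @param {{ analyze: (text: string) => { score: number } }} analyzer
 * @returns {number|null}
 */
function h1SentimentScore(doc, analyzer) {
  const h1 = doc.querySelector('h1');
  if (!h1) return null; // page has no <h1>, so there is nothing to score

  // Trim whitespace so indentation inside the heading doesn't reach the analyzer.
  const text = h1.textContent.trim();
  return analyzer.analyze(text).score;
}
```

Passing the analyzer in keeps the DOM lookup separate from the scoring library, so the same function works with a stub during testing and with the sentiment module in a real crawl.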

This very simple snippet saves me a huge amount of time because I don't have to build or use any software other than Horseman.

This way, you can easily extend the functionality of Horseman. It will do whatever you need it to do. Make a crawl as big or small as you need it to be.

Another example: Console errors

Whenever console errors show up in your devtools, you should act. These errors can mean that your JavaScript didn't run correctly, which in turn could mean that your navigation, your checkout process or even the complete page is broken.

It doesn't always have to be that bad, but it could be. This is why Horseman has a built-in snippet for that.
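How Horseman's built-in console error snippet works internally isn't covered here. As a rough sketch of the general idea, assuming a Puppeteer-style page object (Puppeteer's Page emits 'console' events carrying messages with type() and text(), and 'pageerror' events for uncaught exceptions), a collector could look like this. The helper name collectConsoleErrors is hypothetical, and whether Horseman uses this mechanism is an assumption.

```javascript
// Hypothetical sketch: assumes a Puppeteer-style `page` that emits
// 'console' events (messages expose type() and text()) and
// 'pageerror' events (an Error for each uncaught exception in the page).
function collectConsoleErrors(page) {
  const errors = [];

  // Keep only messages logged at the "error" level, e.g. via console.error().
  page.on('console', (msg) => {
    if (msg.type() === 'error') errors.push(msg.text());
  });

  // Uncaught exceptions thrown by the page's scripts.
  page.on('pageerror', (err) => errors.push(err.message));

  // The array is filled in as the page loads and runs its scripts.
  return errors;
}
```

With real Puppeteer, you would call collectConsoleErrors(page) before await page.goto(url), so that errors logged while the page loads are captured too.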

In the crawl, the output of that snippet looks like this:

These are just two little examples of what Horseman can do for you. There are a ton of snippets already built into the app – and many more are to come.

Some of Horseman's built-in snippets (and some of my custom snippets). Check out the video for more.

Knowing our wonderful SEO community, there will be a huge "marketplace" of freely shared snippets in no time. I will also publish my own snippets here in the future, so anybody can use them.

tl;dr: Watch video, get involved.

Check out my YouTube video below. It shows Horseman in action.

Where can I get Horseman?

At the moment, you need to become a GitHub sponsor to get into the early access program. Let me tell you: it is totally worth it. Chris Johnson aka @defaced is putting his heart into this thing. This crawler is something that every SEO, especially technical SEOs like me, should be using to save time, effort, and blood, sweat and tears.

Check it out, have fun with it and please support Chris by becoming a sponsor on GitHub.

Side note:

Horseman is currently in its early access phase. I use it during my daily work. It’s a very, very promising new crawler that gives me the freedom to expand and customize it to get the stuff done that I need to get done.
There are some minor issues still in the GitHub issue backlog (some of which you can also see in the video), but they are mostly polish and design work. Horseman is already a great tool and a much appreciated addition to my SEO toolkit.