Use Google Sheets to pull website data for your SEO game

Google Docs are my new love. They haven’t been before – to be honest: I avoided spreadsheets as much as I could. But ever since I started my current job, I have learned to love working with spreadsheets.

Lately, I have been playing around with Google Sheets to get a quick overview of live webpage data without using any other tools.

Using IMPORTXML function to scrape live webpage data

Google Sheets offers a wide variety of functions. One of these is IMPORTXML.

This function imports data from any of various structured data types including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds.

With the help of IMPORTXML and XPath queries, you can basically scrape information from webpages and list them in your spreadsheet.

The data you would like to extract from the page does not have to be visible on the page itself – it’s okay if it’s just in the HTML.

How to actually use IMPORTXML to scrape website data

In the following example, we will scrape the HTML <title> from a webpage. All we need to do is know some XPath basics (no worries, I will show you how to do it).

  1. Open up a ne Google Spreadsheet
  2. Enter a URL into cell A1
  3. Go to cell B1 and enter the following code
    =IMPORTXML(A1, “//title”)
  4. Press “enter”

Now you will see a “loading” text within the cell B1 – and once it’s done loading, you will see the <title> of the page.

See, that was easy :) Now let’s continue and find some more stuff!

Some Xpath queries to help you find page information

I have collected some XPath queries which you can use to collect useful information. Simply add these queries to the IMPORTXML function as you did with the “//title” in the example above.

Get the meta description

=IMPORTXML(A1, "//meta[@name='description']/@content")

Get the heading 1 (h1)

=IMPORTXML(A1, "//h1")

Get the canonical tag

=IMPORTXML(A1, "//link[@rel='canonical']/@href")

Getting other HTML elements from a webpage

Now let’s say, you want to scrape some Wikipedia articles to see the date they were last edited.

To get the XPath query for the HTML element you’re looking for, simply open one of the pages in Google Chrome and right click on the element you want to extract. Then, click “Inspect” to open the developer tools.

The element you clicked on should be highlighted in the “Elements” tab in the developer tools – just like you can see in this image:

Now, right click on the highlighted line, and in the contect menu select “Copy” -> “Copy XPath”.

Copy the XPath query to your clipboard.

Now you have the XPath query in your clipboard – and you can simply paste it into your IMPORTXML function in your B2 cell (like in the screenshot below). You might need to replace double quotes in your copied XPath query for single quotes (around the id selector).

=IMPORTXML(A2, "//*[@id='footer-info-lastmod']")

After entering the code above and hitting “Enter”, you should see the “last edited” text for the URL in cell A2.
Now, drag the cell B2

Getting the “Last Updated” information from Wikipedia articles in Google Sheets

Importing an XML sitemap into Google Sheets

Sometimes, you want to get an XML sitemap overview of some website. You know the sitemap URL, so why not use the IMPORTXML function to do just that?

Important: I will use two functions in adjacent cells to get both the URL of the referenced pages as well as the lastmod date.

So the setup is:
A1: URL of the XML sitemap
B1: FUNCTION 1 to get the referenced URLs
C2: FUNCTION 2 to get the lastmod date

Function 1 (referenced URLs)

=IMPORTXML("https://haensel.pro/post-sitemap.xml", "//*[local-name() ='url']/*[local-name() ='loc']")

Function 2 (lastmod dates in cell C1)

=IMPORTXML("https://haensel.pro/post-sitemap.xml", "//*[local-name() ='url']/*[local-name() ='lastmod']")

The output should look like this.

Output of the IMPORTXML function to import XML sitemap into Google Sheets

Try it out

You can try this with my sample spreadsheet. Just make sure you make a copy of this before you try editing it :)

Conclusion

This is a simple way of getting some useful data rather quickly and without having to use any other – sometimes costly – software.

Let me know what you think

If you have more useful XPath queries or any other improvements, let me know in the comments below and I will add them to this article.