The author’s views are entirely his or her own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.
This week, Shawn talks you through the ways your site structure, your sitemaps, and Google Search Console work together to help Google crawl your site, and what you can do to improve Googlebot’s efficiency.
Howdy, Moz fans. Welcome to this week’s edition of Whiteboard Friday, and I’m your host, SEO Shawn. This week I’m going to talk about how you can help Google crawl your website more efficiently.
Site structure, sitemaps, & GSC
Now I’ll start at a high level. I want to talk about your site structure, your sitemaps, and Google Search Console, why they’re important and how they’re all related together.
So site structure, let’s think of a spider. As he builds his web, he makes sure to connect every strand efficiently so that he can get anywhere he needs to go to catch his prey. Well, your website needs to work in a similar fashion. You need to make sure you have a really solid structure, with interlinking between all your pages, categories and things of that sort, so that Google can get across your site easily and efficiently, without hitting disruptions or blockers that cause it to stop crawling your site.
Your sitemaps are kind of a shopping list or a to-do list, if you will, of the URLs you want to make sure that Google is crawling whenever they see your site. Now Google isn’t always going to crawl those URLs, but at least you want to make sure that they see that they’re there, and that’s the best way to do that.
GSC and properties
Then Google Search Console. Anybody who creates a website should always connect a property to it, so they can see all the information that Google is willing to share about the site and how it’s performing.
So let’s take a quick deep dive into Search Console and properties. As I mentioned previously, you should always create that initial property for your site. There’s a wealth of information you get out of that. Of course, natively, in the Search Console UI, there are some limitations: it can only give you 1,000 rows of data. You can definitely do some filtering, regex, good stuff like that to slice and dice, but you’re still limited to those 1,000 rows in the native UI.
So something I have actually been doing for the last decade or so is creating properties at a directory level to get that same amount of information, but to a specific directory. Some good stuff that I have been able to do with that is connect to Looker Studio and be able to create great graphs and reports, filters of those directories. To me, it’s a lot easier to do it that way. Of course, you could probably do it with just a single property, but this just gets us more information at a directory level, like example.com/toys.
Next I want to dive into our sitemaps. So as you know, it’s a laundry list of URLs you want Google to see. Typically you throw 50,000, if your site is that big, into a sitemap, drop it at the root, put it in robots.txt, go ahead and throw it in Search Console, and Google will tell you that they’ve successfully accepted it, crawled it, and then you can see the page indexation report and what they’re giving you about that sitemap. But a problem that I’ve been having lately, especially at the site that I’m working at now with millions of URLs, is that Google doesn’t always accept that sitemap, at least not right away. Sometimes it’s taken a couple weeks for Google to even say, “Hey, all right, we’ll accept this sitemap,” and even longer to get any useful data out of that.
So to help get past that issue that I’ve been having, I now break my sitemaps into 10,000 URL pieces. It’s a lot more sitemaps, but that’s what your sitemap index is for. It helps Google collect all that information bundled up nicely, and they get to it. The trade-off is Google accepts those sitemaps immediately, and within a day I’m getting useful information.
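The chunking step can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's actual tooling: the file names (`sitemap-1.xml`, `sitemap-index.xml`) and the helper names are assumptions made up for the example, and it only emits the `<loc>` element for each URL.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
CHUNK_SIZE = 10_000  # chunk size from the talk; the protocol itself allows up to 50,000


def chunk(urls, size=CHUNK_SIZE):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap(urls):
    """Return a <urlset> document listing the given URLs."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n{entries}</urlset>\n'
    )


def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> document pointing at each child sitemap."""
    entries = "".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}</sitemapindex>\n'
    )


def build_all(urls, base_url):
    """Return {filename: xml} for every chunked sitemap plus the index.

    The file naming scheme here is hypothetical.
    """
    files = {}
    for n, part in enumerate(chunk(urls), start=1):
        files[f"sitemap-{n}.xml"] = build_sitemap(part)
    index_locs = [f"{base_url}/{name}" for name in files]
    files["sitemap-index.xml"] = build_sitemap_index(index_locs)
    return files
```

Calling `build_all(urls, "https://example.com")` with 25,000 URLs would give three child sitemaps plus one index file, which you could then write to disk and reference from robots.txt.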
Now I like to go even further than that, and I break up my sitemaps by directory. So each sitemap or sitemap index is of the URLs in that directory, if it’s over 50,000 URLs. That’s extremely helpful because now, when you combine that with your property at that toys directory, like we have here in our example, I’m able to see just the indexation status for those URLs by themselves. I’m no longer forced to use that root property that has a hodgepodge of data for all your URLs. Extremely helpful, especially if I’m launching a new product line and I want to make sure that Google is indexing and giving me the data for that new toy line that I have.
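To split sitemaps by directory, you first need to bucket your URLs by their top-level folder. Here is a rough sketch of that grouping step, assuming "directory" means the first path segment (like `toys` in example.com/toys/). The function names are hypothetical, and pages that sit directly at the root are grouped under an empty key.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def top_directory(url):
    """Return the first path segment, or "" for pages that sit at the root.

    A path counts as being inside a directory when it has at least two
    segments (/toys/robot) or ends with a slash (/toys/).
    """
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    if len(segments) >= 2 or (segments and path.endswith("/")):
        return segments[0]
    return ""


def group_by_directory(urls):
    """Return {directory: [urls]} so each group can get its own sitemap."""
    groups = defaultdict(list)
    for url in urls:
        groups[top_directory(url)].append(url)
    return dict(groups)
```

Each group can then be fed into whatever builds your sitemap files, so the `toys` group lines up with the property you created for the toys directory.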
I think it’s always a good practice to make sure you ping your sitemaps. Google has an API, so you can definitely automate that process, and it’s super helpful. Every time there’s any kind of change to your content, adding URLs, removing URLs, things like that, you just want to ping Google and let them know that your sitemap has changed.
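The talk doesn't say which API is meant. One option is the sitemaps submit method of the Search Console API, which is a PUT request against the property and the sitemap URL. The sketch below only builds that request and does not send it. The access token is a placeholder you would obtain through your own OAuth flow, and the function name is made up for this example.

```python
from urllib.parse import quote
from urllib.request import Request

API_BASE = "https://www.googleapis.com/webmasters/v3"


def build_submit_request(site_url, sitemap_url, access_token):
    """Build (but do not send) a sitemap submit request.

    site_url is the Search Console property, sitemap_url is the full URL
    of the sitemap or sitemap index. Both are URL-encoded into the path.
    access_token is a placeholder for a real OAuth 2.0 token.
    """
    endpoint = (
        f"{API_BASE}/sites/{quote(site_url, safe='')}"
        f"/sitemaps/{quote(sitemap_url, safe='')}"
    )
    return Request(
        endpoint,
        method="PUT",
        headers={"Authorization": f"Bearer {access_token}"},
    )
```

To actually submit, you would pass the returned request to `urllib.request.urlopen` with a valid token, for example from a scheduled job that runs whenever your sitemap files are regenerated.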
All the data
So now we’ve done all this great stuff. What do we get out of that? Well, you get tons of data, and I mean a ton of data. It’s super useful, as mentioned, when you’re trying to launch a new product line or diagnose why there’s something wrong with your site. Again, we do have a 1,000-row limit per property. But when you create multiple properties, you get even more data, specific to those properties, that you can export and get all the valuable information from.
Even cooler is that Google recently rolled out their URL Inspection API. Super helpful because now you can actually run a script, see what the status is of those URLs, and hopefully get some good information out of that. But again, true to Google’s nature, we have a limit of 2,000 API calls per day per property. However, that’s per property. So if you have a lot of properties, and you can have up to 50 Search Console properties per account, now you could roll 100,000 URLs into that script and get the data for a lot more URLs per day. What’s super awesome is Screaming Frog has made some great changes to the tool that we all love and use every day, to where you can not only connect that API, but you can share that limit across all your properties. So now grab those 100,000 URLs, slap them in Screaming Frog, drink some coffee, kick back and wait till the data pours out. Super helpful, super amazing. It makes my job insanely easier now because of that. Now I’m able to go through and see: Is it a Google thing, discovered or crawled and not indexed? Or are there issues with my site that explain why my URLs are not showing in Google?
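The arithmetic behind that number is 50 properties times 2,000 calls, which is 100,000 inspections a day. If you script this yourself rather than using Screaming Frog, you need to decide which property each URL gets inspected under. Here is a rough sketch of that planning step, assuming URL-prefix properties. It is not how Screaming Frog does it; the function name and the fallback rule (most specific property first, then its parents) are assumptions for illustration, and no API calls are made.

```python
DAILY_LIMIT = 2_000  # per-property daily limit quoted in the talk


def plan_inspections(urls, properties, limit=DAILY_LIMIT):
    """Assign URLs to properties without exceeding each property's daily quota.

    Each URL goes to the most specific property whose prefix matches it and
    that still has quota left, falling back to broader matching properties.
    Returns (plan, deferred): plan maps property -> list of URLs to inspect
    today, and deferred holds URLs that have to wait for another day.
    """
    by_specificity = sorted(properties, key=len, reverse=True)
    plan = {p: [] for p in properties}
    deferred = []
    for url in urls:
        for prop in by_specificity:
            if url.startswith(prop) and len(plan[prop]) < limit:
                plan[prop].append(url)
                break
        else:
            # No matching property has quota left (or none matches at all).
            deferred.append(url)
    return plan, deferred
```

With a root property and a toys property, 2,500 toys URLs would split into 2,000 under the toys property and 500 under the root, so nothing is left waiting for tomorrow.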
Bonus: Page experience report
As an added bonus, you have the page experience report in Search Console, which covers Core Web Vitals, mobile usability, and some other data points that you can get broken down at the directory level. That makes it a lot easier to diagnose and see what’s going on with your site.
Hopefully you found this to be a useful Whiteboard Friday. I know these tactics have definitely helped me throughout my career in SEO, and hopefully they’ll help you too. Until next time, let’s keep crawling.