Chiropractic Denver: September 2021

Wednesday, September 29, 2021

Reveal Your Rivals with True Competitor (Beta)

One of the biggest challenges in SEO is trying to convince your client or boss that the competition they face online may not match their legacy competitors and personal grudges. Big Earl across the street at Big Earl’s Widgets may be irritating and, sure, maybe he does have a “stupid, smug face,” but that doesn’t change the fact that WidgetShack.com is eating your lunch (and let’s not even talk about Amazon).

To make matters worse, competitive analysis is time-consuming and tedious work, even if you do have access to the data. Today, after years of rethinking how competitive analysis should work (and, honestly, re-rethinking it on many occasions), I’m proud to announce the first step in expanding Moz’s competitive analysis toolkit — True Competitor.

Try True Competitor

What is True Competitor?

Before I dive into the details, let’s take it out for a spin. Just enter your domain or subdomain and your locale (the beta supports English-language markets in the United States, Great Britain, Australia, and Canada):

Then let the tool do its work. You’ll get back something like this:

True Competitor pulls ranking keywords (by highest-volume) for any domain in our Keyword Explorer database — even your competitors’ and prospects’ domains — and analyzes recent Google SERPs to find out who you’re truly competing against.

What are Overlap and Rivalry?

Hopefully, you’re already familiar with our proprietary Domain Authority (DA) metric, but Overlap and Rivalry are new to True Competitor. Overlap is simple — it’s the percentage of shared keywords where the target site and the competitor both ranked in the top 10 traditional organic results. This is essentially a Share of Voice (SoV) metric. It’s a good first stop, and you can sort by DA or Overlap for multiple views of the data — but what if the keywords you overlap on aren’t particularly relevant, or a competitor is just too far out of reach?

That’s where Rivalry comes in. Rivalry factors in the Click-Thru Rate (CTR) and volume of overlapping keywords, the target site’s ranking (keywords where the target ranks higher are more likely to be relevant), and the proximity of the two sites’ DA scores to help you sort which competitors are the most relevant and realistic.
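As a rough illustration of the Overlap idea described above, here is a minimal Python sketch. It assumes each site is represented by the set of keywords where it ranks in the top 10 organic results, and that the percentage is taken over the target site's keywords. The function name, the sample keywords, and the choice of denominator are assumptions for illustration, not the tool's actual implementation.

```python
def overlap_percentage(target_top10, competitor_top10):
    """Percentage of the target's top-10 keywords where the competitor also ranks top 10.

    Both arguments are sets of keyword strings. Using the target's keyword
    count as the denominator is an assumption made for this sketch.
    """
    if not target_top10:
        return 0.0
    shared = target_top10 & competitor_top10
    return 100.0 * len(shared) / len(target_top10)


# Hypothetical keyword sets, for illustration only.
target = {"seo tools", "keyword research", "domain authority", "link building"}
rival = {"seo tools", "keyword research", "backlink checker"}

print(overlap_percentage(target, rival))  # 50.0
```

Two of the target's four keywords are shared, so the sketch reports 50% Overlap.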

What can you do with this data?

Hopefully, you can use True Competitor to validate your own assumptions, challenge bad assumptions, and learn about competitors you might not have considered. That’s not all, though — select up to two competitors for in-depth information:

Just click on [ + Analyze Competitors ] and your selections will be auto-filled in our Keyword Overlap tool in Keyword Explorer. Here, you can dive deeper into your keyword overlap and find specific keywords to target with your SEO efforts:

We’re currently working on new ways to analyze this data and help you surface the most relevant keyword and content overlaps. We hope to have more to announce in Q4.

This list doesn’t match my list!

GOOD. Sorry, that’s a little flippant. Ultimately, we hope there’s something new and unexpected in this data. Otherwise, what’s the point? The goal of True Competitor is to help you see who you’re really up against in Google rankings. How you use that information is up to you.

I’d like to challenge you, dear reader, on one point. We have a bad habit of thinking of the “competition” as a single, small set of sites or companies. In the example above, I chose to explore SEMrush and Ahrefs, because they’re our most relevant product competitors. Consider if I had taken a different route:

Looking at our SEO news competitors paints a different but also very useful picture, especially for our content team and writers. We also have multiple Google subdomains showing up in our Top 25 — some Google products (like Google Search Console) are competitors, and some (like Google Analytics) are simply of interest to our readership and topics that we cover.

My challenge to you is to really think about these different spheres of competition and move beyond a singular window of what “competitor” means. You may not target all of these competitors or even care about them all on any given day, and that’s fine, but each window is an area that might uniquely inform your SEO and content strategies.

As a Subject Matter Expert at Moz, I have the privilege of working on multiple parts of our product, but this project is something I’ve been thinking about for a long time and is near and dear to me. I’d like to personally thank our Product team — Igor, Hayley, and Darian — for all of their hard work, leadership, and pushback to make this product better. Many thanks also to our App Front-end Engineering team, and a special shout-out to Maura and Grant for helping port the original prototype into an actual product.

Get started with True Competitor

True Competitor is currently available in beta for all Moz Pro customers and community accounts.

Try True Competitor

We welcome your feedback — please click on the [Make a Suggestion] button in the upper-right of the True Competitor home-page if you have any specific comments or concerns.

Monday, September 27, 2021

Google Local Filler Content Isn't Good UX, and Needs Revisions

Did you ever turn in a school paper full of vague ramblings, hoping your teacher wouldn’t notice that you’d failed to read the assigned book?

I admit, I once helped my little sister fulfill a required word count with analogies about “waves crashing against the rocks of adversity” when she, for some reason, overlooked reading The Communist Manifesto in high school. She got an A on her paper, but that isn’t the mark I’d give Google when there isn’t enough content to legitimately fill their local packs, Local Finders, and Maps.

The presence of irrelevant listings in response to important local queries:

Makes it unnecessarily difficult for searchers to find what they need
Makes it harder for relevant businesses to compete
Creates a false impression of bountiful local choice of resources, resulting in disappointing UX

Today, we’ll look at some original data in an attempt to quantify the extent of this problem, and explore what Google and local businesses can do about it.

What’s meant by “local filler” content and why is it such a problem?

The above screenshot captures the local pack results for a very specific search for a gastroenterologist in Angels Camp, California. In its effort to show me a pack, Google has scrambled together results that are two-thirds irrelevant to the full intent of my query, since I am not looking for either an eye care center or a pediatrician. The third result is better, even though Google had to travel about 15 miles from my specified search city to get it, because Dr. Eddi is, at least, a gastroenterologist.

It’s rather frustrating to see Google allowing the one accurate specialist to be outranked by two random local medical entities, perhaps simply because they are closer to home. It obviously won’t do to have an optometrist or children’s doctor consult with me on digestive health, and unfortunately, the situation becomes even odder when we click through to the local finder:

Of the twenty results Google has pulled together to make up the first page of the local finder, only two are actually gastroenterologists, lost in the weeds of podiatrists, orthopedic surgeons, general MDs, and a few clinics with no clarity as to whether their presence in the results relates to having a digestive health specialist on staff. Zero of the listed gastroenterologists are in the town I’ve specified. The relevance ratio is quite poor for the user and shapes a daunting environment for appropriate practitioners who need to be found in all this mess.

You may have read me writing before about local SEO seeking to build the online mirror of real-world communities. That’s the ideal: ensuring that towns and cities have an excellent digital reference guide to the local resources available to them. Yet when I fact-checked with the real world (calling medical practices around this particular town), I found that there actually are no gastroenterologists in Angels Camp, even though Google’s results might make it look like there must be. What I heard from locals is that you must either take a 25 minute drive to Sonora to see a GI doctor, or head west for an hour and fifteen minutes to Modesto for appropriate care.

Google has yoked itself to AI, but the present state of search leaves it up to my human intelligence to realize that the SERPs are making empty promises, and that there are, in fact, no GI docs in Angels Camp. This is what a neighbor, primary care doctor, or local business association would tell me if I was considering moving to this community and needed to be close to specialists. But Google tells me that there are more than 23 million organic choices relevant to my requirements, and scores of local business listings that so closely match my intent, they deserve pride of place in 3-packs, Finders and Maps.

The most material end result for the Google user is that they will likely experience unnecessary fatigue wasting time on the phone calling irrelevant doctors at a moment when they are in serious need of help from an appropriate professional. As a local SEO, I’m conditioned to look at local business categories and can weed out useless content almost automatically because of this, but is the average searcher noticing the truncated “eye care cent…” on the above listing? They’re almost certainly not using a Chrome extension like GMB Spy to see all the possible listing categories since Google decided to hide them years ago.

On a more philosophical note, my concern with local SERPs made up of irrelevant filler content is that they create a false picture of local bounty. As I recently mentioned to Marie Haynes:

The work of local businesses (and local SEOs!) derives its deepest meaning from providing and promoting essential local resources. Google’s inaccurate depiction of abundance could, even if in a small way, contribute to public apathy. The truth is that the US is facing a severe shortage of doctors, and anything that doesn’t reflect this reality could, potentially, undermine public action on issues like why our country, unlike the majority of nations, doesn’t make higher education free or affordable so that young people can become the medical professionals and other essential services providers we unquestionably need to be a functional society. Public well-being depends on complete accuracy in such matters.

As a local SEO, I want a truthful depiction of how well-resourced each community really is on the map, as a component of societal thought and decision-making. We’re all coping with public health and environmental emergencies now and know in our bones how vital essential local services have become.

Just how big is the problem of local filler content?

If the SERPs were more like humans, my query for “gastroenterologist Angels Camp” would return something like a featured snippet stating, “Sorry, our index indicates there are no GI Docs in Angels Camp. You’ll need to look in Sonora or Modesto for nearest options.” It definitely wouldn’t create the present scenario of, “Bad digestive system? See an eye doctor!” that’s being implied by the current results. I wanted to learn just how big this problem has become for Google.

I looked at the local packs in 25 towns and cities across California of widely varying populations using the search phrase “gastroenterologist” and each of the localities. I noted how many of the results returned were within the city specified in my search and how many used “gastroenterologist” as their primary category. I even gave Google an advantage in this test by allowing entries that didn’t use gastroenterologist as their primary category but that did have some version of that word in their business title (making the specialty clearer to the user) to be included in Google’s wins column. Of the 150 total data points I checked, here is what I found:

42% of the content Google presented in local packs had no obvious connection to gastroenterology. It’s a shocking number, honestly. Imagine the number of wearying, irrelevant calls patients may be making seeking digestive health consultation if nearly half of the practices listed are not in this field of medicine.
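For anyone wanting to repeat this kind of survey, the tallying can be sketched in a few lines of Python. The record layout and function name below are assumptions for illustration. The three sample records mirror the Angels Camp pack described earlier (two in-town listings that are not gastroenterologists, one gastroenterologist outside town); they are not the survey data itself.

```python
from dataclasses import dataclass


@dataclass
class PackListing:
    in_searched_city: bool    # listing is located in the city named in the query
    is_specialty_match: bool  # primary category or business title names the specialty


def share(listings, attribute):
    """Percentage of listings for which the given boolean attribute is True."""
    if not listings:
        return 0.0
    hits = sum(1 for listing in listings if getattr(listing, attribute))
    return 100.0 * hits / len(listings)


# Sample records modelled on the Angels Camp local pack.
angels_camp_pack = [
    PackListing(in_searched_city=True, is_specialty_match=False),   # eye care center
    PackListing(in_searched_city=True, is_specialty_match=False),   # pediatrician
    PackListing(in_searched_city=False, is_specialty_match=True),   # GI doctor, out of town
]

print(f"{share(angels_camp_pack, 'in_searched_city'):.0f}% in searched city")      # 67%
print(f"{share(angels_camp_pack, 'is_specialty_match'):.0f}% match the specialty")  # 33%
```

Tracking the two attributes separately matters because a pack can score well on location while scoring poorly on specialty.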

A pattern I noticed in my small sample set is that larger cities had the most relevant results. Smaller towns and rural areas had much poorer relevance ratios. Meanwhile, Google is more accurate as to returning results within the query’s city, as shown by these numbers:

The trouble is, what looks like more of a win for Google here doesn’t actually chalk up as a win for searchers. In my data set, where Google was accurate in showing results from my specified city, the entities were often simply not GI doctors. There were instances in which all 3 results got the city right, but zero of the results got the specialty right. In fact, in one very bizarre case, Google showed me this:

Welders aside, it’s important to remember that our initial Angels Camp example demonstrated how the searcher, encountering a pack with filler listings in it and drilling down further into the Local Finder results for help, may actually end up with even less relevance. Instead of two-out-of-three local pack entries being useless to them, they may end up with only two-out-of-twenty helpful listings, with relevance consigned to obscurity.

And, of course, filler listings aren’t confined to medical categories. I engaged in this little survey because I’d noticed how often, in category after category, the user experience is less-than-ideal.

What should Google do to lessen the poor UX of irrelevant listings?

Remember that we’re not talking about spam here. That’s a completely different headache in Googleland. I saw no instances of spam in my data. The welder was not trying to pass himself off as a doctor. Rather, what we have here appears to be a case of Google weighting location keywords over goods/services keywords, even when it makes no sense to do so.

Google needs to develop logic that excludes extremely irrelevant listings for specific head terms to improve UX. How might this logic work?

1. Google could rely more on their own categories. Going back to our original example in which an eye care center is the #1 ranked result for “gastroenterologist angels camp”, we can use GMB Spy to check if any of the categories chosen by the business is “gastroenterologist”:

Google can, of course, see all the categories, and this lack of “gastroenterologist” among them should be a big “no” vote on showing the listing for our query.

2. Google could cross check the categories with the oft-disregarded business description:

Again, no mention of gastroenterological services there. Another “no” vote.

3. Google could run sentiment analysis on the reviews for an entity, checking to see if they contain the search phrase:

Lots of mentions of eye care here, but the body of reviews contains zero mentions of intestinal health. Another “no” vote.

4. Google could cross check the specified search phrases against all the knowledge they have from their crawls of the entity’s website:

This activity should confirm that there is no on-site reference to Dr. Haymond being anything other than an ophthalmologist. Then Google would need to make a calculation to downgrade the significance of the location (Angels Camp) based on internal logic that specifies that a user looking for a gastroenterologist in a city would prefer to see gastroenterologists a bit farther away than seeing eye doctors (or welders) nearby. So, this would be another “no” vote for inclusion as a result for our query.

5. Finally, Google could cross reference this crawl of the website against their wider crawl of the web:

This should act as a good, final confirmation that Dr. Haymond is an eye doctor rather than a gastroenterologist, even if he is in our desired city, and give us a fifth “no” vote for bringing his listing up in response to our query.
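The five “no” votes above can be sketched as a simple filter. This is only a toy illustration of the idea in Python, not a description of how Google works: the `Listing` fields, the term variants, the plain substring matching, and the five-vote threshold are all assumptions, and the two listings are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    name: str
    categories: list = field(default_factory=list)
    description: str = ""
    reviews: list = field(default_factory=list)
    website_text: str = ""
    web_mentions: list = field(default_factory=list)


def no_votes(listing, term_variants):
    """Count how many of the five signals never mention any variant of the search term."""
    variants = [v.lower() for v in term_variants]

    def mentions(text):
        text = text.lower()
        return any(v in text for v in variants)

    signals = [
        " ".join(listing.categories),    # 1. business categories
        listing.description,             # 2. business description
        " ".join(listing.reviews),       # 3. review text
        listing.website_text,            # 4. crawl of the entity's website
        " ".join(listing.web_mentions),  # 5. wider crawl of the web
    ]
    return sum(1 for signal in signals if not mentions(signal))


def is_filler(listing, term_variants, min_no_votes=5):
    """Treat a listing as filler when at least min_no_votes signals say 'no'."""
    return no_votes(listing, term_variants) >= min_no_votes


variants = ("gastroenterolog", "digestive", "gi doctor")

eye_center = Listing(
    name="Example Eye Care Center",
    categories=["Eye care center", "Optometrist"],
    description="Comprehensive eye exams, glasses, and contact lens fittings.",
    reviews=["Great eye exam and friendly staff.", "Quick appointment for new glasses."],
    website_text="Ophthalmology and vision care services.",
    web_mentions=["Directory profile: ophthalmologist."],
)
gi_clinic = Listing(
    name="Example GI Clinic",
    categories=["Gastroenterologist"],
    description="Digestive health consultations and endoscopy.",
)

print(no_votes(eye_center, variants), is_filler(eye_center, variants))  # 5 True
print(no_votes(gi_clinic, variants), is_filler(gi_clinic, variants))    # 3 False
```

Requiring all five signals to agree is a deliberately cautious choice: a sparse but genuinely relevant listing, like the hypothetical GI clinic with no reviews yet, is kept.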

The web is vast, and so is Google’s job, but I believe the key to resolving this particular type of filler content is for Google to rely more on the knowledge they have of an entity’s vertical and less on their knowledge of its location. A diner may be willing to swap out tacos for pizza if there’s a Mexican restaurant a block away but no pizzerias in town, but in these YMYL categories, the same logic should not apply.

It’s not uncommon for Google to exclude local results from appearing at all when their existing logic tells them there isn’t a good answer. It’s tempting to say that solving the filler content problem depends on Google expanding the number of results for which they don’t show local listings. But, I don’t think this is a good solution, because the user then commonly sees irrelevant organic entries, instead of local ones. It seems to me that a better path is for Google to expand the radius of local SERPs for a greater number of queries so that a search like ours receives a map of the nearest gastroenterologists, with closer, superfluous businesses filtered out.

What should you do if a local business you’re promoting is getting lost amid filler listings?

SEO is going to be the short answer to this problem. It’s true that you can click the “send feedback” link at the bottom of the local finder, Google Maps or an organic SERP, and fill out a form like this, with a screenshot:

However, my lone report of dissatisfaction with SERP quality is unlikely to get Google to change the results. Perhaps if they received multiple reports…

More practically-speaking, if a business you’re promoting is getting lost amid irrelevant listings, search engine optimization will be your strongest tool for convincing Google that you are, in fact, the better answer. In our study, we realized that there are, in fact, no GI docs in Angels Camp, and that the nearest one is about fifteen miles away. If you were in charge of marketing this particular specialist, you could consider:

1. Gaining a foothold in nearby towns and cities

Recommend that the doctor develop real-world relationships with neighboring towns from which he would like to receive more clients. Perhaps, for example, he has hospital privileges, or participates in clinics or seminars in these other locales.

2. Writing about locality relationships

Publish content on the website highlighting these relationships and activities to begin associating the client’s name with a wider radius of localities.

3. Expanding the linktation radius

Seek relevant links and unstructured citations from the neighboring cities and towns, on the basis of these relationships and participation in a variety of community activities.

4. Customizing review requests based on customers’ addresses

If you know your customers well, consider wording review requests to prompt them to mention why it’s worth it to them to travel from X location for goods/services (nota bene: medical professionals, of course, need to be highly conversant with HIPAA compliance when it comes to online reputation management).

5. Filling out your listings to the max

Definitely do give Google and other local business listing platforms the maximum amount of information about the business you’re marketing (Moz Local can help!). Fill out all the fields and give a try to functions like Google Posts, product listings, and Q&A.

6. Sowing your seeds beyond the walled garden

Pursue an active social media, video, industry, local news, print, radio, and television presence to the extent that your time and budget allow. Google’s walled garden, as defined by my friend, Dr. Pete, is not the only place to build your brand. And, if my other pal, Cyrus Shepard, is right, anti-trust litigation could even bring us to a day when Google’s own ramparts become less impermeable. In the meantime, work at being found beyond Google while you continue to grapple with visibility within their environment.

Study habits

It’s one thing for a student to fudge a book report, but squeaking by can become a negative lifelong habit if it isn’t caught early. I’m sure any Google staffer taking the time to actually read through the local packs in my survey would agree that they don’t rate an A+.

I’ve been in local SEO long enough to remember when Google first created their local index with filler content pulled together from other sources, without business owners having any idea they were even being represented online, and these early study habits seem to have stuck with the company when it comes to internal decision making that ends up having huge real-world impacts. The recent title tag tweak that is rewriting erroneous titles for vaccine landing pages is a concerning example of this lack of foresight and meticulousness.

If I could create a syllabus for Google’s local department, it would begin with separating out categories of the greatest significance to human health and safety and putting them through a rigorous, permanent manual review process to ensure that results are as accurate as possible, and as free from spam, scams, and useless filler content as the reviewers can make them. Google has basically got all of the money and talent in the world to put towards quality, and ethics would suggest they are obliged to make the investment.

Society deserves accurate search results delivered by studious providers, and rural and urban areas are worthy of equal quality commitments and a more nuanced approach than one-size-fits all. Too often, in Local, Google is flunking for want of respecting real-world realities. Let’s hope they start applying themselves to the fullest of their potential.

Friday, September 24, 2021

Crawl Budget

In today’s episode of Whiteboard Friday, Tom covers a more advanced SEO concept: crawl budget. Google has a finite amount of time it's willing to spend crawling your site, so if you’re having issues with indexation, this is a topic you should care about.

Video Transcription

Happy Friday, Moz fans, and today's topic is crawl budget. I think it's worth saying right off the bat that this is somewhat of a more advanced topic or one that applies primarily to larger websites. I think even if that's not you, there is still a lot you can learn from this in terms of SEO theory that comes about when you're looking at some of the tactics you might employ or some of the diagnostics you might employ for a crawl budget.

But in Google's own documentation they suggest that you should care about crawl budget if you have more than a million pages or more than 10,000 pages that are updated on a daily basis. I think those are obviously kind of hard or arbitrary thresholds. I would say that if you have issues with your site getting indexed and you have pages deep on your site that you want indexed but that are just not getting into the index, or if you have issues with pages not getting indexed quickly enough, then in either of those cases crawl budget is an issue that you should care about.

What is crawl budget?

Drawing of a spider holding a dollar bill.

So what actually is crawl budget? Crawl budget refers to the amount of time that Google is willing to spend crawling a given site. Although it seems like Google is sort of all-powerful, they have finite resources and the web is vast. So they have to prioritize somehow and allocate a certain amount of time or resource to crawl a given website.

Now they prioritize based on — or so they say they prioritize based on the popularity of sites with their users and based on the freshness of content, because Googlebot sort of has a thirst for new, never-before-seen URLs.

We're not really going to talk in this video about how to increase your crawl budget. We're going to focus on how to make the best use of the crawl budget you have, which is generally an easier lever to pull in any case.

Causes of crawl budget issues

So how do issues with crawl budget actually come about?

Facets

Now I think the main sort of issues on sites that can lead to crawl budget problems are firstly facets.

So you can imagine on an e-comm site, imagine we've got a laptops page. We might be able to filter that by size. You have a 15-inch screen and 16 gigabytes of RAM. There might be a lot of different permutations there that could lead to a very large number of URLs when actually we've only got one page or one category as we think about it — the laptops page.

Similarly, those could then be reordered to create other URLs that do the exact same thing but have to be separately crawled. Similarly they might be sorted differently. There might be pagination and so on and so forth. So you could have one category page generating a vast number of URLs.
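To get a feel for how quickly facets multiply, here is a small Python sketch that counts the URLs one category page could generate. The facet names and values, the number of sort orders, and the page count are all made-up assumptions, and it simplifies by assuming every filter combination is paginated to the same depth.

```python
from itertools import product
from math import factorial

# Hypothetical facets for a single "laptops" category page.
facets = {
    "screen": ["13", "14", "15", "17"],
    "ram": ["8gb", "16gb", "32gb"],
    "brand": ["a", "b", "c", "d", "e"],
}
sort_orders = ["price_asc", "price_desc", "newest"]
pages = 10  # assumed pagination depth for every combination


def count_facet_urls(facets, sort_orders, pages, reorderable=False):
    """Count distinct URLs produced by filters, sorting, and pagination.

    Each facet is either unset (None) or set to one of its values. When
    reorderable is True, the selected filters may appear in the URL in any
    order, so a combination with k filters yields k! distinct URLs.
    """
    options = [[None] + values for values in facets.values()]
    filter_urls = 0
    for combo in product(*options):
        selected = sum(1 for value in combo if value is not None)
        filter_urls += factorial(selected) if reorderable else 1
    return filter_urls * len(sort_orders) * pages


print(count_facet_urls(facets, sort_orders, pages))                    # 3600
print(count_facet_urls(facets, sort_orders, pages, reorderable=True))  # 14010
```

Even with only three small facets, one category page becomes thousands of crawlable URLs, and allowing the filters to appear in any order roughly quadruples that.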

Search results pages

A few other things often come about. Search results pages from an internal site search, especially if they're paginated, can have a lot of different URLs generated.

Listings pages

Listings pages. If you allow users to upload their own listings or content, then that can over time build up to be an enormous number of URLs. If you think about a job board or something like eBay, it probably has a huge number of pages.

Fixing crawl budget issues

Chart of crawl budget issue solutions and whether they allow crawling, indexing, and PageRank.

So what are some of the tools that you can use to address these issues and to get the most out of your crawl budget?

So as a baseline, if we think about how a normal URL behaves with Googlebot, we say, yes, it can be crawled, yes, it can be indexed, and yes, it passes PageRank. So a URL like these, if I link to these somewhere on my site and then Google follows that link and indexes these pages, these probably still have the top nav and the site-wide navigation on them. So the link equity that's actually passed through to these pages will be sort of recycled round. There will be some losses due to dilution when we're linking through so many different pages and so many different filters. But ultimately, we are recycling this. There's no sort of black hole loss of leaky PageRank.

Robots.txt

Now at the opposite extreme, the most extreme sort of solution to crawl budget you can employ is the robots.txt file.

So if you block a page in robots.txt, then it can't be crawled. So great, problem solved. Well, no, because there are some compromises here. Technically, sites and pages blocked in robots.txt can be indexed. You sometimes see sites showing up or pages showing up in the SERPs with this “meta description cannot be shown because the page is blocked in robots.txt” or this kind of message.

So technically, they can be indexed, but functionally they're not going to rank for anything or at least anything effective. So yeah, well, sort of technically. They do not pass PageRank. We're still passing PageRank through when we link into a page like this. But if it's then blocked in robots.txt, the PageRank goes no further.

So we've sort of created a leak and a black hole. So this is quite a heavy-handed solution, although it is easy to implement.
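As a quick illustration of the robots.txt approach, the Python sketch below parses a small robots.txt with the standard library's `urllib.robotparser` and checks which URLs a crawler would be allowed to fetch. The rules, paths, and `example.com` URLs are hypothetical. This parser does plain prefix matching on paths, so the rules here are written as path prefixes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt blocking internal search results and faceted URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /laptops/filter/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

urls = [
    "https://example.com/laptops",
    "https://example.com/laptops/filter/ram-16gb",
    "https://example.com/search?q=laptop",
]
for url in urls:
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{url} -> {'crawlable' if allowed else 'blocked'}")
# https://example.com/laptops -> crawlable
# https://example.com/laptops/filter/ram-16gb -> blocked
# https://example.com/search?q=laptop -> blocked
```

This only answers the crawling question. As described above, a blocked URL can still technically be indexed, and any PageRank flowing into it goes no further.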

Link-level nofollow

Link-level nofollow, so by this I mean if we took our links on the main laptops category page, that were pointing to these facets, and we put a nofollow attribute internally on those links, that would have some advantages and disadvantages.

I think a better use case for this would actually be more in the listings case. So imagine if we run a used car website, where we have millions of different used car individual sort of product listings. Now we don't really want Google to be wasting its time on these individual listings, depending on the scale of our site perhaps.

But occasionally a celebrity might upload their car or something like that, or a very rare car might be uploaded and that will start to get media links. So we don't want to block that page in robots.txt because that's external links that we would be squandering in that case. So what we might do is on our internal links to that page we might internally nofollow the link. So that would mean that it can be crawled, but only if it's found, only if Google finds it in some other way, so through an external link or something like that.

So we sort of have a halfway house here. Now technically nofollow these days is a hint. In my experience, Google will not crawl pages that are only linked to through an internal nofollow. If it finds the page in some other way, obviously it will still crawl it. But generally speaking, this can be effective as a way of restricting crawl budget or I should say more efficiently using crawl budget. The page can still be indexed.

That's what we were trying to achieve in that example. It can still pass PageRank. That's the other thing we were trying to achieve. Although you are still losing some PageRank through this nofollow link. That still counts as a link, and so you're losing some PageRank that would otherwise have been piped into that follow link.
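As a rough illustration of the used car example, the sketch below renders internal links and adds `rel="nofollow"` only when the target is an individual listing. The `/listings/` path prefix and the function names are assumptions made up for this example, not anything from a real site or library.

```python
from html import escape

# Hypothetical convention: individual listings live under /listings/.
LISTING_PREFIX = "/listings/"


def is_listing_url(href: str) -> bool:
    """Return True for URLs we would rather Google not spend crawl budget on."""
    return href.startswith(LISTING_PREFIX)


def render_internal_link(href: str, text: str) -> str:
    """Render an <a> tag, nofollowing internal links to individual listings."""
    rel = ' rel="nofollow"' if is_listing_url(href) else ""
    return f'<a href="{escape(href, quote=True)}"{rel}>{escape(text)}</a>'


print(render_internal_link("/used-cars/", "All used cars"))
print(render_internal_link("/listings/12345", "2004 hatchback"))
```

Only the second link carries the nofollow attribute. The listing itself has no robots.txt block or noindex, so it can still be crawled and indexed if it is discovered through an external link.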

Noindex, nofollow

Noindex and nofollow, so this is obviously a very common solution for pages like these on ecomm sites.

Now, in this case, the page can be crawled. But once Google gets to that page, it will discover it's noindex, and it will crawl it much less over time because there is sort of less point in crawling a noindex page. So again, we have sort of a halfway house here.

Obviously, it can't be indexed. It's noindex. PageRank is still passed into this page, but because it's got a nofollow in the head section, it doesn't pass PageRank outwards. This isn't a great solution. We've had to make some compromises here to economize on crawl budget.
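If you want to check which of these directives a page actually carries, a small sketch like the one below can read the robots meta tag out of the HTML. It uses only Python's standard `html.parser`; the class and function names are hypothetical, and it only looks at `<meta name="robots">` tags in the markup it is given.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, nofollow".
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_directives(html: str) -> set[str]:
    """Return the set of robots meta directives present in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(sorted(robots_directives(page)))
```

For the sample page this prints `['nofollow', 'noindex']`. Running it over a crawl of your own facet pages is one way to confirm that templates are emitting the combination you intended.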

Noindex, follow

So a lot of people used to think, oh, well, the solution to that would be to use a noindex, follow as a sort of best of both. So you put a noindex, follow tag in the head section of one of these pages, and oh, yeah, everyone is a winner, because we still get the same sort of crawling benefit. We're still not indexing this new duplicate page, which we don't want to index, but the PageRank problem is fixed.

Well, a few years ago, Google came out and said, "Oh, we didn't realize this ourselves, but actually as we crawl this page less and less over time, we will stop seeing the link and then it kind of won't count." So they sort of implied that this no longer worked as a way of still passing PageRank, and eventually it would come to be treated as noindex and nofollow. So again, we have a sort of slightly compromised solution there.

Canonical

Now the true best of all worlds might then be canonical. With the canonical tag, the canonicalized version is still going to get crawled a bit less over time, great. It's still not going to be indexed, great, and it still passes PageRank.

So that seems great. That seems perfect in a lot of cases. But this only works if the pages are near enough duplicates that Google is willing to consider them duplicates and respect the canonical. If Google isn't willing to consider them duplicates, then you might have to go back to using the noindex. Or you might think there's actually no reason for this URL to even exist: I don't know how this wrong order combination came about, but it seems pretty pointless.
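One way to picture the canonical approach is a sketch that picks a single preferred URL for facet combinations and emits the tag for it. This assumes, purely for illustration, that the duplicates differ only in the order of their query parameters; the parameter names and example.com URLs are hypothetical.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL with its query parameters in a fixed (sorted) order."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    # Drop the fragment: it is not part of what a crawler requests.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for the head section of a page."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


a = "https://www.example.com/laptops?screen=15&brand=acme"
b = "https://www.example.com/laptops?brand=acme&screen=15"
print(canonical_url(a) == canonical_url(b))
print(canonical_link_tag(a))
```

Both orderings resolve to the same canonical URL, so the comparison prints `True`. Whether Google respects the tag still depends on the pages being close enough to duplicates, which no amount of URL normalization can guarantee.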

301

I'm not going to link to it anymore. But in case some people still find the URL somehow, we could use a 301 as a sort of economy. Eventually that is going to perform pretty well, I'd say even better than canonical and noindex for saving crawl budget, because on the rare occasion Google does check the URL, it doesn't even have to look at the page. It just follows the 301.

It's going to solve our indexing issue, and it's going to pass PageRank. But obviously, the tradeoff here is users also can't access this URL, so we have to be okay with that.
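A minimal sketch of the 301 approach, written as a plain WSGI application so it needs nothing beyond the Python standard library. The redirect map and the paths in it are hypothetical; in practice this kind of rule often lives in the web server or CDN configuration instead.

```python
# Hypothetical map of retired URLs to the versions we want to keep.
REDIRECTS = {
    "/laptops/15-inch/acme": "/laptops/acme/15-inch",
}


def app(environ, start_response):
    """WSGI app that 301s retired paths and serves everything else normally."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # Crawlers and users alike are sent on; nobody sees the old page.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"ok"]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serve locally for a manual check; stop with Ctrl+C.
    make_server("127.0.0.1", 8000, app).serve_forever()
```

The response to a retired path has no body to render or parse, which is where the crawl saving comes from.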

Implementing crawl budget tactics

So sort of rounding all this up, how would we actually employ these tactics? So what are the activities that I would recommend if you want to have a crawl budget project?

One of the less intuitive ones is speed. Like I said earlier, Google is sort of allocating an amount of time or amount of resource to crawl a given site. So if your site is very fast, if you have low server response times, if you have lightweight HTML, they will simply get through more pages in the same amount of time.

So this, counterintuitively, is a great way to approach this.

Log analysis is the more traditional one. Often it's quite unintuitive which pages on your site or which parameters are actually sapping all of your crawl budget. Log analysis on large sites often yields surprising results, so that's something you might consider. Then there's actually employing some of these tools.
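For the log analysis step, a starting point could look like the sketch below, which counts Googlebot requests per path and per query parameter. It assumes access logs in the common "combined" format and matches on the user-agent string only, so it does not verify that a request really came from Google. The function name and regular expression are this example's own.

```python
import re
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

# Matches the request, status, size, referrer and user agent of a combined-format line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<url>\S+) HTTP/[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count Googlebot requests by path and by query parameter name."""
    paths, params = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        # User-agent matching only: spoofed bots are counted too.
        if not match or "Googlebot" not in match.group("agent"):
            continue
        parts = urlsplit(match.group("url"))
        paths[parts.path] += 1
        for name, _value in parse_qsl(parts.query, keep_blank_values=True):
            params[name] += 1
    return paths, params


if __name__ == "__main__":
    import sys

    # Usage: python googlebot_hits.py < access.log
    top_paths, top_params = googlebot_hits(sys.stdin)
    print(top_paths.most_common(20))
    print(top_params.most_common(20))
```

Sorting by parameter name rather than full URL is a deliberate choice here: on faceted sites, a single parameter spread across thousands of URLs is often what is eating the crawl.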

So redundant URLs that we don't think users even need to look at, we can 301. Variants that users do need to look at, we could look at a canonical or a noindex tag. But we also might want to avoid linking to them in the first place so that we're not sort of losing some degree of PageRank into those canonicalized or noindex variants through dilution or through a dead end.

Robots.txt and nofollow, as I sort of implied as I was going through them, are tactics that you would want to use very sparingly because they do create these PageRank dead ends.

Then lastly, a more recent and interesting tip that I got a while back from an Ollie H.G. Mason blog post, which I'll probably link to below. It turns out that if you have a sitemap on your site that you only use for fresh or recent URLs, your recently changed URLs, then because Googlebot has such a thirst, like I said, for fresh content, they will start crawling this sitemap very often. So you can sort of use this tactic to direct crawl budget towards the new URLs, so sort of everyone wins.
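A fresh-URLs-only sitemap could be generated with something like the sketch below. The seven-day window, the function name, and the example.com URLs are assumptions for illustration; the transcript does not say how recent "recent" should be.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def fresh_sitemap(pages, now, max_age=timedelta(days=7)):
    """Build sitemap XML containing only pages modified within max_age of now.

    pages is an iterable of (url, last_modified) pairs with timezone-aware datetimes.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # Newest first, purely so the file is easy to eyeball.
    for loc, lastmod in sorted(pages, key=lambda page: page[1], reverse=True):
        if now - lastmod > max_age:
            continue  # too old for the fresh sitemap
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod.date().isoformat()
    return ET.tostring(urlset, encoding="unicode")


now = datetime(2021, 9, 29, tzinfo=timezone.utc)
pages = [
    ("https://www.example.com/listings/new-arrival", now - timedelta(days=1)),
    ("https://www.example.com/listings/old-stock", now - timedelta(days=90)),
]
print(fresh_sitemap(pages, now))
```

Only the page changed in the last week ends up in the output. The older URL would still belong in the site's regular sitemaps; this file exists only to point at what changed.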

Googlebot only wants to see the fresh URLs. You perhaps only want Googlebot to see the fresh URLs. So if you have a sitemap that only serves that purpose, then everyone wins, and that can be quite a nice and sort of easy tip to implement.

So that's all. I hope you found that useful. If not, feel free to let me know your tips or challenges on Twitter. I'm curious to see how other people approach this topic.

Video transcription by Speechpad.com.