Inbound marketing blog

How to set robots.txt in HubSpot and why it matters

Posted by Steve Oakman on Apr 1, 2014 11:44:00 AM

People often ask us how to do things in HubSpot. When we get these questions we like to turn them into blog posts in our HubSpot FAQs series. In this post we'll look at how you can use your robots.txt file in HubSpot and what getting this right (and wrong) can mean for your marketing.
 

What is a robots.txt file?

A robots.txt file tells web bots and other search engine crawlers where they can't go; it's how a site implements the Robots Exclusion Protocol. Before bots visit a web page they check the robots.txt file to see whether they're allowed to go there. If the file says:

User-agent: *
Disallow: /

the bot steers clear.

Sadly, disallowing bots doesn't mean they can't get to content; it just means well-behaved bots will honour your request. Malware and spam bots won't take any notice. And because the file is, by nature, publicly available, everyone can see the areas of your server you want kept out of the search results, so you can't use robots.txt to hide information.

Robots.txt is useful if you have pages or documents on your site that you don’t want to be crawled by search bots, indexed and presented for anyone to access via the search results. 
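To make this concrete, here's a minimal sketch (not from the original post) using Python's standard-library urllib.robotparser, which applies the same rules a compliant crawler follows. The bot name and URL are placeholders for illustration:

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt that blocks all well-behaved bots from the whole site.
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A compliant crawler asks before fetching; here the answer is "no".
print(parser.can_fetch("ExampleBot", "http://www.example.com/any-page"))  # False
```

As the post notes, this is purely advisory: the parser only tells a bot what it *should* do, and ill-behaved bots simply never ask.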
 

Avoid becoming invisible to search engines

One situation that sometimes occurs is that a site's robots.txt is set incorrectly. This can happen for many reasons: if your site is quite new, for example, your developers may have set the robots.txt to prevent unfinished pages from being crawled. This can have disastrous consequences for your SEO, as your site might not appear in the search results at all.

To check the robots.txt of your site, just visit your equivalent of http://www.example.com/robots.txt. If you want search bots to access all areas of your site, you'll want to see:

User-agent: *
Disallow:

However, you definitely don't want to see:

User-agent: *
Disallow: /

If you do, you'll be blocking search bots from your site entirely.
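The difference between those two files comes down to the path after Disallow: an empty value blocks nothing, while a single / blocks everything. A quick sketch (our own illustration, again using Python's stdlib urllib.robotparser; the helper name and test URL are hypothetical) shows the two cases side by side:

```python
from urllib.robotparser import RobotFileParser

def allowed_everywhere(robots_txt: str) -> bool:
    """Return True if an arbitrary bot may crawl the whole site under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("AnyBot", "http://www.example.com/")

# "Disallow:" with no path blocks nothing -- the site stays fully crawlable:
print(allowed_everywhere("User-agent: *\nDisallow:"))    # True

# "Disallow: /" blocks the entire site:
print(allowed_everywhere("User-agent: *\nDisallow: /"))  # False
```

One character is all that separates "fully visible" from "invisible to search engines", which is why it's worth checking your live file.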
 
 

Other examples of robots.txt

Here are some other examples, formatted in the same way they'd appear in the file itself. 

To exclude a specific bot (in the example we’ve called it ‘BotName’):

User-agent: BotName
Disallow: /

To allow just one bot access and block the rest (in this example we've let Google's crawler through; it identifies itself with the user-agent token Googlebot):

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
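You can sanity-check per-bot rules like these before deploying them. Here's a short sketch (our own, not from the post) using Python's stdlib urllib.robotparser; the second bot name is a hypothetical stand-in for "any other crawler". Note that records in the file are separated by a blank line:

```python
from urllib.robotparser import RobotFileParser

# The allow-one, block-the-rest file from above.
rules = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot matches its own record (empty Disallow = allowed everywhere);
# every other bot falls through to the wildcard record and is blocked.
print(parser.can_fetch("Googlebot", "http://www.example.com/page"))      # True
print(parser.can_fetch("SomeOtherBot", "http://www.example.com/page"))   # False
```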
 

Using robots.txt in HubSpot

There may be pages on your site that you'd prefer didn't appear in the search engine results. To restrict bots' access to these, you could set the robots.txt file up as follows:

User-agent: *
Disallow: /example-page-1
Disallow: /example-page-2
Allow: /
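Before publishing page-level rules like these, you can test them locally. This sketch (our own illustration, using Python's stdlib urllib.robotparser with placeholder page names) confirms the listed pages are blocked while the rest of the site stays open:

```python
from urllib.robotparser import RobotFileParser

# The page-exclusion file from above: two pages blocked, everything else allowed.
rules = """\
User-agent: *
Disallow: /example-page-1
Disallow: /example-page-2
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("AnyBot", "http://www.example.com/example-page-1"))   # False
print(parser.can_fetch("AnyBot", "http://www.example.com/some-other-page"))  # True
```

One design note: Python's parser applies rules in order (first match wins), so the specific Disallow lines are checked before the catch-all Allow; keeping them in that order mirrors how the file above is written.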

Similarly, you may want to restrict bots' access to your 'thank you' pages (the pages people are redirected to after filling out a form on one of your landing pages), as these will probably include direct links to download your lead generation content offers such as ebooks and whitepapers.

Here's an example of a robots.txt file that excludes thank you pages:

User-agent: *
Disallow: /thank-you-for-downloading-our-example-ebook-1
Disallow: /thank-you-for-downloading-our-example-whitepaper-2
Allow: /

In the same way, you may want to stop some of the PDFs held on your site from being indexed if they're confidential or lead generation offers.

Here's how you could tell bots not to access those PDFs:

User-agent: *
Disallow: /pdf-example-1.pdf
Disallow: /pdf-example-2.pdf
Disallow: /pdf-example-3.pdf
Allow: /

As useful as these can be, it's worth remembering that a robots.txt file won't hide information on your site: as mentioned above, it lives at the same address on every site and anyone can read it. On HubSpot, if your information is confidential then a password-protected page may be a better option.
 

How to check and edit robots.txt in HubSpot in 4 easy steps

If your website is on HubSpot's Content Optimisation System, it's easy to check and edit your robots.txt file. Here's how to do it:

  1. Go to Content > Content Settings and select the domain or subdomain you want to edit (using the Select a Domain to Edit dropdown box).

  2. Scroll down to Robots.txt (Advanced).

  3. Customise the robots.txt section. You'll only need the part of each URL after the domain or subdomain (e.g. after www.example.com or info.example.com).

  4. Save your changes.

This makes it much simpler to check your robots.txt file and to prevent certain pages and content from appearing in the search results if you wish.
 

About Concentric Marketing

Concentric Marketing is an inbound marketing agency and a Gold Level HubSpot Agency Partner. If you want to know more about digital marketing on the HubSpot platform, please just call 0845 034 5603.

Topics: HubSpot CMS, SEO, HubSpot, HubSpot FAQs

