What is a Robots.txt file?

Robots.txt file

Website owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.

It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The “User-agent: *” means this section applies to all robots. The “Disallow: /” tells the robot that it should not visit any pages on the site.
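The check a well-behaved robot performs can be sketched with Python's standard-library urllib.robotparser module. This is a minimal illustration, not how any particular crawler is implemented: the helper name is_allowed and the robot name "ExampleBot" are assumptions made for the example, and the rules are parsed from a string rather than fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# The same two-line file shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url.

    is_allowed is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a made-up robot name; "User-agent: *" matches any name.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://www.example.com/welcome.html"))
# prints False
```

Because the file disallows "/", every path on the site is off limits to a robot that honours the rules.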

There are two important considerations when using /robots.txt:

Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention to it.
The /robots.txt file is publicly available. Anyone can see which sections of your server you don’t want robots to visit, so do not rely on it to hide information.

Robots.txt works well for:

Barring crawlers from non-public parts of your website
Barring search engines from trying to index scripts, utilities, or other types of code
Avoiding the indexing of duplicate content on a website, such as “print” versions of HTML pages
Auto-discovery of XML Sitemaps.
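As a rough sketch of how the uses listed above might look in practice, the example below parses a sample robots.txt with Python's standard-library urllib.robotparser module. The directory names (/private/, /cgi-bin/, /print/), the sitemap URL, and the robot name "ExampleBot" are all assumptions invented for illustration; substitute the paths that apply to your own site. Note that the site_maps() method requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt. Every path and the sitemap URL are hypothetical.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
Disallow: /print/

Sitemap: http://www.example.com/sitemap.xml
"""

sample_parser = RobotFileParser()
sample_parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

BASE = "http://www.example.com"

# Ordinary pages stay crawlable; the listed directories do not.
for path in ["/welcome.html", "/private/report.html",
             "/cgi-bin/form.pl", "/print/welcome.html"]:
    allowed = sample_parser.can_fetch("ExampleBot", BASE + path)
    print(path, "allowed" if allowed else "blocked")

# Sitemap lines are collected separately from the crawl rules.
print(sample_parser.site_maps())
```

Only robots that choose to honour the file will skip the blocked paths, so this does not replace proper access control for genuinely private content.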


October 17, 2022

Rob Jennings

When he found himself in a business conversation with someone talking about their ‘customer-centric core competencies’ he realised it was time to create a digital agency that was less about self-promoting buzz-words and more about the practical endeavour to assist clients in making effective use of the web.