What is a robots.txt file? (TN-W17)
A robots.txt file is a digital "Keep Out" sign, designed to keep web crawlers out of certain parts of a web site.
The most common use of robots.txt is preventing pages from appearing in search engines like Google. Common examples are:
- Login pages for restricted areas of a web site
- Search results pages from site search
Scanning either of these types of pages shouldn't cause problems when SortSite or PowerMapper is set to ignore robots.txt. You will probably want to check these pages for broken links and accessibility issues anyway.
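As a rough illustration of the common case described above, the following Python sketch uses the standard library's urllib.robotparser to show how a well-behaved crawler might interpret a robots.txt that hides login and site search pages. The paths /login and /search, the host example.com, and the crawler name ExampleBot are assumptions made up for this example. Real crawlers may interpret the file slightly differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths below are illustrative assumptions,
# not taken from any real site.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /search
"""


def blocked_urls(robots_txt, user_agent, urls):
    """Return the URLs that a crawler honoring robots.txt would skip."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    urls = [
        "https://example.com/",
        "https://example.com/login",
        "https://example.com/search?q=widgets",
    ]
    for url in blocked_urls(EXAMPLE_ROBOTS_TXT, "ExampleBot", urls):
        print("blocked:", url)
```

A scan that ignores robots.txt would fetch all three URLs, including the two reported as blocked here.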
Robots.txt is also sometimes used to block robots from parts of a site where fetching lots of pages can cause problems (e.g. a set of pages that run slow database queries). This is much less common, since aggressive bad robots (e.g. email address spambots) ignore robots.txt anyway, so blocking doesn't usually help.
Contact the web site's administrator or webmaster to determine the reason a site has blocked access via robots.txt.
Adding the following entries to the top of the robots.txt file, before any Disallow: directives, will bypass any blocks intended for other web crawlers:
User-agent: PowerMapper
Allow: /
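One way to sanity-check such an entry before deploying it is to run the file through a parser. The sketch below uses Python's standard urllib.robotparser as a stand-in. It only approximates how a given crawler reads the file. The Disallow rule for /login, the host example.com, and the crawler name OtherBot are assumptions added for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with the PowerMapper entry placed at the top.
# The /login rule for other crawlers is an illustrative assumption.
ROBOTS_WITH_OVERRIDE = """\
User-agent: PowerMapper
Allow: /

User-agent: *
Disallow: /login
"""


def is_allowed(robots_txt, user_agent, url):
    """Report whether Python's parser lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/login"
    for agent in ("PowerMapper", "OtherBot"):
        print(agent, "allowed:", is_allowed(ROBOTS_WITH_OVERRIDE, agent, url))
```

With this file, the parser allows PowerMapper to fetch the login page while the hypothetical OtherBot remains blocked.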
See Also: Wikipedia: Robots Exclusion Standard
Applies To: all versions
Last Reviewed: June 11, 2015