Pages blocked by robots.txt, or too few pages scanned (TN-M03)

If too few pages are scanned, there are several possible causes:

  • The crawler only visits pages on the same domain as the home page, so pages on other domains do not appear on the map. To include additional domains in the same report or sitemap, select the Options command from the View menu and add the domain names to the Additional Domains box on the Links tab.

  • Some pages were blocked by the Robot Exclusion Standard (robots.txt) or explicitly blocked on the Blocks tab of the Options window.

To find out which links are blocked by robots.txt for a site, https://www.google.com for example, open the address https://www.google.com/robots.txt. If you get a 404 Not Found message, no links are blocked; if you get a text file back, it lists which links are blocked.
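
For example, a robots.txt along the lines of the following (an illustrative sample, not any particular site's actual file) blocks all crawlers from the /search and /private sections of the site, so pages under those paths will not be scanned:

User-agent: *
Disallow: /search
Disallow: /private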

In the desktop versions, you can ignore the Robot Exclusion Standard by selecting the Options command from the View menu and unchecking Obey Robots.txt.

Adding the following entries to the robots.txt file will bypass any blocks intended for other web crawlers:

User-agent: PowerMapper
Allow: /

The PowerMapper user agent in robots.txt is understood by all PowerMapper products.
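
As an illustrative sketch (the paths shown are hypothetical), a robots.txt combining an existing block for other crawlers with the entries above might look like this; the more specific PowerMapper group takes precedence over the wildcard group, so the crawler can scan pages that remain blocked for other robots:

User-agent: *
Disallow: /private

User-agent: PowerMapper
Allow: /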

See Also: What is robots.txt

Applies To: PowerMapper 3.0 and SortSite 3.0 or later

Last Reviewed: November 14, 2020