Scanning large sites in Enterprise Edition TN-W20

Enterprise Edition scans are limited by the license type:

  • 25k licenses can report/map up to 25,000 pages per scan
  • 50k licenses can report/map up to 50,000 pages per scan
  • 100k licenses can report/map up to 100,000 pages per scan

Memory limits may also come into play:

  • Each scan process is limited to 1.5GB of RAM to avoid interfering with other scans or the web app
  • Each unique URL used on the site occupies 48 bytes of overhead plus the length of the URL in bytes
  • Each issue reported uses 48 bytes for each page it’s reported on
  • Each line reported for an issue uses 8 bytes
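Plugging the byte costs above into a quick estimate shows how far a typical scan sits below the 1.5GB process limit. A rough sketch (the per-item byte costs come from the list above; the site figures are hypothetical examples, not product data):

```python
# Rough memory estimate for a scan, using the per-item byte costs above.
URL_OVERHEAD = 48      # bytes per unique URL, plus the URL text itself
ISSUE_PER_PAGE = 48    # bytes per issue, per page it's reported on
LINE_COST = 8          # bytes per line reported for an issue

def estimate_bytes(unique_urls, avg_url_len, issue_page_pairs, issue_lines):
    """Estimate scan memory from the counts described above."""
    urls = unique_urls * (URL_OVERHEAD + avg_url_len)
    issues = issue_page_pairs * ISSUE_PER_PAGE
    lines = issue_lines * LINE_COST
    return urls + issues + lines

# Hypothetical 50,000-page site: 200,000 unique URLs averaging 80 bytes,
# 500,000 issue/page pairs, 2,000,000 reported lines.
total = estimate_bytes(200_000, 80, 500_000, 2_000_000)
print(f"{total / 1024**2:.0f} MB")  # well under the 1.5GB process limit
```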

For sitemaps, the maximum number of pages that can be reported matches the license limit (for example, 50,000 on a 50k license). One sitemap report, the Excel Link Report, lists every link on every page, so it can become very large when pages contain many links (a 300MB CSV file is not uncommon).
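A back-of-envelope calculation shows how a link report reaches that size. The page count is the license limit; the links-per-page and bytes-per-row figures below are assumptions for illustration, not product figures:

```python
# Back-of-envelope size of a report that lists every link on every page.
pages = 50_000          # license limit on a 50k license
links_per_page = 60     # hypothetical average links per page
bytes_per_row = 100     # hypothetical average CSV row (source URL, target URL, ...)

size_bytes = pages * links_per_page * bytes_per_row
print(f"{size_bytes / 1024**2:.0f} MB")  # around the 300MB mentioned above
```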

For scan reports, the maximum number of issues and lines reported is controlled by these settings in Edit Scan:

  • Maximum pages listed per issue (default 20)
  • Maximum line numbers per issue (default 4)
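Combined with the fixed rule count mentioned below, these settings give a rough upper bound on the number of entries a scan report can contain. A sketch, assuming each rule appears as at most one issue and the line limit applies per listed page (both assumptions, not documented behavior):

```python
# Rough upper bound on report entries with the default Edit Scan settings.
MAX_PAGES_PER_ISSUE = 20   # "Maximum pages listed per issue" default
MAX_LINES_PER_ISSUE = 4    # "Maximum line numbers per issue" default
RULE_COUNT = 1_300         # fixed rule count mentioned later in this note

max_page_entries = RULE_COUNT * MAX_PAGES_PER_ISSUE          # issue/page pairs
max_line_entries = max_page_entries * MAX_LINES_PER_ISSUE    # reported lines
print(max_page_entries, max_line_entries)
```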

There is a fixed set of rules (over 1,300) for checking accessibility and browser compatibility, but HTML validation adds to these, because a new rule is created for each unknown element encountered: “Element XYZ not allowed as child element”.

Coding errors like these:

 <adata-FFE4='FFED'>
 <adata-D5FE='AAC4'>

which are each missing a space after the element name and should read:

 <a data-FFE4='FFED'>
 <a data-D5FE='AAC4'>

can result in large amounts of memory being used, because a rule is created for each unknown element name:

  • “Element adata-FFE4 not allowed as child element”
  • “Element adata-D5FE not allowed as child element”

This problem is rare, but if it happens, turning off HTML validation (Edit Scan->Standards) will greatly reduce the amount of memory used.
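When the cause is a missing space like the example above, the affected markup can often be found with a quick text search before scanning. A minimal sketch (the regex and helper are illustrative, not part of the product):

```python
import re

# Flag tags whose element name looks like a tag glued to a data-* attribute,
# e.g. "<adata-FFE4=..." where "<a data-FFE4=..." was intended.
GLUED = re.compile(r"<([a-z]+data-[\w-]+)\s*=", re.IGNORECASE)

def find_glued_tags(html):
    """Return element names that appear to have an attribute fused on."""
    return GLUED.findall(html)

sample = "<adata-FFE4='FFED'> <a data-D5FE='AAC4'>"
print(find_glued_tags(sample))  # → ['adata-FFE4']
```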

Applies To: Enterprise Edition 2016.1 or later

Last Reviewed: October 31, 2017