Scanning large sites in Enterprise Edition TN-W20

Enterprise Edition scans are limited by the license type:

  • 25k licenses can report/map up to 25,000 pages per scan
  • 50k licenses can report/map up to 50,000 pages per scan
  • 100k licenses can report/map up to 100,000 pages per scan

Memory limits may also come into play:

  • Each scan process is limited to 1.5GB of RAM to avoid interfering with other scans or the web app
  • Each unique URL used on the site occupies 48 bytes of overhead plus the length of the URL in bytes
  • Each issue reported uses 48 bytes for each page it’s reported on
  • Each line reported for an issue uses 8 bytes
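Plugging the byte costs above into a quick estimate shows how far a typical scan sits below the 1.5GB process limit. A rough sketch (the per-item byte costs come from the list above; the site figures are hypothetical examples, not product data):

```python
# Rough memory estimate for a scan, using the per-item byte costs above.
URL_OVERHEAD = 48      # bytes per unique URL, plus the URL text itself
ISSUE_PER_PAGE = 48    # bytes per issue, per page it's reported on
LINE_COST = 8          # bytes per line reported for an issue

def estimate_bytes(unique_urls, avg_url_len, issue_page_pairs, issue_lines):
    """Estimate scan memory from the counts described above."""
    urls = unique_urls * (URL_OVERHEAD + avg_url_len)
    issues = issue_page_pairs * ISSUE_PER_PAGE
    lines = issue_lines * LINE_COST
    return urls + issues + lines

# Hypothetical 50,000-page site: 200,000 unique URLs averaging 80 bytes,
# 500,000 issue/page pairs, 2,000,000 reported lines.
total = estimate_bytes(200_000, 80, 500_000, 2_000_000)
print(f"{total / 1024**2:.0f} MB")  # well under the 1.5GB process limit
```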

For sitemaps, the maximum number of pages that can be reported matches the license limit (for example, 50,000 on a 50k license). One sitemap report, the Excel Link Report, lists every link on every page, so it can become very large when pages contain many links (a 300MB CSV file is not uncommon).
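A back-of-envelope calculation shows how a link report reaches that size. The page count is the license limit; the links-per-page and bytes-per-row figures below are assumptions for illustration, not product figures:

```python
# Back-of-envelope size of a report that lists every link on every page.
pages = 50_000          # license limit on a 50k license
links_per_page = 60     # hypothetical average links per page
bytes_per_row = 100     # hypothetical average CSV row (source URL, target URL, ...)

size_bytes = pages * links_per_page * bytes_per_row
print(f"{size_bytes / 1024**2:.0f} MB")  # around the 300MB mentioned above
```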

For scan reports, the maximum number of issues and lines reported is controlled by these settings in Edit Scan:

  • Maximum pages listed per issue (default 20)
  • Maximum line numbers per issue (default 4)
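Combined with the fixed rule count mentioned below, these settings give a rough upper bound on the number of entries a scan report can contain. A sketch, assuming each rule appears as at most one issue and the line limit applies per listed page (both assumptions, not documented behavior):

```python
# Rough upper bound on report entries with the default Edit Scan settings.
MAX_PAGES_PER_ISSUE = 20   # "Maximum pages listed per issue" default
MAX_LINES_PER_ISSUE = 4    # "Maximum line numbers per issue" default
RULE_COUNT = 1_300         # fixed rule count mentioned later in this note

max_page_entries = RULE_COUNT * MAX_PAGES_PER_ISSUE          # issue/page pairs
max_line_entries = max_page_entries * MAX_LINES_PER_ISSUE    # reported lines
print(max_page_entries, max_line_entries)
```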

There is a fixed set of rules (over 1,300) for checking accessibility and browser compatibility, but HTML validation adds to these, because a new rule is created for each unknown element encountered: “Element XYZ not allowed as child element”.

Coding errors like these:

 <adata-FFE4='FFED'>
 <adata-D5FE='AAC4'>

which are each missing a space after the element name and should read:

 <a data-FFE4='FFED'>
 <a data-D5FE='AAC4'>

can result in large amounts of memory being used, because a rule is created for each unknown element name:

  • “Element adata-FFE4 not allowed as child element”
  • “Element adata-D5FE not allowed as child element”

This problem is rare, but if it happens, turning off HTML validation (Edit Scan->Standards) will greatly reduce the amount of memory used.
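When the cause is a missing space like the example above, the affected markup can often be found with a quick text search before scanning. A minimal sketch (the regex and helper are illustrative, not part of the product):

```python
import re

# Flag tags whose element name looks like a tag glued to a data-* attribute,
# e.g. "<adata-FFE4=..." where "<a data-FFE4=..." was intended.
GLUED = re.compile(r"<([a-z]+data-[\w-]+)\s*=", re.IGNORECASE)

def find_glued_tags(html):
    """Return element names that appear to have an attribute fused on."""
    return GLUED.findall(html)

sample = "<adata-FFE4='FFED'> <a data-D5FE='AAC4'>"
print(find_glued_tags(sample))  # → ['adata-FFE4']
```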

Applies To: Enterprise Edition 2016.1 or later

Last Reviewed: October 31, 2017