Layout table vs data table detection


Summary

The table element in HTML is not always treated as a table by assistive technology.

Historically, HTML tables have been misused for layout. Before the introduction of CSS Grid in 2017 there was no reliable CSS method to lay out HTML on a grid, so tables were often used instead (and some legacy browsers still don’t support CSS Grid).

This means there are two types of table:

  • Data tables containing data where rows and columns have meaning (e.g. the columns in the Periodic Table of Elements describe fundamental properties of matter)
  • Layout tables where the content could be presented in other ways with no loss of information (e.g. a grid of product images on an online store)

Data tables are read as 2-dimensional table objects in a screen reader (with right/left and up/down navigation), but cells in layout tables are read like a series of spans (with next/prev navigation).

Detection of layout tables is done by heuristics with major inconsistencies between browsers:

  • Tables with th elements in the first row or first column are nearly always treated as data tables
  • Tables with role=presentation are nearly always treated as layout tables
  • Any other table is classified as data or layout unpredictably, with no consistency between screen readers, and in some cases based on incidental factors like browser window size

User impact

Exposing a layout table as a table creates problems for screen reader users because a lot of extra information is read inside tables (e.g. row / column position) and navigation changes from 1-dimensional next/previous item to 2-dimensional up/down/left/right.

Conversely, not exposing a data table as a table creates even worse problems because the tabular relationships have been removed. This makes the Periodic Table impossible to understand because important relationships like silver and gold being in the same column are lost (which means you can’t answer some chemistry exam questions).

Background

The misuse of layout tables led to user agents using heuristics to detect them. The HTML Standard encourages user agents to do this but doesn’t specify the exact heuristics.

In the early days of the web, tables were commonly used for layout due to the widely varying CSS support in different browsers. This is less common now, but sometimes still happens. This technique presented problems for screen reader users because tabular relationships were voiced for non-tabular content. For example, a navigation bar in one cell and content in another cell would be voiced as a table row.

Screen readers compensate for this using heuristics to guess if a table is used for layout. When a layout table is detected, a screen reader linearizes table cells into a series of paragraphs, which prevents the table data being voiced as rows and columns.

Unfortunately, this causes serious problems when a data table is wrongly identified as a layout table. For example, consider trying to understand the Periodic Table of the Elements as a long series of element names without the columns.

Heuristics by user agent

The following table is derived from the source code of Firefox, NVDA and WebKit, plus vendor documentation.

N/A in the table indicates that an element, attribute or condition is not used by the user agent’s heuristic.

Layout table detection heuristics
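| Element or @attribute | NVDA 2019.3 + IE11 | NVDA 2019.3 + Firefox | JAWS 2019 + IE11 | JAWS 2019 + Firefox | VoiceOver 10.14 + Safari | HTML Standard |
|---|---|---|---|---|---|---|
| @role=table | N/A | Data | N/A | N/A | Data | N/A |
| @role=presentation | Layout | Layout | Layout | Layout | Layout | Layout |
| th | Data | Data | Data (1) | Data (1) | Data (2) | Data |
| thead | Data | Data | N/A | N/A | Data | Data |
| tfoot | Data | Data | N/A | N/A | Data | N/A |
| caption | Data | Data (3) | N/A | N/A | Data | Data |
| col | N/A | Data | N/A | N/A | Data | N/A |
| colgroup | Data | Data | N/A | N/A | Data | N/A |
| rowgroup | Data | N/A | N/A | N/A | N/A | Non-standard element |
| @aria-colcount or @aria-rowcount | N/A | N/A | N/A | N/A | Data | N/A |
| @aria-colindex, @aria-rowindex, @aria-colspan, @aria-rowspan | N/A | N/A | N/A | N/A | Data | N/A |
| @contenteditable | N/A | Data | N/A | N/A | Data | N/A |
| @summary="" | Data | N/A | N/A | N/A | N/A | Don't use as heuristic |
| @summary="non-empty" | Data | Data | N/A | N/A | Data | Don't use as heuristic |
| @border=0 | N/A | N/A | N/A | N/A | N/A | Layout |
| @border=1 | N/A | Data | N/A | N/A | Data | Data |
| @cellspacing=0 | N/A | N/A | N/A | N/A | N/A | Layout |
| @cellpadding=0 | N/A | N/A | N/A | N/A | N/A | Layout |
| @rules | N/A | N/A | N/A | N/A | Data | N/A |
| @headers | Data | Data (6) | N/A | N/A | Data | Data |
| @scope | N/A | Data (6) | N/A | N/A | N/A | Data |
| @axis | N/A | N/A | N/A | N/A | Data | N/A |
| @abbr | N/A | Data (6) | N/A | N/A | N/A | N/A |
| @datatable=0 | N/A | Layout | Layout | Layout | N/A | Non-standard attribute |
| @datatable=1 | N/A | N/A | Data | Data | N/A | Non-standard attribute |
| @datatable=true | N/A | N/A | Data | Data | N/A | Non-standard attribute |
| CSS empty-cells | N/A | N/A | N/A | N/A | Data | N/A |
| CSS borders | N/A | Data | N/A | N/A | Data (4) | Data |
| CSS cell background color | N/A | N/A | N/A | N/A | Data (5) | N/A |
| CSS alternating row colors | N/A | Data | N/A | N/A | Data | N/A |
| Contains form controls | N/A | N/A | N/A | N/A | N/A | N/A |
| Contains embed, applet or iframe | N/A | Layout | N/A | N/A | N/A | N/A |
| Contains nested table | N/A | Layout | N/A | N/A | N/A | N/A |
| Contained in math | N/A | Data | N/A | N/A | N/A | N/A |
| Cell contains abbr | N/A | Data | N/A | N/A | N/A | N/A |
| Cell contains acronym | N/A | Data | N/A | N/A | N/A | N/A |
| Only 1 cell | N/A | Layout | N/A | N/A | Layout | N/A |
| Only 1 row | N/A | Layout | N/A | N/A | N/A | N/A |
| Only 1 column | N/A | Layout | N/A | N/A | N/A | N/A |
| 5 or more columns | N/A | Data | N/A | N/A | N/A | N/A |
| 20 or more rows | N/A | Data | N/A | N/A | Data | N/A |
| 2-4 columns, no borders, and >= 95% of doc width | N/A | Layout | N/A | N/A | N/A | N/A |
| 2-4 columns, no borders, and 10 or fewer cells | N/A | Layout | N/A | N/A | N/A | N/A |
| JAWS Heisenberg heuristic | N/A | N/A | Data (7) | Data (7) | N/A | N/A |
| Default | Layout (8) | Data | Layout (8) | Layout (8) | Layout (8) | N/A |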

Notes

This table shows the main interoperability issues, but is a simplification! It doesn’t capture all of the implementation subtleties in edge cases.

(1) From JAWS 11.0.756 onward.

(2) Only if th is in first column or first row. VoiceOver ignores other th elements for the purpose of the layout table heuristic.

(3) Firefox ignores caption for layout role calculation if it’s not first child, is empty or has an ARIA role.

(4) Only if more than 10 cells, or at least 50% of the cells, have borders.

(5) Only if more than 10 cells, or at least 50% of the cells, have a different background color to the table background color.

(6) Only if attribute is non-empty.

(7) See JAWS implementation below.

(8) Falling back to a data table, which announces table structure, would be a much better default, since removing table structure when it’s needed causes major problems.

What the standards say

The HTML Standard encourages user agents to provide layout table detection heuristics and has suggestions for implementing them (shown in the last column of the table above) but cautions:

It is quite possible that the above suggestions are wrong. Implementers are urged to provide feedback elaborating on their experiences with trying to create a layout table detection heuristic.

How the implementations actually work

NVDA with Firefox

NVDA with Firefox uses the IAccessible2 API, so the role and name calculation is done by Firefox and exposed through IAccessible2. The type of table is exposed through a non-standard attribute called ‘layout-guess’. This attribute tells NVDA whether to ignore the table because it’s a layout table. This can be overridden by the ‘Include layout tables’ setting in NVDA.

The heuristic executes the following steps in order and returns when it finds a match:

  1. If contenteditable set on table element it’s a data table.
  2. If role specified use that role (e.g. role=table or role=presentation override heuristics)
  3. If inside math element it’s a data table.
  4. If datatable=0 set on table element it’s a layout table.
  5. If non-empty summary set on table element it’s a data table.
  6. If first element inside table is a non-empty caption it’s a data table.
  7. If table contains col, colgroup, tfoot or thead it’s a data table.
  8. If table contains row with th it’s a data table.
  9. If table contains cells with headers, scope, or abbr attributes it’s a data table.
  10. If table contains a cell whose only content is an abbr or acronym element it’s a data table.
  11. If table contains a nested table it’s a layout table.
  12. If table contains 1 row or 1 column it’s a layout table.
  13. If table contains more than 5 columns it’s a data table.
  14. If cell at 0,0 has a border on any edge it’s a data table.
  15. If table contains rows with alternating background colors (zebra stripes) it’s a data table.
  16. If table contains more than 20 rows it’s a data table.
  17. If table is 95% of document width it’s a layout table.
  18. If table contains 10 or fewer cells it’s a layout table.
  19. If table contains embed, object or iframe it’s a layout table.
  20. No heuristics match so it’s a data table.

The code is in TableAccessible::IsProbablyLayoutTable in accessible/generic/TableAccessible.cpp.
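
The ordered, first-match evaluation described in the numbered steps can be sketched in Python as below. This is only an illustration of the control flow, not Firefox's code: the `TableFeatures` class, its field names and the `classify` function are hypothetical, the thresholds follow the wording of the list above, the cell count ignores row and column spans, and roles other than table and presentation are simply assumed to fall through.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class TableFeatures:
    """Hypothetical summary of one table; every field name is illustrative."""
    content_editable: bool = False
    role: Optional[str] = None
    inside_math: bool = False
    datatable_zero: bool = False        # datatable=0 on the table element
    has_summary: bool = False           # non-empty summary attribute
    has_leading_caption: bool = False   # non-empty caption as first child
    has_structure: bool = False         # col, colgroup, tfoot or thead
    has_th: bool = False
    has_header_attrs: bool = False      # headers, scope or abbr on a cell
    has_abbr_only_cell: bool = False    # cell containing only abbr/acronym
    has_nested_table: bool = False
    rows: int = 0
    cols: int = 0
    first_cell_bordered: bool = False   # cell at 0,0 has a border on any edge
    zebra_rows: bool = False
    width_fraction: float = 0.0         # table width / document width
    has_embedded_object: bool = False   # embed, object or iframe


Rule = Callable[[TableFeatures], Optional[str]]


def when(predicate: Callable[[TableFeatures], bool], verdict: str) -> Rule:
    """Build a rule that returns `verdict` when `predicate` holds, else None."""
    return lambda t: verdict if predicate(t) else None


# One entry per numbered step, in the same order as the list above.
RULES: List[Rule] = [
    when(lambda t: t.content_editable, "data"),                          # 1
    lambda t: {"table": "data", "presentation": "layout"}.get(t.role),   # 2
    when(lambda t: t.inside_math, "data"),                               # 3
    when(lambda t: t.datatable_zero, "layout"),                          # 4
    when(lambda t: t.has_summary, "data"),                               # 5
    when(lambda t: t.has_leading_caption, "data"),                       # 6
    when(lambda t: t.has_structure, "data"),                             # 7
    when(lambda t: t.has_th, "data"),                                    # 8
    when(lambda t: t.has_header_attrs, "data"),                          # 9
    when(lambda t: t.has_abbr_only_cell, "data"),                        # 10
    when(lambda t: t.has_nested_table, "layout"),                        # 11
    when(lambda t: t.rows == 1 or t.cols == 1, "layout"),                # 12
    when(lambda t: t.cols > 5, "data"),                                  # 13
    when(lambda t: t.first_cell_bordered, "data"),                       # 14
    when(lambda t: t.zebra_rows, "data"),                                # 15
    when(lambda t: t.rows > 20, "data"),                                 # 16
    when(lambda t: t.width_fraction >= 0.95, "layout"),                  # 17
    when(lambda t: t.rows * t.cols <= 10, "layout"),                     # 18
    when(lambda t: t.has_embedded_object, "layout"),                     # 19
]


def classify(t: TableFeatures) -> Tuple[str, int]:
    """Return (verdict, step number) for the first rule that matches."""
    for step, rule in enumerate(RULES, start=1):
        verdict = rule(t)
        if verdict is not None:
            return verdict, step
    return "data", len(RULES) + 1  # step 20: nothing matched
```

Because the first match wins, order matters: in this sketch an unstyled 3 by 3 table with no other signals stops at step 18 and is a layout table, while an otherwise identical 3 by 4 table falls through to the default and is a data table.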

NVDA with IE11

A much simpler algorithm is used by NVDA with IE11, and the calculation is done by the NVDA virtual buffer backend instead of by IE11.

  1. If table contains caption, colgroup, rowgroup, tfoot or thead it’s a data table.
  2. If table contains th anywhere it’s a data table.
  3. If summary (even empty) set on table element it’s a data table.
  4. If table contains cells with headers attributes it’s a data table.
  5. Otherwise it’s a layout table.

The code is in fillVBuf_helper_collectAndUpdateTableInfo in nvdaHelper/vbufBackends/mshtml/mshtml.cpp and assigns a value to tableInfo->definitData.
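
As a rough illustration of how little markup this algorithm looks at, the five steps can be approximated with Python's standard html.parser module. This is a sketch, not NVDA's implementation: the function name `nvda_ie11_guess` and the helper class are hypothetical, and the input is assumed to be the markup of a single table with no nested tables.

```python
from html.parser import HTMLParser

# Elements whose mere presence marks a data table (steps 1 and 2).
DATA_TAGS = {"caption", "colgroup", "rowgroup", "tfoot", "thead", "th"}


class _TableScan(HTMLParser):
    """Hypothetical helper: records whether any data-table signal was seen."""

    def __init__(self) -> None:
        super().__init__()
        self.is_data = False

    def handle_starttag(self, tag, attrs):
        names = {name for name, _value in attrs}
        if tag in DATA_TAGS:
            self.is_data = True
        elif tag == "table" and "summary" in names:
            # Step 3: summary counts even when it is empty.
            self.is_data = True
        elif tag == "td" and "headers" in names:
            # Step 4: a cell carries a headers attribute.
            self.is_data = True


def nvda_ie11_guess(table_html: str) -> str:
    """Return "data" or "layout" for the markup of one table."""
    scan = _TableScan()
    scan.feed(table_html)
    scan.close()
    # Step 5: with no signal at all, the table is treated as layout.
    return "data" if scan.is_data else "layout"
```

For example, `nvda_ie11_guess('<table summary=""><tr><td>a</td></tr></table>')` returns "data" in this sketch, while the same table without the summary attribute returns "layout".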

Safari / WebKit

The heuristic executes the following steps in order and returns when it finds a match:

  1. If role specified use that role (e.g. role=table or role=presentation override heuristics).
  2. If contenteditable set on table element it’s a data table.
  3. If non-empty summary set on table element it’s a data table.
  4. If table contains caption, tfoot or thead it’s a data table.
  5. If non-empty rules attribute set on table element it’s a data table.
  6. If table contains col or colgroup it’s a data table.
  7. If non-zero aria-colcount or aria-rowcount attributes set on table element it’s a data table.
  8. If table contains more than 20 rows it’s a data table.
  9. If table contains cells with non-empty axis, headers, scope, or abbr attributes it’s a data table.
  10. If table contains cells with non-zero aria-colindex, aria-rowindex, aria-colspan or aria-rowspan attributes it’s a data table.
  11. If table CSS specifies empty-cells: property it’s a data table.
  12. If more than 10 cells have a border or background different to table background it’s a data table.
  13. If first row is all th elements and table has at least 2 columns it’s a data table.
  14. If first column is all th elements and table has at least 2 rows it’s a data table.
  15. If there are fewer than 2 cells it’s a layout table.
  16. If more than 50% of cells have a border or background different to table background it’s a data table.
  17. If table contains rows with alternating background colors (zebra stripes) it’s a data table.
  18. No heuristics match so it’s a layout table.

The code is in AccessibilityTable::isDataTable() in Source/WebCore/accessibility/AccessibilityTable.cpp.
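
Steps 13 and 14, together with note (2), mean that th elements only count when they fill the whole first row or first column. A minimal Python sketch of that check follows. It is not WebKit's code: the function name `th_header_rule` is hypothetical, and the table is assumed to be given as a rectangular grid of cell tag names with no spans.

```python
from typing import List


def th_header_rule(grid: List[List[str]]) -> bool:
    """Return True when steps 13 or 14 would mark the table as a data table.

    `grid` holds one list per row, each entry being "th" or "td".
    """
    if not grid or not grid[0]:
        return False
    rows, cols = len(grid), len(grid[0])
    # Step 13: first row is all th and the table has at least 2 columns.
    first_row_all_th = cols >= 2 and all(tag == "th" for tag in grid[0])
    # Step 14: first column is all th and the table has at least 2 rows.
    first_col_all_th = rows >= 2 and all(row[0] == "th" for row in grid)
    return first_row_all_th or first_col_all_th
```

In this sketch a th that sits anywhere other than a complete first row or first column, as in `[["td", "th"], ["td", "td"]]`, does not trigger the rule, so the table would still depend on the later steps.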

JAWS

If a table has neither th elements nor role=presentation, the following heuristic is used:

  • If a table has 2 or more rows and 2 or more columns, and there are 4 or more cells between 200 and 16,000 square pixels then it’s a data table
  • Otherwise it’s a layout table

This is affected by default font size and by window size if table size is relative to viewport width (e.g. width=100%). Changing the default font size (text zoom), or resizing the window can change the table from a layout table to a data table, or vice versa.

The following table will change from layout to data table and back again after resizing the window and hitting refresh:

<table style="width:100%">
    <tr>
        <td>At small window sizes</td>
        <td>I am data</td>
    </tr>
    <tr>
        <td>At large window sizes</td>
        <td>I am layout</td>
    </tr>
</table>

The calculation is also inconsistent between IE11 (where cell margins are included in the square pixel calculation) and Firefox (where cell margins are excluded).
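
The area rule and its sensitivity to window size can be sketched in Python as follows. This is an illustration under stated assumptions, not JAWS's code: the function names are hypothetical, the 200 and 16,000 square pixel bounds are treated as inclusive, each row of the 2 by 2 example is assumed to be 20 pixels tall, and cell margins and padding are ignored.

```python
from typing import List, Tuple


def jaws_guess(rows: int, cols: int, cell_sizes: List[Tuple[float, float]]) -> str:
    """Classify a table from its dimensions and (width, height) per cell in px."""
    in_range = sum(1 for w, h in cell_sizes if 200 <= w * h <= 16_000)
    if rows >= 2 and cols >= 2 and in_range >= 4:
        return "data"
    return "layout"


def two_by_two_full_width(viewport_px: float, row_height_px: float = 20.0) -> str:
    """Model the width:100% example: four equal cells, each half the viewport."""
    cell = (viewport_px / 2, row_height_px)
    return jaws_guess(2, 2, [cell] * 4)
```

Under these assumptions each cell covers 10 times the viewport width in square pixels, so `two_by_two_full_width(1200)` gives "data" (12,000 square pixels per cell) while `two_by_two_full_width(2000)` gives "layout" (20,000 square pixels per cell), matching the behaviour described for the example table.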

Reference: Tables and Forms with JAWS and MAGic