Filtering results on wpt.fyi, Safari-specific failures

# Philip Jägenstedt (2 days ago)

Following the improved Safari results last year [1] and the discussion it generated, I'm happy to announce that the requested filtering is now available in the search box. The full syntax is documented [2], and there's also a new insights view [3] with some useful searches.

Especially interesting for this list could be this view of Chrome Dev, Firefox Nightly, and Safari Technology Preview, filtered to the Safari-specific failures: wpt.fyi/results/?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned&q=%28chrome%3Apass%7Cchrome%3Aok%29+%28firefox%3Apass%7Cfirefox%3Aok%29+%28safari%3A%21pass%26safari%3A%21ok%29
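For readability, the q= parameter in that URL percent-decodes to the following search-box query, which matches tests where Chrome and Firefox each pass (or report a harness status of OK) while Safari does neither:

```
(chrome:pass|chrome:ok) (firefox:pass|firefox:ok) (safari:!pass&safari:!ok)
```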

Both Google and Mozilla have efforts [4][5] to reduce the number of Chrome- and Firefox-specific failures, as this seems like an especially valuable category of problems: changing just one browser can remove a pain point for web developers.

No doubt some failures are spurious, but hopefully there is value to be found by looking into where the largest numbers of failures appear to be. If something seems to be wrong with the search/filtering, please file an issue for us! [6]

Credit to Mark Dittmer and Luke Bjerring who owned this project.

P.S. We are also working on triage metadata for wpt.fyi, to make it possible to burn down a list of failures like this and not later have to re-triage to find the new failures. [7]

[1] lists.webkit.org/pipermail/webkit-dev/2018-October/030209.html
[2] web-platform-tests/wpt.fyi/blob/master/api/query/README.md
[3] staging.wpt.fyi/insights
[4] bugs.chromium.org/p/chromium/issues/detail?id=896242
[5] bugzilla.mozilla.org/show_bug.cgi?id=1498357
[6] web-platform-tests/wpt.fyi/issues/new?title=Structured+Queries+issue&projects=web-platform-tests/wpt.fyi/8&labels=bug&template=search.md
[7] docs.google.com/document/d/1oWYVkc2ztANCGUxwNVTQHlWV32zq6Ifq9jkkbYNbSAg/edit?usp=sharing

# Philip Jägenstedt (2 days ago)

I'd like to point out right away that diagnosing reftest failures is currently cumbersome because we don't store the screenshots. This is also a work in progress: docs.google.com/document/d/1IhZa4mrjK1msUMhtamKwKJ_HhXD-nqh_4-BcPWM6soQ/edit?usp=sharing

Until that has launched, I would recommend ignoring reftest failures if the cause of failure isn't obvious.

# Maciej Stachowiak (2 days ago)

Neat.

I see some obvious areas for focus, where Safari fails lots of tests that the other browsers don't.

For context, I tried looking at this view, which shows all tests that Chrome and Firefox pass, with Safari's results shown regardless of outcome: wpt.fyi/results/?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned&q=%28chrome%3Apass%7Cchrome%3Aok%29+%28firefox%3Apass%7Cfirefox%3Aok%29
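The q= parameter there decodes to a query that only constrains Chrome and Firefox:

```
(chrome:pass|chrome:ok) (firefox:pass|firefox:ok)
```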

I noticed some puzzling results there: Safari passes all the ambient-light and bluetooth tests that Chrome and Firefox do, despite not supporting these standards at all. (For that matter, I'm not sure Firefox supports these specs either.) Not sure if it's a harness problem or dubious tests that don't actually test the standard.

# Philip Jägenstedt (2 days ago)

I think I know what's going on there. When drilling down into tests and subtests, only those matching the filter are shown. Clearing the filter, things look a bit different in the directories you mentioned: wpt.fyi/results/ambient-light?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned, wpt.fyi/results/bluetooth?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned

In particular, for idlharness.js tests, some subtests will pass because they're preconditions for the real tests. There will also be tests that check that something doesn't work, which will pass even if the feature is entirely unsupported, since "not working" produces the same observable result, e.g. throwing an exception. Sometimes tests can be tweaked to fail if the feature is unsupported.
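To make that concrete, here is a hypothetical testharness.js sketch of both patterns; the interface, assertions, and test names are illustrative, not taken from the actual suites:

```js
// A negative subtest: it checks that the feature is *hidden* in an
// insecure context. A browser with no AmbientLightSensor implementation
// at all passes it trivially, because "unsupported" and "correctly not
// exposed" are indistinguishable here.
test(() => {
  assert_false("AmbientLightSensor" in self,
               "interface must not be exposed in an insecure context");
}, "AmbientLightSensor is not exposed in insecure contexts");

// The tweak mentioned above: asserting up front that the interface
// exists turns "feature missing entirely" into a subtest failure
// instead of a vacuous pass.
test(() => {
  assert_true("AmbientLightSensor" in self, "AmbientLightSensor missing");
  const sensor = new AmbientLightSensor();
  assert_false(sensor.activated, "sensor must start out inactive");
}, "AmbientLightSensor construction in a secure context");
```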

Drilling down into a directory somewhat at random and clearing filters, it does look like this is legit: wpt.fyi/results/fetch/api/cors?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned

# Maciej Stachowiak (2 days ago)

On Feb 25, 2019, at 1:57 PM, Philip Jägenstedt <foolip at chromium.org> wrote:

I think I know what's going on there. When drilling down into tests and subtests, only those matching the filter are shown. Clearing the filter, things look a bit different in the directories you mentioned: wpt.fyi/results/ambient-light?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned, wpt.fyi/results/bluetooth?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned

In particular, for idlharness.js tests, some subtests will pass because they're preconditions for the real tests.

OK.

There will also be tests that check that something doesn't work, which will pass even if the feature is entirely unsupported, since "not working" produces the same observable result, e.g. throwing an exception. Sometimes tests can be tweaked to fail if the feature is unsupported.

It would be helpful for clarity if “feature not supported at all” resulted in zero tests passing, but perhaps it is challenging to stick to writing tests that way.

Drilling down into a directory somewhat at random and clearing filters, it does look like this is legit: wpt.fyi/results/fetch/api/cors?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned

There are definitely lots of failures that look legit.

# Sam Sneddon (a day ago)

A lot of the test results I'm seeing there are the "harness status", which has been a common cause of confusion: web-platform-tests/wpt.fyi#62. I don't know quite what the right solution is here, but it's definitely still confusing.
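For readers unfamiliar with the distinction: a testharness.js file reports one harness status (OK, ERROR, TIMEOUT, CRASH) saying whether the file ran to completion, plus a PASS/FAIL status per subtest, and query atoms like safari:ok match the former. Roughly, with a hypothetical file and subtest names:

```
fetch/api/cors/example.any.html: OK      <- harness status: file ran to completion
  Same-origin basic usage:       PASS    <- subtest statuses
  Cross-origin basic usage:      FAIL
```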

