Filtering results on wpt.fyi, Safari-specific failures

# Philip Jägenstedt (a month ago)

Following the improved Safari results last year [1] and the discussion it generated, I'm happy to announce that the requested filtering is now available in the search box. The full syntax is documented [2] but there's also a new insights view [3] with some useful searches.

Especially interesting for this list could be this view, of Chrome Dev, Firefox Nightly and Safari Technology Preview, filtered to the Safari-specific failures: wpt.fyi/results/?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned&q=%28chrome%3Apass%7Cchrome%3Aok%29+%28firefox%3Apass%7Cfirefox%3Aok%29+%28safari%3A%21pass%26safari%3A%21ok%29
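To make the link above easier to read, the `q=` parameter is just the structured-query syntax from [2], URL-encoded. A small sketch of decoding it (plain Node, no wpt.fyi-specific code):

```javascript
// Decode the q= parameter from the search URL above back into the
// structured-query syntax documented in the wpt.fyi query README.
// "+" encodes a space in query strings, so replace it first.
const q = "%28chrome%3Apass%7Cchrome%3Aok%29+%28firefox%3Apass%7Cfirefox%3Aok%29+%28safari%3A%21pass%26safari%3A%21ok%29";
const decoded = decodeURIComponent(q.replace(/\+/g, " "));
console.log(decoded);
// (chrome:pass|chrome:ok) (firefox:pass|firefox:ok) (safari:!pass&safari:!ok)
```

In other words: Chrome passes, Firefox passes, and Safari neither passes nor is OK, i.e. the Safari-specific failures.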

Both Google and Mozilla have efforts [4][5] to reduce the number of Chrome/Firefox-specific failures, as this seems like an especially valuable category of problems, where changing just one browser can remove a pain point for web developers.

No doubt some failures are spurious, but hopefully there is value to be found by looking into where the largest numbers of failures appear to be. If something seems to be wrong with the search/filtering, please file an issue for us! [6]

Credit to Mark Dittmer and Luke Bjerring who owned this project.

P.S. We are also working on triage metadata for wpt.fyi, to make it possible to burn down a list of failures like this and not later have to re-triage to find the new failures. [7]

[1] lists.webkit.org/pipermail/webkit-dev/2018-October/030209.html
[2] web-platform-tests/wpt.fyi/blob/master/api/query/README.md
[3] staging.wpt.fyi/insights
[4] bugs.chromium.org/p/chromium/issues/detail?id=896242
[5] bugzilla.mozilla.org/show_bug.cgi?id=1498357
[6] web-platform-tests/wpt.fyi/issues/new?title=Structured+Queries+issue&projects=web-platform-tests/wpt.fyi/8&labels=bug&template=search.md
[7] docs.google.com/document/d/1oWYVkc2ztANCGUxwNVTQHlWV32zq6Ifq9jkkbYNbSAg/edit?usp=sharing

# Philip Jägenstedt (a month ago)

I'd like to point out right away that diagnosing reftest failures is currently cumbersome because we don't store the screenshots. This is also a work in progress: docs.google.com/document/d/1IhZa4mrjK1msUMhtamKwKJ_HhXD-nqh_4-BcPWM6soQ/edit?usp=sharing

Until that has launched, I would recommend ignoring reftest failures if the cause of failure isn't obvious.

# Maciej Stachowiak (a month ago)

Neat.

I see some obvious areas for focus, where Safari fails lots of tests that the other browsers don't.

For context, I tried looking at this view, which shows all tests that Chrome and Firefox pass, with Safari results shown regardless of outcome: wpt.fyi/results/?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned&q=%28chrome%3Apass%7Cchrome%3Aok%29+%28firefox%3Apass%7Cfirefox%3Aok%29

I noticed some puzzling results there: Safari passes all the ambient-light and bluetooth tests that Chrome and Firefox do, despite not supporting these standards at all. (For that matter, I'm not sure Firefox supports these specs either.) Not sure if it's a harness problem or dubious tests that don't actually test the standard.

# Philip Jägenstedt (a month ago)

I think I know what's going on there. When drilling down into tests and subtests, only those matching the filter are shown. Clearing the filter, things look a bit different in the directories you mentioned: wpt.fyi/results/ambient-light?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned, wpt.fyi/results/bluetooth?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned

In particular for idlharness.js tests some subtests will pass because they're preconditions for the real tests. There will also be tests that check that something doesn't work, which will pass even if the feature is entirely unsupported if "not working" results in the same thing, e.g. throwing an exception. Sometimes tests can be tweaked to fail if the feature is unsupported.
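A minimal sketch (not actual testharness.js code; the helper names here are hypothetical) of why a negative test can pass even when the feature is entirely missing:

```javascript
// mustThrow passes as long as the callback throws, regardless of
// whether it threw for the reason the test intended.
function mustThrow(fn) {
  try { fn(); } catch (e) { return true; }
  return false;
}

// Simulate a browser with no Web Bluetooth support at all: there is
// no navigator.bluetooth, so touching it throws a TypeError, and a
// "should throw/reject" style test still reports a pass.
const navigator = {};
console.log(mustThrow(() => navigator.bluetooth.requestDevice()));  // true

// Tweaking the test to first check that the entry point exists makes
// it fail when the feature is unsupported:
console.log("bluetooth" in navigator);  // false
```

This is the sense in which "not working" can produce the same observable outcome the test expects, and why such tests sometimes need tweaking to fail on unsupported features.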

Drilling down into a directory somewhat at random and clearing filters, it does look like this is legit: wpt.fyi/results/fetch/api/cors?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned

# Maciej Stachowiak (a month ago)

On Feb 25, 2019, at 1:57 PM, Philip Jägenstedt <foolip at chromium.org> wrote:

I think I know what's going on there. When drilling down into tests and subtests, only those matching the filter are shown. Clearing the filter, things look a bit different in the directories you mentioned: wpt.fyi/results/ambient-light?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned, wpt.fyi/results/bluetooth?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned

In particular for idlharness.js tests some subtests will pass because they're preconditions for the real tests.

OK.

There will also be tests that check that something doesn't work, which will pass even if the feature is entirely unsupported if "not working" results in the same thing, e.g. throwing an exception. Sometimes tests can be tweaked to fail if the feature is unsupported.

It would be helpful for clarity if “feature not supported at all” resulted in zero tests passing, but perhaps it is challenging to stick to writing tests that way.

Drilling down into a directory somewhat at random and clearing filters, it does look like this is legit: wpt.fyi/results/fetch/api/cors?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned

There are definitely lots of failures that look legit.

# Sam Sneddon (a month ago)

A lot of the test results I'm seeing there are the "harness status", which has been a common cause of confusion: web-platform-tests/wpt.fyi#62. Don't know quite what the right solution is here, but it's definitely still confusing.


# Robert Ma (4 days ago)

On Mon, Feb 25, 2019 at 8:49 AM Philip Jägenstedt <foolip at chromium.org>

wrote:

I'd like to point out right away that diagnosing reftest failures is currently cumbersome because we don't store the screenshots. This is also a work in progress:

docs.google.com/document/d/1IhZa4mrjK1msUMhtamKwKJ_HhXD-nqh_4-BcPWM6soQ/edit?usp=sharing

Until that has launched, I would recommend ignoring reftest failures if the cause of failure isn't obvious.

Great news! Reftest screenshots are now available on wpt.fyi. No more guesswork about why a reftest fails!

For example, this wpt.fyi/results/css/css-flexbox/flex-wrap-002.html?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned&q=%28chrome%3Apass%7Cchrome%3Aok%29+%28firefox%3Apass%7Cfirefox%3Aok%29+%28safari%3A%21pass%26safari%3A%21ok%29

is one of the Safari-only reftest failures you can find using the search link posted earlier. Now you can click the "compare" button (you might need to force-reload the page to see it) to view the screenshots. This example looks like a genuine failure, while some others are probably caused by font antialiasing/kerning (they should most likely use the Ahem font instead).

We are also working on another feature to triage the failures docs.google.com/document/d/1oWYVkc2ztANCGUxwNVTQHlWV32zq6Ifq9jkkbYNbSAg/edit (e.g. to mark a test as a genuine failure and link it to bug trackers, or as flaky/broken). Stay tuned!

# Philip Jägenstedt (17 hours ago)

On Fri, Mar 29, 2019 at 6:16 PM Robert Ma <robertma at chromium.org> wrote:

On Mon, Feb 25, 2019 at 8:49 AM Philip Jägenstedt <foolip at chromium.org> wrote:

I'd like to point out right away that diagnosing reftest failures is currently cumbersome because we don't store the screenshots. This is also a work in progress:

docs.google.com/document/d/1IhZa4mrjK1msUMhtamKwKJ_HhXD-nqh_4-BcPWM6soQ/edit?usp=sharing

Until that has launched, I would recommend ignoring reftest failures if the cause of failure isn't obvious.

Great news! Reftest screenshots are now available on wpt.fyi. No more guesswork for why a reftest fails!

For example, this wpt.fyi/results/css/css-flexbox/flex-wrap-002.html?label=master&label=experimental&product=chrome%5Btaskcluster%5D&product=firefox%5Btaskcluster%5D&product=safari%5Bazure%5D&aligned&q=%28chrome%3Apass%7Cchrome%3Aok%29+%28firefox%3Apass%7Cfirefox%3Aok%29+%28safari%3A%21pass%26safari%3A%21ok%29 is one of the Safari-only reftest failures you can find using the search link posted earlier. Now you can click the "compare" button (you might need to force-reload the page to see it) to view the screenshots. This example looks like a genuine failure, while some others are probably caused by font antialiasing/kerning (they should most likely use the Ahem font instead).

We are also working on another feature to triage the failures docs.google.com/document/d/1oWYVkc2ztANCGUxwNVTQHlWV32zq6Ifq9jkkbYNbSAg/edit (e.g. to mark a test as a genuine failure and link it to bug trackers, or as flaky/broken). Stay tuned!

The screenshots can also come in handy when comparing Safari stable to Technology Preview: wpt.fyi/results/?diff&filter=ADC&q=seq%28status%3Apass+status%3Afail%29&run_id=5130810281689088&run_id=5197532699295744

/css/css-contain/contain-layout-baseline-003.html is one reftest that appears to have regressed in Technology Preview, and one can see the failure here: wpt.fyi/analyzer?screenshot=sha1%3A66e5479ec5db9b860338e89803b563f7e99510f6&screenshot=sha1%3A385fc160998db876af7fce0e6a9fbf8ad06b4a45
