Huge improvement in Safari results on wpt.fyi

# Philip Jägenstedt (10 days ago)

Fresh off the bots, I'm excited to report more robust Safari results, and that Safari WPT pass rates are clearly improving! Thanks to the hard work of Mike Pennisi [1] we now have the first Safari 12 results: wpt.fyi/results/?sha=ee2e69bfb1&product=safari-12.0

This uses the same setup as for Safari Technology Preview, which has been running for a while [2] and whose results are what you see in the "experimental" view: wpt.fyi/results/?label=experimental

This appears much more robust than the Safari 11 data we've collected from Sauce Labs, and we can see a massive improvement between Safari 11 and 12: wpt.fyi/results/?sha=ee2e69bfb1&product=safari-11.1&product=safari-12.0&diff

This lumps together infrastructure improvements and Safari 11->12 improvements, but the improvements in service-workers/ [3] stand out, as do those in webdriver/, referrer-policy/, css/css-align/, and others. (The effect of moving away from Sauce is mainly fewer timeouts.)

Also very interesting is comparing Safari 12 stable to TP: wpt.fyi/results/?sha=ee2e69bfb1&product=safari-12.0&product=safari-12.1&diff

One can tell that work is going into canvas-related things, web-animations/, css/css-logical/ and more! \o/

I hope you'll all find these results valuable, and please report bugs or feature requests here: web-platform-tests/wpt.fyi/issues

P.S. We're also trying to use these diff views to spot regressions. They're a bit hard to use [4], but a fix is in progress [5] and I might check back here when that works. I've appended to the end of this email a non-exhaustive list of possible regressions that can already be spotted.

[1] web-platform-tests/results-collection#604
[2] wpt.fyi/test-runs?labels=safari,experimental
[3] wpt.fyi/results/service-workers?sha=ee2e69bfb1&product=safari-11.1&product=safari-12.0&diff=true
[4] web-platform-tests/wpt.fyi#411
[5] web-platform-tests/wpt.fyi#609

P.P.S. Possible regressions in Safari TP:

  • wpt.fyi/results/css/vendor-imports/mozilla/mozilla-central-reftests/shapes1?sha=ee2e69bfb1&product=safari-12.0&product=safari-12.1
  • wpt.fyi/results/service-workers/service-worker/extendable-event-async-waituntil.https.html?sha=ee2e69bfb1&product=safari-12.0&product=safari-12.1
  • wpt.fyi/results/service-workers/service-worker/skip-waiting-installed.https.html?sha=ee2e69bfb1&product=safari-12.0&product=safari-12.1
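The diff views above compare two runs test by test. As a rough illustration of how candidate regressions could be pulled out of two runs offline, here is a minimal Python sketch. It assumes each run has already been reduced to a hypothetical mapping from test path to (passing subtests, total subtests); that shape and the test names are illustrative only, not wpt.fyi's actual data format.

```python
from typing import Dict, List, Tuple

# Assumed input shape (hypothetical, not a documented wpt.fyi format):
# test path -> (number of passing subtests, total number of subtests).
Summary = Dict[str, Tuple[int, int]]


def find_regressions(before: Summary, after: Summary) -> List[str]:
    """Return tests whose passing-subtest count dropped from `before` to `after`."""
    regressions = []
    for test, (passed_before, _total) in sorted(before.items()):
        if test not in after:
            # Absent from the newer run: worth a look, but not counted here.
            continue
        passed_after, _ = after[test]
        if passed_after < passed_before:
            regressions.append(test)
    return regressions


if __name__ == "__main__":
    safari_12_0 = {"/a.html": (3, 3), "/b.html": (1, 2), "/c.html": (2, 2)}
    safari_12_1 = {"/a.html": (2, 3), "/b.html": (2, 2), "/c.html": (2, 2)}
    print(find_regressions(safari_12_0, safari_12_1))  # ['/a.html']
```

Counting passing subtests is coarse: a test that gains one passing subtest and loses another looks unchanged, so comparing per-subtest statuses would be more precise.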

# Ryosuke Niwa (8 days ago)

Thanks for the intriguing data, Philip.

Is there a way to get a list of tests where all other browsers pass but Safari / WebKit fail?

That would allow us to quickly identify the set of tests we can fix to improve the interoperability across browsers right away.

  • R. Niwa

On Tue, Oct 2, 2018 at 3:45 AM Philip Jägenstedt <foolip at chromium.org> wrote:

# Chris Dumez (8 days ago)

On Oct 2, 2018, at 3:45 AM, Philip Jägenstedt <foolip at chromium.org> wrote:

P.P.S. Possible regressions in Safari TP: wpt.fyi/results/css/vendor-imports/mozilla/mozilla-central-reftests/shapes1?sha=ee2e69bfb1&product=safari-12.0&product=safari-12.1, wpt.fyi/results/service-workers/service-worker/extendable-event-async-waituntil.https.html?sha=ee2e69bfb1&product=safari-12.0&product=safari-12.1, wpt.fyi/results/service-workers/service-worker/extendable-event-async-waituntil.https.html?sha=ee2e69bfb1&product=safari-12.0&product=safari-12.1

Looks like this regression sneaked in with Bug 188246 bugs.webkit.org/show_bug.cgi?id=188246.

# Ryosuke Niwa (8 days ago)

Sounds like we should import those tests!

  • R. Niwa

# Philip Jägenstedt (4 days ago)

That filtering capability unfortunately does not yet exist on wpt.fyi but it's a high priority and actively being worked on: web-platform-tests/wpt.fyi#201

FWIW, I suspect that for these purposes, comparing against the stable versions of all other browsers might be the most useful: wpt.fyi/results/?product=chrome%5Bstable%5D&product=edge%5Bstable%5D&product=firefox%5Bstable%5D&product=safari%5Bexperimental%5D&aligned

Again, no way to filter on wpt.fyi, but I'll see if I can download the full results and write a quick script.

# Philip Jägenstedt (a day ago)

Alright, I've written a one-off script [1] to find the Safari-only failures, and here's the output: gist.github.com/foolip/4d410ce79416bcdce71feb212159a02e

Barring bugs, each of the linked tests (or one of its subtests) should be failing in Safari Technology Preview and passing in the stable versions of Chrome, Edge and Firefox.
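The criterion described above (some subtest failing in Safari Technology Preview while the same subtest passes in every other browser) can be sketched as follows. This is not the actual script referenced in [1]; it is a minimal sketch assuming results have been loaded into a hypothetical per-run mapping of test path to {subtest name: passed}, with a test that has no subtests modelled as a single entry keyed by "".

```python
from typing import Dict, List

# Assumed shape (hypothetical): test path -> {subtest name: passed?}.
# A test without subtests, such as a reftest, is a single entry keyed by "".
Run = Dict[str, Dict[str, bool]]


def safari_only_failures(safari: Run, others: List[Run]) -> List[str]:
    """List tests with a subtest failing in `safari` but passing in every run in `others`."""
    found = []
    for test, subtests in sorted(safari.items()):
        for name, passed in subtests.items():
            if passed:
                continue
            # A missing test or subtest in another run counts as "not passing".
            if all(run.get(test, {}).get(name, False) for run in others):
                found.append(test)
                break  # one qualifying subtest is enough to list the test
    return found


if __name__ == "__main__":
    safari = {"/x.html": {"a": True, "b": False}, "/z.html": {"a": False}}
    chrome = {"/x.html": {"a": True, "b": True}, "/z.html": {"a": False}}
    edge = {"/x.html": {"a": True, "b": True}, "/z.html": {"a": True}}
    firefox = {"/x.html": {"a": True, "b": True}, "/z.html": {"a": True}}
    print(safari_only_failures(safari, [chrome, edge, firefox]))  # ['/x.html']
```

Treating a missing result in another browser as "not passing" keeps the list conservative. Note that with an empty list of other runs, every Safari failure would be reported.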

Numerically, most of the failures are in css (622), encoding (135) and html (60). Within css, it's mostly css/CSS2.
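A per-directory tally like the one above can be produced by grouping failing test paths on their leading directory components. A minimal sketch, assuming the failures are available as a list of test paths (the paths below are made up for illustration):

```python
from collections import Counter
from typing import Iterable


def count_by_directory(tests: Iterable[str], depth: int = 1) -> Counter:
    """Count test paths by their first `depth` directory components."""
    counts: Counter = Counter()
    for path in tests:
        parts = path.strip("/").split("/")
        # Drop the file name, then keep only the leading `depth` directories.
        directory = "/".join(parts[:-1][:depth])
        counts[directory] += 1
    return counts


if __name__ == "__main__":
    failures = [
        "/css/CSS2/a.html",
        "/css/CSS2/b.html",
        "/css/css-align/c.html",
        "/encoding/d.html",
        "/html/e/f.html",
    ]
    print(count_by_directory(failures).most_common())
    # [('css', 3), ('encoding', 1), ('html', 1)]
    print(count_by_directory(failures, depth=2).most_common(2))
    # [('css/CSS2', 2), ('css/css-align', 1)]
```

Passing `depth=2` gives the finer breakdown, e.g. how much of css comes from css/CSS2.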

I hope looking through this may be of use to you!

[1] foolip/ad

# Dean Jackson (19 hours ago)

It turns out that many (most?) of the CSS failures are because we no longer expose user-installed fonts, e.g. Ahem.

Options:

  • update lots of tests to load Ahem via @font-face (yuck)
  • allow Ahem to be used if installed (weird to special case one font, but probably ok)

Dean

# Geoffrey Garen (18 hours ago)

Honest question: What’s gross about using @font-face?

It would be lots of test edits. That’s a bummer.

But maybe it’s clearer for the tests to specify the font they want to use. It makes the test self-describing, eliminating the requirement that the user take a step outside the test to get the right result.

Thanks, Geoff

# Emilio Cobos Álvarez (17 hours ago)

On 10/12/18 3:59 AM, Geoffrey Garen wrote:

Note that there's also the opposite opinion, that loading a web font can potentially hide bugs:

lists.w3.org/Archives/Public/www-style/2017Jan/0053.html

Though I don't have such a strong opinion myself, I think @font-face is a fine solution for that problem (and other people seemed to be ok with that as well, looking at how that thread continues...).

I don't know if the CSSWG ended up taking an official position on this, but it may be worth asking on www-style before doing the work of a mass conversion.

# Geoffrey Sneddon (6 hours ago)

On Fri, Oct 12, 2018 at 4:23 AM Emilio Cobos Álvarez <emilio at crisal.io> wrote:

See web-platform-tests/wpt#9105 about this.

I don't have a strong opinion here, but: a) it certainly seems annoying to flush layout and avoid triggering any layout invalidation bugs; b) we have plenty of other (manual 🙁) tests for the font matching algorithm (and parts of that obviously do need to use system installed fonts).

As an aside: when did user-installed fonts stop being allowed by default? r226172 states nothing is using the SPI yet (though did it already default to No? In that case it has been disallowed by default since r225641). wpt.fyi seems to have Ahem installed okay for STP but not for stable, based on infrastructure/assumptions/ahem.html, and all the installation step does is copy the font to ~/Library/Fonts, which confuses me!

I'd suggest discussing this on the WPT issue linked above; the CSS WG are far from the only stakeholder here (there are plenty of reftests elsewhere in WPT!).

# Philip Jägenstedt (5 hours ago)

On Fri, Oct 12, 2018 at 4:07 PM Geoffrey Sneddon <me at gsnedders.com> wrote:

I don't think we should change a bunch of tests; it's useful to be able to depend on some system font existing across the board, and Ahem is it. We already need root access in our CI systems because of /etc/hosts, so just putting Ahem in /Library/Fonts as part of the setup is fine.

I'd also like to know when this change happened, because in foolip/wpt#5 I had to work around it for Azure Pipelines, which has macOS 10.13.6, while all the other CI systems I tried worked with the code as-is. They are all running the same version of STP. (This PR is still just me experimenting, but the goal is to get Safari coverage for PRs pretty soon.)
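The setup step proposed above, copying Ahem into /Library/Fonts while provisioning a CI machine, could look roughly like this Python sketch. The function name is hypothetical, and whether Safari then exposes the font to web content is exactly the open question in this thread; the sketch only performs the copy.

```python
import shutil
from pathlib import Path


def install_font(font_file: Path, fonts_dir: Path) -> Path:
    """Copy `font_file` into `fonts_dir` (creating it if needed); skip if already present."""
    fonts_dir.mkdir(parents=True, exist_ok=True)
    target = fonts_dir / font_file.name
    if not target.exists():
        # copy2 also preserves file metadata such as the modification time.
        shutil.copy2(font_file, target)
    return target
```

On a CI machine this might be called as `install_font(Path("fonts/Ahem.ttf"), Path("/Library/Fonts"))`, assuming a wpt checkout as the working directory (where Ahem lives at fonts/Ahem.ttf) and the root access mentioned above.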

# Philip Jägenstedt (5 hours ago)

On the run of Safari that was used for this report, the infrastructure test for ahem was actually passing: wpt.fyi/results/infrastructure/assumptions?sha=67152fdecd&product=chrome[stable]&product=edge[stable]&product=firefox[stable]&product=safari[experimental]

Are you sure that Ahem is the explanation for the failures? Do you have a test that you think is actually passing, where the wpt.fyi results are wrong? Clearly, having screenshots would make it easier to understand a situation like this, and it's something we've discussed a bit today: web-platform-tests/wpt.fyi#57
