Requesting feedback about EWS comments on Bugzilla bugs

# Aakash Jain (5 days ago)

I am gathering feedback about EWS, especially about the comments that EWS makes on Bugzilla bugs. Currently, the comments are not very user-friendly/polished/readable and are sometimes very noisy (e.g.: 72 comments and 36 attachments by EWS in bugs.webkit.org/show_bug.cgi?id=177484). I am working on improving them and looking for specific ideas/feedback.

Few ideas which I am considering:

1) Do not upload archive (for layout-test-results) on bugzilla, instead upload it to another server, unzip it and post a link to the results.html. Pros: a) Engineers won't have to download the attachment, unzip it, look for failures, and then delete it from their disk. They can simply click the url to view the results. b) This approach will also reduce 2 comments per failure to 1 comment. Currently there are two comments per failure, one for failure details, second for bugzilla attachment.

2) Aggregate comments from multiple queues. Pros: less noise. Cons: comments would get delayed while waiting for results from other queues. (Also might be a little complex to implement.)

3) Improve the text of the comments to make them more readable (specific ideas are welcome).

4) When a patch becomes 'obsolete', tag the corresponding EWS comments as 'obsolete', so that they will be hidden.

5) Do not comment on bugzilla bug at all, instead send email to the author of the patch. Pros: less noisy, also this will allow to include more detailed information about the failure in email. Cons: reviewers would have to click status-bubbles to see the failures, failure information is not immediately present in the comments.

What do you guys think?

Thanks Aakash

# Tim Horton (5 days ago)

On Jun 15, 2019, at 21:13, Aakash Jain <aakash_jain at apple.com> wrote:

 Hi Everyone,

I am gathering feedback about EWS, especially about the comments that EWS makes on Bugzilla bugs. Currently, the comments are not very user-friendly/polished/readable and are sometimes very noisy (e.g.: 72 comments and 36 attachments by EWS in bugs.webkit.org/show_bug.cgi?id=177484). I am working on improving them and looking for specific ideas/feedback.

Few ideas which I am considering:

1) Do not upload archive (for layout-test-results) on bugzilla, instead upload it to another server, unzip it and post a link to the results.html. Pros: a) Engineers won't have to download the attachment, unzip it, look for failures, and then delete it from their disk. They can simply click the url to view the results.

This would be nice!

b) This approach will also reduce 2 comments per failure to 1 comment. Currently there are two comments per failure, one for failure details, second for bugzilla attachment.

2) Aggregate comments from multiple queues. Pros: less noise. Cons: comments would get delayed while waiting for results from other queues. (Also might be a little complex to implement.)

Maybe post a comment that says just “some queues failed” when the first one fails, with a link to page on another server that lists the failures (and pending queues)? That way you can still have only one comment and do your coalescing, but it doesn’t have to wait for all to be done before commenting?
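A minimal sketch of the "one comment on first failure" idea above, assuming a hypothetical `PatchStatus` tracker and a made-up summary-page URL (neither is part of the existing EWS code). It posts a single comment when the first queue fails and stays silent afterwards, on the assumption that later results only update the linked page:

```python
from dataclasses import dataclass, field


@dataclass
class PatchStatus:
    """Tracks EWS queue results for one patch (hypothetical model)."""
    patch_id: int
    pending: set = field(default_factory=set)
    failed: set = field(default_factory=set)
    comment_posted: bool = False

    def record(self, queue, passed):
        """Record one queue result; return the comment text to post, or None."""
        self.pending.discard(queue)
        if passed:
            return None
        self.failed.add(queue)
        if self.comment_posted:
            # Later failures only change the summary page, not the bug.
            return None
        self.comment_posted = True
        # Hypothetical URL scheme for the page listing failed and pending queues.
        return f"Some EWS queues failed: https://ews.example.org/patch/{self.patch_id}"
```

With this shape, the number of Bugzilla comments per patch is at most one regardless of how many queues fail, and nothing has to wait for the slowest queue.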

3) Improve the text of the comments to make them more readable (specific ideas are welcome).

4) When a patch becomes 'obsolete', tag the corresponding EWS comments as 'obsolete', so that they will be hidden.

This would be great.

5) Do not comment on bugzilla bug at all, instead send email to the author of the patch. Pros: less noisy, also this will allow to include more detailed information about the failure in email. Cons: reviewers would have to click status-bubbles to see the failures, failure information is not immediately present in the comments.

I feel like “I have to click the status bubbles” isn’t so bad. That’s what we do for build failures (they don’t comment) and nobody complains about that...

# Michael Catanzaro (5 days ago)

1-4 all seem uncontroversial, especially with Tim's suggested improvement to immediately leave a "some queues failed" comment.

On Sat, Jun 15, 2019 at 11:13 PM, Aakash Jain <aakash_jain at apple.com> wrote:

5) Do not comment on bugzilla bug at all, instead send email to the author of the patch. Pros: less noisy, also this will allow to include more detailed information about the failure in email. Cons: reviewers would have to click status-bubbles to see the failures, failure information is not immediately present in the comments.

Well I think this would be OK, but next we'll be complaining about the emails. The underlying problem is excessive test flakiness. We're just not doing a very good job of tracking flaky tests and EWS isn't good at detecting flaky tests automatically. (Often a flaky test will fail twice in the same run, and that gets detected as a test failure.) Instead of reporting bugs for such tests, we're ignoring them because there are so many and we're so used to it.

commit-queue is able to report bugs when it detects a flaky patch, but EWS doesn't do this. That might be a positive change. Of course, EWS would need to be smart enough to not report a bug if the flakiness is triggered by the patch itself, which might not be simple to determine.

I don't have any concrete suggestions here. Just brainstorming.

# Guillaume Emont (5 days ago)

I agree with Tim's suggestion of limiting the comments to only one immediate comment saying "at least one EWS failed" and referring to a web page that summarizes everything. That could cover what is done by 1) and 2) in a more elegant way.

I'm not sure 3) would make a big difference for me, but I'm not against it.

I think 4) is necessary, and I imagine it would solve pretty easily a lot of the issues that people have with these comments (like the example of #177484 you gave).

I'm not too sure about 5). I like to have all that info easily accessible from one place (the bugzilla issue), I'd rather not have to juggle between that and my email client. Also, if it's only emailed to the author of the patch, it won't be as easily available to people who want to follow what's happening with a bug (for review or other purposes) and/or help with some issues.

Anyway, my 2 cents on the issue, hope it helps.

Guillaume

Quoting Aakash Jain (2019-06-16 06:13:19)

# Darin Adler (4 days ago)

On Jun 15, 2019, at 9:13 PM, Aakash Jain <aakash_jain at apple.com> wrote:

1) Do not upload archive (for layout-test-results) on bugzilla, instead upload it to another server, unzip it and post a link to the results.html. Pros: a) Engineers won't have to download the attachment, unzip it, look for failures, and then delete it from their disk. They can simply click the url to view the results. b) This approach will also reduce 2 comments per failure to 1 comment. Currently there are two comments per failure, one for failure details, second for bugzilla attachment.

Great improvement to do this. The most confusing thing about build bot comments is all the “creation of attachments” extra text with things like “attachment number” and “patch”.

However, it’s really nice that I can download a directory full of test results easily. I’d like to see the EWS website still have that feature.

4) When a patch becomes 'obsolete', tag the corresponding EWS comments as 'obsolete', so that they will be hidden.

Incredibly valuable.

5) Do not comment on bugzilla bug at all

I think this makes sense. I don’t see a reason that test results need to be comments. I think the “red bubble” in EWS already calls someone’s attention to failures.

If we want to augment it, we should think of what we are aiming at. I do find it useful to see which tests are failing, and when I click on the red bubble I don’t see that information. I have to click once to see the “log of activities” then click on “results”, then see a confusing giant file with lots of other information. At the bottom of that file is the one thing I want to know.

A better hierarchy is to put that “what new tests are failing” summary right at the top and let the logs be fallbacks, not the primary place to see the features.

instead send email to the author of the patch.

Why? I don’t think this should send any emails at all, unless the person requested it.

Pros: less noisy, also this will allow to include more detailed information about the failure in email.

I think the more detailed information should be on the webpage, not in an email.

Cons: reviewers would have to click status-bubbles to see the failures, failure information is not immediately present in the comments.

I think we should start with this approach, eliminating the comments entirely.

— Darin

# Keith Rollin (4 days ago)

On Jun 16, 2019, at 11:14, Darin Adler <darin at apple.com> wrote:

If we want to augment it, we should think of what we are aiming at. I do find it useful to see which tests are failing, and when I click on the red bubble I don’t see that information. I have to click once to see the “log of activities” then click on “results”, then see a confusing giant file with lots of other information. At the bottom of that file is the one thing I want to know.

We might also want to start turning those failures into action items. We could have an automatic mechanism that gathers the failures, records them in a database, and then, with sufficient data, makes determinations about the flakiness or other status of the test. It could then mark the test as flaky or raise it as an issue to some responsible (and responsive) party.
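As a rough illustration of such an automatic mechanism, here is a small sketch using Python's built-in `sqlite3`. The table layout, the function names, and the thresholds are all assumptions made for this example, and the heuristic (a test that fails on several unrelated patches but also passes sometimes is probably flaky) is only one possible rule. It is not meant as a design that would scale to WebKit's test volume:

```python
import sqlite3


def open_db():
    """Create an in-memory results store (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE results (test TEXT, patch_id INTEGER, passed INTEGER)")
    return db


def record_result(db, test, patch_id, passed):
    """Record one test result from one EWS run."""
    db.execute("INSERT INTO results VALUES (?, ?, ?)", (test, patch_id, int(passed)))


def flaky_tests(db, min_patches=3, min_runs=10):
    """Return tests with enough data that failed on at least `min_patches`
    distinct patches yet also passed at least once."""
    rows = db.execute(
        """
        SELECT test
        FROM results
        GROUP BY test
        HAVING COUNT(*) >= ?
           AND COUNT(DISTINCT CASE WHEN passed = 0 THEN patch_id END) >= ?
           AND SUM(passed) > 0
        ORDER BY test
        """,
        (min_runs, min_patches),
    )
    return [row[0] for row in rows]
```

A test that fails on only one patch is left for that patch's author, and a test that never passes is treated as consistently failing rather than flaky; both cases would need their own handling.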

We could also have a relatively manual process. The failures are surfaced in Bugzilla or in a Bugzilla-accessible page. The engineer posting the patch could then review the failures and mark them as “Flag as flaky”, “Flag as failing and should be fixed by someone else”, “Flag as failing and should be ignored”, etc. These responses could then be turned into action items for some responsible (and responsive) party to address.

As Michael says, there’s a big issue with ignoring test results. Putting a frictionless process in place to address test results would help make them more effective. When I make a change to an Xcode project and Windows builds throw up errors, that’s not something caused by my immediate patch, but I would like to see the flaky test fixed.

— Keith

# Aakash Jain (2 days ago)

On Jun 17, 2019, at 1:52 PM, Keith Rollin <krollin at apple.com> wrote:

On Jun 16, 2019, at 11:14, Darin Adler <darin at apple.com> wrote:

If we want to augment it, we should think of what we are aiming at. I do find it useful to see which tests are failing, and when I click on the red bubble I don’t see that information. I have to click once to see the “log of activities” then click on “results”, then see a confusing giant file with lots of other information. At the bottom of that file is the one thing I want to know.

We might also want to start turning those failures into action items. We could have an automatic mechanism that gathers the failures, records them in a database, and then, with sufficient data, makes determinations about the flakiness or other status of the test. It could then mark the test as flaky or raise it as an issue to some responsible (and responsive) party.

Agree. This is the plan. Jonathan Bedard is working on an improved flakiness dashboard. Once we have that, EWS will start using its API to get the test flakiness information. That should significantly reduce EWS's false positives (and also reduce the number of retries EWS has to do while trying to rule out flakiness).

# Aakash Jain (2 days ago)

On Jun 16, 2019, at 2:14 PM, Darin Adler <darin at apple.com> wrote:

On Jun 15, 2019, at 9:13 PM, Aakash Jain <aakash_jain at apple.com> wrote:

1) Do not upload archive (for layout-test-results) on bugzilla, instead upload it to another server, unzip it and post a link to the results.html. Pros: a) Engineers won't have to download the attachment, unzip it, look for failures, and then delete it from their disk. They can simply click the url to view the results. b) This approach will also reduce 2 comments per failure to 1 comment. Currently there are two comments per failure, one for failure details, second for bugzilla attachment.

Great improvement to do this. The most confusing thing about build bot comments is all the “creation of attachments” extra text with things like “attachment number” and “patch”.

However, it’s really nice that I can download a directory full of test results easily. I’d like to see the EWS website still have that feature.

Sure, will keep an option to download the test results archive as well.

4) When a patch becomes 'obsolete', tag the corresponding EWS comments as 'obsolete', so that they will be hidden.

Incredibly valuable.

5) Do not comment on bugzilla bug at all

I think this makes sense. I don’t see a reason that test results need to be comments. I think the “red bubble” in EWS already calls someone’s attention to failures.

What do you think about comments for 'Style' failures, is that ok to keep, or should we remove them as well?

If we want to augment it, we should think of what we are aiming at. I do find it useful to see which tests are failing, and when I click on the red bubble I don’t see that information. I have to click once to see the “log of activities” then click on “results”, then see a confusing giant file with lots of other information. At the bottom of that file is the one thing I want to know.

A better hierarchy is to put that “what new tests are failing” summary right at the top and let the logs be fallbacks, not the primary place to see the features.

This brings another interesting question, what page should be displayed on clicking an EWS bubble?

a) old EWS style page, e.g.: webkit-queues.webkit.org/patch/362845/ios-sim-ews. This does have deficiencies like you mentioned.

b) Buildbot page, e.g.: ews-build.webkit.org/#/builders/3/builds/414. Currently this is the new EWS behavior (but this doesn't cover the case when there are multiple builds (e.g.: retries) for a patch on a given queue).

c) A newly designed page which shows the summary of failures, has link(s) to the Buildbot page(s), has a link to download test result archives, and maybe a summary of the major build steps which were executed. More feedback regarding the design of this page would be useful.

instead send email to the author of the patch.

Why? I don’t think this should send any emails at all, unless the person requested it.

ok

Pros: less noisy, also this will allow to include more detailed information about the failure in email.

I think the more detailed information should be on the webpage, not in an email.

Cons: reviewers would have to click status-bubbles to see the failures, failure information is not immediately present in the comments.

I think we should start with this approach, eliminating the comments entirely.

Sure, let's do this. I wouldn't change the old EWS (which is being replaced), but new EWS will have this behavior.

# Jonathan Bedard (2 days ago)

To elaborate a little on Aakash’s comments here:

We’ve found that tracking results for the number of tests we have, on the number of configurations we test, with the number of weekly commits to the WebKit project, is actually not a trivial problem to solve. The naive SQL approach to this problem isn’t performant. Over the last year, I’ve developed a solution to this problem, and we’ve been reporting a subset of results to it for some time (<> is when the reporting started).

The plan is to commit this to WebKit in the next few weeks, then, as Aakash mentioned, have EWS query this service to determine whether a specific test is failing more generally, or if the failure is specific to the engineer’s patch.
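The EWS side of that query could look roughly like the sketch below. The service's real API is not described in this thread, so `known_failure_rate` is a hypothetical callable standing in for it, and the `threshold` value is an arbitrary assumption:

```python
def split_failures(patch_failures, known_failure_rate, threshold=0.05):
    """Split a patch's failing tests into (likely_patch_specific, likely_preexisting).

    `known_failure_rate(test)` is a stand-in for a query to the results
    service; it should return the fraction of recent runs without this
    patch in which the test failed.
    """
    patch_specific, preexisting = [], []
    for test in sorted(patch_failures):
        if known_failure_rate(test) >= threshold:
            # The test already fails often elsewhere, so do not blame the patch.
            preexisting.append(test)
        else:
            patch_specific.append(test)
    return patch_specific, preexisting
```

Only the first list would need to be reported against the patch; the second could feed whatever process ends up handling flaky or already-failing tests.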

Keith’s suggestion to add a ‘frictionless process in place to address test results’ points to a larger discussion I’ve been having with Aakash and Ryan about how to handle transient test failures. Right now, we use the same mechanism to handle tests which are not supported on a specific configuration and tests which are failing on a specific configuration. This pollutes our changelogs with thousands of test gardening commits (I grepped for ‘garden’ in our LayoutTest changelogs, and found more than 8000 results), is error-prone, and means that knowledge about a failure caused by the configuration may be lost if you’re bisecting. I have a number of ideas for improving this, but none of them are coherent enough to be proposed at the moment. I’m only mentioning this here because it seems like others are thinking about this problem too.

Jonathan
