
Try to find the patch which caused a crash.

For some categories of crashes, we are able to automatically pinpoint the patch which introduced the regression.

The issue

Developers make mistakes, not because they are bad at their job, but most of the time because the code is complex, and sometimes because the changes they made look so trivial that they did not pay much attention to them.

The sooner we catch these mistakes, the easier they are to fix: developers who are quickly informed about regressions introduced by their changes still remember those changes. In the end, this strongly improves the user experience.

How do we achieve that?

When a new crash signature shows up, we retrieve the stack trace of the crash, i.e. the sequence of function calls that led to the crash: https://crash-stats.mozilla.com/report/index/53b199e7-30f5-4c3d-8c8a-e39c82170315#frames

For each function, we have the name of the file where it is defined and the mercurial changeset from which Firefox was built, so by querying https://hg.mozilla.org it is possible to find out what the last changes to this file were.
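As an illustration, here is a minimal sketch of how a stack frame can be turned into a file-history query. It assumes that a frame's file entry looks like `hg:<repo>:<path>:<revision>` and that the history is read from an hgweb `json-log` URL whose entries carry a `node` and a `date` pair; both shapes are assumptions made for this example, and the helper names (`parse_frame_file`, `log_url`, `recent_changes`) are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple, Optional


class SourceRef(NamedTuple):
    repo: str      # e.g. "hg.mozilla.org/mozilla-central"
    path: str      # file path inside the repository
    revision: str  # changeset the build was made from


def parse_frame_file(value: str) -> Optional[SourceRef]:
    """Split an assumed 'hg:<repo>:<path>:<revision>' frame entry."""
    parts = value.split(":")
    if len(parts) != 4 or parts[0] != "hg":
        return None  # not a Mercurial-annotated frame (system library, etc.)
    return SourceRef(parts[1], parts[2], parts[3])


def log_url(ref: SourceRef) -> str:
    """Build a file-history URL (assumed hgweb 'json-log' layout)."""
    return f"https://{ref.repo}/json-log/{ref.revision}/{ref.path}"


def recent_changes(entries, build_time, days=3):
    """Keep the changesets dated within `days` before the build."""
    cutoff = build_time - timedelta(days=days)
    recent = []
    for entry in entries:
        # Assumed entry shape: {"node": ..., "date": [unix_seconds, tz_offset]}
        when = datetime.fromtimestamp(entry["date"][0], tz=timezone.utc)
        if cutoff <= when <= build_time:
            recent.append(entry["node"])
    return recent
```

The network call itself is left out on purpose: `recent_changes` only needs the already decoded list of entries, which keeps the date filtering easy to check in isolation.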

The strategy is the following:

  1. we retrieve the crashes which first appeared in the latest nightly version (no such crash in the previous three days);
  2. we group these crashes into buckets by their proto-signature;
  3. for each bucket, we pick a crash report and extract the functions and files which appear in its stack trace;
  4. for each file, we query mercurial to find out whether a patch has been applied to this file in the last three days.
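The four steps can be sketched as follows. This is only an illustration over in-memory data: the field names (`signature`, `proto_signature`, `build_id`, `frames`, `file`) are assumptions about the shape of a crash report, and `recently_patched` is a hypothetical callback standing in for the mercurial query.

```python
from collections import defaultdict


def new_signatures(crashes, latest_build, previous_builds):
    """Step 1: signatures seen in the latest nightly but in no previous build."""
    old = {c["signature"] for c in crashes if c["build_id"] in previous_builds}
    return {c["signature"] for c in crashes
            if c["build_id"] == latest_build and c["signature"] not in old}


def bucket_by_proto_signature(crashes, signatures):
    """Step 2: group the new crashes by proto-signature."""
    buckets = defaultdict(list)
    for crash in crashes:
        if crash["signature"] in signatures:
            buckets[crash["proto_signature"]].append(crash)
    return dict(buckets)


def files_in_stack(crash):
    """Step 3: source files of the stack, in frame order, without duplicates."""
    seen, files = set(), []
    for frame in crash["frames"]:
        path = frame.get("file")
        if path and path not in seen:
            seen.add(path)
            files.append(path)
    return files


def candidate_files(crashes, latest_build, previous_builds, recently_patched):
    """Steps 1-4: for each bucket, the stack files that were patched recently."""
    signatures = new_signatures(crashes, latest_build, previous_builds)
    result = {}
    for proto, bucket in bucket_by_proto_signature(crashes, signatures).items():
        report = bucket[0]  # one representative report per bucket
        result[proto] = [f for f in files_in_stack(report) if recently_patched(f)]
    return result
```

Passing the patch lookup in as a callback keeps the grouping logic independent of how the repository is actually queried.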

The last stage is to analyze the stack traces and the corresponding patches to infer whether a patch is probably responsible for a crash, and then to file a bug.
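To give an idea of what such an analysis could look like, here is a hypothetical scoring heuristic; it is not the actual implementation. It assumes each patch is described by the file it touches, the line numbers it modified and the functions it added, and the weights, the `window` of ten lines and the `threshold` are arbitrary values chosen for the example.

```python
def score_patch(patch, frames, window=10):
    """Score how strongly `patch` is tied to the stack; higher is more suspicious."""
    score = 0.0
    for depth, frame in enumerate(frames):
        if frame.get("file") != patch["file"]:
            continue
        weight = 1.0 / (depth + 1)  # frames near the top of the stack count more
        line = frame.get("line")
        if frame.get("function") in patch.get("added_functions", ()):
            score += 3 * weight  # the crashing function was added by the patch
        elif line in patch["lines"]:
            score += 3 * weight  # the crashing line itself was modified
        elif line is not None and any(0 < line - m <= window for m in patch["lines"]):
            score += 2 * weight  # the crash is a few lines after a modified line
        else:
            score += 1 * weight  # only the file matches
    return score


def most_suspicious(patches, frames, threshold=2.0):
    """Return the best-scoring patch, or None if no patch is convincing."""
    best = max(patches, key=lambda p: score_patch(p, frames), default=None)
    if best is None or score_patch(best, frames) < threshold:
        return None
    return best
```

A patch that only touches a file deep in the stack stays below the threshold, so it would not be reported on its own.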

Results

Here are three examples:

https://bugzilla.mozilla.org/show_bug.cgi?id=1347836

The patch https://hg.mozilla.org/mozilla-central/diff/99e3488b1ea4/layout/base/nsLayoutUtils.cpp modified the function nsLayoutUtils::SurfaceFromElement and the crash occurred in this function (https://crash-stats.mozilla.com/report/index/53b199e7-30f5-4c3d-8c8a-e39c82170315#frames), a few lines after the modified line.

In the end, the issue was a function returning a pointer which could be dangling (the patch).

https://bugzilla.mozilla.org/show_bug.cgi?id=1347461

The patch https://hg.mozilla.org/mozilla-central/diff/bf33ec027cea/security/manager/ssl/DataStorage.cpp modified the line where the crash occurred (https://crash-stats.mozilla.com/report/index/c7ba45aa-99a9-448b-91df-37da82170314#frames).

In the end, the issue was an attempt to use an uninitialized object.

https://bugzilla.mozilla.org/show_bug.cgi?id=1342217

The patch https://hg.mozilla.org/mozilla-central/diff/e6fa8ff0d0be/dom/media/platforms/wrappers/MediaDataDecoderProxy.cpp added the function where the crash occurred (https://crash-stats.mozilla.com/report/index/6a96375e-5c83-4ebe-9078-2d4472170222#frames).

In the end, the issue was just a missing return in a function (the patch).

For these bugs, the crash volume on nightly is very low, so almost nobody cares about them. Yet they reveal real mistakes in the code, and the volume could be higher on beta or release.
In the future, we hope to automate most of this process and to file bugs automatically.

2017-09-06 update:
A meta bug listing all the bugs found using clouseau is available at: https://bugzilla.mozilla.org/show_bug.cgi?id=1396527
