Anamaria Stoica

My Mozilla Blog

Posts Tagged ‘End to End Times’

Mozilla-Central End to End Times Values Distribution (October)

with 5 comments

In my previous post, End to End Times Report, I started talking about E2E times by defining what they are and then looking at some monthly E2E time averages over the past 3 months for mozilla-central and try.

I also kept mentioning that the normal E2E time for mozilla-central is a little under 4 hours, but varies greatly upwards with the system load. Now, how much exactly do the E2E times vary away from the normal times, and how?

In order to have a better grasp of what the distribution of E2E times values might be, I plotted the histogram of all E2E times for mozilla-central registered in October (more precisely October 1-20, 2010). And here’s how it looks after removing the outliers:

Mozilla-central E2E times histogram without outliers (October 1-20)

The histogram above represents the distribution of the E2E times among bins of 15 minutes.
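The binning behind such a histogram can be sketched roughly as follows. This is only an illustration, not the code that produced the plot: the function name, the sample values and the use of whole minutes as the unit are assumptions. Only the 15-minute bin width and the exclusion of values above 10h come from this post.

```python
from collections import Counter

BIN_MINUTES = 15                  # bin width used by the histogram
OUTLIER_CUTOFF_MINUTES = 10 * 60  # values above 10h are left out of the plot


def histogram(durations_minutes, bin_minutes=BIN_MINUTES,
              cutoff=OUTLIER_CUTOFF_MINUTES):
    """Count E2E times per bin; each key is the bin's lower edge in minutes."""
    counts = Counter()
    for duration in durations_minutes:
        if duration > cutoff:
            continue  # outlier, excluded from the histogram
        counts[(duration // bin_minutes) * bin_minutes] += 1
    return dict(sorted(counts.items()))


# Made-up E2E times in minutes, for illustration only.
sample = [170, 226, 229, 238, 241, 330, 900]
for lower_edge, count in histogram(sample).items():
    print(f"{lower_edge // 60}h {lower_edge % 60:02d}m: {count}")
```

With these sample values, the three runs between 3h 45m and 4h land in the same bin, and the 15h run is dropped as an outlier.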

As it turns out the histogram looks pretty nice. Most values (66.38%) are located in the 3h – 4h 25m normal time interval, with a high peak in the 3h 45m – 4h time subinterval.

However, there is a long tail of values between 5h and 10h. Even though the number of values in each bin is small, summed up together they represent around 15% of the Build Runs.

The values smaller than 3h (10.92%) are build failures and exceptions. The very large outliers (>10h) were excluded from the histogram. They represent 7.18% of all Build Runs, with 4.02% between 10-25h and 3.16% between 25-255h (see plot below with outliers included).

Time Interval | Percentage (%) | Comments
0 – 3h | 10.92 | Failures
3h – 4h 25m | 66.38 | Normal times
4h 25m – 10h | 15.52 | Long tail of large values
>10h | 7.18 | Outliers (10h – 25h: 4.02%, >25h: 3.16%)

Branch | mozilla-central
Timeframe | ~October 1-20, 2010
No. values | 348
Max value | 255h 51m
Mean value | 7h 12m
Median value | 3h 42m
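As a quick sanity check on the table, the published percentages can be turned back into whole numbers of Build Runs. The counts below are inferred from the percentages and the total of 348 values; they are not figures published in this post.

```python
TOTAL_BUILD_RUNS = 348  # "No. values" from the table

# Percentages as published in the table.
published = {
    "0 - 3h": 10.92,
    "3h - 4h 25m": 66.38,
    "4h 25m - 10h": 15.52,
    ">10h": 7.18,
}

# Inferred (not published) whole-number counts per interval.
counts = {interval: round(TOTAL_BUILD_RUNS * pct / 100)
          for interval, pct in published.items()}

for interval, n in counts.items():
    share = 100 * n / TOTAL_BUILD_RUNS
    print(f"{interval:>13}: {n:3d} Build Runs = {share:.2f}%")

print("total:", sum(counts.values()))
```

The inferred counts add up to 348 and each one rounds back to the published percentage, so the four intervals account for every Build Run.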

 

Here’s the histogram re-plotted, but this time with all the outliers included:

See Also: End to End Times Report, Mozilla’s Build System.


Written by Anamaria Stoica

November 22, 2010 at 8:02 am

End to End Times Report

with 10 comments

The End to End Time measures how long it takes for a Build Run to complete. That is, the time difference between the timestamp of the change that triggered this Build Run and the timestamp of when the last of the generated Build Requests ends (in other words, when all builds and tests are completed). (see Build Run Life Cycle diagram below, also published in Mozilla’s Build System blog post)

The normal End to End Time for mozilla-central is a little under 4 hours, but greatly varies upwards with the system load.

Report Contents

Summary

As you can see from the snapshot above (snapshot of the End to End Times Report for try branch as seen on October 22, 2010, a little after 12:00 PM), the report starts with some general information, like the branch selected, number of Build Runs found in the specified timeframe (given by startime and endtime URL GET parameters) and another very important value, the Average build run duration (also called the Average End to End Time).

Build Runs Info

Right under, there’s a table which displays information on individual Build Runs (each row represents a Build Run):

1. Push’s Timestamp

Initially, the table is sorted by the ‘Least When Timestamp’ column, which is actually the push’s change timestamp. This means that the most recent pushes to the repo should be listed at the top (colored gray if still running/pending). Note: the table is sortable by all other columns too.

2. Result: success vs. warnings vs. failure

The rows have different colors depending on the Build Run’s result (‘Results’ column): green for success, orange for warnings, red for exception and failure, and gray for no result (“-”) (if all Build Requests are currently running or pending).

3. Complete? Still Running?

The ‘Complete’ column tells whether all Build Requests are completed or not (values: yes/no).

4. End to End Time (Duration)

A very important column is ‘Duration’, also known as the End to End Time. The duration is computed as follows:

Duration := Greatest Finish Time – Least When Timestamp

that is, how long it took for all Build Requests in this Build Run to complete (or up until now, if not complete). The ‘Least When Timestamp’ is the earliest timestamp of the Build Requests’ start times and the ‘Greatest Finish Time’ is the latest timestamp of the Build Requests’ finish times.
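The Duration formula can be sketched in a few lines. This is a minimal illustration and not the report's actual code: the dictionary keys `when_timestamp` and `finish_time`, the function name and the sample timestamps are assumptions made for the example.

```python
from datetime import datetime, timedelta


def end_to_end_time(build_requests, now):
    """Duration := Greatest Finish Time - Least When Timestamp.

    Each request is a dict with a 'when_timestamp' and a 'finish_time'
    (None while the request is still pending or running).
    """
    least_when = min(r["when_timestamp"] for r in build_requests)
    finish_times = [r["finish_time"] for r in build_requests]
    if any(t is None for t in finish_times):
        # Build Run not complete yet: measure up until now.
        greatest_finish = now
    else:
        greatest_finish = max(finish_times)
    return greatest_finish - least_when


# Made-up Build Run with two Build Requests.
push = datetime(2010, 10, 22, 8, 0)
requests = [
    {"when_timestamp": push, "finish_time": push + timedelta(hours=1, minutes=30)},
    {"when_timestamp": push, "finish_time": push + timedelta(hours=3, minutes=50)},
]
print(end_to_end_time(requests, now=datetime(2010, 10, 22, 12, 0)))
```

The slowest Build Request alone determines the result, which is why a single long-running job can inflate the E2E time of an otherwise normal Build Run.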

5. Build Requests Numbers Broken Down by Status And Job Type

The number of Build Requests within a Build Run (differs per branch, for example in mozilla-central there should be 168 if everything was successful) is broken down once by status: Complete, Running, Pending, Cancelled, Interrupted and Misc, and again by job type: Builds, Unittests and Talos.

6. Rebuilds And Forcebuilds

There are also counts on how many rebuilds and forcebuilds were done.

7. Further Information, Link to Build Run Report Page

To see more about the different parameters, check out the Build Run Report. The revision links in the ‘Revision’ column point to such reports, where you can see the exact status of individual Build Requests.

The End to End Times Report contains all the Build Runs displayed by Tinderboxpushlog, but with accurate data (which does not lie! 🙂 ). However, the report was not intended as a real-time monitoring tool, but rather as an analysis tool which provides a peek into how well the Build System is performing. Not so far anyways…

Average End to End Times (E2E)

Here are some E2E Averages computed per month, though E2E times tend to vary greatly from week to week or even from one Build Run to another.

Month | Branch | Mean | Median
Aug | m-c | 9h 22m | 4h 29m
Aug | try | 10h 25m | 7h 8m
Sep | m-c | 6h 12m | 4h 8m
Sep | try | 7h 6m | 4h 59m
Oct | m-c | 6h 41m | 3h 43m
Oct | try | 4h 20m | 3h 55m

The average is currently computed only as a simple arithmetic mean, which, due to large outlier values, might not be the best measurement. The median values were added to the table presented above as a comparison only, and aren’t currently calculated by the report.
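To see why a few outliers pull the mean so far from the median, here is a small illustration using Python's standard `statistics` module. The durations are made up for the example and are not taken from the report.

```python
from statistics import mean, median


def fmt(minutes):
    """Format a number of minutes as 'Xh Ym'."""
    minutes = int(minutes)
    return f"{minutes // 60}h {minutes % 60}m"


# Made-up E2E times in minutes: four normal Build Runs and one 50h outlier.
durations_minutes = [220, 225, 230, 240, 3000]

print("mean:  ", fmt(mean(durations_minutes)))
print("median:", fmt(median(durations_minutes)))
```

A single 50h Build Run more than triples the mean, while the median stays at a normal value.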

As you can see from the chart, the E2E times have decreased over the past 3 months for mozilla-central and try. For try the improvement is even more visible, mostly thanks to the new Try Chooser.

Problem / E2E Report Incomplete

There is one problem that prevents the E2E Times Report from being complete, and that is the nightly builds. The Build Requests generated for the nightly builds have no revision number attached, which means there is currently no exact way of regrouping the individual Build Requests back to the Build Run. To make things more complicated, the nightly’s tests do get revision numbers, that is the revision number of the most recent commit, thus making the nightly’s tests mix up with the previous Build Run’s Build Requests! (contaminating the E2E time of that Build Run too)

To solve this issue, the following bug has been filed in Bugzilla: Bug 594496 – Generate unique id for a push in schedulerdb/statusdb.
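The grouping problem can be illustrated with a small sketch. Everything in it is hypothetical (the field names, the request names and the sample revision); it only mimics regrouping Build Requests by revision number, which is the approach described above.

```python
from collections import defaultdict


def group_into_build_runs(build_requests):
    """Regroup Build Requests into Build Runs by their revision number."""
    runs = defaultdict(list)
    ungrouped = []
    for request in build_requests:
        revision = request.get("revision")
        if revision:
            runs[revision[:12]].append(request)
        else:
            # No revision attached: cannot be tied back to any Build Run.
            ungrouped.append(request)
    return dict(runs), ungrouped


# Hypothetical requests; "abcdef012345" is a made-up revision.
requests = [
    {"name": "push build", "revision": "abcdef012345"},
    {"name": "push unittest", "revision": "abcdef012345"},
    {"name": "nightly build", "revision": None},
    {"name": "nightly unittest", "revision": "abcdef012345"},
]
runs, ungrouped = group_into_build_runs(requests)
print([r["name"] for r in runs["abcdef012345"]])
print([r["name"] for r in ungrouped])
```

The nightly build ends up ungrouped, while the nightly's test is mixed in with the Build Requests of the most recent push, which is exactly the contamination described above.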

Fun Outliers

By sorting the table by the ‘Duration’ column you can run into many surprising findings, like:

  • outrageous wait times:
    • a 500h build run, failed. Cause: 500h wait times. Revision: 19a458b7ab57

  • one can ruin it for them all:
    • a 60h build run, successful, no wait times. Cause: a single talos job took 52h and ended successfully (all other Build Requests had normal run times). Revision: 72d2863f43c7

    • a 19h build run, exception, no wait times. Cause: a single talos job took 17h and ended in exception (all others had normal run times). Revision: 0e40a49c27bb

  • human rescue intervention (true, a bit late):
    • a 127h build run, no wait times. Cause: some cancelled jobs, after running for too long. Revision: 6dfa6a7c94e0

Thus, the E2E Times Report can also help detect such irregularities in due time!

See Also: Average Time per Builder Report, Build Run Report, Pushes Report, Wait Times Report

Written by Anamaria Stoica

November 15, 2010 at 6:46 am

Build Run Report

with one comment

One push to hg.mozilla.org triggers the Build System to generate a certain number of Build Requests (depending on the branch). All these Build Requests make up a Build Run. In a previous post I covered in more detail its flow through the Build System and the Build Run Life Cycle.

The present post will focus on the Buildapi Report on Build Runs.

URL & Parameters

The report can be accessed at the following URL:

<hostname>/reports/revision/<branch_name>/<revision>

, <branch_name> := the branch name (e.g. mozilla-central, try, ...)
, <revision> := the revision number (first 12 characters)
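Building such a URL can be sketched as follows. The helper name and the hostname used in the example are placeholders, not the real Buildapi host; only the path layout and the 12-character revision come from the description above.

```python
def build_run_report_url(hostname, branch_name, revision):
    """Build the Build Run Report URL for a branch and revision.

    Only the first 12 characters of the revision are used.
    """
    return f"{hostname}/reports/revision/{branch_name}/{revision[:12]}"


# "buildapi.example.org" is a placeholder hostname.
print(build_run_report_url("buildapi.example.org", "mozilla-central", "eae6bdacf6d2"))
```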

Report Contents

1. Summary

The report starts by displaying some general information on the Build Run:

Build Run - Summary
Fields explained:

  • Revision – the revision number (first 12 chars)
  • No. build requests – number of Build Requests
  • e2e Time:
    • Duration – the End to End Time (Duration := Greatest Finish Time – Least When Timestamp), or how long it took for all Build Requests in this Build Run to complete
    • Least When Timestamp – the earliest timestamp of the Build Requests’ start times
    • Greatest Finish Time – the latest timestamp of the Build Requests’ finish times
  • Build Requests status breakdown (No. build requests := Complete + Running + Pending + Cancelled + Interrupted + Misc):
    • Complete – number of completed Build Requests
    • Running – number of still running Build Requests
    • Pending – number of still pending Build Requests
    • Cancelled – number of cancelled Build Requests
    • Interrupted – number of interrupted Build Requests
    • Misc – number of Build Requests having other statuses (should never happen)
  • Rebuilds – number of rebuilds
  • Forcebuilds – number of forced builds
  • Results – how many of the Build Requests were successful, with warnings, failed (and/or encountered exceptions) or other (usually still pending and running Build Requests)
  • Build Requests job type breakdown (No. build requests := Builds + Unittests + Talos):
    • Builds – number of builds
    • Unittests – number of unittests
    • Talos – number of talos
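The two breakdowns in the summary can be sketched as simple counters. This is an illustration under assumed field names (`status`, `job_type`) and made-up requests, not the report's actual implementation.

```python
from collections import Counter

STATUSES = ("Complete", "Running", "Pending", "Cancelled", "Interrupted", "Misc")
JOB_TYPES = ("Builds", "Unittests", "Talos")


def summarize(build_requests):
    """Break a Build Run's requests down by status and by job type."""
    # Any status outside the known ones is counted as Misc.
    by_status = Counter(
        r["status"] if r["status"] in STATUSES else "Misc" for r in build_requests
    )
    by_job_type = Counter(r["job_type"] for r in build_requests)
    total = len(build_requests)
    # Both breakdowns must add up to the number of Build Requests.
    assert sum(by_status[s] for s in STATUSES) == total
    assert sum(by_job_type[j] for j in JOB_TYPES) == total
    return {"total": total, "by_status": dict(by_status), "by_job_type": dict(by_job_type)}


# Made-up Build Requests for illustration.
summary = summarize([
    {"status": "Complete", "job_type": "Builds"},
    {"status": "Complete", "job_type": "Unittests"},
    {"status": "Running", "job_type": "Unittests"},
    {"status": "Pending", "job_type": "Talos"},
])
print(summary)
```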

2. Details: Individual Build Requests

Next, the report presents information on the individual Build Requests making up the Build Run. If you are interested in how the Build Requests are fetched from the database and what the individual fields describing a Build Request mean, you might want to also read Build Requests Query.

The table displays a lot of information, and many of the parameters are internal and relevant only to how the Build System works.

Build Run - Build Requests Table

, continuing with:

Build Run - Build Request Table More Info

Demo

Build Run eae6bdacf6d2 Report Demo.

This is just a demo & works only for the Build Run with revision number eae6bdacf6d2. All links outside the purpose of this demo were deliberately disabled. Enjoy!

Note: all table columns are sortable.

See Also: Average Time per Builder Report, End to End Times Report, Pushes Report, Wait Times Report

Written by Anamaria Stoica

November 12, 2010 at 10:28 am

Mozilla’s Build System

with 12 comments

Mozilla’s Build System is a very cool distributed system run by Buildbot. The system automatically rebuilds and tests the tree every time something has changed.

The Build Infrastructure currently has around 1,000 machines grouped into 3 pools, each made up of several Build Masters and many Slaves:

  • Build Pool (handles builds triggered by all changes, except those going to Try):
    • 4 Build Masters
    • ~300 Slaves
  • Try Build Pool (handles Try builds):
    • 1 Build Master
    • ~200 Slaves
  • Test Pool (handles all tests, including Try):
    • 7 Test Masters
    • ~400 Slaves

How it works

The hg poller looks for new changes in the hg.mozilla.org repository every few minutes. The changes are picked up by the Build Scheduler Master, which creates Build Requests, one for each of the supported platforms. The Build Requests go into the Scheduler Database as pending. The Build Masters look for pending Build Requests and take them on only if there are free Slaves to assign them to.
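The rule that a Build Master takes on a pending Build Request only when it has a free Slave can be sketched like this. It is a toy model, not Buildbot code: the function, the request and slave names, and the oldest-first ordering are all assumptions made for the example.

```python
from collections import deque


def assign_pending(pending, free_slaves):
    """Hand pending Build Requests to free Slaves, oldest request first.

    Requests stay in `pending` when no Slave is free, mirroring the rule
    that a Master takes a request only if it has a free Slave for it.
    """
    assignments = []
    while pending and free_slaves:
        request = pending.popleft()
        slave = free_slaves.pop()
        assignments.append((request, slave))
    return assignments


# Hypothetical queue: three pending requests, only two free Slaves.
pending = deque(["linux-build", "win32-build", "macosx-build"])
free_slaves = ["slave-01", "slave-02"]
assigned = assign_pending(pending, free_slaves)
print(assigned)
print(list(pending))
```

The third request stays pending until a Slave frees up, and that waiting is what shows up as wait time under heavy load.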

Mozilla's Build System

As the builds complete, the Build Master updates their statuses in the Scheduler Database. Also, the Test Scheduler Master creates Test Build Requests for the corresponding tests.

Next, the Test Masters pick up the Test Build Requests and assign them to free Slaves. When the tests are complete, the Test Masters update their statuses in the Scheduler Database.

Each Build Master and Test Master controls its own set of Slaves.

Build Run Life Cycle

One push to mozilla-central, if successful, generates a total of 168 Build Requests (as of October 2010, but subject to change in the future), of which 10 are builds (one for each of the 10 supported platforms), 108 are unittests and 50 are talos tests. All these Build Requests make up a Build Run.

Each of the 10 platform builds comes with its own set of test requests. The tests are created only when the corresponding build completes, and only if it is successful. This means that if there are failed builds, some of the tests won’t be created, and the Build Run will have fewer than 168 Build Requests.
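The effect of failed builds on the size of a Build Run can be sketched as follows. The per-platform test counts used here are hypothetical and do not reflect the real mozilla-central configuration; only the totals 10 + 108 + 50 = 168 come from this post.

```python
def expected_build_requests(tests_per_platform, failed_builds=()):
    """Count the Build Requests of a Build Run.

    Every platform gets one build; its tests are created only if
    that build succeeded.
    """
    total = 0
    for platform, n_tests in tests_per_platform.items():
        total += 1  # the build itself is always requested
        if platform not in failed_builds:
            total += n_tests  # tests follow only a successful build
    return total


# Totals from the post: 10 builds + 108 unittests + 50 talos tests.
print(10 + 108 + 50)

# Hypothetical two-platform setup, NOT the real configuration.
tests_per_platform = {"platform-a": 16, "platform-b": 12}
print(expected_build_requests(tests_per_platform))
print(expected_build_requests(tests_per_platform, failed_builds={"platform-b"}))
```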

Build Run Life Cycle

Two very important measures in a Build Run’s life cycle are the Wait Time and the End to End Time.

The Wait Time measures how long Build Requests wait in the queue before starting. More specifically, it measures the time difference between the timestamp of the change that generated that Build Request and the timestamp of when that Build Request is assigned to a free slave. (see Build Run Life Cycle diagram above)
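As a minimal sketch of this definition, the Wait Time is a plain difference of two timestamps. The function name, parameter names and sample timestamps below are assumptions made for the example.

```python
from datetime import datetime, timedelta


def wait_time(change_timestamp, assigned_timestamp):
    """Time a Build Request spent queued before a free slave took it."""
    return assigned_timestamp - change_timestamp


# Made-up timestamps: change at 08:00, request assigned to a slave at 08:25.
change = datetime(2010, 10, 22, 8, 0)
assigned = datetime(2010, 10, 22, 8, 25)
print(wait_time(change, assigned))
```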

The End to End Time measures how long it takes for a Build Run to complete. That is, the time difference between the timestamp of the change that triggered this Build Run and the timestamp of when the last of the generated Build Requests ends (in other words, when all builds and tests are completed). (see Build Run Life Cycle diagram above)

The normal End to End Time for mozilla-central is a little under 4 hours, but greatly varies upwards with the system load.

The Great Wall of Mac minis

The builds are done on a mix of VMs, 1U servers, xserves and Mac minis, and all the testing is done on Mac minis.

The Great Wall of Mac minis is made up of a little over 400 of the Mac minis’ boxes, and is located by the Release Engineers’ desks in the Mountain View office. 😀