Anamaria Stoica

My Mozilla Blog

Today We’re Launching Firefox 4: Fast, Fun and Awesome


The day has finally come! Join the party and spread the word now wherever you are!! :)

Firefox Twitter Party

Firefox 4 T-Shirt

Written by Anamaria Stoica

March 22, 2011 at 9:59 am

Posted in Mozilla

First Day Back to Mozilla, Contracting This Time


Today is my first day back at Mozilla, after my internship this summer!! This time I’ll be working part-time as a contractor for the same team (Release Engineering), 10h/week for now, and hopefully lots more next semester when I won’t have any courses. Yay!! :)

I’ll be working remotely from Bucharest, Romania (GMT+2 or PST+10 ;)).

Written by Anamaria Stoica

November 22, 2010 at 8:44 am

Posted in Mozilla

Mozilla-Central End to End Times Values Distribution (October)


In my previous post, End to End Times Report, I started talking about E2E times by defining what they are and then looking at the monthly E2E time averages of the past 3 months for mozilla-central and try.

I also kept mentioning that the normal E2E time for mozilla-central is a little under 4 hours, but varies greatly upwards with system load. Now, how much exactly do the E2E times deviate from the normal times, and in what way?

In order to have a better grasp of what the E2E times distribution might look like, I plotted the histogram of all E2E times for mozilla-central registered in October (more precisely, October 1-20, 2010). And here’s how it looks after removing the outliers:

Mozilla-central E2E times histogram without outliers (October 1-20)

The histogram above represents the distribution of the E2E times, grouped into bins of 15 minutes.
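For the curious, here’s roughly how such a histogram can be produced. This is only a minimal sketch, assuming the E2E times have already been extracted into a list of values in hours (the variable names are mine, not from the report code):

import matplotlib.pyplot as plt

# Hypothetical input: one E2E time per Build Run, in hours
# (the real data set has 348 values).
e2e_hours = [3.2, 3.7, 3.9, 4.1, 5.5, 9.8, 26.0]

# Exclude the very large outliers (>10h), as in the first plot.
trimmed = [h for h in e2e_hours if h <= 10]

# 15-minute bins = 0.25h wide, covering 0 to 10 hours.
bins = [i * 0.25 for i in range(41)]
plt.hist(trimmed, bins=bins)
plt.xlabel("E2E time (hours)")
plt.ylabel("No. of Build Runs")
plt.show()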

As it turns out, the histogram looks pretty nice. Most values (66.38%) fall in the 3h – 4h 25m normal time interval, with a high peak in the 3h 45m – 4h subinterval.

However, there is a long tail of values between 5 and 10 hours. Even though the number of values in each bin is small, summed together they represent around 15% of the Build Runs.

The values smaller than 3h (10.92%) are build failures and exceptions. The very large outliers (>10h) were excluded from the histogram. They represent 7.18% of all Build Runs, with 4.02% between 10-25h and 3.16% between 25-255h (see the plot below, with outliers included).

Time Interval    Percentage    Comments
0 – 3h           10.92 %       Failures
3h – 4h 25m      66.38 %       Normal times
4h 25m – 10h     15.52 %       Long tail of large values
>10h              7.18 %       Outliers (10h-25h: 4.02%; >25h: 3.16%)

Branch         mozilla-central
Timeframe      ~October 1-20, 2010
No. values     348
Max value      255h 51m
Mean value     7h 12m
Median value   3h 42m

Here’s the histogram re-plotted, but this time with all the outliers included:

See Also: End to End Times Report, Mozilla’s Build System.

Written by Anamaria Stoica

November 22, 2010 at 8:02 am

End to End Times Report


The End to End Time measures how long it takes for a Build Run to complete. That is, the time difference between the timestamp of the change that triggered this Build Run and the timestamp of when the last of the generated Build Requests ends (in other words, when all builds and tests are completed). (see Build Run Life Cycle diagram below, also published in Mozilla’s Build System blog post)

The normal End to End Time for mozilla-central is a little under 4 hours, but it varies greatly upwards with system load.

Report Contents

Summary

As you can see from the snapshot above (a snapshot of the End to End Times Report for the try branch, as seen on October 22, 2010, a little after 12:00 PM), the report starts with some general information, like the selected branch, the number of Build Runs found in the specified timeframe (given by the starttime and endtime URL GET parameters) and another very important value, the Average build run duration (also called the Average End to End Time).

Build Runs Info

Right under it, there’s a table that displays information on individual Build Runs (each row represents a Build Run):

1. Push’s Timestamp

Initially, the table is sorted by the ‘Least When Timestamp’ column, which is actually the push’s change timestamp. This means the most recent pushes to the repo are listed at the top (colored gray if still running/pending). Note: the table is sortable by all other columns too.

2. Result: success vs. warnings vs. failure

The rows have different colors depending on the Build Run’s result (‘Results’ column): green for success, orange for warnings, red for exception and failure, and gray for no result (“-”, shown if all Build Requests are currently running or pending).

3. Complete? Still Running?

The ‘Complete’ column tells whether all Build Requests are completed or not (values: yes/no).

4. End to End Time (Duration)

A very important column is ‘Duration’, also known as the End to End Time. The duration is computed as follows:

Duration := Greatest Finish Time – Least When Timestamp

, or how long it took for all Build Requests in this Build Run to complete (or up until now, if not complete). The ‘Least When Timestamp’ is the earliest of the Build Requests’ start timestamps, and the ‘Greatest Finish Time’ is the latest of the Build Requests’ finish timestamps.
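As a rough illustration, here is how the Duration could be computed for one Build Run. This is only a sketch with hypothetical field names (when_timestamp and finish_time as UNIX timestamps), not BuildAPI’s actual code:

import time

def e2e_duration(build_requests):
    # Duration := Greatest Finish Time - Least When Timestamp
    least_when = min(br["when_timestamp"] for br in build_requests)
    # Build Requests still running/pending have no finish time yet,
    # so measure up until now instead.
    greatest_finish = max(br["finish_time"] if br["finish_time"] is not None else time.time()
                          for br in build_requests)
    return greatest_finish - least_when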

5. Build Request Numbers Broken Down by Status and Job Type

The number of Build Requests within a Build Run (it differs per branch; in mozilla-central, for example, there should be 168 if everything was successful) is broken down once by status: Complete, Running, Pending, Cancelled, Interrupted and Misc, and again by job type: Builds, Unittests and Talos.

6. Rebuilds And Forcebuilds

There are also counts of how many rebuilds and forcebuilds were done.

7. Further Information, Link to Build Run Report Page

To see more about the different parameters, check out the Build Run Report. The revision links in the ‘Revision’ column point to such reports, where you can see the exact status of individual Build Requests.

The End to End Times Report contains all the Build Runs displayed by Tinderboxpushlog, but with accurate data (which does not lie! :) ). However, the report was not intended as a real-time monitoring tool (at least not so far…), but rather as an analysis tool that provides a peek into how well the Build System is performing.

Average End to End Times (E2E)

Here are some E2E Averages computed per month, though E2E times tend to vary greatly from week to week or even from one Build Run to another.

Month   Branch   Mean      Median
Aug     m-c      9h 22m    4h 29m
Aug     try      10h 25m   7h 8m
Sep     m-c      6h 12m    4h 8m
Sep     try      7h 6m     4h 59m
Oct     m-c      6h 41m    3h 43m
Oct     try      4h 20m    3h 55m

The average is currently computed only as a simple arithmetic mean, which, due to large outlier values, might not be the best measurement. The median values were added to the table above for comparison only; they aren’t currently calculated by the report.

As you can see from the chart, the E2E times have decreased over the past 3 months for both mozilla-central and try. For try the improvement is even more visible, mostly thanks to the new Try Chooser.

Problem / E2E Report Incomplete

There is one problem that prevents the E2E Times Report from being complete, and that is the nightly builds. The Build Requests generated for the nightly builds have no revision number attached, which means there is currently no exact way of regrouping the individual Build Requests back into their Build Run. To make things more complicated, the nightly’s tests do get revision numbers (namely, the revision number of the most recent commit), which mixes the nightly’s tests into the previous Build Run’s Build Requests and contaminates that Build Run’s E2E time too!
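A toy sketch of the regrouping problem (hypothetical field names again): grouping Build Requests back into Build Runs by revision works for everything except the nightly builds, which carry no revision at all:

from collections import defaultdict

def group_into_build_runs(build_requests):
    runs = defaultdict(list)
    for br in build_requests:
        if br["revision"] is None:
            # Nightly build: no revision attached, so there is no exact
            # way to tell which Build Run it belongs to.
            continue
        # Nightly tests *do* get the most recent commit's revision, so they
        # silently land in that commit's Build Run and skew its E2E time.
        runs[br["revision"]].append(br)
    return runs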

To solve this issue, the following bug has been filed in Bugzilla: Bug 594496 – Generate unique id for a push in schedulerdb/statusdb.

Fun Outliers

By sorting the table by the ‘Duration’ column you can run into many surprising findings, like:

  • outrageous wait times:
    • a 500h build run, failed. Cause: 500h wait times. Revision: 19a458b7ab57

  • one can ruin it for them all:
    • a 60h build run, successful, no wait times. Cause: 1 single talos took 52h and ended successfully (all other Build Requests had normal run times). Revision: 72d2863f43c7

    • a 19h build run, exception, no wait times. Cause: 1 single talos took 17h and ended in exception (all others had normal run times). Revision: 0e40a49c27bb

  • human rescue intervention (true, a bit late):
    • a 127h build run, no wait times. Cause: some cancelled jobs, after running for too long. Revision: 6dfa6a7c94e0

Thus, the E2E Times Report can also help detect such irregularities in due time!

See Also: Average Time per Builder Report, Build Run Report, Pushes Report, Wait Times Report

Written by Anamaria Stoica

November 15, 2010 at 6:46 am

Build Run Report


One push to hg.mozilla.org triggers the Build System to generate a certain number of Build Requests (depending on the branch). All these build requests make up a Build Run. In a previous post, I covered its flow through the Build System and the Build Run Life Cycle in more detail.

The present post will focus on the Buildapi Report on Build Runs.

URL & Parameters

The report can be accessed at the following URL:

<hostname>/reports/revision/<branch_name>/<revision>

, <branch_name> := the branch name (e.g. mozilla-central, try, ...)
, <revision> := the revision number (first 12 characters)
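For example, fetching the report for the demo Build Run mentioned below might look like this (the hostname is a placeholder, not a real instance):

from urllib.request import urlopen

HOSTNAME = "http://buildapi.example.org"  # placeholder hostname
branch, revision = "mozilla-central", "eae6bdacf6d2"

url = f"{HOSTNAME}/reports/revision/{branch}/{revision}"
with urlopen(url) as resp:
    html = resp.read().decode("utf-8")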

Report Contents

1. Summary

The report starts by displaying some general information on the Build Run:

Build Run - Summary
Fields explained:

  • Revision – the revision number (first 12 chars)
  • No. build requests – number of Build Requests
  • e2e Time:
    • Duration – the End to End Time (Duration := Greatest Finish Time – Least When Timestamp), or how long it took for all Build Requests in this Build Run to complete
    • Least When Timestamp – the earliest timestamp of the Build Requests’ start times
    • Greatest Finish Time – the latest timestamp of the Build Requests’ finish times
  • Build Request status breakdown (No. build requests := Complete + Running + Pending + Cancelled + Interrupted + Misc):
    • Complete – number of completed Build Requests
    • Running – number of still running Build Requests
    • Pending – number of still pending Build Requests
    • Cancelled – number of cancelled Build Requests
    • Interrupted – number of interrupted Build Requests
    • Misc – number of Build Requests having other statuses (should never happen)
  • Rebuilds – number of rebuilds
  • Forcebuilds – number of forced builds
  • Results – how many of the Build Requests were successful, with warnings, failed (and/or encountered exceptions) or other (usually still pending and running Build Requests)
  • Build Request job type breakdown (No. build requests := Builds + Unittests + Talos):
    • Builds – number of builds
    • Unittests – number of unittests
    • Talos – number of talos

2. Details: Individual Build Requests

Next, the report presents information on the individual Build Requests making up the Build Run. If you are interested in how the Build Requests are fetched from the database and what the individual fields describing a Build Request mean, you might want to also read Build Requests Query.

The table displays a lot of information, and many of the parameters are internal and relevant only to how the Build System works.

Build Run - Build Requests Table

, continuing with:

Build Run - Build Request Table More Info

Demo

Build Run eae6bdacf6d2 Report Demo.

This is just a demo & works only for the Build Run with revision number eae6bdacf6d2. All links outside the purpose of this demo were deliberately disabled. Enjoy!

Note: all table columns are sortable.

See Also: Average Time per Builder Report, End to End Times Report, Pushes Report, Wait Times Report

Written by Anamaria Stoica

November 12, 2010 at 10:28 am

Introducing the Average Time per Builder Report


The Average Time per Builder Report measures the average run time of each builder (e.g. ‘Linux mozilla-central build’, ‘Rev3 Fedora 12 mozilla-central opt test crashtest’) within a branch, computed over a timeframe. It also calculates the percentage of time spent by the system running jobs for each builder and the percentage of successful vs. warnings vs. failed jobs. In addition, all information mentioned above is aggregated and filterable by platform (fedora vs. fedora64 vs. leopard vs. linux…), build type (debug vs. opt) and job type (build vs. talos vs. unittest vs. repack).

First & last builders sorted by avg. run time (mozilla-central, Oct 1-20) Time spent per each platform in mozilla-central (Oct 1-20)

URL & Parameters

The report can be accessed at the following URL:

<hostname>/reports/builders/<branch_name>?(<param>=<value>&)*

, <branch_name> := the branch name (e.g. mozilla-central, try, ...)

Parameters (all optional):

  • format – format of the output; allowed values: html, json, chart; default: html
  • starttime – start time, UNIX timestamp (in seconds); default: endtime minus 24 hours
  • endtime – end time, UNIX timestamp (in seconds); default: starttime plus 24 hours or current time (if starttime is not specified either)
  • tqx – used by the Google Visualization API (automatically appended by the library), relevant only if format=chart; default: '' (empty)
  • platform – comma separated list of platforms; filter and display results only for the listed platforms; allowed values: fedora, fedora64, leopard, linux, linux64, snowleopard, win2k3, win7, win764, xp; default: '' (all)
  • build_type – comma separated list of build types; filter and display results only for the listed build types; allowed values: debug, opt; default: '' (all)
  • job_type – comma separated list of job types; filter and display results only for the listed job types; allowed values: build, repack, talos, unittest; default: '' (all)
  • detail_level – the detail level for the results; by default results are computed per builder, while the other detail levels aggregate the results at the job type, build type, platform or branch level; allowed values: branch, platform, build_type, job_type, builder; default: builder
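Putting the pieces together, here is a hypothetical query for the average run times of all Fedora unittest builders, as JSON (the hostname and the exact timestamps are placeholders of mine):

from urllib.parse import urlencode

HOSTNAME = "http://buildapi.example.org"  # placeholder hostname
params = {
    "starttime": 1285916400,   # ~October 1, 2010 (PDT)
    "endtime": 1287644400,     # ~October 21, 2010 (PDT)
    "platform": "fedora",
    "job_type": "unittest",
    "detail_level": "builder",
    "format": "json",
}
url = f"{HOSTNAME}/reports/builders/mozilla-central?{urlencode(params)}"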

Features

1. Average Run Time

First and foremost, the report measures the average run time for each builder (detail_level=builder). This way you can see how long individual builds, unittests and talos take on average and compare them.

By setting different filters, it is possible to compare only the builders of a platform, build type or job type of interest. Just to take a couple of examples, it’s very easy to see:

  • which fedora unittest takes the longest (platform=fedora; job_type=unittest; detail_level=builder): ‘Rev3 Fedora 12 mozilla-central debug test mochitests-4/5’ with 0h 59m 45s – see Fedora Unittest Demo

or

  • which platform takes the longest to build (platform=; build_type=debug,opt; job_type=build; detail_level=builder): ‘OS X 10.6.2 mozilla-central nightly’ with 2h 52m 46s – see Platform Builds Demo

The averages are simple arithmetic means so far, calculated over the number of Build Requests found for each builder within the specified timeframe. The number of Build Requests is displayed in the ‘No. breqs’ column and differs for each builder.

As a future improvement, the median could be used instead of the simple mean, or the outliers could be removed before computing the mean.
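To illustrate why the plain mean is so fragile here, a quick sketch with made-up durations (in hours):

import statistics

durations = [3.5, 3.7, 3.8, 4.0, 4.2, 255.85]  # one huge outlier, like the 255h 51m run

print(statistics.mean(durations))    # ~45.84h, dragged up by the single outlier
print(statistics.median(durations))  # 3.9h, barely affected

# Or trim the outliers before averaging:
print(statistics.mean(d for d in durations if d <= 10))  # 3.84h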

2. Percentage of System Run Time

In addition to the average run times, the report measures the percentage of time spent by the system doing jobs of a certain type (the ‘PTG Run Time %’ column). This number is computed by summing the run times of all Build Requests of a certain builder (or job type, build type or platform, depending on the chosen detail level) and dividing by the sum of the run times of all displayed Build Requests, after all filters (platforms, build types or job types) have been applied.
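In code, the idea looks roughly like this (a sketch with hypothetical field names; the real queries live in the buildapi.model.builders module mentioned below):

def ptg_run_time(build_requests, key):
    # Percentage of total run time spent per group (builder, job type, ...),
    # computed over the already-filtered set of Build Requests.
    total = sum(br["run_time"] for br in build_requests)
    per_group = {}
    for br in build_requests:
        per_group[br[key]] = per_group.get(br[key], 0) + br["run_time"]
    return {group: 100.0 * t / total for group, t in per_group.items()}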

Example: How much time is spent per each Linux builder?

  • Filters: platform=linux; build_type=opt,debug; job_type=build
  • Detail level: builder

As you can see from the table above, when looking only at Linux build builders, the system spends 34.78% of its time doing ‘Android R7 mozilla-central build’ builds, based on 345 Build Requests with an average run time of 33m 21s. The percentage goes up with both the number of Build Requests and the average run time.

The example looks at jobs registered between October 1-20, 2010 on mozilla-central. The same example can be accessed on the demo page at Linux Builders Demo.

3. Aggregation

It is possible to aggregate the results for the builders on upper levels, by setting the detail_level to job_type, build_type, platform or branch.

To make things clearer, let’s take an example: How much time is spent per each Snowleopard optimized job type?

  • Filters: platform=snowleopard; build_type=opt; job_type=build,repack,talos,unittest
  • Detail level: job_type

The example looks at jobs registered between October 1-20, 2010 on mozilla-central. See demo page at Snowleopard optimized job types.

4. Filters

There are 3 types of filters that can be set: platforms, build types and job types. All of them have been used in one or more of the previous examples. For instance, in the ‘How much time is spent per each Snowleopard optimized job type’ example (see 3. Aggregation), the filters are set as follows: platform=snowleopard; build_type=opt; job_type=build,repack,talos,unittest.

5. Percentage of Success vs. Warnings vs. Failure

Another interesting piece of information presented by the report is the percentage of success vs. warnings vs. failure among registered build requests. By sorting the results by these values, you can easily see which tests fail the most, always fail, or always pass.

Examples:

  • Most failing builders (note: there are 11 builders with a 100% failure (failure or exception) rate; why do they always fail?)

Demo

Average Time per Builder Report Demo

This is just a demo & works only for mozilla-central for October 1-20, 2010. All links outside the purpose of this demo were deliberately disabled. Enjoy!

Note: all table columns are sortable.

Repository

The main module handling the Builders report is buildapi.model.builders.

See Also: Pushes Report, Wait Times Report

Written by Anamaria Stoica

November 10, 2010 at 8:06 am

Mozilla’s Build System


Mozilla’s Build System is a very cool distributed system run by Buildbot. The system automatically rebuilds and tests the tree every time something changes.

The Build Infrastructure currently has around 1,000 machines grouped into 3 pools, each made up of several Build Masters and many Slaves:

  • Build Pool (handles builds triggered by all changes, except those going to Try):
    • 4 Build Masters
    • ~300 Slaves
  • Try Build Pool (handles Try builds):
    • 1 Build Master
    • ~200 Slaves
  • Test Pool (handles all tests, including Try):
    • 7 Test Masters
    • ~400 Slaves

How it works

The hg poller looks for new changes in the hg.mozilla.org repository every few minutes. The changes are picked up by the Build Scheduler Master, which creates Build Requests, one for each of the supported platforms. The Build Requests go into the Scheduler Database as pending. The Build Masters look for pending Build Requests and take them on only if there are free Slaves to assign them to.

Mozilla's Build System

As the builds complete, the Build Master updates their statuses in the Scheduler Database. Also, the Test Scheduler Master creates Test Build Requests for the corresponding tests.

Next, the Test Build Requests are picked up by the Test Masters, which assign them to free Slaves. When the tests are complete, the Test Masters update their statuses in the Scheduler Database.

Each Build Master and Test Master controls its own set of Slaves.
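Very roughly, each master’s scheduling loop behaves like the sketch below. This is pseudocode-level Python under my own naming (scheduler_db, slaves and their methods are all assumptions), not Buildbot’s actual API:

import time

def master_loop(scheduler_db, slaves):
    while True:
        for request in scheduler_db.pending_build_requests():
            free = [s for s in slaves if s.is_free()]
            if not free:
                break                               # no free Slaves: requests stay pending
            free[0].start(request)                  # hand the job to a Slave
            scheduler_db.mark_running(request)      # no longer pending
        time.sleep(60)                              # poll the Scheduler Database again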

Build Run Life Cycle

One push to mozilla-central, if successful, generates a total of 168 Build Requests (as of October 2010, but subject to change in the future), of which 10 are builds (one for each of the 10 supported platforms), 108 are unittests and 50 are talos tests. All these build requests make up a Build Run.

Each of the 10 platform builds comes with its own set of test requests. The tests are created only when the corresponding build completes, and only if it was successful. This means that if there are failed builds, some of the tests won’t be created, and the Build Run will have fewer than 168 Build Requests.

Build Run Life Cycle

Two very important measures in a Build Run’s life cycle are the Wait Time and the End to End Time.

The Wait Time measures how long Build Requests wait in the queue before starting; more specifically, it measures the time difference between the timestamp of the change that generated the Build Request and the timestamp of when that Build Request is assigned to a free slave. (see Build Run Life Cycle diagram above)
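Or, written in the same style as the Duration formula in the End to End Times Report above (the name on the right-hand side is mine, paraphrasing the definition):

Wait Time := Assigned-to-Slave Timestamp – Change Timestamp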

The End to End Time measures how long it takes for a Build Run to complete. That is, the time difference between the timestamp of the change that triggered this Build Run and the timestamp of when the last of the generated Build Requests ends (in other words, when all builds and tests are completed). (see Build Run Life Cycle diagram above)

The normal End to End Time for mozilla-central is a little under 4 hours, but it varies greatly upwards with system load.

The Great Wall of Mac minis

The builds are done on a mix of VMs, 1U servers, Xserves and Mac minis, and all the testing is done on Mac minis.

The Great Wall of Mac minis is made up of a little over 400 Mac mini boxes, and is located by the Release Engineers’ desks in the Mountain View office. :D
