Anamaria Stoica

My Mozilla Blog


End to End Times Report


The End to End Time measures how long it takes for a Build Run to complete. That is, the time difference between the timestamp of the change that triggered this Build Run and the timestamp of when the last of the generated Build Requests ends (in other words, when all builds and tests are completed). (see Build Run Life Cycle diagram below, also published in Mozilla’s Build System blog post)

The normal End to End Time for mozilla-central is a little under 4 hours, but it can grow considerably with system load.

Report Contents

Summary

As you can see from the snapshot above (the End to End Times Report for the try branch, as seen on October 22, 2010, a little after 12:00 PM), the report starts with some general information: the selected branch, the number of Build Runs found in the specified timeframe (given by the starttime and endtime URL GET parameters) and another very important value, the Average build run duration (also called the Average End to End Time).

Build Runs Info

Right under, there’s a table which displays information on individual Build Runs (each row represents a Build Run):

1. Push’s Timestamp

Initially, the table is sorted by the ‘Least When Timestamp’ column, which is actually the push’s change timestamp. This means the most recent pushes to the repo are listed at the top (colored gray if still running/pending). Note: the table is sortable by all other columns too.

2. Result: success vs. warnings vs. failure

The rows have different colors depending on the Build Run’s result (‘Results’ column): green for success, orange for warnings, red for exception and failure, and gray for no result (“-”, if all Build Requests are currently running or pending).

3. Complete? Still Running?

The ‘Complete‘ column tells whether all Build Requests are completed or not (values: yes/no).

4. End to End Time (Duration)

A very important column is ‘Duration’, also known as the End to End Time. The duration is computed as follows:

Duration := Greatest Finish Time – Least When Timestamp

, or how long it took for all Build Requests in this Build Run to complete (or up until now, if not complete). The ‘Least When Timestamp‘ is the earliest timestamp of the Build Requests’ start times and ‘Greatest Finish Time‘, the latest timestamp of the Build Requests’ finish times.
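
For illustration, here is a minimal Python sketch of this computation. The when_timestamp and finish_time field names are assumptions made for the sketch, not necessarily the actual Scheduler Database columns:

import time

def end_to_end_time(build_requests):
    # Least When Timestamp: the earliest 'when' timestamp among the Build Requests.
    least_when = min(br['when_timestamp'] for br in build_requests)
    # Greatest Finish Time: the latest finish time; still running/pending
    # requests have no finish time yet, so fall back to 'now' ("up until now").
    now = time.time()
    greatest_finish = max(br['finish_time'] or now for br in build_requests)
    return greatest_finish - least_when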

5. Build Request Numbers Broken Down by Status and Job Type

The number of Build Requests within a Build Run (it differs per branch; for example, mozilla-central should have 168 if everything was successful) is broken down once by status: Complete, Running, Pending, Cancelled, Interrupted and Misc, and again by job type: Builds, Unittests and Talos.

6. Rebuilds And Forcebuilds

There are also counts on how many rebuilds and forcebuilds were done.

7. Further Information, Link to Build Run Report Page

To see more about the different parameters, check out the Build Run Report. The revision links in the ‘Revision’ column point to such reports, where you can see the exact status of individual Build Requests.

The End to End Times Report contains all the Build Runs displayed by Tinderboxpushlog, but with accurate data (which does not lie! 🙂 ). However, the report was not intended as a real-time monitoring tool, but rather as an analysis tool that provides a peek into how well the Build System is performing. Not so far, anyway…

Average End to End Times (E2E)

Here are some E2E Averages computed per month, though E2E times tend to vary greatly from week to week or even from one Build Run to another.

Month   Branch   Mean      Median
Aug     m-c      9h 22m    4h 29m
Aug     try      10h 25m   7h 8m
Sep     m-c      6h 12m    4h 8m
Sep     try      7h 6m     4h 59m
Oct     m-c      6h 41m    3h 43m
Oct     try      4h 20m    3h 55m

The average is currently computed only as a simple arithmetic mean, which, due to large outlier values, might not be the best measurement. The median values were added to the table above as a comparison only and aren’t currently calculated by the report.
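
A quick Python illustration of how a single outlier drags the mean while barely moving the median (the durations below are made up, in hours):

import statistics

# Nine ordinary ~4h Build Runs plus one 500h outlier (see Fun Outliers below).
durations = [3.5, 3.8, 4.0, 4.1, 4.2, 4.3, 4.5, 4.8, 5.0, 500.0]

print(statistics.mean(durations))    # 53.82 hours, dominated by the outlier
print(statistics.median(durations))  # 4.25 hours, barely affected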

As you can see from the chart, the E2E times have decreased over the past 3 months for mozilla-central and try. For try the improvement is even more visible, mostly thanks to the new Try Chooser.

Problem / E2E Report Incomplete

There is one problem that prevents the E2E Times Report from being complete, and that is the nightly builds. The Build Requests generated for the nightly builds have no revision number attached, which means there is currently no exact way of regrouping the individual Build Requests back into their Build Run. To make things more complicated, the nightly’s tests do get revision numbers (the revision number of the most recent commit), so the nightly’s tests get mixed in with the previous Build Run’s Build Requests! (contaminating that Build Run’s E2E time too)

To solve this issue, the following bug has been filed in Bugzilla: Bug 594496 – Generate unique id for a push in schedulerdb/statusdb.

Fun Outliers

By sorting the table by the ‘Duration’ column you can run into many surprising findings, like:

  • outrageous wait times:
    • a 500h build run, failed. Cause: 500h wait times. Revision: 19a458b7ab57

  • one can ruin it for them all:
    • a 60h build run, successful, no wait times. Cause: 1 single talos took 52h and ended successfully (all other Build Requests had normal run times). Revision: 72d2863f43c7

    • a 19h build run, exception, no wait times. Cause: 1 single talos took 17h and ended in exception (all others had normal run times). Revision: 0e40a49c27bb

  • human rescue intervention (granted, a bit late):
    • a 127h build run, no wait times. Cause: some cancelled jobs, after running for too long. Revision: 6dfa6a7c94e0

Thus, the E2E Times Report can also help detect such irregularities in due time!

See Also: Average Time per Builder Report, Build Run Report, Pushes Report, Wait Times Report


Written by Anamaria Stoica

November 15, 2010 at 6:46 am

Build Run Report


One push to hg.mozilla.org triggers the Build System to generate a certain number of Build Requests (depending on the branch). All these Build Requests make up a Build Run. In a previous post I covered in more detail its flow through the Build System and the Build Run Life Cycle.

The present post will focus on the Buildapi Report on Build Runs.

URL & Parameters

The report can be accessed at the following URL:

<hostname>/reports/revision/<branch_name>/<revision>

, <branch_name> := the branch name (e.g. mozilla-central, try, ...)
, <revision> := the revision number (first 12 characters)

Report Contents

1. Summary

The report starts by displaying some general information on the Build Run:

Build Run - Summary
Fields explained:

  • Revision – the revision number (first 12 chars)
  • No. build requests – number of Build Requests
  • e2e Time:
    • Duration – the End to End Time (Duration := Greatest Finish Time – Least When Timestamp), or how long it took for all Build Requests in this Build Run to complete
    • Least When Timestamp – the earliest timestamp of the Build Requests’ start times
    • Greatest Finish Time – the latest timestamp of the Build Requests’ finish times
  • Build Request status breakdown (No. build requests := Complete + Running + Pending + Cancelled + Interrupted + Misc):
    • Complete – number of completed Build Requests
    • Running – number of still running Build Requests
    • Pending – number of still pending Build Requests
    • Cancelled – number of cancelled Build Requests
    • Interrupted – number of interrupted Build Requests
    • Misc – number of Build Requests having other statuses (should never happen)
  • Rebuilds – number of rebuilds
  • Forcebuilds – number of forced builds
  • Results – how many of the Build Requests were successful, with warnings, failed (and/or encountered exceptions) or other (usually still pending and running Build Requests)
  • Build Request job type breakdown (No. build requests := Builds + Unittests + Talos; see the sketch after this list):
    • Builds – number of builds
    • Unittests – number of unittests
    • Talos – number of talos
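
As a quick sanity check of the two breakdowns, here is a hypothetical Python sketch; the individual counts are invented, but both identities above must hold:

# Invented example counts for one Build Run.
statuses = {'Complete': 160, 'Running': 3, 'Pending': 2,
            'Cancelled': 2, 'Interrupted': 1, 'Misc': 0}
job_types = {'Builds': 14, 'Unittests': 94, 'Talos': 60}

no_build_requests = 168  # e.g. a full mozilla-central Build Run
assert sum(statuses.values()) == no_build_requests
assert sum(job_types.values()) == no_build_requests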

2. Details: Individual Build Requests

Next, the report presents information on the individual Build Requests making up the Build Run. If you are interested in how the Build Requests are fetched from the database and what the individual fields describing a Build Request mean, you might want to also read Build Requests Query.

The table displays a lot of information, and many of the parameters are internal and relevant only to how the Build System works.

Build Run - Build Requests Table

, continuing with:

Build Run - Build Request Table More Info

Demo

Build Run eae6bdacf6d2 Report Demo.

This is just a demo & works only for Build Run with revision number eae6bdacf6d2. All links outside the purpose of this demo were deliberately disabled. Enjoy!

Note: all table columns are sortable.

See Also: Average Time per Builder Report, End to End Times Report, Pushes Report, Wait Times Report

Written by Anamaria Stoica

November 12, 2010 at 10:28 am

Introducing the Average Time per Builder Report


The Average Time per Builder Report measures the average run time of each builder (e.g. ‘Linux mozilla-central build’, ‘Rev3 Fedora 12 mozilla-central opt test crashtest’) within a branch, computed over a timeframe. It also calculates the percentage of time spent by the system running jobs for each builder and the percentage of successful vs. warnings vs. failed jobs. In addition, all information mentioned above is aggregated and filterable by platform (fedora vs. fedora64 vs. leopard vs. linux…), build type (debug vs. opt) and job type (build vs. talos vs. unittest vs. repack).

First & last builders sorted by avg. run time (mozilla-central, Oct 1-20)
Time spent per each platform in mozilla-central (Oct 1-20)

URL & Parameters

The report can be accessed at the following URL:

<hostname>/reports/builders/<branch_name>?(<param>=<value>&)*

, <branch_name> := the branch name (e.g. mozilla-central, try, ...)

Parameters (all optional):

  • format – format of the output; allowed values: html, json, chart; default: html
  • starttime – start time, UNIX timestamp (in seconds); default: endtime minus 24 hours
  • endtime – end time, UNIX timestamp (in seconds); default: starttime plus 24 hours or current time (if starttime is not specified either)
  • tqx – used by Google Visualization API (automatically appended by the library), relevant only if format=chart; default: ''
  • platform – comma separated list of platforms; filter and display results only for the listed platforms; allowed values: fedora, fedora64, leopard, linux, linux64, snowleopard, win2k3, win7, win764, xp; default: '' (all)
  • build_type – comma separated list of build types; filter and display results only for the listed build types; allowed values: debug, opt; default: '' (all)
  • job_type – comma separated list of job types; filter and display results only for the listed job types; allowed values: build, repack, talos, unittest; default: '' (all)
  • detail_level – the detail level for the results (builder, job_type, build_type, platform, branch). By default, results are computed per builder; the other detail levels aggregate the results at job type, build type, platform or branch level; allowed values: branch, platform, build_type, job_type, builder; default: builder
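
As a sketch of how these parameters combine, here is one way to assemble a report URL in Python; the hostname and timestamps are placeholders, and the filter values come from the lists above:

from urllib.parse import urlencode

params = {
    'starttime': 1285916400,   # placeholder: Oct 1, 2010 (UNIX seconds)
    'endtime': 1287558000,     # placeholder: Oct 20, 2010
    'platform': 'fedora',
    'job_type': 'unittest',
    'detail_level': 'builder',
    'format': 'json',
}
url = '<hostname>/reports/builders/mozilla-central?' + urlencode(params)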

Features

1. Average Run Time

First and foremost, the report measures the average run time for each builder (detail_level=builder). This way you can see how long individual builds, unittests and talos take on average and compare them.

By setting different filters, it is possible to compare only the builders of a platform, build type or job type of interest. Just to take a couple of examples, it’s very easy to see:

  • which fedora unittest takes the longest (platform=fedora; job_type=unittest; detail_level=builder): ‘Rev3 Fedora 12 mozilla-central debug test mochitests-4/5’ with 0h 59m 45s – see Fedora Unittest Demo

or

  • which platform takes the longest to build (platform=; build_type=debug,opt; job_type=build; detail_level=builder): ‘OS X 10.6.2 mozilla-central nightly’ with 2h 52m 46s – see Platform Builds Demo

The averages are simple arithmetic means so far, calculated over the number of Build Requests found for each builder within the specified timeframe. The number of Build Requests is displayed in the ‘No. breqs’ column and differs for each builder.

As a future improvement, the median could be used instead of the simple mean, or outliers could be removed when computing the mean.

2. Percentage of System Run Time

In addition to the average run times, the report measures the percentage of time spent by the system doing jobs of a certain type (‘PTG Run Time %’ column). This number is computed by summing the run times of all Build Requests of a certain builder (or job type, build type or platform, depending on the chosen detail level) and dividing by the sum of the run times of all displayed Build Requests, after all filters have been applied (platforms, build types or job types).
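
In other words, something like the following Python sketch; the run_time field and the grouping key are illustrative names, not the report's actual internals:

def ptg_run_time(breqs, key):
    # breqs: the Build Requests left after the platform/build_type/job_type
    # filters; each one has a run time in seconds and a value for the
    # grouping key (builder name, job type, build type or platform).
    total = sum(br['run_time'] for br in breqs)
    groups = {}
    for br in breqs:
        groups[br[key]] = groups.get(br[key], 0) + br['run_time']
    # Each group's share of the total run time, as a percentage.
    return {g: 100.0 * t / total for g, t in groups.items()}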

Example: How much time is spent per each Linux builder?

  • Filters: platform=linux; build_type=opt,debug; job_type=build
  • Detail level: builder

As you can see from the table above, when looking only at Linux build builders, the system spends 34.78% of its time doing ‘Android R7 mozilla-central build’ builds, based on 345 Build Requests averaging 33m 21s. The percentage goes up with both the number of Build Requests and the average run time.

The example looks at jobs registered between October 1-20, 2010 on mozilla-central. The same example can be accessed on the demo page at Linux Builders Demo.

3. Aggregation

It is possible to aggregate the results for the builders on upper levels, by setting the detail_level to job_type, build_type, platform or branch.

To make things clearer, let’s take an example: How much time is spent per each Snowleopard optimized job type?

  • Filters: platform=snowleopard; build_type=opt; job_type=build,repack,talos,unittest
  • Detail level: job_type

The example looks at jobs registered between October 1-20, 2010 on mozilla-central. See demo page at Snowleopard optimized job types.

4. Filters

There are 3 types of filters that can be set: platforms, build types and job types. All of them have been used in one or more of the previous examples. For instance, in the ‘How much time is spent per each Snowleopard optimized job type’ example (see 3. Aggregation), the filters are set as follows: platform=snowleopard; build_type=opt; job_type=build,repack,talos,unittest.

5. Percentage of Success vs. Warnings vs. Failure

Another interesting piece of information presented by the report is the percentage of success vs. warnings vs. failure among registered build requests. By sorting the results by these values, you can easily see which tests fail the most, always fail, or always pass.

Examples:

  • Most failing builders (note: there are 11 builders with 100% failure (failure or exception) rate; why do they always fail?)

Demo

Average Time per Builder Report Demo

This is just a demo & works only for mozilla-central for October 1-20, 2010. All links outside the purpose of this demo were deliberately disabled. Enjoy!

Note: all table columns are sortable.

Repository

The main module handling the Builders report is buildapi.model.builders.

See Also: Pushes Report, Wait Times Report

Written by Anamaria Stoica

November 10, 2010 at 8:06 am

Introducing the Pushes Report


The Pushes Report counts the number of pushes per branch within a selected timeframe. This number is equivalent to the number of pushes listed in Mercurial’s pushlog, such as the mozilla-central pushlog.

In addition, the report breaks down the number of pushes per time interval, listing for example how many pushes there were every 2 hours within a day, every day within a week or month, or even every month (~30 days) within a year.
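
The interval breakdown amounts to bucketing push timestamps, roughly as in this Python sketch (int_size in seconds, matching the URL parameter described below):

def pushes_per_interval(push_times, starttime, endtime, int_size=7200):
    # One counter per int_size-second interval of [starttime, endtime).
    num_intervals = (endtime - starttime + int_size - 1) // int_size
    counts = [0] * num_intervals
    for t in push_times:
        if starttime <= t < endtime:
            counts[(t - starttime) // int_size] += 1
    return counts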

To see exactly how pushes are fetched from the Scheduler Database, and what restrictions are applied on them, see Pushes Query.

URL & Parameters

The Pushes Report can be accessed at the following URL:

<hostname>/reports/pushes?(<param>=<value>&)*

Parameters (all optional):

  • format – format of the output; allowed values: html, json, chart; default: html
  • starttime – start time, UNIX timestamp (in seconds); default: endtime minus 24 hours
  • endtime – end time, UNIX timestamp (in seconds); default: starttime plus 24 hours or current time (if starttime is not specified either)
  • int_size – break down results per time intervals of this length if value > 0; values are specified in seconds; default: 7200 (2 hours)
  • branch – comma separated list of branches; displays results only for branches in the list; default: '' (all)
  • tqx – used by Google Visualization API (automatically appended by the library), relevant only if format=chart; default: ''

Examples

1. Daily Report

  • timeframe: Sep 22 (1 day)
  • interval size: 2h (7200s)
  • branches: all (default)
  • url: <hostname>/reports/pushes?starttime=1285138800&endtime=1285225200

Notes:

  • busier in the middle of the work hours, less busy at night; however, this is not always the case! The number of pushes varies greatly from one day to another. The day of the week, the stage of the release development cycle or other external factors greatly influence how the chart looks.

Daily Report - Sep 22

 

2. Weekly Report & Branch Filtering

  • timeframe: Sep 13-20 (1 week)
  • interval size: 8h (28800s)
  • branches: mozilla-central, try
  • url: <hostname>/reports/pushes?starttime=1284361200&endtime=1284966000&int_size=28800&branch=mozilla-central,try

Notes:

  • busier in the middle of the week, less busy on weekends (as expected)
  • compare only the branches of interest for a clearer view

Weekly Report - Sep 13-20, filter branches: m-c, try

 

All reports have column chart versions:

Weekly Report - Sep 13-20, filter branches: m-c, try (column chart version)

 

3. Monthly Report

  • timeframe: July 1-31 (1 month)
  • interval size: 1 day (86400s)
  • branches: all
  • url: <hostname>/reports/pushes?starttime=1277967600&endtime=1280559600&int_size=86400

Notes:

  • you can tell the weeks apart within the month, with little activity on weekends
  • the gap at the beginning of the chart is the Summit week 🙂

Monthly Report - July

 

4. Annual Report

  • timeframe: June-Sep (4 months – we only have data as of June)
  • interval size: 30 days (2592000s)
  • branches: all
  • url: <hostname>/reports/pushes?starttime=1275375600&endtime=1285830000&int_size=2592000

Notes:

  • compare the activity from one month to another
  • the interval is exactly 30 days, and not the number of days in that month!

Annual Report - June, July, August, September

See Also: Pushes Query, Wait Times Report

Written by Anamaria Stoica

October 16, 2010 at 9:33 am

Posted in Buildapi, Mozilla


Introducing the Wait Times Report


The Wait Times Report was the first report I got to work on. The report measures how long jobs wait in the queue before starting; more specifically, it measures the time difference between the timestamp of the change that generated a job and the timestamp of when that job is assigned to a free slave.

The report is per build pool: build pool, try build pool and test pool. For more details on the report contents, jump ahead to Report Contents.

It also allows the specification of a timeframe for the jobs (starttime and endtime as UNIX timestamps). If these parameters are not specified, the defaults are used: endtime will be the server’s current timestamp and starttime 24 hours before (i.e. the last 24 hours).

To see exactly how jobs are selected from the Scheduler Database, and what restrictions are applied on them, see Wait Times Query.

URL & Parameters

The Wait Times report can be accessed at the following URL:

<hostname>/reports/waittimes/<pool>?(<param>=<value>&)*

, <pool> := buildpool | trybuildpool | testpool

Parameters (all optional):

  • format – format of the output; allowed values: html, json, chart; default: html
  • starttime – start time, UNIX timestamp (in seconds); default: endtime minus 24 hours
  • endtime – end time, UNIX timestamp (in seconds); default: starttime plus 24 hours or current time (if starttime is not specified either)
  • int_size – break down results per time intervals of this length if value > 0; values are specified in seconds; default: 7200 (2 hours)
  • mpb – minutes per block, length of wait time block in minutes; default: 15
  • maxb – maximum block size; for wait times larger than maxb, group stats into the largest block, if maxb > 0; default: 0
  • num – the wait times for each block are represented either as the actual values (full) or percentages (ptg), relevant only if format=chart; allowed values: full, ptg; default: full
  • tqx – used by Google Visualization API (automatically appended by the library), relevant only if format=chart; default: ''

Wait Time E-Mails

The Wait Time e-mails are sent by fetching and parsing the JSON format of these reports (found at <report_url>?<report_params>&format=json).
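
A minimal sketch of such a consumer; the hostname is a placeholder, and the shape of the parsed report is whatever the JSON output contains:

import json
from urllib.request import urlopen

url = 'http://<hostname>/reports/waittimes/buildpool?format=json'
with urlopen(url) as resp:   # replace <hostname> with the real server
    report = json.load(resp)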

Report Contents

The report measures how long jobs wait in the queue before starting, considering all jobs in one build pool, submitted in a specified timeframe (several other filters are applied too).

The report groups jobs’ wait times in blocks of mpb minutes, for example: 0-15, 15-30, 30-45,… are the first 3 blocks, where a block has 15 minutes (mpb=15). For each of these blocks, the report counts how many jobs had their wait time in that interval.

Let’s say we have:

0-15    44    88%
15-30    5    10%
30-45    1     2%

In the report above, we have 50 jobs: 44 jobs (88%) waited between 0 and 15 minutes, 5 jobs (10%) waited between 15 and 30 minutes and only 1 job (2%) waited between 30 and 45 minutes.
For a real, more detailed example, scroll down to Wait Times Example.
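
A minimal Python sketch of this grouping (wait times in minutes, maxb handling omitted); run on the example above, it reproduces the three rows:

def wait_time_blocks(wait_times, mpb=15):
    # Count how many jobs fall into each mpb-minute block: 0-15, 15-30, ...
    counts = {}
    for w in wait_times:
        block = int(w // mpb) * mpb
        counts[block] = counts.get(block, 0) + 1
    total = len(wait_times)
    for block in sorted(counts):
        # Integer percentages, as a sketch; the real report keeps more detail.
        print('%d-%d\t%d\t%d%%' % (block, block + mpb,
                                   counts[block], 100 * counts[block] // total))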

The same stats are also computed broken down by platform (linux, linux64, fedora, snowleopard, xp, …; for the complete list see buildapi.model.util.PLATFORMS_BUILDERNAME).

Report Python Class

The Wait Times Report Python class can be found at buildapi.model.waittimes.WaitTimesReport.

Constructing the Report

The report is computed by calling buildapi.model.waittimes.GetWaitTimes. This function calls buildapi.model.waittimes.WaitTimesQuery, which handles the logic of selecting only the jobs of interest. See the Wait Times Query post for further details.

Each job is added to the report one by one, and the report stats are updated at the same time.

Other Report Info:

  • unknownbuilders – excluded builders, like l10n
  • otherplatforms – platforms not found in known platforms, and not excluded
  • pending – jobs that have not started yet (still waiting)
  • has_no_changes – jobs that have no change, like nightly builds

Example

Wait Times for August 6th, 2010, for the try build pool. The online report looks like this:

We can see the wait times were bad that day: only 58.84% (752) of jobs waited between 0 and 15 minutes, 5.24% (64) waited between 15 and 30 minutes, and over 28% (362) waited more than 60 minutes (blue table on the left)! The numbers broken down by platform are shown on the right (green tables).

The overall wait times (blue table on the left) are also displayed as charts broken down by time interval (int_size = 2 hours):

Wait Times Aug 6th Trybuildpool - Percentage Stacked Chart

Chart 1 - Percentage Stacked Chart

Chart 1 displays the percentage of each wait time block per time interval. For example, in the 2:00-4:00 interval, around 50% of the jobs waited less than 15 minutes (blue), around 30% waited 15 to 30 minutes (red), 20% waited 30 to 45 minutes (orange), and no jobs waited more than 45 minutes. You can see that starting at 2PM (14:00) wait times got really bad, and from 6PM to 8PM the majority of jobs waited more than 60 minutes (purple block)!

Same data, but scaled by number of jobs:

Chart 2 - Stacked Chart

Chart 2 - Column Chart

See Also: Wait Times Query, Pushes Report

Written by Anamaria Stoica

October 13, 2010 at 12:38 pm

Posted in Buildapi, Mozilla
