Anamaria Stoica

My Mozilla Blog

Introducing the Pushes Report


The Pushes Report counts the number of pushes per branch within a selected timeframe. This number is equivalent to the number of pushes listed in Mercurial’s pushlog, for example the mozilla-central pushlog.

In addition, the report breaks down the number of pushes per time interval, listing for example how many pushes there were every 2 hours within a day, every day within a week or month, or even every month (~30 days) within a year.
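The breakdown can be sketched in a few lines of Python. This is only an illustration, not the report's actual code: the function name count_per_interval is made up, and it assumes each interval is half-open, i.e. [start, start + int_size).

```python
from collections import Counter

def count_per_interval(push_times, starttime, endtime, int_size):
    """Count pushes in consecutive intervals of int_size seconds.

    Interval i covers [starttime + i*int_size, starttime + (i+1)*int_size).
    Timestamps outside [starttime, endtime) are ignored.
    (Hypothetical helper; the half-open intervals are an assumption.)
    """
    if int_size <= 0:
        raise ValueError("int_size must be positive")
    # Ceiling division: a trailing partial interval still gets its own bucket.
    n_intervals = -(-(endtime - starttime) // int_size)
    counts = Counter(
        (t - starttime) // int_size
        for t in push_times
        if starttime <= t < endtime
    )
    return [counts.get(i, 0) for i in range(n_intervals)]

# One day split into 2-hour intervals gives 12 buckets.
day_start = 1285138800
pushes = [day_start + 60, day_start + 7199, day_start + 7200, day_start + 80000]
per_interval = count_per_interval(pushes, day_start, day_start + 86400, 7200)
```

The first two pushes land in the first 2-hour interval, the third one opens the second interval, and the last one falls into the final interval of the day.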

To see exactly how pushes are fetched from the Scheduler Database, and what restrictions are applied on them, see Pushes Query.

URL & Parameters

The Pushes Report can be accessed at the following URL:

<hostname>/reports/pushes?(<param>=<value>&)*

Parameters (all optional):

  • format – format of the output; allowed values: html, json, chart; default: html
  • starttime – start time, UNIX timestamp (in seconds); default: endtime minus 24 hours
  • endtime – end time, UNIX timestamp (in seconds); default: starttime plus 24 hours or current time (if starttime is not specified either)
  • int_size – break down results per time intervals of this length if value > 0; values are specified in seconds; default: 7200 (2 hours)
  • branch – comma separated list of branches; displays results only for branches in the list; default: '' (all)
  • tqx – used by Google Visualization API (automatically appended by the library), relevant only if format=chart; default:
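A report URL can also be put together programmatically. The snippet below is a small sketch using only the Python standard library; the helper name pushes_report_url and the host buildapi.example.org are placeholders of mine, not part of buildapi.

```python
from urllib.parse import urlencode

def pushes_report_url(hostname, **params):
    """Build a Pushes Report URL; parameters left as None are omitted.

    Hypothetical helper, shown only to illustrate the URL layout.
    """
    query = {k: v for k, v in params.items() if v is not None}
    branch = query.get("branch")
    if isinstance(branch, (list, tuple)):
        # The report expects a comma separated list of branches.
        query["branch"] = ",".join(branch)
    url = f"{hostname}/reports/pushes"
    # safe="," keeps the branch list readable instead of encoding it as %2C.
    return f"{url}?{urlencode(query, safe=',')}" if query else url

weekly_url = pushes_report_url(
    "http://buildapi.example.org",  # placeholder hostname
    starttime=1284361200,
    endtime=1284966000,
    int_size=28800,
    branch=["mozilla-central", "try"],
)
```

Since all parameters are optional, calling the helper with only a hostname returns the bare report URL, which falls back to the defaults listed above.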

Examples

1. Daily Report

  • timeframe: Sep 22 (1 day)
  • interval size: 2h (7200s)
  • branches: all (default)
  • url: <hostname>/reports/pushes?starttime=1285138800&endtime=1285225200
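The timestamps for a one-day window can be computed rather than looked up. The sketch below assumes the example timestamps are midnight at UTC-7 (US Pacific Daylight Time); that is my reading of the numbers, and day_window is a made-up helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the report examples use midnight at UTC-7 (PDT) as day boundaries.
PDT = timezone(timedelta(hours=-7))

def day_window(year, month, day, tz=PDT):
    """Return (starttime, endtime) as UNIX timestamps for one calendar day."""
    start = datetime(year, month, day, tzinfo=tz)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())

starttime, endtime = day_window(2010, 9, 22)
# starttime is 1285138800 and endtime is 1285225200 (exactly 86400 seconds later)
```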

Notes:

  • busier in the middle of the work hours, less busy at night; however, this is not the usual case! The number of pushes varies greatly from one day to another. The day of the week, the stage of the release development cycle and other external factors greatly influence how the chart looks.

Daily Report - Sep 22

 

2. Weekly Report & Branch Filtering

  • timeframe: Sep 13-20 (1 week)
  • interval size: 8h (28800s)
  • branches: mozilla-central, try
  • url: <hostname>/reports/pushes?starttime=1284361200&endtime=1284966000&int_size=28800&branch=mozilla-central,try

Notes:

  • busier in the middle of the week, less busy on weekends (as expected)
  • compare only the branches of interest for a clearer view

Weekly Report - Sep 13-20, filter branches: m-c, try

 

All reports have column chart versions:

Weekly Report - Sep 13-20, filter branches: m-c, try (column chart version)

 

3. Monthly Report

  • timeframe: July 1-31 (1 month)
  • interval size: 1 day (86400s)
  • branches: all
  • url: <hostname>/reports/pushes?starttime=1277967600&endtime=1280559600&int_size=86400

Notes:

  • you can tell the weeks apart within the month, with little activity on weekends
  • the gap at the beginning of the chart is the Summit week 🙂

Monthly Report - July

 

4. Annual Report

  • timeframe: June-Sep (4 months – we only have data as of June)
  • interval size: 30 days (2592000s)
  • branches: all
  • url: <hostname>/reports/pushes?starttime=1275375600&endtime=1285830000&int_size=2592000

Notes:

  • compare the activity from one month to another
  • the interval is exactly 30 days, and not the number of days in that month!
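To see how the fixed 30-day intervals drift away from the calendar months, here is a short sketch that lists where each interval of the annual example starts. It assumes, as before, that the timestamps are midnight at UTC-7.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7))  # assumption: timestamps are midnight at UTC-7

starttime, endtime, int_size = 1275375600, 1285830000, 2592000

# Start date of every 30-day interval between starttime and endtime.
interval_starts = []
t = starttime
while t < endtime:
    interval_starts.append(datetime.fromtimestamp(t, tz=PDT).date().isoformat())
    t += int_size
# interval_starts is ['2010-06-01', '2010-07-01', '2010-07-31', '2010-08-30', '2010-09-29']
```

The 4 months end up split into 5 intervals: the third interval already starts on July 31, and the last one only covers the final day of September.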

Annual Report - June, July, August, September

See Also: Pushes Query, Wait Times Report

Written by Anamaria Stoica

October 16, 2010 at 9:33 am

Posted in Buildapi, Mozilla


Pushes Query


Another very important piece of information that can be extracted from the Scheduler Database, besides build requests and jobs, is the pushes.

The information about one push is spread among 3 tables: sourcestamps, sourcestamp_changes and changes.

The SQLAlchemy query fetches all pushes in a specific time frame, and allows filtering and exclusion of specific branches:

from sqlalchemy import and_, select

s = meta.scheduler_db_meta.tables['sourcestamps']
sch = meta.scheduler_db_meta.tables['sourcestamp_changes']
c = meta.scheduler_db_meta.tables['changes']

# 1. join the three tables
q = select([s.c.revision, s.c.branch, c.c.author, c.c.when_timestamp],
           and_(sch.c.changeid == c.c.changeid, s.c.id == sch.c.sourcestampid))
# 2. one row per push and branch
q = q.group_by(c.c.when_timestamp, s.c.branch)

# 3. exclude branches – not of interest / fake
# 4. filter branches
# 5. timeframe

Query explained:

  1. JOIN between sourcestamps, sourcestamp_changes and changes tables. The sourcestamps table contains the revision and branch, and the changes table contains the author and the change’s timestamp (when_timestamp).
  2. GROUP BY – next we group by the change’s timestamp (multiple builds of the same push will have the same when_timestamp) and branch (one push could affect multiple branches).
  3. Exclude branches that are not of interest like l10n and fake branches like addontester or the ones generated by unittests or talos tests (e.g. mozilla-central-win32-debug-unittest or mozilla-central-macosx64-talos).
  4. Fetch only pushes of requested branches (e.g. mozilla-central, try).
  5. Fetch only pushes having the change’s timestamp (c.when_timestamp) in the requested time frame, specified by starttime and endtime.
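The five steps above can be tried out on a toy database. The sketch below uses the standard library's sqlite3 instead of SQLAlchemy so it runs on its own. The table layouts are stripped down to the columns used here, the rows and the '%unittest' / '%talos' exclusion patterns are invented for the example, and the half-open timeframe is an assumption; the real exclusion lists live in buildapi.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sourcestamps (id INTEGER PRIMARY KEY, branch TEXT, revision TEXT);
    CREATE TABLE changes (changeid INTEGER PRIMARY KEY, author TEXT, when_timestamp INTEGER);
    CREATE TABLE sourcestamp_changes (sourcestampid INTEGER, changeid INTEGER);
""")
conn.executemany("INSERT INTO sourcestamps VALUES (?, ?, ?)", [
    (1, "mozilla-central", "aaa111"),
    (2, "mozilla-central", "aaa111"),  # second sourcestamp of the same push
    (3, "try", "bbb222"),
    (4, "mozilla-central-win32-debug-unittest", "aaa111"),  # fake branch
])
conn.executemany("INSERT INTO changes VALUES (?, ?, ?)", [
    (10, "alice", 1000),
    (11, "bob", 2000),
])
conn.executemany("INSERT INTO sourcestamp_changes VALUES (?, ?)", [
    (1, 10), (2, 10), (3, 11), (4, 10),
])

def fetch_pushes(conn, starttime, endtime, branches=None):
    """Toy version of the pushes query (hypothetical helper)."""
    sql = """
        SELECT s.revision, s.branch, c.author, c.when_timestamp
        FROM sourcestamps s
        JOIN sourcestamp_changes sch ON s.id = sch.sourcestampid   -- step 1
        JOIN changes c ON sch.changeid = c.changeid
        WHERE s.branch NOT LIKE '%unittest'                        -- step 3
          AND s.branch NOT LIKE '%talos'
          AND c.when_timestamp >= ? AND c.when_timestamp < ?       -- step 5
    """
    args = [starttime, endtime]
    if branches:                                                   # step 4
        sql += " AND s.branch IN (%s)" % ",".join("?" * len(branches))
        args += list(branches)
    sql += " GROUP BY c.when_timestamp, s.branch"                  # step 2
    sql += " ORDER BY c.when_timestamp, s.branch"
    return conn.execute(sql, args).fetchall()

all_pushes = fetch_pushes(conn, 0, 3000)
try_only = fetch_pushes(conn, 0, 3000, ["try"])
```

Sourcestamps 1 and 2 collapse into a single push because they share the same when_timestamp and branch, and the unittest branch is dropped by the exclusion step.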

See Also: Build Request Query, Wait Times Query

Written by Anamaria Stoica

October 15, 2010 at 4:51 am

Posted in Buildapi, Mozilla


Introducing the Wait Times Report


The Wait Times Report was the first report I got to work on. The report measures how long jobs wait in the queue before starting. More specifically, it measures the difference between the timestamp of the change that generated the job and the timestamp at which the job is assigned to a free slave.

The report is computed per pool: build pool, try build pool and test pool. For more details on the report contents, see Report Contents below.

It also allows the specification of a timeframe for the jobs (starttime and endtime as UNIX timestamps). If these parameters are not specified, the defaults are used: endtime will be the server’s current timestamp and starttime 24 hours before (i.e. the last 24 hours).

To see exactly how jobs are selected from the Scheduler Database, and what restrictions are applied on them, see Wait Times Query.

URL & Parameters

The Wait Times Report can be accessed at the following URL:

<hostname>/reports/waittimes/<pool>?(<param>=<value>&)*

where <pool> := buildpool | trybuildpool | testpool

Parameters (all optional):

  • format – format of the output; allowed values: html, json, chart; default: html
  • starttime – start time, UNIX timestamp (in seconds); default: endtime minus 24 hours
  • endtime – end time, UNIX timestamp (in seconds); default: starttime plus 24 hours or current time (if starttime is not specified either)
  • int_size – break down results per time intervals of this length if value > 0; values are specified in seconds; default: 7200 (2 hours)
  • mpb – minutes per block, length of wait time block in minutes; default: 15
  • maxb – maximum block size; for wait times larger than maxb, group stats into the largest block, if maxb > 0; default: 0
  • num – the wait times for each block are represented either as the actual values (full) or percentages (ptg), relevant only if format=chart; allowed values: full, ptg; default: full
  • tqx – used by Google Visualization API (automatically appended by the library), relevant only if format=chart; default:
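The starttime and endtime defaults depend on each other, which is easier to follow in code. Below is a minimal sketch of the rules as listed above; resolve_timeframe is a name I made up, not a buildapi function.

```python
import time

DAY = 24 * 3600

def resolve_timeframe(starttime=None, endtime=None, now=None):
    """Apply the documented defaults for starttime and endtime.

    Hypothetical helper: 'now' can be injected to make the result predictable.
    """
    if now is None:
        now = int(time.time())
    if endtime is None:
        # starttime plus 24 hours, or current time if starttime is missing too
        endtime = now if starttime is None else starttime + DAY
    if starttime is None:
        # endtime minus 24 hours
        starttime = endtime - DAY
    return starttime, endtime
```

With neither parameter given, the window is the last 24 hours; with only one of them given, the other is placed 24 hours away from it.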

Wait Time E-Mails

The Wait Time e-mails are sent by fetching and parsing the JSON format of these reports (found at <report_url>?<report_params>&format=json).

Report Contents

The report measures how long jobs wait in the queue before starting, considering all jobs in one build pool, submitted in a specified timeframe (several other filters are applied too).

The report groups jobs’ wait times in blocks of mpb minutes, for example: 0-15, 15-30, 30-45,… are the first 3 blocks, where a block has 15 minutes (mpb=15). For each of these blocks, the report counts how many jobs had their wait time in that interval.

Let’s say we have:
0-15  44 88%
15-30 5 10%
30-45 1 2%
In the report above, we have 50 jobs, of which 44 jobs (88%) waited between 0 and 15 minutes, 5 jobs (10%) waited between 15 and 30 minutes and only 1 job (2%) waited between 30 and 45 minutes.
For a real, more detailed example, scroll down to Wait Times Example.
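The grouping into blocks can be sketched as follows. This is not the WaitTimesReport code, just an illustration with a made-up function name. It assumes wait times are given in seconds and that maxb is expressed in minutes, with everything at or above maxb going into one last block.

```python
from collections import Counter

def wait_time_blocks(wait_seconds, mpb=15, maxb=0):
    """Group wait times (in seconds) into blocks of mpb minutes.

    Returns {block_start_minute: (job_count, percentage_of_all_jobs)}.
    Assumption: maxb is in minutes; waits of maxb minutes or more are all
    counted in the block starting at maxb (only if maxb > 0).
    """
    counts = Counter()
    for w in wait_seconds:
        block = (w // 60) // mpb * mpb  # start of the block, in minutes
        if maxb > 0:
            block = min(block, maxb)
        counts[block] += 1
    total = sum(counts.values())
    return {b: (n, 100.0 * n / total) for b, n in sorted(counts.items())}

# The 50 jobs from the small example above: 44 + 5 + 1
waits = [5 * 60] * 44 + [20 * 60] * 5 + [40 * 60] * 1
blocks = wait_time_blocks(waits)
capped = wait_time_blocks(waits, maxb=15)
```

Here blocks reproduces the 88% / 10% / 2% split, while capped merges the 15-30 and 30-45 blocks into a single block of 6 jobs.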

The same stats are also computed broken down by platform (linux, linux64, fedora, snowleopard, xp, …; for the complete list see buildapi.model.util.PLATFORMS_BUILDERNAME).

Report Python Class
The Wait Times Report Python class can be found at buildapi.model.waittimes.WaitTimesReport.

Constructing the Report
The report is computed by calling buildapi.model.waittimes.GetWaitTimes. This function calls buildapi.model.waittimes.WaitTimesQuery, which handles the logic of selecting only the jobs of interest. See Wait Times Query post for further details.

The jobs are added to the report one by one, and the report stats are updated at the same time.

Other Report Info:

  • unknownbuilders – excluded builders, like l10n
  • otherplatforms – platforms not found in known platforms, and not excluded
  • pending – jobs that have not started yet (still waiting)
  • has_no_changes – jobs that have no change, like nightly builds

Example

Wait Times for August 6th, 2010, for try build pool. The report online looks like this:

We can see the wait times were bad that day: only 58.84% (752) of the jobs waited between 0 and 15 minutes, 5.24% (64) waited between 15 and 30 minutes, and over 28% (362) waited more than 60 minutes (blue table on the left)! The green tables on the right show the numbers broken down by platform.

The overall wait times (blue table on the left) are also displayed as charts broken down by time intervals (int_size = 2 hours):

Wait Times Aug 6th Trybuildpool - Percentage Stacked Chart

Chart 1 - Percentage Stacked Chart

Chart 1 displays the percentage of each of the wait time blocks per time interval. For example, in the 2:00-4:00 interval, around 50% of the jobs waited less than 15 minutes (blue color), around 30% waited 15 to 30 minutes (red color), 20% waited 30 to 45 minutes (orange), and no jobs waited more than 45 minutes. You can see that from 2PM (14:00) wait times started getting really bad, and from 6PM to 8PM the majority of jobs waited more than 60 minutes (purple block)!

Same data, but scaled by number of jobs:

Chart 2 - Stacked Chart

Chart 2 - Column Chart

See Also: Wait Times Query, Pushes Report

Written by Anamaria Stoica

October 13, 2010 at 12:38 pm

Posted in Buildapi, Mozilla


Wait Times Query


The Wait Times Query is very similar to the Build Request Query, except that it fetches jobs (it does not care about multiple builds of the same BuildRequest), selects a different subset of columns and applies several additional restrictions.

The base query, using SQLAlchemy, looks like this:

q = (outerjoin(br, b, b.c.brid == br.c.id)
     .join(bs, bs.c.id == br.c.buildsetid)
     .join(s, s.c.id == bs.c.sourcestampid)
     .outerjoin(sch, sch.c.sourcestampid == s.c.id)
     .outerjoin(c, c.c.changeid == sch.c.changeid)
     .select().with_only_columns([…])
     # multiple restrictions
     .group_by(br.c.id))

For the meaning of the JOINs and the tables involved, see Build Request Query. In this post, I’ll describe only the differences (placed where the # multiple restrictions comment is):

1. Pool selection – fetching the jobs belonging only to a pool

This is done by keeping only the jobs claimed by masters in the selected pool (i.e. by looking at the values of the buildrequests.claimed_by_name column). There are currently 3 pools: ‘buildpool’, ‘trybuildpool’ and ‘testpool’, each having a different number of masters. For example, buildpool has 4 masters:

  • ‘production-master01.build.mozilla.org’
  • ‘production-master03.build.mozilla.org’
  • ‘buildbot-master1.build.scl1.mozilla.com:/builds/buildbot/build_master3’
  • ‘buildbot-master2.build.scl1.mozilla.com:/builds/buildbot/build_master4’

The masters in each pool are specified by BUILDPOOL_MASTERS in buildapi.model.util module.

The exception is PENDING jobs, as they haven’t been claimed by any master yet (buildrequests.claimed_by_name is NULL). However, it is possible to tell which pool they belong to by looking at the value of buildrequests.buildername:

  • buildpool: br.claimed_by_name IS NULL AND br.complete = 0 AND br.buildername NOT LIKE 'Rev3%' AND br.buildername NOT LIKE '% tryserver %'
  • trybuildpool: br.claimed_by_name IS NULL AND br.complete = 0 AND br.buildername NOT LIKE 'Rev3%' AND br.buildername LIKE '% tryserver %'
  • testpool: br.claimed_by_name IS NULL AND br.complete = 0 AND br.buildername LIKE 'Rev3%'

(where br is buildrequests table)
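The three buildername conditions boil down to a small decision function. The sketch below mirrors the LIKE patterns in plain Python: 'Rev3%' becomes a prefix test and '% tryserver %' a substring test. The function name and the sample builder names are invented, and matching is assumed to be case-sensitive.

```python
def pending_pool(buildername):
    """Guess the pool of a PENDING job from its builder name.

    Hypothetical helper mirroring the LIKE patterns:
    'Rev3%' is a prefix match, '% tryserver %' a substring match.
    Assumption: matching is case-sensitive.
    """
    if buildername.startswith("Rev3"):
        return "testpool"
    if " tryserver " in buildername:
        return "trybuildpool"
    return "buildpool"

# Made-up builder names, for illustration only
examples = {
    "Rev3 Fedora 12 mozilla-central opt test mochitests": pending_pool(
        "Rev3 Fedora 12 mozilla-central opt test mochitests"),
    "Linux tryserver build": pending_pool("Linux tryserver build"),
    "Linux mozilla-central build": pending_pool("Linux mozilla-central build"),
}
```

The order of the checks matters: the Rev3 test comes first, which matches the NOT LIKE 'Rev3%' clause in the two build pool conditions.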

2. Timeframe filtering

Keeps only the jobs with the change’s timestamp in the interval [starttime, endtime). The change’s timestamp is given by the changes.when_timestamp column, except for nightly builds, which have no changes. In those cases we look at the buildrequests.submitted_at values (which are usually at most a few minutes later).

q = q.where(or_(c.c.when_timestamp >= starttime, br.c.submitted_at >= starttime))
q = q.where(or_(c.c.when_timestamp < endtime, br.c.submitted_at < endtime))
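The rule described in the text can also be written as a plain Python predicate. Note this is a sketch of the prose description (use the change's timestamp, fall back to submitted_at only when there is no change), so it may treat some boundary cases differently from the or_ filters above; the function name is made up.

```python
def in_timeframe(when_timestamp, submitted_at, starttime, endtime):
    """True if the job's reference time falls in [starttime, endtime).

    Hypothetical helper: the reference time is the change's timestamp,
    or submitted_at for jobs without changes (e.g. nightlies).
    """
    reference = when_timestamp if when_timestamp is not None else submitted_at
    return reference is not None and starttime <= reference < endtime
```

The interval is half-open, so a job whose reference time equals endtime belongs to the next window rather than this one.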

3. Rebuilds and forced builds exclusion

All rebuilds and forced builds are excluded from the stats. This is done by looking at buildsets.reason column, and filtering out values found in buildapi.model.util.WAITTIMES_BUILDSET_REASON_SQL_EXCLUDE.

4. Exclude buildernames that are not of interest, like fuzzers

The exclusion list is specified by buildapi.model.util.WAITTIMES_BUILDREQUESTS_BUILDERNAME_SQL_EXCLUDE.

See Also: Build Request Query, Pushes Query, Wait Times Report.

Written by Anamaria Stoica

October 13, 2010 at 12:30 am

Build Request Query


Many of the reports (End to End Times Report, Build Run Report, TryChooser Report, Average Time per Builder Report, Builder Report) use BuildRequests as building blocks. In this post I will describe how BuildRequests are fetched from Buildbot’s scheduler database.

First of all, the scheduler database has the following schema:

Scheduler Database Schema


The information about one BuildRequest is spread among 6 tables: builds, buildrequests, buildsets, sourcestamps, sourcestamp_changes and changes. This means there is no way to fetch the data we need other than creating a big JOIN/OUTERJOIN over the 6 tables mentioned above. This rather unfriendly query is a consequence of the scheduler database being designed to work optimally with Buildbot’s internal mechanisms rather than to serve our current queries.

The actual query, using SQLAlchemy, looks like this:

from sqlalchemy import outerjoin

b = meta.scheduler_db_meta.tables['builds']
br = meta.scheduler_db_meta.tables['buildrequests']
bs = meta.scheduler_db_meta.tables['buildsets']
s = meta.scheduler_db_meta.tables['sourcestamps']
sch = meta.scheduler_db_meta.tables['sourcestamp_changes']
c = meta.scheduler_db_meta.tables['changes']

q = outerjoin(br, b, b.c.brid == br.c.id) \
    .join(bs, bs.c.id == br.c.buildsetid) \
    .join(s, s.c.id == bs.c.sourcestampid) \
    .outerjoin(sch, sch.c.sourcestampid == s.c.id) \
    .outerjoin(c, c.c.changeid == sch.c.changeid) \
    .select().with_only_columns([…]) \
    .group_by(br.c.id, b.c.id)

Query explained:

JOINS:

  1. OUTERJOIN (LEFT OUTER JOIN) between buildrequests and builds tables – OUTERJOIN is required because some of the BuildRequests might be PENDING or be CANCELLED (thus having no builds, i.e. entries in the builds table)
  2. JOIN with buildsets table on buildsetid column (bs.id = br.buildsetid) – we need to go through the buildsets table in order to link the BuildRequests to the sourcestamps and changes information
  3. JOIN with sourcestamps table – sourcestamps information
  4. OUTERJOIN with sourcestamp_changes on sourcestampid column (s.id = sch.sourcestampid) – linking further along to changes. An OUTERJOIN is necessary instead of an INNER JOIN, because the nightly builds don’t have a revision number or any entries in the changes table
  5. OUTERJOIN with changes table on changeid column (sch.changeid = c.changeid) – OUTERJOIN again needed in order to include all BuildRequests belonging to nightly builds (see JOIN 4. above)

GROUP BY:
A final group by buildrequests.id and builds.id columns (GROUP BY br.id, b.id) is needed to capture multiple builds for the same BuildRequest. One BuildRequest might have multiple builds (usually very few, at most 2 or 3) if the builds have been retriggered or forced manually.
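The effect of the first OUTERJOIN and of the GROUP BY can be seen on a toy database. The sketch uses the standard library's sqlite3 with two heavily simplified tables (only the columns needed here); the rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE buildrequests (id INTEGER PRIMARY KEY, buildername TEXT);
    CREATE TABLE builds (id INTEGER PRIMARY KEY, brid INTEGER, start_time INTEGER);
    -- request 1 ran once, request 2 was retriggered (two builds),
    -- request 3 is still pending (no builds at all)
    INSERT INTO buildrequests VALUES (1, 'builder-a'), (2, 'builder-b'), (3, 'builder-c');
    INSERT INTO builds VALUES (100, 1, 5000), (101, 2, 5100), (102, 2, 5900);
""")

# GROUP BY br.id, b.id: one row per build, pending requests kept with NULLs
per_build = conn.execute("""
    SELECT br.id, b.id, b.start_time
    FROM buildrequests br
    LEFT OUTER JOIN builds b ON b.brid = br.id
    GROUP BY br.id, b.id
    ORDER BY br.id, b.id
""").fetchall()

# GROUP BY br.id only: one row per BuildRequest, whatever the number of builds
per_request = conn.execute("""
    SELECT br.id
    FROM buildrequests br
    LEFT OUTER JOIN builds b ON b.brid = br.id
    GROUP BY br.id
    ORDER BY br.id
""").fetchall()

# An INNER JOIN would silently drop the pending request
inner_ids = conn.execute("""
    SELECT DISTINCT br.id
    FROM buildrequests br
    JOIN builds b ON b.brid = br.id
    ORDER BY br.id
""").fetchall()
```

Grouping by both ids keeps the two builds of request 2 as separate rows, while the pending request 3 survives only because of the outer join.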

Selected table columns explained:

  • b.c.number
  • b.c.start_time
  • b.c.finish_time
  • br.c.id.label(‘brid’)
  • br.c.buildername
  • br.c.submitted_at
  • br.c.claimed_at
  • br.c.claimed_by_name
  • br.c.complete
  • br.c.complete_at
  • br.c.results
  • br.c.buildsetid
  • bs.c.reason
  • s.c.id.label(‘ssid’)
  • s.c.branch
  • s.c.revision
  • c.c.when_timestamp
  • c.c.author
  • c.c.comments
  • c.c.revlink
  • c.c.category
  • c.c.repository
  • c.c.project

BuildRequest statuses:

  • PENDING – the BuildRequest has not started yet / no Build Master claimed it yet:
NOT b.start_time AND NOT br.claimed_at AND NOT br.complete AND NOT br.complete_at AND NOT b.finish_time
  • RUNNING – the BuildRequest is running (a Build Master claimed the BuildRequest already), and has not finished yet:
b.start_time AND br.claimed_at AND NOT br.complete AND NOT br.complete_at AND NOT b.finish_time
  • COMPLETE – the BuildRequest was completed without any internal errors or external interruptions (i.e. not CANCELLED / INTERRUPTED):
b.start_time AND br.claimed_at AND br.complete AND br.complete_at AND b.finish_time
  • CANCELLED – the BuildRequest was cancelled (i.e. it never got to start):
NOT b.start_time AND NOT br.claimed_at AND br.complete AND br.complete_at AND NOT b.finish_time
  • INTERRUPTED – the build was interrupted (e.g. slave disconnected) and Buildbot retriggered the build:
b.start_time AND br.claimed_at AND br.complete AND br.complete_at AND NOT b.finish_time
  • MISC – should never happen
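The status conditions above form a small truth table, which can be sketched as a lookup. The function name is mine; the five arguments are the columns used in the conditions, with NULL, 0 and empty values all treated as "NOT".

```python
def build_request_status(start_time, claimed_at, complete, complete_at, finish_time):
    """Map the five columns to a status name (hypothetical helper).

    Each value is reduced to true/false, like the NOT conditions above.
    """
    flags = tuple(
        bool(v) for v in (start_time, claimed_at, complete, complete_at, finish_time)
    )
    return {
        (False, False, False, False, False): "PENDING",
        (True, True, False, False, False): "RUNNING",
        (True, True, True, True, True): "COMPLETE",
        (False, False, True, True, False): "CANCELLED",
        (True, True, True, True, False): "INTERRUPTED",
    }.get(flags, "MISC")
```

Any combination not listed, for example a build with a finish_time but no start_time, falls through to MISC.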

BuildRequest results (buildrequests.results):
This column specifies how the BuildRequest execution went, if it is completed. Naturally, the PENDING and RUNNING ones will have NO_RESULT:

  • -1 (NULL) – NO_RESULT
  • 0 – SUCCESS
  • 1 – WARNINGS
  • 2 – FAILURE
  • 3 – SKIPPED
  • 4 – EXCEPTION
  • 5 – RETRY
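When working with the results column in Python, the codes can be wrapped in an enum. This is just a convenience sketch with names of my choosing; it treats NULL (None) the same as -1, as in the list above.

```python
from enum import IntEnum

class Result(IntEnum):
    """Values of buildrequests.results, as listed above."""
    NO_RESULT = -1
    SUCCESS = 0
    WARNINGS = 1
    FAILURE = 2
    SKIPPED = 3
    EXCEPTION = 4
    RETRY = 5

def result_name(value):
    """Return the result name for a column value; NULL (None) means NO_RESULT."""
    if value is None:
        return Result.NO_RESULT.name
    return Result(value).name  # raises ValueError for unknown codes
```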

BuildRequest reasons (buildsets.reason):
The reason for the build, which might be the scheduler (normal case), the nightly scheduler, a rebuild or a forced build:

  • scheduler
  • nightly, e.g. ‘The Nightly scheduler named ‘Linux x86-64 mozilla-central nightly’ triggered this build’
  • rebuild, e.g. ‘The web-page ‘rebuild’ button was pressed by ‘<unknown>’: redo for slave disconect (nthomas)’
  • force build, e.g. ‘The web-page ‘force build’ button was pressed by ‘jhford’: hg poller is busted’

BuildRequest wait time:
How long the BuildRequest waited, from when the change was created (or from the time it was submitted, only for nightlies because they have no changes) until the build started (was assigned to a free slave):

change_time := c.when_timestamp,           if c.when_timestamp != NULL
            := br.submitted_at,            otherwise
WAIT_TIME   := b.start_time - change_time, if b.start_time != NULL AND change_time != NULL
            := 0,                          otherwise

BuildRequest duration:
How long it took from when the change was created (or from the time it was submitted, only for nightlies because they have no changes) until the build was complete, whether successful or not:

change_time := c.when_timestamp,             if c.when_timestamp != NULL
            := br.submitted_at,              otherwise
DURATION    := br.complete_at - change_time, if br.complete_at != NULL AND change_time != NULL
            := 0,                            otherwise

BuildRequest run time:
The actual run time of the build:

RUN_TIME := DURATION - WAIT_TIME
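The three definitions translate directly into Python. The sketch below follows them one to one, with None standing in for NULL; the function names are mine and the timestamps in the example are invented.

```python
def change_time(when_timestamp, submitted_at):
    """c.when_timestamp if present, br.submitted_at otherwise (e.g. nightlies)."""
    return when_timestamp if when_timestamp is not None else submitted_at

def wait_time(start_time, when_timestamp, submitted_at):
    ct = change_time(when_timestamp, submitted_at)
    if start_time is None or ct is None:
        return 0
    return start_time - ct

def duration(complete_at, when_timestamp, submitted_at):
    ct = change_time(when_timestamp, submitted_at)
    if complete_at is None or ct is None:
        return 0
    return complete_at - ct

def run_time(start_time, complete_at, when_timestamp, submitted_at):
    return (duration(complete_at, when_timestamp, submitted_at)
            - wait_time(start_time, when_timestamp, submitted_at))

# A change created at t=1000, build started at t=1600 and completed at t=4000
example_wait = wait_time(1600, 1000, 1010)        # 600
example_duration = duration(4000, 1000, 1010)     # 3000
example_run = run_time(1600, 4000, 1000, 1010)    # 2400
```

For a nightly build the when_timestamp argument is None, so all three values are measured from submitted_at instead.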

See Also: Wait Times Query, Pushes Query

Written by Anamaria Stoica

October 4, 2010 at 10:06 pm