Anamaria Stoica

My Mozilla Blog

Posts Tagged ‘Query’

Pushes Query

Besides build requests and jobs, another very important piece of information that can be extracted from the Scheduler Database is the list of pushes.

The information about one push is spread among 3 tables: sourcestamps, sourcestamp_changes and changes.

The SQLAlchemy query fetches all pushes in a specific time frame, and allows filtering and exclusion of specific branches:

from sqlalchemy import select, and_

s = meta.scheduler_db_meta.tables['sourcestamps']
sch = meta.scheduler_db_meta.tables['sourcestamp_changes']
c = meta.scheduler_db_meta.tables['changes']

# 1. join the three tables
q = select([s.c.revision, s.c.branch, c.c.author, c.c.when_timestamp],
           and_(sch.c.changeid == c.c.changeid, s.c.id == sch.c.sourcestampid))
# 2. one row per push and branch
q = q.group_by(c.c.when_timestamp, s.c.branch)

# 3. exclude branches – not of interest / fake
# 4. filter branches
# 5. timeframe

Query explained:

  1. JOIN between sourcestamps, sourcestamp_changes and changes tables. The sourcestamps table contains the revision and branch, and the changes table contains the author and the change’s timestamp (when_timestamp).
  2. GROUP BY – next we group by the change’s timestamp (multiple builds of the same push will have the same when_timestamp) and branch (one push could affect multiple branches).
  3. Exclude branches that are not of interest like l10n and fake branches like addontester or the ones generated by unittests or talos tests (e.g. mozilla-central-win32-debug-unittest or mozilla-central-macosx64-talos).
  4. Fetch only pushes of requested branches (e.g. mozilla-central, try).
  5. Fetch only pushes having the change’s timestamp (c.when_timestamp) in the requested time frame, specified by starttime and endtime.
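
The five steps above can be tried out on a toy database. The following is only a sketch: it uses plain SQL through Python's built-in sqlite3 module instead of SQLAlchemy, the table contents are made up, and the fetch_pushes helper is a hypothetical name. It also assumes that branches are excluded by exact name and that the time frame is the half-open interval [starttime, endtime); the real code may do either differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sourcestamps (id INTEGER PRIMARY KEY, branch TEXT, revision TEXT);
CREATE TABLE changes (changeid INTEGER PRIMARY KEY, author TEXT, when_timestamp INTEGER);
CREATE TABLE sourcestamp_changes (sourcestampid INTEGER, changeid INTEGER);
INSERT INTO sourcestamps VALUES
  (1, 'mozilla-central', 'abc123'),
  (2, 'mozilla-central', 'abc123'),   -- second build of the same push
  (3, 'addontester', 'def456'),       -- fake branch
  (4, 'try', '789aaa');
INSERT INTO changes VALUES (10, 'alice', 1000), (11, 'bob', 1500), (12, 'carol', 5000);
INSERT INTO sourcestamp_changes VALUES (1, 10), (2, 10), (3, 11), (4, 12);
""")


def fetch_pushes(conn, starttime, endtime, branches, excluded):
    """Return (revision, branch, author, when_timestamp) rows, one per push and branch."""
    include = ",".join("?" * len(branches))
    exclude = ",".join("?" * len(excluded))
    sql = f"""
        SELECT s.revision, s.branch, c.author, c.when_timestamp
        FROM sourcestamps s
        JOIN sourcestamp_changes sch ON s.id = sch.sourcestampid   -- step 1
        JOIN changes c ON sch.changeid = c.changeid
        WHERE s.branch NOT IN ({exclude})                          -- step 3
          AND s.branch IN ({include})                              -- step 4
          AND c.when_timestamp >= ? AND c.when_timestamp < ?       -- step 5
        GROUP BY c.when_timestamp, s.branch                        -- step 2
        ORDER BY c.when_timestamp
    """
    params = [*excluded, *branches, starttime, endtime]
    return conn.execute(sql, params).fetchall()


pushes = fetch_pushes(conn, 0, 2000, ["mozilla-central", "try"], ["addontester"])
all_pushes = fetch_pushes(conn, 0, 6000, ["mozilla-central", "try"], ["addontester"])
```

In this toy data the two sourcestamps of the same mozilla-central push collapse into a single row thanks to the GROUP BY, and the addontester row never shows up.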

See Also: Build Request Query, Wait Times Query

Written by Anamaria Stoica

October 15, 2010 at 4:51 am

Posted in Buildapi, Mozilla

Wait Times Query

The Wait Times Query is very similar to the Build Request Query, except that it fetches jobs (it does not care about multiple builds of the same BuildRequest), selects a different subset of columns and adds several restrictions.

The base query, using SQLAlchemy, looks like this:

q = outerjoin(br, b, b.c.brid == br.c.id) \
    .join(bs, bs.c.id == br.c.buildsetid) \
    .join(s, s.c.id == bs.c.sourcestampid) \
    .outerjoin(sch, sch.c.sourcestampid == s.c.id) \
    .outerjoin(c, c.c.changeid == sch.c.changeid) \
    .select().with_only_columns([…])
# multiple restrictions
q = q.group_by(br.c.id)

For the meaning of the JOINs and the tables involved, see Build Request Query. In this post, I’ll describe only the differences (they go where the query above is commented # multiple restrictions):

1. Pool selection – fetching the jobs belonging only to a pool

This is done by filtering jobs claimed by masters in the selected pool (i.e. by looking at the values of the buildrequests.claimed_by_name column). There are currently 3 pools: ‘buildpool’, ‘trybuildpool’ and ‘testpool’, each having a different number of masters. For example, buildpool has 4 masters:

  • ‘production-master01.build.mozilla.org’
  • ‘production-master03.build.mozilla.org’
  • ‘buildbot-master1.build.scl1.mozilla.com:/builds/buildbot/build_master3’
  • ‘buildbot-master2.build.scl1.mozilla.com:/builds/buildbot/build_master4’

The masters in each pool are specified by BUILDPOOL_MASTERS in the buildapi.model.util module.

PENDING jobs are an exception, as they haven’t been claimed by any master yet (buildrequests.claimed_by_name is NULL). However, it is possible to tell which pool they belong to by looking at the value of buildrequests.buildername:

  • buildpool: br.claimed_by_name is NULL AND br.complete = 0 AND br.buildername NOT LIKE ‘Rev3%’ AND br.buildername NOT LIKE ‘% tryserver %’
  • trybuildpool: br.claimed_by_name is NULL AND br.complete = 0 AND br.buildername NOT LIKE ‘Rev3%’ AND br.buildername LIKE ‘% tryserver %’
  • testpool: br.claimed_by_name is NULL AND br.complete = 0 AND br.buildername LIKE ‘Rev3%’

(where br is buildrequests table)
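
The three buildername rules above can also be written as a small Python function. This is a sketch rather than the buildapi code: pending_pool is a hypothetical helper, the builder names in the usage lines are made up, and unlike SQL's LIKE (which is case-insensitive in many databases) the string checks here are case-sensitive.

```python
def pending_pool(buildername):
    """Guess the pool of a PENDING job (claimed_by_name IS NULL, complete = 0)."""
    # buildername LIKE 'Rev3%'
    if buildername.startswith("Rev3"):
        return "testpool"
    # buildername LIKE '% tryserver %'
    if " tryserver " in buildername:
        return "trybuildpool"
    # neither pattern matches
    return "buildpool"


# Made-up builder names, only for illustration.
pools = [
    pending_pool("Rev3 Fedora 12 mozilla-central opt test mochitests"),
    pending_pool("Linux tryserver build"),
    pending_pool("Linux mozilla-central build"),
]
```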

2. Timeframe filtering

Keeps only the jobs with the change’s timestamp in the interval [starttime, endtime). The change’s timestamp is given by the changes.when_timestamp column, except for nightly builds, which have no changes. In those cases we look at the buildrequests.submitted_at values instead (which are usually at most a few minutes later).

q = q.where(or_(c.c.when_timestamp >= starttime, br.c.submitted_at >= starttime))
q = q.where(or_(c.c.when_timestamp < endtime, br.c.submitted_at < endtime))
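
To see which rows these two conditions let through, here is a pure-Python mirror of them. It is only a sketch: in_timeframe is a hypothetical helper, and None stands in for SQL NULL, where a comparison with NULL never counts as true.

```python
def in_timeframe(when_timestamp, submitted_at, starttime, endtime):
    """Mirror of the two or_() conditions, with None playing the role of NULL."""

    def at_or_after_start(value):
        return value is not None and value >= starttime

    def before_end(value):
        return value is not None and value < endtime

    return (
        (at_or_after_start(when_timestamp) or at_or_after_start(submitted_at))
        and (before_end(when_timestamp) or before_end(submitted_at))
    )


nightly = in_timeframe(None, 150, 100, 200)     # no change, submitted inside the window
regular = in_timeframe(120, 125, 100, 200)      # change inside the window
too_early = in_timeframe(50, 60, 100, 200)      # everything before starttime
straddling = in_timeframe(90, 110, 100, 200)    # change before, submission after starttime
```

Note that, because each condition is an OR, a job whose change is slightly older than starttime but which was submitted inside the window (the last case) is also kept.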

3. Rebuilds and forced builds exclusion

All rebuilds and forced builds are excluded from the stats. This is done by looking at buildsets.reason column, and filtering out values found in buildapi.model.util.WAITTIMES_BUILDSET_REASON_SQL_EXCLUDE.

4. Exclude buildernames that are not of interest, like fuzzers

The exclusion list is specified by buildapi.model.util.WAITTIMES_BUILDREQUESTS_BUILDERNAME_SQL_EXCLUDE.

See Also: Build Request Query, Pushes Query, Wait Times Report.

Written by Anamaria Stoica

October 13, 2010 at 12:30 am

Build Request Query

Many of the reports (End to End Times Report, Build Run Report, TryChooser Report, Average Time per Builder Report, Builder Report) use BuildRequests as building blocks. In this post I will describe how BuildRequests are fetched from Buildbot’s scheduler database.

First of all, the scheduler database has the following schema:

Scheduler Database Schema

The information about one BuildRequest is spread across 6 tables: builds, buildrequests, buildsets, sourcestamps, sourcestamp_changes and changes. This means the only way to fetch the data we need is a big JOIN/OUTERJOIN over these tables. This rather unfriendly query is a consequence of the scheduler database being designed to work optimally with Buildbot’s internal mechanisms rather than to serve our reporting queries.

The actual query, using SQLAlchemy, looks like this:

from sqlalchemy import outerjoin

b = meta.scheduler_db_meta.tables['builds']
br = meta.scheduler_db_meta.tables['buildrequests']
bs = meta.scheduler_db_meta.tables['buildsets']
s = meta.scheduler_db_meta.tables['sourcestamps']
sch = meta.scheduler_db_meta.tables['sourcestamp_changes']
c = meta.scheduler_db_meta.tables['changes']

q = outerjoin(br, b, b.c.brid == br.c.id) \
    .join(bs, bs.c.id == br.c.buildsetid) \
    .join(s, s.c.id == bs.c.sourcestampid) \
    .outerjoin(sch, sch.c.sourcestampid == s.c.id) \
    .outerjoin(c, c.c.changeid == sch.c.changeid) \
    .select().with_only_columns([…]) \
    .group_by(br.c.id, b.c.id)

Query explained:

JOINS:

  1. OUTERJOIN (LEFT OUTER JOIN) between buildrequests and builds tables – OUTERJOIN is required because some of the BuildRequests might be PENDING or be CANCELLED (thus having no builds, i.e. entries in the builds table)
  2. JOIN with buildsets table on the buildsetid column (bs.id = br.buildsetid) – we need to go through the buildsets table in order to link the BuildRequests to the sourcestamps and changes information
  3. JOIN with sourcestamps table – sourcestamps information
  4. OUTERJOIN with sourcestamp_changes on the sourcestampid column (s.id = sch.sourcestampid) – linking further along to changes. An OUTERJOIN is necessary instead of an INNER JOIN, because the nightly builds don’t have a revision number or any entries in the changes table
  5. OUTERJOIN with changes table on the changeid column (sch.changeid = c.changeid) – OUTERJOIN again needed in order to include all BuildRequests belonging to nightly builds (see JOIN 4. above)

GROUP BY:
A final group by buildrequests.id and builds.id columns (GROUP BY br.id, b.id) is needed to capture multiple builds for the same BuildRequest. One BuildRequest might have multiple builds (usually very few, at most 2 or 3) if the builds have been retriggered or forced manually.
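
The effect of the joins and the GROUP BY is easiest to see on a tiny example. The sketch below uses plain SQL through Python's built-in sqlite3 module instead of SQLAlchemy, with a cut-down version of the schema (only a few of the columns named in this post) and made-up rows: one BuildRequest triggered by a change and built twice, and one pending nightly BuildRequest with no builds and no changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE builds (id INTEGER PRIMARY KEY, number INTEGER, brid INTEGER,
                     start_time INTEGER, finish_time INTEGER);
CREATE TABLE buildrequests (id INTEGER PRIMARY KEY, buildsetid INTEGER, buildername TEXT);
CREATE TABLE buildsets (id INTEGER PRIMARY KEY, sourcestampid INTEGER, reason TEXT);
CREATE TABLE sourcestamps (id INTEGER PRIMARY KEY, branch TEXT, revision TEXT);
CREATE TABLE sourcestamp_changes (sourcestampid INTEGER, changeid INTEGER);
CREATE TABLE changes (changeid INTEGER PRIMARY KEY, author TEXT, when_timestamp INTEGER);

-- Request 1: triggered by a change and built twice (e.g. retriggered).
-- Request 2: nightly, no changes, still pending (no builds).
INSERT INTO sourcestamps VALUES (1, 'mozilla-central', 'abc123'),
                                (2, 'mozilla-central', NULL);
INSERT INTO changes VALUES (10, 'alice', 1000);
INSERT INTO sourcestamp_changes VALUES (1, 10);
INSERT INTO buildsets VALUES (1, 1, 'scheduler'), (2, 2, 'nightly');
INSERT INTO buildrequests VALUES (1, 1, 'Linux build'), (2, 2, 'Linux nightly');
INSERT INTO builds VALUES (1, 1, 1, 1100, 1200), (2, 2, 1, 1300, 1400);
""")

rows = conn.execute("""
    SELECT br.id, b.id, s.branch, c.author
    FROM buildrequests br
    LEFT OUTER JOIN builds b ON b.brid = br.id                        -- JOIN 1
    JOIN buildsets bs ON bs.id = br.buildsetid                        -- JOIN 2
    JOIN sourcestamps s ON s.id = bs.sourcestampid                    -- JOIN 3
    LEFT OUTER JOIN sourcestamp_changes sch ON sch.sourcestampid = s.id  -- JOIN 4
    LEFT OUTER JOIN changes c ON c.changeid = sch.changeid            -- JOIN 5
    GROUP BY br.id, b.id
    ORDER BY br.id, b.id
""").fetchall()

for row in rows:
    print(row)
```

Request 1 comes back twice (once per build), while the pending nightly request is still present, with NULLs in the builds and changes columns. With INNER JOINs instead of the OUTERJOINs it would disappear from the result.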

Selected table columns explained:

  • b.c.number
  • b.c.start_time
  • b.c.finish_time
  • br.c.id.label('brid')
  • br.c.buildername
  • br.c.submitted_at
  • br.c.claimed_at
  • br.c.claimed_by_name
  • br.c.complete
  • br.c.complete_at
  • br.c.results
  • br.c.buildsetid
  • bs.c.reason
  • s.c.id.label('ssid')
  • s.c.branch
  • s.c.revision
  • c.c.when_timestamp
  • c.c.author
  • c.c.comments
  • c.c.revlink
  • c.c.category
  • c.c.repository
  • c.c.project

BuildRequest statuses:

  • PENDING – the BuildRequest has not started yet / no Build Master claimed it yet:
NOT b.start_time AND NOT br.claimed_at AND NOT br.complete AND NOT br.complete_at AND NOT b.finish_time
  • RUNNING – the BuildRequest is running (a Build Master claimed the BuildRequest already), and has not finished yet:
b.start_time AND br.claimed_at AND NOT br.complete AND NOT br.complete_at AND NOT b.finish_time
  • COMPLETE – the BuildRequest was completed without any internal errors or external interruptions (i.e. not CANCELLED / INTERRUPTED):
b.start_time AND br.claimed_at AND br.complete AND br.complete_at AND b.finish_time
  • CANCELLED – the BuildRequest was cancelled (i.e. it never got to start):
NOT b.start_time AND NOT br.claimed_at AND br.complete AND br.complete_at AND NOT b.finish_time
  • INTERRUPTED – the build was interrupted (e.g. slave disconnected) and Buildbot retriggered the build:
b.start_time AND br.claimed_at AND br.complete AND br.complete_at AND NOT b.finish_time
  • MISC – should never happen
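
The status rules above amount to a lookup on five truth values. A minimal sketch of that lookup follows; buildrequest_status is a hypothetical helper name, and any value that is NULL/None or 0 counts as "NOT" set.

```python
# (start_time, claimed_at, complete, complete_at, finish_time) -> status
STATUS_TABLE = {
    (False, False, False, False, False): "PENDING",
    (True, True, False, False, False): "RUNNING",
    (True, True, True, True, True): "COMPLETE",
    (False, False, True, True, False): "CANCELLED",
    (True, True, True, True, False): "INTERRUPTED",
}


def buildrequest_status(start_time, claimed_at, complete, complete_at, finish_time):
    """Map one row of the Build Request Query to a status name."""
    flags = tuple(
        bool(value)
        for value in (start_time, claimed_at, complete, complete_at, finish_time)
    )
    # Any combination not listed above is reported as MISC.
    return STATUS_TABLE.get(flags, "MISC")


pending = buildrequest_status(None, None, 0, None, None)
running = buildrequest_status(1100, 1050, 0, None, None)
complete = buildrequest_status(1100, 1050, 1, 1400, 1400)
cancelled = buildrequest_status(None, None, 1, 1200, None)
interrupted = buildrequest_status(1100, 1050, 1, 1400, None)
```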

BuildRequest results (buildrequests.results):
This column specifies how the BuildRequest execution went, if it is completed. Naturally, the PENDING and RUNNING ones will have NO_RESULT:

  • -1 (NULL) – NO_RESULT
  • 0 – SUCCESS
  • 1 – WARNINGS
  • 2 – FAILURE
  • 3 – SKIPPED
  • 4 – EXCEPTION
  • 5 – RETRY
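
When post-processing query rows, the codes above can be turned into names with a small mapping. This is a sketch only: result_name is a hypothetical helper, and returning "UNKNOWN" for codes outside the list is an assumption of this example, not something taken from Buildbot or buildapi.

```python
RESULT_NAMES = {
    0: "SUCCESS",
    1: "WARNINGS",
    2: "FAILURE",
    3: "SKIPPED",
    4: "EXCEPTION",
    5: "RETRY",
}


def result_name(results):
    """Translate a buildrequests.results value into its name."""
    # NULL (None) and -1 both mean the request has no result yet.
    if results is None or results == -1:
        return "NO_RESULT"
    # "UNKNOWN" is this sketch's own fallback for unlisted codes.
    return RESULT_NAMES.get(results, "UNKNOWN")


names = [result_name(value) for value in (None, -1, 0, 2, 5)]
```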

BuildRequest reasons (buildsets.reason):
The reason for the build can be the scheduler (normal case), the nightly scheduler, a rebuild or a forced build:

  • scheduler
  • nightly, e.g. ‘The Nightly scheduler named ‘Linux x86-64 mozilla-central nightly’ triggered this build’
  • rebuild, e.g. ‘The web-page ‘rebuild’ button was pressed by ‘<unknown>’: redo for slave disconect (nthomas)’
  • force build, e.g. ‘The web-page ‘force build’ button was pressed by ‘jhford’: hg poller is busted’

BuildRequest wait time:
How long the BuildRequest waited, from when the change was created (or, for nightlies, from when the request was submitted, because they have no changes) until the build started (was assigned to a free slave):

change_time := c.when_timestamp,          if c.when_timestamp != NULL
            := br.submitted_at,           otherwise
WAIT_TIME   := b.start_time - change_time, if b.start_time != NULL AND change_time != NULL
            := 0,                          otherwise

BuildRequest duration:
How long it took from when the change was created (or, for nightlies, from when the request was submitted, because they have no changes) until the build was complete, whether successful or not:

change_time := c.when_timestamp,            if c.when_timestamp != NULL
            := br.submitted_at,             otherwise
DURATION    := br.complete_at - change_time, if br.complete_at != NULL AND change_time != NULL
            := 0,                            otherwise

BuildRequest run time:
The actual run time of the build:

RUN_TIME := DURATION - WAIT_TIME
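
The three definitions (wait time, duration, and run time as duration minus wait time) can be put together in a few lines of Python. This is a sketch with a hypothetical helper name, buildrequest_times; None stands in for NULL and all values are assumed to be timestamps in seconds.

```python
def buildrequest_times(when_timestamp, submitted_at, start_time, complete_at):
    """Return (wait_time, duration, run_time) for one BuildRequest row."""
    # Nightlies have no change, so fall back to the submission time.
    change_time = when_timestamp if when_timestamp is not None else submitted_at

    if start_time is not None and change_time is not None:
        wait_time = start_time - change_time
    else:
        wait_time = 0

    if complete_at is not None and change_time is not None:
        duration = complete_at - change_time
    else:
        duration = 0

    run_time = duration - wait_time
    return wait_time, duration, run_time


regular = buildrequest_times(1000, 1005, 1100, 1400)   # change-triggered build
nightly = buildrequest_times(None, 2000, 2060, 2600)   # no change, uses submitted_at
pending = buildrequest_times(1000, 1005, None, None)   # not started yet
```

For a request that has started but not completed, these definitions give a duration of 0 and therefore a negative run time, so the run time is only meaningful for completed requests.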

See Also: Wait Times Query, Pushes Query

Written by Anamaria Stoica

October 4, 2010 at 10:06 pm