Mozilla’s Build System
Mozilla’s Build System is a very cool distributed system run by Buildbot. The system automatically rebuilds and tests the tree every time something has changed.
The Build Infrastructure currently has around 1,000 machines grouped into 3 pools, each made up of several Build Masters and many Slaves:
- Build Pool (handles builds triggered by all changes, except those going to Try):
- 4 Build Masters
- ~300 Slaves
- Try Build Pool (handles Try builds):
- 1 Build Master
- ~200 Slaves
- Test Pool (handles all tests, including Try)
- 7 Test Masters
- ~400 Slaves
How it works
The hg poller looks for new changes in the hg.mozilla.org repository every few minutes. The changes are picked up by the Build Scheduler Master, which creates Build Requests, one for each of the supported platforms. The Build Requests go into the Scheduler Database as pending. The Build Masters look for pending Build Requests and take them on only if there are free Slaves to assign them to.
As the builds complete, the Build Master updates their statuses in the Scheduler Database. Also, the Test Scheduler Master creates Test Build Requests for the corresponding tests.
Next, the Test Build Requests are picked up by the Test Masters and assigns them to free Slaves. When the tests are complete, the Test Master updates back their statuses in the Scheduler Database.
Build Run Life Cycle
One push to mozilla-central, if successful, generates a total of 168 Build Requests (as of October 2010, but subject to change in the future), from which 10 are builds (one for each of the supported 10 platforms), 108 unittests and 50 talos tests. All these build requests make up a Build Run.
Each of the 10 platform builds comes with its own set of test requests. The tests are created only when the corresponding build completes, and only if successful. Which means that if there are failed builds, some of the tests won’t be created, and the Build Run won’t have 168 Build Requests, but less.
Two very important measures in a Build Runs’s life cycle are the Wait Time and End to End Time.
The Wait Time measures how long Build Requests wait in the queue before starting, more specific, it measures the time difference between the timestamp of the change that generated that Build Request and the timestamp of when that Build Request is assigned to a free slave. (see Build Run Life Cycle diagram above)
The End to End Time measures how long it takes for a Build Run to complete. That is, the time difference between the timestamp of the change that triggered this Build Run and the timestamp of when the last of the generated Build Requests ends (in other words, when all builds and tests are completed). (see Build Run Life Cycle diagram above)
The normal End to End Time for mozilla-central is a little under 4 hours, but greatly varies upwards with the system load.
The Great Wall of Mac minis
The builds are done on a mix of VMs, 1U servers, xserves and Mac minis, and all the testing is done on Mac minis.
The Great Wall of Mac minis is made up of a little over 400 of the Mac minis’ boxes, and is located by the Release Engineers’ desks in the Mountain View office. 😀