“Did that really just happen?” — a behind the scenes look at IOI 2013
Bernard Blackham
As you may know, I was primarily responsible for the technical aspects of the IOI 2013 competition. Whilst I am glad that it’s now over, it has been a truly fun and rewarding experience. Here, I hope to document (i.e. brain-dump) the ins and outs of running something of this magnitude. This has been written in the days after the IOI finished, whilst it was all fresh in my mind. Please forgive me if it is at times somewhat stream-of-consciousness. It is NOT any kind of official report, and I do not speak for anybody but myself. The collective “we” used below refers to not just me, but many people who did the real work behind the scenes.
I intend these notes to serve partly as a guide for future hosts, and also partly to give some insights to those who were at the IOI or spectating at home, and might be curious as to how it all happened.
One of the very first pieces of planning to take place, after settling on a venue, was the floor plan for the competition hall. The layout affected everything from power distribution and network topology to fire safety compliance. We toyed with the idea of “pods” of 4 machines but discarded this in favour of our final layout in rows. Pods would have significantly complicated the logistical requirements, increasing set up time, requiring more cabling, as well as having cable covers/ramps everywhere.
Once the overall plan was in place, even little changes had significant flow-on effects. For example, a late change was to place the laptops on the left-hand side of the desks rather than having them centred, which affected the lengths of the network cables required for each machine.
The plan.
Our initial plans provisioned for at most 341 machines, allowing for every member country of the IOI to send a team of four students. In practice, not every country sent a team this year, some sent fewer than four contestants, and some teams were unfortunately unable to obtain visas, so our real numbers were much lower. 315 contestants had registered to compete, but only 303 were at IOI 2013. The spare machines served as contingency against failed machines, or were turned into workers for judging in the back room.
Early on, we opted to try and source laptops instead of desktop PCs for IOI 2013. There were several benefits to this choice:
We had tried to source laptops with 15” screens, but eventually ended up with 13” screens. We offered larger external monitors to competitors who had a medical need for them. We had requests for five external monitors at the IOI – far fewer than we had expected!
We were able to secure machines that were all identical, or so we thought. They were all the same model of laptop (Dell Latitude E4300), except that we observed some odd differences in how they represented their hardware internally. Some machines presented different devices on their USB bus (e.g. the internal trackpad would show up on the USB bus under several different names). Some had different network ROM versions. All machines had the same CPUs (except a couple of odd ones which we took out of service). All machines had at least 2 GiB of memory – some had 4 GiB. We told Linux to limit the amount of memory to 2 GiB on the kernel command line, to provide a fair contest for everybody.
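For the curious, such a cap is a single kernel parameter. A minimal sketch of the idea, assuming Ubuntu's usual GRUB2 setup (the surrounding options are illustrative rather than our exact configuration):

```
# /etc/default/grub: cap usable RAM at 2 GiB via the kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mem=2G"
# then regenerate the boot configuration with: sudo update-grub
```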
We ran memtest and badblocks on all machines to track down bad memory and disks, and removed the failing machines from our pool.
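Both are stock tools: memtest is booted on its own (from the network or USB media), while badblocks runs from a live Linux environment, with a scan along these lines (device name illustrative):

```
# Read-only surface scan, showing progress and reporting any bad sectors
badblocks -sv /dev/sda
```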
We had four physical machines serving the IOI competition hall. Each machine was a typical server-class machine (they varied in CPU, memory and disk capacity, but were all deemed “fast”). These machines each hosted virtual machines (VMs) which provided the actual services. Each physical machine ran a front-end (FE) VM and sometimes a back-end (BE) VM which were connected to different networks (via 802.1q tagging described under Network Setup below).
The FE VMs hosted everything visible to contestant machines. This included an nginx web front-end, DHCP, DNS, re-imaging, printer queues, and our custom heartbeat monitoring service. The BE VMs held all services which were not to be directly accessible by contestant machines – the database, CMS services, etc.
The rough division of services over the VMs was:
The domain www.ioi pointed at both fe3 and fe4 for load-balancing and failover.
We had a split-horizon DNS setup so that contestant machines only saw www.ioi and a few other domains, whereas all the servers and staff machines could see useful forward and reverse DNS names for everything else.
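Split-horizon DNS is a stock feature of BIND 9, implemented with views. A rough sketch of the shape of such a configuration (the zone name, file names and contestant subnet here are illustrative, not our exact config):

```
acl contestants { 10.10.0.0/16; };

view "contestant" {
    match-clients { contestants; };
    zone "ioi" {
        type master;
        file "db.ioi.contestant";   // only www.ioi and a few other names
    };
};

view "internal" {
    match-clients { any; };         // servers and staff networks
    zone "ioi" {
        type master;
        file "db.ioi.internal";     // the full set of names for servers and infrastructure
    };
};
```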
We were fortunate to be hosting IOI 2013 at the University of Queensland, as the area is on an extremely reliable power grid. Although power outages are very rare, we still decided to take some precautions. We had a 300 kVA generator installed to provide backup power to the competition venue. The generator was not running during the competition – in the event of losing grid power, there would be a few minutes of power loss to the competition hall while the generator fired up. The four servers were connected to two 3 kVA UPSes, which allowed them to survive a power outage. The laptops, by nature of being laptops, would also survive the outage. However, providing backup power to the network switches was going to be an expensive exercise. This would have required an extra UPS on every group of desks (around 25), as well as in the communications room where our core switches and routers lived. We chose not to do this, which meant that if power was lost, there would be a loss of connectivity from laptops to the servers for around 5 minutes. No students would lose work though, and no servers would be interrupted. Given how unlikely such a scenario was, we deemed this to be an acceptable risk.
The power outage scenario was tested a few days before the competition. Thanks to the meticulous planning and attention to detail of our electrical guys (led by Terry Cronk), everything worked precisely as expected. We had Terry on standby during the competition days, ready to respond to any power issues if they arose (not that any did!).
Our network was one of the more complicated pieces of the puzzle. At the physical layer, we had one 24-port GigE switch on each group of desks, connecting up to 18 machines. Each of these switches was connected either to adjacent switches or directly to one of two core aggregation switches, in a ring topology which aimed to provide a level of redundancy against a link breaking, whilst also minimising the diameter of the network.
We had several distinct layer 2 networks. The important ones were:
There were also separate networks for printers, switch and VM host management interfaces, our own staff machines, and external access. Traffic between networks was routed through a Cisco ASR1000 router. Traffic to each physical server (VM host) used 802.1q tagging so that the host could present the FE and BE networks individually to the VMs.
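On the Linux side of that tagging, presenting a tagged network to a VM is just a VLAN sub-interface bridged to the guest. A sketch of what one FE bridge on a VM host could look like under ifupdown (interface names, VLAN ID and addresses are assumptions, not our actual config):

```
# Fragment of /etc/network/interfaces on a VM host
auto eth0.10
iface eth0.10 inet manual          # 802.1q VLAN 10 carrying the FE network
    vlan-raw-device eth0

auto br-fe
iface br-fe inet static            # bridge that the FE VM's virtual NIC attaches to
    bridge_ports eth0.10
    address 10.10.0.2
    netmask 255.255.0.0
```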
The Test Bed.
The wonderful wizards at UQ’s University Networking created, configured and tested a test-bed network for us a few weeks before the IOI, using some spare switches owned by the University. There were a couple of hurdles in getting multicast imaging working, but these were all resolved and our test setup worked beautifully. We could image two machines at the same time (baby steps)!
Our first successful multicast imaging session!
For the IOI itself, all our switches were provided to us on loan from our networking sponsor. When the configuration was transferred from our test-bed setup to the real switches and routers, we began experiencing some odd behaviour. The real gear was of slightly different models from what we had used in the test bed. After a lot of debugging over several nights, and a very long weekend, we discovered that somehow traffic was being directed onto the wrong VLAN. (Felix was the first to figure this out!) We ruled out any misconfiguration issues. Our best guess was that it was an odd interaction between DHCP relaying, 802.1q tagging and the private VLAN feature (i.e. a software bug in the network switch). Felix stumbled upon a workaround accidentally, discovering a sequence of commands that unwound some of the configuration on one particular switch (to remove 802.1q tagging), then re-enabled it. This magically made things work in our test lab. There was no sane explanation for this, but given it was working, everybody was happy.
When we relocated the setup to the competition hall, we experienced the exact same issue, and the same sequence of commands bizarrely resolved it. It was then that an observant eye noticed some peculiar physical damage to one of the switches. The same observant eye noticed that the LEDs on the offending switch were dimmer than on the other switches, suggesting that something was causing the power rails on the switch to be marginal. Swapping it out with another switch made everything work as advertised. After that was resolved, the L1/L2 network infrastructure remained solid.
“Hardware fault”. I’m told this port is wired straight into the backplane.
One interesting aspect of our setup was the IP addressing scheme used and how it was achieved. Our goal was to be able to easily identify a machine's location from its IP address, in order to aid debugging and maintenance.
We classified all machines as registered or unregistered. An unregistered machine was given an IP address 10.10.x.y where x identified a specific switch (i.e. a group of desks), and y identified the switch port on that switch (x ≥ 100, 1 ≤ y ≤ 24). To achieve this we used a switch feature which intercepts DHCP requests and injects "option 82", which contains the switch name and port. A registered machine was given an IP address of 10.10.r.c where r and c identified the row and column in the competition hall (e.g. B4 = 10.10.2.4). The location information was collected as part of the setup procedure of each machine: when an unregistered machine booted, it would present a web interface asking for the asset ID, location and any notes about the machine, registering the machine. Once these had been entered, the DHCP server would link the MAC address of the machine to the IP corresponding to its location.
All of this magic was managed by a set of scripts to auto-generate the DHCP and DNS server configuration (we used ISC dhcpd and BIND 9) from a database of information about hosts, switches and switch ports. One issue is that when a machine is registered, the DHCP server needs to revoke the old DHCP lease and issue a new one. DHCP doesn't have a revocation mechanism – once a lease is issued, it is assumed valid for its lifetime. To work around this, we initially had DHCP lease times set very low (30 seconds). Unsurprisingly, this put an intense load on the DHCP server when faced with 326 machines simultaneously DHCPing (each lease renewal was also entered into a database for tracking via a Python script). We ended up increasing the lease time to a few minutes, and forcing the DHCP server to forget about leases where possible, which while not strictly RFC-compliant, was good enough to get us set up. Once all machines had been registered and things stabilised, the lease time was extended further still (to 12 hours). We had a set of scripts in place to forcefully expire leases from the server if we needed to replace or move a machine.
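The generated dhcpd.conf ended up with one host block per registered machine, along these lines (the MAC address and location are invented for illustration):

```
host laptop-b4 {
    hardware ethernet 00:24:e8:aa:bb:cc;   # MAC captured at registration time
    fixed-address 10.10.2.4;               # row B (2), column 4
}
```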
We switched from NetworkManager (Ubuntu's default network management service) to ifupdown, as this was the easiest way to make sure that the user could not reconfigure the network or set a static IP.
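With ifupdown, the per-laptop network configuration collapses to the stock DHCP stanza, roughly (interface name illustrative):

```
# /etc/network/interfaces on a contestant laptop
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp
```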
It was a fair amount of effort to get this all working smoothly, but it proved to be an invaluable timesaver when it came to configuring machines, identifying failures, and diagnosing issues in the heat of the contest.
One thing I really wanted to have ready for the IOI was a screen that could tell us at a glance what, if anything, was wrong with any of the 326 machines. There are lots of existing monitoring solutions that do this for server farms, but we opted to write our own, tailored to just what we needed. The end result was pretty impressive. Thanks to Evgeny for this!
What else do you do with a big screen?
This screen showed, for each machine, its image version number (we made it to version 5 before the end of the week), and highlighted any machines with problems we needed to know about. A custom monitoring service ran on each laptop, phoning home every 30-60 seconds and reporting any issues with the machine. The intention was to check for things like:
Not all of these were implemented due to time constraints but we had enough to feel confident about all the machines.
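Conceptually, the heartbeat was nothing more than each laptop phoning home over HTTP every so often. A purely hypothetical sketch of the idea; the URL, field names and interval are assumptions, not our actual protocol:

```
# Phone home every 30-60 seconds with some basic machine state
while true; do
    curl -s -X POST http://monitor.ioi/heartbeat \
        -d "host=$(hostname)" \
        -d "image=$(cat /etc/image-version 2>/dev/null)" \
        -d "uptime=$(cut -d' ' -f1 /proc/uptime)" > /dev/null
    sleep $(( 30 + RANDOM % 31 ))
done
```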
All the laptops were configured to boot from the network using PXE. We had our volunteers manually configure every machine's BIOS to boot from network, and only later realised that our supplier could have pre-configured this for us at the factory!
A TFTP server (atftpd) provided a default configuration file instructing a machine to download a new image. Once a machine had been imaged, it would notify this fact to the monitoring service, which would then create a MAC-specific PXE configuration file for that machine, directing it to boot straight from the hard disk.
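This switch-over relies on PXELINUX's lookup order: it tries a per-machine file named after the MAC address before falling back to the default file. A simplified sketch of the two configurations (paths, labels and the MAC are illustrative; in reality the imaging path chained through GPXE for multicast TFTP, as described below):

```
# pxelinux.cfg/default: unknown or unimaged machines boot the imaging environment
DEFAULT reimage
LABEL reimage
    KERNEL tinycore/vmlinuz
    APPEND initrd=tinycore/core.gz quiet

# pxelinux.cfg/01-00-24-e8-aa-bb-cc: written once a machine reports a successful
# image, so from then on it boots straight from its own disk
DEFAULT local
LABEL local
    LOCALBOOT 0
```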
We based our machine images off a master image maintained as a virtual machine in VirtualBox. To prepare an image for distribution, we:
Our lz4-compressed image was around 2.1 GiB. We streamed it over multicast continuously using the udpcast tool. Through some trial and error, we found that we could reliably stream it to all machines at around 80 Mbit/s, which let us distribute the image to all machines in parallel within 3.5 minutes. Over direct connections or short links, we could achieve in excess of 300 Mbit/s, but when pushing packets over 4 network switches and several hundred metres of Cat-6 and some Cat-5e, this was not at all reliable. We also experimented with varying the level of forward-error correction (FEC) used by udpcast, but found that simply limiting the bitrate improved reliability significantly more than any FEC parameters could (using FEC also lowers the effective bitrate).
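The sender/receiver pair in udpcast lends itself to a simple pipeline with the compressed image. A rough sketch of the kind of invocation involved (flags and names are from memory and illustrative; exact options and values may differ from what we ran):

```
# On the image server: loop the compressed image out over multicast, rate-limited
while true; do
    udp-sender --file ioi-image.lz4 --max-bitrate 80m --min-receivers 1
done

# On each booted TinyCore client: receive, decompress, write straight to disk
udp-receiver | lz4 -d | dd of=/dev/sda bs=4M
```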
However, there is a bootstrapping issue with running udpcast. In order for a machine to receive an image using udpcast, it needs to be running Linux (or some other environment which can run udpcast). We used TinyCoreLinux (TCL), which is a stripped down Linux distribution that can fit in under 8 MiB. We stripped this down even further to 6 MiB, by removing some unnecessary kernel modules. However, distributing even a 6 MiB image to 326 machines in parallel still requires pushing nearly 2 GiB of traffic which can't be done in a hurry over unicast TFTP. So to boot the TCL image, we used atftpd's and GPXE's support for multicast TFTP. The relevant RFC contains the nitty-gritty details, but in essence, multicast TFTP allows for many clients to request the same file from a TFTP server but only have it sent once. This worked beautifully for 1 to 20 machines, but proved to be less reliable when 326 machines were involved (atftpd would crash or hang). We never quite figured that one out due to lack of time, but worked around it by just restarting atftpd and/or the affected machines when it happened. If anybody plans to try this again, we suggest load-testing atftpd in a similar scenario first.
The only catch with multicast TFTP as provided by GPXE is that GPXE does not support IGMP join requests. IGMP messages are the mechanism by which multicast-capable switches know which switch ports to forward each multicast stream to, rather than distributing all streams to all ports. The udpcast tool did not suffer this problem, as Linux takes care of sending the relevant IGMP messages. In order to get multicast TFTP in GPXE to work, we had to configure the switches to force all ports into the relevant multicast group (using the Cisco "static-group" command).
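From memory, the workaround amounted to a static IGMP membership on the relevant VLAN/ports, conceptually something like the following (the group address and interface are illustrative, and the exact command form varies between Cisco platforms):

```
! Force the imaging multicast group to be forwarded regardless of IGMP joins
interface Vlan10
 ip igmp static-group 239.255.0.1
```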
We encountered some issues with the reliability of GPXE on different machines – on certain machines, GPXE would not obtain a DHCP lease. We suspected this was due to having different BIOS versions and, in particular, different network ROM versions. Interestingly, forcing GPXE to use the UNDI driver (a ROM-provided interface for network communications) instead of GPXE's native ethernet driver for the chip (an E1000 variant) worked on all machines. This was somewhat counter-intuitive, as one would think that using the ROM-provided UNDI driver would be more susceptible to differences in ROM versions than bypassing it. Apparently not, but nothing surprises me these days.
The other issue with GPXE is that it was not compatible with some of the syslinux/pxelinux binaries we wanted to use for things like boot menus. So in order to support this, we kept PXELINUX as our first-stage boot loader, which would only load GPXE if required for multicast TFTP and machine imaging.
To sum it up, the boot sequence for re-imaging a machine looks like this:
Simple, eh?
We did investigate other imaging solutions such as FOG, however these seemed to be inherently limited to far fewer than 300-400 machines, by introducing unnecessary bottlenecks. For example, FOG has all clients mount an NFS share when imaging, which limits the number of machines that can be imaged simultaneously to however many the NFS server can support at once (and all traffic for all 326 machines would happen in unison). Perhaps we didn’t investigate this closely enough and there may well have been existing solutions which could scale, but we felt safer building our own, which was only as complex as we needed, which we were intimately familiar with, and which we could easily customise.
Re-imaging was only one way in which we could mass-distribute software and files to all machines. We also needed a way to be able to copy files and mass-configure all 326 machines (e.g. to distribute updates, test data, etc). For this, Xi created a utility wrapping "parallel ssh” which we affectionately called "botnet" for short. We looked at other tools such as ClusterSSH, but these just did not scale to 326 machines (e.g. ClusterSSH launched 326 xterms – one for each machine, which Xi’s X server did not appreciate). Other solutions such as puppet seemed far too complex and over-engineered for what we wanted. All we wanted was to be able to:
Parallel ssh and a bit of python hacking gave us the utility we wanted, using an ssh public key on the root user of each laptop. The results for all machines were printed out on screen, highlighting any failures. We tried hard to make sure that all the commands issued were idempotent, so that if one machine happened to fail for any reason, the same command could just be re-issued.
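In practice, the wrapper boiled down to commands of roughly this shape (the host file, user and parallelism are illustrative):

```
# Run a command on every laptop as root, showing each machine's output inline
parallel-ssh -h laptops.txt -l root -p 64 -i 'dpkg -l | grep gcc'

# Push a file out to every laptop
parallel-scp -h laptops.txt -l root day1-statements.zip /home/ioi/
```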
One of the main user-visible improvements in this year’s IOI was that passwords were no longer required to log in to the contest. The logistics of distributing and using passwords have been a troublesome aspect of past IOIs, and we thought life would be much simpler if students did not need them. Instead, we created a utility which would lock the screen (much like a screensaver), but which could be remotely controlled to activate or deactivate. The screen also showed the contestant's photograph, name, country and location, so that it was unmistakable which machine a student should be sitting at. The lock screen could be configured to either deactivate on schedule with the contest clock, or be remotely deactivated. During the practice session, we also allowed the lock screen to be killed by a user pressing the Escape key.
Behind the lock screen, the students would find themselves already logged into the contest software, CMS. CMS would identify the relevant user based on the IP address from which the requests came.
This year's IOI continued the experiment of being "paperless". This meant that by default students would not be given paper printouts of the tasks, but instead just view them electronically on screen. We made all translations of all tasks available to students both on their local hard disks and downloadable from the CMS system. In the event of any network or server failures, we wanted to ensure that students would have immediate access to the task statements regardless.
However, we still allowed students to print if they wished. In a way, this simply pushes the ordeal of printing translations from before the competition starts to after it starts. The theory is that fewer printouts would be required, as not everyone would print, saving paper and the environment. In practice, there were 752 print jobs on the first day and 817 print jobs on the second day, totalling around 7000 pages. This suggests that, on average, less than 15-20% of the students did not print the tasks, or that many students printed lots of code.
As in 2012, we had people acting as runners to deliver printouts from the printers to students. In order to identify who printed a page, we created a CUPS filter that ran locally on the machine, which would inject an overlay onto every printed page with the student's name, country and location in the competition hall. During the practice competition, we had some non-technical issues with printouts being handed to the wrong table. This was because the chairs were offset, it was easy to confuse one column of desks with another, and we had not trained our runners on how to identify the correct location. Of course, this is why we have practice competitions. For days 1 and 2, we briefed the printer runners on the layout, so they had a much better feel for the room. We heard of no such issues on days 1 and 2.
On the technical side, we had 16 printers, partitioned into four groups of four. There was a central CUPS print server which presented four "printer classes", each containing four printers. CUPS would select the first available printer within a class. Contestants only saw the printer classes, not the individual printers, so if a printer went offline for some reason, the other printers would instantly pick up the slack.
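Building the classes in CUPS is a matter of adding each queue to a class with lpadmin; a hedged sketch (printer names, device URIs and the PPD path are illustrative):

```
# Define a printer queue and add it to class "class-a"; contestants only see the class
lpadmin -p printer-a1 -E -v socket://10.20.0.11:9100 -P /usr/share/ppd/printer-a1.ppd
lpadmin -p printer-a1 -c class-a
```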
While the printers were generously provided by a sponsor, an issue we did not anticipate until very late in the piece was that the 16 printers spanned 14 different models (although all from the same manufacturer). This caused us some headaches in getting the correct drivers in place, and increased our overheads in testing and administration. It took us (Evgeny and Xi) the best part of 2-3 days to iron out the last of the driver issues before we had all printers printing as expected.
Despite all our testing, and ensuring that the location overlay worked on all printers and on all translations, we still had occasional occurrences of the overlay not appearing on some printouts. We also discovered that some applications were flat out unable to print reliably (e.g. emacs and CodeBlocks). We found that different applications with different toolkits fed print jobs to CUPS in different formats, which used different processing filters, resulting in either failed print jobs, garbage, or the missing location information. The ones we had tested with were all GTK-based, which I believe uses Cairo for printing.
Fixing printing from emacs and wxWidgets-based applications such as CodeBlocks would have made life easier for the students. We investigated this after the practice competition, but it proved to be too difficult to resolve, given that we didn't have the time or manpower for it. As such, we advised all leaders to urge their students to print PDFs from Evince (the default PDF viewer), and print code with gEdit (the default code editor).
On July 1 we began installing tables and power cabling into the competition hall. A team of volunteers, led by Marie Boden and Marnie Lamprecht, got crawling on the ground to install cables and power boards, and tape them down neatly so that they would not be accidentally kicked out.
On July 2 the laptops arrived, along with the networking switches and cables. The laptops were set up by our volunteers again, along with external keyboards and mice. This took most of the day to finish, and some of the following day.
By the end of July 3, we had enough of the networking infrastructure in place to begin imaging the machines so that we could test the hardware. We ran memtest on all machines overnight, and discovered three machines with memory errors. These were pulled out of service. The following day we also ran badblocks on the disks, and found another few machines with bad disks which we pulled out of service also.
Memtest errors
While 90% of the machines passed all our tests with flying colours, a small handful continued to plague us for various reasons. These problems were mostly tracked down to faulty cables or network ports, or older BIOS versions, while some remained unexplained. We ended up just swapping out any suspect machines, and by the morning of July 6 we had a full set of 326 tested, reliable machines in the competition hall, and nearly 40 set up as workers in the back room. We ended up with only 307 machines allocated to students, and so we used the spares as workers.
The discard pile
We only received our 400 machines five days before the practice competition, and the full setup and installation of machines was complete only two days before the practice competition. This is not an unusual timeline for something of the IOI's scale (in fact, it is more breathing room than some past IOIs have had!). It meant that we needed to be ready, with the tools and knowledge to fix things quickly and adapt fast.
Our setup schedule had two days of slack for dealing with unexpected issues, and we ended up using nearly all of that slack, due to the faulty network switch, hardware issues with laptops, and other last-minute diversions which arose.
The practice competition was on July 7, and apart from some issues with printing, as described, we didn't encounter any major technical problems. We collected up all the various keyboards and mascots, of which there were a lot more than we were expecting! Most people came to deliver them to us at the very end of the competition, leading to a very long queue. We processed all of these in good time, and let students head to the opening ceremony via lunch.
Keyboards, Mascots, and Friends
Between the practice day and first competition day on July 8, we re-imaged all the machines, and came up with a random mapping of machines to students which looked reasonable to the human eye.
Translations took place on the evening of July 7, and continued through until the last countries finished in the wee hours of the morning on July 8 (as usual). Another of our HSC members, Will Pettersson, has written some notes about the translation experience.
Before the start of day 1, the first hiccup we had not anticipated was that the Art Class task had a number of high-resolution photographs in the problem statement. These appeared in every translation too, meaning that the set of all translations, along with the sample data, was nearly 300 MiB (compared with less than 40 MiB for all translations of the other two tasks combined). We were only equipped to distribute the statements and data over unicast, not multicast, requiring nearly 100 GiB of data to be pushed out as fast as possible. "As fast as possible" turned out to be not so fast – even at 200 Mbit/s over our gigabit network, it took well over an hour, and led to the delays at the beginning of day 1.
Once the tasks were distributed and the competition began, it soon became clear that the judging farm of worker machines was not coping with the evaluation of the day 1 tasks – we were seeing workers go offline and exceptions being thrown within CMS. This was eventually tracked down to long evaluation times on the task Wombats, which had a very long time limit with dozens of test cases, and the task Dreaming, which although its time limit was shorter, had hundreds of test cases. The problem with Dreaming could easily have been avoided: many test cases appeared across multiple subtasks, but each appearance was evaluated separately. These oversights slipped through the cracks due to a miscommunication within the HSC between the people generating the test data and the people setting up CMS and the judging farm, and weren’t noticed until it was too late.
While the long judging times alone would have just slowed evaluation down (a lot!), the effects on the competition were exacerbated by several further issues. The first is that CMS included a timeout of 10 minutes for workers – i.e., if a worker took more than 10 minutes to judge a submission, the worker would be marked as inactive by CMS and removed from the judging farm. The submission in question would then be re-issued to another worker, soon taking it down too, 10 minutes later. Eventually, the entire judging farm was disabled. This took us some time, and the help of the CMS developers, to diagnose and narrow down to the 10 minute timeout (a huge thank you must go to Stefano Maggiolo for figuring this one out remotely!). At the time, we had many different hypotheses, but testing each one required a 10 minute turnaround (and since we didn’t yet know the timeout was 10 minutes, whenever we thought we had solved it and walked away too soon, the problem would return shortly after).
However, this was not the only issue. A second problem was that at least one of our worker machines developed a network fault, and became intermittently unresponsive. The communications subsystem of CMS did not cope well with the particular failure mode we encountered of a worker simply going missing from the network – it can handle a crashed worker process where RST packets are returned by the kernel, but not one that simply locks up or stops responding. The result was that CMS’s EvaluationService (ES) blocked trying to contact the worker (for what looked like about 2 minutes), which prevented new submissions from being issued to workers for judging. We managed to identify the faulty workers and pull them out of the pool manually.
Finally, the procedure for reliably restarting ES when workers are working (and potentially working for a very long time) needed some special attention, due to the interactions between busy workers and ES. The procedure we settled on was to stop ES, stop all workers, start all workers, and then start ES. Without this procedure, ES and the workers would start out of sync. We also needed a patch to the isolate sandbox to ensure that candidate executables were correctly cleaned up if a worker was stopped.
By the time we had resolved all of these issues, we were simply unable to catch up with the evaluation of submissions that were coming in. It still took us several hours after the competition ended to get CMS into a state where it could judge the long-running submissions, and by 7:30pm, all submissions were evaluated and scored.
During beta testing of the tasks, the time limit for Wombats was in fact 15 seconds, which would have had all evaluations run in at most 10 minutes. The results of beta testing prompted us to raise the time limit to 20 seconds, but we never saw the 10 minute timeout issue in our tests.
Between day 1 and day 2, we made modifications to CMS to (1) ensure that it could reliably handle workers disappearing and re-connecting; (2) survive long evaluation times; and (3) implement a fair round-robin queuing system between contestants, so that if the queues were full, each contestant would eventually be guaranteed a turn, rather than always judging the oldest submissions first. We also had the HSC working through the night to redo the data for all day 2 tasks to reduce their evaluation times. Between it all, CMS and the judging farm held up perfectly during competition day 2!
On day 2, the contest start was delayed for a different reason. During the presentation of the Game task to the GA, some major objections were raised. The backup task was presented to the GA to replace it (the HSC had gone to great lengths to prepare all backup tasks to the same standard as the real tasks, so they were all ready to go!). Sadly, the backup task had an even more serious objection which prevented it being used – the same underlying idea had been used in another recent contest. After much discussion, it was decided to salvage the Game task, adapting the bounds to overcome the objections and, in fact, increasing the difficulty of the task. This resulted in the HSC, ISC, and some team leaders and deputies working all through the night to write new solutions, re-do the test data, and redesign the subtasks. Ensuring this was done correctly was the reason behind the delays at the start of day 2.
The updated bounds in Game were finalised very late, and required pushing a new task statement PDF with the updated subtask constraints to the machines. Unfortunately, due to human error and lack of sleep, the updated task statement went into the task subfolder instead of overwriting the original PDF on the desktop, meaning that students could see both the old and new versions of the task. We rectified this as soon as we realised, making a verbal and written announcement, and printing the relevant page from the correct PDF with the updated bounds. The mistake could have been noticed sooner by having more people checking what was being done, but at that point in time there was much to do and not enough eyes.
The final issue arose 15 minutes before the end of the competition, when an error in the test data was identified for one of the subtasks of Robots (by a contestant). The issue was that one particular test case did not comply with the constraints of the subtask which it was in, although it satisfied the global problem constraints. The HSC had written thorough sanity checkers to validate the test data for correctness, including checking each test case against per-subtask and global constraints. However, after a late-night change in test data, the person who ran the sanity checker didn’t know how to make it check per-subtask constraints as well as global constraints, and the faulty test case slipped through. A manual check of the subtask constraints could have spotted this too, rather than blindly relying on our automated checks.
During the competition, we wanted to find out in a hurry who had been affected by the faulty subtask, but CMS could not do this quickly. Using the new “task versions” feature of CMS, we were able to clone the dataset (which itself takes a few minutes once there are thousands of submissions), remove the offending test case from the subtask in question, and re-score (but not re-evaluate) all submissions. However, rescoring all submissions in CMS takes a very long time – it proceeded at the rate of around 8 submissions per second. This scoring bottleneck is something we need to investigate in CMS. It should be possible to do this much faster, which would have allowed us to identify the affected students within minutes. As it was, we only managed to identify them 10 minutes after the contest had ended, and by then it was too late to consider giving them extra time.
That wrapped up the competition. We took a contest dump of the state of the competition after the contest clock ended and all submissions had evaluated.
For appeals, we made the test data available for local testing, and re-opened both days’ contests so that contestants could submit any code. We also removed the test-case randomisation which we had implemented in CMS to prevent students from reverse-engineering the test data. This also required re-scoring all submissions for the contest, which at the break-neck speed of 8 submissions per second, took nearly half an hour.
I haven’t talked much about the live scoreboard yet (CMS’s RankingWebServer, or RWS). We had a few delays in getting it online, mostly because we were swamped with the other issues prior to the start of both competition days. When it did finally go live, we had foolishly forgotten to change the default username and password (which Luca Wehrstedt noticed and told us – thank you!). Of course, on day 1 the live scoreboard was not so live, due to the delays in judging.
One more embarrassing glitch was that we had a test user left in there which should have been hidden. There is a “hidden” flag you can set on a user in CMS, but this also stops their score from being computed at all, which is why we hadn’t set it. There needs to be a feature in CMS where scores are computed but not sent to the scoreboard.
Similarly, it would have been very useful to be able to mark competitors as “unofficial”, so that they could appear on the scoreboard but not be ranked. We received a number of comments that the live scoreboard was deceptive because it included the unofficial second team from Australia.
We initially couldn’t get the scoreboard running behind the nginx proxy either – the live updates wouldn’t work. We switched to opening the scoreboard port to the world and redirecting to that. (Later, we realised that the CMS documentation made it abundantly clear how to do it with nginx – feeling somewhat sheepish about that).
When we did have the world looking at the scoreboard, connecting directly to RWS, we quickly hit the default maximum per-process file descriptor limit on Linux of 1024. Changing this with `ulimit -n` prior to starting RWS fixed the issue – we raised it to 100000, which seemed adequate (apparently “unlimited” is not an option), but I never saw what it actually peaked at. Nor do I know whether this change would still have been required if nginx had actually been reverse proxying for RWS. Presumably yes, but this needs to be tested.
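For the record, the fix was nothing fancier than raising the limit in the shell that launches RWS, along these lines (treat the exact invocation as a sketch):

```
# Raise the per-process open file limit before starting the ranking server
ulimit -n 100000
cmsRankingWebServer &
```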
At 5:00pm after day 2, immediately after appeals ended, we had to close up and start tearing down the competition hall. The venue had to be cleared out by the following morning for the university to begin setting up for graduations. What we thought would take 7-8 hours to deconstruct, actually only took about 3 hours. Our team of volunteers, again led by Marie Boden and Marnie Lamprecht, worked like a machine in perfect unison to pack everything up. By 8pm, the competition hall was empty of laptops, network cables and power cables. All that remained were the servers humming in the corner.
Shutting it all down
There has been an incredible amount of effort put into this year’s IOI, from many people. We have accumulated a lot of code, patches and experience, and if we were to do it again (hah!), we would undoubtedly benefit greatly from all the experiences of IOI 2013.
Despite the couple of issues we experienced with CMS, it worked solidly in every other respect. CMS has many components, and is actually a pleasure to work on. I am very grateful to the IOI 2012 team for having developed a quality competition environment that can scale to something the size of an IOI, and doing it openly so that everybody can benefit from and contribute to the code, and share in the experience and lessons learnt. I will be working with the other CMS developers to integrate our changes upstream to make sure future contests do not run into the same problems as us.
Through discussions with other CMS developers, we intend to write a checklist or knowledge base of information for running large-scale contests. I hope this will appear in the next release of CMS.
I also plan to clean up and release some of the other software we wrote for the IOI, such as the machine imaging scripts, screen locker, and the monitoring daemon.
If you are planning on running a future large-scale contest (like, say, an IOI), the most important advice I can give is to make sure you know your software inside out. I would not hesitate to use CMS again, but I would urge anybody who does to have some familiarity with its inner workings. This doesn’t just apply to CMS, but to any critical software used. Many of the utilities we used were written by us in the lead-up to IOI 2013. We were intimately familiar with how they worked, and could quickly diagnose issues on the fly. Of course, I don’t mean to say write everything from scratch – we used many existing tools such as parallel ssh and udpcast. There’s a careful balance between reusing existing tools and writing custom code. We also had people familiar with the workings of CMS to be able to diagnose most things (except the 10 minute timeout, which Stefano figured out for us).
There ended the ride that was the IOI 2013 competition. I hope that this run-down has been informative, insightful or educational. Although there were a few hiccups, I feel there was a remarkable amount that went right, given the phenomenal amount of effort involved. The competition tasks themselves were challenging and interesting, and several of the tasks were marked final at their first version, allowing leaders to start translating immediately. Nearly all of the technical setup worked flawlessly, due in no small part to the efforts of everybody involved.
To this end, I would like to thank the many people who have dedicated so much of their time and energy into making this all happen. To the technical team on the HSC, who worked relentlessly until everything was done, persevering through illness and sleep deprivation: Evgeny Martynov, Xi Chen, Will Pettersson and Niel van der Westhuizen. To the HSC, ISC, and other members of the GA, who worked through the night(s): Ben Burton, David Greenaway, Christopher Chen, John Dethridge, Jackson Gatenby, Giovanni Paolini, Kazuhiro Hosaka, Michal Forišek, Mehdi Bouaziz, and others. To the CMS team who manned the CMS hotline during the contests: Stefano Maggiolo, Giovanni Mascellani and Luca Wehrstedt. To Gary Stefano of UQ ITS, with his uncanny ability to see into the future, who secured the machines and printers for us, and ensured that the logistics behind setup ran as smoothly as possible. To Karl Blakeney, Leslie Elliot and others at the School of Maths & Physics, for setting up and supporting the labs for translations, providing us with several of the servers powering the IOI, and happily obliging to our endless requests. To our resident power guru at UQ, Terry Cronk, and our networking gurus: Sisir Roy, Felix Li, Michael Rawle and Tim O'Donohoe. Together, they provided us with a rock-solid infrastructure and were on standby nearly around the clock, ready to respond to any issues. To the staff at the UQ Centre who were there with us through the late nights and early morning starts, providing us with anything we needed: Roz Bannan, Brad Molloy and Raymond Fong. To Marie Boden, Marnie Lamprecht, Judith Helgers, Simon Victory, Betsy Alpert, and the entire team of ITEE volunteers who got down and dirty crawling on the floor to install cabling, unpack and pack laptops, clean keyboards, clean tables, and did many many laps of the 326 machines in the competition hall. To the super-human Andree “Princess of Everything” Phillips for all of her selfless support in the most demanding of times, while still running the rest of IOI 2013 on as little or less sleep than us! To all our volunteer staff in the exam room fielding questions, running from end to end, and helping throughout the competition. To all the people who have expressed their gratitude and appreciation for all we had done – your words could not have come at a better time. And of course to our sponsors who provided us with equipment: Dell, Cisco, Energex, GGFLAN, Gibztech and Ricoh. And to anybody else I’ve forgotten to mention, I hope you know that your efforts were hugely appreciated, and all helped to make the IOI 2013 competition an incredible experience.
Thank you!
Not dead, just sleeping.
(HSC tech crew: Xi Chen, Evgeny Martynov)
“Damage” – our work-harder mascot, from Jarrah Lacko and Team Australia(s)