Principles behind IOI 2004 Competition Tasks and Their Grading ========== ====== === ==== =========== ===== === ===== ======= Author: ISC Date: 12 July 2004 Status: Approved by ISC This document presents some of the principles that have guided the selection of the IOI 2004 competition tasks and their grading criteria. Please read the whole document carefully. The introduction below explains why the ISC formulated these principles. The second section states the principles. The third section provides some motivation for these principles and discusses their intention. An example is presented in a separate section. The final section addresses the future of these principles. Introduction ------------ The GA has often expressed their concern about the relatively low IOI scores and relatively few contestants that fully solve tasks. This concern can also be inferred from the responses to the survey about the IOI 2002 competition. Statistics for the results of some recent IOIs support this impression. See the appendix for detailed statistics. IOI 1999 2000 2001 2002 2003 --- ---- ---- ---- ---- ---- lowest bronze 135 149 143 135 173.1 (score out of 600 max. for tasks) # hard tasks ? 2 3 5 5 (<10% of contestants score >= 90%) The IC has asked the ISC to address these concerns in the preparation of the IOI 2004 competition. Principles ---------- After extensive discussions during the ISC review meeting in Greece at the end of June 2004, the ISC came up with the following measures to address the issues mentioned above. 1. The competition tasks will be selected to cover a range of difficulty levels. 2. For each task, half of the test inputs used for grading the submitted programs will focus on "testing for correctness". These inputs will be based on "small" cases only. What is considered "small" will be stated explicitly in the task description. These constraints will be referred to as 50%-constraints in this document, to distinguish them from the regular constraints (or 100%-constraints) on inputs. 3. The other half of the test inputs will focus on "testing for efficiency". The "size" of these cases is chosen to distinguish the efficiency for a range of algorithm classes specific to that task. Motivation ---------- Concerning the low overall scores, the ISC identified two main causes: 1. The set of competition tasks is too hard for most of the contestants. 2. The credit for correct-but-inefficient programs is too small. The capabilities of the best contestants have increased considerably over the years. However, roughly 5% to 10% of the contestants seem to lack the skills for scoring points, no matter how easy it is. We do not see ways to change the latter. It is the responsibility of the delegations to prepare their contestants for the IOI competition. The first principle (range of difficulty levels) is intended to address the first cause. We explicitly avoid a selection of tasks that are (almost) all (very) hard. This follows the recommendation made in the ISC Guidelines for IOI Competitions, but which was not enforced sufficiently in the past. Some competition tasks should provide a challenge for the very best, so that they can be recognized. Other tasks are easier, so that less capable contestants can accomplish them. Tasks without algorithmic challenge (with a "trivial" 100%-solution, involving a standard text-book algorithm and some straightforward coding) will be avoided. This follows past practice. The second principle (50% for correctness-only) is intended to address the second cause. Under this principle, a correct-but-inefficient program can still obtain 50% of the full score. In the past, such programs would typically get no more than 10% to 20% of the full score. This principle acknowledges the effort that is required to develop a correctly functioning program, even if its efficiency leaves room for improvement. This principle is the most prominent change with respect to past IOIs. The choice of 50% rather than some other percentage is a compromise. Some have argued that it should be even higher (say 70%), some that it should be lower (say 30%). For simplicity in applying the principle, we have set a uniform percentage at 50% for IOI 2004. The third principle (50% for a __range__ of efficiencies) is intended to distinguish algorithms with sufficiently different performance characteristics. These tests must not only be focused on establishing that the program under test meets the 100%-constraints. It allows for less efficient programs to score additional points over 50% as well. This principle has been applied in the past, but we seek to apply it more rigorously now. The 50%-constraints will be stated explicitly in the task description. There are several reasons for doing so: * We do not want to have arguments over whether particular algorithms qualify as correct-but-inefficient. Instead, this is now defined as the ability to solve test inputs within the 50%-constraints. In the end, all that matters is whether the program can solve certain test inputs. The 50%-constraints characterize those test inputs. * It enables contestants to make an informed trade-off. * One can view the 50%-constraints as defining an easier version of the task. A contestant has really accomplished something by writing a program that correctly solves all these cases. This serves a pedagogical and psychological purpose. No further details are provided about the characteristics of the test inputs. In particular, the distribution of the test cases for distinguishing a range of efficiency levels is not explained in the task description. There are several reasons for this: * It is often too complicated to describe these details. * It might give too many hints on possible solutions to some contestants, and it might confuse other contestants. The handouts with background information provided after the competition will explain the choice of test data in more detail. For each task, the 50%- and 100%-constraints, and the distribution of test cases for efficiency have been carefully selected, by considering various alternative approaches to solving the task. Output-only tasks are treated separately when designing the input cases. For one thing, no program is submitted, so there are no evaluation runs. Secondly, because the inputs are not secret, the contestants can analyse the inputs during the contest and they know the exact characteristics. Example ------- As a concrete example, consider the following hypothetical competition task. The hypothetical competition machine can execute 10^9 instructions per second. TASK SORT An experiment generates a sequence of distinct integers. For further analysis, this sequence needs to be sorted from low to high. Write a program that inputs a sequence and outputs the corresponding sorted sequence. INPUT FORMAT The first line on standard input contains integer N, the length of the sequence. Each of the subsequent N input lines contains one integer of the sequence. OUTPUT FORMAT The sorted sequence is to be written to standard output. The output must consist of N lines, where each line contains one integer. The integers must appear in increasing order. EXAMPLE INPUT 3 1000000000 0 99 EXAMPLE OUTPUT 0 99 1000000000 CONSTRAINTS All inputs will satisfy the following constraints: * Length N of the sequence: 0 <= N <= 10^6 * Each integer i in the sequence: 0 <= i <= 10^9 * The sequence contains no duplicates. The time limit per run is 2 second. The limit on data memory per run is 5 MB. 50% of the inputs will satisfy the following ADDITIONAL constraints: * Length N of the sequence: N <= 10^3 END OF TASK SORT Note that the wording of this task is not what matters for this example. Also note that sorting does not satisfy the novelty requirement for IOI competition tasks. We have chosen it precisely because it is a well-known problem. Since IOI'94, it has been the custom in IOI competitions that quantitative constraints are imposed on the time and memory usage of the algorithms to be designed and implemented. The scientific committee selects these constraints to provide interesting yet reasonable algorithmic challenges, that can also be graded efficiently. ANALYSIS FOR TASK SORT Each input integer can be stored in 4 bytes. Hence, the maximum-length sequence can be stored in 4 MB. This still leaves 1 MB. The machine characteristics together with the task constraints imply that an O(N log N) in-situ sorting algorithm with low constant will do. It is not possible to store 2N integers (make a copy of the sequence) for large N. It is also clear that an O(N^2) in-situ sorting algorithm is too slow. The 50%-constraints were chosen such that an O(N^2) sorting algorithm with low constant will suffice. It could use more storage than needed for just the N input numbers (i.e., not in-situ). An O(N!) or worse algorithm (e.g. that enumerates all permutations of the input sequence and tests them for sortedness) will not pass all of the 50% tests for correctness. END OF ANALYSIS FOR TASK SORT A weaker contestant, who is unable to come up with an O(N log N) in-situ sorting algorithm, but who can design and implement an O(N^2) algorithm, can still obtain 50% of the full score for this task. However, the 50%-constraints are such that a O(N!) program will __not__ score 50%. This is a deliberate choice in designing the task. The 50%-constraints make explicit what still counts as a correct-but-inefficient program. If a program cannot solve the 50%-constrained inputs within the time and memory limits, then it is judged to be an unacceptable solution for this task. It was also a deliberate choice not to limit the range of integers further in the 50%-constraints. It is not the issue here whether this was a good choice, but it illustrates that there are various trade-offs to consider when selecting the 50%-constraints. They are by no means obvious. That is why it is important to make the 50%-constraints explicit in the task description. Conclusion ---------- These principles were formulated to guide the preparations for the IOI 2004 competition only. Their application to the preparation of future IOIs after 2004 will depend on feedback from the IOI community, on statistics and surveys for IOI 2004, and on further discussions. In the future, the ISC Guidelines for IOI Competitions can be updated to reflect the experience with these principles. Appendix STATISTICS ---------- For the past five IOIs, we list the lowest score to obtain a bronze, silver, and gold medal (the bronze cutoff is approximately the median score). The maximum score for all tasks together is 600 (this excludes any non-task bonuses). IOI 1999 2000 2001 2002 2003 --- ---- ---- ---- ---- ---- Lowest bronze medalist score 135 149 143 135 173.1 (approx. median score) Lowest silver medalist score 234 310 266 226 257.7 Lowest gold medalist score 340 450 378 296 350.9 Top score 480 600 580 510 455.4 If available, we also list for each task the number and percentage of contestants that scored 90% or more on that task. The threshold of 90% allows for a contestant to fail on 1 test case in 10 and still be counted as having "fully solved" the task, that is, solved it __modulo a small mistake__. The classification as Easy, Medium, Hard is based on the following thresholds (which first appeared in the IOI 2002 competition survey): EASY : > 40% of the contestants "fully solved" the task MEDIUM: >= 10% and <= 40% of the contestants "fully solved" the task HARD : < 10% of the contestants "fully solved" the task IOI 1999: Score per task per contestant no longer available. IOI 2000 with 274 ranked contestants |---palin---|--- car ---|--median---|---post ---|---walls---|---block--- >=90%| 80 (30%) | 31 (11%) | 27 (10%) | 56 (21%) | 79 (29%) | 11 (4%) | MEDIUM | MEDIUM | HARD | MEDIUM | MEDIUM | HARD IOI 2001 with 267 ranked contestants |--twofive--|--mobiles--|--ioiwari--|---depot---|---score---|---double-- >=90%| 1 (0%) | 3 (1%) | 79 (30%) | 18 (7%) | 39 (15%) | 54 (20%) | HARD | HARD | MEDIUM | HARD | MEDIUM | MEDIUM IOI 2002 with 275 ranked contestants Unable to obtain score per task per contestant; the following approximations are inferrred from diagrams in the IOI 2004 competition report (pp. 49, 81) |---frog ---|--utopia---|--- xor ---|---batch---|--- bus ---|---rods --- >=90%| 45 (16%) | 8 (3%) | 2 (1%) | 12 (4%) | 9 (3%) | 20 (7%) | MEDIUM | HARD | HARD | HARD | HARD | HARD IOI 2003 with 265 ranked contestants |-maintain--|---code ---|--reverse--|---guess---|--robots---|-boundary-- >=90%| 81 (31%) | 4 (2%) | 0 (0%) | 14 (5%) | 11 (4%) | 12 (5%) | MEDIUM | HARD | HARD | HARD | HARD | HARD END OF STATISTICS End of Document