The Elo rating system is a method for calculating the relative skill levels of players in zero-sum games such as chess or esports. It is named after its creator Arpad Elo, a Hungarian-American physics professor.
The Elo system was invented as an improved chess-rating system over the previously used Harkness system,[1] but is also used as a rating system in association football (soccer), American football, baseball, basketball, pool, various board games and esports, and, more recently, large language models.
The difference in the ratings between two players serves as a predictor of the outcome of a match. Two players with equal ratings who play against each other are expected to score an equal number of wins. A player whose rating is 100 points greater than their opponent's is expected to score 64%; if the difference is 200 points, then the expected score for the stronger player is 76%.[2]
A player's Elo rating is a number that may change depending on the outcome of rated games played. After every game, the winning player takes points from the losing one. The difference between the ratings of the winner and loser determines the total number of points gained or lost after a game. If the higher-rated player wins, then only a few rating points will be taken from the lower-rated player. However, if the lower-rated player scores an upset win, many rating points will be transferred. The lower-rated player will also gain a few points from the higher-rated player in the event of a draw. This means that this rating system is self-correcting. Players whose ratings are too low or too high should, in the long run, do better or worse correspondingly than the rating system predicts and thus gain or lose rating points until the ratings reflect their true playing strength.
Elo ratings are comparative only, and are valid only within the rating pool in which they were calculated, rather than being an absolute measure of a player's strength.
While Elo-like systems are widely used in two-player settings, variations have also been applied to multiplayer competitions.[3]
History
Arpad Elo was a chess master and an active participant in the United States Chess Federation (USCF) from its founding in 1939.[4] The USCF used a numerical ratings system devised by Kenneth Harkness to enable members to track their individual progress in terms other than tournament wins and losses. The Harkness system was reasonably fair, but in some circumstances gave rise to ratings many observers considered inaccurate.
On behalf of the USCF, Elo devised a new system with a sounder statistical basis.[5] At about the same time, György Karoly and Roger Cook independently developed a system based on the same principles for the New South Wales Chess Association.[6]
Elo's system replaced earlier systems of competitive rewards with a system based on statistical estimation. Rating systems for many sports award points in accordance with subjective evaluations of the 'greatness' of certain achievements. For example, winning an important golf tournament might be worth an arbitrarily chosen five times as many points as winning a lesser tournament.
A statistical endeavor, by contrast, uses a model that relates the game results to underlying variables representing the ability of each player.
Elo's central assumption was that the chess performance of each player in each game is a normally distributed random variable. Although a player might perform significantly better or worse from one game to the next, Elo assumed that the mean value of the performances of any given player changes only slowly over time. Elo thought of a player's true skill as the mean of that player's performance random variable.
A further assumption is necessary because chess performance in the above sense is still not measurable. One cannot look at a sequence of moves and derive a number to represent that player's skill. Performance can only be inferred from wins, draws, and losses. Therefore, a player who wins a game is assumed to have performed at a higher level than the opponent for that game. Conversely, a losing player is assumed to have performed at a lower level. If the game ends in a draw, the two players are assumed to have performed at nearly the same level.
Elo did not specify exactly how close two performances ought to be to result in a draw as opposed to a win or loss. In fact, the probability of a draw depends on the performance differential, so this threshold is better thought of as a band of uncertainty than as a sharp boundary. And while he thought it was likely that players might have different standard deviations in their performances, he made a simplifying assumption to the contrary.
To simplify computation even further, Elo proposed a straightforward method of estimating the variables in his model (i.e., the true skill of each player). One could calculate relatively easily from tables how many games players would be expected to win based on comparisons of their ratings to those of their opponents. The ratings of a player who won more games than expected would be adjusted upward, while those of a player who won fewer than expected would be adjusted downward. Moreover, that adjustment was to be in linear proportion to the number of wins by which the player had exceeded or fallen short of their expected number.[7]
From a modern perspective, Elo's simplifying assumptions are not necessary because computing power is inexpensive and widely available. Several people, most notably Mark Glickman, have proposed using more sophisticated statistical machinery to estimate the same variables. On the other hand, the computational simplicity of the Elo system has proven to be one of its greatest assets. With the aid of a pocket calculator, an informed chess competitor can calculate to within one point what their next officially published rating will be, which helps promote a perception that the ratings are fair.
Implementing Elo's scheme
The USCF implemented Elo's suggestions in 1960,[8] and the system quickly gained recognition as being both fairer and more accurate than the Harkness rating system. Elo's system was adopted by the World Chess Federation (FIDE) in 1970.[9] Elo described his work in detail in The Rating of Chessplayers, Past and Present, first published in 1978.[10]
Subsequent statistical tests have suggested that chess performance is almost certainly not distributed as a normal distribution, as weaker players have greater winning chances than Elo's model predicts.[11][12] In paired comparison data, there is often very little practical difference in whether it is assumed that the differences in players' strengths are normally or logistically distributed. Mathematically, however, the logistic function is more convenient to work with than the normal distribution.[13] FIDE continues to use the rating difference table as proposed by Elo.[14]: table 8.1b
The development of the Percentage Expectancy Table (table 2.11) is described in more detail by Elo as follows:[15]
The normal probabilities may be taken directly from the standard tables of the areas under the normal curve when the difference in rating is expressed as a z score. Since the standard deviation σ of individual performances is defined as 200 points, the standard deviation σ' of the differences in performances becomes σ√2 or 282.84. The z value of a difference then is D/282.84. This will then divide the area under the curve into two parts, the larger giving P for the higher rated player and the smaller giving P for the lower rated player.
For example, let D = 160. Then z = 160/282.84 = .566. The table gives .7143 and .2857 as the areas of the two portions under the curve. These probabilities are rounded to two figures in table 2.11.
The table is actually built with standard deviation 200·(10/7) as an approximation for 200√2.
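Elo's worked example can be reproduced with a few lines of Python. This is only an illustrative sketch of the normal-model expectancy, not any organization's official implementation; the function name is ours.

```python
from statistics import NormalDist

def expectancy_normal(diff, sigma_prime=200 * 2 ** 0.5):
    """Expected score of the higher-rated player under Elo's normal model.

    diff: rating difference D (higher minus lower rating).
    sigma_prime: standard deviation of the difference of two performances,
                 i.e. 200 * sqrt(2) = 282.84 (the tables use 200*(10/7)).
    """
    z = diff / sigma_prime        # the rating difference expressed as a z-score
    return NormalDist().cdf(z)    # area under the normal curve

# D = 160 gives roughly .71 for the higher-rated and .29 for the lower-rated player.
p_high = expectancy_normal(160)
print(round(p_high, 4), round(1 - p_high, 4))
```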
The normal and logistic distributions are, in a way, arbitrary points in a spectrum of distributions which would work well. In practice, both of these distributions work very well for a number of different games.
Different ratings systems
editThe phrase "Elo rating" is often used to mean a player's chess rating as calculated by FIDE. However, this usage may be confusing or misleading because Elo's general ideas have been adopted by many organizations, including the USCF (before FIDE), many other national chess federations, the short-livedProfessional Chess Association(PCA), and online chess servers including theInternet Chess Club(ICC),Free Internet Chess Server(FICS),Lichess,Chess,andYahoo!Games. Each organization has a unique implementation, and none of them follows Elo's original suggestions precisely.
Instead one may refer to the organization granting the rating. For example: "As of April 2018, Tatev Abrahamyan had a FIDE rating of 2366 and a USCF rating of 2473." The Elo ratings of these various organizations are not always directly comparable, since Elo ratings measure the results within a closed pool of players rather than absolute skill.
FIDE ratings
For top players, the most important rating is their FIDE rating. FIDE has issued the following lists:
- From 1971 to 1980, one list a year was issued.
- From 1981 to 2000, two lists a year were issued, in January and July.
- From July 2000 to July 2009, four lists a year were issued, at the start of January, April, July and October.
- From July 2009 to July 2012, six lists a year were issued, at the start of January, March, May, July, September and November.
- Since July 2012, the list has been updated monthly.
The following analysis of the July 2015 FIDE rating list gives a rough impression of what a given FIDE rating means in terms of world ranking:
- 5,323 players had an active rating in the range 2200 to 2299, which is usually associated with the Candidate Master title.
- 2,869 players had an active rating in the range 2300 to 2399, which is usually associated with the FIDE Master title.
- 1,420 players had an active rating between 2400 and 2499, most of whom had either the International Master or the International Grandmaster title.
- 542 players had an active rating between 2500 and 2599, most of whom had the International Grandmaster title.
- 187 players had an active rating between 2600 and 2699, all of whom had the International Grandmaster title.
- 40 players had an active rating between 2700 and 2799.
- 4 players had an active rating of over 2800. (Magnus Carlsen was rated 2853, and 3 players were rated between 2814 and 2816.)
The highest ever FIDE rating was 2882, which Magnus Carlsen had on the May 2014 list. A list of the highest-rated players ever is at Comparison of top chess players throughout history.
Performance rating
| Fractional score p | Rating difference d_p |
| 1.00 | +800 |
| 0.99 | +677 |
| 0.9 | +366 |
| 0.8 | +240 |
| 0.7 | +149 |
| 0.6 | +72 |
| 0.5 | 0 |
| 0.4 | −72 |
| 0.3 | −149 |
| 0.2 | −240 |
| 0.1 | −366 |
| 0.01 | −677 |
| 0.00 | −800 |
Performance rating or special rating is a hypothetical rating that would result from the games of a single event only. Some chess organizations[16]: p. 8 use the "algorithm of 400" to calculate performance rating. According to this algorithm, performance rating for an event is calculated in the following way:
- For each win, add your opponent's rating plus 400,
- For each loss, add your opponent's rating minus 400,
- And divide this sum by the number of played games.
Example: 2 wins (opponents w & x), 2 losses (opponents y & z).
This can be expressed by the following formula:
performance rating = [(w + 400) + (x + 400) + (y − 400) + (z − 400)] / 4
Example: If you beat a player with an Elo rating of 1000,
performance rating = (1000 + 400) / 1 = 1400.
If you beat two players with Elo ratings of 1000,
performance rating = (1000 + 400 + 1000 + 400) / 2 = 1400.
If you draw,
performance rating = 1000 (a draw contributes the opponent's rating with no 400-point adjustment).
This is a simplification, but it offers an easy way to get an estimate of PR (performance rating).
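As a minimal sketch of the rule just described (the opponent ratings below are hypothetical):

```python
def performance_rating_400(opponent_ratings, wins, losses):
    """'Algorithm of 400': average the opponents' ratings after adding 400 per win
    and subtracting 400 per loss; draws contribute the opponent's rating unchanged."""
    games = len(opponent_ratings)
    return (sum(opponent_ratings) + 400 * (wins - losses)) / games

# Beating a single 1000-rated player: (1000 + 400) / 1 = 1400
print(performance_rating_400([1000], wins=1, losses=0))
# Two wins and two losses against hypothetical opponents w, x, y, z
print(performance_rating_400([1500, 1600, 1400, 1700], wins=2, losses=2))  # 1550.0
```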
FIDE, however, calculates performance rating by means of the formula
performance rating = (average of opponents' ratings) + d_p,
where the "rating difference" d_p is based on a player's tournament percentage score p, which is then used as the key in a lookup table (see the table above), where p is simply the number of points scored divided by the number of games played. Note that, in case of a perfect or no score, d_p is 800.
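A sketch of that lookup, using only the table rows reproduced above (the official FIDE table has an entry for every percentage point; the example tournament is hypothetical):

```python
# Excerpt of the rating-difference table d_p, keyed by fractional score p.
DP_TABLE = {1.00: 800, 0.99: 677, 0.9: 366, 0.8: 240, 0.7: 149, 0.6: 72,
            0.5: 0, 0.4: -72, 0.3: -149, 0.2: -240, 0.1: -366, 0.01: -677, 0.00: -800}

def fide_performance_rating(opponent_ratings, points):
    """Average of the opponents' ratings plus the tabulated difference d_p."""
    p = round(points / len(opponent_ratings), 2)
    average = sum(opponent_ratings) / len(opponent_ratings)
    return average + DP_TABLE[p]   # raises KeyError for a p not in this excerpt

# Hypothetical example: 4 points from 5 games (p = 0.8) against a field averaging 2000
print(fide_performance_rating([2000, 1950, 2050, 2000, 2000], points=4))  # 2000 + 240
```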
Live ratings
FIDE updates its ratings list at the beginning of each month. In contrast, the unofficial "Live ratings" calculate the change in players' ratings after every game. These Live ratings are based on the previously published FIDE ratings, so a player's Live rating is intended to correspond to what the FIDE rating would be if FIDE were to issue a new list that day.
Although Live ratings are unofficial, interest arose in Live ratings in August/September 2008 when five different players took the "Live" No. 1 ranking.[17]
The unofficial live ratings of players over 2700 were published and maintained by Hans Arild Runde at the Live Rating website until August 2011. Another website, 2700chess, has been maintained since May 2011 by Artiom Tsepotan; it covers the top 100 players as well as the top 50 female players.
Rating changes can be calculated manually by using the FIDE ratings change calculator.[18]All top players have a K-factor of 10, which means that the maximum ratings change from a single game is a little less than 10 points.
United States Chess Federation ratings
The United States Chess Federation (USCF) uses its own classification of players:[19]
- 2400 and above: Senior Master
- 2200–2399: National Master
- 2200–2399 plus 300 games above 2200: Original Life Master[20]
- 2000–2199: Expert or Candidate Master
- 1800–1999: Class A
- 1600–1799: Class B
- 1400–1599: Class C
- 1200–1399: Class D
- 1000–1199: Class E
- 800–999: Class F
- 600–799: Class G
- 400–599: Class H
- 200–399: Class I
- 100–199: Class J
The K-factor used by the USCF
The K-factor, in the USCF rating system, can be estimated by dividing 800 by the effective number of games a player's rating is based on (N_e) plus the number of games the player completed in a tournament (m); that is, K = 800 / (N_e + m).[21]
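A one-function sketch of that estimate (the inputs in the example are hypothetical):

```python
def uscf_k_factor(effective_games, tournament_games):
    """Approximate USCF K-factor: 800 divided by (N_e + m)."""
    return 800 / (effective_games + tournament_games)

# Hypothetical: a rating based on 20 effective games, entering a 5-round event
print(uscf_k_factor(20, 5))  # 32.0
```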
Rating floors
The USCF maintains an absolute rating floor of 100 for all ratings. Thus, no member can have a rating below 100, no matter their performance at USCF-sanctioned events. However, players can have higher individual absolute rating floors, calculated using the following formula:
AF = min(100 + 4·N_W + 2·N_D + N_R, 150),
where N_W is the number of rated games won, N_D is the number of rated games drawn, and N_R is the number of events in which the player completed three or more rated games.
Higher rating floors exist for experienced players who have achieved significant ratings. Such higher rating floors exist, starting at ratings of 1200 in 100-point increments up to 2100 (1200, 1300, 1400, ..., 2100). A rating floor is calculated by taking the player's peak established rating, subtracting 200 points, and then rounding down to the nearest rating floor. For example, a player who has reached a peak rating of 1464 would have a rating floor of 1464 − 200 = 1264, which would be rounded down to 1200. Under this scheme, only Class C players and above are capable of having a higher rating floor than their absolute player rating. All other players would have a floor of at most 150.
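A sketch of the standard floor calculation described above (the handling below 1200 and above 2100 follows this section's description; the function name is ours):

```python
def standard_rating_floor(peak_established_rating):
    """Peak rating minus 200, rounded down to the nearest 100-point floor,
    limited to the defined floors 1200..2100; below that range the individual
    absolute floor (at most 150) applies instead."""
    floor = (peak_established_rating - 200) // 100 * 100
    if floor < 1200:
        return None   # no higher floor from this scheme
    return min(floor, 2100)

print(standard_rating_floor(1464))  # 1200
print(standard_rating_floor(2350))  # 2100
```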
There are two ways to achieve higher rating floors other than under the standard scheme presented above. If a player has achieved the rating of Original Life Master, their rating floor is set at 2200. The achievement of this title is unique in that no other recognized USCF title will result in a new floor. For players with ratings below 2000, winning a cash prize of $2,000 or more raises that player's rating floor to the closest 100-point level that would have disqualified the player for participation in the tournament. For example, if a player won $4,000 in a 1750-and-under tournament, they would now have a rating floor of 1800.
Theory
Pairwise comparisons form the basis of the Elo rating methodology.[22] Elo made references to the papers of Good,[23] David,[24] Trawinski and David,[25] and Buhlman and Huber.[26]
Mathematical details
Performance is not measured absolutely; it is inferred from wins, losses, and draws against other players. Players' ratings depend on the ratings of their opponents and the results scored against them. The difference in rating between two players determines an estimate for the expected score between them. Both the average and the spread of ratings can be arbitrarily chosen. The USCF initially aimed for an average club player to have a rating of 1500, and Elo suggested scaling ratings so that a difference of 200 rating points in chess would mean that the stronger player has an expected score of approximately 0.75.
A player's expected score is their probability of winning plus half their probability of drawing. Thus, an expected score of 0.75 could represent a 75% chance of winning, 25% chance of losing, and 0% chance of drawing. On the other extreme it could represent a 50% chance of winning, 0% chance of losing, and 50% chance of drawing. The probability of drawing, as opposed to having a decisive result, is not specified in the Elo system. Instead, a draw is considered half a win and half a loss. In practice, since the true strength of each player is unknown, the expected scores are calculated using the player's current ratings as follows.
If player A has a rating of R_A and player B a rating of R_B, the exact formula (using the logistic curve with base 10)[27] for the expected score of player A is
E_A = 1 / (1 + 10^((R_B − R_A)/400)).
Similarly, the expected score for player B is
E_B = 1 / (1 + 10^((R_A − R_B)/400)).
This could also be expressed by
E_A = Q_A / (Q_A + Q_B)
and
E_B = Q_B / (Q_A + Q_B),
where Q_A = 10^(R_A/400) and Q_B = 10^(R_B/400). Note that in the latter case, the same denominator applies to both expressions, and it is plain that E_A + E_B = 1. This means that by studying only the numerators, we find out that the expected score for player A is Q_A/Q_B times the expected score for player B. It then follows that for each 400 rating points of advantage over the opponent, the expected score is magnified ten times in comparison to the opponent's expected score.
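A minimal sketch of this expected-score formula in Python (the function name is ours; the ratings come from the tournament example below):

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B (logistic curve, base 10, scale 400)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

e_a = expected_score(1613, 1477)
print(round(e_a, 2), round(1 - e_a, 2))  # ~0.69 for the 1613 player, ~0.31 for the 1477 player
```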
When a player's actual tournament scores exceed their expected scores, the Elo system takes this as evidence that player's rating is too low, and needs to be adjusted upward. Similarly, when a player's actual tournament scores fall short of their expected scores, that player's rating is adjusted downward. Elo's original suggestion, which is still widely used, was a simple linear adjustment proportional to the amount by which a player over-performed or under-performed their expected score. The maximum possible adjustment per game, called the K-factor, was set at K = 16 for masters and K = 32 for weaker players.
Suppose player A (again with rating R_A) was expected to score E_A points but actually scored S_A points. The formula for updating that player's rating is
R'_A = R_A + K·(S_A − E_A).
This update can be performed after each game or each tournament, or after any suitable rating period.
An example may help to clarify:
Suppose player A has a rating of 1613 and plays in a five-round tournament. They lose to a player rated 1609, draw with a player rated 1477, defeat a player rated 1388, defeat a player rated 1586, and lose to a player rated 1720. The player's actual score is (0 + 0.5 + 1 + 1 + 0) = 2.5. The expected score, calculated according to the formula above, was (0.51 + 0.69 + 0.79 + 0.54 + 0.35) = 2.88.
Therefore, the player's new rating is [1613 + 32·(2.5 − 2.88)] = 1601, assuming that a K-factor of 32 is used. Equivalently, each game the player can be said to have put an ante of K times their expected score for the game into a pot, the opposing player does likewise, and the winner collects the full pot of value K; in the event of a draw, the players split the pot and receive K/2 points each.
Note that while two wins, two losses, and one draw may seem like a par score, it is worse than expected for player A because their opponents were lower rated on average. Therefore, player A is slightly penalized. If player A had scored two wins, one loss, and two draws, for a total score of three points, that would have been slightly better than expected, and the player's new rating would have been [1613 + 32·(3 − 2.88)] = 1617.
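The whole worked example can be checked with a short script (restating the expected_score sketch above so the block is self-contained; note that summing the unrounded per-game expectancies gives about 2.87 rather than the 2.88 obtained from the rounded values):

```python
def expected_score(rating_a, rating_b):
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

K = 32
R_A = 1613
opponents = [1609, 1477, 1388, 1586, 1720]
scores = [0, 0.5, 1, 1, 0]

expected = sum(expected_score(R_A, r) for r in opponents)  # ~2.87 (2.88 with rounded values)
actual = sum(scores)                                       # 2.5
new_rating = R_A + K * (actual - expected)
print(round(expected, 2), round(new_rating))               # prints 2.87 and 1601
```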
This updating procedure is at the core of the ratings used by FIDE, USCF, Yahoo! Games, the Internet Chess Club (ICC) and the Free Internet Chess Server (FICS). However, each organization has taken a different approach to dealing with the uncertainty inherent in the ratings, particularly the ratings of newcomers, and to dealing with the problem of ratings inflation/deflation. New players are assigned provisional ratings, which are adjusted more drastically than established ratings.
The principles used in these rating systems can be used for rating other competitions, for instance, international football matches.
Elo ratings have also been applied to games without the possibility of draws, and to games in which the result can also have a quantity (small/big margin) in addition to the quality (win/loss). See Go rating with Elo for more.
Suggested modification
In 2011, after analyzing 1.5 million FIDE-rated games, Jeff Sonas demonstrated that, according to the Elo formula, two players having a rating difference of X actually have a true difference of around X·(5/6). Equivalently, one can leave the rating difference alone and divide by 480 instead of 400. Since the Elo formula overestimates the stronger player's win probability, stronger players are losing points against weaker players despite playing at their true strength. Likewise, weaker players gain points against stronger players. When the modification is applied, observed win rates deviate by less than 0.1% away from prediction, while traditional Elo can be 4% off the predicted rate.[28]
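A sketch of the two equivalent ways of stating Sonas's adjustment (shrink the rating difference by 5/6, or divide by 480 instead of 400):

```python
def expected_score_elo(diff, scale=400):
    """Expected score for a player rated `diff` points above the opponent."""
    return 1 / (1 + 10 ** (-diff / scale))

diff = 200
print(round(expected_score_elo(diff), 3))               # classic Elo: ~0.76
print(round(expected_score_elo(diff, scale=480), 3))    # Sonas's divisor: ~0.723
print(round(expected_score_elo(diff * 5 / 6), 3))       # same result via the shrunk difference
```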
Most accurate distribution model
The first mathematical concern addressed by the USCF was the use of the normal distribution. They found that this did not accurately represent the actual results achieved, particularly by the lower rated players. Instead they switched to a logistic distribution model, which the USCF found provided a better fit for the actual results achieved.[29] FIDE also uses an approximation to the logistic distribution.[14]
Most accurate K-factor
The second major concern is the correct "K-factor" used. The chess statistician Jeff Sonas believes that the original K = 10 value (for players rated above 2400) is inaccurate in Elo's work. If the K-factor coefficient is set too large, there will be too much sensitivity to just a few recent events, in terms of a large number of points exchanged in each game. And if the K-value is too low, the sensitivity will be minimal, and the system will not respond quickly enough to changes in a player's actual level of performance.
Elo's original K-factor estimation was made without the benefit of huge databases and statistical evidence. Sonas indicates that a K-factor of 24 (for players rated above 2400) may be both more accurate as a predictive tool of future performance and be more sensitive to performance.[30]
Certain Internet chess sites seem to avoid a three-level K-factor staggering based on rating range. For example, the ICC seems to adopt a global K = 32 except when playing against provisionally rated players.
The USCF (which makes use of a logistic distribution as opposed to a normal distribution) formerly staggered the K-factor according to three main rating ranges:
| K-factor | Used for players with ratings... |
| K = 32 | below 2100 |
| K = 24 | between 2100 and 2400 |
| K = 16 | above 2400 |
Currently, the USCF uses a formula that calculates the K-factor based on factors including the number of games played and the player's rating. The K-factor is also reduced for high-rated players if the event has shorter time controls.[16]
FIDE uses the following ranges:[31]
| K-factor | Used for players with ratings... |
| K = 40 | for a player new to the rating list until the completion of events with a total of 30 games, and for all players until their 18th birthday, as long as their rating remains under 2300 |
| K = 20 | for players who have always been rated under 2400 |
| K = 10 | for players with any published rating of at least 2400 and at least 30 games played in previous events; thereafter it remains permanently at 10 |
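The current FIDE ranges can be sketched as a small selection function (a simplified reading of the table above; the function name and parameters are ours):

```python
def fide_k_factor(games_played, age, rating, ever_rated_2400):
    """Simplified FIDE K-factor selection following the table above."""
    if games_played < 30 or (age < 18 and rating < 2300):
        return 40   # newcomers, and juniors still under 2300
    if not ever_rated_2400:
        return 20   # players who have always been rated under 2400
    return 10       # permanently 10 once a published rating of 2400+ is reached

print(fide_k_factor(games_played=12, age=25, rating=1900, ever_rated_2400=False))  # 40
print(fide_k_factor(games_played=200, age=30, rating=2450, ever_rated_2400=True))  # 10
```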
FIDE used the following ranges before July 2014:[32]
| K-factor | Used for players with ratings... |
| K = 30 (was 25) | for a player new to the rating list until the completion of events with a total of 30 games[33] |
| K = 15 | for players who have always been rated under 2400 |
| K = 10 | for players with any published rating of at least 2400 and at least 30 games played in previous events; thereafter it remains permanently at 10 |
The gradation of the K-factor reduces rating change at the top end of the rating range, reducing the possibility for rapid rise or fall of rating for those with a rating high enough to reach a low K-factor.
In theory, this might apply equally to online chess players and over-the-board players, since it is more difficult for all players to raise their rating after their rating has become high and their K-factor consequently reduced. However, when playing online, 2800+ players can more easily raise their rating by simply selecting opponents with high ratings – on the ICC playing site, a grandmaster may play a string of different opponents who are all rated over 2700.[34] In over-the-board events, it would only be in very high level all-play-all events that a player would be able to engage that number of 2700+ opponents. In a normal, open, Swiss-paired chess tournament, frequently there would be many opponents rated less than 2500, reducing the ratings gains possible from a single contest for a high-rated player.
Formal derivation for win/loss games
The above expressions can now be formally derived by exploiting the link between the Elo rating and the stochastic gradient update in logistic regression.[35][36]
If we assume that the game results are binary, that is, only a win or a loss can be observed, the problem can be addressed via logistic regression, where the game results are dependent variables, the players' ratings are independent variables, and the model relating both is probabilistic: the probability of player A winning the game is modeled as
Pr{A wins} = σ(r_{A,B}), with σ(x) = 1 / (1 + 10^(−x/s)),
where
r_{A,B} = r_A − r_B
denotes the difference of the players' ratings, and we use a scaling factor s = 400, and, by the law of total probability,
Pr{B wins} = 1 − σ(r_{A,B}) = σ(−r_{A,B}).
The log loss is then calculated as
ℓ = −log σ(r_{A,B}) if A wins, and ℓ = −log σ(−r_{A,B}) if B wins,
and, using stochastic gradient descent, the log loss is minimized as follows:
- r_A ← r_A − η·dℓ/dr_A,
- r_B ← r_B − η·dℓ/dr_B,
where η is the adaptation step.
Since d log σ(r)/dr = (ln 10/s)·σ(−r), d log σ(−r)/dr = −(ln 10/s)·σ(r), and dr_{A,B}/dr_A = 1, the adaptation is then written as follows:
r_A ← r_A + η·(ln 10/s)·(1 − σ(r_{A,B})) if A wins, and r_A ← r_A − η·(ln 10/s)·σ(r_{A,B}) if B wins,
which may be compactly written as
r_A ← r_A + K·(S_A − E_A),
where K = η·ln 10/s is the new adaptation step which absorbs η and ln 10/s, S_A = 1 if A wins and S_A = 0 if B wins, and the expected score is given by E_A = σ(r_{A,B}).
Analogously, the update for the rating r_B is
- r_B ← r_B + K·(S_B − E_B), with S_B = 1 − S_A and E_B = 1 − E_A.
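A sketch of this equivalence in Python: one stochastic-gradient step on the log loss, with the learning rate chosen so that K = η·ln 10/s = 32, reproduces the usual Elo update (the symbols follow the reconstruction above; the function names are ours).

```python
import math

S = 400                       # rating scale s
ETA = 32 * S / math.log(10)   # learning rate chosen so that K = eta * ln(10) / s = 32

def sigma(x, s=S):
    """Logistic curve with base 10: 1 / (1 + 10^(-x/s))."""
    return 1 / (1 + 10 ** (-x / s))

def sgd_elo_step(r_a, r_b, a_wins, eta=ETA, s=S):
    """One stochastic-gradient step on the log loss of the win/loss model."""
    e_a = sigma(r_a - r_b, s)                  # expected score of A
    s_a = 1.0 if a_wins else 0.0
    grad_a = -(s_a - e_a) * math.log(10) / s   # d(log loss)/d r_A; d/d r_B has opposite sign
    return r_a - eta * grad_a, r_b + eta * grad_a

# Equivalent to r_A <- r_A + 32*(1 - E_A): A gains about 6.9 points here.
print(sgd_elo_step(1613, 1388, a_wins=True))
```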
Formal derivation for win/draw/loss games
Since the very beginning, the Elo rating has also been used in chess, where we observe wins, losses or draws, and, to deal with the latter, a fractional score value, S_A = 0.5, is introduced. We note, however, that the scores S_A = 1 and S_A = 0 are merely indicators of the events in which player A wins or loses the game. It is, therefore, not immediately clear what the meaning of the fractional score is. Moreover, since we do not specify explicitly the model relating the rating values r_A and r_B to the probability of the game outcome, we cannot say what the probability of the win, the loss, or the draw is.
To address these difficulties, and to derive the Elo rating in the ternary games, we will define the explicit probabilistic model of the outcomes. Next, we will minimize the log loss via stochastic gradient.
Since the loss, the draw, and the win are ordinal variables, we should adopt a model which takes their ordinal nature into account, and we use the so-called adjacent categories model, which may be traced to Davidson's work[37]
Pr{A wins} = 10^(r_{A,B}/(2s)) / D,
Pr{A loses} = 10^(−r_{A,B}/(2s)) / D,
Pr{A draws} = κ / D,
where
D = 10^(r_{A,B}/(2s)) + κ + 10^(−r_{A,B}/(2s)),
and κ ≥ 0 is a parameter. Introduction of a free parameter should not be surprising, as we have three possible outcomes and thus an additional degree of freedom should appear in the model. In particular, with κ = 0 we recover the model underlying the logistic regression,
Pr{A wins} = 10^(r_{A,B}/(2s)) / (10^(r_{A,B}/(2s)) + 10^(−r_{A,B}/(2s))) = 1 / (1 + 10^(−r_{A,B}/s)) = σ(r_{A,B}),
where σ(·), r_{A,B} and s are defined as in the previous section.
Using the ordinal model defined above, the log loss is now calculated as
ℓ = −log Pr{A wins} if A wins, ℓ = −log Pr{A loses} if A loses, and ℓ = −log Pr{A draws} if A draws,
which may be compactly written as
ℓ = −⟨S_A = 1⟩·log Pr{A wins} − ⟨S_A = 0⟩·log Pr{A loses} − ⟨S_A = 0.5⟩·log Pr{A draws},
where the indicator ⟨S_A = 1⟩ equals 1 iff A wins, ⟨S_A = 0⟩ equals 1 iff B wins, and ⟨S_A = 0.5⟩ equals 1 iff A and B draw (each indicator being zero otherwise).
As before, we need the derivative of log D, which is given by
- d log D / dr_A = (ln 10/(2s))·(2·g(r_{A,B}) − 1),
where
g(r) = (10^(r/(2s)) + κ/2) / (10^(r/(2s)) + κ + 10^(−r/(2s))).
Thus, the derivative of the log loss with respect to the rating r_A is given by
dℓ/dr_A = −(ln 10/s)·(S_A − g(r_{A,B})),
where we used the relationships g(r_{A,B}) + g(−r_{A,B}) = 1 and dr_{A,B}/dr_A = 1.
Then, the stochastic gradient descent applied to minimize the log loss yields the following update for the rating r_A
- r_A ← r_A + K·(S_A − g(r_{A,B})),
where K = η·ln 10/s and E_A = g(r_{A,B}). Of course, S_A = 1 if A wins, S_A = 0.5 if A draws, and S_A = 0 if A loses. To recognize the origin in the model proposed by Davidson, this update is called an Elo-Davidson rating.[36]
The update for r_B is derived in the same manner as
- r_B ← r_B + K·(S_B − g(r_{B,A})),
where r_{B,A} = r_B − r_A = −r_{A,B} and S_B = 1 − S_A.
We note that
g(r_{A,B}) + g(−r_{A,B}) = 1,
and thus we obtain that the rating update may be written as
- r_B ← r_B + K·(S_B − E_B),
where E_B = g(−r_{A,B}) = 1 − E_A, and we obtain practically the same equation as in the Elo rating, except that the expected score is given by E_A = g(r_{A,B}) instead of E_A = σ(r_{A,B}).
Of course, as noted above, for κ = 0 we have g(r_{A,B}) = σ(r_{A,B}), and thus the Elo-Davidson rating is exactly the same as the Elo rating. However, this is of no help in understanding the case when draws are observed (we cannot use κ = 0, which would mean that the probability of a draw is null). On the other hand, if we use κ = 2, we have
g(r_{A,B}) = (10^(r_{A,B}/(2s)) + 1) / (10^(r_{A,B}/(2s)) + 2 + 10^(−r_{A,B}/(2s))) = 1 / (1 + 10^(−r_{A,B}/(2s))),
which means that, using κ = 2 together with a halved scale (s = 200 in place of s = 400), the Elo-Davidson rating is exactly the same as the Elo rating.[36]
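A sketch of the Elo-Davidson expectancy and update, following the reconstruction above (the parameter name kappa and the function names are ours, and the example ratings are hypothetical):

```python
def elo_davidson_expected(rating_diff, kappa, s=400):
    """Expected score g(r) = (10^(r/2s) + kappa/2) / (10^(r/2s) + kappa + 10^(-r/2s))."""
    f = 10 ** (rating_diff / (2 * s))
    return (f + kappa / 2) / (f + kappa + 1 / f)

def elo_davidson_update(r_a, r_b, score_a, k=10, kappa=2):
    """One Elo-Davidson step: r_A <- r_A + K*(S_A - E_A), and symmetrically for B."""
    e_a = elo_davidson_expected(r_a - r_b, kappa)
    return r_a + k * (score_a - e_a), r_b + k * ((1 - score_a) - (1 - e_a))

# kappa = 0 reproduces the Elo expectancy; a draw moves the higher-rated player down.
print(round(elo_davidson_expected(200, kappa=0), 3))   # ~0.76, as in the Elo expectancy
print(elo_davidson_update(2400, 2200, score_a=0.5))    # higher-rated player loses points
```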
Practical issues
editGame activity versus protecting one's rating
In some cases the rating system can discourage game activity for players who wish to protect their rating.[38] In order to discourage players from sitting on a high rating, a 2012 proposal by British Grandmaster John Nunn for choosing qualifiers to the chess world championship included an activity bonus, to be combined with the rating.[39]
Beyond the chess world, concerns over players avoiding competitive play to protect their ratings caused Wizards of the Coast to abandon the Elo system for Magic: The Gathering tournaments in favour of a system of their own devising called "Planeswalker Points".[40][41]
Selective pairing
A more subtle issue is related to pairing. When players can choose their own opponents, they can choose opponents with minimal risk of losing, and maximum reward for winning. Particular examples of players rated 2800+ choosing opponents with minimal risk and maximum possibility of rating gain include: choosing opponents that they know they can beat with a certain strategy; choosing opponents that they think are overrated; or avoiding playing strong players who are rated several hundred points below them, but may hold chess titles such as IM or GM. In the category of choosing overrated opponents, new entrants to the rating system who have played fewer than 50 games are in theory a convenient target as they may be overrated in their provisional rating. The ICC compensates for this issue by assigning a lower K-factor to the established player if they do win against a new rating entrant. The K-factor is actually a function of the number of rated games played by the new entrant.
Therefore, Elo ratings online still provide a useful mechanism for providing a rating based on the opponent's rating. Its overall credibility, however, needs to be seen in the context of at least the above two major issues described—engine abuse, and selective pairing of opponents.
The ICC has also introduced "auto-pairing" ratings which are based on random pairings, but with each win in a row ensuring a statistically much harder opponent who has also won x games in a row. With potentially hundreds of players involved, this creates some of the challenges of a major large Swiss event which is being fiercely contested, with round winners meeting round winners. This approach to pairing certainly maximizes the rating risk of the higher-rated participants, who may face very stiff opposition from players below 3000, for example. This is a separate rating in itself, and is under "1-minute" and "5-minute" rating categories. Maximum ratings achieved over 2500 are exceptionally rare.
Ratings inflation and deflation
editThe term "inflation", applied to ratings, is meant to suggest that the level of playing strength demonstrated by the rated player is decreasing over time; conversely, "deflation" suggests that the level is advancing. For example, if there is inflation, a modern rating of 2500 means less than a historical rating of 2500, while the reverse is true if there is deflation. Using ratings to compare players between different eras is made more difficult when inflation or deflation are present. (See alsoComparison of top chess players throughout history.)
Analyzing FIDE rating lists over time, Jeff Sonas suggests that inflation may have taken place since about 1985.[42] Sonas looks at the highest-rated players, rather than all rated players, and acknowledges that the changes in the distribution of ratings could have been caused by an increase of the standard of play at the highest levels, but looks for other causes as well.
The number of people with ratings over 2700 has increased. Around 1979 there was only one active player (Anatoly Karpov) with a rating this high. In 1992 Viswanathan Anand was only the 8th player in chess history to reach the 2700 mark.[43] This increased to 15 players by 1994. 33 players had a 2700+ rating in 2009 and 44 as of September 2012. Only 14 players have ever broken a rating of 2800.
One possible cause for this inflation was the rating floor, which for a long time was at 2200, and if a player dropped below this they were struck from the rating list. As a consequence, players at a skill level just below the floor would only be on the rating list if they were overrated, and this would cause them to feed points into the rating pool.[42]In July 2000 the average rating of the top 100 was 2644. By July 2012 it had increased to 2703.[43]
Using a strong chess engine to evaluate moves played in games between rated players, Regan and Haworth analyze sets of games from FIDE-rated tournaments, and draw the conclusion that there had been little or no inflation from 1976 to 2009.[44]
In a pure Elo system, each game ends in an equal transaction of rating points. If the winner gains N rating points, the loser will drop by N rating points. This prevents points from entering or leaving the system when games are played and rated. However, players tend to enter the system as novices with a low rating and retire from the system as experienced players with a high rating. Therefore, in the long run a system with strictly equal transactions tends to result in rating deflation.[45]
In 1995, the USCF acknowledged that several young scholastic players were improving faster than the rating system was able to track. As a result, established players with stable ratings started to lose rating points to the young and underrated players. Several of the older established players were frustrated over what they considered an unfair rating decline, and some even quit chess over it.[46]
Combating deflation
Because of the significant difference in the timing of when inflation and deflation occur, and in order to combat deflation, most implementations of Elo ratings have a mechanism for injecting points into the system in order to maintain relative ratings over time. FIDE has two inflationary mechanisms. First, performances below a "ratings floor" are not tracked, so a player with true skill below the floor can only be unrated or overrated, never correctly rated. Second, established and higher-rated players have a lower K-factor. New players have a K = 40, which drops to K = 20 after 30 played games, and to K = 10 when the player reaches 2400.[31]
The current system in the United States includes a bonus point scheme which feeds rating points into the system in order to track improving players, and different K-values for different players.[46] Some methods, used in Norway for example, differentiate between juniors and seniors, and use a larger K-factor for the young players, even boosting the rating progress by 100% when they score well above their predicted performance.[47]
Rating floors in the United States work by guaranteeing that a player will never drop below a certain limit. This also combats deflation, but the chairman of the USCF Ratings Committee has been critical of this method because it does not feed the extra points to the improving players. A possible motive for these rating floors is to combat sandbagging, i.e., deliberate lowering of ratings to be eligible for lower rating class sections and prizes.[46]
Ratings of computers
Human–computer chess matches between 1997 (Deep Blue versus Garry Kasparov) and 2006 demonstrated that chess computers are capable of defeating even the strongest human players. However, chess engine ratings are difficult to quantify, due to variable factors such as the time control and the hardware the program runs on, and also the fact that chess is not a fair game. The existence and magnitude of the first-move advantage in chess becomes very important at the computer level. Beyond some skill threshold, an engine with White should be able to force a draw on demand from the starting position even against perfect play, simply because White begins with too big an advantage to lose compared to the small magnitude of the errors it is likely to make. Consequently, such an engine is more or less guaranteed to score at least 25% even against perfect play. Differences in skill beyond a certain point could only be picked up if one does not begin from the usual starting position, but instead chooses a starting position that is only barely not lost for one side. Because of these factors, ratings depend on pairings and the openings selected.[48] Published engine rating lists such as CCRL are based on engine-only games on standard hardware configurations and are not directly comparable to FIDE ratings.
For some ratings estimates, see Chess engine § Ratings.
Use outside of chess
editOther board and card games
- Go: The European Go Federation adopted an Elo-based rating system initially pioneered by the Czech Go Federation.
- Backgammon: The popular First Internet Backgammon Server (FIBS) calculates ratings based on a modified Elo system. New players are assigned a rating of 1500, with the best humans and bots rating over 2000. The same formula has been adopted by several other backgammon sites, such as Play65, DailyGammon, GoldToken and VogClub. VogClub sets a new player's rating at 1600. The UK Backgammon Federation uses the FIBS formula for its UK national ratings.[49]
- Scrabble: National Scrabble organizations compute normally distributed Elo ratings except in the United Kingdom, where a different system is used. The North American Scrabble Players Association has the largest rated population of active members, numbering about 2,000 as of early 2011. Lexulous also uses the Elo system.
- Despite questions of the appropriateness of using the Elo system to rate games in which luck is a factor, trading-card game manufacturers often use Elo ratings for their organized play efforts. The DCI (formerly Duelists' Convocation International) used Elo ratings for tournaments of Magic: The Gathering and other Wizards of the Coast games. However, the DCI abandoned this system in 2012 in favor of a new cumulative system of "Planeswalker Points", chiefly because of the above-noted concern that Elo encourages highly rated players to avoid playing to "protect their rating".[40][41] Pokémon USA uses the Elo system to rank its TCG organized play competitors.[50] Prizes for the top players in various regions included holidays and world championships invites until the 2011–2012 season, when awards began to be based on a system of Championship Points, the rationale being the same as the DCI's for Magic: The Gathering. Similarly, Decipher, Inc. used the Elo system for its ranked games such as Star Trek Customizable Card Game and Star Wars Customizable Card Game.
Athletic sports
The Elo rating system is used in the chess portion of chessboxing. In order to be eligible for professional chessboxing, one must have an Elo rating of at least 1600, as well as having competed in 50 or more matches of amateur boxing or martial arts.
American college football used the Elo method as a portion of its Bowl Championship Series rating systems from 1998 to 2013, after which the BCS was replaced by the College Football Playoff. Jeff Sagarin of USA Today publishes team rankings for most American sports, which include Elo system ratings for college football. The use of rating systems was effectively scrapped with the creation of the College Football Playoff in 2014.
In other sports, individuals maintain rankings based on the Elo algorithm. These are usually unofficial, not endorsed by the sport's governing body. The World Football Elo Ratings is an example of the method applied to men's football.[51] In 2006, Elo ratings were adapted for Major League Baseball teams by Nate Silver, then of Baseball Prospectus.[52] Based on this adaptation, both also made Elo-based Monte Carlo simulations of the odds of whether teams will make the playoffs.[53] In 2014, Beyond the Box Score, an SB Nation site, introduced an Elo ranking system for international baseball.[54]
In tennis, the Elo-based Universal Tennis Rating (UTR) rates players on a global scale, regardless of age, gender, or nationality. It is the official rating system of major organizations such as the Intercollegiate Tennis Association and World TeamTennis and is frequently used in segments on the Tennis Channel. The algorithm analyzes more than 8 million match results from over 800,000 tennis players worldwide. On May 8, 2018, Rafael Nadal, having won 46 consecutive sets in clay court matches, had a near-perfect clay UTR of 16.42.[55]
In pool, an Elo-based system called Fargo Rate is used to rank players in organized amateur and professional competitions.[56]
One of the few Elo-based rankings endorsed by a sport's governing body is the FIFA Women's World Rankings, based on a simplified version of the Elo algorithm, which FIFA uses as its official ranking system for national teams in women's football.
From the first ranking list after the 2018 FIFA World Cup, FIDE has been replaced as the sole user of such methods in football; FIFA has used Elo for their FIFA World Rankings.[57]
In 2015, Nate Silver, editor-in-chief of the statistical commentary website FiveThirtyEight, and Reuben Fischer-Baum produced Elo ratings for every National Basketball Association team and season through the 2014 season.[58][59] In 2014 FiveThirtyEight created Elo-based ratings and win-projections for the American professional National Football League.[60]
The English Korfball Association rated teams based on Elo ratings, to determine handicaps for their cup competition for the 2011/12 season.
An Elo-based ranking of National Hockey League players has been developed.[61] The hockey-Elo metric evaluates a player's overall two-way play: scoring and defense in both even strength and power-play/penalty-kill situations.
Rugby league ratings use the Elo rating system to rank international and club rugby league teams.
Hemaratings was started in 2017 and uses a Glicko-2 algorithm to rank individual Historical European martial arts fencers worldwide in different categories such as Longsword, Rapier, historical Sabre, and Sword & Buckler.[62]
Video games and online games
Many video games use modified Elo systems in competitive gameplay. The MOBA game League of Legends used an Elo rating system prior to the second season of competitive play.[63] The esports game Overwatch, the basis of the unique Overwatch League professional sports organization, uses a derivative of the Elo system to rank competitive players, with various adjustments made between competitive seasons.[64] World of Warcraft also previously used the Glicko-2 system to team up and compare Arena players, but now uses a system similar to Microsoft's TrueSkill.[65] The game Puzzle Pirates uses the Elo rating system to determine the standings in the various puzzles. This system is also used in FIFA Mobile for the Division Rivals modes. Another recent game to start using the Elo rating system is AirMech, using Elo ratings for 1v1, 2v2, and 3v3 random/team matchmaking.[66] RuneScape 3 used the Elo system in the rerelease of the bounty hunter minigame in 2016.[67] Mechwarrior Online instituted an Elo system for its new "Comp Queue" mode, effective with the Jun 20, 2017 patch.[68] Age of Empires II DE and Age of Empires III DE use the Elo system for their leaderboards and matchmaking, with new players starting at Elo 1000.[69] Competitive Classic Tetris (Tetris played on the Nintendo Entertainment System) derives its ratings using a combination of players' personal best scores and a highly modified Elo system.[70]
Few video games use the original Elo rating system. According to Lichess, an online chess server, the Elo system is outdated, with Glicko-2 now being used by many chess organizations.[71] PlayerUnknown's Battlegrounds is one of the few video games that utilizes the very first Elo system. In Guild Wars, Elo ratings are used to record guild rating gained and lost through guild-versus-guild battles. In 1998, an online gaming ladder called Clanbase[72] was launched, which used the Elo scoring system to rank teams. The initial K-value was 30, but was changed to 5 in January 2007, then changed to 15 in July 2009.[73] The site later went offline in 2013.[74] A similar alternative site was launched in 2016 under the name Scrimbase,[75] which also used the Elo scoring system for ranking teams. Since 2005, Golden Tee Live has rated players based on the Elo system. New players start at 2100, with top players rating over 3000.[76]
Despite many video games using different systems for matchmaking, it is common for players of ranked video games to refer to all matchmaking ratings as Elo.
Other usage
The Elo rating system has been used in soft biometrics,[77] which concerns the identification of individuals using human descriptions. Comparative descriptions were utilized alongside the Elo rating system to provide robust and discriminative 'relative measurements', permitting accurate identification.
The Elo rating system has also been used in biology for assessing male dominance hierarchies,[78] and in automation and computer vision for fabric inspection.[79]
Moreover, online judge sites also use the Elo rating system or its derivatives. For example, Topcoder uses a modified version based on the normal distribution,[80] while Codeforces uses another version based on the logistic distribution.[81][82][83]
The Elo rating system has also been noted in dating apps, such as in the matchmaking app Tinder, which uses a variant of the Elo rating system.[84]
The YouTuber Marques Brownlee and his team used the Elo rating system when they let people vote between digital photos taken with different smartphone models launched in 2022.[85]
The Elo rating system has also been used in U.S. revealed preference college rankings, such as those by the digital credential firm Parchment.[86][87][88]
The Elo rating system has also been adopted to evaluate AI models. In 2021, Anthropic utilized the Elo system for ranking AI models in their research.[89] The LMSYS leaderboard briefly employed the Elo rating system to rank AI models[90] before transitioning to the Bradley–Terry model.[91]
References in the media
The Elo rating system was featured prominently in the 2010 film The Social Network during the algorithm scene where Mark Zuckerberg released Facemash. In the scene, Eduardo Saverin writes mathematical formulas for the Elo rating system on Zuckerberg's dormitory room window. Behind the scenes, the movie claims, the Elo system is employed to rank girls by their attractiveness. The equations driving the algorithm are shown briefly, written on the window;[92] however, they are slightly incorrect.
See also
- Elo hell
- Rating percentage index (RPI), another system that incorporates strength of opponents
References
Notes
- ^ab Elo, Arpad E. (August 1967). "The Proposed USCF Rating System, Its Development, Theory, and Applications" (PDF). Chess Life. XXII (8): 242–247.
- ^ Using the formula 100% / (1 + 10^(−D/400)) for D equal to 100 or 200.
- ^Elo-MMR: A Rating System for Massive Multiplayer Competitions
- ^Redman, Tim (July 2002)."Remembering Richard, Part II"(PDF).Illinois Chess Bulletin.Archived(PDF)from the original on 2020-06-30.Retrieved2020-06-30.
- ^Elo, Arpad E. (March 5, 1960)."The USCF Rating System"(PDF).Chess Life.XIV(13).USCF:2.
- ^Elo 1986, p. 4
- ^Elo, Arpad E. (June 1961)."The USCF Rating System - A Scientific Achievement"(PDF).Chess Life.XVI(6).USCF:160–161.
- ^"About the USCF".United States Chess Federation.Archivedfrom the original on 2008-09-26.Retrieved2008-11-10.
- ^Elo 1986, Preface to the First Edition
- ^Elo 1986.
- ^Elo 1986, ch. 8.73.
- ^Glickman, Mark E., and Jones, Albyn C.,"Rating the chess rating system"(1999), Chance, 12, 2, 21-28.
- ^Glickman, Mark E. (1995),"A Comprehensive Guide to Chess Ratings". A subsequent version of this paper appeared in theAmerican Chess Journal,3, pp. 59–102.
- ^abFIDE Rating Regulations effective from 1 July 2017.FIDE Online (fide )(Report).FIDE.Archivedfrom the original on 2019-11-27.Retrieved2017-09-09.
- ^Elo 1986, p159.
- ^abThe US Chess Rating system(PDF)(Report). April 24, 2017.Archived(PDF)from the original on 7 February 2020.Retrieved16 February2020– via glicko.net.
- ^Anand lost No. 1 to Morozevich (Chessbase, August 24 2008Archived2008-09-10 at theWayback Machine), then regained it, then Carlsen took No. 1 (Chessbase, September 5 2008Archived2012-11-09 at theWayback Machine), then Ivanchuk (Chessbase, September 11 2008Archived2008-09-13 at theWayback Machine), and finally Topalov (Chessbase, September 13 2008Archived2008-09-15 at theWayback Machine)
- ^Administrator."FIDE Chess Rating calculators: Chess Rating change calculator".ratings.fide.Archivedfrom the original on 2017-09-28.Retrieved2017-09-28.
- ^US Chess FederationArchived2012-06-18 at theWayback Machine
- ^USCF Glossary Quote: "a player who competes in over 300 games with a rating over 2200"Archived2013-03-08 at theWayback Machinefrom The United States Chess Federation
- ^"Approximating Formulas for the US Chess Rating System"Archived2019-11-04 at theWayback Machine,United States Chess Federation,Mark Glickman, April 2017
- ^Elo 1986, ch. 1.12.
- ^Good, I.J. (1955). "On the Marking of Chessplayers".The Mathematical Gazette.39(330):292–296.doi:10.2307/3608567.JSTOR3608567.S2CID158885108.
- ^David, H. A. (1959). "Tournaments and Paired Comparisons".Biometrika.46(1/2):139–149.doi:10.2307/2332816.JSTOR2332816.
- ^Trawinski, B.J.; David, H.A. (1963)."Selection of the Best Treatment in a Paired-Comparison Experiment".Annals of Mathematical Statistics.34(1):75–91.doi:10.1214/aoms/1177704243.
- ^Buhlmann, Hans; Huber, Peter J. (1963)."Pairwise Comparison and Ranking in Tournaments".The Annals of Mathematical Statistics.34(2):501–510.doi:10.1214/aoms/1177704161.
- ^Elo 1986, p. 141, ch. 8.4& Logistic probability as a rating basis
- ^"The Elo rating system – correcting the expectancy tables".30 March 2011.
- ^Elo 1986, ch. 8.73
- ^A key Sonas article isSonas, Jeff."The Sonas rating formula — better than Elo?".chessbase.Archivedfrom the original on 2005-03-05.Retrieved2005-05-01.
- ^abFIDE Rating Regulations effective from 1 July 2014.FIDE Online (fide )(Report).FIDE.2014-07-01.Archivedfrom the original on 2014-07-01.Retrieved2014-07-01.
- ^FIDE Rating Regulations valid from 1 July 2013 till 1 July 2014.FIDE Online (fide )(Report). 2013-07-01.Archivedfrom the original on 2014-07-15.Retrieved2014-07-01.
- ^"Changes to Rating Regulations".FIDE Online (fide )(Press release).FIDE.2011-07-21. Archived fromthe originalon 2012-05-13.Retrieved2012-02-19.
- ^"K-factor ".Chessclub.ICC Help. 2002-10-18. Archived fromthe originalon 2012-03-13.Retrieved2012-02-19.
- ^Kiraly, F.; Qian, Z. (2017). "Modelling Competitive Sports: Bradley-Terry-Elo Models for Supervised and On-Line Learning of Paired Competition Outcomes".arXiv:1701.08055[stat.ML].
- ^abcSzczecinski, Leszek; Djebbi, Aymen (2020-09-01)."Understanding draws in Elo rating algorithm".Journal of Quantitative Analysis in Sports.16(3):211–220.doi:10.1515/jqas-2019-0102.ISSN1559-0410.S2CID219784913.
- ^Davidson, Roger R. (1970)."On Extending the Bradley-Terry Model to Accommodate Ties in Paired Comparison Experiments".Journal of the American Statistical Association.65(329):317–328.doi:10.2307/2283595.ISSN0162-1459.JSTOR2283595.
- ^A Parent's Guide to ChessArchived2008-05-28 at theWayback MachineSkittles,Don Heisman, Chesscafe, August 4, 2002
- ^"Chess News – The Nunn Plan for the World Chess Championship".ChessBase. 8 June 2005.Archivedfrom the original on 2011-11-19.Retrieved2012-02-19.
- ^ab"Introducing Planeswalker Points".September 6, 2011. Archived fromthe originalon September 30, 2011.RetrievedSeptember 9,2011.
- ^ab"Getting to the Points".September 9, 2011.Archivedfrom the original on October 18, 2016.RetrievedSeptember 9,2011.
- ^abJeff Sonas (27 July 2009)."Rating inflation – its causes and possible cures".chessbase.Archivedfrom the original on 23 November 2013.Retrieved27 August2009.
- ^ab"Viswanathan Anand".Chessgames.Archivedfrom the original on 2013-03-28.Retrieved2012-08-14.
- ^Regan, Kenneth; Haworth, Guy (2011-08-04)."Intrinsic Chess Ratings".Proceedings of the AAAI Conference on Artificial Intelligence.25(1):834–839.doi:10.1609/aaai.v25i1.7951.ISSN2374-3468.S2CID15489049.Archivedfrom the original on 2021-04-20.Retrieved2021-09-01.
- ^Bergersen, Per A."ELO-SYSTEMET"(in Norwegian). Norwegian Chess Federation. Archived fromthe originalon 8 March 2013.Retrieved21 October2013.
- ^abcA conversation with Mark Glickman[1]Archived2011-08-07 at theWayback Machine,Published inChess LifeOctober 2006 issue
- ^"Elo-systemet".Norges Sjakkforbund.Archived fromthe originalon December 5, 2013.Retrieved2009-08-23.
- ^Larry Kaufman, Chess Board Options (2021), p. 179
- ^"Backgammon Ratings Explained".results.ukbgf.Archived fromthe originalon 2019-11-14.Retrieved2020-06-01.
- ^"Play! Pokémon Glossary: Elo".Archivedfrom the original on January 15, 2015.RetrievedJanuary 15,2015.
- ^Lyons, Keith (10 June 2014)."What are the World Football Elo Ratings?".The Conversation.Archivedfrom the original on 15 June 2019.Retrieved3 July2019.
- ^Silver, Nate(2006-06-28)."Lies, Damned Lies: We are Elo?".Archived fromthe originalon 2006-08-22.Retrieved2023-01-13.
- ^"Postseason Odds, ELO version".Baseballprospectus.Archivedfrom the original on 2012-03-07.Retrieved2012-02-19.
- ^Cole, Bryan (August 15, 2014)."Elo rankings for international baseball".Beyond the Box Score.SB Nation.Archivedfrom the original on 2 January 2016.Retrieved4 November2015.
- ^"Is Rafa the GOAT of Clay?".8 May 2018.Archivedfrom the original on 27 February 2021.Retrieved22 August2018.
- ^"Fargo Rate".Retrieved31 March2022.
- ^"Revision of the FIFA/Coca-Cola World Ranking"(PDF).FIFA. June 2018. Archived fromthe original(PDF)on 2018-06-12.Retrieved2020-06-30.
- ^Silver, Nate; Fischer-Baum, Reuben (May 21, 2015)."How We Calculate NBA Elo Ratings".FiveThirtyEight.Archived fromthe originalon 2015-05-23.
- ^Reuben Fischer-Baum and Nate Silver, "The Complete History of the NBA,"FiveThirtyEight,May 21, 2015.[2]Archived2015-05-23 at theWayback Machine
- ^Silver, Nate (September 4, 2014)."Introducing NFL Elo Ratings".FiveThirtyEight. Archived fromthe originalon September 12, 2015.Paine, Neil (September 10, 2015)."NFL Elo Ratings Are Back".FiveThirtyEight. Archived fromthe originalon September 11, 2015..
- ^"Hockey Stats Revolution – How do teams pick players?".Hockey Stats Revolution.Archivedfrom the original on 2016-10-02.Retrieved2016-09-29.
- ^"About the Ratings - Hema Ratings".Hemaratings.Retrieved2024-01-30.
- ^"Matchmaking | LoL – League of Legends".Na.leagueoflegends. 2010-07-06.Archivedfrom the original on 2012-02-26.Retrieved2012-02-19.
- ^"Welcome to Season 8 of competitive play".PlayOverwatch.Blizzard Entertainment.Archivedfrom the original on 12 March 2018.Retrieved11 March2018.
- ^"World of Warcraft Europe -> The Arena".Wow-europe. 2011-12-14. Archived fromthe originalon 2010-09-23.Retrieved2012-02-19.
- ^"AirMech developer explains why they use Elo".Archivedfrom the original on February 17, 2015.RetrievedJanuary 15,2015.
- ^[3][dead link ]
- ^"MWO: News".mwomercs.Archivedfrom the original on 2018-08-27.Retrieved2017-06-27.
- ^"Age of Empires II: DE Leaderboards - Age of Empires".14 November 2019.Archivedfrom the original on 27 January 2022.Retrieved27 January2022.
- ^"List of the Best Tetris Players in the World (NES NTSC)".27 October 2020.RetrievedJuly 15,2024.
- ^"Frequently Asked Questions: ratings".lichess.org.Archivedfrom the original on 2019-04-02.Retrieved2020-11-11.
- ^"Wayback Machine record of Clanbase".Archived fromthe originalon 2017-11-05.Retrieved2017-10-29.
- ^"Guild ladder".Wiki.guildwars. Archived fromthe originalon 2012-03-01.Retrieved2012-02-19.
- ^"Clanbase farewell message".Archivedfrom the original on 2013-12-24.Retrieved2017-10-29.
- ^"Scrimbase Gaming Ladder".Archivedfrom the original on 2017-10-30.Retrieved2017-10-29.
- ^"Golden Tee Fan Player Rating Page".26 December 2007.Archivedfrom the original on 2014-01-01.Retrieved2013-12-31.
- ^"Using Comparative Human Descriptions for Soft Biometrics"Archived2013-03-08 at theWayback Machine,D.A. Reid and M.S. Nixon, International Joint Conference on Biometrics (IJCB), 2011
- ^Pörschmann; et al. (2010). "Male reproductive success and its behavioural correlates in a polygynous mammal, the Galápagos sea lion (Zalophus wollebaeki)".Molecular Ecology.19(12):2574–86.doi:10.1111/j.1365-294X.2010.04665.x.PMID20497325.S2CID19595719.
- ^Tsang; et al. (2016)."Fabric inspection based on the Elo rating method".Pattern Recognition.51:378–394.Bibcode:2016PatRe..51..378T.doi:10.1016/j.patcog.2015.09.022.hdl:10722/229176.Archived fromthe originalon 2020-11-05.Retrieved2020-05-05.
- ^"Algorithm Competition Rating System".December 23, 2009. Archived fromthe originalon September 2, 2011.RetrievedSeptember 16,2011.
- ^"FAQ: What are the rating and the divisions?".Archivedfrom the original on September 25, 2011.RetrievedSeptember 16,2011.
- ^"Rating Distribution".Archivedfrom the original on October 13, 2011.RetrievedSeptember 16,2011.
- ^"Regarding rating: Part 2".Archivedfrom the original on October 13, 2011.RetrievedSeptember 16,2011.
- ^"Tinder matchmaking is more like Warcraft than you might think – Kill Screen".Kill Screen.2016-01-14. Archived fromthe originalon 2017-08-19.Retrieved2017-08-19.
- ^"The Best Smartphone Camera 2022!".YouTube.2022-12-22.Retrieved2023-01-07.
- ^Avery, Christopher N.;Glickman, Mark E.; Hoxby, Caroline M.; Metrick, Andrew (2013-02-01). "A Revealed Preference Ranking of U.S. Colleges and Universities".The Quarterly Journal of Economics.128(1):425–467.doi:10.1093/qje/qjs043.
- ^Irwin, Neil (4 September 2014)."Why Colleges With a Distinct Focus Have a Hidden Advantage".The Upshot.The New York Times.Retrieved9 May2023.
- ^Selingo, Jeffrey J. (September 23, 2015)."When students have choices among top colleges, which one do they choose?".The Washington Post.Retrieved9 May2023.
- ^Askell, Amanda; Bai, Yuntao; Chen, Anna; Drain, Dawn; Ganguli, Deep; Henighan, Tom; Jones, Andy; Joseph, Nicholas; Mann, Ben (2021-12-09). "A General Language Assistant as a Laboratory for Alignment".arXiv:2112.00861[cs.CL].
- ^"Chatbot Arena Leaderboard Week 8: Introducing MT-Bench and Vicuna-33B | LMSYS Org".lmsys.org.Retrieved2024-02-28.
- ^"Chatbot Arena: New models & Elo system update | LMSYS Org".lmsys.org.Retrieved2024-02-28.
- ^Screenplay forThe Social Network,Sony PicturesArchived2012-09-04 at theWayback Machine,p. 16
Sources
- Elo, Arpad (1986) [1st pub. 1978]. The Rating of Chessplayers, Past and Present (Second ed.). New York: Arco Publishing, Inc. ISBN 978-0-668-04721-0.
Further reading
- Harkness, Kenneth (1967). Official Chess Handbook. McKay.