
Odds of beating a player rated x

While looking through data, I found some amateur players with standard deviations in the low 20s, which led me to believe they may have capped in terms of their rating and potential. What does everyone think?

no.

it's like saying "nolangherity throws, on average, 10 babies off a cliff per day. he is fairly consistent, therefore he will never throw more than 10."
 

Ok. You would be making blanket assumptions without knowing the capabilities and limitations of the individuals in question. nolangherity could easily be capable of throwing more than 10 babies off a cliff, but there is currently an unknown variable keeping him from doing so within the data.
 

exactly.

i know lots of guys that throw consistently, but they aren't that good. it's just their technique.
 
I think looking at SKEW of a player's round ratings might be a better indicator of a player whose rating is on the move upward (or downward) separate from SD.
 

Since SKEW indicates that the bulk of the data falls on one side of the distribution, a negative SKEW (-.40) would indicate that more of the data was located above the median. This could potentially indicate an upward trend in the player's rating. The value (on a scale of 0 to 1) would indicate how severe the SKEW was, with values closer to +/- 1 indicating the steepest slopes. Is that right?
 
Sometimes a severe skew in either direction might occur with only a few data points so having more data (round ratings) would be more likely to actually indicate movement in a direction. You would also want to look at a trend over several rating updates to see if it matches the direction of the skew.
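As a concrete sketch (the round ratings below are made-up example values, not real PDGA data), sample skewness can be computed directly from a player's recent rounds; note that, unlike the 0-to-1 scale suggested above, sample skewness is not bounded to +/- 1:

```python
# Fisher-Pearson sample skewness of a player's round ratings.
# Ratings below are hypothetical, for illustration only.
from statistics import mean, pstdev

def skewness(xs):
    """E[(x - mu)^3] / sigma^3: 0 for symmetric data,
    positive when the long tail is above the mean."""
    mu = mean(xs)
    sigma = pstdev(xs)
    return sum((x - mu) ** 3 for x in xs) / (len(xs) * sigma ** 3)

ratings = [915, 920, 921, 918, 930, 945, 950]  # hypothetical recent rounds
print(skewness(ratings))
```

A run of unusually hot rounds pulls the tail upward and the skewness positive, which is the "rating on the move" signal described above, though with only a handful of rounds the estimate is noisy.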
 
Match play rating systems like those used for chess or Sagarin don't work, or don't work well, for sports where individuals perform against a course.
 

Are you saying Elo type rating systems work better when the competition is more direct, e.g., Elo should predict outcomes better for boxing than for downhill skiing? I'm curious about this. Can you point me towards some helpful reading?
 

I think that depends on two things - the variability of the performance based on the opponent, and the variability based on the environment. Chess is entirely opponent-dependent, which I believe is what Elo is designed for.

Along the continuum, boxing is almost entirely opponent-dependent (one could argue that looser/tighter ropes could impact the result between two relatively equal opponents), downhill skiing would be a bit more environment-driven, and disc golf would probably be one of the most environment-dependent, even more so than ball golf.

That said, in disc golf our ratings system is quite liberal in that it doesn't have to be based on a high degree of recency and the variability in courses (and setups within courses) can benefit certain players and inflate ratings. Both of these are more tightly controlled in ball golf, and still those can be (and are) manipulated by those who are so inclined.

Of course, ratings of individual rounds are partially driven by the opponents in a given round, to a fault some would say. But unless we have more courses with fixed setups that would enable us to collect enough data to say with confidence that a round of X on course Y should always be rated 1000 (with a reasonable +/-), that opponent-driven approach is the best we have.

The other impactful variable disc golf has more than ball golf or other "rated" sports is weather, so there's that too.

Depending on how familiar I am with a course and how well it suits my game, I think I would have a good chance of beating a player rated 50 points higher than me, and I would not be surprised to lose to a player rated 50 points lower. Now, if my rating and my opponent's were based on the same 10 tournaments held in the past 12 months, then I would narrow that range considerably. In the end, I don't think anyone is going to be making betting lines for disc golf any time soon.
 

Rating systems like Elo/Glicko/TrueSkill are actually able to handle higher-variance (e.g. environmental) results reasonably well; they just need an adjusted (i.e. lower) 'k factor'. The k factor is an internal value (constant or variable) that determines confidence, i.e. how confident the rating system is that the input match results are 'meaningful'. A low k factor means the system has low confidence that the input match results are meaningful, and thus the match will have a low impact on the ratings of those players. Rating systems like Elo and Glicko use a constant value for their k factor, but systems like TrueSkill use a dynamic k factor, which changes based on variables like number of prior matches played (more matches played = greater confidence in a player's rating) and standard deviation (smaller standard deviation = greater confidence in a player's rating).

The one aspect of these particular rating systems that has seemed problematic, however, is that they all use a constant value to assess the odds of a tie (against an equally-rated player), i.e. for any two players of equal rating, the odds of them tying is a constant value. In disc golf, by contrast, I would strongly suspect that the odds of two equally-rated players tying is not a constant, but at least partially dependent on the course or layout (environment), e.g. the chances of a tie are higher on a low-SSA course than a higher-SSA one.
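For concreteness, here is a minimal Elo-style update, a sketch only (the PDGA system works differently, and the k values are arbitrary): k scales how much one result moves a rating, and a tie is always scored 0.5 regardless of course or conditions, which is the constant tie handling described above.

```python
# Minimal Elo-style rating update (illustrative sketch, not the PDGA system).

def expected_score(r_a, r_b):
    """Expected score for player A against player B; 0.5 at equal ratings."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=16):
    """Return A's new rating. score_a: 1 = win, 0.5 = tie, 0 = loss.
    A lower k means one result moves the rating less."""
    return r_a + k * (score_a - expected_score(r_a, r_b))

# A 950-rated player upsets a 1000-rated player; with a low k the
# system treats the single round as weak evidence and moves little:
print(elo_update(950, 1000, 1.0, k=8))
```

Halving k halves the rating movement from any one result, which is exactly how a system expresses low confidence that a single round is meaningful.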
 
Elo and Sagarin rankings are fundamentally binary result calculators that do not use the richness of data provided by sport competitions using some metric like score/time/distance for ranking by individual or team performance. Chess has three results with win-lose-draw but no credit in the Elo system is given for winning in 4, 20 or 40 moves. Sagarin primarily looks at whether players or teams in a sport win, lose or tie.

In the case of DG or skiing, we have an additional metric which is score or time that can be used to refine rankings. In our ratings system, not only does it matter whether someone beats another player in sudden death or by 5 throws but also what their actual score was on the course. The Sagarin system as I understand it would only give the 3rd place finisher credit for beating those who finished 4th through 10th not by how many throws nor their actual score(s) on the course(s).
 

TrueSkill, similarly, can accept the event results of a full field, but only as a rank-ordered (place) list. While this method is definitely throwing out potentially-valuable (round score) data, it also has the potential advantage of ignoring course scoring spread differences/discrepancies. i.e. if we accept round scores as inputs, we have to also compensate for how any particular course/layout spreads out round scores. The current PDGA system does this by making the assumption that all rounds of equal SSA spread out scores equally (an unproven assumption, at best). A system that ignores round scores, and only accepts a rank-ordered list *could* possibly get around this problem.. however as mentioned above I think computing odds of ties throws a huge monkey-wrench into the mix. :p
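TrueSkill proper uses Bayesian inference over the full rank ordering; as a rough stand-in, the place-only idea can be sketched by treating an event as the set of pairwise results it implies (all names, ratings, and the k value below are invented for illustration):

```python
# Sketch: update ratings from a rank-ordered finish only (no round scores),
# by expanding the event into all pairwise win/loss/tie results.
# This is an illustrative stand-in, not TrueSkill itself.

def expected_score(r_a, r_b):
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def rate_event(ratings, places, k=4):
    """ratings: {player: rating}; places: {player: finish place, 1 = first}.
    Equal places count as ties (scored 0.5). Returns new ratings."""
    players = list(ratings)
    new = dict(ratings)
    for i, a in enumerate(players):
        for b in players[i + 1:]:
            if places[a] < places[b]:
                s = 1.0          # a finished ahead of b
            elif places[a] > places[b]:
                s = 0.0          # b finished ahead of a
            else:
                s = 0.5          # tied place
            delta = k * (s - expected_score(ratings[a], ratings[b]))
            new[a] += delta
            new[b] -= delta
    return new

# Hypothetical 3-player event; only finish places are used, not scores:
new = rate_event({"Ann": 1005, "Ben": 1000, "Cal": 995},
                 {"Ann": 2, "Ben": 1, "Cal": 3})
```

Note that winning by one throw and winning by ten produce identical updates here, which is precisely the "thrown out" round-score information the post describes.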
 
The current PDGA system does this by making the assumption that all rounds of equal SSA spread out scores equally (an unproven assumption, at best).
Ask grodney about it. He went through our piles of data around 2003-4 and did not find this "speculative" slope factor, which is used in ball golf but has never even been proven there, because they don't have the ability to do so.
 

I don't know about Sagarin, but Elo definitely doesn't account for margin of victory. Beating 10 players by a combined 10 strokes would be counted the same as beating 10 players by a combined 50 strokes. There are probably modified versions of Elo that take margin of victory into account, but I don't know if anyone could find a consistent, non-arbitrary way of assessing this for any particular sport. I have read that the international organization for the board game "Go" uses a version of Elo that matches their handicapping system, but it probably took them a lot of trial and error to get it right.
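For illustration, one hypothetical (not standard) way to fold margin of victory into an Elo-style update is to scale the k factor by a slowly growing function of the stroke margin; notice that this particular choice gives a tie (margin 0) no update at all, exactly the kind of arbitrary design decision described above:

```python
# Margin-of-victory Elo sketch. The log1p multiplier is an illustrative
# choice, not a standard formula from any real rating system.
import math

def expected_score(r_a, r_b):
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update_mov(r_a, r_b, score_a, margin, k=16):
    """Elo-style update scaled by margin of victory (in strokes).
    log1p grows slowly, so a 10-stroke win counts more than a
    1-stroke win, but nowhere near 10x more."""
    return r_a + k * math.log1p(margin) * (score_a - expected_score(r_a, r_b))
```

Any such multiplier is a tuning knob: too steep and blowout wins on easy courses dominate ratings, too flat and it adds nothing over plain win/loss, which is likely why getting it "right" takes the trial and error mentioned above.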

I would love to hear from Grodney about his research on slope factor in disc golf. :popcorn:
 
What course set-up conditions would likely produce the highest-rated rounds ever?

Is there anything in the works to bring the ratings more in line with the more difficult courses? It seems so counter-intuitive that the most difficult courses can't produce the highest rated rounds.
 
Courses with SSAs that come in just under 54 and have lots of trouble, whether narrow fairways or OB, are the best candidates for the highest-rated rounds overall. You're unlikely to ever see an overall highest-rated round on a course with an SSA above 60 or below 48.
 

Serious question: if distinguishable "slope" does not exist among disc golf courses, are these super-high round ratings caused by luck, i.e., is the kind of course you describe simply less fair than others, and the high ratings come when the right guy gets lucky?
 
