
Why Ratings are awful

That's not calculus. That's not even math.

Either you don't know anything about calculus, as in the point behind it, or your teacher didn't explain it to you. Just like quantum physics, we are making it all up to explain things we can't explain with concrete math. If I remember correctly, there are no LAWS in calculus, just theorems.
 

smyith, I think you've been led astray.
 
There are clear flaws. Look no further than when a field has two pools and the courses played are similar but at different times.

Here's an event. MPO, FPO, MPM, MG1, MA3 and MJ2 played this order:

ABCA

All others played this order:

CAAB

http://www.pdga.com/tournament_results/105282

It's a joke how different they are.

Now, I'll go ahead and post the reply everyone is going to give: "But when the final results come out, they will fix it so each layout rates the same based on each day."

And in the past they didn't fix this issue, and I applaud Chuck for taking about 7 years to see what everyone saw every weekend.

However, this still shows flaws in the calculations. What if there were no other pool and it was just that half of the field playing the event? The pool with Open players would rate high; the pool with no Open players would rate low.
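For anyone who wants to see the mechanism, here's a toy simulation of why the makeup of the pool matters. This is not the actual PDGA formula - the points-per-stroke value, the scoring model, and the pool makeups below are all assumptions made up purely to show the effect:

```python
import random

POINTS_PER_STROKE = 10  # assumed conversion, not an official figure

def estimate_ssa(propagator_ratings, true_ssa, noise=2.5, seed=None):
    """Estimate the layout's SSA from one pool of propagators.

    Toy scoring model: a propagator's expected score is true_ssa plus one
    stroke for every POINTS_PER_STROKE rating points below 1000, plus noise.
    """
    rng = random.Random(seed)
    implied = []
    for rating in propagator_ratings:
        expected = true_ssa + (1000 - rating) / POINTS_PER_STROKE
        score = rng.gauss(expected, noise)
        # back out what this propagator's score implies the SSA was
        implied.append(score - (1000 - rating) / POINTS_PER_STROKE)
    return sum(implied) / len(implied)

def round_rating(score, ssa_estimate):
    return 1000 + POINTS_PER_STROKE * (ssa_estimate - score)

true_ssa = 52.0
open_pool = [1010, 1000, 995, 1020, 985, 1005]  # Open-heavy pool of propagators
am_pool = [900, 880, 915, 870, 905, 890]        # Am-heavy pool, same layout

# The exact same score of 50 on the exact same layout comes out rated
# differently, purely because a different (and noisy) set of propagators
# defined the baseline.
print(round_rating(50, estimate_ssa(open_pool, true_ssa, seed=1)))
print(round_rating(50, estimate_ssa(am_pool, true_ssa, seed=2)))
```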
 
C'mon Bobby.

We all know the ratings at the Memorial are inflated. Use a different example to prove your dominance.
 
Regardless of all the debate, I like ratings. They give you goals to shoot for. There are some inconsistencies, but by and large you can accurately assess your skill compared to someone else based on rating. Maybe an NC 950 is better than an ME 950; that is up for debate. But an NC 850 is always worse than an ME 950 if they have enough rounds and time playing.

+1 for the current rating system.
 
To answer the original question:

Yes, it does make little sense to say the ratings cannot be compared and then average them out to get one rating. I can somewhat understand if it infuriates you a little bit that this does not get sufficiently acknowledged here.

Because that is senseless - by itself. If you cannot say that an 1136-rated round is better than an 1120-rated round, then it makes little sense to average them out to a number and say that it indicates the skill of the player. Strictly speaking.

A better way to say it would have been that individual round ratings suffer from flaws that are somewhat averaged out when several rounds are combined, and that the result has proven to be good enough at establishing a player's skill, as demonstrated by a few people in this thread.

As long as the rating is influenced by the rest of the field, an individual round rating will at most be an approximation. That should be relatively clear. And admitted.

But given that we are very much subject to the whims of the weather gods, it would be even harder to come up with a system that evaluated the player against the course alone.

So, MTL, I would say you are correct in your observation - but you need to look at the overall result of it before saying it's "wrong". Experience shows that it gives us quite a trustworthy result. The conclusion from your observation should be "This cannot possibly be perfect" - with which I think no one would disagree.

And yes, this means that a player could be forced up into a division when, if the system were perfect, he really should not be. But it's reasonably good if you can't really point to who that would be.
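For what it's worth, the "averaged out" part is just the law of large numbers at work. Here's a minimal sketch, assuming each round rating is a player's true skill plus independent noise - the 25-point spread is my guess, not an official figure:

```python
import random
import statistics

def simulate_player(true_skill=950, round_noise=25, rounds=8, seed=0):
    """Individual round ratings wobble, but their average homes in on true skill."""
    rng = random.Random(seed)
    round_ratings = [rng.gauss(true_skill, round_noise) for _ in range(rounds)]
    return round_ratings, statistics.mean(round_ratings)

rounds, player_rating = simulate_player()
print([round(r) for r in rounds])  # individual rounds scatter widely around 950
print(round(player_rating))        # the average lands much closer to 950

# The spread of the average shrinks like round_noise / sqrt(rounds):
# one round is +/-25-ish, eight rounds about +/-9, thirty rounds about +/-5.
```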
 
C'mon Bobby.

We all know the ratings at the Memorial are inflated. Use a different example to prove your dominance.

After reading the whole thread just now, I think I see some of our confusion coming to light in this post. The confusion I see comes from the fact that PDGA ratings measure the relative performance of players, and nothing besides. When you try to apply player ratings to anything else, whether it be an event, a course, a layout, whatever, you're misusing them. To put a finer point on it, player ratings have no units. They have a relationship to strokes in a given round, but as a metric, they exist in a bubble, and are only really useful at assessing the relative skill of two particular players. The sentence, "that was a 1000 rated round," has no meaning until it is put in the context of other players.

What I see as the main vulnerability of Chuck's rating system is that player ratings could get skewed over time in local areas that exist in a bubble, with very little crossover of players inside the bubble competing against those outside it. Hypothetically, you could have just enough propagators develop inside the bubble with a small set of skewed rounds from elsewhere to have the bubble reproduce skewed ratings relative to the outside world. But with all the travel and touring that disc golfers like to do, do you think that's really a significant risk?

It's possible, maybe even likely, that the relationship between strokes and relative skill is not linear as y'all say Chuck's formulas assume, but that doesn't mean that ratings don't accurately predict who is the better player. To get a better feel for whether the relationship is not linear, you'll have to look into how often a 930 player gets beaten by a 910 player, vs. how often a 1030 player gets beaten by a 1010 player. I betcha Chuck has already done some of that.
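That head-to-head check would be easy to script if someone had a pile of round data. A hypothetical sketch - the players, events, and ratings below are invented; only the counting logic matters:

```python
def upset_rate(rounds_a, rounds_b):
    """Fraction of shared events where the lower-rated player (B) out-rated the
    higher-rated player (A). Both arguments map event id -> round rating."""
    common = set(rounds_a) & set(rounds_b)
    if not common:
        return None
    upsets = sum(1 for event in common if rounds_b[event] > rounds_a[event])
    return upsets / len(common)

# Invented data: how often does a 910 beat a 930?
p930 = {"e1": 928, "e2": 941, "e3": 917, "e4": 935}
p910 = {"e1": 912, "e2": 945, "e3": 905, "e4": 920}
print(upset_rate(p930, p910))  # 0.25 in this toy sample

# If points-per-stroke behaved the same everywhere on the scale, the upset rate
# for a 20-point gap should look similar at 910 vs. 930 and at 1010 vs. 1030.
```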
 
This analysis is based on actual results data from Pro Worlds 2013.

I did my own version of the stats for Worlds as well (long and not well put together, sorry, and all based on unofficial ratings still).

The largest variation above rating was 4.48%, avg 1022 for worlds, rated 978.

The largest variation below rating was -2.97% avg 936, rated 965.

McBeth averaged 1053, 0.94% (9.8 points) over his rating of 1043.

57.6% (72 of the 125) who finished all 6 rounds (excluding semis and finals) shot within 1% of their rating, either above or below.

4.8% shot at least 20 points below their rating.

12.7% shot at least 20 points above their rating.

36 people avg'd >10 points over their rating
16 people avg'd >10 points below their rating.

Avg rating of all competitors who finished 6 rounds = 990
Avg rating of tourney rounds = 993 = 0.3% above rating.

Rating range   #    Avg rating   Avg round   Variance
1019+          25   1030.5       1034.0      3.5
1002-1017      24   1007.8       1010.4      2.6
980-1001       25   989.8        992.7       2.9
969-979        25   974.0        978.7       4.7
<969           26   950.1        953.8       3.6

Since everyone loves talking about Joe Bishop: he was the 6th highest over his rating, averaging 943 compared to his 913 rating. (Again, I shot a round with him, his 991, and it was a damn good time, the most fun round of the week, and a crappy scoring round for me.)

So the issue to me is either the ratings system is accurate, or it is so inaccurate that it has become consistently very bad. Either way, I think it's pretty interesting, and more often than not (with a solid sample size) it is quite accurate for how someone would do compared to me on a normal course.

I think this does bring up the fact that we need a more accurate way to rate an individual course on any given day. That way, our ratings would mean something. If SSA is the "par" (what a 1000-rated player would shoot), then I should be able to figure out pretty quickly what I should shoot based on my rating. That is where we aren't yet, and once we are, I think that will be better and make ratings more accepted as accurate descriptions of ability.
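If SSA really is what a 1000-rated player would average, the back-of-the-envelope prediction is just a linear offset. A sketch, assuming roughly 10 rating points per stroke - that conversion is only a ballpark figure and actually varies from course to course:

```python
def predicted_score(ssa, player_rating, points_per_stroke=10):
    """Rough expected score: one extra stroke for every points_per_stroke below 1000."""
    return ssa + (1000 - player_rating) / points_per_stroke

# e.g. a 940-rated player on a layout with an SSA of 54
print(predicted_score(54, 940))  # 60.0
```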
 
Yes, it does make little sense to say the ratings cannot be compared and then average them out to get one rating. ... The conclusion from your observation should be "This cannot possibly be perfect" - with which I think no one would disagree.

Well written! :clap:
 
The largest variation above rating was 4.48%, avg 1022 for worlds, rated 978.

These kinds of percentages are totally meaningless, aren't they? 1000 was arbitrarily chosen as the 'scratch' golfer. It could just as easily be 500 or 2000 or 10,000, and then the "percent" variance would be completely different (assuming all the other numbers, like points per stroke, stay the same).
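A quick worked example with the numbers from the quote above (the 500-anchored scale is hypothetical; the point gap is identical either way):

```python
gap = 1022 - 978  # the 44-point over-performance quoted above

# On the real scale, anchored so that a "scratch" player is 1000:
print(gap / 978 * 100)          # ~4.5%

# Shift the whole scale down by 500 (same players, same strokes, same gap):
print(gap / (978 - 500) * 100)  # ~9.2% -- a different "percent" for identical play
```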
 
So the issue to me is either the ratings system is accurate, or it is so inaccurate that it has become consistently very bad. ... more often than not (with a solid sample size) it is quite accurate for how someone would do compared to me on a normal course.

... If SSA is the "par" (what a 1000-rated player would shoot), then I should be able to figure out pretty quickly what I should shoot based on my rating. ...

Sample size is critical. When it's small, you can't be as confident that the numbers are accurate. The reason we don't want to say which of the two highest rated rounds ever was more impressive is that it's a tiny sample size, and therefore subject to a wide margin of error. Given that people are prone to poor instincts when they try to solve statistical problems with their gut feelings (google "Monty Hall problem"), I'm not very confident that our feelings about one round vs. another are any better than the numbers themselves.

The same sample-size related error applies when you try to judge what you should do on a particular course at a particular hour of a particular day. You can do it, but not with a lot of confidence. That's why a competitive round requires a minimum number of propagators before it can generate ratings.
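To put a number on that: if a single round rating carries something like a 25-point spread around a player's true level - my assumption, not an official figure - then a 16-point gap between two one-off rounds is well inside the noise. A sketch:

```python
import math

round_noise = 25.0  # assumed one-round standard deviation, in rating points
gap = 1136 - 1120   # the two record rounds people keep arguing about

# The difference of two independent single rounds has spread sqrt(2) * noise.
spread_of_difference = math.sqrt(2) * round_noise
print(gap, round(spread_of_difference, 1))  # 16 vs. 35.4: well under one standard deviation

# Averaging n rounds shrinks the spread by sqrt(n), which is why full player
# ratings are far more trustworthy than any single-round comparison.
```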
 
Until we develop a standardized course rating and slope rating like golf's, monitored by a much larger regulating agency, I don't see how much more accurate any player rating system could be. Is our rating system designed to represent a player's average score, or to reflect a player's ultimate best potential, like golf's handicap system?
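For reference, golf's system publishes two numbers per course and normalizes every score against them. Here's a rough sketch of the USGA-style handicap differential as I understand it (treat the exact constants as approximate):

```python
def handicap_differential(score, course_rating, slope_rating):
    """Golf-style normalization; 113 is the slope of a course of standard difficulty."""
    return (score - course_rating) * 113 / slope_rating

# The same 85 means different things on an easy course vs. a punishing one:
print(handicap_differential(85, course_rating=69.5, slope_rating=115))  # ~15.2
print(handicap_differential(85, course_rating=72.8, slope_rating=140))  # ~9.8
```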

If the purpose of ratings is accurately predicting top pro performance, ratings seem to live up to their purpose fairly well. I'm not so sure it fits the lower level players, particularly players on a bubble between two divisions, as well. But then, is that really its purpose anyway? If someone's rating excludes them from moving up to a more difficult division or tournament, should they really be considering playing that division or tournament? If a player's rating excludes them from a lower division, shouldn't they be setting their sights higher? (Presuming rating represents total potential, not recent averages)

I guess it's difficult from the OP to determine exactly what the reasoning is behind declaring the system "awful". We have too much variation in courses across regions, installation dates, designers' goals, etc., compounded by the lack of standardization in installation specifications. We still have schools, parks, and churches installing courses without ever knowing a governing body even exists for the sport. Leagues and tourneys full of locals spring up, and never an eye is cast toward sanctioning the league. It's a grassroots sport, after all.

Is it even feasible to establish course and slope ratings for courses used for a certain level of tournament and up? Even if we did, what's the goal? In the end, aren't we just trying to provide a leveling system for fair competition based on skill level? Is there another sport that uses such a system that isn't based off of simple win/loss performance?

Maybe our real problem is that no one wants to accept being 49th out of 50, and so we have to have a bazillion divisions with 1st, 2nd, and 3rd instead.

Alternatively, maybe we should have more drawn-out match-play tournaments and base ratings off of wins and losses instead of strokes. Otherwise, disc golfers need to quit expecting a 100% retail return value on every entry fee and start contributing to a governing body large enough to provide paid officials to rate courses and run tourneys. You can't run a tight ship with volunteers running the show.
 
Someone explain this.

You complain to Chuck about ratings being too low on high SSAs - like the one at Maple Hill, especially compared to rounds like the Fountain Hills round. He says you can't compare the two. OK, I get that.

But then they use every round played - regardless of SSA - and average them together to come up with a player rating. Hold on. I thought you couldn't compare high SSAs and low SSAs?

And then here's the kicker. Then they use the player's ratings - which we established are questionable - to come up with the round rating of each round.

How does anyone think that this is accurate?

I think the ratings are accurate for the intended purpose: to place Amateur players in divisions for which they are eligible. The ratings will never be perfect and I think vslaugh explains why you can't compare certain rounds.


...

Anyway, to get back to the original post, the constant points per stroke assumption of the PDGA rating system -- which helps to keep things somewhat simple -- prevents the rating system from accurately comparing the relative difficulty of two "hot" (outlier) rounds on courses with different SSAs. ....


Vslaugh is saying the exact same thing as Chuck.

It has been proven that ratings (however flawed) are an excellent indicator of how well a player will perform. I think most will agree that the PDGA rating system has flaws. It is not necessary to bring up seven year old events to prove this point.


There are clear flaws. Look no further than when a field has two pools and the courses played are similar but at different times. ... What if there were no other pool and it was just that half of the field playing the event? The pool with Open players would rate high; the pool with no Open players would rate low.


You could easily prove or disprove your theory (about half the field or no open players). Simply prepare the alternate scoring report(s) for the August 2013 event and have the TD upload it temporarily to PDGA Admin to generate the ratings. Then take the screen shots and get back to us with the "proof".




It has been proven that ratings are an ACCURATE predictor of performance; therefore, the ratings are accurate. The ratings could be more accurate, which brings us to this answer:

"All models are wrong, but some models are useful."
--George Box

Of course the PDGA ratings system is going to have flaws. It has to strike a balance between power and simplicity. Chuck could probably add dozens of bells and whistles to make the PDGA ratings system really really complicated so that it could more accurately compare two ridiculously good rounds like Paul's, but more complicated is not always better.


To summarize: Ratings are flawed but they are an accurate predictor of performance and ...more complicated is not always better!
 
On my facebook page, Chuck said:

"It's apples and oranges comparing courses like Fountain and Maple Hill which are about 10 apart on SSA values that's why we have "Best Ever" rounds in different SSA ranges."

Don't think I'm extrapolating....
About that comment, it's easier to understand what he means if you take it to an extreme.

It's like asking which score is better: an 18 on an 18-hole course where each hole is 1 foot long, or a 68 on a par 72? It just doesn't make sense to compare those two scores, because a super good player and a really crappy player will get the same score on the first course but very different scores on the second. So you can't tell how "good" a score of 18 is on the first course.

It can go the other way, too. Bad players playing a difficult course will probably lump together with low scores. They won't have the skill to differentiate themselves from one another. On an easier course they might have a larger spread of scores which will give more accurate ratings.

That ratings "compression" affects the "quality" of the score on a course with a different SSA, making it difficult to compare with a course that produces a wider variety of final scores. You just aren't comparing two similar rounds.

Those rounds can be averaged together just fine, it's just that the low SSA course scores won't do as good of a job differentiating between players. It won't change the order of who's best to worst, it just won't be as much help determining that, either.

This is why course design and having players play from the correct tees is important when it comes to ratings. The better the course is suited for your specific skill level, the more accurate your rating will be.
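A crude way to see that loss of differentiation: simulate the score gap between a better and a worse player on a course that can barely separate them versus one suited to their skill level. Every number below is invented for illustration:

```python
import random
import statistics

def observed_gaps(skill_gap_strokes, noise, rounds=500, seed=42):
    """Per-round score gap between a better and a worse player, given how much
    separation the course allows (skill_gap_strokes) and round-to-round noise."""
    rng = random.Random(seed)
    gaps = [skill_gap_strokes + rng.gauss(0, noise) for _ in range(rounds)]
    return statistics.mean(gaps), statistics.pstdev(gaps)

# Pitch-and-putt: the better player only gains about 1 stroke, so the noise
# swamps the real difference and round scores barely separate the two.
print(observed_gaps(skill_gap_strokes=1, noise=3))

# A course suited to their skill level: a ~6 stroke average gap stands out
# clearly from the same noise, so ratings differentiate the players reliably.
print(observed_gaps(skill_gap_strokes=6, noise=3))
```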
 
There is never going to be an end-all, be-all ratings system unless every course had the same par or the same difficulty. The ratings are great because they assess how a player will shoot ON AVERAGE. You won't find a solid number for ratings because people don't shoot the exact same number every round. Obviously you cannot compare any two courses unless they are of similar difficulty. But the ratings are for how the PLAYER plays on AVERAGE, meaning how the player will play on any given course. He could play a little better or worse, and that's to be expected with any statistic.
 
I don't have much to add here, but I'd like to thank mashnut, Dave242, et al for putting forth very good and reasoned explanations of the system.

From a math standpoint, considering the practical constraints of the tournament system it's trying to represent, the ratings methodology is quite sound.

Outlier rounds - both extremely good and poor shooting - will always present a higher standard error. Perhaps a non-linear approach on either side of the SSA for a given round would help a little with this, but I'm not sure the end result (being able to more carefully compare a Memorial 39 with a Maple Hill 45) is worth the level of effort.
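If anyone wanted to tinker with that, the simplest version is a piecewise conversion where strokes far beyond the SSA earn slightly fewer points. The breakpoint and taper below are pulled out of thin air, purely to show the shape of the idea - not a proposal for the actual system:

```python
def rating_linear(score, ssa, points_per_stroke=10):
    """Constant points-per-stroke conversion (roughly what the thread says the system assumes)."""
    return 1000 + points_per_stroke * (ssa - score)

def rating_tapered(score, ssa, points_per_stroke=10, breakpoint=5, taper=0.7):
    """Hypothetical non-linear variant: strokes more than `breakpoint` under the
    SSA earn only `taper` of the usual points, damping extreme outlier rounds."""
    strokes_under = ssa - score
    if strokes_under <= breakpoint:
        return 1000 + points_per_stroke * strokes_under
    extra = strokes_under - breakpoint
    return 1000 + points_per_stroke * (breakpoint + taper * extra)

# Made-up numbers: a monster round 13 strokes under a 52 SSA.
print(rating_linear(39, 52))    # 1130.0 under the linear rule
print(rating_tapered(39, 52))   # 1106.0 under the tapered rule
```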
 
Statistics are used when we don't or can't know everything about a situation but still want to make predictions, and by their very nature they are never going to be 100% accurate.

As shown by the Worlds data presented by Dave, it does a very good job at predicting who will be on the top, middle and bottom for people who have a lot of data points.

There are flaws, mostly due to the lack of data points for many members and the lag in ratings updates, since ratings are only updated monthly and also rely on TDs getting their reports and fees in order in a timely manner.
 
