
Hey hey ho ho round ratings have got to GO!

Of course not. My point about it being mathematically possible was to point out the flaw/bug/anomaly (whatever you want to call it) that exists in the ratings system: as long as divisions are rated separately, identical scores can yield different round ratings, and the difference will always favor the division with higher rated players. As for inflation, bubbling up, whatever you want to call it, it's pretty easy to grasp the concept that as long as the higher rated players are always rated only among themselves (DGPT events, for example), combined with a ratings system that knows no limits, well...the rich will get richer. ;)

It just doesn't follow that the ratings would necessarily inflate; they might, but I don't see any guaranteed mechanism by which they would. Identical scores yielding different round ratings doesn't necessarily favor the division with higher rated players. I'm sure that stat could be pulled. If we look at round ratings for fields with higher rated players and round ratings for fields with lower rated players, do identical scores on identical courses average higher when shot by the higher rated field? (Averaging over enough separate events should minimize the effect of course conditions.)

I can do this analysis quite easily if there's a good source for the data and it's easy to identify that identical courses are being played. Anyone know how/where it's available in an easy-to-consume form? (The format on the PDGA website isn't ideal; I don't think it's easy to identify identical course layouts, and writing a script to scrape it just doesn't seem appealing right now.)
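For what it's worth, here is a rough sketch of the comparison I have in mind, assuming the results have already been pulled into a flat CSV with hypothetical columns layout, field_avg_rating, score, and round_rating (no such export exists as far as I know; the file and column names are just placeholders):

# Rough sketch of the comparison above. The CSV and its column names
# (layout, field_avg_rating, score, round_rating) are hypothetical; this
# assumes the event results have already been pulled into a flat file.
import csv
from collections import defaultdict
from statistics import mean

def load_rounds(path):
    # Read rows of (layout, field average rating, score, round rating).
    with open(path, newline="") as f:
        return [{"layout": row["layout"],
                 "field_avg": float(row["field_avg_rating"]),
                 "score": int(row["score"]),
                 "round_rating": float(row["round_rating"])}
                for row in csv.DictReader(f)]

def rating_gap_by_field(rounds, split=970):
    # For identical (layout, score) pairs, average the round-rating difference
    # between fields above and below the split rating.
    high, low = defaultdict(list), defaultdict(list)
    for r in rounds:
        bucket = high if r["field_avg"] >= split else low
        bucket[(r["layout"], r["score"])].append(r["round_rating"])
    gaps = [mean(high[k]) - mean(low[k]) for k in high if k in low]
    return mean(gaps) if gaps else None

rounds = load_rounds("rounds.csv")  # hypothetical export
print("avg gap (higher field minus lower field):", rating_gap_by_field(rounds))

A positive average gap over enough events would be evidence for the claim; a gap centered on zero would not.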
 
Of course not. My point about it being mathematically possible was to point out the flaw/bug/anomaly (whatever you want to call it) that exists in the ratings system: as long as divisions are rated separately, identical scores can yield different round ratings, and the difference will always favor the division with higher rated players. As for inflation, bubbling up, whatever you want to call it, it's pretty easy to grasp the concept that as long as the higher rated players are always rated only among themselves (DGPT events, for example), combined with a ratings system that knows no limits, well...the rich will get richer. ;)

The biggest problem you have in this thread is the way you propose to test your hypothesis.

Why not start by simply taking a set of scores from a division in a PDGA tournament and compare the resulting ratings for three conditions:

1. Using the player's ratings
2. Using the player's ratings - 50
3. Using the player's ratings + 50

Do the results support your hypothesis?
 
It just doesn't follow that the ratings would necessarily inflate; they might, but I don't see any guaranteed mechanism by which they would.

The ratings system has no cap; how can ratings not continue to rise? :confused:

Identical scores yielding different round ratings doesn't necessarily favor the division with higher rated players.

OK...you got me there, please elaborate.

The biggest problem you have in this thread is the way you propose to test your hypothesis. Why not start by simply taking a set of scores from a division in a PDGA tournament and compare the resulting ratings for three conditions:

1. Using the player's ratings
2. Using the player's ratings - 50
3. Using the player's ratings + 50

Do the results support your hypothesis?

Not understanding your thought process here...sorry.
 
The biggest problem you have in this thread is the way you propose to test your hypothesis.

Why not start by simply taking a set of scores from a division in a PDGA tournament and compare the resulting ratings for three conditions:

1. Using the player's ratings
2. Using the player's ratings - 50
3. Using the player's ratings + 50

Do the results support your hypothesis?

...
Not understanding your thought process here...sorry.

Your hypothesis is that the same score will be rated higher if the division's average rating is higher. Right?

So, my suggestion is that you take a set of scores and manipulate the ratings of the players.

Looking at one player rated 900 whose score of 50 was rated 900. The null hypothesis (i.e. no effect) predicts:

case 1 rating = 900 (he shot his rating)
case 2 rating = 850 (he shot his adjusted rating)
case 3 rating = 950 (he shot his adjusted rating)

Your hypothesis would be supported if:

case 1 rating = 900
case 2 rating = 825 (rating lower than adjusted rating due to lower rated field)
case 3 rating = 975 (rating higher than adjusted rating due to higher rated field)

You can then check for significance using standard statistical methods. And, even better, support your basic argument with data showing it's "mathematically possible."
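To make the comparison concrete, here is a toy version of that test. The formula below is NOT the real PDGA calculation; it's a simple stand-in (field average rating, plus roughly 10 rating points per throw better than the field average score) used only to show how the three conditions would be lined up:

# Toy version of the +/-50 test. PTS_PER_THROW and the rating formula are
# assumptions, not the PDGA's actual method.
from statistics import mean

PTS_PER_THROW = 10  # assumed scale: roughly 10 rating points per throw

def round_ratings(scores, ratings):
    # Rate every score in the field against the field itself.
    anchor, avg_score = mean(ratings), mean(scores)
    return [anchor + (avg_score - s) * PTS_PER_THROW for s in scores]

scores  = [52, 54, 55, 57, 58, 60]        # one division's scores
ratings = [915, 905, 900, 895, 890, 885]  # those players' ratings

for shift in (0, -50, +50):
    shifted = [r + shift for r in ratings]
    print(shift, [round(x) for x in round_ratings(scores, shifted)])

With this stand-in formula every round rating moves by exactly the shift applied, i.e. the null result by construction; the interesting question is whether the real calculation, run on the same three conditions, behaves the same way.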
 
The ratings system has a fundamental assumption behind it, and it's proven valid: everyone who shows up comes out to play to the best of their ability. If you have 20 players rated 1000 play a course, they will on average play like 1000 rated players, and 20 players rated 900 will on average play like 900 rated players.

If the scores for the two separate rounds are identical, then it implies that the round played by the 900 rated players was easier for some reason.

What I do see is that if you have 100 people who play together and are rated together today, and you then split them into two groups for one year, those two groups would play only among themselves for the course of the year, and their ratings after that year would only reflect the internal competition, so there could be some skew in the individual ratings. Say one group was all struck by Lyme disease: they were playing significantly worse in real terms, but in relative terms they continued to play within their bubble, so the ratings average would stay the same. When the two groups played together again, the Lyme disease group that had been playing much worse would perform really badly, and the individual ratings would show the poor performance. After several rounds, things would normalize and individual ratings would become accurate once again.
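As a quick sketch of that scenario, here is an assumed 10-points-per-throw stand-in formula (not the real PDGA calculation) applied to two groups that start with the same ratings, where one group collectively gets three throws per round worse:

# Sketch of the split-group scenario. The rating formula is an assumed
# stand-in, NOT the real PDGA method. Both groups start at the same ratings;
# group B then plays three throws per round worse on average.
from statistics import mean

PTS_PER_THROW = 10

def rate_field(scores, ratings):
    anchor, avg = mean(ratings), mean(scores)
    return [round(anchor + (avg - s) * PTS_PER_THROW) for s in scores]

ratings        = [950, 940, 930, 920]
group_a_scores = [54, 55, 56, 57]   # kept their form
group_b_scores = [57, 58, 59, 60]   # same relative spread, everyone 3 worse

# Rated only inside its own bubble, group B's decline is invisible:
print(rate_field(group_b_scores, ratings))             # [950, 940, 930, 920]
# Rated together again, group B's rounds rate well below their ratings:
print(rate_field(group_a_scores + group_b_scores, ratings + ratings))
# -> [965, 955, 945, 935, 935, 925, 915, 905]

Within the bubble the ratings look unchanged; only when the groups merge do the lower-rated rounds show up and start pulling group B's ratings down.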
 
It just doesn't follow that the ratings would necessarily inflate; they might, but I don't see any guaranteed mechanism by which they would. Identical scores yielding different round ratings doesn't necessarily favor the division with higher rated players. I'm sure that stat could be pulled. If we look at round ratings for fields with higher rated players and round ratings for fields with lower rated players, do identical scores on identical courses average higher when shot by the higher rated field? (Averaging over enough separate events should minimize the effect of course conditions.)

I can do this analysis quite easily if there's a good source for the data and it's easy to identify that identical courses are being played. Anyone know how/where it's available in an easy-to-consume form? (The format on the PDGA website isn't ideal; I don't think it's easy to identify identical course layouts, and writing a script to scrape it just doesn't seem appealing right now.)


This is not a scripted analysis, but a quick manual check on a recent DGPT event. See the PDGA page for Idlewild: click on Layouts and check out the "Idlewild Amateur" and "Idlewild Open" played one week apart.
https://www.pdga.com/course-directory/course/idlewild

Here are the Open results:
https://www.pdga.com/tour/event/43983

Here are the Am results:
https://www.pdga.com/tour/event/45166

For each event, a 1000-rated round worked out to about 66.
 
Forgot to mention that the PDGA layout tab states that each event was played on the same layout.
Par=68
Length=9194ft
I guess that doesn't guarantee the same layout just because the length was the same, but it should be very similar.
 
Your hypothesis is that the same score will be rated higher if the division's average rating is higher. Right?

Only if the division with the higher rated players was rated on its own and not with/against any other division. It's not my hypothesis; the simple 800vs1000 thread verified that. Also, please keep in mind that the differences in round ratings will probably be fairly small, and any advantage to a player's actual rating might take much longer to recognize.

They don't have a fixed cap, but I believe the system caps them for all practical purposes.

What evidence have you seen that makes you think there's any kind of cap?

What I do see is that if you have 100 people who play together and are rated together today, and you then split them into two groups for one year, those two groups would play only among themselves for the course of the year, and their ratings after that year would only reflect the internal competition, so there could be some skew in the individual ratings.

Yep...the "creeping up" effect is subtle and plays out over time.

Forgot to mention that the PDGA layout tab states that each event was played on the same layout. Par=68 Length=9194ft I guess that doesn't guarantee the same layout just because the length was the same, but it should be very similar.

This is a good example to study, thanks for that. I remember reading somewhere that if multiple events use the same layout and are played close together it could impact ratings for all of the events. Maybe that played out here. But technically...I'm seeing the same score rated higher for the DGPT MPO division. ;) lol...only by 1-3 points...but it is higher. :p But here you also have a good example of ams shooting well: a rec player shooting just 36 points shy of a 1000 rated round. wow...
 
Yep...the "creeping up" effect is subtle and plays out over time.

My point earlier was that there's no reason to believe that any given creep will be up rather than down. By rating in silos, the rating creep occurs because the average change in the level of play of the wider group isn't reflected in the ratings, only the relative changes of players within the group. If the whole group improves its play on average, there is no reference point to increase their ratings commensurately, so you get rating deflation (the same level of skill yields a lower rating). The opposite occurs if the group on average gets worse: there is nothing to correct for that, so you get rating inflation (the same level of skill yields a higher rating).

So, yes, it is important to worry about ratings reflecting the same level of skill over time, but the range of values that ratings take won't inflate in the sense you're talking about, except where we are able to discern more granularity of skill levels (see scoring separation).
 
I remember reading somewhere that if multiple events use the same layout and are played close together it could impact ratings for all of the events.
That's not the case. Rounds that are played on the same layout at the same event get grouped together. Rounds from separate tournaments never affect each other.
 
It's not a hard cap; it's a soft cap, because the ratings are anchored to 1000.

As long as players show up to play to the best of their ability, that anchor extends back to the original ratings basis.

The evidence of the soft cap is in the results of 20+ years of data.
 
It's not a hard cap; it's a soft cap, because the ratings are anchored to 1000.

As long as players show up to play to the best of their ability, that anchor extends back to the original ratings basis.

The evidence of the soft cap is in the results of 20+ years of data.

The other cap is the absolute scoring cap. It's impossible to score better than an 18 on a course, so there's a relative limit to how high a rating can be. There's no upper limit to score, so therefore there's no lower limit to rating.
 
The other cap is the absolute scoring cap. It's impossible to score better than an 18 on a course, so there's a relative limit to how high a rating can be. There's no upper limit to score, so therefore there's no lower limit to rating.

Since you bring up "limits" (to the system), MY biggest issue with the rating system:

It's built on a linear algorithm when it should be built on a one-tailed exponential one.

It might be "okay" for the vast majority of applications / situations but any statistician worth his weight would not have used an inferior algorithm to devise a system when a statistically valid one is available.

Good scientific practice does not 'allow' you to "make up something that'll handle MOST of the cases" and then believe it'll be good enough. And certainly good mathematics won't stand for such.

Case in point: an 18-hole, all-par-3 course. Player A shoots a -17. RR = x. Player B shoots a -18. RR = x+y. Player C shoots a -19. RR = x+y+y. The present system has the difference between pA and pB being the same as between pB and pC. This, of course, is ridiculous, as it is WAY, WAY, WAY harder to shoot 17 birdies and 1 ace as opposed to 18 birdies than it is to shoot 18 birdies as opposed to 17 birdies and 1 par...and the ratings system should reflect this. But it cannot handle that, because it is built on a flawed (for the application) algorithm.
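To put numbers on the pA/pB/pC comparison, here is a minimal sketch. The linear rule is an assumed 10-points-per-throw stand-in (not the real PDGA formula), and the convex rule is one arbitrary alternative where each extra throw under the scratch score is worth a bit more than the previous one, which is the shape being argued for here; the scratch score, base rating, and growth factor are all assumptions:

# Minimal illustration of the -17 / -18 / -19 comparison on an 18-hole,
# all-par-3 course. Neither rule below is the real PDGA calculation.
BASE = 1000      # rating assigned to the assumed scratch score
SCRATCH = 54     # assumed scratch score (even par)

def linear_rating(score, pts_per_throw=10):
    return BASE + (SCRATCH - score) * pts_per_throw

def convex_rating(score, pts_per_throw=10, growth=1.15):
    # Each successive throw below scratch is worth `growth` times the last.
    rating, value = BASE, pts_per_throw
    for _ in range(SCRATCH - score):   # only meaningful for scores at or under scratch
        rating += value
        value *= growth
    return rating

for score in (37, 36, 35):             # -17, -18, -19
    print(score, linear_rating(score), round(convex_rating(score), 1))

Under the linear rule the gap from -17 to -18 equals the gap from -18 to -19; under the convex rule the second gap is larger, which is the behavior being asked for.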
 
That's not the case. Rounds that are played on the same layout at the same event get grouped together. Rounds from separate tournaments never affect each other.

Just to confirm this, each event only provides a minimal amount of information about the exact layout(s) used: course name, number of holes, course and hole par, and total layout length. In other words, the PDGA system does not have enough information to tell whether two entirely different events are using the same layouts (with the same rules) or not.
 
Since you bring up "limits" (to the system), MY biggest issue with the rating system:

It's built on a linear algorithm when it should be built on a one-tailed exponential one.

It might be "okay" for the vast majority of applications / situations but any statistician worth his weight would not have used an inferior algorithm to devise a system when a statistically valid one is available.

Good scientific practice does not 'allow' you to "make up something that'll handle MOST of the cases" and then believe it'll be good enough. And certainly good mathematics won't stand for such.

Case in point: an 18-hole, all-par-3 course. Player A shoots a -17. RR = x. Player B shoots a -18. RR = x+y. Player C shoots a -19. RR = x+y+y. The present system has the difference between pA and pB being the same as between pB and pC. This, of course, is ridiculous, as it is WAY, WAY, WAY harder to shoot 17 birdies and 1 ace as opposed to 18 birdies than it is to shoot 18 birdies as opposed to 17 birdies and 1 par...and the ratings system should reflect this. But it cannot handle that, because it is built on a flawed (for the application) algorithm.

So, you want to increase the reward for luck? If someone shoots a par round except for making 2 aces on this par-3 course, they are -4. Are they really having a better round than someone who made 4 or more birdies?
 
Since you bring up "limits" (to the system), MY biggest issue with the rating system:

It's built on a linear algorithm when it should be built on a one-tailed exponential one.

It might be "okay" for the vast majority of applications / situations but any statistician worth his weight would not have used an inferior algorithm to devise a system when a statistically valid one is available.

Good scientific practice does not 'allow' you to "make up something that'll handle MOST of the cases" and then believe it'll be good enough. And certainly good mathematics won't stand for such.

Case in point: an 18-hole, all-par-3 course. Player A shoots a -17. RR = x. Player B shoots a -18. RR = x+y. Player C shoots a -19. RR = x+y+y. The present system has the difference between pA and pB being the same as between pB and pC. This, of course, is ridiculous, as it is WAY, WAY, WAY harder to shoot 17 birdies and 1 ace as opposed to 18 birdies than it is to shoot 18 birdies as opposed to 17 birdies and 1 par...and the ratings system should reflect this. But it cannot handle that, because it is built on a flawed (for the application) algorithm.

You make a good point. However, why stop at ratings? We all know it is very much harder to get two under than one under on a hole. We also know it takes almost the same amount of lack-of-skill to get a double bogey as a triple bogey.

Therefore, we should not be giving out scores that are linear and always integers. Instead, we should give out scores that reflect the underlying skill required to get those scores. Skill as measured by where each score ranks percentile-wise.

I actually calculated the appropriate values here.
http://www.stevewestdiscgolf.com/Squeezing_More_Information_out_of_Disc_Golf_Scores.pdf

For example, on a typical par 4, we would replace the player's throw count (in the left column) with the score in the right column.

1 -3.83
2 0.63
3 2.76
4 4.19
5 5.12
6 5.74
7 6.14

So, when a regular player got a 4 on a par 4, their total score would go up by 4.19. If they got a 7, their total score would go up by 6.14. If they got an ace, their total score would go down by 3.83.

Obviously, there is not as much difference in skill between getting a 7 vs 4 as there is between getting a 1 vs 4. So, the player should only be punished with 1.95 more throws for getting a 7 vs 4, while they would be rewarded with 8.02 fewer throws for getting a 1 vs. 4.

That's just a simplification. In practice, each percentile score would be calculated after each round with a custom set of values for each hole. (Which would make them independent of the par set by the TD.)
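As a small sketch of how those replacement values would be used, here the quoted par-4 column is applied to a few hole results (the dictionary is just the table above; as noted, real use would need a separate set of values per hole, recalculated each round):

# Sketch of scoring with the percentile-style values quoted above for a
# typical par 4. Real use would recalculate a custom table per hole per round.
PAR4_VALUES = {1: -3.83, 2: 0.63, 3: 2.76, 4: 4.19, 5: 5.12, 6: 5.74, 7: 6.14}

def percentile_total(throws_per_hole):
    # Replace each hole's throw count with its percentile-based value.
    return sum(PAR4_VALUES[t] for t in throws_per_hole)

steady = [4, 4, 4, 4]   # four straight pars
lucky  = [1, 4, 4, 7]   # an ace, two pars, and a blow-up hole
print(sum(steady), round(percentile_total(steady), 2))  # 16 throws -> 16.76
print(sum(lucky),  round(percentile_total(lucky), 2))   # 16 throws -> 10.69

The two cards have identical raw totals, but the percentile values separate them, which is exactly the extra information being argued for (and, per the earlier objection, it rewards the ace heavily).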
 
You make a good point. However, why stop at ratings? We all know it is very much harder to get two under than one under on a hole. We also know it takes almost the same amount of lack-of-skill to get a double bogey as a triple bogey.

Therefore, we should not be giving out scores that are linear and always integers. Instead, we should give out scores that reflect the underlying skill required to get those scores. Skill as measured by where each score ranks percentile-wise.

I actually calculated the appropriate values here.
http://www.stevewestdiscgolf.com/Squeezing_More_Information_out_of_Disc_Golf_Scores.pdf

For example, on a typical par 4, we would replace the player's throw count (in the left column) with the score in the right column.

1 -3.83
2 0.63
3 2.76
4 4.19
5 5.12
6 5.74
7 6.14

So, when a regular player got a 4 on a par 4, their total score would go up by 4.19. If they got a 7, their total score would go up by 6.14. If they got an ace, their total score would go down by 3.83.

Obviously, there is not as much difference in skill between getting a 7 vs 4 as there is between getting a 1 vs 4. So, the player should only be punished with 1.95 more throws for getting a 7 vs 4, while they would be rewarded with 8.02 fewer throws for getting a 1 vs. 4.

That's just a simplification. In practice, each percentile score would be calculated after each round with a custom set of values for each hole. (Which would make them independent of the par set by the TD.)

Then the problem is we don't know who is leading during a tournament; scores could only be calculated at the end of a round. This would remove the spectator excitement. We couldn't have a fixed scoring spread per hole, since such things would change with the weather and the seasons. It's obviously a better model, but I think far too complex for players to feel comfortable with.
 
The other cap is the absolute scoring cap. It's impossible to score better than an 18 on a course, so there's a relative limit to how high a rating can be. There's no upper limit to score, so therefore there's no lower limit to rating.

This doesn't constrain the expansion of the rating range to only the lower ratings. Having no limit to an upper score can translate into higher top ratings if some good players score higher while the very best of the best retain their low scores: the SSA gets fixed to a higher score, so the top player scores more strokes below the SSA and is therefore rated higher. There is no limit to the ratings. Imagine a course where one player aces every hole and scores 18 while the "1000 rated player" representing the rest of a large field scores 10, 20, 30, 40, 50, ... strokes per hole; that 18-ace round gets rated higher and higher and higher the more strokes you assume the 'scratch' player is taking. This is why Steve and many others focus on scoring separation. The greater the scoring separation, the more evidence we have to discern between ever more granular differences in skill level, and the more granular we become, the wider our rating range is.

I too was surprised to find out the rating formula was linear, but it is at least fairly simple to understand.
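For the ace-every-hole thought experiment above, a quick set of numbers using an assumed 10-points-per-throw linear stand-in (again, not the real formula) shows the rating growing without bound as the field's anchor score rises:

# Thought experiment above: one player scores 18 (all aces) while the
# 1000-rated anchor of the field averages ever-higher scores. The formula
# is an assumed linear stand-in, not the actual PDGA calculation.
PTS_PER_THROW = 10

def ace_round_rating(field_anchor_score, ace_score=18, anchor_rating=1000):
    return anchor_rating + (field_anchor_score - ace_score) * PTS_PER_THROW

for anchor_score in (54, 90, 180, 360, 720):
    print(anchor_score, ace_round_rating(anchor_score))
# 54 -> 1360, 90 -> 1720, 180 -> 2620, 360 -> 4420, 720 -> 8020

The higher you assume the anchor score is, the higher the ace round rates, with nothing in a linear formula to stop it.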
 
And watching the MVP would tell you that no one specific throw is the most valuable throw on every hole.

Is it the drive? The upshot? The scramble?

McBeth had to save bogey or worse...was that more valuable than an ace run?

Golf rates every throw as equal. The ratings reflect that. Anything else would lead to ratings that failed to align with the event outcome. All the ratings system does is create a weighting scheme that can be applied to any related event (meaning rated players participating). You cannot create granularity where the game does not already have that granularity.
 
