Disc Golf Course Review (http://www.dgcoursereview.com/forums/index.php)
-   General Disc Golf Chat (http://www.dgcoursereview.com/forums/forumdisplay.php?f=2)
-   -   When is a 1020 round not a 1020 round? (http://www.dgcoursereview.com/forums/showthread.php?t=83596)

 jeverett 03-01-2013 07:40 PM

When is a 1020 round not a 1020 round?

Preface: this post will contain statistics, and may be best-directed at cgkdisc.

No, my rhetorical question is not actually rhetorical. I believe that the PDGA "rating interval compression formulas" are incorrect, and as such PDGA player round ratings (and subsequently player ratings) are not as accurate as they could be. The result of these formulas, from what data I've been able to collect so far, is that almost no recorded round at any event, above or below 1000, is being correctly rated. The discrepancy is typically very small, but may be much more pronounced on very short or very difficult courses.

Note: I am referencing the 2013 Gentlemen's Club Challenge for much of the following. Unfortunately, a ratings update has happened since the following calculations were run, making comparison with the original data more difficult. The results of that event can be found here:

http://www.pdga.com/tournament_results/99721

For any who are interested, here is a quick breakdown of how PDGA round ratings are calculated:

Step 1. Determine who really is a propagator. Unfortunately this part is actually annoying to do accurately with available data. For example, players without active PDGA memberships may still be propagators for an event, but their initial rating cannot be determined externally. Also, in order to be considered a propagator, a player must have played a sufficient number of previous rated rounds (i.e. simply having a rating is not enough), and this is very time-consuming (and sometimes impossible) to determine externally.

Step 2. Using all propagators, determine the best-fit linear equation that matches the round data. For example, here is a plot of round 1 of the 2013 Gentlemen's Club Challenge (GCC):

The linear equation that best fits the above data is:

Round_Score = -0.12196922 * Initial_Rating + 182.4463478

Step 3. Determine the SSA for the round, using the best-fit linear equation above evaluated at an Initial_Rating of 1000. As an example, for round 1 of the GCC, the SSA is approximately 60.47712771.
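
For anyone who wants to reproduce steps 2 and 3, here is a minimal sketch in Python. It assumes an ordinary least-squares fit and assumes the SSA is the fitted score at a rating of 1000 (the numbers in this post are consistent with that, but it is not confirmed PDGA methodology). The function names and the small sample data set are made up for illustration.

```python
def fit_line(ratings, scores):
    """Ordinary least-squares fit of score = slope * rating + intercept."""
    n = len(ratings)
    mean_r = sum(ratings) / n
    mean_s = sum(scores) / n
    sxx = sum((r - mean_r) ** 2 for r in ratings)
    sxy = sum((r - mean_r) * (s - mean_s) for r, s in zip(ratings, scores))
    slope = sxy / sxx
    intercept = mean_s - slope * mean_r
    return slope, intercept


def ssa_from_line(slope, intercept, scratch_rating=1000.0):
    """Assumption: SSA is the score the fitted line predicts for a 1000-rated player."""
    return slope * scratch_rating + intercept


# Hypothetical propagator data, only to show the fit running end to end.
sample_ratings = [900, 950, 1000, 1050]
sample_scores = [70, 64, 60, 54]
sample_slope, sample_intercept = fit_line(sample_ratings, sample_scores)

# Coefficients quoted above for GCC round 1.
gcc_ssa = ssa_from_line(-0.12196922, 182.4463478)
print(round(gcc_ssa, 4))  # 60.4771
```

Plugging in the round 1 coefficients gives back the SSA quoted in step 3, which is what suggests the "evaluate at 1000" reading.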

Step 4. Determine round ratings, using the following 'compression formulas':

For SSAs above 50.3289725:
Rating_Increment = -0.225067 * SSA + 21.3858

For SSAs below 50.3289725:
Rating_Increment = -0.487095 * SSA + 34.5734

Note: these formulas were derived from PDGA event results, and are not the precise formulas used by the PDGA. Unfortunately, due to rounding (of round ratings to the nearest whole number, and possibly also rounding of SSA values) it is impossible to exactly determine the actual linear formulas used. They are accurate, however, to within 0.01% of the actual PDGA linear formulas, and in the case of the GCC, accurately matched the 'unofficial' round ratings for each player in the round (subject to a small amount of rounding error). The rating increment for round 1, for example, was 7.774394297, meaning that each throw above or below the SSA decreased or increased the player's round rating by 7.774394297 points.
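
As a rough sketch of step 4, the two derived compression formulas can be written as one piecewise function. The coefficients are the derived ones from this post, not official PDGA values, and the function names are hypothetical. The two lines cross at the breakpoint, so which branch handles an SSA of exactly 50.3289725 makes no practical difference.

```python
BREAKPOINT_SSA = 50.3289725  # where the two derived lines intersect


def rating_increment(ssa):
    """Rating points per throw, using the derived (not official) formulas."""
    if ssa > BREAKPOINT_SSA:
        return -0.225067 * ssa + 21.3858
    return -0.487095 * ssa + 34.5734


def round_rating(score, ssa):
    """Round rating: 1000 at the SSA, plus the increment for each throw below it."""
    return 1000.0 + (ssa - score) * rating_increment(ssa)


gcc_round1_ssa = 60.47712771
print(round(rating_increment(gcc_round1_ssa), 4))  # 7.7744
print(round(round_rating(58, gcc_round1_ssa)))     # 1019
```

Under these formulas a score equal to the SSA always rates exactly 1000, regardless of which branch is used.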

Step 5. Compare the round rating of each propagator with their initial rating, throw out any propagator whose round rating was not within 50 points of their initial rating, and repeat steps 2-4. Here is a plot of GCC round 1, with these propagators removed:

The linear equation that best fits the above data is:

Round_Score = -0.114947347 * Initial_Rating + 175.3579791

This produces a modified SSA of 60.41063194, and a new rating interval (using the same compression formula as above) of 7.789360301.
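
Step 5 can be sketched as a single filter-and-refit pass. This is only my reading of the procedure: it assumes one pass (not repeated until stable), treats a gap of exactly 50 points as "within 50", and uses a made-up set of six propagators. All function names are hypothetical.

```python
def fit_line(ratings, scores):
    """Ordinary least-squares fit of score = slope * rating + intercept."""
    n = len(ratings)
    mean_r = sum(ratings) / n
    mean_s = sum(scores) / n
    sxx = sum((r - mean_r) ** 2 for r in ratings)
    sxy = sum((r - mean_r) * (s - mean_s) for r, s in zip(ratings, scores))
    slope = sxy / sxx
    return slope, mean_s - slope * mean_r


def rating_increment(ssa):
    """Derived (not official) compression formulas from step 4."""
    if ssa > 50.3289725:
        return -0.225067 * ssa + 21.3858
    return -0.487095 * ssa + 34.5734


def ssa_and_increment(ratings, scores):
    """Steps 2-4: fit the line, read off the SSA at 1000, get the increment."""
    slope, intercept = fit_line(ratings, scores)
    ssa = slope * 1000.0 + intercept
    return ssa, rating_increment(ssa)


def drop_outliers(ratings, scores, max_gap=50.0):
    """Step 5: keep propagators whose round rating is within max_gap of their rating."""
    ssa, inc = ssa_and_increment(ratings, scores)
    kept = [(r, s) for r, s in zip(ratings, scores)
            if abs(1000.0 + (ssa - s) * inc - r) <= max_gap]
    return [r for r, _ in kept], [s for _, s in kept]


# Hypothetical data: the 960-rated player has a blow-up round of 80.
ratings = [900, 940, 980, 1000, 1020, 960]
scores = [72, 67, 63, 60, 58, 80]

first_ssa, first_inc = ssa_and_increment(ratings, scores)
kept_ratings, kept_scores = drop_outliers(ratings, scores)
final_ssa, final_inc = ssa_and_increment(kept_ratings, kept_scores)
print(len(ratings) - len(kept_ratings), "propagator(s) dropped")
```

With this sample data the blow-up round drags the first SSA upward, and removing it lowers the SSA on the refit, the same direction of change as in the GCC numbers above.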

So.. what's wrong with any of this? Nothing.. yet. But what happens when you compare the linear equation that was used to compute the SSA with the rating interval compression formula value for that SSA? *If* it is valid to use a linear equation to model player rating vs. round score (and from the two plots above, plus countless other plots I've made from PDGA round data, it does appear to be valid), should it not also be valid to use this same linear equation to determine the (averaged) number of rating points per throw? But this is not what the PDGA system does. For example, when you compare the rating interval used for each round of the GCC against the observed rating-interval-per-throw-vs-initial-player-rating of all propagators, you get the following plot:

The PDGA Interval line above is of course this linear formula:

Rating_Increment = -0.225067 * SSA + 21.3858

But as you can observe, this in no way matches the observed rating interval based on player initial rating vs. round score, and the round data I've collected to date further supports this trend: that the relationship between initial player rating, round score, and round rating increment cannot be accurately predicted using the PDGA linear 'compression formula'. The effect of this is that round ratings are not (as) accurately reflecting the effect of player rating on round scores as they could be. In other words, if a 1020-rated player shoots an "average" round, by the PDGA compression formulas that round is not being rated a 1020. In fact, the only round rating not subject to this kind of induced error is of course a round rated exactly 1000.

What am I suggesting? I am suggesting that the PDGA instead switch to using the same linear equation used to determine SSA for a round to determine the per-throw rating increment. For example, for round 1 of the GCC, this would mean a round that was rated as a 1020 (technically a round score of 57.8430269985) would instead be rated as a 1022.3372266 (or rounded to a 1022). Yes, this difference is very small..
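
To make the 1020 vs. 1022 example concrete, here is a small sketch comparing the two methods for GCC round 1. It uses the final (post-step-5) fit coefficients quoted earlier and the derived compression formula for SSAs above 50.3289725. The function names are hypothetical.

```python
# Final best-fit line for GCC round 1 (after removing outlier propagators).
SLOPE = -0.114947347
INTERCEPT = 175.3579791


def rating_from_regression(score):
    """Proposed method: invert the fitted line to get a rating from a score."""
    return (score - INTERCEPT) / SLOPE


def rating_from_compression(score):
    """Current method as derived in this post (valid here because SSA > 50.3289725)."""
    ssa = SLOPE * 1000.0 + INTERCEPT
    increment = -0.225067 * ssa + 21.3858
    return 1000.0 + (ssa - score) * increment


score = 57.8430269985
print(round(rating_from_compression(score), 2))  # 1020.0
print(round(rating_from_regression(score), 2))   # 1022.34
print(round(-1.0 / SLOPE, 2))                    # 8.7 rating points per throw from the fit
```

The fitted line implies roughly 8.7 rating points per throw for this round, against about 7.79 from the compression formula, and that gap is where the extra ~2.3 points on a 1020 round come from.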

So why does it really matter? The goal here is correlation.. or more specifically the correlation between (initial) player rating and round/event score, i.e. how well the initial ratings of all players predicted how they scored at an event. I don't know if anyone has seen the PDGA report on correlation coefficients for their events (it was published last year), but increasing the correlation coefficient for Majors appears to be a goal of theirs. This of course can also be addressed with course design, but a very simple way to improve the correlation coefficient could be to switch how round ratings themselves are calculated. I don't, however, have any method of proving that this change will work.. and that's where I'd *love* some help (maybe even from Chuck). In order to test this, the 'real' PDGA method of determining SSA and rating intervals per throw would need to be used, using the precise number and ratings of propagators for real events, and then we'd need to test the impact both round rating methods would have, first on player ratings and then on the correlation coefficient of how well future event rounds are predicted by said player ratings.
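
For reference, the correlation coefficient I have in mind can be sketched like this. It assumes the plain Pearson coefficient between initial rating and round score, which may not be exactly what the PDGA report computes, and the data is made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical initial ratings and round scores for five players.
initial_ratings = [900, 940, 980, 1000, 1020]
round_scores = [72, 67, 63, 60, 58]

r = pearson_r(initial_ratings, round_scores)
print(r < 0)  # True: higher-rated players shoot lower scores
```

The sign is negative because better players shoot lower scores, so "improving" the correlation here means pushing it closer to -1. A full test of the proposal would recompute player ratings under each method and compare this coefficient on later events.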

Ok.. wall of text over.. thoughts?

 BionicRib 03-01-2013 07:52 PM

Do you perhaps have a link to the correlation coefficients reported by the PDGA? Or are they only sent to the TD's?

 Dave242 03-01-2013 07:53 PM

Quote:
 Originally Posted by jeverett (Post 1877148) the SSA is approximately 60.47712771
Your use of "approximate" is less accurate than PDGA ratings. :D

I guess it is important to get this down to the nearest few billionths of a throw!

 jeverett 03-01-2013 07:58 PM

Quote:
 Originally Posted by BionicRib (Post 1877166) Do you perhaps have a link to the correlation coefficients reported by the PDGA? Or are they only sent to the TD's?
What I have is the PDGA document (more of a white paper) on the topic. It is available online here:

http://www.pdga.com/files/documents/...er_Courses.pdf

 BionicRib 03-01-2013 09:09 PM

I asked Chuck a similar question a few days ago. I do see what you are getting at and after reading that file you linked me to.......I would say that there are two issues that will take time to help with these correlation coefficients. 1. Is as you mentioned.....Course design...."luck golf/tweener holes/bad design". Having clearer definitions that are practiced by "all" designers across the country is a work in progress. 2. IMO....and I'm sure Chuck can clear this up better than I, but is there really enough data to work with? I personally don't think so. If disc golf were as popular as baseball or golf for that matter, the numbers would be more "fine tuned" because you have more players. More players equals more data. You have a fifty fifty chance of getting heads or tails when flipping a coin. If you flip it 100 times I bet you never get 50/50 on the dot.......(more like 70/30 or 60/40). If you flip it a million times you will get closer and closer to that "50/50" on paper. Just my thoughts

 jeverett 03-01-2013 09:27 PM

Quote:
 Originally Posted by BionicRib (Post 1877275) I asked Chuck a similar question a few days ago. I do see what you are getting at and after reading that file you linked me to.......I would say that there are two issues that will take time to help with these correlation coefficients. 1. Is as you mentioned.....Course design...."luck golf/tweener holes/bad design". Having clearer definitions that are practiced by "all" designers across the country is a work in progress. 2. IMO....and I'm sure Chuck can clear this up better than I, but is there really enough data to work with? I personally don't think so. If disc golf were as popular as baseball or golf for that matter, the numbers would be more "fine tuned" because you have more players. More players equals more data. You have a fifty fifty chance of getting heads or tails when flipping a coin. If you flip it 100 times I bet you never get 50/50 on the dot.......(more like 70/30 or 60/40). If you flip it a million times you will get closer and closer to that "50/50" on paper. Just my thoughts
Hi BionicRib,

Oh definitely, due to sample size problems, the inherent random nature of disc golf (somewhat mitigated by course and equipment design), and physical/muscular limitations on just how 'controllable' disc golf is, period, getting a 100% correlation coefficient between player rating and event score is going to be impossible.. not to mention not ideal anyway.

However, my hope is that with one slight adjustment to how player round ratings (and by extension player ratings) are calculated, we can very, very slightly increase the correlation coefficient between (initial) player rating and event score. I don't really know what could be expected in terms of improvement with this one change.. probably less than 1%.. but as I said, I don't have the ability to determine this, due to not having access to the *exact* same rating methodology that the PDGA uses.

 jenb 03-01-2013 10:38 PM

Of course, one answer is that it doesn't really matter as long as it's applied consistently to everyone.

When someone gives that answer, I want you to look them square in the eye and say, "Momma said knock you out."

 Steve West 03-02-2013 02:33 PM

Why assume "linear"?

 Hampstead 03-02-2013 03:02 PM

I just want to say that I love playing, but the OP made my head swim.
Too smart for me.
Definitely not hating, I'm impressed with all the info.

 DGstatistician 03-02-2013 05:42 PM

Quote:
 Originally Posted by jeverett (Post 1877295) Hi BionicRib, Oh definitely, due to sample size problems, the inherent random nature of disc golf (somewhat mitigated by course and equipment design), and physical/muscular limitations on just how 'controllable' disc golf is, period, getting a 100% correlation coefficient between player rating and event score is going to be impossible.. not to mention not ideal anyway. However my hope is, that with one slight adjustment to how player round ratings (and by extension player ratings) are calculated, we can very, very slightly increase the correlation coefficient between (initial) player rating and event score. I don't really know what could be expected in terms of improvement with this one change.. probably less than 1%.. but as I said, I don't have the ability to determine this, due to not having access to the *exact* same rating methodology that the PDGA uses.
I will preface this by saying that currently my teaching schedule has me too busy to look through data and figure out exact models. In May, it will be in between Spring and Summer semesters and I will have plenty of time to look into the data and actually test some of my theories.

You make a very valid point that the binary model may not be as accurate as a linear model. (This is a classic cost-of-simplicity debate.) However, I believe that the linear model is also not accurate. It is likely that we should be looking at a quadratic of some sort. Although I am not sure what model I would specifically fit to the design, the difference between harder and easier courses is likely not linear.

I also plan on looking into the lag that is used for ratings, specifically the use and value of rounds played more than 6 months earlier.

All times are GMT -4.