Mathematical proof of new system's errors

#0 - Jan. 27, 2009, 6:46 a.m.

Everyone knows that the new system is broken. It's so horribly broken, in fact, that it almost hides how PvP, all around, is a shadow of its former (wobbly) self. Petitions and rants abound. I bring something new to the table.

I bring Science.

The situation was thus: I had a 2v2 team rated 1615 with a friend of mine. My entire 2v2 record this season included one week with some random dude, which went horribly and ended up in the 1440's. Then I founded a new team with said friend and played two weeks with his rogue. After those two weeks, he brought on his Death Knight and at this point in time, I had 22 games more than the DK. Team Rating was 1615, my personal was 1615, his personal was 1614. Peachy keen.

New system comes, we peak at 1682, settle at 1672. System gets reset, we peak at 1659 and after some point, for goodness knows what reason, my personal rating starts dropping horribly. Proof that I did peak above 1645 can be seen from my possession of the Hateful Gladiator legs, or if you don't believe that I bought them instead of having them drop off VoA (which you would if you knew my luck with rolls...), you can look towards my possession of the Deadly Gladiator's belt, which establishes a peak above 1630 and is enough for the purposes of this discussion.

The aforementioned horrible drop of my personal rating took a simple form: each time we won a match, my personal rating would gain less than the team rating did. Each time we lost a game, I would lose *more* than the team. My teammate then started registering some mild gains relative to the team rating. After a while, I started keeping track of the changes and panicked when the differential reached roughly 50 below team rating. I submitted a ticket; no GM ever came, despite hours of waiting, and when I logged back in the following day, I received a form letter telling me the usual bland statements.

So I took a decision. I called upon Science.

*WARNING - SCIENTIFIC CONTENT AHEAD*

Applying a little reasoning, we thought that maybe the system still had some kinks in it and we created a new team. We then played 50 matches, from 1500/1500 team/personal rating for both of us, and I recorded the occurrences carefully. The results are plotted in the following figure (Blue = Team Rating, Green = My Rating, Red = Team Mate's Rating, Yellow Dots = Opposing Teams' Ratings):

http://nfist.pt/~pqueiro/Ratings50_DK.png

The drift I mentioned becomes readily apparent after less than a handful of games, and truly shocking after the 50 games, where the differential hits a peak of -111. Here's an in-game screenshot for further proof:

http://nfist.pt/~pqueiro/WoWScrnShot.jpg

Armed with this data, a few things spring immediately to mind:

- The new system is actually quite effective at *not* matching you with equivalent opponents. Over 50 games (enough for basic statistics, certainly enough to invoke the Central Limit Theorem and similar tools in statistical analysis) you find a startling dearth of evenly matched opponents, going so far as to find matches made between a team at 1629 and another at 2048. Remarkable.
- There is an unmistakable downward trend for my rating versus the team's rating, so this was not an artifact of the transition. This was a brand-spanking new team, meaning this is ingrained in the new system. The new system condemns me to an ever declining rating unless my team has an overwhelming victory record (more on this later). Thanks, Blizzard!
- There is also an upward trend for my teammate, though far less marked.
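
For anyone who wants to repeat this bookkeeping on their own team, here is a minimal sketch in Python of the two numbers discussed above: the per-game differential (personal rating minus team rating) and its trend over the games played. The function names are made up for illustration, and the short rating lists are placeholder values, NOT the recorded series from the plot. The least-squares slope is just one reasonable way to put a number on "downward trend"; it is not something the game itself reports. The last line uses the two ratings quoted above for the most lopsided match.

```python
from statistics import mean


def drift(team_ratings, personal_ratings):
    """Personal minus team rating after each game (negative = personal lags the team)."""
    if len(team_ratings) != len(personal_ratings):
        raise ValueError("need one personal rating per team rating")
    return [p - t for t, p in zip(team_ratings, personal_ratings)]


def trend_per_game(values):
    """Least-squares slope of the values against game number (rating points per game)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two games")
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Placeholder numbers for illustration only, NOT the data behind the figure.
team = [1500, 1512, 1503, 1515, 1527]
mine = [1500, 1510, 1497, 1507, 1516]

differential = drift(team, mine)
print("differential per game:", differential)
print("trend (points per game):", trend_per_game(differential))

# Gap in the most lopsided match mentioned above (1629 vs 2048).
print("largest quoted mismatch:", 2048 - 1629)
```

A slope near zero would mean personal and team ratings move together; a clearly negative slope over many games is the kind of drift described in this post.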

Thundgot

#168 - Jan. 28, 2009, 11:11 a.m.

A very nice collection of data and good write-up, Vashardjor. Your numbers appear correct and based on these alone we can see how you (and others) may conclude that the system is not working correctly. However, due to the new system being hidden (kind of the idea, really) there are pieces which you couldn't include in your analysis, and which would likely have led to an entirely different conclusion.

Few systems are ever completely set in stone, and we always like to see analysis and data collections and appreciate all the feedback people give about the game systems, and about the PvP rating system in particular. It could definitely be interesting to see how your ratings progress over the next 100 games. Please do keep feedback coming in this thorough and constructive form.

Edit: typo

Thundgot

#172 - Jan. 28, 2009, 11:33 a.m.

Quote:
Some more info about this hidden rating would help this kind of analysis a lot though.

Indeed, but that would defy the point of it being hidden.

Thundgot

#238 - Jan. 28, 2009, 4:19 p.m.

Quote:
So what you, a blue poster, are telling me, is that after 73 games, I need 100 more to figure out what kind of rating Blizzard arbitrarily decided to award me?

No, only that it would have been interesting to see an extended version of the data and graphs you presented. There was no speculation or indication as to what they would show after those additional games, only a show of interest in your research. Apologies if it came across as "you don't play enough Arena"; it wasn't meant like that.