[ QUOTE ]
[ QUOTE ]
I do stats as a hobby, and as part of being a math teacher. If I have 20 ratings, all 4's and 5's, and I get one 1 rating, that 1 rating pushes my average down toward a 4.0. Now based on all the other data, I should have something like a 4.5. In other words, the 'average' is not reflective of the data set.
In statistics there are formulas to determine outliers. They're not that complicated and we use them all the time to get a good read on what the numbers are telling us. Rather than letting one or two anomalous data points skew our measures of center, we just toss out the data that is SOOOOO far from the norm as to be unlikely to be valid.
In other words, if 20 people give a 4 or a 5 and one person gives a 1, then it's probable, statistically speaking, that the single 1-star rating is not a valid data point.
Now if we have 10 5's and 10 1's, that's a TOTALLY different ball of wax, and in that case the standard deviation for the data will be substantially higher and suggest that the 1's ARE indeed valid data points. Part of the trick in stats is that you don't toss the potentially invalid data entirely; you just leave it out of the average until you have more data to either confirm that there is a downward trend, or confirm that those points are invalid.
In other words, you get 10 reviews: nine 5's and one 1. The 1 is likely an outlier, so the average is 5 (not counting the 1). Then you get 5 more reviews, all 1's. Now that implies the 1 is indeed a valid data point, so you reinclude it in the average and recalculate.
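A minimal sketch of that detect-then-reinclude idea, assuming the common 1.5*IQR fence as the outlier rule (the exact cutoff is just one convention, not something the game itself uses):

```python
# A rough sketch of the "set aside outliers, reinclude them if later data
# confirms them" idea described above. The 1.5*IQR fence is just one common
# convention; nothing says any particular rule is the "right" one.
from statistics import mean, quantiles

def trimmed_rating(ratings):
    """Average the ratings, leaving out points beyond the 1.5*IQR fences."""
    if len(ratings) < 4:
        return mean(ratings)                      # too few points to judge outliers
    q1, _, q3 = quantiles(ratings, n=4)           # quartiles of the ratings
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [r for r in ratings if lo <= r <= hi]
    return mean(kept) if kept else mean(ratings)

# Nine 5's and a single 1: the 1 falls outside the fences and is set aside.
print(trimmed_rating([5] * 9 + [1]))              # -> 5
# Five more 1's arrive: low scores are no longer rare, so they all count again.
print(trimmed_rating([5] * 9 + [1] * 6))          # -> 3.4
```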
It's not that hard, and I've only had first-year stats.
[/ QUOTE ]
The problem with statistics is a problem intrinsic to most computational mathematics: people tend to remember the formulas but often forget their contextual applicability. Statistics is one of the worst (or best, depending on your point of view) examples of this.
In this case, there is a subtle flaw in your reasoning (not picking on you specifically: it's a flaw replicated in all posts of this nature). The flaw is that you're looking at the arcs as if they have an "intrinsic" value, and all of the individual ratings are attempts to "measure" that value. In that context, it's reasonable to consider whether some reviewers "just aren't very good at it." If 100 people rate an arc a 5, and one person rates it a zero, you could argue that "obviously" the arc "is a five" and that one zero guy is just bad at reviewing.
However, that isn't really the case. Whether an arc is entertaining to someone or not is highly subjective. As a result, nearly every review is a composite score that combines the reviewer's opinion of the technical merits of the arc (which is at least somewhat objective) with its entertainment value (which is highly subjective and not always even possible to untangle from their opinion of its technical merits). There is no actual "intrinsic score," and as a result the rating values are an attempt to *create* a composite score across the entire playerbase (or at least the subset that plays arcs), not to *measure* the score of the arc.
Suppose we have the hypothetical case that 90% of the entire playerbase universally loves challenge missions, and 10% of the entire playerbase absolutely hates them. A challenge mission might get nine 4s and 5s, and one 1. That one isn't "wrong"; it's representing that 10% of the player population. Claiming that the arc is really "basically a 4.5, excluding a minority that don't count" would be missing the point.
A single number cannot represent a wide range of circumstantial information. It can really only quantify a single magnitude. In this case, the proper representative number comes from averaging all ten scores, including the "outlier," because those scores properly represent the playerbase as a whole. Does that lose information in the process? Yes, as all composite scores and averages do. That's unavoidable.
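For concreteness, here's that ten-rating hypothetical worked both ways (the particular mix of 4s and 5s is just an illustration, not data from any real arc):

```python
# Nine ratings from the 90% who like challenge missions, one from the 10% who don't.
from statistics import mean

ratings = [5, 4, 5, 4, 5, 4, 5, 4, 5, 1]

everyone = mean(ratings)                        # counts the whole playerbase
trimmed  = mean(r for r in ratings if r > 1)    # the "toss the outlier" version

print(round(everyone, 2))   # -> 4.2
print(round(trimmed, 2))    # -> 4.56
```

The 4.56 looks "cleaner," but the 4.2 is the number that actually describes how the playerbase as a whole received the arc.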
Now, what happens when the scores *don't* represent the playerbase as a whole? Well, the answer to that question is: let me know when you have an example of a real-world case that can be unambiguously demonstrated to be non-representative, and I'll let you know. The problem is that it's very difficult to prove that a sample of the playerbase isn't a representative sample in the general case. It can be done in specific cases (I can actually point to two in my own case), but even in those cases, the only thing you can say is that it's statistically likely that some data points are non-representative. There's no way to point to any individual one and make that claim, short of the reviewer themselves stating a clear, statistically invalid bias directly.
On the subject of ratings: my opinion is still the same as it was in beta: while I think the rating system is problematic in a few areas, ultimately I think players should be allowed to rate by their own internal rating system, whatever that may be. Placing systematic requirements on raters - especially with any form of accountability - greatly reduces the chance that players will participate in the rating system, and in my opinion is counter to the intention of the rating system itself. The rating system is explicitly intended to be the part of the MA that is "for the masses" (as opposed to the authoring tools, which are for "authors"). I think it's reasonable to make suggestions to players on how to provide the most effective feedback, but I'd stop well short of telling people what constitutes each rating number.
[/ QUOTE ]
Ulli still wonder how arc with no cookies gets 5 stars...