
The Rubrics Strike Back


Slawbug


Dintiradan's topic discussing rubrics for rating BoA scenarios got me thinking. What do folks think about rating rubrics for games in general?

 

A few years ago I put a general rubric together. It was intended mainly for RPGs, but could work for other genres with some modifications. The rubric obviously reflects what I think is important in games, and when I've tested it out it's given results that I've been pretty satisfied with.

 

I call this the 4/3/2/1 rubric because that's how the points are split up:

 

40% FACILITY (Play Control, Presentation, Flow, Substance)

30% GAME SPACE (Variety, Balance, Depth)

20% ATMOSPHERE (Story-World, Graphics & Sound)

10% STORYTELLING (Voice, Exposition, Character)
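
As a rough sketch of how the split above could turn four category scores into one overall number: this assumes each category is scored from 0 to 10, which the rubric itself doesn't specify, and the function name and sample scores are made up for illustration.

```python
# Weights from the 4/3/2/1 split. Integers keep the arithmetic exact.
WEIGHTS = {
    "facility": 4,
    "game_space": 3,
    "atmosphere": 2,
    "storytelling": 1,
}


def overall_score(category_scores, weights=WEIGHTS):
    """Return the weighted average of the per-category scores."""
    missing = set(weights) - set(category_scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    weighted_sum = sum(category_scores[name] * w for name, w in weights.items())
    return weighted_sum / sum(weights.values())


# Hypothetical game (assumed 0-10 scale): strong mechanics, weak storytelling.
example = {"facility": 8, "game_space": 9, "atmosphere": 6, "storytelling": 3}
print(overall_score(example))  # 7.4
```

With these weights, the weak storytelling score only costs the game 0.7 points out of 10, while the same drop in Facility would cost 2.8.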

 

And here's a long explanation of the categories that I wrote way back when:

FACILITY refers to how easy it is to connect with the game, get into it, and enjoy it -- regardless of the game and the story. It doesn't consider issues of taste. If I don't like a game because I think the story is dumb, that's not a facility issue. But if the controls are frustrating to use, if the game is poorly paced and doesn't flow well, or if it's repetitive or just lacks content, those are all issues that interrupt my enjoyment of the game -- or prevent me from getting into it in the first place. Those are facility issues. Facility is worth the highest portion of points (40%) because it really can ruin an otherwise excellent game.

 

High Facility scores are given for:

— Intuitive, easy to use controls that don't require any thought while you play.

— On-screen elements that are easy to see, understand, and interact with.

— Appropriate pacing of cinematic, interactive, and game elements that keeps things flowing and engaging at all times.

— High ratios of substance to content.

 

Low Facility scores are given for:

— Frustrating controls that never seem to become intuitive.

— Clunky or confusing displays, menus, or other on-screen elements.

— "Dead time" or "sandwich time" that requires the player to wait (or enter mindlessly repetitive commands) while nothing meaningful happens. This is especially bad in short but frequent bursts.

— Poor pacing of cinematic, interactive, and game elements. This includes games that require lots of grinding.

— Extremely repetitive elements. This may be a double whammy with other categories.

 

 

GAME SPACE measures the robustness of the actual game mechanics. This is an issue of quality and not quantity. Games with relatively narrow mechanical palettes may nonetheless have balanced, varied, and tactically interesting gameplay. Often these qualities go hand-in-hand -- having a large variety of poorly balanced attacks to choose from will lead to very shallow strategy; on the other hand, when your options (and those of your opponents) are too limited, there's no room for strategy either. Game Space is worth the second highest portion of points (30%) because games with crappy stories are more playable than games with crappy mechanics.

 

High Game Space scores are given for:

— Gameplay mechanics that are expansive enough to allow for different tactics and strategies without becoming confusing or overextended.

— Well-balanced gameplay. This means balancing player abilities and stats against each other _and_ against enemy abilities and stats. In a well-balanced game, every attack and every unit is meaningful.

— Wide varieties of useful attacks, both for players and opponents.

— Depth of strategy.

 

Low Game Space scores are given for:

— Poorly balanced gameplay that makes many of the game's attacks or units superfluous.

— A low variety to quantity ratio when it comes to attacks or units on either side.

— Lack of challenge, removing the impetus for strategy.

 

 

ATMOSPHERE refers to the game's ability to transport you into its world. That's the RP part of RPG, and it persists even in so-called "roll-playing" games. The main components here are graphics and sound, which under most rating schemata receive their own category. However, I'm less concerned with the technical virtuosity of the graphics and sound, which is highly subjective, and more with how they contribute to atmosphere. Text typically contributes a lot to atmosphere. In some games, carefully modified play controls or mechanics can be a part of the atmosphere as well. Atmosphere is worth a smaller portion of points (20%) because it's necessary for a truly engrossing game, but a game without atmosphere can still be playable and fun.

 

High Atmosphere scores are given for:

— Graphics, sound, and other elements that are seamlessly woven together to create an immersive and believable experience.

— A story-world that is internally consistent right down to the details, and thus does not require the player to suspend his disbelief.

 

Low Atmosphere scores are given for:

— A story-world that does not quite come together, whether due to ugly graphics and sound, gaping inconsistencies, piles of clichés, or whatever else.

 

 

STORYTELLING is the most straightforward category. Atmosphere to the side, how well does the game tell its story? When the plot advances, is it communicated in an engaging and believable way? Does the voice used add to the story (e.g., humor that fleshes out characters) or does it take away from it (e.g., confusing translations)? How real are the characters? Note that this category measures how the story is told, not whether the story itself is interesting, as that is too subjective. It has a place in reviews, but not a numerical place.

 

Storytelling is worth the smallest portion of points (10%) in part because it's possible to have a strong game with no story whatsoever, and in part because it is already reflected in the other categories. Developers who put a lot of effort into telling their story tend to put similar effort into developing the game mechanics that go along with that story.

 

High Storytelling scores are given for:

— Engrossing exposition of plot.

— An adequate number of fleshed out, genuine characters.

— Other engaging storytelling, such as use of humor.

 

Low Storytelling scores are given for:

— Inadequate or confusing exposition of plot.

— Cardboard characters who for some reason are given lines and screen time. If they are never developed in the first place, this is less problematic.

Anyway, I'm curious what other folks think about game rating rubrics. How would you design one?


Most of the time, when you're reviewing something, a simple binary recommend/do not recommend is enough. Perhaps you add some text backing your position. The only time you need a score is when you're grading, and then, rubrics can be useful to enforce fairness, especially when you have multiple graders. It's also a way to point out where improvements can be made. Finally, most relevant in my experience, they're nice to hide behind when confronted with angry students. "Look, I didn't make the rubric, the professor did!" :-P

 

In the case of BoA, meh, I still don't know. I'm in the midst of typing up a bunch of reviews, and while I'm splitting my comments into sections, I'm not doing anything beyond that. On the one hand, the general attitude towards rubrics on the CSR has been poor historically. On the other hand, you occasionally have people upset at reviewers handing out "incorrect" marks.


The categories you've included for rating are all important. The weights you've assigned them are fairly arbitrary. It's most useful to be descriptive, but it's good for at-a-glance reviewing to just assign scores to the categories. The weights can be left to the preferences of the reader.

 

Many reviews have such breakdowns and then an "overall" rating, which often isn't an average, or even a weighted average. I think it's appropriate to let the reviewer give a gut response. "The gameplay was shaky, the balance wasn't always quite thought through, and I can't articulate what made this game great, but it's a fantastic game." Overall rating higher than any individual rating, because the whole is greater than the sum of its parts.

 

—Alorael, who thinks the lovely thing about having multiple reviewers is that you can drown out the people who are wrong by reviewing yourself. And often, people reading reviews will notice the discrepancies and possibly play to find out how they feel about it.


Also, as a matter of human psychology: humans can't reliably (in terms of either test-retest reliability or inter-observer reliability) distinguish more than about four ranks for any single distinct measure of quality. That means that for, say, Facility, you need at least 10 distinct subcategories to make your rubric meaningful (more if you want scores to start at 0 instead of 1).
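
To spell out that arithmetic, here's a minimal sketch. It assumes a 100-point rubric (so Facility is worth 40 points) and that each subcategory contributes its rank directly as points; both are readings of the numbers above rather than anything the rubric states. The function name is made up.

```python
import math


def min_subcategories(category_points, ranks=4, lowest_rank=1):
    """Fewest subcategories needed so the top ranks can add up to category_points.

    Assumes each subcategory is rated on `ranks` consecutive integer ranks
    starting at `lowest_rank`, and contributes its rank as points.
    """
    top_rank = lowest_rank + ranks - 1
    return math.ceil(category_points / top_rank)


print(min_subcategories(40))                 # ranks 1..4 -> 10
print(min_subcategories(40, lowest_rank=0))  # ranks 0..3 -> 14
```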


Originally Posted By: Beach upon the shores of reason
The categories you've included for rating are all important. The weights you've assigned them are fairly arbitrary. It's most useful to be descriptive, but it's good for at-a-glance reviewing to just assign scores to the categories. The weights can be left to the preferences of the reader.


Pretty much this, though generally I agree with Dintiradan that like/dislike is often enough for reviews to be useful to an average consumer.

Getting into numeric scores and weighted averages leads to something like Metacritic scores, which can be helpful in a generalized sense (don't buy things with scores under 20, maybe). But it breeds score inflation in the long term, even with something as simple as a 5-star system or a score out of 10. Just look at any game review site and go average the scores.

(I will admit, some of that inflation comes from the fact that most games which make it to any sort of release are complete enough that they couldn't ever be reasonably assigned a low value on the scale, but then that suggests the lower half of the scale is meaningless in the first place.)

Even without inflation in those systems, eventually numbers reach a point where the average person can't draw a meaningful distinction between them, like Lilith said. What's the difference between a Metacritic score of 70 and 75? Anything meaningful to a player?

Metacritic actually works pretty well, as long as you consider it to have a scale rather unlike the one it operates on. There's above 90, 80-90, 70-80, and below 70. That really gives you enough to judge.
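
A quick sketch of that four-band reading, for what it's worth. The exact boundaries are an assumption (the bands above don't say which side 90, 80, or 70 falls on), and the function name and labels are made up.

```python
def metacritic_band(score):
    """Collapse a 0-100 score into one of four coarse bands.

    Assumed boundaries: 90, 80, and 70 each belong to the higher band.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "90 and up"
    if score >= 80:
        return "80s"
    if score >= 70:
        return "70s"
    return "below 70"


# A 70 and a 75 land in the same band, so the gap between them carries no weight.
print(metacritic_band(70) == metacritic_band(75))  # True
```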

 

—Alorael, who considers the truly educational cases to be games like Mass Effect 3, where the critical and user opinions are highly divergent. That fact itself doesn't tell you much, but very wide score ranges or very different opinions often deserve a closer look, because a mix of strongly positive and strongly negative reactions often bodes better, if the thing is to your taste, than the merely decent. (This has stumbled backwards into the OKCupid Trends analysis of attractiveness, which possibly belongs in another topic.)


Originally Posted By: Alorael
(This has stumbled backwards into the OKCupid Trends analysis of attractiveness, which possibly belongs in another topic.)

Is it a bad thing if I read that post and was thinking about experimental design rather than objectification of women?

EDIT: About the Mass Effect... effect. I think it's just that critics tend to have a more well-rounded approach, whereas the average person, rightly or wrongly, looks at the strongest or weakest feature.

The other explanation is that video game reviewers often don't finish their games due to time pressures. (Of course, most people don't finish video games.)

Originally Posted By: Dintiradan
The other explanation is that video game reviewers often don't finish their games due to time pressures. (Of course, most people don't finish video games.)


This is also just a unique point of the medium. When you get something like Skyrim floating out there, you can have two people who played the same title without having come anywhere close to playing the same game. You can get something similar-ish in other media when you look at layers of meaning, but it's harder to have such fundamentally divergent experiences regarding content in other media.

But anyway, that's something amazing about the medium, even though it can torpedo meaningful review scales.

I kind of forget the point I was trying to make. But anyway, that's just an awesome thing about video games.

Originally Posted By: Dintiradan
Is it a bad thing if I read that post and was thinking about experimental design rather than objectification of women?


Well, I read it and the first thought in my mind after finishing was "R^2=0.28? That's a pretty bold conclusion for such weak support!", so at least you're not alone.

And I'm not amused by their refusal to disclose their actual methods. Peer review it's not, but as data-mining for hypotheses it's at least intriguing.

 

—Alorael, who was also taken aback by their algebra/confusion. They're apparently at least minimally capable of simple statistics, so you'd think they'd know that algebra alone won't get you there. Or rather, it will, but only if you're handed the equations to plug and chug.

