How I Score Games

My Thoughts on Scoring Methods
Since the question comes up periodically about how my scoring works, and I consider the exercise of scoring to be an interesting one that has no real standard and only many schools of thought, I figured I'd try to break it down a bit and hopefully you won't leave this page more confused than you were before you started reading. I do strive to be fair and consistent at a high level but keep in mind that while my biases in terms of what I personally prefer or dislike will make their way into the tone of my reviews I attempt to score from a "taking a step back" perspective to some degree.

There are mainstream and highly-rated games I really dislike (the original Tomb Raider series will always be a go-to for that, though I enjoy the reboots) but can respect other people enjoying and there are games I enjoy that probably aren't that great in some way in the same mainstream eyes. To a degree I do try to account for mainstream taste as I do this, though obviously that's a tricky business since everyone has their own preference certainly. Keep in mind, I'm hardly claiming to have a lock on this, I'm just saying I attempt to consider it the best I can since my personal feelings aren't all that matters.

Conceptually I'd say first that methods that break down games into multiple areas that are then scored and averaged or totaled in some way, more than ever, are complicated and outright flawed. Especially given the robust indie community, which I obviously favor, evaluating things like sound or graphics is a tricky business when dealing in all manner of visual design and sound choices we get now. Between pixel graphics, games with minimalist aesthetics, and some with low-poly models as just a few examples there's no longer a set and fair scale for graphics, ultimately you're just arbitrarily scoring them. The big problem with this is that if you either hate the pretty game or love the ugly one using this method creates a sort of trap for you where either you risk your final score not being a sum of its parts (or, worse, less than them) or that you'll skew your scoring in order to justify your final score and make it all seem like the system still works.

Elements like story, gameplay, replay value... they pretty much all fall into the same traps and, worse, each have varying degrees of importance depending on the genre. So, again, with there being such a diversity of games out there when you latch onto story how do you score a roguelike or even a classic like Civilization? Worse, how about Tetris? Replay value is also tricky territory since there are games whose magic is primarily tied to something you don't know or realize going in, making returning less magical in the process. Does that automatically make it less worthy of a high score when it is so completely compelling that first time through? A great example that comes to mind is The Walking Dead Season 1. That is one of the most powerful games I've played in terms of the impact it had on me but I can't say I'd return to it. That doesn't make it less worthy of praise though. The only one that rules is control, and I very much factor that in, but it is a special consideration and if anything would outweigh the rest anyway. In summary, I'm very much not following that school. I've been there and have rejected it.

Another approach, which I wish the world was ready for, is really cool conceptually but has its downsides as well: Scoring games on an absolute scale relative to all games you've played. The goal here would be to think of every game you've played on a continuum of sorts with a scale between 1 and 10, and scoring them in a way that would say "among all games I've played this would be in the bottom/top X%". First, it would conceptually justify a 1 and it would force you to use the full scale. Second, it would also justify many more 10s than are ever given out since it would just be stating that certain games are in the top 10% of all games you've ever played. The more you've played the more scores there would be at each tier... so for me that would be quite a lot.
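As a rough illustration of that percentile idea, here is a minimal Python sketch. It is a thought experiment, not a system used for these reviews. The function name `percentile_score` and the idea of keeping one personal list ordered from worst to best are assumptions made for the example.

```python
def percentile_score(ranked_worst_to_best, title):
    """Map a game's position among all games played onto a 1-10 scale.

    Hypothetical helper: each 10% band of the personal ranking gets one
    point, so the bottom 10% score 1 and the top 10% score 10.
    """
    total = len(ranked_worst_to_best)
    # Number of games ranked below this one (raises ValueError if unknown).
    position = ranked_worst_to_best.index(title)
    # Integer arithmetic keeps the band boundaries exact.
    return (position * 10) // total + 1


# Made-up ranking of 20 placeholder titles, worst first.
games = [f"game{i}" for i in range(20)]
print(percentile_score(games, "game0"))   # bottom 10% of the list
print(percentile_score(games, "game19"))  # top 10% of the list
```

Notice that with 20 games two of them land on a 10 and two land on a 1, which is exactly the "full scale, many more 10s" effect described above.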

Unfortunately it doesn't take much thought to get into the downsides of that method. First, people expect scoring to roughly end up in a bell curve and, being truthful, for the most part across the industry this would probably be closer to the truth than everything being evenly subdivided. Second, if you were the only game in town doing this it would really make some of your scores look out of range, again citing the bell curve expectation and how most scoring is done in the real world. Third, it would do far more to celebrate your biases since the scale is inherently subjective... the scores are there because of the games you have played so there can be no attempt at seeming impartial. Last, unlike a scoring where you are simply saying "at that time I thought the game was an 8.5", this seems to invite returning and rebalancing over time. What was in my top 20% may fall to my top 30% if I thought it through a few years later. Just in general I enjoy this thought as a mental exercise but couldn't do it.

Thoughts on "Using the entire palette" for Scoring
When I look around I get a bit peeved by how often I see reviewers who seem unwilling or unable to fully acknowledge how bad some games are and can be, and to score them appropriately. The convergence of game scores into a narrow band, generally from only 7 to 9, does a great disservice to people who read game reviews, essentially rewards or deceives people who release flawed games and who are less inclined to worry about criticism if they're critically successful, and worst of all weakens the meaning of anything in that range for the games that legitimately deserve it.

While I can't think of any instance where I've scored a game below a 3, the full spectrum is there to be used, and failing to make use of it is harmful to everyone. Much like working in an office where diligent workers are only "rewarded" with extra responsibilities and pressure while weaker employees are essentially given the blind eye, a failure to clearly and properly differentiate the two groups only rewards the weak, harming the people who are working their hardest for you. In truth, though, in the long run you're also harming the weak employees through this neglect. Without honest and critical feedback they're far more likely to repeat some or all of their previous mistakes, encouraged by artificially-inflated scores. This, then, really creates an "everyone loses" scenario since it waters down greater games that share that score, it makes it more likely your readers will pick up weaker games on your advice, and it makes it far more likely that mistakes made in this iteration will continue to be made moving forward because the developers don't understand that there's a real problem holding them back. Worse, if a buyer is new to the platform and decides to try the weaker game you decided to give an 8, and it results in a bad experience for them, they may choose to view the problem as the platform itself, and decide that this is the best they can expect from an 8 on the given system. Dangerous stuff.

So Are You Finally Going to Share How You Score Games?
So now that we've gone through how I don't score games, and various problems I see, we'll get on to how it does work. Being honest and up-front, the majority of the score boils down to me applying everything I know and have seen in games to what I'm playing and thus arriving at a rough score. It is rare that my final score deviates more than +/- 1 point from the rough score I form within the first few hours of playing a game, though some games do manage to redeem themselves over time or implode due to some crippling problem somewhere, and those exceptions do happen. That's why reviewers need to dig in and stick with games for a while before making a decision, though with indies this is relative to the amount of content they include. In general, there's just an air around games that you pick up on at some point when you play so many of them... they feel like a 6 or an 8. Anyone who reviews anything at all who says otherwise to you is lying. As I cited above, many sites will try to set up criteria and rating systems that imply there's more of a scientific process to things but there really isn't. People are coming away from the game with a base feeling and these sorts of systems are just there to justify that final score the majority of the time.

So how do these feelings work? I will say that this decision-making is hardly arbitrary but I will readily admit that what you're evaluating and what you think the score will be is based on many different things and, on a general level, these will vary from game to game. The genre plays a huge role in what's important and what your expectations will be. Though this has been a controversial statement (and I wrote an entire editorial on it), value versus the MSRP of the game absolutely will play a role in this process as well. The scale of your expectations for a $20 game is different than for a $60 game. This can be couched in concepts like replayability, depth, and quality of support for added community features like multiplayer, but you'll be looking for more from the game with the higher price every time. If you weren't, you wouldn't be doing a very good job of looking out for the interests of the people reading your review who may be basing their decision to spend that $60 on your evaluation of the game or, on the other end, of someone who made a terrific game that's well worth $20 even without AAA bells and whistles.

What the process then most often looks like beyond intuition isn't generally looking for what elements elevate the score of the game so much as those that diminish it. If nothing else think of it this way: quantifying why a game is an 8.5 out of 10 is a quicker process from the top than from the bottom. You obviously need to be fair in how you apply those points or half points in that narrow space but for almost all games what's holding them back from being better is far easier to measure than everything they are doing right. Even if a game did 1000 things right but had 1 super-critical bug in it that crippled the experience, all anyone would focus on would be that bug.

So, in the interests of clarity here's a list of factors that get measured (relative to genre or style) when I'm considering a review:

  • Control - This will absolutely cripple a game's score or knock it down a little depending on how bad the situation is. If I need to fight with your controls, they aren't intuitive, they aren't very responsive or consistent, etc., it will absolutely harm my game experience and the experience of anyone who plays it and should be scored down. This isn't always necessarily physical either, sometimes there are design choices or other factors that come into play and hamper your ability to do what you need to do because of the systems in place as well... regardless, if something is holding the player back that sucks and should be duly noted.
  • Serious Bugs - The word serious is difficult to qualify but overall I understand bugs, I'm a developer by trade, and I'm well aware there are things that can get through the process. I've quietly shared some minor bugs with developers directly when I've found them, no big deal. However, if there are some serious flaws that go beyond a nuisance and are likely to impede people's ability to enjoy the game with any consistency they really need to count.
  • Difficulty - Another one that is challenging to quantify but is relevant. You could make the greatest game in the world but if it is too easy or too hard for an "average" gamer (I mentally put my bar at what I'd think is the 55th to 70th percentile for my gamer capability "sweet spot") it will likely hold things back a bit, in proportion to how far off the ease or difficulty is.
  • "Mainstream Appeal" - Another that is tough to quantify but for the most part it amounts to whether this strictly is a "genre game" or if it's a game that somehow bends or changes the rules in such a way that just about anyone could play it and have a fair chance of enjoying themselves. There's obviously never a guarantee but a good "for instance" would be anything roguelike. While I love them, the odds of them becoming mainstream are kind of remote, at least for now and until some game manages to find a way to blend and balance things in a way that is appealing to a broader audience.
  • "Is It Memorable?" - Another that is obviously tough to quantify but I'd say this is a better thing to consider than replayability or something like "story" since, for me, it overlaps them. There are games I'll always want to come back to because they're awesome and memorable and I want to have that experience or feeling I get with them again over the long run. There are also games I'll never play again but that made such an impact on me that years later I'm still talking about them. This is often one of the last hurdles for the best games out there and whether they can break the barrier to being a 10, and it is a tricky bar. There's a big difference between a game that has a hold on you now but is forgotten once you stop playing and one that you continue to reminisce about.
  • "How Portable Is It?" - In the case of the Switch this is especially crucial and sometimes problems with playing games viably in handheld mode because of scaling get swept under the rug, especially when games are ported. Sorry, this is a system with a central selling point being that you can play anywhere and if there are issues with that it should be properly noted.

Anyway, on a general level these sorts of considerations are often swirling around in my head and, on a general level, when I try to quantify why my "feel" score lands roughly in a certain spot there's usually some combination of these factors in play bringing things down little by little. Is it possible my personal feelings for a game, whether good or ill, can play into this and color scoring? I'd give you a half point in general and perhaps as much as an entire point in some extreme cases but on the whole I'd say my scoring is at least attempting to have a degree of impartiality to it, and really there's no real way to fully remove that factor even if you wanted to.
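To make the "work down from the top" idea a bit more concrete, here is a minimal Python sketch. Treat it strictly as an illustration: the factor names mirror the list above, but the penalty sizes, the floor of 1, the half-point rounding, and the function name `explain_score` are all assumptions. The process described in this article is driven by feel, not by a formula.

```python
# Factor names taken from the list above; everything else is hypothetical.
FACTORS = (
    "control",
    "serious_bugs",
    "difficulty",
    "mainstream_appeal",
    "memorability",
    "portability",
)


def explain_score(penalties, top=10.0, floor=1.0):
    """Start from the top of the scale and subtract a penalty per factor.

    `penalties` maps a factor name to the points it costs the game.
    The result is clamped to `floor` and rounded to the nearest half point.
    """
    unknown = set(penalties) - set(FACTORS)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    raw = max(floor, top - sum(penalties.values()))
    return round(raw * 2) / 2


# Made-up example: slightly loose controls plus a difficulty spike.
print(explain_score({"control": 0.5, "difficulty": 1.0}))
```

A sketch like this is best read as a way to sanity-check a gut score after the fact, rather than as a way to produce one.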

If you've gotten to this point in the article congratulations, you've just gotten the trophy for "Diligent Reader" and if you did it in one sitting without finding yourself dozing off you've also received the trophy for "Brave, But Bored, Soul". Let me know if you have any thoughts, we can discuss!

Justin Nation
