Analyzing Other Approaches

In Davison’s article, he described at length a matchmaking system based on player engagement that he attributed to Zhengxing Chen, a researcher at Facebook. The system draws on additional player data, such as how long it takes a player to put down a game, and alters the next match so that the player is more likely to want to keep playing. This was tested by Farmville, according to Naomi Clark, a researcher whom Davison cited. It seemed to work too, although Farmville is a single-player game. However, the approach could work for multiplayer games as well, taking into account things like the ratio of wins to losses, game duration, number of games played, and whatever data is exclusive to a certain type of game mode. TrueSkill, for example, has sets of rules for 16-player free-for-all games, as well as for games that have either two or four teams in total.

Figure 1: List of Rules in the Trueskill rating system (Lee, 2012, tbl. 1)
Rule                Matches
16P free-for-all    3
8P free-for-all     3
4P free-for-all     5
2P free-for-all     12
2:2:2:2             10
4:4:4:4             20
4:4                 46
8:8                 91

Multiplayer matching could use player data to anticipate complaints and address them appropriately. But it could also ruin a game’s objective and/or subjective fairness, which is arguably more important. Herbrich and Graepel stated in their study that “Matchmaking should be based primarily on skill and be otherwise not under the influence of the gamers. Ranked re-matches should be disallowed or limited to one to avoid the risk of collusion” (Graepel & Herbrich, 2006, p. 6). Such collusion can ruin the subjective fairness that lower-level players perceive when they see a higher-level player gain a major boost just because that player has been losing too often. As shown in Figure 2 by Véron et al., the greater the skill gap between players, the more likely players are to quit the match. Related to this is the concept of “smurfing”, in which experienced players start up new accounts, pretend to be newer players, and play against actual new players and casuals. The result, according to a group of researchers looking into another MOBA game, “Heroes of Newerth”, is: “thus winning easily, but ruining the playing experience for inexperienced, and often new, players in the process (and cutting into the future profit for the company, as well)” (Caplar et al., 2013, p. 2). It goes to show that these competitive players are going to conflict with the casual audience just trying to have fun, or with people trying to grow their skills independently. So let’s consider the research of Neven Caplar and his team and see what they found.

Figure 2: Distance between players’ skill levels and frequency of players quitting, with waiting time included as a control (Véron et al., 2014, fig. 2)

In researching how to deal with this smurf issue, they first gathered a dataset of player ratings for their case game by taking the whole player ladder, which had been made available on a website that has since been abandoned. They then took the statistics of several thousand players and analyzed them to study player ranking. Player ranking in the game is decided by an Elo rating, and in studying it they identified several of its limitations. Those of note include “rating inflation, and freezing of top rankings (by players who stop playing once they have reached top positions, i.e., no rating deflation over time)” (Caplar et al., 2013, p. 2). They also looked into the game’s matchmaking algorithm and made some interesting comments. According to the researchers, the developers of the game, Garena and S2 Games, posted a patch that supposedly “addressed [the] recognized problem of ‘smurfs’” (Caplar et al., 2013, p. 2).
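To make the Elo limitations above concrete, here is a minimal sketch of the textbook Elo update. The K-factor of 32 and the 400-point scale are the common defaults and are assumptions here; the source does not say which variant Heroes of Newerth actually used.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under standard Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both new ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k=32 is an assumed K-factor, not a value taken from the study.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two equally rated players: the winner gains exactly what the loser drops.
print(update(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
```

Note that a rating only moves when `update` is called. A player who stops playing keeps whatever rating they last had, which is the "freezing of top rankings" the researchers describe.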

They also touched on the possibility of using neural networks. To paraphrase, they cited another study that proposes using neural networks to evaluate the skills of players and maximize their perceived fun, as well as to predict complex team scenarios in which the teams might not be even in terms of members. This use of neural networks could possibly be the solution for this big issue. They also mentioned that matchmaking doesn’t have to be based solely on player skill; to borrow their example, it could be easier to base it on network connection. This is actually something that Davison cited Chen describing in a phone interview, where “[he] confirmed the growing complexity of matchmaking techniques: ‘Previously, they only looked at your win-loss history … and tried to develop one scalar score [like Elo or MMR] for you to summarize your skill. But as time goes on, I can see that there’s work using neural networks to summarize your skills in multiple aspects, not just one single score, and trying to use more history, more information to estimate your skills in different areas.’” (Davison, 2022, para. 21). This could be done with the many metrics that rating systems already gather, such as win/loss ratio, experience and currency per minute, how much time or real-world money a player has spent on the game, the length of each game, and so much more.
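The idea of describing a player by several metrics instead of one scalar can be sketched without any neural network at all. The example below is only an illustration under stated assumptions: the player names, the three metric names, and the choice of min-max scaling with Euclidean distance are all hypothetical and are not taken from Chen, Davison, or Caplar et al.

```python
import math

# Hypothetical per-player metrics; names and values are illustrative only.
players = {
    "ana": {"win_rate": 0.52, "gold_per_min": 310.0, "games_played": 400},
    "bo": {"win_rate": 0.50, "gold_per_min": 300.0, "games_played": 380},
    "cy": {"win_rate": 0.71, "gold_per_min": 450.0, "games_played": 2500},
}


def normalize(profiles):
    """Rescale every metric to [0, 1] so no single metric dominates."""
    keys = sorted(next(iter(profiles.values())))
    lo = {k: min(p[k] for p in profiles.values()) for k in keys}
    hi = {k: max(p[k] for p in profiles.values()) for k in keys}
    return {
        name: [
            (p[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
            for k in keys
        ]
        for name, p in profiles.items()
    }


def closest_opponent(name, profiles):
    """Pick the other player whose metric vector is nearest to `name`."""
    vectors = normalize(profiles)
    others = (n for n in vectors if n != name)
    return min(others, key=lambda n: math.dist(vectors[name], vectors[n]))


print(closest_opponent("ana", players))  # bo
```

A learned model would replace the hand-picked distance with something fitted to match outcomes, but the input is the same kind of multi-metric profile.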

The rest of their study demonstrated these metrics in use and how they related to the matchmaking rating (MMR). Section 5 of their paper presents the results of their large case study. The first subsection shows how the number of games played relates to ranking, as seen in the graph below.

Figure 3: Number of Games Played as Function of MMR (Caplar et al., 2013, fig. 3)

This graph shows a correlation, but it also contains some anomalies: the variance in players’ MMR and number of games played is so extreme that the trend line has no feasible way to match the results. The next subsections involved ratios, including the ratio of wins to losses and the ratio of kills, assists, and deaths. These revealed a rather obvious common trend: the higher a player’s rank, the higher their number of kills and assists. After that come the gold (currency) and experience a player gains per minute. Experience is a value that determines the overall effectiveness of a character: more experience means more levels, which means a stronger character. The result is that more skilled players are able to gain both much more easily. Because experience can be picked up by anyone nearby, players of similar rank would likely be matched together, since they can better coordinate so that a nearby teammate also gains experience from another player’s kill.
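The ratio and per-minute metrics discussed above are simple to compute from raw match totals. The sketch below uses common community conventions, (kills + assists) / deaths for K/D/A and totals divided by match minutes; these exact definitions are assumptions, since the source does not state the formulas Caplar et al. used.

```python
def win_rate(wins: int, losses: int) -> float:
    """Share of games won; 0.0 for a player with no games yet."""
    games = wins + losses
    return wins / games if games else 0.0


def kda_ratio(kills: int, deaths: int, assists: int) -> float:
    """(kills + assists) per death; deaths are floored at 1 to avoid dividing by zero."""
    return (kills + assists) / max(1, deaths)


def per_minute(total: float, duration_seconds: float) -> float:
    """Convert a match total (gold, experience, actions) into a per-minute rate."""
    return total / (duration_seconds / 60.0)


print(win_rate(60, 40))          # 0.6
print(kda_ratio(10, 2, 6))       # 8.0
print(per_minute(12000, 2400))   # 300.0 gold per minute over a 40-minute match
```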

Next come game length, action rate, the rate of placing wards (an item that allows map visibility), denying opposing players the kill on your creeps (NPCs that help attack bases, called minions in other games), and the age of a player’s account. Game length had two graphs. One showed that the probability of a match ending at a given time decreased as that time increased, with matches generally ending around the 20- or 40-minute mark: 40 minutes for a full match, and 20 minutes if the game was called off early. The other showed that games were, on average, shorter in higher ranks.

Figure 4. Distribution of games duration (Caplar et al., 2013, fig. 4)


Figure 5: Average game length as function of MMR (Caplar et al., 2013, fig. 5 (6))

The endings at the 20-minute mark, together with the quitting system, seem to help deal with griefers, players who intentionally sabotage their team, since four out of five team members have to approve quitting the match.
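The quitting rule described above can be written as a small check. This is a sketch only: the four-of-five threshold and the 20-minute mark come from the text, but treating 20 minutes as a hard minimum before a vote can pass, and the function and parameter names, are assumptions.

```python
def concede_passes(
    yes_votes: int,
    minutes_elapsed: float,
    team_size: int = 5,
    required_yes: int = 4,
    earliest_minute: float = 20.0,  # assumed minimum time before quitting
) -> bool:
    """Return True if the team may quit the match early."""
    if not 0 <= yes_votes <= team_size:
        raise ValueError("yes_votes must be between 0 and team_size")
    return minutes_elapsed >= earliest_minute and yes_votes >= required_yes


print(concede_passes(4, 21.5))  # True
print(concede_passes(1, 30.0))  # False: one griefer cannot end the match alone
print(concede_passes(5, 12.0))  # False under the assumed 20-minute minimum
```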

Next is action rate, the rate at which players performed actions, which increased with barely a curve in the trend line. It seems to lead to the same conclusion about match formation that the K/D/A and win/loss ratios do, encouraging matches between similarly ranked players. Warding rate is next; it had a sharp increase in the beginning, but after a certain amount of use the curve leveled off and the increase slowed down. The same was true of the number of creep denials, which seems to imply that for lower-ranked players these are viable uses of time and resources, while experienced players don’t need them as often because they know the counters. Finally, there is the age of a player’s account, which is shown as a heat map but depicts a result similar to the third figure Caplar et al. created. In this study, they concluded that this data collection, despite some error, does at least manage to make a fair assessment of people’s performance. But it is too slow at assigning players to their skill groups. So is there still a way to speed things up?

Maybe there is: a peer-to-peer (P2P) matchmaking system called SelfAid, proposed by Michał Boroń, Jerzy Brzeziński, and Anna Kobusińska. They state in their article that the “…presented solution allows a player to quickly connect to others, provided that no failures occur. In this case, accessing a service algorithm is only a matter of issuing one request to announcement DHT and then one request to the process” (Boroń et al., 2020, sec. 7). The distributed hash table (DHT), reached through the service algorithm, contains the data needed to match players into the place they want, all without the need of a server.
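The "one request to the DHT, then one request to the process" pattern in the quote can be illustrated with a toy, in-memory stand-in. This is not SelfAid’s actual protocol: the class, the key format, the peer address, and the way keys are hashed onto buckets are all hypothetical, and a real DHT spreads its buckets across many machines.

```python
import hashlib


class ToyAnnouncementDHT:
    """In-memory stand-in for a DHT: each key is hashed onto one of several buckets."""

    def __init__(self, node_count: int = 4):
        self.nodes = [dict() for _ in range(node_count)]
        self.lookups = 0  # counts lookup requests only

    def _node_for(self, key: str) -> dict:
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest, "big") % len(self.nodes)]

    def announce(self, key: str, address: str) -> None:
        """A hosting peer publishes where its service can be reached."""
        self._node_for(key)[key] = address

    def lookup(self, key: str):
        """Return the announced address for key, or None if nothing was announced."""
        self.lookups += 1
        return self._node_for(key).get(key)


def join_service(dht, processes, service):
    address = dht.lookup(service)        # request 1: the announcement DHT
    if address is None:
        return None
    return processes[address](service)   # request 2: the hosting process itself


dht = ToyAnnouncementDHT()
dht.announce("lobby:eu", "peer-7")
processes = {"peer-7": lambda service: f"joined {service} via peer-7"}
print(join_service(dht, processes, "lobby:eu"))  # joined lobby:eu via peer-7
```

In the failure-free case the joining player touches the table once and the host once, with no central matchmaking server involved.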