About the CoGs Leaderboard Space

What are CoGs?

Competitive Gamers. Just a moniker for game players who like to compete, specifically at tabletop games, which sometimes go under the labels of board games, card games, social games, Euro games or German games.

Ideally, CoGs are not people who are out to win at all costs, but people who enjoy the thrill of competition: playing well-matched opponents in balanced and exciting games.

What is a leaderboard?

A list of all the players of a given game, ranked by some measure of performance or skill. There are Golf leaderboards, Tennis leaderboards, Chess leaderboards and more. The CoGs Leaderboard Space hopes to provide a place to store, manage and present leaderboards for any number of games (Golf excluded ;-).

Why leaderboards?

Leaderboards are popular for a number of reasons:
  1. They offer a meta-game or meta-competition that surrounds a given game.
    You're not just competing with your opponents at a given game session, but with all players over time, for long-term rankings.
  2. They help plan balanced games.
    Players at similar ranks can plan to play together, helping to avoid runaway victories and create more in-game tension (see the sketch after this list).
  3. They promote replay and focus.
    Get better at a game you like, play it again and again. This has particular relevance in the tabletop gaming world where new games are being published at unprecedented rates, and the novelty factor of discovering and learning new games threatens to eclipse enjoyment of good-quality games.
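
As a concrete illustration of point 2, the TrueSkill machinery described below can score how balanced a proposed match is likely to be. A minimal sketch, assuming the `trueskill` Python package and hypothetical player ratings:

  # Sketch: checking how balanced a proposed pairing is. quality_1vs1
  # returns a measure of how close the match is likely to be (higher
  # means better matched). All names and ratings are hypothetical.
  from trueskill import Rating, quality_1vs1

  veteran = Rating(mu=32.0, sigma=3.0)   # strong, well-established player
  novice = Rating(mu=25.0, sigma=8.3)    # new player, skill still uncertain
  peer = Rating(mu=31.0, sigma=3.5)      # similar standing to the veteran

  print(quality_1vs1(veteran, novice))   # ~0.45: risk of a runaway victory
  print(quality_1vs1(veteran, peer))     # ~0.78: likely a close, tense game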

How are players ranked on a leaderboard?

We use the TrueSkill algorithm to estimate a player's skill (or rating) at a game, measured in teeth (CoGs have teeth, after all), based on the history of their plays of that game. We derive a rating from the recorded performance (the history of game placements), and rank players on a given leaderboard (for a given game) by that rating. Leaderboards can be for a specific league or global.
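
For the curious, here's a minimal sketch of that ranking step, assuming the `trueskill` Python package and some hypothetical players and ratings (how the ratings themselves are derived is covered below):

  # Sketch: turning per-player TrueSkill ratings into a ranked
  # leaderboard. Names and ratings are hypothetical; env.expose() is
  # the conservative skill estimate explained later in this FAQ.
  import trueskill

  env = trueskill.TrueSkill()  # default TrueSkill parameters

  ratings = {
      "Alice": trueskill.Rating(mu=30.1, sigma=4.2),
      "Bob":   trueskill.Rating(mu=27.4, sigma=6.8),
      "Cass":  trueskill.Rating(mu=25.0, sigma=8.3),
  }

  leaderboard = sorted(ratings.items(), key=lambda kv: env.expose(kv[1]), reverse=True)
  for place, (name, rating) in enumerate(leaderboard, start=1):
      print(place, name, round(env.expose(rating), 1))  # Alice 17.5, Bob 7.0, Cass 0.1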

What is a league?

A gaming league is simply a group of players who are interested in running a leaderboard against one another. It's a useful way of grouping players who regularly play together. The CoGs Leaderboard Space can present leaderboards that include only the players of a specified league, or everyone globally. Every player has a primary league noted, and if they have an account and log in, relevant views will default to that primary league. Otherwise, various views can be filtered to show players, teams, games or sessions of one league only.

What's the Leaderboard Space?

A place to track leaderboards for tabletop games: the Leaderboard Space is where many leaderboards for tabletop games live (and are managed and presented). It is, at its heart, a database with a web interface, as are most websites today.

What is stored in the CoGs Leaderboard Space?

We aim to store everything needed to calculate TrueSkill ratings and nothing more. That is, primarily:
  • Basic player information.
    Includes an email address for sending notifications (if desired) and a BoardGameGeek ID (if desired), through which more player information can be found.
  • Basic game information.
    Only details that help manage leaderboards. A BoardGameGeek ID is stored for each game where possible to provide links to further information.
  • Session records.
    Every game session is time-stamped and identifies the game played, the league it was played in, the venue, the order in which the players finished, and the portion of the game each player played.
  • Supporting information for those session records:
    Basic summary information about leagues, venues and teams (groups of players playing together).
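
For the technically minded, the records above might look something like the following Django model sketch (the site is built on Django, as noted below). All model and field names here are illustrative assumptions; the real schema lives in the project's source on github.

  # Illustrative sketch only: a plausible Django shape for the data
  # described above. These are assumptions, not the site's actual schema.
  from django.db import models

  class Game(models.Model):
      name = models.CharField(max_length=200)
      bgg_id = models.IntegerField(null=True, blank=True)  # BoardGameGeek ID

  class League(models.Model):
      name = models.CharField(max_length=200)

  class Venue(models.Model):
      name = models.CharField(max_length=200)

  class Player(models.Model):
      name = models.CharField(max_length=200)
      email = models.EmailField(blank=True)                # for notifications
      bgg_id = models.CharField(max_length=200, blank=True)

  class Team(models.Model):
      players = models.ManyToManyField(Player)             # players playing together

  class Session(models.Model):
      date_time = models.DateTimeField()                   # the time stamp
      game = models.ForeignKey(Game, on_delete=models.CASCADE)
      league = models.ForeignKey(League, on_delete=models.CASCADE)
      venue = models.ForeignKey(Venue, on_delete=models.CASCADE)

  class Rank(models.Model):
      # One finishing position in a session, held by a player or a team.
      session = models.ForeignKey(Session, related_name="ranks", on_delete=models.CASCADE)
      rank = models.PositiveIntegerField()                 # 1 = first place
      player = models.ForeignKey(Player, null=True, blank=True, on_delete=models.CASCADE)
      team = models.ForeignKey(Team, null=True, blank=True, on_delete=models.CASCADE)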

How is a TrueSkill rating calculated?

There's plenty to read about TrueSkill on-line, but in summary:
  1. TrueSkill attempts to assess a player's skill at a given game.
  2. It acknowledges that skill is an uncertain thing that we are trying to assess (that is, we can never be completely sure of a given player's skill).
    • Therefore it models skill as a probability distribution (PD) rather than a value, which is one way of saying "a player's skill is probably 'one value' but we're only 'so' sure of that".
  3. Everyone is given the same initial skill PD.
  4. Evidence, that is, game results, is used to modify the PD using Bayes' Theorem (see the sketch after this list).
  5. In general the expected value of skill tries to track performance, while the confidence in that value rises with more and more evidence and declines slightly over time in the absence of evidence.
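
In code, steps 3 and 4 amount to just a few lines. A minimal sketch, assuming the `trueskill` Python package and two hypothetical players:

  # Sketch: everyone starts with the same initial skill PD (step 3),
  # and a recorded result updates both players' PDs via Bayes' Theorem
  # (step 4).
  from trueskill import Rating, rate_1vs1

  alice, bob = Rating(), Rating()     # identical initial PDs (mu=25, sigma=25/3)

  alice, bob = rate_1vs1(alice, bob)  # evidence arrives: alice beat bob
  print(alice)                        # mean up, uncertainty down (mu~29.4, sigma~7.17)
  print(bob)                          # mean down, uncertainty down (mu~20.6, sigma~7.17)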

And with that in mind, a little more detail, with specific attention to how we at the CoGs Leaderboard Space use TrueSkill:

A rating (in teeth) is assigned to each player for any given game, which is a summary of the PD that TrueSkill has calculated from the evidence.

In practice, the PD is modeled by a Gaussian or Normal distribution, which can be described by two values, a mean and a variance (or standard deviation).

In layman's terms the Normal Distribution is the familiar bell curve (hence its name, "Normal", because it's pretty often seen) and the mean describes where the peak of the bell is, while the variance (or standard deviation) describes the width of the bell. A high variance is a wide bell, a low variance a narrow bell.

The mean can be considered the most likely skill and the variance is a measure of our uncertainty, or conversely our confidence. That is, the lower the variance (the narrower the bell curve) the more confident we are that skill is close to that mean. Conversely the higher the variance (the wider the bell) the more uncertain we are about that mean.
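
For the mathematically inclined, that bell curve is easy to write down and play with. A small sketch in plain Python (the numbers are purely illustrative):

  # Sketch: the Normal (bell curve) probability density. The peak sits
  # at mu; a larger sigma gives a shorter, wider bell (less confidence).
  import math

  def normal_pdf(x, mu, sigma):
      return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

  print(normal_pdf(25, mu=25, sigma=2))  # peak of a tall, narrow bell: confident
  print(normal_pdf(25, mu=25, sigma=8))  # peak of a short, wide bell: uncertain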

We can't use a bell curve to rank players though. We need a single value to rank players by: some assessment of skill. We could choose the most likely value (the mean) and run with that. But instead, as per TrueSkill's recommendation, we use a value that we are very confident the player's skill is at least, i.e. some value below the mean. The standard deviation is useful here.

If we were to use the mean as the rating, we're saying that we're 50% sure that the player's skill is at least that value.

If we were to use the mean minus one standard deviation, we're saying that we're about 84% sure that the player's skill is at least that value.

If we were to use the mean minus two standard deviations, we're saying that we're about 98% sure that the player's skill is at least that value.

If we were to use the mean minus three standard deviations, we're saying that we're about 99.9% sure that the player's skill is at least that value.
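
Those percentages are simply the cumulative Normal distribution evaluated one, two and three standard deviations below the mean, which is easy to verify:

  # Sketch: verifying the "at least" confidence levels above.
  # For a Normal distribution, P(skill >= mean - k*sd) is Phi(k),
  # the standard normal CDF.
  from math import erf, sqrt

  def phi(k):
      return 0.5 * (1 + erf(k / sqrt(2)))

  for k in (0, 1, 2, 3):
      print(k, phi(k))  # ~0.50, ~0.8413, ~0.9772, ~0.9987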

TrueSkill by default uses the mean minus three standard deviations to estimate skill (though it's configurable), so the 99.9% confidence level. The skill we're 99.9% sure you at least have!
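
For what it's worth, in the `trueskill` Python package this conservative estimate is what the expose() method computes with default parameters, which is also why a brand-new player's rating is zero (more on that below):

  # Sketch: with default parameters (mu=25, sigma=25/3), the conservative
  # rating mean - 3*sigma is what TrueSkill.expose() returns, so a new
  # player's rating starts at exactly 0.
  import trueskill

  env = trueskill.TrueSkill()
  newcomer = env.create_rating()
  print(env.expose(newcomer))  # 0.0, i.e. 25 - 3 * (25/3)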

Every time new evidence arrives (a new game result is recorded) the TrueSkill PD is recalculated (using Bayes' Theorem) for each player who participated, and a new rating determined from the new PD.
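
Tabletop sessions often involve more than two players, so this recalculation takes the full finishing order. A minimal sketch, assuming the `trueskill` Python package and a hypothetical four-player session (each player playing alone):

  # Sketch: recalculating PDs after a hypothetical four-player session.
  # Each player forms a one-person "team", and ranks follow finishing
  # order (0 = winner).
  from trueskill import Rating, rate

  names = ["Alice", "Bob", "Cass", "Dave"]
  groups = [(Rating(),) for _ in names]          # everyone starts fresh

  new_groups = rate(groups, ranks=[0, 1, 2, 3])  # Alice 1st ... Dave 4th

  for name, (rating,) in zip(names, new_groups):
      print(name, round(rating.mu, 1), round(rating.sigma, 2))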

How do TrueSkill ratings behave?

There are some TrueSkill parameters that can be tuned either globally or for a given game. On the whole though:

  • Your rating at a given game will start at 0 (precisely because the initial PD has a mean that is three standard deviations above zero).
  • As you play games, win or lose, the standard deviation gets lower (i.e. the confidence in our assessment of your skill rises). This effectively raises your rating (because the mean minus three standard deviations goes up) - simply by playing the game and recording the result.
    • It will rise more quickly at first, and more slowly as more and more evidence is collected (i.e. game results are recorded).
  • After a given game is recorded some players' ratings will go up and others may go down. Winners tend to steal ratings from losers.
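
A quick numeric sketch of that last point, assuming the `trueskill` Python package with default parameters and two hypothetical new players:

  # Sketch: after a first 1v1 between two brand-new players, the winner's
  # conservative rating (mean minus three standard deviations) jumps well
  # above 0, while the loser's dips just below it; the winner has, in
  # effect, taken rating from the loser.
  from trueskill import TrueSkill

  env = TrueSkill()
  (winner,), (loser,) = env.rate(
      [(env.create_rating(),), (env.create_rating(),)], ranks=[0, 1])

  print(round(env.expose(winner), 2))  # ~7.88: up from 0
  print(round(env.expose(loser), 2))   # ~-0.91: down a little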

Is TrueSkill perfect?

No, TrueSkill is not perfect. Nothing is. And there are some known issues at present (waiting for you, if you're a math whiz with passion and time or funding, to fix). Some of these will allow players to game the system, so to speak.

  • Rating risk:
    Because winners tend to steal ratings from losers, there's a definite risk to playing if your rating is high.
    So someone with a very high rating may feel inclined to protect their rating by not playing that game any more.
    The current leader, so to speak, on any leaderboard has only one direction to go, after all ... down.

  • Ranking surprise:
    TrueSkill introduces two rating trends. The first is that confidence goes up with new evidence.
    • Every time you play, TrueSkill gets more and more sure of your skill. This effect is strong at first: the difference between two game results and one is very large, and confidence responds accordingly. The difference between three game results and two is a little less, and so on; the more evidence we collect, the smaller the effect of any new piece of evidence. This means that the rise in confidence gets smaller and smaller over time.
    • The second is that skill is known to vary over time without any evidence (between games): notably it declines a little without practice, but the bottom line is it varies. To model this, TrueSkill reduces confidence a little with every skill update.
    In time the diminishing rise in confidence (first effect) becomes smaller than the decline in confidence (second effect), and a player can record a game result, win, and still go down in ranking! We will keep an eye on this at the CoGs Leaderboard Space and are in fact exploring ways to improve TrueSkill in this regard (the tuning parameter behind the second effect is sketched after this list).

  • Strategic opponent selection:
    If you understand the algorithm well, it's conceivable to choose opponents who maximise the likelihood of a rating improvement for you.

  • Biased draws:
    You'd expect that in a draw all drawing players experience the same impact on their rating.
    Alas, that is not guaranteed. TrueSkill can be adjusted to ensure this, and at the CoGs Leaderboard Space we're giving that some thought (no promises yet).
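
For the curious, the "ranking surprise" above has a concrete tuning knob: in the `trueskill` Python package the between-games decline in confidence is governed by the dynamics parameter tau. A minimal sketch (the values shown are simply the package default and an illustrative alternative):

  # Sketch: tau models skill drift between games; a larger tau means
  # confidence decays faster between updates, which in turn drives the
  # "ranking surprise" effect described above.
  from trueskill import TrueSkill

  default_env = TrueSkill()        # tau defaults to sigma/100 (~0.083)
  drifty_env = TrueSkill(tau=0.5)  # illustrative: assume skill drifts more

  print(default_env.tau, drifty_env.tau)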

All these issues can be fixed. Let's hope someone does. In the meantime, we feel TrueSkill is the best rating system available to gamers, and much of the Chess fraternity (through the adoption of Elo, a TrueSkill precursor) and the Xbox community (through the application of TrueSkill) would seem to agree.

Who runs this site?

The Competitive Gamers League of Hobart (CoGs Hobart).

What is the status of the site?

Alpha.

There's an adage in innovation circles that if you release a product when it's ready, you've released too late!

It's a very busy world out there, and time to market is more important than ever! Not that this site has a particularly aggressive market strategy or grand aspirations. It's a service to one small community of gamers in the first instance, ready to grow if popularity rises and others want to join in and help. To wit, we're putting this out there well before it's ready, or anywhere near as complete as we'd like. There's a to-do list longer than any current leaderboard (the tip of the iceberg is listed on github) and it's not getting smaller in any hurry.

But the site works, and is being used day to day for the CoGs Hobart League, so it's time it showed a public face, letting league members look up leaderboards and track results.

What technology does this site use?

Aside from TrueSkill, it's built using the Django web framework (written in Python) and is running at present on a Raspberry Pi 2 in someone's basement. So if it's a little (or a lot) sluggish and up-time is not guaranteed, you know why.

What is the history of the CoGs Leaderboard Space?

In 2004 a small group of tabletop gaming enthusiasts started to run the Hobart Winter Games nights under the banner of HoGS, the Hobart Games Society.

In 2012 the group was flourishing and a focus group and event was tabled: HoGS Hardcore, dedicated to competitive strategic gaming.

HoGS Hardcore was a response to two growing trends at HoGS weekly games nights:

  1. A struggle to bring hard core strategic games to the table for head-on competition.
    With a steady influx of new gamers and a great diversity of gamers, a slight bias towards more accessible games emerged.
  2. A struggle to replay favourite games.
    With the incredible growth in the game publication industry and in the gaming community, more and more players were assembling larger and larger collections of games, resulting in a never-ending need to table new games. Novelty was eclipsing experience.

HoGS Hardcore was surprisingly popular, but not to everyone's taste, and with some suggestion that it could lead to elitism, HoGS shelved the idea.

In 2014 the Hardcore group reformed as the CoGs League, running a complementary monthly meet focused on competition and the idea of leaderboards.

CoGs started with a simple spreadsheet and scoring system to calculate ratings, but couldn't settle on something fair that worked across all games irrespective of the number of players. Then they found TrueSkill.

2014 was full of experimentation with TrueSkill. Some software libraries turned up, notably in Python and C#, and a simple TrueSkill server was toyed with in Python.

By 2015 CoGs had decided to code up a website, opting for the Python route and the Django web framework.

By late 2016 CoGs was managing leaderboards on its nascent web site (running internally).

In early 2018 the website was deployed publicly to permit CoGs players to see leaderboards and play around with views and filters.

Can I help improve the site?

You sure can! Please!

To date it's had only one developer, who can give it the likes of 5 to 10 hours a month of effort, so the project is openly in need of contributors.

The code base is all on github. Anyone can fork and contribute. I hope there's enough in the README to help anyone with some comfort in Python to set it up locally, run it, debug it, and see the open issues.

If you're interested in helping the gaming community, and especially those who view competition as an important part of the game, we'd welcome your help with this site. To discuss what you might contribute, feel free to write to us at: competitive dot gamers dot hobart at gmail dot com (forgive the transparent effort to hinder spam robots harvesting the email address - we're hoping it works).