I was not sure if I wanted to post this within one of the SMIP announcement threads or within General Discussion. Please forgive me if you have real training in stats, as I am trying to keep things relatively simple.
This is a topic I have been thinking about quite a bit over the last few weeks, naturally kicked off by the new trend we have witnessed coming out of our community. Looking at Metagame data has always been a pet project for many of the most dedicated members of our community. With the coming of Magic Online we have been given access to a source of information which has only rarely been seen in the paper world, whole Metagame Data. That is, analysis of all decks which appeared at certain tournaments. A historical example of this can be found here: http://www.mtgsalvation.com/forums/retired-forums/retired-forums/vintage-news/event-reports/130714-results-melbourne-legends-decklists-50-player
This kind of reporting allowed sharp members of the community to piece apart the metagame to a greater degree than is possible with simpler Top 8 only or similar styles of TO reports. It would allow people to look at metagame saturation of individual decks (how many copies of that deck were played) or other similar data points. With the arrival of MTGO and the ability to replay games it has become easier to look at more than just metagame saturation. More and more, we have seen various members of our community delving even deeper into the tournament data.
Working with MTGO (or WER for the more dedicated) and third party software (generally Excel), the trend has become for TOs (or the likes of @ChubbyRain & Co for the Online events) to report not only the top performing decks but to also include a more detailed and intricate "matchup analysis". After rewatching enough of the tournament to note down who was on what archetype, they combine that with the final results of each round to get total matches played and matches won for each matchup.
There are so many places where you can find this information but for ease of communication I have attached a copy of the NYSE #4 breakdown below.
NYSE #4 is to my knowledge the largest of the recent events to have received this treatment so this is a great place to start.
As cool as the pretty colours and lots of fancy numbers look, most of these numbers mean absolutely nothing.
That's right. Most of this data is unusable for any real statistical analysis. There are simply too few data points to draw any conclusions from the data presented. Excel is a great program which I used every day for years and years, but it is not a fantastic program for the kind of analysis we are trying to do here. Trying to put this kind of data into a more rigid program such as R-Tools really shows the deficiency in our data. For example, between the two largest groups (Gush & Shops) we have only 41 games. As you go down the list it gets worse and worse.
It is also quite poor as Vintage is not in the position it has been in the past with 60+ cards within archetypes being the same. By grouping decks like this we lose sight of any individual changes between the various decks. That being said, that has always been an issue with looking at tournaments as whole entities.
While 41 data points is enough for a binary Y/N kind of thing, it's hard to even ask something as simple as "is deck X favoured in this matchup?" with any real certainty with so little information. As you may know, the higher the number of games you have, the more accurate you can expect your data to be. Small sample sizes are more prone to variation from what you would expect. Take a look at the Oath of Druids vs Combo matchup in the NYSE #4 data above. If you were to take this as gospel and you expected your next event to have a lot of storm, you might end up leaning toward Oath as a foil to that metagame.
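To put a rough number on how little 41 results can tell you, here is a minimal sketch of a 95% Wilson score interval for a win rate. The records used (24 wins out of 41, and the same win rate over 10x and 100x as many matches) are made up purely for illustration and are not taken from the NYSE #4 sheet. The function name `wilson_interval` is my own as well.

```python
import math

def wilson_interval(wins, matches, z=1.96):
    """Approximate 95% Wilson score interval for a match win rate."""
    if matches <= 0:
        raise ValueError("matches must be positive")
    p = wins / matches
    denom = 1 + z**2 / matches
    centre = (p + z**2 / (2 * matches)) / denom
    half = z * math.sqrt(p * (1 - p) / matches + z**2 / (4 * matches**2)) / denom
    return centre - half, centre + half

# Hypothetical records, all at the same observed win rate (about 58.5%).
for wins, matches in [(24, 41), (240, 410), (2400, 4100)]:
    low, high = wilson_interval(wins, matches)
    print(f"{wins}/{matches}: plausible win rate {low:.1%} to {high:.1%}")
```

For the made-up 24/41 record the interval runs from the low 40s to the low 70s in percent, so it cannot even tell you which side of 50% the matchup sits on. Only with several hundred matches at the same win rate does the lower end climb above 50%.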
Let me know how that works out for you.
Now I am not asking people to not do this kind of analysis. As I said, I have taken part in this sort of analysis but PLEASE take all of this data with a grain of salt. I don't actually think anyone is seriously taking the above as Gospel but words from the SMIP Podcast had me worried. This kind of analysis should have little to no place in B&R discussion. We simply do not have the information to reliably draw conclusions with any sort of certainty from what we have. Especially if we keep the information restricted to individual tournaments.
I see a few options going forward regarding this kind of analysis.
- Keep doing this analysis but for the love of God keep it away from any kind of B&R discussion in its current form. I fear that this kind of data, as restrictive as it is, changes the focus from winning decks and decks which have an unhealthy metagame saturation to decks which meet various other factors/requirements. The numbers for decks with larger metagame saturation will naturally fall under this kind of data, despite perhaps having a good performance at top tables, simply due to the number of pilots that did not make top 8 (which will generally be more than those that do). We saw an example of this in a recent SMIP podcast where, despite Gush being 4/8, its MW% was low. This would always have been expected, as 30% of the metagame cannot all make top 8. I don't point this out to rag on Gush but simply to use it as an example of the possible unintended consequences of changing our policies RE: B&R.
With the current setup we can expect well-metagamed decks that appear in small numbers to be hit much harder through B&R policy than decks with higher metagame saturation.
- Expand the analysis. Work together with the various TOs who have made this data available and smash it all together to get a much broader "Vintage Metagame over Time" analysis. This does have its own issues, however, as you would lose all sense of metagame changes over time, even more so than the standard loss of deck individuality in individual tournaments. We would also lose any kind of trend data if we were to rely on this kind of analysis.
- Ignore this data.
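The point about heavily played decks in the first option comes down to simple arithmetic, which can be sketched as below. The 100-player field, the deck labels and the pilot counts are all hypothetical, and `max_top8_conversion` is just a name I made up. It only shows the ceiling on how many of a deck's pilots can possibly make the cut. It says nothing about MW% directly.

```python
def max_top8_conversion(pilots, top_cut=8):
    """Best-case share of a deck's pilots that can reach the top cut."""
    if pilots <= 0:
        raise ValueError("pilots must be positive")
    return min(pilots, top_cut) / pilots

# Hypothetical 100-player field; deck labels and counts are invented.
field = {"Deck A (30% of field)": 30, "Deck B": 10, "Deck C": 3}
for deck, pilots in field.items():
    share = max_top8_conversion(pilots)
    print(f"{deck}: at most {share:.0%} of its pilots can make top 8")
```

Even if the 30-pilot deck took every single top 8 slot, most of its pilots would still finish outside the cut, while a 3-pilot deck could in principle put all of its pilots there. Any per-deck average is going to look very different for those two decks before you even get to how good they are.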
Now clearly we should not use any single form of analysis as our sole source of data for B&R discussion. We should be using everything that we have at our disposal and working out what is correct from there - not that we have any real power to effect change at the DCI level. I simply want to avoid bad data and bad analysis being used as a soap box for Vintage community outcry - or lack thereof - at various cards/decks etc. Without proper instruction, this new rage for 'big data' may do more harm than good in the long run if it is kept in its current form.