The biggest Vintage event of the year just took place and I have ~350 decklists sitting on my desk. I'll be going through all of them as many notable cards like Paradoxical Outcomes, Leovold ("Butters"), Emissary of Trest, and Chandra, Torch of Defiance made appearances outside of the top 8 and there are some rather sweet lists. Rather than make everyone wait, here is what I have done so far:
Top 8 Lists: http://www.cardtitan.com/coverage
I recommend checking out Reid Duke's Paradoxical Storm, David Fleischmann-Rose's Odd Oathstill, and Kurt Crane's "Red Meta", as examples of innovative lists that did well.
*Except for Mukesh Ablack who we assume is on Colorless Eldrazi based on what his opponents told us.
As always, thank you to Ryan Eberhart, my partner in crime, for his considerable help.
Ryan Eberhart (@diophan) and I will be updating this post throughout the week but we wanted to disseminate data from the event as we process it. Currently, we have the metagame broken down by archetypes and subarchetypes.
Top 16 Lists
- Andy Markiton - Ravager TKS Shops
- Tom Metelsky - Grixis Pyromancer
- Ryan Glackin - Amalgam Dredge
- Hank Zhong - Esper Mentor
- Andy Probasco - White Eldrazi
- Roland Chang - Ravager TKS Shops
- Vito Picozzo - Jeskai Mentor
- Brian Kelly - Esper Mentor
- Brian Schlossberg - Ravager TKS Shops
- Jason Jaco - Eldrazi Tribal
- Jordan Kasten - Transform Dredge
- AJ Grasso - Gush Mentor Bomberman
- Nicholas Cummings - Belcher
- Lee Hillman - White Eldrazi
- Matt Murray - Sylvan Mentor
- Nick Dijohn - Smasher TKS Shops
The rest of the lists will be made available at some point, likely on EternalCentral.com.
Major props to Nick, the judges, and the rest of the tournament staff - they did a phenomenal job running the event.
Update: Link to decklists
Save the spreadsheet as a copy and you can make use of the search Ryan and I built.
It's pretty clear that players (and Wizards of the Coast) have divergent opinions on the role of data in Magic. My opinion is that data is an imperfect representation of reality. I would much rather have as much data as possible, while understanding the limitations of such data. That said, I think we have reached a point where we as a community might be oversaturated with these weekly reports and the conversations that follow. Ryan and I are both committed to collecting Vintage data from as many sources as possible. What we remain split on is how often to disseminate that data. We could continue to publish weekly reports or we could combine them into monthly reports like this one. Please chime in in the comments below.
Top 32 Lists
Xerox is what we are using to refer to former Gush decks - most are Young Pyromancer based, either with Delver or splashing white for Mentor. Some are playing Seeker of the Way... Fish decks refer to Bug Fish, Combo is kind of a grab bag for DPS decks - most combo decks have been absorbed into Paradoxical. Tom's winning Rector Flash list is the source of the 87.5% MWP on 9/23.
Thanks to Ryan for his considerable help as usual. Jonathan Suarez also helped with data collection for the 7-23 challenge and it was much appreciated.
I took the data from the last 3 Power 9s and combined it, looking to see what we could glean about certain matchups. There are several results that contradict commonly held beliefs about the format:
People need to stop insisting Storm and Oath beat Gush.
Here is the metagame breakdown for the May edition of the MTGO Power 9 Challenge:
5-2 or better:
- Collector - Dredge
- Call1Me1Dragon - White Eldrazi
- Montolio - Eldrazi Shops
- Pubert - Grixis Pyromancer
- Mr. Random - White Eldrazi
- BlackLotusT1 - White Eldrazi
- Lexor19 - White Eldrazi
- JdPhoenix - Blue Moon
- Diophan - Jeskai Mentor
- IcyManipulat0r - UW Mentor with Delver
- Ravager101 - DPS
- Footemanchu - DPS
- Deibler - Jeskai Mentor
- Sigaisen - Dark Sylvan Mentor
Metagame Breakdown and Win Percentages:
Note: Mirror matches were not included for overall win %.
Archetype vs Archetype Win Percentages and Sample Sizes (n = number of wins - for instance, Gush decks went 5-0 against Oath)
Subarchetypes as promised:
As always, a major thank you to Ryan Eberhart (@diophan) for his help with this (and condolences on the 9th place finish).
Edit 1: Top 16 decklists linked to above.
Edit 2: Updated the original post with Subarchetypes. We considered IcyManipulat0r's deck to be a "mentor" deck instead of Delver as it had a more expansive manabase (with Sol Ring and Mana Crypt) along with Mana Drains. Obviously, this helps emphasize that classifications are gray rather than black and white and there is no perfect system.
Edit 3: There was a rather odd bug in our Google doc that affected a handful of matches in round 4 (for some inexplicable reason, the fill function skipped cell 43 and caused a misalignment of opponents to archetypes for 10 match results or so). This slightly affected the calculated win% against the field and we have updated the initial table.
Since this is still going on, I think it would be beneficial to break this down statistically. I started this as a reply but it reached sufficient length that I decided it deserved its own thread. The link to the original thread is here.
Max in the Shops Mirror
The model that best approximates a 'coin-flip' scenario, in which there are two outcomes determined by luck with probability p, repeated n times, is a binomial distribution. Let us apply this to Max's experience with the mirror. We have the following parameters of n = 16 games and p = 50% or 0.5 (since it is a 'mirror'). Ignoring skill, Max should win
mu = n * p = 16 * 0.5 = 8 games
Let's stop and double check that the result makes sense. You flip 16 coins and on average there will be 8 heads and 8 tails. Moving on to variance from that mean, we use the equation of
Variance (sigma^2) = n * p * (1 - p) = 16 * 0.5 * (1 - 0.5) = 4.0 games
The standard deviation is typically more useful than variance and determined by taking the square root of the variance. The standard deviation is
Std Dev (sigma) = sqrt(4) = 2 games
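For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same three quantities. The function name binomial_summary is just an illustrative choice; the inputs are the n = 16 and p = 0.5 used above.

```python
import math

def binomial_summary(n, p):
    """Return (mean, variance, std_dev) for a binomial distribution."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, math.sqrt(variance)

# Max's Shops mirror: 16 trials, each treated as a 50-50 coin flip.
mean, variance, std_dev = binomial_summary(16, 0.5)
print(mean, variance, std_dev)  # 8.0 4.0 2.0
```

Swapping in a different n or p shows how quickly the spread changes with sample size.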
The final breakdown is:
Max's 15 wins in 16 matches is much higher than the 50-50 an average Shops player would obtain. Therefore, it would be pretty reasonable for a casual observer to say Max is an above average Shops pilot based on these results.
Max against the Field
There are two ways to establish the probability that can be used in these models. The first is theoretically derived, like we did in the first section. In the mirror the cards are assumed to be the same or close to the same, so if player skill is ignored, the theoretical probability of winning is equivalent to losing, or 50-50. However, if the cards are substantially different, i.e., players are playing different decks, it is much more tenuous to assume a 50-50 win-loss record. You can make an argument for it: the tournament structure is such that a loss for one player equals a win for another, so the overall record of the field must be 50%. If you do so and exclude Max's Shops matchups, you get
Max's actual number of wins, 66, and actual match win percentage, 78.6%, are much higher than what we would expect given a 50-50 win rate. It would be pretty reasonable to conclude that Max is an above average player with Shops against the field, too.
How can we 'science' up the above conclusion?
Science is conducted through the scientific method: you make a hypothesis, conduct an experiment, then reject or accept the hypothesis based on the results. "But Max didn't have a hypothesis." In many cases, data are collected before an actual hypothesis is made. The default position is that of the null hypothesis, that there exists no statistical difference between two groups of data. Put in the context of this experiment, we are essentially taking the position that there is no statistical difference between Max's results with Shops and the theoretical results of an average player with an average deck (average defined as 50% win rate). That is, Max's results happened purely through chance and neither skill nor deck selection played a role.
Rejecting the Null Hypothesis (Confidence Intervals)
Max has collected his data. Now we have to determine whether Max is good or lucky. And honestly, we cannot know for sure. If you flip a coin 10 times and it comes up heads 10 times, would you conclude that this was luck or that something nefarious was in play? The odds of landing on heads 10 times are theoretically (0.5)^10 or 1/1024. Alternatively, the coin could be weighted so that it almost always comes up heads. Both are possible, right? There is that one-in-a-thousand chance and weighted coins exist. Granted, in this example the coin is severely disfigured and it would be readily apparent that it was doctored... Still, if someone says to you "I just flipped 10 coins and had 10 heads" with no additional information, what should you believe?
This brings us to confidence intervals. We know that with any type of probabilities, there exists a range of theoretical outcomes and that certain outcomes are more likely than others. What we have to determine is our threshold for error or, alternatively, our confidence in the results. Luckily, we did most of the work already by calculating the standard deviations. There is a statistical rule called the 68-95-99.7 rule that states the likelihood of a certain result falling within one, two, or three standard deviations of the mean. Those ranges are given in the above charts. If the above games were played by an average Shops pilot, there is a 68% chance that the Shops pilot would win between 6-10 games, a 95% chance they would win between 4-12 games, and a 99.7% chance that a Shops pilot would win between 2-14 games. Max won 15 games, so the odds of an average Shops pilot producing a result that far from the mean are <0.3%.
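The 68-95-99.7 rule is a normal approximation. With only 16 trials the binomial tail can also be computed exactly, which makes a useful cross-check. The sketch below is one way to do that; binomial_tail is a hypothetical helper name, and it gives the one-sided chance of at least k wins, which is a slightly different question than the two-sided 0.3% figure.

```python
from math import comb

def binomial_tail(n, k, p=0.5):
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance that a true 50-50 pilot wins 15 or more of 16 mirror matches.
print(f"{binomial_tail(16, 15):.6f}")  # 0.000259

# The ten-heads-in-ten-flips example: (0.5)^10 = 1/1024.
print(f"{binomial_tail(10, 10):.6f}")  # 0.000977
```

Either way the answer lands well under the 0.3% threshold, so the conclusion does not depend on which method is used.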
Dividing the number of wins by the number of games played gives us a match win percentage that allows us to compare different sample sizes. Doing so shows how win rates can vary dramatically based on limited results (and why looking at small sample sizes is unreliable, as @Timewalking suggested). We would expect an average Shops pilot to win 37.5-62.5% of their matches 68% of the time, 25-75% of their matches 95% of the time, and 12.5-87.5% of their matches 99.7% of the time over 16 matches. Conversely, the confidence intervals for 84 matches (the number of matches Max played against the field) are much smaller: 44.5-55.5% for one standard deviation, 39.1-60.9% for two standard deviations, and 33.6-66.4% for three standard deviations. Statistically, more data is always better. Max won 78.6% of these matches, so again, it strongly suggests that Max is an above average Shops pilot and/or that Shops is an above average deck.
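To see how the bands tighten with sample size, here is a short sketch that converts standard deviations into win-rate ranges for a 50-50 pilot. The helper name win_rate_bands is made up for this illustration; the sample sizes of 16 and 84 are the ones discussed above.

```python
import math

def win_rate_bands(n, p=0.5):
    """Return the 1-, 2-, and 3-standard-deviation win-rate ranges
    for n matches at true win probability p."""
    sd = math.sqrt(p * (1 - p) / n)  # standard deviation of the win rate
    return [(p - k * sd, p + k * sd) for k in (1, 2, 3)]

for n in (16, 84):
    ranges = [f"{lo:.1%}-{hi:.1%}" for lo, hi in win_rate_bands(n)]
    print(f"n = {n}: {ranges}")
```

The width of each band shrinks with the square root of n, which is why 84 matches gives a much tighter range than 16.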
By convention, those in the medical field (and many other fields) tend to use the cutoff of 95% (2 standard deviations) as a statistically 'true' result. Max is well beyond that, so we can statistically conclude what most of us already concluded - that Max is not an average Shops pilot. We have a higher degree of certainty, of at least 99.7%, but it's much simpler mathematically to stop here for now.
What other meaning can we derive from the data?
There are really two other questions/observations that emerged from the thread concerning Max's article.
- Does Max's higher win rate in Shops mirrors (94% vs. 79%) suggest that Shops is actually a weaker deck against the field?
- Does Max's 81% win rate in total and 79% win rate against the field suggest that Shops is an above average (or good deck) in the metagame?
Let's start with the first as it is easier to address. The argument assumes that skill in one matchup is transferable to another, that the Shops mirror is inherently a 50-50 matchup, and that since Max won at a higher rate against Shops than against other decks, the skill-independent MWP of Shops is below 50% (making it a 'bad' deck). While considering assumptions is really important in interpreting data, it actually doesn't matter much statistically. The numbers are what they are: Max won 94% of matches against Shops in 16 matches and 79% of matches against non-Shops decks. The question is whether or not this discrepancy is real.
Is there a statistical difference between Max's results against Shops and Max's results against the field?
The second way of determining a probability (and by far the most common) is to do so experimentally. We don't know how many matches Max should win when we factor in his skill and his deck selection. How good is Max? How good is Shops? How good is Max with Shops? Again, we don't know for sure, but one thing we can do is have Max play a bunch of matches with Shops to give us an experimental value for his win probability. Well, Max already did that so let's use Max's win rate against other decks as a starting point. Max won 66 of 84 matches, for an experimental probability (P because I don't know how to add a circumflex to the letter p) of ~79%. What is our confidence interval for this value? Well, there are several ways to calculate confidence intervals of experimental means based on sample sizes. The easiest one to use is a normal approximation interval, or Wald method, where the range is:
P +/- z * sqrt(P * (1 - P) / n), where n is the number of matches played.
The constant z depends on the desired confidence level - for 95%, z is 1.96. Punching the numbers in, we get an experimental probability of 0.79 +/- 0.09 and a range of 70-88%. Max's win rate of 93.75% in the mirror is outside of this range, implying that there is statistical significance in the discrepancy between the Shops mirrors and matches against the rest of the field.
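Here is a minimal sketch of the Wald calculation for anyone who wants to plug in their own records. The function name wald_interval is an illustrative choice, and the inputs are Max's 66 wins in 84 matches against the field. The unrounded bounds can differ by a point or so from the rounded figures quoted above.

```python
import math

def wald_interval(wins, n, z=1.96):
    """Normal-approximation (Wald) interval for an observed win rate.
    Returns (p_hat, margin, (low, high))."""
    p_hat = wins / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)

# Max against the field: 66 wins in 84 matches, 95% confidence (z = 1.96).
p_hat, margin, (low, high) = wald_interval(66, 84)
print(f"{p_hat:.2f} +/- {margin:.2f}")

# The mirror win rate (15 of 16) compared against the upper bound.
print(15 / 16 > high)  # True
```

Keep in mind the Wald method is the simplest option and gets shaky for small samples or win rates near 0% or 100%.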
Does a statistically significant result actually tell us what we want to know?
Now it's time to look at our assumptions. We assumed that
- Mirrors are inherently 50-50.
- Skills with a deck are transferable between the mirror and other matchups.
- Skill differences affect outcomes in other matchups to the same degree.
I can poke holes in each of these arguments. The first assumption is that mirrors are inherently 50-50, but that ignores the fact that 'true' mirrors are relatively rare. Most decks are not 75 card copies of each other, and most classification schemes lump similar decks into the same archetypes. For Shops, this includes Ravager Shops but also Stax, Rod, and other variants. Ravager Shops tends to destroy these other versions, which is part of its dominance within the metagame. Foundry Inspector breaks the symmetry of Sphere effects and is unaffected by Null Rod, the threat base is wider and lower to the ground (i.e. many creatures that can be cast cheaply), and the mana denial is much more effective against other decks with higher mana curves. Max went at least 5-0 against these 'mirrors', which arguably should be considered different decks. If one assumes that the remaining 10-1 record was against other Ravager decks, that gives a win probability of 0.91 +/- 0.17, or a lower limit of 74%. This result is no longer statistically significant.
For the second and third assumptions, Max and I both stated that we thought the mirror tested different skills and was very skill-intensive (i.e. that the skill discrepancy with a deck went a long way to predicting the winner). The Ravager mirror does have blowout potential but many games develop into complicated board stalls with key pieces such as Walking Ballistas, Arcbound Ravagers, Steel Overseers, and Hangarback Walkers shut down by Phyrexian Revokers. Oh, and Metamorphs, Wurmcoils, and Precursor Golems provide powerful threats to be navigated. Complex combat math is arguably the most valuable skill in the mirror, with sequencing less important. These types of scenarios are uncommon in other matchups and the combat math is much more simplistic as most opposing creatures provide few decision trees (most creatures are vanilla x/x's like tokens, and creatures with abilities tend to be static like the lifelink of Griselbrand or triggered and predictable like Inferno Titan). Sequencing is more important for the Shops pilot who assumes the proactive role. Skill from the other side of the matchup is also minimally interactive - as Max said, either the opponent kills all your threats or deploys a massive trump like Blightsteel through Spheres and mana denial, or they don't and die. That is more draw and die-roll dependent than skill based.
I think that statistically significant results in this case point to a couple of possible conclusions. First, I think the most likely explanation is that the Shops mirror tends to be less variable than other Shops matchups. This doesn't require assumptions about the transferability of skills from one matchup to another. It actually assumes the opposite of assumption #3 in that it assumes matchups are influenced by skill to varying degrees. Max reached this conclusion as well. I think it is less likely that Shops is weaker than other decks in the field, because we have more premises that I find hard to logically accept to reach that conclusion.
Does this article indicate that Shops is an overpowered deck in the metagame?
Short answer is "No". That type of question is much better answered by our metagame breakdowns. Again, more data is better and you mitigate issues of player skill by having a much larger sample size. Applying the same statistical tests to this most recent Vintage Champs gives a win rate of 59% (+/- 5%). In this sample size of 404 matches played by 72 players, it's pretty statistically clear that Shops is a good deck. Is it the best deck? Oath is the closest of the other archetypes with a win rate of 55% (+/- 6%). Those confidence intervals overlap, so you can't statistically claim that Shops is the best archetype. The answer of course is "more data". When you look at results from the Vintage Challenges and other tournaments (taken collectively), ideally it paints a consistent and accurate picture of reality. That's how science works... you do radiometric dating of a bunch of radioactive minerals and when many different labs reach a consensus of 4.5 billion years old, that's what they put in the textbooks. Would people be interested in a large scale analysis of all available metagame data (in essence, a meta-analysis, the strongest form of scientific evidence in medicine and other areas of science)? I am willing to do this, but I would like confirmation that players would be receptive to the data.
Alright, back to one 100 match set played by one player. We can agree that Max's skill has skewed his results away from that of an average player. The question is what additional component arises from Max's deck selection. Again, we have to make various assumptions. We don't know Max's 'true' win probability with other decks, but he has stated that he has won roughly 70% of his matches in PTQ's. If we accept this figure as accurate and assume that this MWP is transferable to Vintage, and assume that PTQ's are comparable in level of competition to Vintage leagues, then we can use this 70% value as a theoretical probability. In this case,
The confidence interval has an upper limit of 79, which suggests with 95% certainty that Max's results are not just a product of variance. He won 81 matches. If you exclude Shops decks, you are at the edge of statistical relevance (remember our confidence interval from that data set was 70-88). If you exclude true Ravager Shops mirrors and include Shops variants, you are back above statistical significance with a confidence interval of 72-88 MWP. Given the proximity to the limits and the assumptions required, I would not personally conclude from this that Shops is an above average deck in the metagame.
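The upper limit of 79 can be reproduced with the same binomial machinery, this time using the assumed 70% baseline instead of 50%. A rough sketch follows; two_sigma_range is a hypothetical helper name, and the 0.70 figure is the PTQ win rate taken on trust above.

```python
import math

def two_sigma_range(n, p):
    """Range of wins within two standard deviations of the binomial mean."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean - 2 * sd, mean + 2 * sd

# 100 matches at an assumed 70% true win rate.
low, high = two_sigma_range(100, 0.70)
print(f"{low:.1f} to {high:.1f} wins")  # 60.8 to 79.2 wins

# Max's combined record: 66 wins against the field plus 15 in the mirror.
total_wins = 66 + 15
print(total_wins > high)  # True
```

The result sits only a couple of wins above the limit, which is why the conclusion here is so sensitive to the 70% assumption.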
Hopefully this type of data analysis was informative and accurately conveys some of the challenges with regards to interpreting data. Questions and comments? Please, let me have them.
Number of Participants: 114
Top 8 lists
- Joe Brennan - Jeskai JVP Mentor
- Zohar Bhagat - Jeskai Nahiri Control
- Brian Kelly - Dromoka Gush Oath
- Brad Gutkin - Blue Moon
- Shawn French - UW Landstill
- Ross Pranjzner - Dark Jeskai Mentor
- Paulo Cesari - Jeskai Mentor
- Nick Dijohn - "Car Shops" (aka. Get outta my dreams, get into my car)
Rest of the X-2+
- Porterfield, Avery - Jeskai Delver
- Geras, Jonathan - Ravager TKS Shops
- Castrucci, Sam - Ravager TKS Shops with Cruisers
- Lynch, Paul - Salvagers Oath
- Fleischmann-Rose, David - 4C Odd Oathstill
- Sees, William - UW Emrastill
- Dayton, William - Ravager TKS Shops
- Eberhart, Ryan - Grixis Therapy
- Ata, David - Ravager TKS Shops
- Sacino, Joey - BUG Fish
- Miller, Daniel - Moat Control
- Barkon, Daniel - Ravager TKS Shops
- Dail, Ryan - White Eldrazi
- Dobbin, Zach - Jeskai Mentor
- Waldron, Scott - Salvagers Oath
- Johnson, Richard - Salvagers Oath
- Difebo, Dominic - Jeskai Mentor
Metagame and MWP against the field:
Archetype breakdowns and MWP against the field
Note: Gush Oath decks were classified under the Gush archetype. Frankly, there is no perfect way to classify lists and it is unclear to us to what purposes you, the reader, apply these breakdowns. If you are concerned about B&R rationale, a very relevant question is "what percentage of the metagame is running Gush" and that is the question we chose to answer by putting them into the Gush archetype. Alternatively, if you are a Shops or Eldrazi player concerned with the Oath matchup, you would want to consider those 3 lists as part of the Oath archetype. The actual effect is small: plus or minus 3 out of 114 players is a 2.6% change and the MWPs are virtually identical at 61.0% (Gush), 62.5% (Oath), and 61.1% (Gush Oath). I wanted to mention this in the interest of transparency since we only have one basket in which to drop these deplorables.
Archetype vs Archetype win rates
Mistakes? Typos? Comments? Let me know. As always, thanks to @diophan for his help on these endeavours. Additional thanks go out to @EEMagic for running an excellent event at a great venue and supplying us with the lists and WER files. Please support them by attending EE6 or watching the coverage if you can't make it.
At the request of Andy, I'm reposting here:
The restriction of Thorn of Amethyst and Monastery Mentor took effect September 1 2017. That means we are over 6 months out from that restriction and I think it is worth looking back at how effective those moves were. As many of you know, Ryan Eberhart (@diophan) and I spend quite a bit of time collecting metagame data from major paper events and the MTGO challenges. We do this for a couple of reasons. Personally, I use this data when it comes to creating new decks. The version of Snapcaster Control that I've played in the last three challenges was heavily influenced by what I saw from our challenge data. The prevalence of Shops and Planeswalkers in both Oath and Xerox (i.e. cantrip heavy blue decks) motivated me to shift the removal suite to Lightning Bolts and Fiery Confluences instead of Swords to Plowshares and Balance. This was further justified by an absence of Eldrazi and Merfolk decks in the format. Honestly, that was my primary reason and hope when we started collecting data: that what we gathered would be used to promote innovation in a small format like Vintage.
Alternatively, Ryan and I wanted to provide an accurate picture of the Vintage metagame for use in discussions involving the Restricted list. Much of what we read previously tended to be hyperbolic, opinionated, and poorly reasoned. We hoped people would use our data in forming conclusions like scientists or researchers. In both cases what we wished to happen didn't actually happen. Most responses to our posts consisted of hyperbolic, opinionated, and poorly reasoned arguments, just now with cherry-picked data. There was very little commentary on trends and how to combat them, no brewing of decks. We went from posting results weekly after each Challenge, to monthly aggregations of the previous month's events, to not posting or gathering data from February. In effect, Ryan and I burnt out, on both playing Vintage and collecting data about the format. We asked for help and nothing really materialized. The reason I'm bringing this up is that I don't know if we will continue this in the future. So if you do find this beneficial, please let us know and consider helping out if you play Vintage on MTGO. The Challenges continue to be excellent EV, with the top 32 (basically any 3-3 and several 2-4s) making their entry fee back. Power is affordable - a set of VMA Power 9 costs less than 100 dollars. Complete decks range from 120 tix for Dredge, 300 tix for DPS, 500 tix for Ravager Shops, and 700 tix for UWR Mentor or UW Landstill. Which serves as a pretty good segue into the next section...
Paper vs Online Metagames
We hear a lot of comments concerning real or perceived differences between these two metagames, often in the context of B&R discussions. While I appreciate that players may play Vintage in widely divergent paper metagames, that doesn't invalidate data collected in other metagames. At the end of the day, the DCI is going to base their decisions on the data they have available. This likely is limited to the large sanctioned events of European, North American, and Japanese Eternal Weekends, along with the results from MTGO Leagues and Vintage Challenges, so that's where we've focused our efforts. And, frankly, the MTGO metagame has several advantages compared to paper Vintage. The cost of decks is lower, even considering proxies, which allows players more freedom in deck selection. Events are more frequent and typically larger than their paper counterparts. We are looking at four 40+ person events per month, whereas a local tournament scene may have one monthly event with 17-32 players. This gives us a much larger sample size from which to draw conclusions. And finally, players on MTGO tend to do very well in paper tournaments such as the 2017 North American Champs. Winner Andrew Markiton (MTGO: Montolio), finalist Rich Shay (The Atog Lord), Patrick Fehling (Clone9), Brian Kelly (brianpk80), and Eric Vergo (caggii) all are regulars on MTGO.
Before the Restriction
The Gush and Gitaxian Probe restriction took effect April 24 2017, so we used the May through July challenges to establish a baseline. Individual events can be found by searching TMD, but the compiled data is available here.
Following the Restriction
We changed our spreadsheets when we moved to monthly reporting. It allowed us to do a month-by-month breakdown of events. Note: February's metagame breakdown is drawn from the Top 32 results of that month's challenges. As mentioned previously, we didn't do our usual data collection for that month. Also, January is missing an event in which Ryan and I were unable to participate.
As can be seen by the monthly breakdown, October and November are dramatically different from the other months. The most likely explanation is that the proximity of these events to the North American Vintage Champs altered player attendance and behavior. North American Vintage Champs was October 19-21 and Ravager Shops was absolutely dominant. It met itself in the finals, won the tournament, placed 5 decks in the top 8, 11 decks in the top 32, and had a 58.9% win rate against the field. Yet on MTGO, Shops' portion of the metagame actually fell. Among many players, there was concern that a Shops restriction was imminent, so they played other decks, leaving Shops as the "best deck" primarily played by those unfamiliar with the format. Many established players flocked to the deck that supposedly "beat" Shops, Inferno Titan Oath, as Oath's percentage of the metagame tripled from 6.4% in September to 19.0% in November. And still others went next level with the various Mentor/Xerox decks that tend to beat Oath. Those decks put up an impressive 63% win rate in October and November. Now I'm not in the habit of ignoring data, but data should make sense. If it doesn't, you have to wonder what factors might be influencing or introducing bias into your study.
If you exclude October and November like we did above, there is a remarkably consistent picture of Shops' dominance. The combined results show a 59.0% win rate, virtually identical to the 58.9% win rate at Champs, slightly decreased from the 59.2% win rate in the pre-Thorn metagame. The metagame share is slightly decreased but trending upward. These trends seem to hold so far in March, as you can see below. Shops has a 31.4% metagame share and a 62.1% win rate. In my opinion, the results from October and November appear as outliers rather than a true indicator of Shops' place in the metagame. However, one of the reasons to write these in-depth reports is to solicit differing opinions, similar to peer-review. I invite whoever is so inclined to chime in below with their thoughts. If you feel this is some sort of adaptation by the rest of the metagame, I am curious to hear what you think that was and why the metagame reverted to its previous state.
What beats Shops?
For those that do not follow other formats, Standard underwent several bannings in January. Ian Duke's explanation of those bans is well worth a read as it provides useful insight into WotC's reasoning and approach to B&R decisions. Ian spends quite a bit of time discussing the matchups of Standard's top two decks, Temur Energy and Ramunap Red, and how these decks have a favorable matchup profile against the field, suggesting that the metagame is unable to adjust. Let's take a look at Shops' matchup profile since September:
With October and November removed:
Ironically, Shops' only "bad" matchup (and I admit to being a bit lazy with the statistics here - if you want the raw data and the sample sizes, it's here) is the "Other" category, where we throw decks that don't fit into other categories. Apparently, the Monored Hate deck with Null Rod and Ensnaring Bridge went 5-0 against Shops in October and November... Outside of what are essentially rogue decks, Shops either has a good matchup of >55% or is essentially even (between 45% and 55%). This includes Oath, which is Shops' worst matchup but only at 47.5%. Decks that were traditionally thought to have good matchups against Shops, like Dredge and Landstill (the most popular variant in the "Blue Control" category), actually end up struggling against it.
The goal of this post isn't to propose specific actions: it's to establish the need for such action. The previous restriction of Thorn of Amethyst has not discernibly altered the win rate or metagame share of Shops in the MTGO Vintage Challenges or in the NA Vintage Championship. If such action was indicated then, it holds that additional action is indicated now. Of course I have my own opinion on what I think should be done. However, I want to allow some time for players to read and process this. Comments, thoughts, and opinions are welcome and encouraged.
Edit: Added Archetype vs Archetype Win Rates with the Champs months removed.
Data from the month of March
Major props to Twitch user k0dydraven, who has figured out a way to import round results from MTGO, saving us a lot of time.
Edit: Also props to Ryan @diophan XD
There has been quite a bit of change recently in the Vintage format. Wizards has been more active in managing the restricted list and printing powerful Eternal-relevant cards like the Delve spells, Dack Fayden, and Monastery Mentor. After the restriction of Lodestone Golem, I wanted to take this opportunity to look at how the metagame evolved following the removal of a key card. I felt that while players understandably will have different views on what the Vintage format should be like, we should also have as much information as possible available to us that we can use to construct informed opinions and arguments going forward. Ryan Eberhart (aka @diophan) and I have been collecting and disseminating data from MTGO Power 9 events, but we have also been collecting data on the Vintage Dailies and paper tournaments around the world. I would like to share with you now the data we have collected on the MTGO Daily Events since the Lodestone Golem restriction took effect on April 13th (paper results will be following shortly).
We have classified decks according to the following archetypes and broken them down further into sub-archetypes in an effort to more accurately convey the metagame.
- Gush - If Gush was a primary component of a deck's gameplan, it was put into this category. We then broke this down essentially by win condition: Delver, Mentor, Pyromancer, Combo (Doomsday and Gushbond), and Other (Thing in the Ice or Vault/Key/Tinker, mainly).
- Shops - The Shops archetype was obviously hit hard by the restriction of Lodestone Golem and went through quite a transitional period. Over the last three months, the archetype has reestablished itself by turning to Thought-Knot Seer as a replacement for Golem. The most successful build has been the Ravager TKS deck though other lists have incorporated TKS and put up results. A third category includes the non-TKS Shops lists but these have been a minority of lists and slanted towards April.
- Eldrazi - An archetype that emerged from the LSG restriction, the most popular variant of the archetype has been White Eldrazi which pairs the colorless creatures with White Hatebears like Thalia and Vryn Wingmare. A minority of decks have fully embraced the tribal element of Eldrazi, i.e. Jaco-Drazi.
- Dredge - Divided by sideboard strategies based on whether they intended to combat opposing hate head-on with Creature, Enchantment, and Artifact removal or Transform post SB. The former approach remains the most popular.
- Combo - Predominantly Dark Petition Storm but also a few Belcher decks and odd-balls (like Two-Card Monte and Rector Flash)
- Blue Control - The more controlling remnants of the Mana Drain pillar like Landstill in various colors and Blue Moon.
- Big Blue - Less controlling artifact-based combo decks like Control Slaver, Painter-Grindstone, Academy combo.
- Oath - If it contained maindeck Oaths, it found its way here. Variants include Salvagers Oath, Control Oath (Fenton Oath with Griselbrand as the primary win condition), Combo Oath (i.e. Burning Oath), Oathstill, and other Oath (odd Oath).
- Null Rod - The various Fish decks that have historically belonged to the Null Rod Pillar. These types of decks are almost nonexistent on MTGO but include BUG Fish, Hatebears (White Trash and 5c Humans), Merfolk, and Other (in this case, a monored 8 Moons deck).
We kept track of 4-0 and 3-1 finishes and used these to create a category called Total Wins ( # of 4-0 finishes * 4 + # of 3-1 finishes * 3). This more heavily weighted the 4-0 finishes, from which we calculated the % of Total Wins for that archetype/subarchetype. Comparing the totals reflects performance - a positive Delta % Total means the deck disproportionately put up 4-0 finishes. However, the sample size is not really large enough to infer much from this.
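For anyone who wants to reproduce the weighting outside the spreadsheet, here is a rough Python sketch of the Total Wins calculation described above. The archetype names and finish counts are made-up placeholders, not our data, and I am assuming "Delta % Total" means the share of Total Wins minus the share of finishes.

```python
from dataclasses import dataclass


@dataclass
class Finishes:
    four_oh: int    # number of 4-0 finishes
    three_one: int  # number of 3-1 finishes

    @property
    def count(self) -> int:
        return self.four_oh + self.three_one

    @property
    def total_wins(self) -> int:
        # Total Wins = (# of 4-0 finishes * 4) + (# of 3-1 finishes * 3)
        return self.four_oh * 4 + self.three_one * 3


def summarize(records):
    """Return {archetype: (% of finishes, % of Total Wins, delta)}."""
    all_finishes = sum(r.count for r in records.values())
    all_wins = sum(r.total_wins for r in records.values())
    summary = {}
    for name, r in records.items():
        pct_finishes = 100 * r.count / all_finishes
        pct_wins = 100 * r.total_wins / all_wins
        # Assumed definition: positive delta = more 4-0s than the field average.
        summary[name] = (pct_finishes, pct_wins, pct_wins - pct_finishes)
    return summary


# Hypothetical counts, for illustration only.
example = {"Deck A": Finishes(four_oh=2, three_one=2),
           "Deck B": Finishes(four_oh=0, three_one=4)}
summary = summarize(example)
for name, (pf, pw, delta) in summary.items():
    print(f"{name}: {pf:.1f}% of finishes, {pw:.1f}% of Total Wins, delta {delta:+.1f}")
```

Both placeholder decks have the same number of finishes, so the only thing moving the delta is how many of those finishes were 4-0s.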
There is a function in Google Sheets that allows you to count unique entries within a data set. We used this to calculate the number of unique players both overall and within archetypes/subarchetypes. Over time, you would expect the majority of MTGO Vintage players to put up a finish so this is a rough indicator of the total pool of MTGO players that participate in these events. It also helps to remove repeat performers like Rich Shay or Montolio as they can potentially skew results for certain archetypes. It should be noted that players can switch archetypes/subarchetypes so some players will be counted twice or more as you break down the data.
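The same unique-player count can be sketched in Python with sets. This is only an illustration of the idea, not the spreadsheet formula we used, and the player names below are placeholders.

```python
from collections import defaultdict


def unique_player_counts(finishes):
    """finishes: iterable of (player, archetype) pairs, one per 4-0 or 3-1 finish."""
    overall = set()
    by_archetype = defaultdict(set)
    for player, archetype in finishes:
        overall.add(player)
        by_archetype[archetype].add(player)
    return len(overall), {a: len(players) for a, players in by_archetype.items()}


# Placeholder data: player_a finishes twice with Gush and once with Shops.
sample = [
    ("player_a", "Gush"),
    ("player_a", "Gush"),
    ("player_a", "Shops"),
    ("player_b", "Gush"),
]
total_unique, per_archetype = unique_player_counts(sample)
print(total_unique, per_archetype)
```

Note that the per-archetype counts add up to more than the overall count here, which is the double counting of archetype switchers mentioned above.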
That out of the way, let's get to the results.
The true value of this data in my opinion is how the different archetypes and sub-archetypes have changed over time. Ryan and I broke down these results by week and displayed them on several graphs.
As we can see, the trend of a declining metagame prevalence for Gush has not continued (did anyone aside from @Smmenen think this would be the case?). Metagames tend to be cyclical by nature - people build their decks to combat specific decks and that focus shifts with time. Gush was the clear target that emerged from the Lodestone Golem restriction and decks adapted to combat Gush, with a surge in Sudden Shocks, Sulfur Elemental, Thorns, and Defense Grids. As the field diversified, the narrower hate-cards were supplanted by more broad removal (you don't want to be holding a Sudden Shock against a resolved Thought-Knot Seer) and Gush decks themselves diversified to dodge the hate with these decks turning to Tendrils, Pyromancer, and Thing in the Ice. At its heart though, Gush is a control deck with a powerful card advantage engine - it just needs to draw into the right cards for the field. A key development was the adoption of Cabal Therapy and Baleful Strix by Grixis Pyromancer (and ultimately Esper Mentor) as a means of competing with Eldrazi, Cavern of Souls, and the broader field. This has led to a resurgence in Gush, a decline in Shops and Eldrazi, and ironically the metagame percentages have returned to roughly where they were at the start of April. It remains to be seen how the metagame will adapt but I hope this look at it has been interesting. Keep in mind, all statistical work is subject to variance and the sample sizes are low (though we have a comparable number of lists to Paper over the same time span). Questions? Comments? Suggestions? Have at them and I hope we can get a good discussion going.
Correction 1: We noticed an error in our calculations that affected the Sum of 3-1 Finishes (it did not count Eldrazi) and as a result, the percentages were high. We've had an issue with Google Sheets where the formulas we write do not appear to "fill" properly, randomly skipping certain cells...This could be an issue with us simultaneously trying to edit a sheet. This specific instance could have been human error (aka I screwed up), but we really don't know. In any case, the best thing to do is post a correction explaining the error and fixing the data. The first table has been updated and should be correct now. Other charts were unaffected as they did not use the "Sum of 3-1 Finishes" in the calculation.
Bah, it's four in the A.M. Here's some data.
Top 8 Results
Of note, the Paradoxical lists with the asterisks contained Monastery Mentor as their primary win conditions. Neither list was the standard 4X Mentor Outcomes, but rather eccentric lists from players such as @iamfishman and @brianpk80. These lists contained 2 Mentors each, with Brian running one in the SB "to come in against other Mentor decks". If counted as part of the Mentor archetype, that brings the total up to 28.4%.
This breakdown is based on archetypes and not tags. The actual percentage of Mentor in the metagame is slightly higher. Unfortunately, data collection has been inconsistent as Ryan and I have missed events and relied on others to help us. It's a significant amount of work and we are immensely grateful to @desolutionist and others for that help. However, we don't have tag breakdowns available for every event. The percentage of Paradoxical Mentor is typically low, normally 2-4 players on a given day. A reasonable approximation would put the percentage of Mentor at about 25% of the metagame, or on par with Shops. Again, these decks aren't necessarily focused on Mentor and the number of copies is pretty variable, but since Mentor is currently on the community's radar for restriction, it's best to keep the metagame saturation in mind.
Archetype vs Archetype MWP
The color codes are as follows. For win rates, green corresponds to >50%, red to <50%, and yellow to =50%. The problem with establishing a range is that the sample sizes (and therefore uncertainties) aren't consistent. We expect much more variance from the 3 matches Eldrazi had against Blue Control than the 171 matches Mentor had against Shops. The second table shows those match breakdowns, with green corresponding to >50, yellow 25 to 50, and red <25 matches. I'll leave it to the readers to discuss the implications of this.
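The two color schemes above boil down to a pair of threshold checks. Here is a small Python sketch of them; the function names are mine, and the win counts in the example are placeholders (only the 3 and 171 match totals come from the tables).

```python
def win_rate_color(wins: int, matches: int) -> str:
    """Green for a win rate above 50%, red below 50%, yellow at exactly 50%."""
    if matches <= 0:
        raise ValueError("need at least one match")
    # Compare 2 * wins to matches to avoid floating point at the 50% boundary.
    if 2 * wins > matches:
        return "green"
    if 2 * wins < matches:
        return "red"
    return "yellow"


def sample_size_color(matches: int) -> str:
    """Green for more than 50 matches, yellow for 25 to 50, red for fewer than 25."""
    if matches > 50:
        return "green"
    if matches >= 25:
        return "yellow"
    return "red"


# Placeholder win counts paired with the two match totals quoted above.
for label, wins, matches in [("Eldrazi vs Blue Control", 2, 3),
                             ("Mentor vs Shops", 80, 171)]:
    print(label, win_rate_color(wins, matches), sample_size_color(matches))
```

Reading the two colors side by side is the point: a green win rate on a red sample size should not be taken very seriously.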
Trends in Major Archetypes
Just to show how variable win rates and even metagame percentages can be on a weekly basis. Also, it looks like Shops is trending upwards in metagame percentage while Paradoxical is trending down.
It's 4:00 am. I'm going to bed.
Story Time... Brian Kelly and I were sitting at his dining room table playing our decks for a nearby tournament the next morning. We are both on the "usual" Blue decks we tend to play, though I forget the actual decks, the actual tournament, and whether or not this was before Brian's notorious "Dromoka + 3 Force of Wills" Champs victory. It doesn't matter the exact context as I'm just trying to tell an interesting anecdote that introduces the major theoretical premise of this deck. So back to our dining table Vintage match, Brian and I both keep our 7 and play out the first couple of games. At one point, I attempt to resolve a Jace, the Mind Sculptor and Brian responds with a hardcast Force of Will. I Flusterstorm back, and Brian (as those who have watched him stream can imagine) complains a bit before casting Force of Will, pitching Dig through Time. "The reason I am losing this game is because I drew two copies of Force of Torach." I smile and say, "Your problem is that you insist on playing only the best Blue cards, so the cards you are pitching are always good. You need to run more bad Blue cards to pitch to your Forces." I then exiled a Spell Snare to Force Brian's Force. Now, I'm not saying Spell Snare is a "bad" card. It is however a situational counter with no obvious application in that game state. It is not Dig through Time, or Ancestral Recall, or Brainstorm, or any of the powerful, now restricted Blue cards that make Blue the best "color" in Vintage.
We play Magic in what is a very Blue format. This point comes up frequently when discussing potential restrictions and I think it is something that is worth examining. There is no question that many of the best cards in the format are Blue. Ancestral Recall, Time Walk, the Delve spells, Gush, Brainstorm, Ponder, Dack Fayden, and Jace, the Mind Sculptor create a very strong incentive towards playing Blue decks. I think it is actually impossible to negate that. If you look at Legacy, the format is just as Blue. According to MTGGoldfish (which as a metagame tool is flawed due to the overrepresentation of League decks - though the most recent changes in data reporting by Wizards have yet to really impact the 1 month window that MTGGoldfish uses), Brainstorm is played in 61% of Legacy decks. In Vintage, Force of Will is played in 62% of decks. Both formats are majority Blue. They are so Blue in fact, the Eternal Blue Cabal can break a filibuster in the US Senate. However, while there is a path in Legacy to create color parity, such a path does not currently exist in Vintage. Wizards of the Coast could choose to ban Brainstorm, Ponder, Probe, Daze, Force of Will, and any number of Blue cards until Blue decks are part of a pluralistic Legacy format. It would be unprecedented for Wizards to ban any card in Vintage based on power level or metagame balance. The result is the "Blue Skew" is very likely here to stay.
The question in Vintage then becomes, "How do we lessen the skew assuming that such a thing is desirable for the format?" The answer then is to look at what factors exacerbate the skew. And I think we've already mentioned the most severe one: Force of Will. Force of Will is a card that heavily incentivizes you to play more Blue cards. By running Force of Will, you are constraining your deck-building options towards running more Blue cards. The first Google result pulled up 18 Blue cards as the minimum number of Blue cards necessary to reliably cast Force on turn 1 (34.5%). But relating back to the conversation I had with Brian, you actually want to run more. You don't want to pitch Ancestral Recall to Force. You don't want to pitch Time Walk or Tinker or Treasure or Gush. Plus, many decks are control decks and might want to Force of Will multiple cards in the early game. That requires a higher number of Blue cards. If you look at my Mentor deck from the challenge, I am running 30 Blue cards and I can tell you definitively that Force of Will math factors into my deck building decisions. Butakov's Mentor deck is at 27. Ecobaren's PO list and PsiVen's Oath list are both at 23 Blue cards but they are not really control decks - they don't want to be casting multiple Forces during the course of a game.
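For the curious, here is a rough Python sketch of the kind of math behind that 18-card figure. It is my own reconstruction, not the Google result's method: I am assuming a 60-card deck, 4 Force of Wills counted among the Blue cards, a 7-card opening hand, and that "castable" means holding at least one Force plus at least one other Blue card (a second Force counts as a pitch card). Under those assumptions, 18 Blue cards comes out at roughly 34.5%, which lines up with the number quoted above.

```python
from math import comb


def p_force_castable(blue_total: int, forces: int = 4,
                     deck: int = 60, hand: int = 7) -> float:
    """Chance an opening hand holds a Force of Will plus another Blue card to pitch.

    blue_total counts every Blue card in the deck, including the Forces.
    """
    other_blue = blue_total - forces
    non_blue = deck - blue_total
    favorable = 0
    for f in range(1, min(forces, hand) + 1):            # Forces in hand
        for b in range(0, min(other_blue, hand - f) + 1):  # other Blue cards in hand
            if f + b < 2:
                continue  # a lone Force with nothing Blue to pitch
            rest = hand - f - b
            favorable += comb(forces, f) * comb(other_blue, b) * comb(non_blue, rest)
    return favorable / comb(deck, hand)


# Blue counts mentioned in this post: the 18-card minimum plus the 23, 27, and 30 card lists.
for blue in (18, 23, 27, 30):
    print(blue, f"{p_force_castable(blue):.1%}")
```

The ceiling on this number is simply the chance of drawing a Force at all (about 40% with 4 copies), so adding Blue cards past a certain point is less about casting the first Force and more about having cards you are happy to pitch, or being able to cast a second one.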
Now, let me put this in bold letters: I am not advocating or campaigning for a Force of Will restriction. I merely want to draw attention to a perceived constraint of the format, one that certainly affects the diversity of the metagame. Consider Brian's Championship deck - the most common remark about it wasn't about Dromoka but about the 3 Force of Wills. LSV stated that he couldn't wait to play with the deck after he cut a Probe for a Force. And it wasn't just LSV - many Vintage players believe they should be running 4 Force of Wills in their non-combo Blue decks. How else do you stop <Insert broken turn 1 play here>? Personally, I am unsure, but much of brewing is about examining and testing these types of perceived metagame constraints. At the time Brian played his Oath deck, it was considered pretty insane not to just Oath into Griselbrand (or occasionally Emrakul - Dromoka wasn't on the radar). Brian's work with the deck broke that constraint and led to a more diverse format. So when @NBA84 wrote his deck tech on Blue Jund, I leapt at the chance to run it through MTGO leagues and tinker with the particulars. To start off, I recommend reading what @NBA84 wrote and giving them an upvote to show appreciation for their time and effort. Here are the insights I gleaned playing those 33 league matches.
Chubby's Blue Jund
Down to 17 Blue cards. In my defense, Misstep is basically colorless in this deck and it's not my fault Wizards decided that Leovold needed to be the BUG Commander Tiny Leaders was missing... One thing I wanted to mention is that I reworked the mana base that NBA was using. The first round I played with the deck I ended up not liking double Blue spells like Jace in the deck. It made the deck more dependent on Deathrite Shaman for color fixing while requiring you to play multiple Volcanic Islands that make a turn 1 Deathrite harder to achieve. Now this could be an unfair criticism of the deck based on a small sample size. However, I decided that I wanted to have the colored mana for any card in the deck off of two lands. Underground Sea and Taiga let you cast Abrupt Decay, Dack Fayden, Kolaghan's Command, etc. Also, yes...There is a Taiga in the deck. That's why. Stop making fun of me for playing a Taiga in Vintage. There is no combination of 2 lands that lets you cast a double Blue spell. The resulting mana base was excellent and I very rarely had difficulty casting spells with the deck.
Dark Confidant is a card that gets so much better when you cut Force from your deck. I was playing it in Blue decks and it seemed to keep killing me against Shops or Pyromancer decks or really anything with creatures, which is not a good place to be in Vintage. Cutting Forces, Gush, Jace, etc... reduced the average converted mana cost of the final list to 1.2. I never died to my own Bobs in those 33 matches, and actually added the Treasure Cruise late in testing because I was taking so little damage off my Bobs. There might be more CMC space to play with in this deck. Dig and Jace are problematic colorwise and the deck can't support Gush, so I'm not sure what card that would be.
Two-for-1's galore! Since you are omitting some of the more powerful and broken spells in the format, the deck is built around generating incremental value. You have to accept that occasionally the opponent Forces through a Jace, Dig, Cruise, or Ancestral. But having so many 2-for-1's lets you catch back up rather quickly. Dack, Dark Confidant, Snapcaster Mage, Leovold, Ramunap Excavator, Sylvan Library, the restricted cards, Ancient Grudge, and Kolaghan's Command are all 2-for-1's that facilitate this game plan to a certain degree.
Cards included that were not in NBA84's list (yes, I know that many of these are cards NBA84 mentioned as being purposely excluded. I have had different results with my testing):
Dack Fayden - Dack is Whack. No, seriously...he is really really good right now. This color combination has a very difficult time dealing with Hangarback Walker since most of our removal destroys permanents. Outside of the Shops matchup, Dack provides the card selection we lack from eschewing Preordain in our list. However, my preferred mode is just stealing a random Mox. It means that even if Dack is removed, he was still a 2-for-1, and with 4 colors in the deck, it's pretty likely that Mox will work for you. Relevant synergies include fueling Deathrite Shaman, Treasure Cruise, and Ramunap Excavator, combining with Leovold to make the opponent discard a card each turn, and clearing two dead cards from the top of your Sylvan Library. While I understand that theoretically Dack lacks synergy with Null Rod, I haven't had the same experience. Against Ravager Shops, Null Rod nullifies Ravager and Ballista, which allows Dack to steal the biggest threat on the board and then hide behind it. Against non-artifact based decks, Dack lets us discard Rod and Grudge, which is a rather loose definition of synergy but has certainly been relevant in practice.
Green Sun's Zenith - Green Sun's Zenith has been very strong, even without a dedicated package for it. Functionally, it provides additional copies of DRS, Leovold, and Ramunap, and you only take 1 damage when you flip it to Bob. At the same time, it makes Black Lotus better. It's very hard to cast a turn 1 Leovold off of a Black Lotus given the three necessary colors, but it's much easier to cast a GSZ for 3. I really like the addition of this card to the deck.
Vampiric Tutor - The old saying goes that you can't draw a card if it isn't in your deck. At the same time, cards like Null Rod, Ancient Grudge, Abrupt Decay, Ramunap Excavator, and Kolaghan's Command are extremely good in different contexts. Outside of those scenarios, they tend to be very weak and you almost never want to draw more than one. Vampiric Tutor provides an additional copy of these cards while minimizing the number of dead draws. The fact that most of these cards are 2-for-1's allows you to recoup the inherent card disadvantage of Vampiric Tutor.
Abrupt Decay - Oath of Druids and Vault/Key decks are pretty prevalent in the MTGO metagame. While Abrupt Decay is pretty inefficient against Shops and can be mediocre in a lot of match ups, I wanted the singleton copy to provide an out to early Oath decks and an un-Forceable answer to early Vault/Key, since we obviously aren't going to win the Force of Will battle there. With Vampiric and Demonic Tutors along with Snapcaster to flash it back, I've enjoyed a better game 1 win rate against Oath decks.
Manglehorn - Manglehorn is good against Shops, but really good at slowing Paradoxical down. It's fetchable with GSZ, which is an obvious plus. I have played Vedalken Aethermage to have additional copies of powerful SB cards (did you know Yixlid Jailer is a Wizard?), and this interaction isn't quite as embarrassing.
Izzet Staticaster - I mentioned the trouble with Hangarback Walker but Young Pyromancer tokens can also be problematic if you can't get a Leovold to stick. I wanted some sort of sweeper in the deck but there aren't very many one-sided options. Izzet Staticaster has been good. It gets hit by Pyroblast but dodges Misstep and either shoots down Pyromancer while eating a Bolt or Swords, or stays on the board and takes over the game. Either way, that makes it a conditional 2-for-1. It's also much better against the Overseer versions of Ravager Shops than it appears at first glance, given how many of their creatures are 1/1's. The 3 toughness blocks Mishra's Factory and I have found it a strong complement to Null Rod.
Mindbreak Trap - Not really a surprising SB inclusion, but I feel it's worth mentioning that I board in MBT against Shops, especially on the play. As a 1-of, it's not really going to prevent most broken turn 1's from the opponent, but if you have it in hand against an 8 card Shops opponent, I like your chances of nabbing at least one spell on turn 1. It also is not bad in the late game as a 4 mana counterspell for a top decked Walking Ballista or Hangarback Walker. You typically do not want to draw multiples though and I wouldn't bring in more than one against Shops. I do want to find space in the SB for a second copy.
Cards played by NBA84 but cut from my list:
Null Rod - Null Rod has not been cut from the list, but I did try it without. I haven't liked Null Rod in Xerox Control decks but have come to appreciate how much better it is in this shell. In Xerox, you still struggle to answer the Shops player's creatures as you tend not to run very many of your own. In this deck, you have several creatures that match up favorably against the Shops player's counterparts once Null Rod is in play. I favor one copy in the main to find with the tutors as drawing multiples against other Blue control or Xerox decks can be game ending given how attrition-based those matchups are. You can't afford to draw multiple cards that do actually nothing in those matchups, and by "nothing" I'm not talking about the good kind of "nothing" in the flavor texts... Against PO, I hear good things about the card. I say that because despite playing several times against PO, I have yet to resolve a Null Rod against them. Part of this is variance for sure, but I would like to find room for a 2nd copy in the SB. I don't think it's necessary against Shops and am not sure I would board in the 3rd Null Rod, but Paradoxical Outcome is proving to be the most difficult matchup right now.
Thoughtseize - Thoughtseize complicated Shops SBing and was underwhelming for me. I don't consider it very good at all against Shops as they've typically dropped their hand by the time you can cast it on the draw. It is also tempo negative on the play in that you've removed a card that they didn't have to spend mana casting. Yes, sometimes their hand is dependent on one particular card, but most of the time I've found that they just play the other two 2 mana cards in their hand on turn 1 and you wish you had led with DRS or turn 1 Dark Confidant. Since I really wanted to board it out, that led to 8 cards for the board and several other weaker cards I wanted to board out as well (like the 3rd Leo and Cruise on the play). I judged that I wanted to keep the Missteps and Pyroblasts more than I wanted to keep the Thoughtseizes, so those were cut to make the SB'ing map cleaner. It is also a poor top deck late in the game, though Dack wasn't included in NBA's list so it might be worth revisiting as Dack mitigates that downside.
Dire Fleet Daredevil - On a fundamental level, I really dislike cards that are dependent on my opponent's deck. On a practical level, Dack makes Snapcaster better and the absence of Thoughtseize makes Dire Fleet worse. Snapcaster is better late game when your opponent is setting up a Paradoxical or Delve spell, in that it lets you flashback Pyroblast to stop that. I also felt that being able to Snapcaster back removal like Nature's Claim or Bolt was more relevant than the first strike against Shops. This is the reasoning that led me to cut the Daredevil for a second Snapcaster. I am curious if anyone has had much success with Daredevil. I have not encountered it in the leagues.
Against the Field
I am hesitant to give a formula for sideboarding with decks. In many cases, you should be willing to adapt a list to a specific metagame, which means tweaking the maindeck and the sideboard as appropriate. Also, cards in the SB are contextual and if the opponent does something unexpected, it's necessary to be flexible with certain plans. To that end, I think it is most valuable to give written explanations of the sideboard exchanges I would make and why I would make them.
Shops - The format's current boogeyman. Game 1 is winnable but rough. Cards like Null Rod, Ancient Grudge, and Dack Fayden are powerful and can steal a couple of games. On the other hand, you have 4 Mental Missteps and 2 Pyroblast. Sideboarding consists of sideboarding those cards out. Obvious inclusions include 2 Nature's Claims, 2 Ancient Grudges, 1 Null Rod, and 1 Manglehorn. Less obvious considerations include boarding out the 3rd Leovold (he's by far the weakest 3 drop in the deck and redundant copies are not useful) and either the Sylvan Library or Treasure Cruise. The nature of the matchup changes on the play vs on the draw. On the play, you have a turn 1 window to land a Sylvan Library or a Dark Confidant to start to pull ahead on cards. On the draw, you lack that opportunity and the opponent gets an additional attack step to lower your life total (making both Bob and Sylvan worse). The strategy on the draw is more reactive as you typically have to cast removal immediately. Bob still has use as a blocker, but Sylvan Library is much worse. At the same time, Treasure Cruise allows you to reload in the mid-game after you've bought yourself a bit of a reprieve. I know you can still flip Cruise to Bob and be quite sad, but since you have to trade off Bob pretty frequently in combat, it doesn't end up occurring very often. Nonobvious cards to board in include Izzet Staticaster and Mindbreak Trap; I discussed the role of these cards above. Note: Izzet Staticaster is not very good against Car Shops, which runs Chief of the Foundry over Steel Overseer and typically omits Hangarback Walker. You can board in Yixlid Jailer in those cases.
Dredge - I haven't played against Dredge yet with the deck. The game plan is simple: lose game 1 while trying to figure out which version of Dredge the opponent is playing. You might be able to steal a game with Green source into DRS + Waste/Strip on Bazaar. If they only have one dredger, you can then remove it, but this is clearly not a high percentage line. Games 2 and 3, you board out 2 Pyroblasts, Null Rod, Green Sun's Zenith (Cage shuts it off), 1 Snapcaster Mage, Abrupt Decay, 1 Ancient Grudge for the 7 obvious pieces of Dredge hate. The plan is to shut off the opponent's Dredge line with Cage, Vamp for Jailer, or Demonic for Tormod's Crypt, depending on circumstance. Additionally, DRS and Wasteland can interfere with the opponent's line. If the opponent is playing Hollow One, you have Dack Fayden and K Command as removal. Against Gurmag Angler, you are typically on the block and Bolt plan. That's the theory, but I haven't gotten to put it into practice.
Oath - The plan against Oath is to remove Oath with Abrupt Decay, or ultimate a Dack Fayden and Pyroblast whatever creature they dump into play. Neither of these is a particularly high percentage line but postboard things get better. Three Grafdigger's Cages, 2 Nature's Claims, and 1 Mindbreak Trap come in for 1 Snapcaster Mage, 1 Dark Confidant, 1 Green Sun's Zenith (again, we have Cage), 1 Null Rod, 1 Ancient Grudge, and 1 Underground Sea. Why Mindbreak Trap? Occasionally, your opponent will try to hardcast something like an Inferno Titan off a Cavern of Souls, or a Carnage Tyrant. It feels really, really good to Trap such a creature. Otherwise, beating a Titan is really difficult. If my opponent is on Inferno Titan Oath, I try to attack their mana with Dack and quickly build a board presence. By the time the opponent has played Titan, I hope I am able to burn them out with Bolt, Deathrite Shaman, and Time Walk. If you are really afraid of this happening, Maelstrom Pulse is a flexible removal spell that can be played over Kolaghan's Command or Izzet Staticaster.
Paradoxical - Paradoxical has been a rough matchup. Landing a Null Rod or Leovold is key, as is blowing up artifacts with Grudge, Dack, K Command, etc, or winning the counter battle over Pyroblast. This is by far the matchup I most miss Force in. SB'ing consists of boarding out Ramunap, 2 Lightning Bolts, Abrupt Decay for Null Rod, Mindbreak Trap, and 2 Ancient Grudges. Nature's Claims typically stay in the board as I find too much artifact hate leads to clunky draws that don't do anything. I would like to change the SB around to better address this matchup and will talk about that later.
Xerox and Blue Control - Cutting Forces has had very substantial payoffs in this matchup. You typically jam cards into their plays, counting on your 2-for-1's to eventually overwhelm their answers. Leovold is great in the matchup, as is Dark Confidant if you can keep Young Pyromancer under control. You are pretty much preboarded for this matchup (like the rest of the field). SB'ing includes cutting Ancient Grudge and Null Rod against Mentor as these cards are obviously quite poor. You want to board in Mindbreak Trap and Izzet Staticaster to deal with tokens. Trap actually gets hardcast a lot in the late game, but you can also orchestrate some blowouts with removal, Ancestral, and Snapcaster Mage. Against Landstill, Grudge stays in to deal with Factories. I board out the Vampiric Tutor instead. In comes Trap and Yixlid Jailer. No, Jailer doesn't shut off Crucible, Delve, or Snapcaster Mage. It is simply another threat for them to deal with.
Combo and Other decks - Against Combo, the sideboard plan is pretty much common sense. Against DPS, you board in Traps, Rod, and Manglehorn. Izzet Staticaster can deal with Goblin tokens if they have Empty the Warrens. Dacks deal with the Tinker plan well enough in my opinion. Against 2 Card Monte, Rods, Grudges, and Claims are great. Board out cards that aren't good in the main, like creature removal such as Grudge and Lightning Bolt. Pyroblasts are also pretty weak depending on the build of DPS. If they have Preordains, I like aggressively Pyroblasting those. The key is to pay close attention to what the opponent shows you game one and board accordingly. The "other" decks I played against were Humans (felt very favorable as your creatures were comparable to theirs), White Eldrazi (felt miserable as your creatures are worse than theirs), and Saturn's Aperture Science Workshops combo deck (If you die before you Rod...). The only deck that seems devoid of bad matchups statistically is Shops, so with a deck like this, you have to recognize you are going to lose to some things. Eldrazi is one of those things, so if it is a large part of your metagame, I recommend retooling the SB with Dismembers and more Wastelands (the deck is very mana hungry so hitting a Tomb or Temple can be crippling).
I played 5 matches with the original list, but since then I have started keeping track of my results, which are below. Would this article really be complete if I didn't have a screenshot of a Google Sheet?
Small sample size so I caution against drawing conclusions from this. A 60% MWR suggests that the deck can be competitive in the field, even without Force of Wills. I would encourage others to give their own experiences with similar lists so we can create a more complete picture.
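To put a rough number on "small sample size", here is a Python sketch of a 95% Wilson score interval for a match win rate. The 18-12 record in the example is a placeholder that happens to equal 60%; it is not my actual record, so plug in the real wins and matches from the sheet.

```python
from math import sqrt


def wilson_interval(wins: int, matches: int, z: float = 1.96):
    """Wilson score interval for a win rate (z = 1.96 is roughly 95% confidence)."""
    if matches <= 0:
        raise ValueError("need at least one match")
    p = wins / matches
    denom = 1 + z * z / matches
    center = (p + z * z / (2 * matches)) / denom
    half_width = z * sqrt(p * (1 - p) / matches
                          + z * z / (4 * matches * matches)) / denom
    return center - half_width, center + half_width


# Hypothetical 18-12 record (60% MWR), for illustration only.
low, high = wilson_interval(18, 30)
print(f"60% over 30 matches: plausible range {low:.1%} to {high:.1%}")
```

With a sample in this range the interval is wide enough to include a 50% win rate, which is why I would treat the 60% as encouraging rather than conclusive.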
Changes to the list
I wouldn't change the main deck, but if you wanted to make a change, Kolaghan's Command was probably the weakest card in the deck. I mentioned Maelstrom Pulse as a possible substitution for an Inferno Oath heavy field, but I'm sure everyone has their own pet cards that they would like to play instead. I would change the SB to be better against Paradoxical. Unfortunately, this means being weaker against Dredge. C'est la vie.
That's...not very many cards for Dredge. I will point out that tutors, DRS, and Waste effects do contribute to the Dredge matchup. That said, if I haven't played against a deck in 33 matches, it's hard to dedicate the full 7 cards to the matchup. Your mileage and metagame might vary though, but this is the list I plan on running through a Vintage league later in the week (to stream and to record).
I hope you enjoyed this approach. Thanks again to @NBA84 for his initial work with the deck. I think there is a lot of value in getting different viewpoints on the same decklist. At the very least, it gives things to talk about that aren't directly related to the B&R list. If anyone else has an idea for a list and would like to collaborate in a similar fashion, please let me know. As always, I welcome comments, questions, criticisms, etc...
Edit: I fixed several typos. Thank you @diophan.
Apparently this card didn't get a spoiler post (unless I missed it). Time to rectify that.
Everyone has their own process for evaluating cards. Some people look to draw comparisons to existing cards. Personally, I hate that approach. Very few cards are functionally equivalent and therefore justify comparisons. What do I mean? Disenchant and Fragmentize have the same function (destroying artifacts and enchantments). Disenchant was a playable card in Vintage and because Fragmentize did the same thing, it makes sense to compare these cards to establish whether or not they are playable. In the end, it was clear that saving a mana was worth the timing and targeting restrictions. Jace, Vryn's Prodigy is not functionally equivalent to Snapcaster Mage or Merfolk Looter, and so evaluation by comparison missed the mark on the card. It makes much more sense to evaluate Jace in the context of how it functions first before any sort of comparisons are made.
Why bring this up? Well, it's very tempting to view The Antiquities War as a bad Tezzeret baby. After all, the card is a combination of Tezzeret's abilities in Enchantment form. You do that, and you end up quickly dismissing the card. Tezz the Seeker mostly sees play because it searches up Time Vault and wins the game that way. Tezz Agent of Bolas doesn't really see play at all. Tezz, Antiquities War doesn't serve as an obvious upgrade of either, so many players dismiss it and you end up without a Spoiler thread on TMD.
Let's step away from comparisons and focus on what the card does: In an artifact heavy deck, it impulses for 2 turns then wins the game. It's immune to Revoker effects, can't be attacked by creatures, and doesn't require that much set up (it digs up 10 power of attackers). It does get hit by Pyroblast but then most of the cards you are running in Blue suffer from that (well, not Karn...more on him later). That's...not bad, right? Suspend 2: win the game for 4 mana with a couple of artifact impulses? I thought not and so I built a deck around this and streamed it. If you missed that (I apologize...the videos get mangled due to Twitch's copyright filter), to summarize: the card was a very solid win condition in the Thoughtcast, Mox Opal, Seat shell. It's immune to Rod, you can drain into it pretty easily, and it fuels itself. It's less powerful than Paradoxical Outcome but the cards are complementary - if you lack the artifacts for PO (which happens a lot), TAW will find some for your next turn. After you've PO'd, dumped a ton of artifacts into play, then passed back with counterspells to stop your opponent for one more turn, TAW wins the game. Synergy. I frequently had issues with Tendrils being a dead card when not comboing out. Or drawing a Blightsteel with my Tinker and then hating life. Or getting my Time Vault Dacked. TAW doesn't have those issues and therefore provides several advantages to the Thoughtcast PO deck.
My next stop with the card will be that list above (I was trying to make Damping Sphere work too, but I feel those go in different decks). I hope to stream it on Friday, but until then I hope this prompts some discussion on a card that has until now flown under the radar.
Edit: here is the rough draft of the first list
Yay, more data! Thanks again to @diophan and twitch user k0dydraven for their considerable help in compiling these.
Top 32 lists are available on WotC's page: https://magic.wizards.com/en/content/deck-lists-magic-online-products-game-info
Gentle reminder though that B&R needs to go to the right thread. (Believe it or not, it's possible to speculate on the impact of tech like Shattering Spree, Damping Sphere (Brian played it as a one-of in his top 4 list in the last challenge), and the Misstep-less PO list from VSL competitors Ecobaronen and Lampalot without bringing up possible bans or restrictions.)
@seksaybish Your main point is of questionable significance as Top 8s are essentially single elimination 8-mans in which pairings and luck play a disproportionate role.
I would also challenge you to see past the single decks to the metagame as a whole. Going through the tournaments you've cited, the events not dominated by Gush tend to be dominated by Anti-Gush Thorn decks.
EW - 5/8 Gush or Thorn decks
EE6 - 8/8
JanP9 - 4/8
FebP9 - 5/8
Mar - 7/8
Total - 29/40 = 72.5%
Seeing as Gush and Thorn decks are roughly 50-65% of a given metagame, this indicates an overperformance of these archetypes. Why is that? I would hypothesize that because these decks cannot be attacked on the same axis, it creates a polarized two-deck format. Gush requires a slim manabase, efficient though narrow counters, and is relatively immune to spot removal. Eldrazi and Shops require a robust manabase, a different set of answers, and copious spot removal. Decks built to attack either Gush or Thorns must dodge the other, which becomes increasingly difficult to do over the course of larger events. This is an environment that does not reward innovation and is frankly boring to those that play it frequently. Success is largely matchup dependent once you reach a level of competency with your deck, which regrettably most Gush players have not achieved, leading to there being a large contingent of poor Gush pilots dragging down the match win % to "acceptable" levels.
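For anyone who wants to check the arithmetic, here is a minimal Python sketch of the tally above. The event labels and counts come straight from the list; the function names are my own, and the binomial tail treats every top 8 slot as an independent draw at the field share. That is an assumption real tournaments do not satisfy (pairings, pilot skill, and the tiny size of a top 8 all break it), so treat the tail probabilities as a rough gauge rather than a proper significance test.

```python
from math import comb

# Top 8 slots taken by Gush or Thorn decks, copied from the list above.
top8_counts = {"EW": 5, "EE6": 8, "JanP9": 4, "FebP9": 5, "Mar": 7}
SLOTS_PER_EVENT = 8


def top8_share(counts, slots=SLOTS_PER_EVENT):
    """Return (hits, total slots, share) across all listed events."""
    hits = sum(counts.values())
    total = slots * len(counts)
    return hits, total, hits / total


def binomial_tail(hits, total, p):
    """P(X >= hits) for X ~ Binomial(total, p).

    Assumes independent slots, which is a simplification.
    """
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(hits, total + 1)
    )


hits, total, share = top8_share(top8_counts)
print(f"{hits}/{total} = {share:.1%}")

# Compare against both ends of the rough 50-65% field share quoted above.
for field_share in (0.50, 0.65):
    tail = binomial_tail(hits, total, field_share)
    print(f"field share {field_share:.0%}: P(at least {hits} of {total}) = {tail:.4f}")
```

The lower the tail probability at a given field share, the harder it is to explain 29 of 40 slots by metagame share alone.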
This past Saturday, 115 Vintage players from the Northeast US gathered for one of the marquee events of the year: the TMD Open 18 aka Waterbury. Ryan and I were unfortunately unable to make the trip, but Ray graciously scanned the decklists and sent them to us so we could give them the usual treatment. By all accounts, @iamfishman did it again, throwing an excellent tournament that exemplified what Vintage is in the Northeast. There was trivia, there were giveaways, there was bingo, and there was beer. After all was said and done, Jarad Demick on Ravager Shops took home first place (the top 4 split the money and played for the trophy, as I understand it).
On Deck Classification
We continued to use the Archetype/Subarchetype scheme along with the breakdown by Tags. Given the popularity of Paradoxical Outcome, we created an archetype which we broke down by win conditions and structure. PO Mentor describes builds that run multiple Mentors as the primary win condition, similar to Kevin Cron's and Stephen Menendian's lists from Champs (there were none present at the event). PO Storm describes the broken lists running Draw 7's, Chrome Mox, and LEDs. These decks often used a mixture of win conditions along with either Tendrils or Brain Freeze. PO Tezz describes the classic Vault/Key and Tinker builds - less all in than Storm and with value creatures like Trinket Mage and Snapcaster Mage. Other PO was generally a combination of other archetypes. This is a work in progress...
There were several hybrids. Brian Kelly's Emrakul/PO/Gush deck was classified under Paradoxical. PO Oath was classified under Oath, as was Salvagers Gush Oath. The Oathstill decks were also included under Oath. If this seems arbitrary, it is. We are open to suggestions on this front, but that's the nature of classifications. It's why we have the tag system, so that we can attach multiple descriptors. The lists we included in "Other" can be viewed here along with all our raw data and calculations.
Top 8 Decklists
- Jarad Demick - Ravager Shops
- Jonathan Geras - Ravager Shops
- Travis Compton - Unmask Dredge
- Craig Dupre - PO Storm
- Raf Forino - Blitzkrieg Shops
- Akash Naidu - Powered Colorless Eldrazi
- Andy Probasco - Jeskai Mentor
- Andrew Farias - Jeskai Mentor
Congrats to all members of the top 8 and thanks to Ray for running an excellent event and providing us with the raw data. As always, I am indebted to Ryan Eberhart for his considerable help with our analysis. Questions? Comments? Please don't hesitate.
Expect the commentary for these events to become briefer due to their regularity. More in-depth analysis is now possible as we are able to aggregate the results from multiple events, so we'll probably do a monthly "State of MTGO Vintage" with those results. Both Ryan and I were able to play in this past event, Ryan coming in second with Jeskai Mentor, and me finishing in 9th after an unfortunate misclick on stream was followed up by mana screw. Despite that, I walked away with 50 extra play points and 10 treasure chests (which would sell for ~24 tix). This is a considerable improvement over the old Power 9s, in which I would have gotten my entry fee back but no more. The EV for these events is excellent and if anyone has online Power, I highly encourage you to participate. Full details are here.
Link to the top 32 decklists: http://magic.wizards.com/en/articles/archive/mtgo-standings/vintage-challenge-2017-05-28
- Pedroj - Foundry Shops (Tangleless)
- Diophan - Jeskai Mentor
- Maegwiny - Foundry Shops
- Anssi A - Jeskai Mentor
- Isomorphic - Academy Combo
- Mlovbo - Ravager Shops
- Mr. Random - Foundry Shops (More Vehicles - No Ravagers)
- Hermoine_Granger - Jeskai Delver
I wanted to discuss two trends that I've been noticing in the metagame. No, not whether the DCI was correct or incorrect - in my opinion, it's far too early to tell. No, not whether this event shows that Shops is the best deck - a 49 player event is too small a sample to draw such conclusions. These trends have to do with personal observations of the metagame.
Several Shops players have started to cut Tangle Wire from their builds. This started with Jazza, who top 8'd the past 2 events (winning one outright), while finishing in the top 16 in this event. Pedroj adopted this strategy and won the tournament. I'm not saying that this is the correct direction for Vintage Shops players to take, but it certainly has merit and should be noticed by the Paper community.
Similarly, I've noticed that I have been less than happy with Jace, the Mind Sculptor in Mentor. I've found him to be poor against Delver, Shops, and Eldrazi where he is both difficult to cast and easily pressured. I've found him to be mediocre against Paradoxical Outcome (on both sides of the matchup). He is a sorcery speed 4 drop that only Brainstorms the turn he is cast, and savvy opponents will generally let him resolve then aim to end the game on the next turn while their opponent is tapped out. I haven't been alone in this - if you look at Ryan's deck, you'll notice that he does not run Jace, running an extra Mentor and Snapcaster Mage instead. My approach was different, running Gifts for value where it enabled some pretty decent lines with JVP and Snapcaster Mage. It also allowed me to sit back on Mana Drain and operate more at instant speed, which improved my matchup against the two Outcomes opponents I played. Whose approach is right? Again, I don't know. This is meant as food for thought.
Major thanks to Twitch user ValanLuca who helped put in round data while I was streaming. This allowed me to suffer through Choice Chamber in between rounds, hopefully providing more entertainment for my viewers than entering numbers into a spreadsheet. Thanks and congratulations to @diophan. And lastly, a very happy birthday to Dragonlord @brianpk80. Questions? Comments? Have at it.
I apologize for the personal attacks - you are right that that rant has been building for some time. I am tired of the focus of these reports remaining on evaluating the banned and restricted changes or remarking on whether or not this event shows an unhealthy metagame. Did anyone congratulate Jazza or any of the other contestants outside of the initial post? @Serracollector, @desolutionist, and @MSolymossy looked like they were at the beginnings of a productive discussion on PO's poor performance and comparison to FoF and Gifts in similar shells, but that obviously never materialized.
Yes, the format's health is an important issue and I believe everyone should be entitled to their opinion. However, a single weekly event should do little to affect one's view of the metagame. I even tried to start those monthly reports to provide a medium for a more holistic metagame review - that's part of the frustration. We could go through this every week. "Shops won again; clearly is overpowered." "Mana Drain and PO won, the format is fine and everything is glorious". "Oh, Shops won again, time to restrict Shops." Magic is a game of variance and there will be swings...there will be anomalies. That's the nature of our hobby.
So, let me phrase my objection in what I hope is a non-confrontational form. The previous iteration of The Mana Drain had a policy that limited banned and restricted discussions to a specific forum. I imagine they ran into issues similar to those we've encountered here. Even if @Brass-Man does not want to take a similar step (and I'm not saying he should), I think that we should try to limit such discussions to threads specifically concerning them. While we call these "metagame reports", they are really snapshots of a specific metagame and not necessarily indicative of the overall trend.