@Smmenen said:

Do you think Wizards has lied about the daily results?

The data is the daily reported MTGO decklists and the MTGO P9 decklists. That is, every day after a daily fires, Wizards of the Coast posts the decklists on their website here. Kevin and I went through every single daily in Q1 2016 up to our recording date, and then compiled them. So did the author here.

Quick question, then... The data you present represents **EVERY** decklist or every decklist which went 3-1 and 4-0 in the dailies? I am under the impression that your data was ONLY 3-1 and 4-0 decks from the dailies - the data shared with us on mtgo.com and mtggoldfish.com.

last edited by Fred_Bear

@Fred_Bear said:

@Smmenen said:

Do you think Wizards has lied about the daily results?

The data is the daily reported MTGO decklists and the MTGO P9 decklists. That is, every day after a daily fires, Wizards of the Coast posts the decklists on their website here. Kevin and I went through every single daily in Q1 2016 up to our recording date, and then compiled them. So did the author here.

Quick question, then... The data you present represents **EVERY** decklist or every decklist which went 3-1 and 4-0 in the dailies? I am under the impression that your data was ONLY 3-1 and 4-0 decks from the dailies - the data shared with us on mtgo.com and mtggoldfish.com.

Your question doesn't make sense. What's the difference between "every decklist that went 3-1 or better" and decklists that "only went 3-1 or better"? That's the same thing.

The words "only" and "every" perform the same work in each part of your question by excluding decks that performed worse than 3-1.

In any case, if you clicked the links I provided for you, you would have seen the answer. It is the complete population of decklists that performed at 3-1 or better. It's not a sample of 3-1 or better decks.

Again:

The raw data is here.

The cleaned data is here.

And the aggregate data is here.

Wizards of the Coast asked MTGGoldfish to cease and desist collecting data, so our data was taken directly from the Wizards website. Had you actually looked at the tabs or read my previous post more carefully, I think that would have been clear.

last edited by Smmenen

I think maybe he's talking about the fact that we don't have the metagame breakdowns (which isn't your fault; we just don't have that data).

If 75% of people are playing Shops decks and making up only 30% of wins, that tells a different story than if 1% are playing Shops decks and make up 30% of the wins. Of course, both of those situations would be a problem.

I'd love to see if/how the data on Shops trended down over time, as I suspect a lot of early Shops wins came from players assuming the deck was dead and seriously underpreparing for it. Many people were super excited to play Storm Combo or Doomsday after the Chalice restriction, and most of those people lost. Of course, WotC made the decision before they could have identified any trend, which is a different problem, but one that every format has to deal with.

I don't think we have consensus as a format on what a problem metagame even looks like. Personally I would love the top deck to be around 25-30% (even if, in this case, I don't really enjoy playing the top deck), so I looked at the same numbers and said "obviously not a problem!" If you think an optimal metagame has a top deck at 15-20% of wins, those very same numbers say "obviously a problem!" Without any pre-discussed target for what a healthy metagame looks like, it's too easy to post-rationalize what the data means, even if the data is completely accurate, which in this case I have to believe. (Note that I'm not saying anyone in this thread is doing that; it's just a peril of the sort of discussion we've been having.)

@Brass-Man said:

I think maybe he's talking about the fact that we don't have the metagame breakdowns (which isn't your fault, we just don't have that data.)

Ah.

If that's the point he's making, then that would render Danny's statement at issue fundamentally unsupportable - since there is no way that anyone could know that "Mentor decks are basically the same percentage of the metagame as all Mishra's Workshop decks combined."

The assumption in this discussion is that by "metagame," we are referring to Top X metagame (either 3-1/4-0 decks or Top 16/8).

In fact, if you go back and look at every single metagame report ever, that's what we are talking about - the Top performing decks.

That said, although we don't have the entire metagame breakdown for most events, there are many for which we do. For example, the NYSEs, Waterburys, many of the Vintage Championships, and at least three of the MTGO P9 Challenge events are data points for which someone (in some cases me, Jaco, or Matt & Ryan) has gone in and counted every single deck in the metagame.

For example: http://themanadrain.com/topic/146/january-and-february-mtgo-p9-challenge-data

And:

http://www.eternalcentral.com/so-many-insane-plays-magic-online-p9-challenge-metagame-analysis/

From those data points, we've been able to see what % of the metagame these decks tend to be. In my experience from having closely observed this data over time, Workshops are often around 20-25% of the metagame. I think it was about 22.5% of NYSE 3 last year. It was 22% of the Feb MTGO P9 event but about 20% of the January MTGO P9 event.

last edited by Smmenen

@Smmenen said:

If you clicked the links I provided for you, you would have seen the answer. It is the complete population of decklists that performed at 3-1 or better. It's not a sample of 3-1 or better decks.

Steven, I appreciate the condescension, but, by definition, your data is a sample of the full population. More decks than what went 3-1 or 4-0 were played at each event. I didn't misunderstand anything and I do believe it is disingenuous to present only those decks as the full population. A daily requires a minimum of 12 participants - 48 events (in your sheets - I did look) x 12 decks = 576+ decks in the full population. Your data includes 241 "reported decks". The analysis done by @diophan and @ChubbyRain was a full population.

My issue with the data still exists.
#1 - The data does not represent a random sample and results in data with huge variation. What I mean by this is that you are looking at many snapshots rather than a continuous stream of data. This causes high variance on its own. Add to that the high variance (which you acknowledge) in a deck's play month-to-month, and add to that the high variance of a small data set, and you have data that is probably +/- 1 deck (on the conservative side) in every event. What does that mean? You show 72 Shops decks over 48 events. The high variance means that over the next 48 events, we should see 72 +/- 48 decks in your data - just based on the variance in the data. That seems about right, too. The data was weighted higher in Jan/Feb and a drop-off was seen in the March data (and at the March P9): 57 decks in 35 Jan/Feb events and 15 decks in 13 March events. High variance, but within expectations. [Note, this is why I have no problem with the author saying 16%=22% in his article. Ultimately, 24=120 in terms of Shops' level of play over 48 daily events. That's what the 'variance' means in real numbers.]
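To put real numbers on that swing, here is a minimal sketch of the arithmetic (the totals come from the Q1 spreadsheet; treating the swing as +/- 1 deck per event is my conservative assumption, not a derived figure):

```python
# Sketch of the "+/- 1 deck per event" swing described above.
# 72 Shops decks across 48 dailies comes from the Q1 data set;
# the one-deck-per-event swing is an assumed, conservative variance.
events = 48
observed_shops = 72
swing_per_event = 1  # assumed: each daily could plausibly report one more or one fewer Shops deck

low = observed_shops - swing_per_event * events    # 72 - 48 = 24
high = observed_shops + swing_per_event * events   # 72 + 48 = 120
print(f"Plausible range over the next {events} dailies: {low} to {high} Shops decks")

# Monthly split quoted above: 57 decks in 35 Jan/Feb events vs. 15 in 13 March events
print(f"Jan/Feb rate: {57/35:.2f} per event; March rate: {15/13:.2f} per event")
```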

#2 - Deck and Win % are not the only correlated variables. 'Huh?' Pilots matter over this data set. The premise of your data is that the deck is the independent variable that drives win percentage. It would not hold up to a stronger analysis of correlation. From your data set, we can point to 3 Shops pilots, Montolio, The Atog Lord, & BlackLotusT1, who account for nearly 40% of the decks in the Shops population. If 3 players can distort the data to the point of getting a card restricted, we should be able to agree that the data set is too small to base decisions on.

@Brass-Man is right. The data you put together is missing a key component and maybe someone at Wizards or the DCI has that information and has done a more in-depth analysis. Based on their explanation that came with the restriction, I'm doubtful.

I also agree that a top deck being in the range of ~30% should be fine and with the small samples that we are subject to, 30% probably means 20-40%. If it starts to creep from there, we have issues.

Looking at your most recent post, if Shops is historically 20-25% of large events, to use your terms, the DCI's explanation that Lodestone Golem was 'over-represented' is "fundamentally unsupportable" unless they are willing to outline what the ideal metagame looks like...

@Fred_Bear said:

@Smmenen said:

If you clicked the links I provided for you, you would have seen the answer. It is the complete population of decklists that performed at 3-1 or better. It's not a sample of 3-1 or better decks.

Steven, I appreciate the condescension, but, by definition, your data is a sample of the full population. More decks than what went 3-1 or 4-0 were played at each event. I didn't misunderstand anything and I do believe it is disingenuous to present only those decks as the full population.

First and foremost, if you are defining the full metagame as every deck played, then that renders Danny's claim not only unsubstantiated and unsupported, but fundamentally unsupportable as unknowable.

The issue being debated is my disputing the claim that "mentor decks are basically" the same portion of the metagame as Shops. I find that provably false, as an empirical matter.

So, if you wish to redefine the "metagame" as every deck played, then that only strengthens my critique.

But, based upon Danny's data set and my response, neither Danny nor I was defining "the metagame" as every deck played. Rather, the "population" was the top performing decks. In MTGO dailies, this was defined as 3-1 or 4-0 decks. In my larger data set, this included Top 8 paper tournament results and Top 16 MTGO premier event results.

But, to reiterate, the "population" in our data was not every deck, but only the top performing decks. That's how we were both defining the population - as the top performing decks.

This is not a novel concept.

Take a look at the metagame report archive: http://themanadrain.com/topic/138/vintage-metagame-data-archive

When, in 2004, Phil Stanton posted an article titled the "April 2004 Type One Metagame Breakdown," he looked only at Top 8 data.

Or, when in 2011, Matt Elias posted an article titled "The Q1 Vintage Metagame Report," he looked only at Top 8 data.

In both cases, the titles of the articles and the discussions used the term "Vintage metagame." Not "Vintage Top 8 Metagame."

In the context of these discussions, it's well understood that we are discussing top performing decks, not the entire set of decks played. Danny's own data set makes that clear.

Now, if I had advanced a claim that you were now disputing using that logic, then maybe you would have a leg to stand on. But my only reason for participating in this thread is to dispute a very specific claim presented in the article that this thread is about.

#1 - The data does not represent a random sample and results in data with huge variation.

That's not what random sampling is. Random sampling is a statistical method used to try to understand a population that is too large to count feasibly. So, instead of polling every possible voter, campaign pollsters use samples.

Not only is it not a sample, there is nothing "random" about this data. It's a complete population of top performing decks.

What I mean by this is that you are looking at many snapshots rather than a continuous stream of data.

Uh. No, I'm limiting most of my data to Q1, but within that set, it's a fairly continuous stream of data. Yes, I imposed some parameters on it (the Q1 of 2016), but you have to do that to any data. The notion that it's "merely a snapshot" suggests that it's some sort of inherently biased sample, when it's the exact same methodology that Vintage metagame reporters have used since 2003.

#2 - Deck and Win % are not the only correlated variables. 'Huh?' Pilots matter over this data set. The premise of your data is that the deck is the independent variable that drives win percentage. It would not hold up to a stronger analysis of correlation. From your data set, we can point to 3 Shops pilots, Montolio, The Atog Lord, & BlackLotusT1, who account for nearly 40% of the decks in the Shops population. If 3 players can distort the data to the point of getting a card restricted, we should be able to agree that the data set is too small to base decisions on.

This is off-topic to the issue I was debating here, but you are calling for a standard for restriction that Wizards is in no way obligated to follow.

By saying that "a data set is too small to base a decision" you are explicitly saying that Wizards either should not restrict or is unjustified in restricting unless they have a certain quality of data.

That's just false. That's not to say that I don't think that Wizards should use data in making decisions. I've been arguing that for years. In fact, I argued that in 2003 in one of my earliest SCG articles.

But Wizards, like many real world policy makers, have imperfect data sets when making policy decisions.

Do you think that the Federal Reserve has every data set it would like in setting the federal funds rate?
Do you think that the President has every data set he wants in making military policy? (See this month's issue of The Atlantic on the tremendous uncertainties in his Syria policy.)

As I already pointed out, the problem of "individuals" skewing data sets already exists in paper magic, and it's just as true on MTGO. But that doesn't mean that the data can't be used to make banned and restricted list decisions or that doing so is somehow less valid. Wizards is perfectly justified in using imperfect data to make policy decisions, just as much as any other real world policymaker.

I also agree that a top deck being in the range of ~30% should be fine and with the small samples that we are subject to, 30% probably means 20-40%. If it starts to creep from there, we have issues.

Looking at your most recent post, if Shops is historically 20-25% of large events, to use your terms, the DCI's explanation that Lodestone Golem was 'over-represented' is "fundamentally unsupportable" unless they are willing to outline what the ideal metagame looks like...

Complete nonsense. Wizards has no duty to outline what an "ideal" metagame looks like. Moreover, Wizards has access to all of the MTGO data. It certainly is the case that they can look at the overall composition of Workshops in these metagames, and then see how they are performing relative to their metagame presence. There is not a shred of tangible evidence to doubt that's exactly what they did here.

In any case, that part of the discussion is a non-sequitur. I'm not debating the validity of Wizards' decisions. I'm debating the validity of Danny's claim regarding Mentor and Shops.

last edited by Smmenen

On a different but related topic: I am curious, Danny, about your different suggested sideboard choices between the Storm and Doomsday lists, and especially the different anti-Dredge, anti-Workshop, and "insulation" packages (Defense Grid / City of Solitude / Xantid Swarm).

It isn't obvious to me why you wouldn't play more similar sideboards, particularly when the effects are similar. What were your thoughts?

P.S. I'm annoyed at you for mentioning City of Solitude. I've been thinking about that technology for a while, and was looking forward to catching folks off guard.

Great datasets and breakdowns. I'm sure we all appreciate the efforts people put into it.

I do wonder about the value of tracking "Top 4" and "Top 2." In such small datasets, is this really relevant? I think it leads to a warped perception. Top 16 or Top 8, yes. But Top 2/4 seems much less relevant and gives the data an outsized importance.

I also question the dismissal of @Fred_Bear's point about how three players made up 40% of the success of one deck. You can't say that 6% is incredibly relevant in one area, and then dismiss the enormous impact that these three players have had on the MTGO 3-1/4-0 population numbers. I think this level of repeat/consistent success is a rarer phenomenon in paper.

I just don't see Paper results and MTGO results being directly comparable (drops, lack of proxies on MTGO, tournament times, tournament prizes, etc). That said, with what is available, I think everyone is doing admirable work.

last edited by joshuabrooks

Look, you're obviously a smart guy and I'm not trying to dispute that, but just as you accuse others of hyperbole, you seem unwilling to relent on your own use of it...

@Smmenen said:

First and foremost, if you are defining the full metagame as every deck played, then that renders Danny's claim not only unsubstantiated and unsupported, but fundamentally unsupportable as unknowable.

No it doesn't. Statistics are used to draw comparisons and conclusions. Danny is looking at the available data and drawing a conclusion based on the expected variance in the data. Can it be known 100%? No. Can it be known for the sake of an editorial? Absolutely.

By your argument, the statistics are definitions, and that's unreasonable in terms of the original article's intent (at least by my understanding). It should be obvious to a reader that 16 does not equal 22, but the difference between 16 and 22 is not quite as vast as you want us to believe. [32 decks over 48 events is well within the variance between Mentor and Shops]

The issue being debated is my disputing the claim that "mentor decks are basically" the same portion of the metagame as Shops. I find that provably false, as an empirical matter.

By this argument, I could look at the data for 3/13 and claim Shops is 0% of the meta as an empirical fact. It's a crap argument. Statistics work when combined and interpreted with some common sense, and you know that. What happened on 3/13 must be viewed within a larger discussion, just as March or February or January or Q1 must.

There is nothing "random" about this data. It's a complete population of top performing decks.

For a given time, you mean. This is a problem with statistics: Population and Sample. You've said before that it's the complete population, except when 2 dailies fire, so it's really a pretty comprehensive Sample.

What I mean by this is that you are looking at many snapshots rather than a continuous stream of data.

Uh. No, I'm limiting most of my data to Q1, but within that set, it's a fairly continuous stream of data. Yes, I imposed some parameters on it (the Q1 of 2016), but you have to do that to any data. The notion that it's "merely a snapshot" suggests that it's some sort of inherently biased sample, when it's the exact same methodology that Vintage metagame reporters have used since 2003.

It may be the same methodology, but is that warranted? Look at February 2016 in your spreadsheet as an example. You have paper data - 8 events ranging from 8 to 43 participants. The online data is 15 events with a minimum of 12 participants, all less than 32 (which would result in 2 4-0 decks). The 8 paper events have maybe 2-3 of the same players appearing in the top decks (60), while the 15 online events have many of the same players appearing multiple times; for example, BlackLotusT1 had 5 top finishes out of 72 total decks. The data is fundamentally different and those differences should be accommodated. To simply view it through the same lens as paper Magic has been viewed through for the last decade doesn't seem right.

I mean, you can certainly do it, but what are the implications?

By saying that "a data set is too small to base a decision" you are explicitly saying that Wizards either should not restrict or is unjustified in restricting unless they have a certain quality of data.

That's just false. That's not to say that I don't think that Wizards should use data in making decisions, but Wizards, like many real world policy makers, have imperfect data sets.

Seriously? Again, I think you know what I mean. The MTGO data can be manipulated to present any number of "empirically correct" arguments (e.g. anything involving FoW, Mental Misstep, and Ingot Chewer). That doesn't make for useful decision making. The Federal Reserve may not have every data set it would like in setting rates, but they don't look at hockey shooting percentages or baseball batting averages. They try to make sense of the data set that they have and work to make it as strong as they can. If a data point doesn't fit, I guarantee that the Federal Reserve doesn't key on that data point to drive decisions.

This is the issue that many people seem to have. 12/60 (20%) Shops decks in paper (February) compared to 27/72 (37.5%) decks in MTGO (February dailies) could lead to much different decision making. The question becomes - which is more accurate? It's up to them and that's fine, but then either they can explain the reasoning or they can live with us questioning the methodology. I doubt they lose any sleep over whether or not I question something, so they will continue to do what they want...

As I already pointed out, the problem of "individuals" skewing data sets already exists in paper magic, and it's just as true on MTGO. But that doesn't mean that the data can't be used to make banned and restricted list decisions or that doing so is somehow less valid. Wizards is perfectly justified in using imperfect data to make policy decisions, just as much as any other real world policymaker.

To use your terminology, this is empirically false. Using February 2016 as an example, the paper Magic data is not nearly as skewed by individuals as the MTGO data over the same time period.

And you are right, Wizards/DCI can use whatever data they like, but we are also free to question their interpretations and decisions when they conflict with alternate data analysis.

I appreciate the analysis that you put together. I just don't think it's the whole story because you can easily be led to alternative conclusions. I understand that you are looking at it with the same historical perspective as has always been done, but I think that's skewed by the type of data MTGO generates.

As for the original argument, using your definitions, Danny probably overstated his case. If we read it and assume Danny is extrapolating the data to describe the metagame in broader terms (as Wizards does in their B/R announcement), I think he's justified. Again, based on the expected variance in the daily events data and the level of play Workshops see in larger events (paper, P9 challenges), it is reasonable to believe that Shops is played about the same amount as Mentor. I guess I should clarify that this can never be 'known' 100%, but based on the data available, I would've bet that for the next P9 Challenge Shops and Mentor would be within 5 decks of one another (post-Golem restriction, I expect Mentor to go up in comparison to Shops).

@Fred_Bear & @Smmenen

I'm not sure what you are even debating anymore. The data speaks for itself. We can dissect it and criticize analytical methodologies ad nauseam, but at some point the head of the pin becomes overcrowded with angels.

@Fred_Bear said:

@Smmenen said:

First and foremost, if you are defining the full metagame as every deck played, then that renders Danny's claim not only unsubstantiated and unsupported, but fundamentally unsupportable as unknowable.

No it doesn't. Statistics are used to draw comparisons and conclusions. Danny is looking at the available data and drawing a conclusion based on the expected variance in the data. Can it be known 100%? No. Can it be known for the sake of an editorial? Absolutely.

You are now pulling a bait and switch.

When I presented aggregate MTGO daily data, you called it a mere "sample," explicitly treating it as a questionably reliable data source.

But when Danny does the exact same thing, it's simply "statistics" - looking at data and drawing a conclusion with a subjective view of possible variance.

Look. I took a single sentence out of this article that I found to be objectionable based upon all of the available evidence. My data strongly disputed his claim.

Now, Danny has presented his data, and by his own terms his statement is not supportable by the available facts. You are willing to credit Danny with a tortured reading of the facts by introducing some notion of variance, but that cuts the other direction as well. It's just as plausible, in your reading, that Shops should be 6% higher, if you are willing to credit that much variance.

By your argument, the statistics are definitions, and that's unreasonable in terms of the original article's intent (at least by my understanding). It should be obvious to a reader that 16 does not equal 22, but the difference between 16 and 22 is not quite as vast as you want us to believe. [32 decks over 48 events is well within the variance between Mentor and Shops]

The difference between 16% and 22% of a Vintage metagame is actually an enormous gulf. In fact, it's probably much larger than I "would have us believe." Consider a few facts that will put this in context:

  1. Very few decks constitute more than 6% of a Vintage metagame in any time period. Usually no more than 5, and sometimes as few as 2.

  2. 6% is the difference between 1% and 7% or 2% and 8%. Would anyone say that the difference between "1 and 7" isn't that big? No. It's enormous when dealing with the kinds of data we are looking at.

  3. 6% is almost equal to the total share of Delver decks in the data set. It's a very large difference.

The issue being debated is my disputing the claim that "mentor decks are basically" the same portion of the metagame as Shops. I find that provably false, as an empirical matter.

By this argument, I could look at the data for 3/13 and claim Shops is 0% of the meta as an empirical fact. It's a crap argument. Statistics work when combined and interpreted with some common sense, and you know that. What happened on 3/13 must be viewed within a larger discussion, just as March or February or January or Q1 must.

Exactly. Your point here is a straw man argument that actually undermines any use of data.

The reason that I look at quarterly data or bimonthly data (as Phil Stanton used to) is that a snapshot of a single month, or a week, or even a day, is less reliable.

What we are looking for in the data are trends.

Your point, that any particular snapshot of data is flawed because there is so much variance in Vintage, overshoots the mark and swallows your entire argument. That argument could be made, literally, about any time period: a quarter, 6 months, a year, even a decade. I could apply the exact same point and say that ten years of data simply has too much variance relative to the previous ten years.

Applied to its logical extreme, your point here renders any time-bound data set problematic.

When considered in a reasonable manner, using a couple of months of data, or even a full quarter, has been deemed perfectly reasonable by most commentators, analysts, and participants in discussions such as this.

In my metagame reports, I used to only include tournament data with a minimum of 33 players to try to reduce the variance. Phil Stanton pegged the cut off at 50 players.

In any case, whenever using data, we have to impose some limits on what will be included and what won't. In this case, Danny and I looked at exactly the same data: MTGO reported dailies (although I looked at everything else as well).

By saying that "a data set is too small to base a decision" you are explicitly saying that Wizards either should not restrict or is unjustified in restricting unless they have a certain quality of data.

That's just false. That's not to say that I don't think that Wizards should use data in making decisions, but Wizards, like many real world policy makers, have imperfect data sets.

Seriously? Again, I think you know what I mean. The MTGO data can be manipulated to present any number of "empirically correct" arguments (e.g. anything involving FoW, Mental Misstep, and Ingot Chewer).

Of course it can be. But any thinking person would call out unreasonable use of data. Using your hypothetical, some idiot pointing to 3/13 to say that Shops are 0% of the metagame would be immediately dismissed out of hand.

On the other hand, aggregating several months worth of data into a quarterly report is generally considered reasonable and sufficiently inclusive.

This is the issue that many people seem to have. 12/60 (20%) Shops decks in paper (February) compared to 27/72 (37.5%) decks in MTGO (February dailies) could lead to much different decision making. The question becomes - which is more accurate? It's up to them and that's fine, but then either they can explain the reasoning or they can live with us questioning the methodology.

Finally, a point I agree with. I think this is the place that most reasonable people can disagree and get into a debate about the restriction of Golem. But that's a non-sequitur here, as my focus in this thread is a particular statement in this article.

As for the original argument, using your definitions, Danny probably overstated his case.

Yes. Overstatements = false statements. So, after all of this back and forth, you are now conceding that my original statement was correct. Thank you.

If we read it and assume Danny is extrapolating the data to describe the metagame in broader terms (as Wizards does in their B/R announcement), I think he's justified.

I don't. Let's focus on this for just a minute longer.

Danny's claim, which I dispute, was that Mentor decks were about as prevalent as Shops decks.

Yet, not only are Mentor decks not even close to as prevalent as Shops decks (according to the Q1 data, Mentor decks were only 10% of MTGO reported daily decklists, whereas Shops were 31% of reported decks), but not even all Gush decks combined are as prevalent as all Shop decks combined.

If you look at the MTGO Q1 data we compiled, Mentor decks are less than 50% of all Gush decks (23/50).

So, if not even all Gush decks are equal to the number of all Shops decks (50 Gush decks compared to 72 Shops deck), then the much lesser statement, that Mentor decks basically approximate the number of Shop decks, becomes even more absurd (and, in fact, in the Q1 data set the numbers are 23 Mentor decks (28 if we add the 5 UW landstill decks) to 72 Shops decks).

Let that sink in for a second.

If the total number of Gush decks doesn't even come close to approximating the number of Shops decks, then it's not plausible that Mentor, which is less than 50% of the total number of Gush decks, can come any closer. It's just math.

last edited by Smmenen

I'm not accusing you of bias, but I believe the data you present is.

I think you misunderstand the data. You believe I was presenting a sample rather than the whole population of data.

Collection methods may be biased, but in this case the point is moot as it's all the available data (unknown knowns and all that). Grouping criteria may be biased, opinions may be biased—raw, complete datasets not so much.

@Smmenen said:

You are now pulling a bait and switch.

When I presented aggregate MTGO daily data, you called it a mere "sample," explicitly treating it as a questionably reliable data source.

I've pulled no bait and switch. I've consistently said the data set is a sample. The statistics Danny drew his conclusion from are, as I read it, the same. I read it that he was extrapolating the data set to describe the metagame. When you do this, I find it reasonable to factor variance into the data set, which over a quarter of the year can amount to a significant number of decks.

Look. I took a single sentence out of this article that I found to be objectionable based upon all of the available evidence. My data strongly disputed his claim.

I disagree that the data strongly disputes his claim. It can be read that it disputes it, but if you look more broadly, his statement is not as black-and-white true-or-false as you make it seem. Mentor is a heavily played Gush deck - one of the best/most played versions; in fact, in paper it out-represents Shops. I don't find it offensive for him to say they see *about* the same play.

And you are correct, Shops could see heavier play due to variance, but that does not seem as plausible based on the larger tournament results (i.e. I made my argument self-consistent). I indicated in an earlier post that anywhere between 24 and 120 Shops decks in the data shows *about* the same level of play.

The difference between 16% and 22% of a Vintage metagame is actually an enormous gulf. In fact, it's probably much larger than I "would have us believe." Consider a few facts that will put this in context:

Again, you want it to be true for some and not others. 16% and 22% is a huge gulf, but not spread over 48 events each reporting 4-7 decklists. In application, it means that you might see 1 more Shops deck than Mentor decks at a similar event. This is why I don't understand why his statement was so offensive. The difference in play is less than 1 deck per event over nearly 50 events... It's only when you turn it into aggregate data that it turns into an 'enormous gulf'. But that's not what your statistics describe - the statistics describe a small event, where they were generated.

Exactly. Your point here is a straw man argument that actually undermines any use of data.

Which is what I point out. Data has to be used in context, and now you flip-flop from using data to prove falsity to 'looking for trends'. That's what I've advocated all along. But to look at trends, you have to look at how the data is generated. Following top-level Shops mages on MTGO is going to artificially skew the data, but you don't seem willing to admit that the data does that.

Applied to its logical extreme, your point here renders any time-bound data set problematic.

It doesn't, but you just want me to be wrong. The time window should be such that the data is meaningful. I think the paper Magic data is excellent. It looks at random metagames and, seemingly, random participants; it's seemingly unbiased data. The MTGO data looks like the same players over and over and over again - except for the P9 data, which plays out like the paper data.

Of course it can be. But any thinking person would call out unreasonable use of data. Using your hypothetical, some idiot pointing to 3/13 to say that Shops are 0% of the metagame would be immediately dismissed out of hand.

On the other hand, aggregating several months worth of data into a quarterly report is generally considered reasonable and sufficiently inclusive.

But why is it unreasonable to question data which is not in sync with other data sets? MTGO daily data shows a significant change from paper or from large online tournaments. You are OK with this. I am not. Especially when compounding the abnormal data set seems to be what drives decision making.

Again, I'll agree to disagree. I think he made an overstatement, but I don't find it offensive. I don't think you could distinguish 50 decks from 72 decks using sound statistical tools to analyze your data set to describe the metagame as a whole over Q1. I think it becomes even more difficult if the data analysis included an analysis based on pilot. If it repeated over another quarter (i.e. a trend), I think you would have a statistical argument, but we'll never know.

@Fred_Bear said:

@Smmenen said:

You are now pulling a bait and switch.

When I presented aggregate MTGO daily data, you called it a mere "sample," explicitly treating it as a questionably reliable data source.

I've pulled no bait and switch. I've consistently said the data set is a sample. The statistics Danny drew his conclusion from are, as I read it, the same. I read it that he was extrapolating the data set to describe the metagame. When you do this, I find it reasonable to factor variance into the data set, which over a quarter of the year can amount to a significant number of decks.

If Danny and I are both using the exact same data source: MTGO daily events, why are you crediting his data, but ignoring mine?

Moreover, if you are so concerned about sampling and time periods, shouldn't you be much more critical of his data set? If this isn't about just Danny's data, then doesn't his claim become even more tenuous?

Again, according to the data I presented in the first post in this thread, in Q1:

Shops are 30% of dailies
Gush is 20.7%
And Mentor is 10%

Danny's explanation for why his data set is different, and he calculates Shops at 22% and Mentor at 16%, is that he added October, November, and December to the data. That means that Danny had the exact same data I have, but added three more months to the beginning (and therefore less directly relevant in determining current trends).

Adding those three months bolsters any claim that Mentor is getting closer to Shops, but looking just at Q1 makes it clear that Shops have pulled far ahead. According to the data, Shops went from 14% of dailies in Q4 to 30% in Q1. Danny knows this or can know this, since his data set encompassed both quarters. By blending Q4 and Q1 together, he is selectively ignoring the "variance" that illustrates a huge increase in Shops in the MTGO data set.

Look. I took a single sentence out of this article that I found to be objectionable based upon all of the available evidence. My data strongly disputed his claim.

I disagree that the data strongly disputes his claim. It can be read that it disputes it, but if you look more broadly, his statement is not as black-and-white true-or-false as you make it seem.

OK, let's look at his statement more broadly. Here is the sentence and the sentence that precedes it:

"One could make a very strong argument that Monastery Mentor decks were the best deck in the format prior to the April 4th changes. They occupied basically the same percentage of the metagame as all of the Mishra’s Workshop decks combined, and unlike its artifact based counterpart there is no real good way to combat it. "

So, he's implicitly arguing that Mentor was the best deck before the most recent restriction, or at a minimum, tentatively endorsing such an argument. And then he is presenting a statistical claim to support that argument. So, for the argument that he is either advancing or implicitly endorsing to be true, the facts upon which it relies must also be true. This is not an editorialization. This is a claim. If my staff put such a claim in a report, article, or brief, I would demand they support it.

Mentor is a heavily played Gush deck - one of the best/most played versions; in fact, in paper it out-represents Shops. I don't find it offensive for him to say they see *about* the same play.

I don't find it offensive; I find it factually untrue.

And you are correct, Shops could see heavier play due to variance, but that does not seem as plausible based on the larger tournament results

Really? If you believe that, then you are ignoring the facts. If we look at the larger tournaments, Shops performs just as well, if not better. See below.

Recall again that Danny's data includes Q4, whereas mine is just Q1. If Danny's data is accurate - that Q4 and Q1 combined make Shops only 22% of the MTGO daily results - and the Q1 data has Shops at 30%, then Shops must have been around 14% in Q4 for that to average out.

That means that Shops more than doubled between Q4 and Q1. So, if we are going to look at "larger" tournament results and more tournaments, and we really care about trends, the trend is clear: Shops had a dramatic increase in Q1.
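The back-of-the-envelope arithmetic, under the assumption that the two quarters contributed roughly equal numbers of decklists (an assumption on my part; the per-quarter event counts aren't broken out here):

```python
# If Q4 + Q1 combined put Shops at 22% and Q1 alone has Shops at 30%,
# and each quarter contributes roughly half the decklists (assumed),
# then Q4's share is implied by the simple average:
combined, q1 = 0.22, 0.30
q4 = 2 * combined - q1  # solves (q4 + q1) / 2 = combined
print(f"Implied Q4 Shops share: {q4:.0%}")  # ~14%, matching the figure above
```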

After all, you are now arguing that what we should care about is trends. The variance argument actually plays into my critique. Shops trended dramatically upwards in Q1, and any variance over time is variance that should be interpreted, if we credit trend data, towards Shops increasing frequency.

The difference between 16% and 22% of a Vintage metagame is actually an enormous gulf. In fact, it's probably much larger than I "would have us believe." Consider a few facts that will put this in context:

Again, you want it to be true for some and not others. 16% and 22% is a huge gulf, but not spread over 48 events each reporting 4-7 decklists. In application, it means that you might see 1 more Shops deck than Mentor decks at a similar event. This is why I don't understand why his statement was so offensive. The difference in play is less than 1 deck per event over nearly 50 events... It's only when you turn it into aggregate data that it turns into an 'enormous gulf'. But that's not what your statistics describe - the statistics describe a small event, where they were generated.

Your argument is tantamount to an argument against aggregation. It's absurd on its own terms, but taking it at face value, my point is true whether we look at dailies or Premier events.

Let's look at Premier events. In Q1, there were only 8 Mentor decks in the MTGO P9 Top 16s. That's for an overall percentage of 16.66% of decks. In contrast, Shops were 31% of those decks.

So, if we look at Q1, using a smaller data set of 3 events with 16 decklists per event and much less bias from particular-player overrepresentation, then how preposterous is it to claim that 16.66% is about the same as 31%?

Exactly. Your point here is a straw man argument that actually undermines any use of data.

Which is what I point out. Data has to be used in context, and now you flip-flop from using data to prove falsity to 'looking for trends'. That's what I've advocated all along. But to look at trends, you have to look at how the data is generated. Following top-level Shops mages on MTGO is going to artificially skew the data, but you don't seem willing to admit that the data does that.

sigh

While I'm glad we are now on the same page regarding "looking at trends," if you go back and look at any article I've ever published on Metagame Analysis that I've linked to earlier, it's clear that's the entire goal.

The point of looking at aggregate data is to discern trends. That's not flip-flopping; that's the essence of what analyzing the metagame is for.

If MTGO dailies were as skewed as you suggest, then the data for Q1 dailies and MTGO P9 challenges wouldn't be virtually the same. Yet, the top 16 and top 8 data from the premier events is almost statistically identical. So, this "skew" effect that you keep harping on in an attempt to undermine the validity of the dailies - is not evident if we compared the dailies to the premiers. It's the same stats. The same numbers.

Again, I'll agree to disagree. I think he made an overstatement, but I don't find it offensive. I don't think you could distinguish 50 decks from 72 decks using sound statistical tools to analyze your data set to describe the metagame as a whole over Q1. I think it becomes even more difficult if the data analysis included an analysis based on pilot. If it repeated over another quarter (i.e. a trend), I think you would have a statistical argument, but we'll never know.

It's not 50. It's 28 compared to 72. That's the number of Mentor decks in the data set compared to the number of Shops decks.

I was using the 50 Gush decks to illustrate that not even the total number of Gush decks comes close to the number of Shop decks, and only 23 of the 50 Gush decks were Mentor decks.

To accept your overall argument concerning Danny's claim, one would have to believe that 28 is "reasonably" close enough to 72, given variance. That's just nonsense. It's miles away, not inches.

last edited by Smmenen

@Smmenen said:

If Danny and I are both using the exact same data source: MTGO daily events, why are you crediting his data, but ignoring mine?

No need to make it personal. I'm not choosing his data over yours. I believe that, objectively, there is truth to his statement. It's not black-and-white. It's not as simple as true-and-false. He said Mentor is a good deck which sees roughly as much play as Shops. You have repeatedly tried to paint the only available data as that from top decks played in the MTGO dailies and I still believe that is a disingenuous representation of all the available data.

I'm not going to argue with you - you're an attorney. I'm a process engineer, though, so I know how to interpret data. I'm trying to explain what I see in the data.

The metagame data from the P9 Challenges in Q1 show that Gush is just as played as Shops (not Mentor specifically). The paper data analysis that you provide for Q1 shows that Mentor shows up more than Shops by a bit. On the other hand, the Dailies data shows that Shops has a sizeable lead. My question remains the same - Why is the data so much different in the Dailies? The data from the dailies suggests it to be an outlier compared to all other available data.

I think 1 reason is the people playing. They get a single representation in each large P9 Challenge while they have multiple top finishes in the Dailies. This skews the data towards good players who play good decks, e.g. Montolio, BlackLotusT1, The Atog Lord, etc. I think a 2nd reason is that we do not see every deck played in the dailies, only the top finishers for an event. The hope with MTG data is that this evens itself out over large samples, i.e. a deck which went 2-2 on Tuesday will bounce back and go 3-1 on Wednesday, but the Vintage dailies are not a large sample. As I've repeatedly tried to point out, a variance of even 1 deck per event over your Q1 data is equivalent to anywhere from only 24 Shops decks passing muster (1 fewer deck in each of 48 events) to a whopping 120 Shops decks making the cut (1 more deck going 3-1 in each). Some will argue that the difference between 2-2 and 3-1 is a dice roll in Vintage. Those numbers swing the Daily data wildly without any additional Shops decks getting played. [Note: This is why it's important to know all the decks in an event. If every Shops deck being played is making the top decks list, that's important.]

I'm not going to argue that 24=120, but in the world of data analysis, sometimes it's hard to tell the two apart definitively.

Here's an alternate analysis of the dailies...

Using a simple Run Chart approach:
Average # of Shops decks finishing 3-1 or better: 1.45 +/- 0.86
Average # of decks finishing 3-1 or better: 5.00 +/- 1.00

Standard statistical analysis would say to use 3 sigma, but even limiting ourselves to 1 sigma (68% confidence), we could see as few as 28 Shops decks out of 192 top finishes (14.5%) or as many as 111 out of 288 (38.5%), and we shouldn't be "surprised". Those numbers could still 'represent' Shops being only ~30% of the meta. [Note: You could also use median and variance, but I think that under-represents Shops.] [Note 2: This actually explains the Jan-to-Feb-to-Mar variance in the Q1 data and likely explains the variance back to Q4 of 2015.]
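For anyone who wants to check the arithmetic, here is a minimal sketch of that 1-sigma band (the per-event means and sigmas are the figures quoted above; the underlying per-event counts aren't reproduced here):

```python
# Run-chart style 1-sigma band for Shops finishes across 48 Q1 dailies.
events = 48
shops_mean, shops_sigma = 1.45, 0.86   # Shops decks finishing 3-1+ per event
total_mean, total_sigma = 5.00, 1.00   # all decks finishing 3-1+ per event

# Pairing the low bounds together and the high bounds together,
# following the arithmetic in the post:
shops_low = (shops_mean - shops_sigma) * events    # ~28 decks
total_low = (total_mean - total_sigma) * events    # 192 finishes
shops_high = (shops_mean + shops_sigma) * events   # ~111 decks
total_high = (total_mean + total_sigma) * events   # 288 finishes

print(f"low:  {shops_low:.0f}/{total_low:.0f} = {shops_low/total_low:.1%}")      # ~14.7% (rounded to 14.5% above)
print(f"high: {shops_high:.0f}/{total_high:.0f} = {shops_high/total_high:.1%}")  # ~38.5%
```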

Looking at it from this angle: #1) I'm willing to believe that Mentor falls somewhere within those numbers. So, to the degree I trust the data, Mentor is roughly on par with Shops. Is it 100% played as much as Shops? No, probably not. Is it close? It's not an unreasonable statement. And #2) The huge variance in the data from the dailies suggests that the data requires further monitoring/analysis before relying on it. The small sample sizes and limited player pool are not providing a consistent representation of the meta as compared to other data.

I'm sorry I disagree with the conclusion you want to draw. You're obviously quite passionate and invested in it. I really wish we could've gotten another month or quarter of data to see where the trend really was going.

@Fred_Bear said:

@Smmenen said:

If Danny and I are both using the exact same data source: MTGO daily events, why are you crediting his data, but ignoring mine?

No need to make it personal. I'm not choosing his data over yours. I believe that, objectively, there is truth to his statement. It's not black-and-white. It's not as simple as true-and-false.

Of course it is.

Imagine I said to you: "There are about the same number of Apples in the United States as Bananas."

Is that a true or false statement?

Of course it is.

It's empirical and quantifiable. It's precisely the statement that you'll find in the first chapter of a logic or a social science textbook as a contestable claim.

Now replace "apples" with "Mentor decks," "United States" with Vintage metagame, and "Bananas" with Shop decks, and it becomes clear as crystal that Danny's claim is the same.

Let's be absolutely clear about this. Danny’s statement is an empirical claim. It's a factual claim.

He claimed that Mentor decks were “basically” the same proportion of the Vintage metagame as Shop decks. This is either true or false.

Since it is a factual claim, it is amenable to empirical inquiry. It is precisely the kind of claim that is susceptible to factual analysis. It's inherently provable or falsifiable.

It’s not a subjective claim (e.g. “I like Ice Cream”). It’s an objective claim.

Nor is it an inherently unprovable claim that lies in the realm of philosophy (e.g. "God exists").

Danny's statement is numerical and quantifiable.

If Danny's statement is not amenable to truth or falsity, then it's hard to imagine a claim that is. It’s the paradigmatic example of an empirically falsifiable claim.

Now, the only question is: what data set should we use to either prove or disprove the statement?

Thus far, only two basic data sets have been developed:

  1. Danny’s MTGO Daily results (Q4 & Q1)

  2. My and Kevin Cron’s data sets, which include:
    a. Q1 Paper
    b. Q1 MTGO Daily reported results
    c. 5 Months of MTGO Premier Events
    i. A subset that features just Q1 Premier Events

Each of those data sets produce slightly different results.

  1. Danny’s MTGO Daily results show that Mentor is 16% of top performing decks compared to 22% for Shops
  2. Danny relied on my and Kevin’s paper results, which show Mentor and Shops to be roughly the same.
  3. My and Kevin’s Q1 MTGO daily results show Mentor to be 28 decks (11.6%) compared to 72 Shop decks (30%)
  4. My and Kevin’s MTGO Premier Events show Mentor to be 14% or 16% respectively, and Shops to be 30% and 31% respectively.

Let's summarize:

  • The Premier event data suggests that Danny’s claim is completely wrong. Shops are twice as prevalent as Mentor.

  • The Q1 daily results have Shops decks almost THREE TIMES as prevalent as Mentor.

Aside from the paper results, which no one disputes (and which aren't relevant by themselves, since we are talking in the aggregate), there is no universe in which Mentor is even close to the same % as Shops.

What about Danny's data? There are huge flaws with Danny's data:

  1. He completely ignored Premier events.

  2. Despite having the data in his set, he conveniently ignored or elided the fact that Shops doubled its representation from Q4 to Q1, making it, in my opinion, unreasonable to include Q4.

Danny’s data misrepresents Shops' proportion of the metagame “before April 4,” because it includes a period, October-December, that looks nothing like Q1.

But even if, in the “best case,” we accept Danny’s data, there is still a difference of 6%. While you don’t feel that’s a big deal, I think it’s an enormous gap. It’s the difference between 1% and 7%. It’s 33 decklists in his sample.

Your entire argument, that Danny’s claim is valid, rests on the following assumptions:

  1. That a 6% difference is close enough, in light of variance, to justify a claim that Mentor is within range of Shops.
  2. That ignoring the premier events is fine in drawing your conclusion.
  3. That including the Q4 data, despite Workshops being half as present in it, is in no way problematic.

I don’t think any of those assumptions are reasonable. But Danny's argument isn't valid unless we accept all three of them.

I think the argument is over once you acknowledge the 6% difference. That 6% difference means that Mentor decks are not roughly the same share as Shops. That's the end of the story.

He said Mentor is a good deck which sees roughly as much play as Shops. You have repeatedly tried to paint the only available data as that from top decks played in the MTGO dailies and I still believe that is a disingenuous representation of all the available data.

That’s all of the data we have. Neither Danny nor I have all of the MTGO daily decklists. So, your point here is completely beside the point.

I'm not going to argue with you - you're an attorney. I'm a process engineer, though, so I know how to interpret data. I'm trying to explain what I see in the data.

I also know how to interpret data. I am the Director of Research at a research institute, and write social science reports and file social science briefs in the Supreme Court at least once a year. What you are saying makes no sense. I wouldn't let my staff publish reports with the kinds of claims you are advancing in them.

Why is the data so much different in the Dailies? The data from the dailies suggests it to be an outlier compared to all other available data.

This is totally false. It's just completely untrue. You’ve said this before, and I’ve already refuted it. The daily top performers align almost perfectly with the premier event top performers.

As I said before:

“If MTGO dailies were as skewed as you suggest, then the data for Q1 dailies and MTGO P9 challenges wouldn't be virtually the same. Yet, the top 16 and top 8 data from the premier events is almost statistically identical. So, this "skew" effect that you keep harping on in an attempt to undermine the validity of the dailies - is not evident if we compared the dailies to the premiers. It's the same stats. The same numbers.”

Let me repeat: The Daily results are not an outlier. The Daily and Premier top-performing Shops and Gush data are almost identical.

Yet, the premier events do not suffer from the flaw you keep talking about: the skew of overrepresentation by specific individuals.

This is a great example of you just ignoring the facts.

I think 1 reason is the people playing.

Except that your premise isn’t true. The premier events show the same top-performance stats as the dailies. It’s remarkable how well they line up. The Gush and Shops data is almost identical for Q1.

I'm not going to argue that 24=120, but in the world of data analysis, sometimes it's hard to tell the two apart definitively.

I'm just going to let readers ruminate on that statement, as I think it speaks for itself.

last edited by Smmenen

@Smmenen said:

I was watching a little bit of the replay of Rich Shay's twitch stream this evening, and I was actually astonished that some people felt that, by posting data here and asserting that a quote in your article was a false statement, I was "attacking your credibility."

I just want to chime in here and confirm that this is absolutely true. Many people on Rich's stream were attacking Steve, calling him a liar, biased, etc. I don't mean to criticize Danny, or anyone, but I was horrified by the things said on Twitch chat during Rich's stream. Some poor guy named Oddbare got banned from chat for saying that Danny's claim isn't supported by Steve's data. This controversy is silly. Let's keep things civil.

Would just like to add to the notion of keeping it civil. Thank you.

last edited by DBatterskull

Look, Steven, as I said, I'm not going to argue with you. I'm trying to help you understand my viewpoint. I believe statistics are strongest when used as a tool for developing a predictive model. Based on my limited understanding of competitive Magic (I'm clearly not a World Champion or even a GP winner), that's what most players use to 'metagame'. They look at recent results and try to determine what puts them in the best shape for the next event.

You want to take the daily tournament results and claim they define the metagame. I don't think you can say that. They describe a single event or a group of events. I believe we can use those numbers to describe the whole metagame, but it is not 1:1. That is where we fundamentally disagree.

Had the article said, "Shops has basically been played as much as Mentor in MTGO dailies in Q1," I would agree whole-heartedly with your point: that is false. That's not what the article said, though. By my reading, Danny used the statistics to try and describe the whole Magic metagame. You want to say that is false, but I disagree; as a stats guy, I think it's fine. As you point out several times, it is 'fundamentally unknowable', but the statistics allow us to paint a picture.

I've tried to use the colloquialism of 'error bars' (maybe that's just a thing for scientists), describing the uncertainty in a measurement. You would have us believe that the number is 30% +/- 0%. That's true if I want to describe the past, what has already happened, in an aggregate report. If I want to use it to describe what may happen in tomorrow's Daily Event, I have to add in some uncertainty. Using the Daily Event data from Q1, that's +/- 17% looking at it event-to-event (a run chart or control chart in industrial terms). So in my planning, I need to account for Shops being somewhere between 13% and 47%. If 12 guys show up, there may be 2 Shops players or there may be 6. I won't know until the event starts. [The 'model' cannot distinguish between 2, 3, 4, 5, or 6 players showing up on Shops. If any of those numbers of players show up, the model, 30% +/- 17%, may be 'right'. You may be able to easily tell the difference between 2 and 6, but the statistical model cannot.]

As I've pointed out, this is why I don't like the Daily Event data. The estimate is high, for a number of potential reasons, and it's quite variable. If we use the P9 Challenge data in Q1, Shops is 21% +/- 1.5% using the overall metagame data provided by @diophan and @ChubbyRain. That's pretty tight. Again, your argument is to look at the Top 16 and note that it matches the dailies. You're, of course, right: Shops was 31.25% of the Q1 P9 Top 16s, but that's because it went nuts in February. The correct way to report that is 31.25% +/- 16.5%. By saying it is simply 31%, you ignore the fact that it was only 3/16 decks in January and 4/16 decks in March. Using the statistics in this way paints a much different picture and tells a much more accurate story, at least to me.
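A small sketch of what I mean by those error bars, applying the quoted point estimates and sigmas to a hypothetical field (the function and field sizes are illustrative, not part of the original data):

```python
# Error-bar view of a metagame estimate: point estimate +/- event-to-event sigma.
def plausible_deck_counts(field_size: int, share: float, sigma: float) -> tuple[int, int]:
    """Range of deck counts consistent with share +/- sigma in a given field."""
    low = max(0, round(field_size * (share - sigma)))
    high = round(field_size * (share + sigma))
    return low, high

# Dailies: 30% +/- 17% -> in a 12-player daily, anywhere from 2 to 6 Shops decks
print(plausible_deck_counts(12, 0.30, 0.17))    # (2, 6)

# P9 Challenge metagame: 21% +/- 1.5% -> a far tighter prediction
# (100-player field chosen purely for illustration)
print(plausible_deck_counts(100, 0.21, 0.015))  # (20, 22)
```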

Back to Danny's statement, I think if we applied the correct error bars to Mentor decks, there would be overlap between Shops play and Mentor play, especially in paper. It's not going to be exact, but, again, for an editorial (which is what Danny's article is), it's a reasonable stretch.

I'm not going to argue that 24=120, but in the world of data analysis, sometimes it's hard to tell the two apart definitively.

I'm just going to let readers ruminate on that statement, as I think it speaks for itself.

I'm sorry if I offended you by taking a more industrial approach to the statistical analysis. I work with Run Charts and Control Charts every day, so analyzing 'statistical variation' and applying control limits is something I take for granted. I may have made my statement a little bold, but if the control limits on a process are at 23 and 121, then 24 and 120 are the same (until a trend develops, of course). If you want to try and use my statement to make me out to be an idiot, so be it. I got over trying to be the smartest guy in the room during my PhD program...
