A few things:
- Everyone agrees that win % and win rates are the best possible metric for assessing deck performance, but there are two issues with this:
a) We lack this data on a regular basis. Your Vintage Champs analysis and my mid-October Vintage Challenge analysis are the exceptions that prove the rule: they are the rare cases where we actually have win % by archetype.
b) Win % or win rates don't actually tell us that much about the overall shape and scope of the metagame. They don't tell us about diversity. They aren't a metagame metric, per se. You could have a large or small number of decks, and the win rate of any particular deck wouldn't tell us much about that.
- Language used by Wizards: I agree with your point that Wizards is constantly refining their terminology. BUT, and this is a big caveat, the last time they restricted cards in Vintage, they specifically cited Vintage Challenge Top 8 data, not win rates or win percentages.
"Data from twelve recent Vintage Challenges reinforces this, with 40% of the Top 8 decks being Shops and 30% being Mentor. Both decks feature strategies that are powerful, stifle diversity, and can be frustrating to play against."
Before I read up on the Simpson Diversity Index, I was thinking about creating a "Menendian Index" that would be a mashup of different indicators: possibly 1/3 the range of decks in Top 8s, 1/3 a Gini Coefficient-like variable that measures inequality, and 1/3 perhaps something else.
But when I read up on the Simpson Diversity Index, realizing that it is sensitive to BOTH the range of strategies in a metagame AND the relative proportions of those strategies in the field, I realized it was the perfect holistic measure for what I was looking for.
Balance is obviously a metaphor that we are applying to Magic metagames, but balance by itself refers primarily to inequality. The primary image associated with balance is a scale or teeter-totter. The problem is that the metaphor of balance doesn't include the range of decks. So a 2-deck metagame could be balanced, even though such a duopoly is bad for the format. My OP had two hypotheticals that illustrate two different extremes.
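To make the duopoly point concrete, here is a rough sketch of a pure "balance" score. The function name `balance` and the choice of measure (inverse Simpson divided by the number of decks, sometimes called Simpson evenness) are my own assumptions for illustration, and the two metagames are made-up numbers, not real Top 8 data.

```python
def balance(shares):
    """Evenness-only score (an assumed measure, not an official one).

    Computes the inverse Simpson value 1 / sum(p_i^2), i.e. the
    "effective number of decks", and divides it by the actual number
    of decks. 1.0 means every deck has an identical share.
    """
    total = sum(shares)
    props = [s / total for s in shares if s > 0]
    effective_decks = 1.0 / sum(p * p for p in props)
    return effective_decks / len(props)


# Hypothetical metagames (illustrative shares only).
duopoly = [50, 50]        # two decks, evenly split
ten_decks = [10] * 10     # ten decks, evenly split

print("duopoly balance:  ", round(balance(duopoly), 3))
print("ten-deck balance: ", round(balance(ten_decks), 3))
```

Both fields score as perfectly balanced, which is exactly the problem: a balance-only number cannot tell a duopoly apart from a wide-open format.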
The Simpson Diversity Index is perfect because it accounts for both 'inequality' and for 'diversity.' Both matter.
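For anyone who wants to see the arithmetic, here is a minimal sketch. The index comes in a few variants; I am assuming the common 1 - sum(p_i^2) form here, where higher means more diverse. The function name `simpson_diversity` and the sample fields are hypothetical, not actual Vintage Challenge results.

```python
def simpson_diversity(counts):
    """Simpson Diversity Index in its 1 - sum(p_i^2) form (assumed variant).

    counts: number of appearances (e.g. Top 8 slots) per archetype.
    Returns a value from 0 (one deck only) up toward 1 (many even decks).
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one deck in the sample")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical fields (illustrative shares only).
even_two_decks = [50, 50]            # balanced, but narrow range
even_four_decks = [25, 25, 25, 25]   # balanced, wider range
lopsided_four = [70, 10, 10, 10]     # same range, one dominant deck

for name, field in [
    ("even, 2 decks", even_two_decks),
    ("even, 4 decks", even_four_decks),
    ("lopsided, 4 decks", lopsided_four),
]:
    print(f"{name}: {simpson_diversity(field):.2f}")
```

Going from two even decks to four even decks raises the score (range matters), and letting one of the four decks take 70% of the field drops it again (inequality matters), so the single number moves with both.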
TLDR: terminology is tricky here. We don't just care about one thing: we care about diversity AND balance, evenness AND abundance, inequality AND range. All of these concepts are related, but they are not the same thing.
In any case, I will do a write-up of my findings.