Election Models: An Inexact Science?

Between big data and online betting markets, electoral prediction is pretty hot these days. Everyone has a model for what will happen in November and, as I said in the last article, the raw numbers for these are all over the place. What's going on--what are "these models" anyway?

Electoral Models
Electoral models are mathematical simulations of an election (in this case, the Nov 6th presidential election) that take the "current state" of the race (often including factors such as the economy, current polling, and even things like YouTube views) and try to match this state to past elections in order to predict a future result. In some cases, such as public polling, there's a wealth of data. In other cases (tweets in favor of a candidate, say) there isn't.

How Good Are They?
They are amazing. Everyone claims near-perfect results (at least those who were around for a past election). The problem is exactly that: when "everyone is special," then "no one is."

In case you can't click through: all those links go to various prediction and polling houses talking about how accurate they are. They are all very, very accurate. So what do we do when they are saying different things?

Which they are right now.

What Does That Mean?
There's a near infinite number of models and houses and I'm in no way qualified to tell you who is getting what wrong. I want to talk about something slightly different: the raw material they are using to make their predictions. That's what I think is interesting right now.

The Wisdom of Crowds
As I've said before, there are places like InTrade that allow you to place real-money bets on election (and other) outcomes. When you have a lot of people with actual skin in the game, the theory is that you will get an aggregate push in the "correct" direction. That's the "Wisdom of Crowds." Here's PredictWise: an aggregator of various "betting sites" that uses the combined numbers to estimate the outcome.
Florida Is RED!
Net Result: Obama at 59.7% to win
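
To make the "aggregate push" idea concrete, here is a minimal sketch of how prices from several betting markets could be rolled into one number. This is not PredictWise's actual method (I don't know what that is); the market names, prices, and volumes are made up, and weighting by trading volume is my own assumption.

```python
# Hypothetical sketch: combine betting-market prices into one probability.
# Assumes winner-take-all contracts priced from 0 to 100.

def implied_probability(price, payout=100.0):
    """A contract's price divided by its payout reads as a probability."""
    return price / payout


def aggregate(markets):
    """Volume-weighted average of implied probabilities (assumed weighting)."""
    total_volume = sum(m["volume"] for m in markets)
    weighted = sum(implied_probability(m["price"]) * m["volume"] for m in markets)
    return weighted / total_volume


# Made-up markets, for illustration only.
markets = [
    {"name": "market_a", "price": 58.0, "volume": 1000},
    {"name": "market_b", "price": 62.0, "volume": 3000},
]

print(round(aggregate(markets), 3))  # 0.61
```

The busier market pulls the combined number toward its own price, which is the "skin in the game" logic: more money on the table, more say in the result.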

A Grand Formula (Polls + Economic Indicators)
This approach uses a poll of all polls plus some additional "weighting" and algorithms to try to get a better score than polling alone. The extra inputs include the stock market, GDP, and so on. It takes all the polling results but adjusts them for "likely voter models" (a poll of all voters or all adults is less accurate than a poll of likely voters) and "house effects" (such as polling that tends to lean Republican or Democratic). Silver's model for Nov 6th looks like this:
Florida? She is BLUE!!
Net Result: Obama at 72.4% to win
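
The two adjustments described above can be sketched roughly like this. It is an illustration under my own assumptions, not the actual model: the pollster names, margins, house-effect sizes, and population weights are all invented.

```python
# Hypothetical sketch of a house-effect and likely-voter adjustment.
# Margins are Obama minus Romney, in points. All numbers are made up.

# Assumed weights: likely voters count most, all adults least.
POPULATION_WEIGHT = {"LV": 1.0, "RV": 0.7, "A": 0.4}


def adjusted_average(polls, house_effects):
    """Subtract each pollster's lean, then average with population weights."""
    weighted_sum = 0.0
    total_weight = 0.0
    for poll in polls:
        lean = house_effects.get(poll["pollster"], 0.0)  # + means leans Democratic
        corrected = poll["margin"] - lean
        weight = POPULATION_WEIGHT[poll["population"]]
        weighted_sum += corrected * weight
        total_weight += weight
    return weighted_sum / total_weight


polls = [
    {"pollster": "house_a", "margin": 4.0, "population": "LV"},
    {"pollster": "house_b", "margin": -1.0, "population": "LV"},
    {"pollster": "house_c", "margin": 6.0, "population": "A"},
]
house_effects = {"house_a": 2.0, "house_b": -2.0}

print(adjusted_average(polls, house_effects))  # 2.25
```

A straight average of those three margins would be 3.0. Correcting the two leaning houses and discounting the all-adults poll pulls it down to 2.25, which is why two models fed the same polls can disagree.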

A Poll of Polls!
Forget about the Grand Formula: just add those suckers up. In this case, I think, RealClearPolitics does exactly that. This is their result:
Now Florida is GRAY!!
Net Result: It's not a prediction, just a snapshot--but it shows a current Obama lead of 56 EV.
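
For the curious, "just add those suckers up" might look something like the sketch below. I'm assuming a plain average per state and a 3-point toss-up band; I don't know the rules RealClearPolitics actually uses, and the poll margins here are invented. Only the electoral vote counts are real (2012 apportionment).

```python
# Hypothetical sketch of a poll-of-polls snapshot.
# Margins are Obama minus Romney, in points. Poll numbers are made up.
from statistics import mean


def snapshot(state_polls, electoral_votes, tossup_band=3.0):
    """Average each state's polls and tally electoral votes by leader."""
    tally = {"Obama": 0, "Romney": 0, "Toss-up": 0}
    for state, margins in state_polls.items():
        avg = mean(margins)
        if abs(avg) < tossup_band:  # too close to call: leave it gray
            bucket = "Toss-up"
        elif avg > 0:
            bucket = "Obama"
        else:
            bucket = "Romney"
        tally[bucket] += electoral_votes[state]
    return tally


electoral_votes = {"Florida": 29, "Ohio": 18, "Pennsylvania": 20, "Arizona": 11}
state_polls = {
    "Florida": [1.0, -2.0, 0.5],
    "Ohio": [4.0, 5.0, 3.0],
    "Pennsylvania": [7.0, 6.0],
    "Arizona": [-6.0, -8.0],
}

print(snapshot(state_polls, electoral_votes))
# {'Obama': 38, 'Romney': 11, 'Toss-up': 29}
```

Notice there is no probability anywhere in there. It's a count of where things stand today, which is exactly why it's a snapshot and not a prediction.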

The Candidate's Internet Presence!
We looked at the "Twitt-Dex" yesterday--Twitter's score of positive or negative comments about the candidates. PoliticIT takes all kinds of things like the candidate's Wikipedia hits, Twitter and Facebook followers, their "Klout" score, and so on and tries to build an "IT-Score": a measure of how popular, and therefore how likely to win, a candidate is. They claim their big-data approach has predicted over 90 election outcomes in 2012 with 87% accuracy! I'll ... uhm ... tell you what I think of this a little later.
Obama is TOTALLY killing it on Facebook!
Net Result: Obama leads IT Score 54 to 33 so ... that's an 87% chance of victory?

Tim Groseclose wrote the fascinating book Left Turn, which analyzes bias in the media. I found it entertaining, well written, and fairly convincing. He assigns each state a PQ score or "political quotient," which is how Democratic or Republican that state is. A 100 is Nancy Pelosi. A 0 is Michele Bachmann. On his web site, you can see that each of the key swing states is around 47.X%, which, he says, should mean that it'll be a rout in November as all the states vote against the high-PQ Obama. He surmises:
These polls, like my PQ analysis, suggest that the Electoral College tilts slightly toward Romney, compared with a pure popular-vote system. Expect soon for liberals to renew their complaints about the unfairness of the Electoral College.
At least one liberal isn't: Michael Tomasky (left) of the Daily Beast (Lefty-Mc-Left-Of-The-Clan-McLeft) looks at the electoral college layout and decides:
There’s a secret lurking behind everything you’re reading about the upcoming election, a secret that all political insiders know—or should—but few are talking about, most likely because it takes the drama out of the whole business. The secret is the electoral college, and the fact is that the more you look at it, the more you come to conclude that Mitt Romney has to draw an inside straight like you’ve never ever seen in a movie to win this thing. This is especially true now that it seems as if Pennsylvania isn’t really up for grabs. Romney’s paths to 270 are few.
Net Result: Dr. Groseclose doesn't exactly say, as far as I've seen--but I suppose it's a 65% or more chance of Romney based on, I think, Rasmussen polling.

What Do I Think?
Here's the first thing I think: predictions seem to "tighten up" by October. If you look at the last month in the 2008 Pollster Report Card, you see that Rasmussen and Pew "both nailed it," and after that there are various levels of accuracy. However, even though the guys at the bottom were pretty far off (5 pts), the fact is: all of them showed Obama winning. The report card kind of accounts for this (its consistency rating), but the fact remains that no one is totally clear on what these polls mean this far out.

The second thing I think is that the Internet Presence stuff, while fascinating in a big-data way, is not yet meaningful in real-world terms. I'm not at all impressed with PoliticIT, and their lack of discussion of methodology doesn't help any.

Thirdly, I'm even less impressed by the Philosophy battle. I think that Tomasky makes a point (the map, given current polling, is not favorable to Romney), and the PQ scores strike me as interesting measures of public sentiment that have almost nothing concrete to do with elections. In other words, I think this is just hot air.

On the other hand: polls of polls and analysis like Silver's do get my attention. That's not to say I think Obama has a 70% chance of victory--but I think that, looking at those RCP numbers, if Romney's coming ad blitz does not change the narrative, he is going to lose. He needs to be thinking hard about how that's going to work. Nothing seems to have changed it so far--the You-Didn't-Build-That ad campaign is the sort of thing that seems to really drive the message home with people who were already going to vote for you.

So I've got one final thing. Look at the Twitt-Dex:

Does This MEAN Anything!?
This shows something happening on July 23 (or around there) that has resulted in a widening gap in the dialog. If we presume this is not random noise, we would assume that something caused it. We don't know what that is (the Olympic gaffe coverage was after that, from what I can tell), but if we assume it's a "symptom" then it becomes an interesting window into public reaction.

If we assume that these online measures (objective data) and polls (objective data) are measuring responses to real events then we should be able to use a big-data approach to tighten up the grand-theory approach much more effectively.

NOTE: According to some you might be able to tell who the VP pick is by looking at a spike in Wikipedia updates! Wow! So maybe these Internet numbers are good for something!

