
### Specification Search Much?

Alan Abramowitz looks to the future:

Reestimating the time-for-change model based on the results of all presidential elections since World War II, we obtain the following estimates:

V = 50.3 + .81*GDP + .113*NETAPP – 4.7*TFC,

where V is the predicted share of the major party vote for the incumbent party, GDP is the growth rate of real gross domestic product during the first two quarters of the year, NETAPP is the incumbent president’s net approval rating in the final Gallup Poll in June, and TFC is the time-for-change dummy variable. TFC takes on the value of 0 if the president’s party has controlled the White House for one term and 1 if the president’s party has controlled the White House for two or more terms.

These figures -- 50.3, 0.81, 0.113, -4.7 -- strike me as a bit, um, arbitrary....
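For readers who want to plug numbers in, the quoted equation is easy to transcribe as a function. The inputs in the example call below are made-up illustrative values, not figures from any actual election:

```python
# Direct transcription of the quoted time-for-change model.
def time_for_change(gdp, netapp, tfc):
    """Predicted incumbent-party share of the major-party vote.

    gdp    -- real GDP growth rate over the first two quarters of the year
    netapp -- incumbent president's net approval in the final June Gallup Poll
    tfc    -- 1 if the party has held the White House two or more terms, else 0
    """
    return 50.3 + 0.81 * gdp + 0.113 * netapp - 4.7 * tfc

# Hypothetical inputs: 3% growth, net approval of -10, a two-plus-term party.
print(round(time_for_change(3.0, -10.0, 1), 2))
```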

**UPDATE:** Sorry, this was brief. I understand how the math works and why it doesn't turn up whole numbers. This is the formula that best relates the three variables under examination to the election results. But here's the question. What happens if you discard the results of, say, the 1964 election, recalculate the formula based on the 1948-1960 and 1968-2004 data points, and then try to use the resulting formula to predict 1964? And I don't mean to be making a special point about 1964 -- these backward projections of presidential election results never seem to me to try very hard at going beyond a pretty simple task of number-crunching. What is the theory behind the TFC variable supposed to be? Etc.
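The experiment described above is leave-one-out cross-validation, and it's simple enough to sketch. Since the post doesn't reproduce Abramowitz's dataset, the fourteen "elections" below are randomly generated stand-ins (with coefficients I invented for the simulation), not the real numbers; the point is only the mechanics of dropping each point, refitting, and scoring the refit model on the held-out point:

```python
# Leave-one-out check on a synthetic 14-election dataset.
import random

def ols_fit(X, y):
    """Ordinary least squares via the normal equations (Gaussian elimination)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for col in range(k):                      # forward elimination w/ pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * k                          # back substitution
    for r in range(k - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, k))) / A[r][r]
    return beta

def predict(beta, row):
    return sum(p * x for p, x in zip(beta, row))

random.seed(0)
# 14 fake elections: [intercept, GDP, NETAPP, TFC] -> vote share plus noise.
rows = [[1.0, random.uniform(-1, 6), random.uniform(-30, 30), float(i % 2)]
        for i in range(14)]
true = [50.0, 0.8, 0.1, -4.5]   # invented "true" coefficients for the simulation
votes = [predict(true, r) + random.gauss(0, 2) for r in rows]

# Drop each election in turn, refit on the other 13, score the held-out year.
errors = []
for i in range(len(rows)):
    beta = ols_fit(rows[:i] + rows[i + 1:], votes[:i] + votes[i + 1:])
    errors.append(abs(predict(beta, rows[i]) - votes[i]))
print("mean out-of-sample error: %.2f points" % (sum(errors) / len(errors)))
```

With real data, the interesting question is whether these out-of-sample misses are much larger than the in-sample residuals the model is usually advertised with.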

December 2, 2004 | Permalink


## Comments

Mr. Yglesias,

These constants look like the results of a regression analysis.

They are not arbitrary if so.

Posted by: luisalegria | Dec 2, 2004 5:15:01 PM

These figures -- 50.3, 0.81, 0.113, -4.7 -- strike me as a bit, um, arbitrary....

What were you expecting? The numbers have to be chosen to fit known data. It would be awfully strange if they came out to round figures.

The big question is whether you can fit more than four separate datapoints given these four degrees of freedom. I'm skeptical since it is easy to make up an overfitted model for anything. But the fact that the numbers look "arbitrary" is not the reason.

Posted by: Paul Callahan | Dec 2, 2004 5:16:25 PM

I don't know what techniques he used, but statistical methods for extrapolating predictive formulas from previous data often produce random-looking numbers. I assume they're designed to fit the available data (extrapolated from them, actually), which means they probably won't come out to be whole numbers.

Posted by: Haggai | Dec 2, 2004 5:17:44 PM

Well, at the end of the day (or voting, as it were), what counts is whether or not the formula actually works. I'm assuming that through the iterations of interpolation machinations they got this formula to work going back to ... (idunno, a bunch of years/elections)?

Now, Michael, you wear glasses, so I nominate you to test it out and let us know, kay?

Posted by: mememomi | Dec 2, 2004 5:18:19 PM

I'm assuming that through the iterations of interpolation machinations they got this formula to work going back to ... (idunno, a bunch of years/elections)?

The problem is that that's not a good way to get a reliable formula. You can always fit any set of variables into a formula like this looking backward, and end up with a "best fit". The way you see if it is useful is by remembering the scientific process -- get a testable hypothesis, and test it. If you took half the elections in that time frame at random, fit to them, and then tested the formula against the remaining half, that would be better. If you have a reasonably good fit to the set you generated it from, and an equally -- or nearly so -- good fit to the "test" set, you've probably captured something real in that formula (though not necessarily an explanation -- you may just have different effects with shared causes). If it fits really well to the numbers used to generate it, but badly to the "test" numbers, then you probably have nothing.

Posted by: cmdicely | Dec 2, 2004 5:35:57 PM
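The split-half test described in the comment above can be sketched quickly. To keep it short, this uses a toy one-variable model (simple linear regression has a closed form, so no matrix algebra is needed), and the data are synthetic stand-ins, not real election results:

```python
# Random split-half validation on a toy one-variable model.
import random

def fit_line(xs, ys):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def rmse(intercept, slope, xs, ys):
    return (sum((intercept + slope * x - y) ** 2
                for x, y in zip(xs, ys)) / len(xs)) ** 0.5

random.seed(1)
xs = [random.uniform(-1, 6) for _ in range(30)]       # fake "GDP growth"
ys = [50 + 0.8 * x + random.gauss(0, 2) for x in xs]  # fake "vote share"

# Fit on a random half, then score on both halves.
idx = list(range(30))
random.shuffle(idx)
train, test = idx[:15], idx[15:]
a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
train_rmse = rmse(a, b, [xs[i] for i in train], [ys[i] for i in train])
test_rmse = rmse(a, b, [xs[i] for i in test], [ys[i] for i in test])
print("train RMSE %.2f, test RMSE %.2f" % (train_rmse, test_rmse))
```

When the two RMSEs are comparable, the fit has probably captured something real; a test RMSE far above the training RMSE is the overfitting signature the comment describes.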

Paul Callahan has got it right. Four free parameters (including the constant) with only 14 data points (number of presidential elections since WWII) means overfitting (only 3.5 data points for each degree of freedom).

Few data points usually result in models that are unstable, that is, that change a lot with the addition of more data.

The proof of the model will have to wait until we have data from 40 or, even better, 80 presidential elections to include, sometime in the 22nd or 23rd century, assuming we still have elections. Until then, don't put any money on the model predictions.

Posted by: Steve | Dec 2, 2004 5:43:11 PM

The proof of the model will have to wait until we have data from 40 or, even better, 80 presidential elections to include, sometime in the 22nd or 23rd century, assuming we still have elections.

We have more than 40 elections now; the problem is that we keep having arbitrary points at which it is assumed that elections before are not comparable to elections after -- we have only 14 post-WWII elections. In 2060, political scientists may bemoan that we have only just over a dozen post-9/11/2001 elections. In 2100 we may bemoan the paucity of post-Martian-invasion elections, and so on.

Posted by: cmdicely | Dec 2, 2004 5:52:51 PM

Nicely put, Dicely!

Posted by: Steve | Dec 2, 2004 5:54:45 PM

cmdicely: you have a point, but perhaps more important is the fact that the other variables here aren't available for earlier elections. We have no idea what, say, Andrew Jackson's approval rating was before his re-election campaign. And prior to the 20th century, I don't think we have accurate quarterly growth statistics either. For that matter, there weren't two clearly defined parties prior to 1868, so it's not always clear who counts as an "incumbent" and who's a "challenger" in those elections. So even if we assume the underlying politics have not changed at all, we wouldn't be able to apply the formula.

We have much better data now, and the D's and R's show no signs of going away. In 50 or 100 years, we might have enough data points to come up with vaguely robust predictive formulas.

Posted by: Tim | Dec 2, 2004 6:18:01 PM

It's fairly easy to check whether the model that's being derived is stable: just derive it all over again with each combination of thirteen elections, and see how much change you get in the parameters.

Posted by: Brett Bellmore | Dec 2, 2004 6:20:43 PM
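The stability check Brett proposes is a jackknife over the coefficients, and it can be sketched on the same kind of toy one-variable fit. The fourteen "elections" below are synthetic, not real results; the mechanics are what matter -- fourteen refits, each with a different point dropped, and then a look at the spread of the estimates:

```python
# Jackknife parameter-stability check on a synthetic 14-point dataset.
import random

def fit_slope(xs, ys):
    """Simple-linear-regression slope, closed form."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

random.seed(2)
xs = [random.uniform(-1, 6) for _ in range(14)]
ys = [50 + 0.8 * x + random.gauss(0, 2) for x in xs]

# Fourteen refits, each with one "election" left out.
slopes = [fit_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
          for i in range(14)]
print("slope estimates range from %.2f to %.2f" % (min(slopes), max(slopes)))
```

A narrow range suggests no single election is driving the fit; a wide one means the "model" is largely an artifact of a few data points.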

Matthew's original post links to the article that gives the derived parameters through the 2000 election (i.e., dropping the 2004 election), and the parameters don't change much from the 14-election model, certainly less than their standard errors.

Anyone got the data set handy for doing this sort of sensitivity analysis?

Posted by: Steve | Dec 2, 2004 6:35:02 PM

For that matter, there weren't two clearly defined parties prior to 1868, so it's not always clear who counts as an "incumbent" and who's a "challenger" in those elections. So even if we assume the underlying politics have not changed at all, we wouldn't be able to apply the formula.

We have much better data now, and the D's and R's show no signs of going away. In 50 or 100 years, we might have enough data points to come up with vaguely robust predictive formulas.

In 1852, the Whigs probably didn't show much sign of going away; but we've had the same two clearly defined national parties (with occasional temporary fissures in one or the other, and occasional transiently influential minor parties) since just before the Civil War.

Posted by: cmdicely | Dec 2, 2004 6:43:12 PM

A major structural change in presidential elections after WWII was the two-term presidential limit. Accounting for FDR's unprecedented four terms, plus allowing for that possibility before his time, would require another variable in the model that interacts with the present variables. With the degrees of freedom used up by the additional terms, I doubt there would be any improvement of the overparametrization problem even including data from the Civil War era to the present. After all, it would only give us another 20 or so more data points, coupled with five more degrees of freedom in the model, for a total of 9 degrees of freedom in the parameters with 34 data points -- still about 4 data points for each degree of freedom.

Maybe what we really need are presidential elections every year so we'll still be around to test the model with an adequate amount of data.

Posted by: Steve | Dec 2, 2004 7:11:42 PM

A major structural change in presidential elections after WWII was the two-term presidential limit. Accounting for FDR's unprecedented four terms, plus allowing for that possibility before his time, would require another variable in the model that interacts with the present variables.

It is conjecture that the two-term limit had any substantial effect. You could just fit the model to half the post-Civil War data points (chosen randomly), test it on the other half, and see what you have -- without interjecting a variable to account for the two-term limit. I wouldn't be surprised if it didn't have a big effect on the quality of the model.

Then again, I wouldn't be surprised if no reasonable model with the variables here could be constructed on any set of elections that would test well on the rest, either.

Posted by: cmdicely | Dec 2, 2004 7:26:26 PM

"Then again, I wouldn't be surprised if no reasonable model with the variables here could be constructed on any set of elections that would test well on the rest, either."

I'm not trained in this kind of stuff, but is this another way of saying that this is all a bunch of pseudoscientific hooey? If you can't get a reliable model from inductive generalization on the available data, then can you get one any other way? Or is this a case of trying to apply mathematical/statistical analysis to a phenomenon that just isn't amenable to this kind of quantification? (This is meant as genuinely curious, not snarky).

Posted by: o | Dec 2, 2004 8:11:41 PM

The whole exercise seems pretty weird to me. Do we really think this regression is coming even close to explaining any sort of underlying statistical process? Setting that aside, what about the (presumably massive) omitted variable bias? I think Matt's question -- if we exclude any one year from the sample, how well does it work out-of-sample? -- is quite relevant. Furthermore, I was a little surprised that Abramowitz doesn't present an R-squared statistic.

Posted by: Guy | Dec 2, 2004 8:20:24 PM

Really, all this "presidential election prediction" stuff is a complete waste of time. The lack of data points, the absurd assumption that presidential elections all arise from the same process, and so on -- this is just political scientists having a little fun and being taken way too seriously.

Posted by: Kimmitt | Dec 2, 2004 8:49:28 PM

If we could just get everyone to believe in the equation, it would be right every time.

Posted by: fle | Dec 2, 2004 9:00:52 PM

To answer Matt's question, if you removed the 1964 data and used the formula to predict the outcome of the 1964 election, no, you wouldn't get the number exactly right--no model's that good. But if you attempted the 1964 experiment with every election year used to develop the formula, your set of results would be more accurate as a whole than any other set of coefficients you could devise. Derive as much warmth from that as you care to.

Don't they teach stats at Harvard? OK, OK, I'm kidding!

Personally, I prefer my model: the candidate who wins is the one who can most plausibly be likened to Peter Pan. It seems to predict the outcome of the last 15 or so elections with uncanny accuracy.

Posted by: Adam M | Dec 2, 2004 10:03:41 PM

Yes, I'm not sure I see the value of removing a data-point. Like, don't you need that for yer theory? The real value is in its future predictive power, no?

The Peter Pan thing's too flighty -- erm, subjective.

Posted by: memer | Dec 3, 2004 8:43:29 AM

Yes, I'm not sure I see the value of removing a data-point. Like, don't you need that for yer theory?

No, until you've tested it on data points other than the one it is initially fit to, you don't even have a "theory", you have a hypothesis.

Posted by: cmdicely | Dec 3, 2004 10:35:06 AM

"Personally, I prefer my model: the candidate who wins is the one who can most plausibly be likened to Peter Pan. It seems to predict the outcome of the last 15 or so elections with uncanny accuracy."

G.H.W. Bush more like Peter Pan than Dukakis? Hm. (Try to imagine Peter Pan running the CIA.)

R.M. Nixon more like Peter Pan than ... a freakin' rock or something? Hard for me to see!

Posted by: o | Dec 3, 2004 11:15:54 AM

Remember that Bush the First had Peggy Noonan writing for him, who literally believes in fairies (or at least magic dolphins). And Peter Pan never got asked what he would do if Wendy were raped and murdered.

As for Nixon, that may be an outlier, but he was tanned, rested and ready, and his "Secret Plan" to end Vietnam had more than a bit of pixie-dust to it. Some author (I think maybe Mailer) followed the McCarthy campaign and said that despite his liberal positions and the kids following him, he was the most deeply conservative man in American politics since Robert Taft.

Posted by: Adam M | Dec 3, 2004 1:54:26 PM

The comments to this entry are closed.