Please use the comments here to ask questions about our methods.
I have a question about the methods used to analyze the variance of the differences in presidential approval between Independent and Other voters.
My question is based on the DailyKos report, not on anything I have read here, btw. Apologies if I have missed a more detailed description that addresses this issue here.
If:
VI is the sample variance of the Independent margin,
VO is the sample variance of the Other margin,
VD is the variance of the difference between the Independent and Other margins, and
VED is the expected variance of the difference between the Independent and Other margins,
then I understand the report to say that
VED = VI + VO
and that, because VED >> VD, there is an unexplained departure from random sampling.
This argument requires I and O to be independent datasets, and I do not understand how that assumption is justified. External events shape the approval ratings of both Independent and Other voters, so it seems quite plausible to me that the I and O margins could be strongly correlated over time (i.e., both groups could react in the same way to a speech or action by the president).
If such a positive correlation existed between the two groups of voters, VD would be lower than it would be with no correlation, because the variance of a difference is VI + VO minus twice the covariance of I and O. Since a low VD is the actual result, and it is used to suggest problems with the data, I think it is important to clarify this issue.
To be clear: I like the idea of statistically checking up on pollsters, and I liked (fwiw) the DK post overall. I am asking in the spirit of making sure that an obvious error wasn't missed that might change people's opinion of the whole work if left uncorrected. And I think that, most likely, I missed some subtlety in the analysis that was glossed over in the DK post in the interests of readability (or because I can't read 🙂 ).
Hi, glad to get a more detailed version of the question from the Kos site here. I think the key thing you're missing is that we have a piece of information quite apart from the empirical variances and covariances: the minimal variances implied by statistical sampling theory. (See the Wiki links in the original post.) Those sampling errors ARE independent and they DO simply add. And they add up to something much bigger than the reported variance, which should include both them and the other correlated stuff you're discussing.
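The claim that sampling errors for disjoint groups are independent and simply add can be sketched as below. Equation (1) itself is not reproduced in this thread, so the margin formula here is an assumption: the textbook multinomial sampling variance of Fav - Unf for a simple random sample of size n, [Fav + Unf - (Fav - Unf)^2] / n, with Fav and Unf as proportions (multiply by 10,000 for squared percentage points). Function names and the example numbers are hypothetical.

```python
def margin_sampling_variance(fav, unf, n):
    """Sampling variance of the (fav - unf) margin for a simple random sample of size n.

    Assumed multinomial formula: Var = [fav + unf - (fav - unf)**2] / n,
    with fav and unf given as proportions.
    """
    if fav < 0 or unf < 0 or fav + unf > 1:
        raise ValueError("fav and unf must be proportions summing to at most 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return (fav + unf - (fav - unf) ** 2) / n


def expected_difference_variance(i_week, o_week):
    """Sampling variance of (I margin - O margin) for one week.

    Independents and Others are disjoint respondent pools, so their sampling
    errors are independent: the covariance term is zero and the variances add.
    """
    return margin_sampling_variance(*i_week) + margin_sampling_variance(*o_week)


# Hypothetical week: (fav, unf, sample size) for each group.
i_week = (0.45, 0.50, 400)
o_week = (0.40, 0.52, 150)
print(expected_difference_variance(i_week, o_week))
```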
This is brief for now- so please return if it doesn’t suffice.
Thanks again for taking the time to read and reply to my questions. In the DK article, equation (1) describes how the expected variance was calculated. I think Fav and Unfav in the equation are the mean favorability/unfavorability over all 60 weeks, expressed as proportions. (btw: how did you calculate the variance of the difference data itself?)
Did you calculate the expected variance for the difference in margins by applying this equation to the I and O data, and then summing the two answers? If so, I think that my point still holds: covariance between the two datasets would lead to reduced variance in the difference data. If not, then I think I’ve been led astray by the original post (not meant as a criticism; I understand full well the constraints of technical writing for a non-technical audience).
I’d be happy to save you some bother and read through a technical report of the analysis if one is available – whether now or soon.
We will be posting the data which we used for our report, so anyone should be able to double check.
Eq. 1 describes the expected STATISTICAL var only. There is, of course, no formula for the non-statistical var. And, just to be careful: we did not average Fav and Unf and then calculate the expected var. Rather, for each week we calculated the expected var from that week's data, then averaged those. (When Fav and Unf swing a lot over time, the former procedure overestimates the expected var, and we wanted to bend over backward not to do that.)
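The week-by-week procedure can be compared with the averaged-inputs one using the same assumed multinomial margin formula (not necessarily the exact form of Eq. 1). With a fixed sample size n, plugging in averaged Fav and Unf comes out larger by exactly the population variance of the weekly margins divided by n, which is why it would overestimate. The names, weeks, and n below are hypothetical, and real weekly sample sizes vary, which this sketch ignores.

```python
from statistics import mean, pvariance


def margin_sampling_variance(fav, unf, n):
    """Assumed multinomial sampling variance of the fav - unf margin."""
    return (fav + unf - (fav - unf) ** 2) / n


def weekly_then_average(weeks, n):
    """Expected var computed from each week's data, then averaged over weeks."""
    return mean(margin_sampling_variance(f, u, n) for f, u in weeks)


def average_then_compute(weeks, n):
    """Fav and Unf averaged over all weeks first, then one expected var."""
    fav = mean(f for f, _ in weeks)
    unf = mean(u for _, u in weeks)
    return margin_sampling_variance(fav, unf, n)


# Hypothetical weeks with a large swing in (Fav, Unf); fixed n per week.
weeks = [(0.55, 0.40), (0.45, 0.50), (0.35, 0.60)]
n = 500

gap = average_then_compute(weeks, n) - weekly_then_average(weeks, n)
margins = [f - u for f, u in weeks]
print(gap, pvariance(margins) / n)  # the two agree up to floating-point rounding
```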
I still have no idea why you could possibly think that, for the statistical sampling error (which is all we can calculate the expected var of), you would have anything but a covariance of 0 between I and O, which are disjoint populations.
And of course one of the main reasons to look at the difference data was precisely to reduce the NON-statistical contribution to var, as we explicitly state.
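A toy simulation can illustrate both points at once: a shared week-to-week swing (the correlated, non-statistical part) cancels in the I - O difference, while the independent sampling errors of the two disjoint groups do not. The model is an assumption for illustration only: both groups' true margins shift by the same Gaussian shock each week, each weekly poll is a simple random sample, and the sizes (400 Independents, 150 Others) and baseline shares are made up. Under those assumptions VD lands near the sampling-only VED while VI + VO is inflated by the shock, so the covariance pulls VD down toward VED, not systematically below it.

```python
import random
from statistics import mean, variance


def margin_sampling_variance(fav, unf, n):
    """Assumed multinomial sampling variance of the fav - unf margin."""
    return (fav + unf - (fav - unf) ** 2) / n


def sample_margin(rng, fav, unf, n):
    """Simulate one poll of n respondents; return the fav - unf margin as a proportion."""
    # Each respondent contributes +1 (fav), -1 (unf) or 0 (neither).
    draws = rng.choices([1, -1, 0], weights=[fav, unf, 1 - fav - unf], k=n)
    return sum(draws) / n


def simulate(weeks=60, n_i=400, n_o=150, shock_sd=0.05, seed=1):
    """Hypothetical model: a weekly shock moves both groups' true margins equally."""
    rng = random.Random(seed)
    i_margins, o_margins, expected = [], [], []
    for _ in range(weeks):
        shock = rng.gauss(0, shock_sd)  # shared, non-sampling swing
        fi, ui = 0.45 + shock / 2, 0.50 - shock / 2
        fo, uo = 0.40 + shock / 2, 0.52 - shock / 2
        i_margins.append(sample_margin(rng, fi, ui, n_i))
        o_margins.append(sample_margin(rng, fo, uo, n_o))
        # Disjoint respondent pools: sampling variances add with zero covariance.
        expected.append(
            margin_sampling_variance(fi, ui, n_i) + margin_sampling_variance(fo, uo, n_o)
        )
    return i_margins, o_margins, mean(expected)


i_m, o_m, ved = simulate()
diff = [a - b for a, b in zip(i_m, o_m)]
print(f"VI + VO = {variance(i_m) + variance(o_m):.5f}")
print(f"VD      = {variance(diff):.5f}")
print(f"VED     = {ved:.5f}  (sampling-only expectation)")
```

Raising `shock_sd` in this model inflates VI + VO but leaves VD close to VED, because the shared shock drops out of the difference.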
For some reason that explanation worked. My previous concern was based on a mistaken understanding of how the expected variance was calculated. In case anyone else is reading this: I was wrong, and the original analysis seems perfectly logical to me.
Sorry for the time-wasting and thanks for the explanation.
Blog at WordPress.com.