Pollsters keep track of the gender of the people they call. This means that any poll can be broken down into two polls with entirely separate sets of respondents. The private polling firm Research2000 reported the gender breakdowns for all 795 [Note: This number was wrong earlier] of their weekly tracking poll questions done for dailykos.

These two separate smaller polls may look similar if men and women have similar opinions, but there is no overlap in respondents, so the precise % numbers are completely unrelated.

This means that if an odd-number percentage of men are “FAV”, it does not have any bearing on whether an odd or even % of women say “FAV”.

That said, we can look at all the questions that Research2000 asked, and count how often the men’s results and the women’s matched in being even or odd. For each question, there are four possible combinations: men even and women even, men even and women odd, men odd and women even, or men odd and women odd.

Because the results come independently, each of these combinations is equally likely. So for the 795 weekly questions, we should have roughly 199 of each. Here are the actual tallies:

(Among “FAV” responses in questions with 3 possible responses asked in state polls only.)
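The counting step can be sketched in a few lines of Python. This is only an illustration: the `sample` pairs are made-up numbers, not Research2000 data, and the helper names `parity` and `tally_parity` are hypothetical.

```python
from collections import Counter

def parity(n: int) -> str:
    """Return 'even' or 'odd' for a whole-number percentage."""
    return "even" if n % 2 == 0 else "odd"

def tally_parity(pairs):
    """Count (men parity, women parity) combinations over (men %, women %) pairs."""
    return Counter((parity(m), parity(w)) for m, w in pairs)

# Hypothetical (men %, women %) results, NOT Research2000 data.
sample = [(43, 59), (52, 36), (47, 54), (60, 41)]
counts = tally_parity(sample)

# With independent results, each of the four combinations is expected about
# n / 4 times; for the 795 weekly questions that is 795 / 4 = 198.75.
expected_each = 795 / 4

print(counts)
print(expected_each)
```

Run against the real list of men’s and women’s percentages, the same tally would produce the four counts being compared with the expected value of roughly 199 each.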

How likely is this to happen by chance?

Well, this is roughly the equivalent of flipping two different coins 795 times, and only getting different results twice. It should happen by chance approximately:

#### One time in

10000000000000000000000000000000000000000

00000000000000000000000000000000000000000

00000000000000000000000000000000000000000

00000000000000000000000000000000000000000

00000000000000000000000000000000000000000

00000000000000000000000000000
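For readers who want to check the order of magnitude, here is a minimal sketch of the coin-flip calculation. It assumes that each question independently has a 1/2 chance of a parity mismatch, and it reads “only getting different results twice” as “at most 2 mismatches in 795 questions”. Both are assumptions of this sketch, and the function name `one_in` is hypothetical.

```python
from math import comb, log10

N_QUESTIONS = 795    # weekly questions, from the post
MAX_MISMATCHES = 2   # parity mismatches observed, from the post

def one_in(n: int, k: int) -> int:
    """Return x such that P(at most k mismatches in n fair trials) is about 1/x."""
    # Number of match/mismatch sequences with at most k mismatches.
    favourable = sum(comb(n, i) for i in range(k + 1))
    # All 2**n sequences are equally likely under the assumption above.
    return 2**n // favourable

odds = one_in(N_QUESTIONS, MAX_MISMATCHES)
print(f"about one time in 10^{log10(odds):.1f}")
```

Under these assumptions the answer comes out as a number with 234 digits, the same astronomical scale as the figure above.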


I’m sure that I have missed something obvious. But how is it that out of 778 weekly questions there are 404 + 389 + 1 + 1 = 795 responses graphed?

Hmm, looking back at the original data, it appears there are 795 polls. Don’t know where the 778 came from.

Aha, the 778 was from before we fixed the 17 answers which were mis-formatted in the original posts. All 17 turned out to be parity matches once the column tabs were fixed.

Thanks for catching that. It shouldn’t change the bottom line, but it’s pretty sloppy on our part.

One theoretical explanation for the even/odd male/female thing could be in the dialing mechanism. It could recognize when it has hit enough of one sex/party affiliation and then instruct the people making the calls not to waste their time if more people of this sex/party affiliation answer the phone.

This mechanism would have no way to decide exactly how many people in the first group end up over the “finish line”. Several respondents will already be on the phone at the moment someone gives the quota-filling male/female answer.

Suppose the system aims for a sample with a female/male ratio of 52%/48% and it aims for 1000 respondents. During dialing it learns that female number 520 is on the phone. Now this respondent, as well as the other respondents currently on the phone, get to finish their questionnaires, but the script gets changed, and from now on any time someone responds “I am female” the answer is “sorry for wasting your time but I am looking for men, bye”.

Now, because there were still some respondents in the pipeline, the mechanism has to calculate how many males (and/or Republicans, left-handed people, whatever) it has to call to get a representative sample: 521 females means 481 males, 522 females means 482 males… Any male responses above this number could be discarded in the interest of weighting the sample without calling more people than necessary.
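The arithmetic in this hypothetical quota scheme can be sketched as follows. This is only an illustration of the commenter’s example, not a description of any real dialing software; the function name `males_needed` and the default 52% target are assumptions taken from the example above.

```python
def males_needed(females: int, female_pct: int = 52) -> int:
    """Males required so that females make up female_pct percent of the sample, rounded."""
    return round(females * (100 - female_pct) / female_pct)

for females in (520, 521, 522):
    print(females, males_needed(females))
```

For 520, 521 and 522 females this gives 480, 481 and 482 males, matching the figures in the comment.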

Now if there is a reason to multiply some results by N(male) or N(female), then the result would take the parity of the respective sample size. I suspect the same would apply to the rounded result of a division * 100… but my head already hurts, so someone else will have to pick up here.

I suspect this also fits with FleetAdmiralJ’s observation.

I am working on a post on the floating point casting and/or other rounding error theory for the weird distribution of the variation. (Steve04@kos also smelled this.) I noticed that the “gaps”, where you would expect many occurrences of a particular variation but there are far fewer than with Gallup, are either 1 wide (at 0 and -2), 2 wide (2-3 and -4, -5) or 4 wide (5-8).

1, 2, 4… it all really smells like something going on in the base-2 world. It smells like precisely the kind of rounding error you get when abstract thinkers and mathematics geniuses (and pollsters?) get involved in programming without a feel for all the housekeeping involved in data types, casting, parsing and printing. It sounds like what happens if you zero one or two least significant bits. Incidentally, the difference between two numbers that are always both even or both odd could also always be a 2^N, which would be an interesting way of tying both anomalies together.
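To make “zeroing one or two least significant bits” concrete, here is a small sketch of what that operation does to integers. It is purely illustrative: nothing here is known about how the Research2000 numbers were actually processed, and the helper `zero_low_bits` is a hypothetical name.

```python
def zero_low_bits(n: int, bits: int) -> int:
    """Clear the `bits` least significant bits of a non-negative integer."""
    return n & ~((1 << bits) - 1)

values = list(range(8))
print([zero_low_bits(v, 1) for v in values])  # only multiples of 2 survive
print([zero_low_bits(v, 2) for v in values])  # only multiples of 4 survive
```

Clearing one bit collapses 0..7 onto 0, 0, 2, 2, 4, 4, 6, 6, and clearing two bits collapses it onto 0, 0, 0, 0, 4, 4, 4, 4, so some values can no longer occur at all.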

http://en.wikipedia.org/wiki/Floating_point

http://en.wikipedia.org/wiki/Decimal_floating_point

http://en.wikipedia.org/wiki/Fixed-point_arithmetic

If you think I oversell the effects of inexperienced programmers and rounding, then I offer this Google search: site:http://catless.ncl.ac.uk/risks rounding error

The only way this effect gets worse is if (computer) science people feel that a low-level language with a lot of manual computer handholding, like C/C++, is just the same as a high-level language with a lot of details hidden behind abstraction, like Python or Java, or a piece of software like Mathematica.

It appears there are very strong trends in the least significant digits of Research2000 polls. Markos sees a conspiracy, but as far as I can tell the statistical case rests predominantly on the least significant digits. The cents to a dollar.

One very interesting parallel comes to mind.

I remember when the right wing blogosphere was exploding. They were going over dem campaign contributions and they noticed a very interesting trend in the least significant digits of the amounts donated. Only a few donations ended in .00 cents. There were something-and-11-cent donations, something-and-3-cent donations… Clearly these were figures in a foreign currency and they had been converted to US dollars!!! The money was coming from abroad! (Remember when that used to be illegal? I do; I didn’t buy an official campaign shirt for that very reason. I guess in 2012 I will be donating to Europeans-for-Obama-pac.)

Calmly the left wing blogosphere explained: We want to keep track of which site raises the most money. To keep the donations apart, the people on dailykos give an extra $.01, the people on firedoglake give an extra $.02, etcetera. That way we can tell them apart.

Here too there appear to be explanations that attribute to programmer stupidity (or polling cleverness) what is currently being attributed to malice.

Still working on fitting in the rounding error theory, and it is looking more promising by the hour. Sadly it’s too hot to think and I didn’t get a lot of sleep last night… Some amazing Risks Digest quotes on rounding errors and overflows (rounding error’s big brother):

Man Gets $218 Trillion Phone Bill

Rounding error changes [german] Parliament makeup

Microsoft and Lotus Development have admitted that their spreadsheet products may produce inaccurate results because of an inherent problem with the design of all computers. Mistakes can occur in precision calculations, of the kind required by engineers and users in the scientific, banking and finance sectors.

Intel FDIV bug

And then there were the -16,000 Gore votes. Trust me, this all makes perfect sense to a programmer. Don’t demand the raw data; demand the source code to the Research2000 software.

Sure, there’s an obvious algorithm to generate the even-odd anomaly. We had it in the first drafts of the blog, then decided “Hypotheses non fingo.” Of course it starts from an integer top-line T, which is not what actual polling generally produces. Set T=(M+F)/2, then M+F=2T, voila.

So somebody doing weird but minor “adjustments” to the results could get this anomaly. And that’s exactly what we said in the paper. Then we moved on to the other anomalies.
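The T=(M+F)/2 observation is easy to check. The sketch below assumes an integer top-line T and an integer men’s figure M, then solves for the women’s figure F. The function name and the example numbers are hypothetical, and this is not a claim about how any actual poll was produced.

```python
def women_from_topline(topline: int, men: int) -> int:
    """Solve T = (M + F) / 2 for F, given an integer top-line T and men's figure M."""
    return 2 * topline - men

# Hypothetical top-line and men's percentages, for illustration only.
for topline, men in [(50, 47), (50, 48), (37, 41)]:
    women = women_from_topline(topline, men)
    same_parity = (men % 2) == (women % 2)
    print(topline, men, women, same_parity)
```

Since M + F = 2T is always even, M and F are either both even or both odd, so `same_parity` is `True` for every pair of integers you try.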

You’re right, the topline will, from what I have seen, rarely be an integer in the final report. But presumably the raw results of polling are in integers? (Ignoring the occasional 3/5’s of a man 😉 )

VS

Please be careful saying such things where R2k’s lawyers can read it.