The first was simply a difference in terminology from my own experience. For me, p-values represent significance levels and are the reverse of confidence levels. So, a significance level of p < .05 is exactly the same thing as a confidence level of 95%.
For Millman, the p-value is .95 when the confidence level is 95%. For me, the 95% confidence level is 1 - .05, with .05 as the sharp-line p-value. Both make sense; they are just different ways to say the same thing. Scientific writing will always couch results in the p-value method, usually showing the exact value (e.g., p = 0.002) and declaring whether or not statistical significance is achieved.
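A minimal sketch of the two vocabularies in Python, using the p = 0.002 example above (the variable names are mine):

```python
p_value = 0.002               # exact p-value, as a paper would report it
alpha = 0.05                  # sharp-line significance level
confidence_level = 1 - alpha  # 0.95, i.e., the "95% confidence level"

# Both vocabularies describe the same decision rule:
if p_value < alpha:
    print(f"Significant at the {confidence_level:.0%} confidence level (p = {p_value})")
```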
Another way to think about the logic of statistical testing: Using two percentages from different groups of people (e.g., males, females, brand awareness), you first assume that the percentages are not different (the null hypothesis). If testing tells you the odds are less than x% that they really are the same (not different), you can reject the idea that they are the same. Then you can declare them statistically significantly different at the (1-x)% confidence level.
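As a sketch of that logic, here is a pooled two-proportion Z test in Python; the group sizes and awareness figures are invented for illustration:

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical brand awareness: 180 of 400 males vs. 140 of 400 females
n1, x1 = 400, 180
n2, x2 = 400, 140

p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)  # the null hypothesis: one common percentage
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * norm.sf(abs(z))   # two-tailed

print(f"z = {z:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null: different at the 95% confidence level")
```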
What’s actually going on is determining whether two means or percentages come from the same population distribution. Assume that they are. If they are, then the test statistic, e.g., a Z-value or Student’s t-value, will be a number that could easily come from that common distribution – the equivalent of between 20 and 40 on the accompanying graph. If they are not, it will be bigger – further out on the scale, say between 50 and 80 – much less likely to have come from that common distribution. We are playing the odds.
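To put rough numbers on "playing the odds": a standard normal null distribution keeps about 95% of its mass inside ±1.96, so a statistic beyond that cutoff is the "further out on the scale" case. A small sketch (the example z-values are mine, not taken from the article's graph):

```python
from scipy.stats import norm

critical = norm.ppf(0.975)   # ~1.96, the two-tailed cutoff at alpha = .05
for z in (0.8, 2.9):         # one "easy" statistic, one "far out" statistic
    tail = 2 * norm.sf(z)    # chance of a value at least this extreme under the null
    verdict = "plausible under the null" if z < critical else "unlikely under the null"
    print(f"z = {z}: two-tailed p = {tail:.4f} -> {verdict}")
```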
An excellent chemistry teacher once used little tricks to engage the class’s interest. He would state various (sometimes boring) facts and then yell SO WHAT?!
That woke us up! Then he’d explain what those facts meant, relative to whatever topic he was teaching. As pointed out in Millman’s article, base sizes act as magnifying glasses, making statistical significance easier to achieve. In marketing research, we have conventions for reasonable base sizes in the hundreds or thousands. But I once saw an experienced professional get very excited because a 0.1% difference between two percentages was statistically significantly different at the 95% confidence level. Both base sizes were in the tens of thousands, provided by a syndicated service.
As in the case of the 0.1% difference, sometimes statistical significance and substantive or meaningful significance are two very different things! When looking at insights, it’s not a bad idea to ask, so what?
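A quick illustration of that magnifying-glass effect, with invented numbers: at base sizes in the tens of thousands, even a 0.1-point gap between two small percentages clears the 95% bar:

```python
from math import sqrt
from scipy.stats import norm

def two_prop_z(p1, p2, n1, n2):
    """Pooled two-proportion Z test; returns (z, two-tailed p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * norm.sf(abs(z))

# Hypothetical syndicated-service figures: 0.3% vs. 0.2%, 30,000 per group
z, p = two_prop_z(0.003, 0.002, 30_000, 30_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # significant at 95%, yet only a 0.1-point gap
```

Statistically "real," but whether a 0.1-point gap matters to the business is exactly the "so what?" question.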
Using the wrong test also struck a nerve. I wrote a short piece (“Order up wrong,” Quirk’s, February 2006) that noted how the wrong test can lead to different conclusions than the correct one.
One pet peeve is the reliance on survey programs that test all possible pairs in situations where there are more than two groups (usually, subgroups). The chi-square is designed to take all the information into account and test percentages across three or more groups; the Z test between proportions, which is what the survey programs often use, is not.
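A sketch of the right tool for three or more groups, using scipy's chi-square test of independence on an invented 3x2 table of awareness counts:

```python
from scipy.stats import chi2_contingency

# Hypothetical aware / not-aware counts for three subgroups
table = [
    [180, 220],  # subgroup A
    [150, 250],  # subgroup B
    [140, 260],  # subgroup C
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")
# One overall test across all three groups, instead of three pairwise Z tests
```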
Another common error, based on the convenience provided by data tabulation programs, is to test every row of a scale. For example, if a five-point purchase intent scale is used, the statistically correct procedure is to pick one (count it: ONE!) appropriate summary statistic (e.g., means, top-box percentages, top-two box) and use that to test whether groups differ.
Testing every row of the scale between two groups violates a host of assumptions behind the testing, since no one row is independent of the others.
The top-two box percentages dictate the percentage left that can be in the rest of the scale. Nevertheless, the survey data tables cheerfully proceed to test every row, generating a lot of meaningless information.
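A sketch of the one-statistic approach on an invented five-point purchase intent question: collapse each group to its top-two-box count and run a single test, instead of five row-by-row tests:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts on a 5-point purchase intent scale, points 5 (high) to 1 (low)
group_a = [60, 90, 100, 80, 70]
group_b = [40, 70, 110, 90, 90]

def top_two_box(counts):
    """Collapse a 5-point scale into (top-two count, rest-of-scale count)."""
    return [counts[0] + counts[1], sum(counts[2:])]

# ONE summary statistic, ONE test
chi2, p, dof, _ = chi2_contingency([top_two_box(group_a), top_two_box(group_b)])
print(f"top-two box: chi-square = {chi2:.2f}, p = {p:.4f}")
```

On a 2x2 table like this one, the chi-square test is equivalent to the two-proportion Z test (without the continuity correction, the chi-square statistic is simply Z squared).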
Hopefully, insights based on erroneous statistical testing are a minor finding in otherwise sound marketing research!
The two words with opposite meanings are okay; it’s the long number scale in between that bothers me. What the heck does a 7.3 mean on a 10-point scale, or what does a 3.8 mean?
A better practice is short scales – true-false, yes-no, excellent-good, fair-poor and so forth – where each answer on the scale means something. It is much easier to explain true-false or yes-no answers to high-level executives than to explain 7.2 on an 11-point scale. In general, the higher the executive’s level, the shorter and simpler the research results must be; and that is where the simple, short answer scales are at their very best. The older I get, the shorter my answer choices become.
Our research founders did not stop with the aforementioned sins but I do not wish to punish their collective reputations any further – since I’m one of them. They were a well-intentioned, studious lot and the useful tools they handed down to the current generation surely counterbalance some of their sins.