Saturday, November 14, 2015

Error bars and statistical significance: can we infer the statistical significance of the difference by looking at the error bars of two bar graphs?

Look at the following graph showing the averages and the corresponding error bars. Can we conclude whether the difference between the means of the two groups is statistically significant?


Well, the short answer is yes and no. A longer answer: if the error bars represent 95% confidence intervals and the values in the two groups are normally distributed, then the difference is statistically significant at the 5% level by an unpaired t-test.

Now the long answer...

If we are really picky about the terms, statistical significance depends on the significance level we choose -- so the answer, of course, depends on that level.

Let's choose a 5% significance level, which is the usual default when no level is specified.

The answer? Still not definitive -- because error bars can represent different things. Some people simply plot the standard deviation of the sample, others use them to represent standard errors of the mean, and still others put confidence intervals there. Depending on what the error bars represent, we may or may not be able to infer the statistical significance of the difference between the means.

Some words on standard deviation, standard error of the mean, and the confidence interval. Standard deviation simply measures the variability of a data set, without considering the sample size. When the sample size is large enough, it is a good estimate of the population standard deviation. However, when we conduct a statistical test, we are usually investigating whether the means are different in the two groups. In that sense, the standard errors of the means are more informative. 

The standard error of the mean is smaller than the standard deviation $\sigma$ whenever $n > 1$ -- indeed, SEM = $\sigma / \sqrt{n}$. Now, as shown below, the standard error bars do not overlap; is the difference between the group means statistically significant?
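As a quick sketch of the SEM formula (the sample here is simulated, not from any real data):

```python
import numpy as np

# Simulated sample (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
sample = rng.normal(loc=5, scale=1, size=19)

sd = sample.std(ddof=1)           # sample standard deviation
sem = sd / np.sqrt(len(sample))   # standard error of the mean, SD / sqrt(n)
```

The SEM shrinks with the sample size even when the underlying variability stays the same.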

It is hard to tell. However, when the error bars represent 95% confidence intervals, we can conclude that the difference is statistically significant at the 5% level if the bars don't overlap. BUT the converse is not true---two means whose 95% confidence intervals overlap may still be significantly different at the 5% level.



What we can tell is whether the sample mean is significantly different from a specific value, by checking whether that value lies within the confidence interval. For example, the sample mean is significantly different from 0 at the 5% level if and only if 0 is not contained in the 95% confidence interval.

The confidence interval is wider than the standard error bar -- its half-width is $t_{(1+\alpha)/2}$ times the SEM, where $t_{(1+\alpha)/2}$ is the $\frac{1+\alpha}{2}$ quantile of the t-distribution with $n-1$ degrees of freedom and $\alpha$ is the confidence level. When the sample size is large, this value is approximately 1.96, the corresponding quantile of the normal distribution.
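A small sketch of that quantile calculation (the SEM value below is hypothetical; $\alpha$ is the confidence level, as in the text):

```python
import numpy as np
from scipy import stats

n = 19
alpha = 0.95  # confidence level, matching the alpha in the text

t_crit = stats.t.ppf((1 + alpha) / 2, df=n - 1)  # t quantile, about 2.10 for n = 19
z_crit = stats.norm.ppf((1 + alpha) / 2)         # normal quantile, about 1.96

sem = 0.23                  # hypothetical SEM, for illustration
half_width = t_crit * sem   # half-width of the 95% confidence interval
```

For small samples the t quantile is noticeably larger than 1.96, so the confidence interval is more than twice the SEM bar.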

Therefore, when the error bars represent confidence intervals, we can easily conclude that the difference is statistically significant if the bars do not overlap. If the error bars represent standard errors of the means, the inference runs the other way: when the SE bars overlap, we can conclude that the difference is not statistically significant at the 5% level, since the confidence interval is roughly twice as wide as the SE bar.

P.S.: The meaning of a 95% confidence interval? Some people understand it as: the population mean is in the confidence interval with probability 0.95. Well, that is a wrong interpretation. The population mean is a fixed value; it is either in that interval or not, and there is no randomness involved. The correct interpretation? Imagine drawing samples of the same size many times, computing a 95% confidence interval each time. Then 95% of those confidence intervals (one for each sample) will cover the population mean -- in other words, the population mean falls into 95% of those confidence intervals.
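The repeated-sampling interpretation can be checked with a short simulation (parameters here are made up for illustration): draw many samples, build a 95% CI from each, and count how often the interval covers the true mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
mu, sigma, n, reps = 5.0, 1.0, 19, 2000   # hypothetical population and sample size
t_crit = stats.t.ppf(0.975, df=n - 1)

covered = 0
for _ in range(reps):
    x = rng.normal(mu, sigma, n)
    sem = x.std(ddof=1) / np.sqrt(n)
    # Does this sample's 95% CI cover the true mean?
    covered += (x.mean() - t_crit * sem) <= mu <= (x.mean() + t_crit * sem)

coverage = covered / reps  # should be close to 0.95
```

The coverage fraction hovers around 0.95 -- it is the procedure, not any single interval, that has the 95% property.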

P.P.S.: Sample s1 consists of 19 observations drawn from the normal distribution $N(5, 1^2)$, and sample s2 consists of 100 observations drawn from the normal distribution $N(6, 2^2)$.
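A sketch of that two-sample setup (the seed is arbitrary, so the exact p-value varies with the draw; Welch's test is used since the two variances differ):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed; results vary with the draw
s1 = rng.normal(5, 1, 19)    # N(5, 1^2), n = 19
s2 = rng.normal(6, 2, 100)   # N(6, 2^2), n = 100

# Welch's unpaired t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(s1, s2, equal_var=False)
```

With a true mean difference of 1, the test will typically (though not always) reject at the 5% level for these sample sizes.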

Tuesday, November 10, 2015

Mediator vs. Moderator variables


I have a co-author who is a psychologist. While we were collaborating on a paper, I found myself unclear about some of the terms, not just in psychology but also in statistical analysis. One example is the pair of concepts mediator (mediating variable) and moderator (moderating variable).

As an economist, I found these two terms foreign. But after googling them, I realized they can be easily explained in terms of econometrics.

Psychologists use the concept of a mediating variable to refer to an underlying psychological factor ($x_m$) that explains why some external factor ($x_e$) leads to a change in a dependent variable ($y$).
Consider a regression of the dependent variable $y$ on the independent variable $x_e$. If adding $x_m$ to the regression significantly changes the direct effect of $x_e$, then $x_m$ can be considered a mediating variable.
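A minimal simulation of that regression comparison (the data-generating process is made up: $x_m$ is driven by $x_e$, and $y$ depends on $x_e$ only through $x_m$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x_e = rng.normal(size=n)                    # external factor
x_m = 0.8 * x_e + 0.5 * rng.normal(size=n)  # mediator, driven by x_e
y = 1.0 * x_m + 0.5 * rng.normal(size=n)    # y depends on x_e only through x_m

def ols(y, *regressors):
    """OLS coefficients with an intercept, via least squares."""
    X = np.column_stack((np.ones(len(y)),) + regressors)
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_total = ols(y, x_e)[1]        # effect of x_e without the mediator
b_direct = ols(y, x_e, x_m)[1]  # direct effect of x_e once x_m is added
```

The coefficient on $x_e$ shrinks toward zero once the mediator enters the regression -- exactly the pattern that suggests mediation.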





The concept of a moderator variable, on the other hand, is closely related to interaction terms in econometrics. When we find that the effect of $x_1$ on $y$ depends on the value of $x_2$, we call $x_2$ a moderator variable, and say it has a moderating effect on the relationship between $x_1$ and $y$. Moderators are usually categorical variables. For example, suppose we find that altruism $x_1$ influences bidding behavior, but the effect is more pronounced for females. Then gender $x_2$ is a moderating variable that influences the relationship between altruism $x_1$ and bids in an all-pay auction $y$.

In a regression model, the relationship between $y$, $x_1$, and $x_2$ can be expressed as:
    $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 (x_1 \times x_2) + \epsilon$

The moderating effect of $x_2$ is measured by the interaction coefficient $b_3$ (the main effect $b_2 x_2$ is included so the interaction term is not forced to absorb it).
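A sketch of that interaction regression on simulated data (the altruism/gender framing and all coefficients are hypothetical; the main effect of $x_2$ is included, as is standard for moderation models):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
x1 = rng.normal(size=n)          # e.g. an altruism score (hypothetical)
x2 = rng.integers(0, 2, size=n)  # moderator dummy, e.g. 1 = female
# True model: the effect of x1 on y is -0.3 when x2 = 0 and -0.7 when x2 = 1
y = 1.0 - 0.3 * x1 - 0.4 * x1 * x2 + 0.5 * rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]
# b3 estimates the moderating effect, close to the true -0.4
```

The interaction coefficient recovers how much the slope of $x_1$ changes when the moderator switches on.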

P.S.: Bids in all-pay auctions provide a measure of competitiveness. The above example essentially says that altruistic people are less likely to be competitive, but the effect is stronger for females. In other words, men seem to care less about their opponents in a competitive environment. How do we measure people's altruism? One option is a dictator game in the lab.


Monday, November 2, 2015

Bonferroni correction

In statistics, the Bonferroni correction is a method used to counteract the problem of multiple comparisons.

Statistical inference logic is based on rejecting the null hypothesis if the likelihood of the observed data under the null hypothesis is low. The problem of multiplicity arises from the fact that as we increase the number of hypotheses being tested, we also increase the likelihood of a rare event, and therefore the likelihood of incorrectly rejecting a null hypothesis (i.e., making a Type I error).
The Bonferroni correction is based on the idea that if an experimenter is testing m hypotheses, then one way of maintaining the familywise error rate (FWER) is to test each individual hypothesis at a statistical significance level of 1/m times what it would be if only one hypothesis were tested.
So, if the desired significance level for the whole family of tests should be (at most) $\alpha$, then the Bonferroni correction would test each individual hypothesis at a significance level of $\alpha/m$. For example, if a trial is testing eight hypotheses with a desired $\alpha = 0.05$, then the Bonferroni correction would test each individual hypothesis at $\alpha = 0.05/8 = 0.00625$.
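The correction itself is one line of arithmetic; here is a sketch with the eight-hypothesis example from the text (the p-values are hypothetical):

```python
import numpy as np

alpha = 0.05
m = 8  # eight hypotheses, as in the example above
p_values = np.array([0.001, 0.008, 0.012, 0.041, 0.049, 0.20, 0.55, 0.60])  # hypothetical

threshold = alpha / m            # 0.05 / 8 = 0.00625
reject = p_values < threshold    # only p = 0.001 survives the correction
```

Note that several p-values below 0.05 no longer count as significant once the family-wise threshold is applied -- that conservatism is the price of controlling the FWER.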
https://en.wikipedia.org/wiki/Bonferroni_correction