What is an intuitive explanation for how the t-distribution, normal distribution, F-distribution and...
What is an intuitive explanation for how the t-distribution, normal distribution, F-distribution, and Chi-square distribution relate to each other?
Could anyone explain this clearly with a sensible example?
I am a biologist and have been trying to understand this for nearly 10 years now. Every time, I use the statistical tests without a proper understanding of their basis. Textbooks do not address this question either; moreover, we do not specialize in math or statistics at university.
probability normal-distribution chi-squared
asked Nov 24 at 14:04
Kynda
2 Answers
It is not totally clear to me precisely what you are looking for, but suppose $X_1,X_2,\ldots,X_n$ are i.i.d. normally distributed random variables with mean $\mu$ and variance $\sigma^2$. Then:
- writing their average as $\bar{X}=\frac1n\sum\limits_{i=1}^{n} X_i$, the quantity $\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ has a standard normal distribution $N(0,1)$, which describes the distribution of the sample mean;
- $\sum\limits_{i=1}^{n} \left(\frac{X_i-\mu}{\sigma}\right)^2$ has a $\chi_n^2$-distribution, i.e. a chi-squared distribution with $n$ degrees of freedom, as the sum of the squares of $n$ independent standard normal random variables;
- estimating the variance by the unbiased sample variance $S^2=\frac1{n-1}\sum\limits_{i=1}^{n} \left(X_i-\bar{X}\right)^2$, you have $(n-1)\frac{S^2}{\sigma^2}$ having a $\chi_{n-1}^2$-distribution, i.e. a chi-squared distribution with $n-1$ degrees of freedom, with one degree of freedom lost because $\bar{X}$ is itself computed from the individual $X_i$;
- looking again at the distribution of the sample mean, you have $\dfrac{\bar{X}-\mu}{S/\sqrt{n}}$ having a Student $t$-distribution with $n-1$ degrees of freedom. This is not quite the same as the standard normal distribution in the first bullet point, but it is close for large $n$; you can use it to test the hypothesis that the population mean is actually $\mu$ without knowing $\sigma^2$;
- as a tool for comparing variances, if $Z_1 \sim \chi^2_{d_1}$ and independently $Z_2 \sim \chi^2_{d_2}$, i.e. they have chi-squared distributions with $d_1$ and $d_2$ degrees of freedom, then $\frac{Z_1 / d_1}{Z_2 / d_2} \sim \mathrm{F}(d_1, d_2)$, i.e. the ratio has an $F$-distribution with parameters $d_1$ and $d_2$;
- and in particular, if $Y_1,Y_2,\ldots,Y_m$ are also i.i.d. normally distributed random variables, independent of the $X_i$, with a possibly different mean $\mu_Y$ but the same variance $\sigma^2$ as the earlier $X_i$, then using the third bullet point, $\dfrac{\frac{1}{n-1}\sum\limits_{i=1}^{n} \left(X_i-\bar{X}\right)^2}{\frac{1}{m-1}\sum\limits_{j=1}^{m} \left(Y_j-\bar{Y}\right)^2}\sim \mathrm{F}(n-1, m-1)$, i.e. the ratio of the two sample variances has an $F$-distribution with parameters $n-1$ and $m-1$. You can use this as a test of the hypothesis that the variances are equal without knowing their value or the value of the means.
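If it helps to see the third and fourth bullet points happen rather than take them on trust, here is a minimal simulation sketch using only the Python standard library. The function name `simulate`, the sample size, the parameter values, and the seed are all my own illustrative choices. It repeatedly draws Normal samples and compares the simulated moments of $(n-1)S^2/\sigma^2$ and of the $t$ statistic with the textbook values for $\chi^2_{n-1}$ (mean $n-1$, variance $2(n-1)$) and $t_{n-1}$ (variance $(n-1)/(n-3)$).

```python
import random
import statistics

def simulate(n=10, mu=5.0, sigma=2.0, reps=20000, seed=1):
    """Draw `reps` Normal samples of size n; return the chi-squared and t statistics."""
    rng = random.Random(seed)
    chi2_stats, t_stats = [], []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(xs)
        s2 = statistics.variance(xs, xbar)              # unbiased: divides by n - 1
        chi2_stats.append((n - 1) * s2 / sigma ** 2)    # should behave like chi-squared(n - 1)
        t_stats.append((xbar - mu) / (s2 / n) ** 0.5)   # should behave like t(n - 1)
    return chi2_stats, t_stats

n = 10
chi2_stats, t_stats = simulate(n=n)
print("chi2 mean:", statistics.fmean(chi2_stats), "theory:", n - 1)
print("chi2 var: ", statistics.variance(chi2_stats), "theory:", 2 * (n - 1))
print("t var:    ", statistics.variance(t_stats), "theory:", (n - 1) / (n - 3))
```

The simulated values will not match the theoretical ones exactly, only up to sampling noise, which shrinks as `reps` grows.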
You may not know that the $X_1,X_2,\ldots,X_n$ are in fact normally distributed, but the Central Limit Theorem suggests that for large $n$ and finite $\mu$ and $\sigma^2$ you should have $\bar{X}$ approximately normally distributed as in the first bullet point. That may turn out to be good enough for the other properties to hold approximately, though for $n$ too small it may not be.
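A rough way to see "large $n$" at work is to standardize means of clearly non-Normal data and check one tail. The sketch below is my own illustration, not part of any standard recipe: it uses Exponential data with rate $1$ (mean $1$, standard deviation $1$) and counts how often the standardized mean falls below $-1.645$, where a standard normal puts about 5% of its mass. The function name `lower_tail_fraction` and the sample sizes are assumptions chosen for the example.

```python
import random

def lower_tail_fraction(n, reps=10000, seed=2):
    """Fraction of standardized means of n Exponential(1) draws that fall below -1.645."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of an Exponential(rate=1) variable
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        z = (xbar - mu) / (sigma / n ** 0.5)
        hits += z < -1.645
    return hits / reps

fractions = {n: lower_tail_fraction(n) for n in (2, 10, 100)}
for n, frac in fractions.items():
    print(n, frac)  # should move towards 0.05 as n grows
```

For $n=2$ the event is impossible, since a mean of positive numbers cannot be that far below $1$, so the normal approximation is plainly poor there. It improves as $n$ increases, but with skewed data the tails are the last thing to settle down.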
answered Nov 24 at 16:58
Henry
The short answer is as follows:
- While probability studies the implications of assumed probability distributions, statistics assesses how well the data bear out these assumptions, by measuring something whose distribution is thereby predictable.
- The distributions you've asked about are important because you can construct statistical tests in which the null hypothesis implies that certain quantities, called test statistics, have these distributions, exactly or approximately. A test statistic whose value is too "abnormal" motivates rejection of the null hypothesis.
- Given $n$ independent variables, each having a Normal distribution of mean $0$ and standard deviation $1$ (hereafter a standard Normal distribution), the sum of their squares has a chi-squared distribution with $n$ degrees of freedom.
- If $X,\,Y$ are independent variables, $X$ having a standard Normal distribution and $Y^2$ (with $Y\ge 0$) having a chi-squared distribution with $k$ degrees of freedom, then $X/(Y/\sqrt{k})$ has a $t$-distribution with $k$ degrees of freedom.
- If you divide two independent chi-squared variables by their respective degrees of freedom, so that each has mean $1$, the ratio of these scaled variables has an $F$-distribution. Squaring a $t$-distributed variable with $k$ degrees of freedom therefore gives one example of an $F$-distributed variable, namely $F(1,\,k)$, because $X^2$ is chi-squared with $1$ degree of freedom.
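To spell out the last two bullets with the degrees-of-freedom scaling made explicit (the symbols $V$ and $k$ are my own shorthand, with $V=Y^2$ chi-squared on $k$ degrees of freedom and independent of the standard Normal $X$):

$$T=\frac{X}{\sqrt{V/k}}\sim t_k,\qquad T^2=\frac{X^2/1}{V/k}\sim F(1,\,k),$$

since $X^2\sim\chi^2_1$ is itself a chi-squared variable with $1$ degree of freedom, so $T^2$ is a ratio of two independent chi-squared variables, each divided by its degrees of freedom.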
Now for the long answer:
A Normal distribution is specified by its mean $\mu$ (which can be any real number) and its standard deviation $\sigma$ (which can be any positive number). If a random variable $X$ has such a distribution, we write $X\sim N(\mu,\,\sigma^2)$, where $\sigma^2$ is the variance. The number of standard deviations from $\mu$ to $X$ is a random variable in its own right, typically denoted $Z$, viz. $X=\mu+\sigma Z$. It turns out that $Z\sim N(0,\,1)$; we say $Z$ has a standard Normal distribution.
There are various scenarios in which random variables admit a Normal approximation. For example, the classical central limit theorem (CLT) states that the mean of a large number of independent samples from a finite-variance distribution has an approximately Normal distribution. We'll come back to that one. For another example, when you try to fit a model to data, you have noise terms $\epsilon$ viz. $y=f(x)+\epsilon$, and we can often justify the assumption $\epsilon\sim N(0,\,\sigma^2)$ for some $\sigma>0$. Let's say we have $n$ observations with independent noise terms. If we divide all noise terms by $\sigma$, square the results and sum the squares, the result has a chi-squared distribution with $n$ degrees of freedom. This lets us quantify how surprising it is that the data deviate from expectations as much as they do, because with a distribution in mind we can obtain a $p$-value.
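To make the $p$-value idea concrete, here is a rough sketch that estimates one by simulation, using only Python's standard library. The noise terms and the value of $\sigma$ are made-up numbers, and both function names are my own. It also assumes $\sigma$ is known and that these are the true noise terms; residuals from a fitted model would lose some degrees of freedom.

```python
import random

def chi2_statistic(noise_terms, sigma):
    """Divide each noise term by sigma, square, and sum."""
    return sum((e / sigma) ** 2 for e in noise_terms)

def monte_carlo_p_value(observed, n, trials=20_000, seed=0):
    """Estimate P(chi-squared with n degrees of freedom >= observed) by simulation."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        simulated = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
        if simulated >= observed:
            exceed += 1
    return exceed / trials

# Made-up noise terms and noise level, for illustration only.
noise_terms = [1.0, -1.0, 2.0]
sigma = 1.0

stat = chi2_statistic(noise_terms, sigma)
p_value = monte_carlo_p_value(stat, len(noise_terms))
print(f"statistic = {stat}, estimated p-value = {p_value:.3f}")
```

In practice you would read the $p$-value from the chi-squared distribution itself (a table or a statistics package) rather than simulate it; the simulation is only meant to show what the number measures.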
It's time to come back to the CLT. If you knew a distribution's mean $\mu$ and variance $\sigma^2$, the mean $\overline{X}$ of a large sample of size $n$ would be a random variable with an approximately Normal distribution. In particular, $\frac{\overline{X}-\mu}{\sigma/\sqrt{n}}\approx N(0,\,1)$. But what makes you think you know the variance? You can estimate it from the sample, but then something funny happens. Because we've replaced a true parameter value with an estimate that is itself a random variable, the standard Normal distribution no longer describes the result accurately, especially for small samples. In particular, if the data are Normally distributed and $\sigma$ is estimated by the sample standard deviation $S$, then $\frac{\overline{X}-\mu}{S/\sqrt{n}}$ has a $t$-distribution with $n-1$ degrees of freedom, where $\mu$ is the mean asserted by the null hypothesis. As with the chi-squared distribution, the distribution's shape depends on its number of degrees of freedom.
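The effect of estimating the standard deviation can be seen in a short simulation, sketched here with Python's standard library. The sample size of 5, the cutoff of 1.96 and the function name are my own choices. For each simulated sample it computes the statistic twice, once with the true $\sigma$ and once with the sample standard deviation, and counts how often each exceeds the cutoff in absolute value.

```python
import random
import statistics

def tail_fractions(n, trials=20_000, cutoff=1.96, seed=0):
    """Fraction of samples whose standardised mean exceeds the cutoff,
    using the true sigma (z) versus the sample standard deviation (t)."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0   # true parameters of the simulated data
    z_hits = 0
    t_hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        s = statistics.stdev(sample)          # divides by n - 1
        z = (mean - mu) / (sigma / n ** 0.5)  # sigma known
        t = (mean - mu) / (s / n ** 0.5)      # sigma estimated
        z_hits += abs(z) > cutoff
        t_hits += abs(t) > cutoff
    return z_hits / trials, t_hits / trials

z_frac, t_frac = tail_fractions(n=5)
print(f"known sigma:     {z_frac:.3f}")
print(f"estimated sigma: {t_frac:.3f}")
```

With the true $\sigma$ the fraction should sit near 5%, while with the estimate it comes out noticeably larger. The $t$-distribution has heavier tails, which is why $t$-tests use wider cutoffs for small samples.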
I mentioned noise terms with Normal distributions. They result in a sample variance with a chi-squared distribution, up to scaling. Now say I wonder whether two variables have the same variance. Because the variance of a sample is a random variable, the ratio of two independent samples' variances is $F$-distributed, up to scaling. This is the basis of the $F$-test of equality of variances.
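Both "up to scaling" claims can be checked numerically. The following is a sketch under assumptions of my own: two Normal samples of sizes 8 and 12 with the same standard deviation of 2 and different means, all arbitrary. If $(n-1)S^2/\sigma^2$ is chi-squared with $n-1$ degrees of freedom, its mean should be near $n-1$; if the variance ratio is $F$-distributed with $(7,\,11)$ degrees of freedom, its mean should be near $11/9$.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
sigma = 2.0              # common standard deviation (arbitrary)
n1, n2 = 8, 12           # sample sizes (arbitrary)
trials = 20_000

scaled_vars = []
ratios = []
for _ in range(trials):
    a = [rng.gauss(0.0, sigma) for _ in range(n1)]
    b = [rng.gauss(5.0, sigma) for _ in range(n2)]   # different mean, same variance
    var_a = statistics.variance(a)   # sample variance, divides by n - 1
    var_b = statistics.variance(b)
    scaled_vars.append((n1 - 1) * var_a / sigma ** 2)
    ratios.append(var_a / var_b)

scaled_mean = statistics.fmean(scaled_vars)   # theory: n1 - 1 = 7
ratio_mean = statistics.fmean(ratios)         # theory: 11 / (11 - 2)

print(f"mean of (n-1) S^2 / sigma^2: {scaled_mean:.3f}")
print(f"mean of variance ratio:      {ratio_mean:.3f}")
```

Note that the two samples' means play no role: the sample variance removes the mean, so only the spread matters to the ratio.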
answered Nov 24 at 18:04
J.G.