If we change only one value of a data set, will the mean absolute deviation behave as the same way as...
up vote
0
down vote
favorite
I took the new data as b and the data removed as a and calculated the new mean and used that to find the new mean and deviation in terms of old. But it gets too complicated and there is no way to get the relation looking at the terms.
Basically the question is, if after changing only one value of a data set, if the mean absolute deviation increases, will standard deviation always increase? Or is there any case where it can decrease too?
statistics standard-deviation
add a comment |
up vote
0
down vote
favorite
I took the new data as b and the data removed as a and calculated the new mean and used that to find the new mean and deviation in terms of old. But it gets too complicated and there is no way to get the relation looking at the terms.
Basically the question is, if after changing only one value of a data set, if the mean absolute deviation increases, will standard deviation always increase? Or is there any case where it can decrease too?
statistics standard-deviation
The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
– BruceET
3 hours ago
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I took the new data as b and the data removed as a and calculated the new mean and used that to find the new mean and deviation in terms of old. But it gets too complicated and there is no way to get the relation looking at the terms.
Basically the question is, if after changing only one value of a data set, if the mean absolute deviation increases, will standard deviation always increase? Or is there any case where it can decrease too?
statistics standard-deviation
I took the new data as b and the data removed as a and calculated the new mean and used that to find the new mean and deviation in terms of old. But it gets too complicated and there is no way to get the relation looking at the terms.
Basically the question is, if after changing only one value of a data set, if the mean absolute deviation increases, will standard deviation always increase? Or is there any case where it can decrease too?
statistics standard-deviation
statistics standard-deviation
asked yesterday
Avinash Bhawnani
34019
34019
The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
– BruceET
3 hours ago
add a comment |
The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
– BruceET
3 hours ago
The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
– BruceET
3 hours ago
The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
– BruceET
3 hours ago
add a comment |
1 Answer
1
active
oldest
votes
up vote
0
down vote
Here is an example using the definition of MAD implemented in R statistical
software: For the sample $X_i, dots, X_n,$
$$text{MAD} = 1.4826,text{Med}(|X_i - H|).$$
where $H$ is the median of the sample, and the constant multiple is intended
to put values on a scale so that MAD and sample standard deviation $S$ are
roughly comparable for large normal samples. So according to this definition
MAD is based on the Median of the absolute differences from the sample median.
Here is a sample of size $n = 20$ from $mathsf{Norm}(mu=100, sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.
x = rnorm(20, 100, 15)
summary(x); sd(x); mad(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 98.67 111.50 138.14
[1] 19.50935
[1] 20.83691
boxplot(x, horizontal=T, col="skyblue2", main="Boxplot of Original Sample")
So the two values are roughly the same. Now I sort the data, choose the 10th
order statistic, and replace it by the outlier 200.
x.sort = sort(x); x.20 = x.sort[20]; x.20
[1] 138.1427
x.sort[20] = 200; x.sort[20]
[1] 200
summary(x.sort); sd(x.sort); mad(x.sort)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 101.77 111.50 200.00
[1] 28.79103
[1] 20.83691
boxplot(x.sort, horizontal=T, col="skyblue2", pch=20,
main="Boxplot of Modified Sample")
Notice that making this substitution has not changed the sample median (98.49 before and after) and noticeably increased the sample mean (from 98.67 to 101.77).
Also, the MAD was not increased (20.83691 before and after), but the sample SD
has increased noticeably (roughly, from 19.5 to 28.8).
One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
Here is an example using the definition of MAD implemented in R statistical
software: For the sample $X_i, dots, X_n,$
$$text{MAD} = 1.4826,text{Med}(|X_i - H|).$$
where $H$ is the median of the sample, and the constant multiple is intended
to put values on a scale so that MAD and sample standard deviation $S$ are
roughly comparable for large normal samples. So according to this definition
MAD is based on the Median of the absolute differences from the sample median.
Here is a sample of size $n = 20$ from $mathsf{Norm}(mu=100, sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.
x = rnorm(20, 100, 15)
summary(x); sd(x); mad(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 98.67 111.50 138.14
[1] 19.50935
[1] 20.83691
boxplot(x, horizontal=T, col="skyblue2", main="Boxplot of Original Sample")
So the two values are roughly the same. Now I sort the data, choose the 10th
order statistic, and replace it by the outlier 200.
x.sort = sort(x); x.20 = x.sort[20]; x.20
[1] 138.1427
x.sort[20] = 200; x.sort[20]
[1] 200
summary(x.sort); sd(x.sort); mad(x.sort)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 101.77 111.50 200.00
[1] 28.79103
[1] 20.83691
boxplot(x.sort, horizontal=T, col="skyblue2", pch=20,
main="Boxplot of Modified Sample")
Notice that making this substitution has not changed the sample median (98.49 before and after) and noticeably increased the sample mean (from 98.67 to 101.77).
Also, the MAD was not increased (20.83691 before and after), but the sample SD
has increased noticeably (roughly, from 19.5 to 28.8).
One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.
add a comment |
up vote
0
down vote
Here is an example using the definition of MAD implemented in R statistical
software: For the sample $X_i, dots, X_n,$
$$text{MAD} = 1.4826,text{Med}(|X_i - H|).$$
where $H$ is the median of the sample, and the constant multiple is intended
to put values on a scale so that MAD and sample standard deviation $S$ are
roughly comparable for large normal samples. So according to this definition
MAD is based on the Median of the absolute differences from the sample median.
Here is a sample of size $n = 20$ from $mathsf{Norm}(mu=100, sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.
x = rnorm(20, 100, 15)
summary(x); sd(x); mad(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 98.67 111.50 138.14
[1] 19.50935
[1] 20.83691
boxplot(x, horizontal=T, col="skyblue2", main="Boxplot of Original Sample")
So the two values are roughly the same. Now I sort the data, choose the 10th
order statistic, and replace it by the outlier 200.
x.sort = sort(x); x.20 = x.sort[20]; x.20
[1] 138.1427
x.sort[20] = 200; x.sort[20]
[1] 200
summary(x.sort); sd(x.sort); mad(x.sort)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 101.77 111.50 200.00
[1] 28.79103
[1] 20.83691
boxplot(x.sort, horizontal=T, col="skyblue2", pch=20,
main="Boxplot of Modified Sample")
Notice that making this substitution has not changed the sample median (98.49 before and after) and noticeably increased the sample mean (from 98.67 to 101.77).
Also, the MAD was not increased (20.83691 before and after), but the sample SD
has increased noticeably (roughly, from 19.5 to 28.8).
One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.
add a comment |
up vote
0
down vote
up vote
0
down vote
Here is an example using the definition of MAD implemented in R statistical
software: For the sample $X_i, dots, X_n,$
$$text{MAD} = 1.4826,text{Med}(|X_i - H|).$$
where $H$ is the median of the sample, and the constant multiple is intended
to put values on a scale so that MAD and sample standard deviation $S$ are
roughly comparable for large normal samples. So according to this definition
MAD is based on the Median of the absolute differences from the sample median.
Here is a sample of size $n = 20$ from $mathsf{Norm}(mu=100, sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.
x = rnorm(20, 100, 15)
summary(x); sd(x); mad(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 98.67 111.50 138.14
[1] 19.50935
[1] 20.83691
boxplot(x, horizontal=T, col="skyblue2", main="Boxplot of Original Sample")
So the two values are roughly the same. Now I sort the data, choose the 10th
order statistic, and replace it by the outlier 200.
x.sort = sort(x); x.20 = x.sort[20]; x.20
[1] 138.1427
x.sort[20] = 200; x.sort[20]
[1] 200
summary(x.sort); sd(x.sort); mad(x.sort)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 101.77 111.50 200.00
[1] 28.79103
[1] 20.83691
boxplot(x.sort, horizontal=T, col="skyblue2", pch=20,
main="Boxplot of Modified Sample")
Notice that making this substitution has not changed the sample median (98.49 before and after) and noticeably increased the sample mean (from 98.67 to 101.77).
Also, the MAD was not increased (20.83691 before and after), but the sample SD
has increased noticeably (roughly, from 19.5 to 28.8).
One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.
Here is an example using the definition of MAD implemented in R statistical
software: For the sample $X_i, dots, X_n,$
$$text{MAD} = 1.4826,text{Med}(|X_i - H|).$$
where $H$ is the median of the sample, and the constant multiple is intended
to put values on a scale so that MAD and sample standard deviation $S$ are
roughly comparable for large normal samples. So according to this definition
MAD is based on the Median of the absolute differences from the sample median.
Here is a sample of size $n = 20$ from $mathsf{Norm}(mu=100, sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.
x = rnorm(20, 100, 15)
summary(x); sd(x); mad(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 98.67 111.50 138.14
[1] 19.50935
[1] 20.83691
boxplot(x, horizontal=T, col="skyblue2", main="Boxplot of Original Sample")
So the two values are roughly the same. Now I sort the data, choose the 10th
order statistic, and replace it by the outlier 200.
x.sort = sort(x); x.20 = x.sort[20]; x.20
[1] 138.1427
x.sort[20] = 200; x.sort[20]
[1] 200
summary(x.sort); sd(x.sort); mad(x.sort)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 101.77 111.50 200.00
[1] 28.79103
[1] 20.83691
boxplot(x.sort, horizontal=T, col="skyblue2", pch=20,
main="Boxplot of Modified Sample")
Notice that making this substitution has not changed the sample median (98.49 before and after) and noticeably increased the sample mean (from 98.67 to 101.77).
Also, the MAD was not increased (20.83691 before and after), but the sample SD
has increased noticeably (roughly, from 19.5 to 28.8).
One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.
edited 3 hours ago
answered 4 hours ago
BruceET
34.7k71440
34.7k71440
add a comment |
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3006958%2fif-we-change-only-one-value-of-a-data-set-will-the-mean-absolute-deviation-beha%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
– BruceET
3 hours ago