If we change only one value of a data set, will the mean absolute deviation behave the same way as...























I denoted the new value by b and the removed value by a, computed the new mean, and tried to express the new mean and deviation in terms of the old ones. But the algebra gets too complicated, and I cannot see any relation by looking at the resulting terms.

Basically, the question is: if, after changing only one value of a data set, the mean absolute deviation increases, will the standard deviation always increase too? Or is there a case where it can decrease?
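
One way to organize the calculation described above is sketched here. The symbols $n$ (sample size), $\bar{x}$ (old mean), and $\bar{x}'$ (new mean) are notation introduced for this sketch and do not appear in the original post. Replacing $a$ by $b$ gives
$$\bar{x}' = \bar{x} + \frac{b-a}{n},$$
$$\sum_i (x_i' - \bar{x}')^2 - \sum_i (x_i - \bar{x})^2 = (b-a)\left[(a+b-2\bar{x}) - \frac{b-a}{n}\right].$$
So the sign of the change in the SD can be read off from $a$, $b$, $\bar{x}$, and $n$ alone. The sum of absolute deviations does not simplify in the same way, because every term $|x_i - \bar{x}'|$ shifts with the mean.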
































  • The term 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, MAD is designed to be less sensitive to outliers than the SD. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase.
    – BruceET
    3 hours ago
















statistics standard-deviation
















asked yesterday









Avinash Bhawnani























1 Answer













Here is an example using the definition of MAD implemented in R statistical
software: For the sample $X_1, \dots, X_n,$
$$\text{MAD} = 1.4826\,\text{Med}(|X_i - H|),$$
where $H$ is the median of the sample, and the constant multiple is intended
to put values on a scale so that MAD and sample standard deviation $S$ are
roughly comparable for large normal samples. So according to this definition
MAD is based on the median of the absolute differences from the sample median.
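
As a quick check of this definition, the following sketch computes the formula by hand and compares it with R's built-in `mad()`. The data vector `x` is made up for illustration and is not the sample used in this answer.

```r
# Hypothetical data, chosen only for illustration (not from the post).
x <- c(2, 4, 4, 5, 7, 9, 30)

H <- median(x)                              # sample median
mad_by_hand <- 1.4826 * median(abs(x - H))  # formula from the text

# mad() uses center = median(x) and constant = 1.4826 by default,
# so the two values should agree.
mad_builtin <- mad(x)
isTRUE(all.equal(mad_by_hand, mad_builtin))
```

Passing `constant = 1` to `mad()` would return the unscaled median absolute deviation instead.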



Here is a sample of size $n = 20$ from $\mathsf{Norm}(\mu=100, \sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.



x = rnorm(20, 100, 15)       # 20 draws from Norm(100, 15)
summary(x); sd(x); mad(x)    # summary, then SD, then MAD
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  60.01   84.13   98.49   98.67  111.50  138.14
[1] 19.50935
[1] 20.83691

boxplot(x, horizontal=TRUE, col="skyblue2", main="Boxplot of Original Sample")


[Figure: boxplot of the original sample]



So the two values are roughly the same. Now I sort the data, choose the 20th
order statistic (the sample maximum), and replace it by the outlier 200.



x.sort = sort(x);  x.20 = x.sort[20];  x.20    # largest observation
[1] 138.1427
x.sort[20] = 200; x.sort[20]                   # replace it by the outlier 200
[1] 200
summary(x.sort); sd(x.sort); mad(x.sort)       # summary, then SD, then MAD
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  60.01   84.13   98.49  101.77  111.50  200.00
[1] 28.79103
[1] 20.83691

boxplot(x.sort, horizontal=TRUE, col="skyblue2", pch=20,
        main="Boxplot of Modified Sample")


[Figure: boxplot of the modified sample]



Notice that this substitution has not changed the sample median (98.49 before and after) but has noticeably increased the sample mean (from 98.67 to 101.77).
Likewise, the MAD did not change (20.83691 before and after), but the sample SD
increased noticeably (roughly, from 19.5 to 28.8).



One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.
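
The example above uses the median-based MAD. If 'mean absolute deviation' instead means the average absolute distance from the sample mean, which is an assumption about what the question intends, a small made-up data set suggests that it and the SD need not move together when one value is changed. In the sketch below, `mean_abs_dev` is a helper defined here, not a base R function.

```r
# Mean absolute deviation about the sample mean (helper defined here).
mean_abs_dev <- function(v) mean(abs(v - mean(v)))

# Hypothetical data: the two samples differ only in the last value.
before <- c(0, 0, 0, 4, -1)
after  <- c(0, 0, 0, 4,  2)

mean_abs_dev(before); mean_abs_dev(after)   # 1.36 then 1.44
sd(before); sd(after)                       # about 1.949 then 1.789
```

Here replacing -1 by 2 raises the mean absolute deviation while lowering the SD. With the other four values held fixed, the SD depends only on how far the changed value lies from their mean (1 in this case), whereas the mean absolute deviation also depends on which side of that mean it lies.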







































        edited 3 hours ago

























        answered 4 hours ago









        BruceET































             







