Caret's train() & resamples() reverses sensitivity/specificity for GLM


























The documentation for the glm() function states, regarding a factor response variable, that




the first level denotes failure and all others success.




I assume caret's train() function calls glm() under the hood when using method = 'glm', and therefore the same applies.



So in order to produce an interpretable model that is consistent with other models (i.e. the coefficients correspond to a success event), I must follow this convention.
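To make the level convention concrete, here is a minimal sketch on simulated data. The variables x and y are made up for illustration and are not taken from Default. It fits the same logistic regression twice, once with "Yes" as the second level and once with "Yes" moved to the first level, and compares the coefficients.

```r
# Simulated data: larger x makes "Yes" more likely.
set.seed(1)
x <- rnorm(200)
y <- factor(ifelse(x + rnorm(200) > 0, "Yes", "No"), levels = c("No", "Yes"))

# "Yes" is the second level, so glm() models P(y == "Yes").
fit <- glm(y ~ x, family = binomial)

# Move "Yes" to the first level; glm() now models P(y == "No").
y_flipped <- relevel(y, ref = "Yes")
fit_flipped <- glm(y_flipped ~ x, family = binomial)

coef(fit)          # slope for x is positive
coef(fit_flipped)  # same magnitudes, opposite signs
```

This is why reordering the levels changes how the coefficients must be read: they describe whichever level comes second.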



The problem is that, even though glm() (and thus caret's train() function) treats the second factor level as success, caret's resamples() function (and the $resample element of the fitted model) still treats the first level as the success/positive class. As a result, sensitivity and specificity are the opposite of what they should be when I use resamples() to compare against other models.



Example:



install.packages('ISLR')  # only needed once
library('ISLR')           # provides the Default data set
library('caret')          # provides train(), trainControl(), twoClassSummary(), confusionMatrix()
summary(Default)
levels(Default$default)   # 'Yes' is the second level of the factor
glm_model <- glm(default ~ ., family = "binomial", data = Default)
summary(glm_model)

train_control <- trainControl(
  summaryFunction = twoClassSummary,
  classProbs = TRUE,
  method = 'repeatedcv',
  number = 5,
  repeats = 5,
  verboseIter = FALSE,
  savePredictions = TRUE)
set.seed(123)
caret_model <- train(default ~ ., data = Default, method = 'glm', metric = 'ROC',
                     preProc = c('nzv', 'center', 'scale', 'knnImpute'),
                     trControl = train_control)
summary(caret_model)
caret_model  # shows Sens of ~0.99 and Spec of ~0.32

# Shows the same, but for each fold/repeat. By this point the resampled values are
# already the opposite of what they should be, and this will propagate to resamples().
# Is there no way to specify the positive/success class in train()?
caret_model$resample

# Once I set 'Yes' as the positive class, the true sensitivity and specificity are
# calculated, but is there no way to do this for resamples()?
confusionMatrix(data = predict(caret_model, Default), reference = Default$default, positive = 'Yes')


I can see the correct sens/spec in confusionMatrix with positive = 'Yes' but what is the solution for resamples() so that I can accurately compare it against other models?
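For a two-class problem, the sensitivity computed with one level as positive equals the specificity computed with the other level as positive, and the area under the ROC curve does not depend on which level is called positive. One possible post-hoc workaround is therefore to swap the two columns of the resampling table. The sketch below uses a made-up stand-in for caret_model$resample (the numbers are illustrative only) and a hypothetical helper named swap_sens_spec.

```r
# Toy stand-in for caret_model$resample; the values are invented for illustration.
resample_df <- data.frame(
  ROC      = c(0.95, 0.94),
  Sens     = c(0.99, 0.98),  # computed with the first level ("No") as positive
  Spec     = c(0.32, 0.30),
  Resample = c("Fold1.Rep1", "Fold2.Rep1")
)

# Hypothetical helper: swap Sens and Spec so they describe the second level ("Yes").
swap_sens_spec <- function(df) {
  swapped <- df
  swapped$Sens <- df$Spec
  swapped$Spec <- df$Sens
  swapped
}

fixed <- swap_sens_spec(resample_df)
fixed
```

With a real model this would be applied as caret_model$resample <- swap_sens_spec(caret_model$resample) before calling resamples(). That assumes resamples() reads the $resample element of each model, which is worth verifying against the installed caret version.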










      r r-caret glm resampling














      edited Nov 23 '18 at 1:16









      jmuhlenkamp











      asked Jun 6 '17 at 5:52









      shaneker

























          1 Answer














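          The following swaps the reported sensitivity and specificity by making "Yes" the first factor level, which twoClassSummary() treats as the positive class:



          library(forcats)  # provides fct_relevel()

          temp <- Default
          temp$default <- fct_relevel(temp$default, "Yes")  # move "Yes" to the front
          levels(temp$default)     # "Yes" "No"
          levels(Default$default)  # "No" "Yes" (original data is unchanged)

          # The levels are already reordered in temp, so the formula needs no extra relevel()
          caret_model <- train(default ~ ., data = temp, method = 'glm', metric = 'ROC',
                               preProc = c('nzv', 'center', 'scale', 'knnImpute'),
                               trControl = train_control)
          summary(caret_model)
          caret_model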


          Based on page 272 of the book Applied Predictive Modeling:



          The glm() function models the probability of the second factor
          level, so the function relevel() is used to temporarily reverse the
          factors levels.
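An alternative that leaves the factor levels untouched (so the glm coefficients still describe "Yes") is to supply a custom summary function. The sketch below is a hypothetical helper, not part of caret. It assumes the data frame caret hands to a summary function has the columns obs and pred plus one class-probability column per level, as is the case when classProbs = TRUE. It uses only base R and computes the AUC with the rank-sum (Mann-Whitney) identity instead of calling pROC. Missing values are not handled.

```r
# Hypothetical helper with the (data, lev, model) signature caret expects.
# It scores the SECOND factor level as the positive class.
secondLevelSummary <- function(data, lev = NULL, model = NULL) {
  positive <- lev[2]
  negative <- lev[1]
  is_pos <- data$obs == positive

  # Sensitivity: share of actual positives that were predicted positive.
  sens <- mean(data$pred[is_pos] == positive)
  # Specificity: share of actual negatives that were predicted negative.
  spec <- mean(data$pred[!is_pos] == negative)

  # AUC via the rank-sum identity, using the predicted P(positive).
  r <- rank(data[[positive]])
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  auc <- (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

  c(ROC = auc, Sens = sens, Spec = spec)
}

# Toy check on hand-made predictions (values invented for illustration).
toy <- data.frame(
  obs  = factor(c("No", "No", "No", "Yes", "Yes"), levels = c("No", "Yes")),
  pred = factor(c("No", "No", "Yes", "Yes", "No"), levels = c("No", "Yes")),
  No   = c(0.9, 0.8, 0.3, 0.2, 0.6),
  Yes  = c(0.1, 0.2, 0.7, 0.8, 0.4)
)
res <- secondLevelSummary(toy, lev = levels(toy$obs))
res
```

Passing summaryFunction = secondLevelSummary to trainControl() in place of twoClassSummary should then make caret_model$resample, and with it resamples(), report Sens and Spec for "Yes" while keeping metric = 'ROC' usable.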
















                answered Nov 30 '18 at 3:37









                user1420372






























