Caret's train() & resamples() reverses sensitivity/specificity for GLM


























The documentation for the glm() function states, regarding a factor response variable, that




the first level denotes failure and all others success.




I assume caret's train() function calls glm() under the hood when using method = 'glm', and therefore the same convention applies.



So, in order to produce an interpretable model that is consistent with other models (i.e., the coefficients correspond to the success event), I must follow this convention.
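As a quick sanity check of that convention, here is a minimal sketch using the ISLR Default data from the example below: flipping the reference level flips the sign of every coefficient, because glm() always models the probability of the second level.

library(ISLR)

# default has levels c("No", "Yes"), so glm() models P(default = "Yes")
coef(glm(default ~ balance, family = binomial, data = Default))

# Putting "Yes" first makes glm() model P(default = "No"): all signs flip
coef(glm(relevel(default, ref = "Yes") ~ balance, family = binomial, data = Default))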



The problem is that even though glm(), and thus caret's train() function, treats the second factor level as the success, caret's resamples() function (and the $resample variable) still treats the first level as the success/positive class. Sensitivity and specificity are therefore the opposite of what they should be if I want to use resamples() to compare against other models.



Example:



# install.packages('ISLR')  # run once if ISLR is not installed
library(ISLR)
library(caret)  # needed for trainControl(), train(), confusionMatrix()

summary(Default)
levels(Default$default)  # "Yes" is the second level of the factor
glm_model <- glm(default ~ ., family = "binomial", data = Default)
summary(glm_model)

train_control <- trainControl(
  summaryFunction = twoClassSummary,
  classProbs = TRUE,
  method = 'repeatedcv',
  number = 5,
  repeats = 5,
  verboseIter = FALSE,
  savePredictions = TRUE)

set.seed(123)
caret_model <- train(default ~ ., data = Default, method = 'glm',
                     metric = 'ROC',
                     preProc = c('nzv', 'center', 'scale', 'knnImpute'),
                     trControl = train_control)
summary(caret_model)
caret_model  # shows Sens of ~0.99 and Spec of ~0.32

# Same per fold/repeat: the per-resample values are already reversed here,
# and this propagates to the resamples() method. There seems to be no way
# to specify the positive/success class in train().
caret_model$resample

# Once 'Yes' is set as the positive class, the true sensitivity and
# specificity are calculated -- but there is no way to do this for resamples()?
confusionMatrix(data = predict(caret_model, Default),
                reference = Default$default, positive = 'Yes')


I can see the correct sensitivity/specificity in confusionMatrix() with positive = 'Yes', but what is the solution for resamples(), so that I can accurately compare this model against others?
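For context on the mechanism: twoClassSummary takes the first element of its lev argument as the event of interest, so one conceivable workaround is a custom summaryFunction that uses the second level instead. A hedged sketch follows; secondClassSummary is a hypothetical name, and it assumes the documented summaryFunction contract of function(data, lev, model) returning a named numeric vector.

library(pROC)  # twoClassSummary also relies on pROC for the ROC metric

secondClassSummary <- function(data, lev = NULL, model = NULL) {
  # data$obs holds the observed classes, data$pred the predicted classes,
  # and data[, lev[2]] the predicted probability of the second level
  rocObj <- pROC::roc(response = data$obs, predictor = data[, lev[2]],
                      levels = lev, direction = "<", quiet = TRUE)
  c(ROC  = as.numeric(pROC::auc(rocObj)),
    Sens = caret::sensitivity(data$pred, data$obs, positive = lev[2]),
    Spec = caret::specificity(data$pred, data$obs, negative = lev[1]))
}

# Used in place of twoClassSummary:
# train_control <- trainControl(summaryFunction = secondClassSummary,
#                               classProbs = TRUE, method = 'repeatedcv', ...)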










Tags: r, r-caret, glm, resampling






asked Jun 6 '17 at 5:52 by shaneker; edited Nov 23 '18 at 1:16 by jmuhlenkamp
























          1 Answer




















The following will invert the sensitivity (and specificity) by reversing the factor levels so that "Yes" comes first:



library(forcats)  # for fct_relevel()

temp <- Default
temp$default <- fct_relevel(temp$default, "Yes")  # move "Yes" to the first level
levels(temp$default)    # "Yes" "No"
levels(Default$default) # "No"  "Yes"

# Note: the in-formula relevel() below is redundant once temp$default has been
# releveled; either step alone reverses the levels.
set.seed(123)  # same seed as the question so the resampling folds match
caret_model <- train(relevel(default, ref = "Yes") ~ ., data = temp,
                     method = 'glm', metric = 'ROC',
                     preProc = c('nzv', 'center', 'scale', 'knnImpute'),
                     trControl = train_control)
summary(caret_model)
caret_model


This is based on page 272 of the book Applied Predictive Modeling:



The glm() function models the probability of the second factor
level, so the function relevel() is used to temporarily reverse the
factor levels.
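With the levels reversed, the per-resample Sens and Spec now refer to "Yes", so resamples() comparisons line up across models. A minimal usage sketch follows; the rpart model is hypothetical, included only to have a second model to compare against, and the same seed is set before each train() call so the resampling folds match.

set.seed(123)
rpart_model <- train(default ~ ., data = temp, method = 'rpart',
                     metric = 'ROC', trControl = train_control)

resamps <- resamples(list(GLM = caret_model, CART = rpart_model))
summary(resamps)  # Sens/Spec now computed with "Yes" as the positive class
bwplot(resamps)   # compare the models across resamples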






answered Nov 30 '18 at 3:37 by user1420372




















