Caret's train() & resamples() reverses sensitivity/specificity for GLM
The documentation for the glm() function states, regarding a factor response variable, that the first level denotes failure and all others success. I assume caret's train() function calls glm() under the hood when using method = 'glm', and that the same convention therefore applies. So in order to produce an interpretable model that is consistent with other models (i.e. one whose coefficients correspond to a success event), I must follow this convention.
The problem is that, even though glm() (and thus caret's train()) treats the second factor level as the success, caret's resamples() function (and the $resample variable) still treats the first level as the success/positive class. As a result, sensitivity and specificity are the opposite of what they should be if I want to use resamples() to compare against other models.
Example:
install.packages('ISLR')  # if not already installed
library(ISLR)
library(caret)

summary(Default)
levels(Default$default)  # 'Yes' is the second level of the factor

glm_model <- glm(default ~ ., family = "binomial", data = Default)
summary(glm_model)

train_control <- trainControl(
    method = 'repeatedcv',
    number = 5,
    repeats = 5,
    summaryFunction = twoClassSummary,
    classProbs = TRUE,
    verboseIter = FALSE,
    savePredictions = TRUE)

set.seed(123)
caret_model <- train(default ~ ., data = Default, method = 'glm', metric = 'ROC',
                     preProc = c('nzv', 'center', 'scale', 'knnImpute'),
                     trControl = train_control)
summary(caret_model)
caret_model           # shows Sens of ~0.99 and Spec of ~0.32
caret_model$resample  # same metrics per fold/repeat; already reversed here, and this
                      # propagates to resamples() -- no way to set the positive class in train()?

confusionMatrix(data = predict(caret_model, Default),
                reference = Default$default,
                positive = 'Yes')
# With positive = 'Yes', the true sensitivity and specificity are reported,
# but there is no equivalent argument for resamples()
I can see the correct sensitivity and specificity in confusionMatrix() with positive = 'Yes', but what is the solution for resamples() so that I can accurately compare this model against other models?
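Why do the two metrics swap exactly, rather than being wrong in some other way? For a fixed set of predictions, sensitivity computed with one class as positive is, by definition, specificity computed with the other class as positive. A minimal, caret-free sketch in Python (the labels and data here are made up purely for illustration):

```python
# Sensitivity/specificity depend on which class is declared "positive".
# Flipping the positive class swaps the two metrics for the same predictions.

def sens_spec(actual, predicted, positive):
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    return tp / (tp + fn), tn / (tn + fp)

actual    = ["No", "No", "No", "Yes", "Yes", "No"]
predicted = ["No", "No", "Yes", "Yes", "No", "No"]

# Treating the first level ("No") as positive, as caret's resampling does:
sens_no, spec_no = sens_spec(actual, predicted, positive="No")
# Treating the event of interest ("Yes") as positive:
sens_yes, spec_yes = sens_spec(actual, predicted, positive="Yes")

# The two pairs are mirror images of each other.
assert (sens_no, spec_no) == (spec_yes, sens_yes)
print(sens_no, spec_no, sens_yes, spec_yes)  # -> 0.75 0.5 0.5 0.75
```

So the per-fold numbers in $resample are not wrong as such; they are computed with "No" as the positive class, which is the opposite of what I want.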
Tags: r, r-caret, glm, resampling
asked Jun 6 '17 at 5:52 by shaneker
edited Nov 23 '18 at 1:16 by jmuhlenkamp
1 Answer
The following reorders the factor levels so that sensitivity and specificity come out the right way around:
library(forcats)  # for fct_relevel()

temp <- Default
temp$default <- fct_relevel(temp$default, "Yes")  # move 'Yes' to the first level
levels(temp$default)     # "Yes" "No"
levels(Default$default)  # original data unchanged: "No" "Yes"

# temp's levels are already reordered, so no relevel() is needed in the formula
caret_model <- train(default ~ ., data = temp, method = 'glm', metric = 'ROC',
                     preProc = c('nzv', 'center', 'scale', 'knnImpute'),
                     trControl = train_control)
summary(caret_model)
caret_model
Based on page 272 of the book Applied Predictive Modeling:
"The glm() function models the probability of the second factor level, so the function relevel() is used to temporarily reverse the factor levels."
answered Nov 30 '18 at 3:37 by user1420372