VIF - Variance Inflation, when to remove the variable
I'm doing a regression analysis on cement mixtures. The goal is to find the mixture with the greatest strength. These are the variables I have to work with:
Variables: Strength = Cement, Slag, Fly_Ash, Water, Superplasticizer, Coarse_Aggregate, Fine_Aggregate, Age
From reading some articles on cement mixtures, and from analyzing my dataset, it seems that Fine Aggregate and Coarse Aggregate together account for 75% of the mass of a mixture. So in a 1000 kg cement mixture, Fine and Coarse Aggregate would account for 750 kg combined.
When I ran the initial analysis, the two aggregates had relatively high p-values of 25% and 16%. We've been told that anything at or above 10% (sometimes even 5%) is pretty bad and that we should look to remove the variable. Given my 'knowledge' of cement, I decided to keep them in (if possible) and run with it. So I carried on with the model and checked the variance inflation factors; all were comfortably under the threshold of VIF = 10. I then looked at the residuals and scatterplots (using SAS) and noticed curvature in Age, suggesting a higher-order term (Age x Age), as well as a very significant interaction between Superplasticizer and Fly_Ash. When I reran the model with those terms, the two aggregates had p-values of 96% and 75%, which is tragic and suggests I should remove them.
I'm reluctant to remove them, knowing that almost all mixtures use them. I created an interaction between the two aggregates to see whether that would let me keep them (by lowering their p-values). When I run the model with that interaction and the /vif option for variance inflation, the aggregates' p-values become significant, but the VIF for the interaction is 200+. I know you ignore the VIF of the first-order variables, but do you look at the VIF of the interaction? If so, I obviously have to remove it.
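To make the quantity in question concrete, here is a minimal sketch (in Python with NumPy, not the SAS run described above) of how a VIF can be computed as 1 / (1 - R²), where R² comes from regressing one predictor on all the others. The data are synthetic, and the means and spreads used for the two aggregates are made-up placeholders, not values from the actual dataset; the helper name `vif` is likewise hypothetical. The sketch also compares the raw product term with a product of mean-centered variables, which is one common way such a term is formed, purely to show how strongly the VIF of a product can depend on how it is constructed.

```python
import numpy as np

def vif(X):
    """VIF of each column of X: 1 / (1 - R^2) from regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Design matrix: intercept plus every predictor except column j.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic stand-ins for the two aggregates (assumed values, not real data).
rng = np.random.default_rng(0)
n = 500
coarse = rng.normal(970.0, 80.0, n)
fine = rng.normal(770.0, 80.0, n)

# Raw interaction: product of the uncentered variables.
vif_raw = vif(np.column_stack([coarse, fine, coarse * fine]))

# Same interaction built from mean-centered variables.
c = coarse - coarse.mean()
f = fine - fine.mean()
vif_centered = vif(np.column_stack([c, f, c * f]))

print("raw      (coarse, fine, product):", np.round(vif_raw, 1))
print("centered (coarse, fine, product):", np.round(vif_centered, 2))
```

In this synthetic setup the raw product is almost a linear combination of its two components, because both variables sit far from zero, so its VIF is large; the centered version does not share that problem. Whether the same holds for the real mixture data is something only the actual dataset can show.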
regression regression-analysis
asked yesterday
Speakmore
3616