Advanced Biostats SPSS Assignment: Multiple Logistic Regression in Action
Due 4/13/19, 7 p.m. EST
Be on time; original work only
Know SPSS
Data Attached
Step By Step Guide
Multiple logistic regression is a model that uses two or more predictor variables to estimate the likelihood that a binary outcome occurs.
For this Assignment, you use multiple logistic regression to analyze a dataset. You identify assumptions required by multiple logistic regression and evaluate whether they have been met by the data. Finally, you interpret your results and evaluate the use of multiple logistic regression.
The Assignment
Use the Wk 7 Dataset (SPSS document)
- Variables and variable selection
- Use a table to list the variables (Sex, Age in Years, Serum Cholesterol, Obese, and Hypertension) and the level of measurement of each.
- Create new variables Age_Cat and Chole_Cat:
- Age_Cat: Convert Age in Years into a categorical variable with 2 categories: Less than 40, and 40 and greater
- Chole_Cat: Convert Serum Cholesterol into a categorical variable with 3 categories: Under 200, 200-299, and 300 and greater
Add the new variables to each record by coding the responses to the original variable using the assigned categories. Be sure that the variable view in SPSS has the correct information on the 2 new variables.
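The recoding rules above can be sketched outside SPSS to check the category boundaries before building the variables. This is a minimal Python illustration, not SPSS syntax. The numeric codes (0/1 and 1/2/3), the function names, and the sample records are assumptions made for the example; the assignment does not prescribe them, and the sample values are not taken from the Wk 7 Dataset.

```python
def age_cat(age_years):
    """Recode Age in Years: 0 = 'Less than 40', 1 = '40 and greater' (codes are assumed)."""
    if age_years is None:
        return None  # keep missing values missing
    return 0 if age_years < 40 else 1


def chole_cat(serum_chol):
    """Recode Serum Cholesterol: 1 = 'Under 200', 2 = '200-299', 3 = '300 and greater'.

    Assumption: any non-integer value between 299 and 300 falls in the 200-299 group.
    """
    if serum_chol is None:
        return None
    if serum_chol < 200:
        return 1
    if serum_chol < 300:
        return 2
    return 3


# Hypothetical records for illustration only.
records = [
    {"age": 35, "chol": 180},
    {"age": 40, "chol": 299},
    {"age": 62, "chol": 300},
]
for r in records:
    r["Age_Cat"] = age_cat(r["age"])
    r["Chole_Cat"] = chole_cat(r["chol"])
```

Checking the boundary values (39 vs. 40, 199 vs. 200, 299 vs. 300) is the main point: each should land in the category named in the assignment, and the same value labels should then appear in the SPSS variable view.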
- Simple Binary Logistic Regression
- Use Hypertension as the dependent variable and Chole_Cat as the independent variable in the first model. Report the Odds Ratio and significance of the Odds Ratio for the relationship between the dependent and independent variables.
- Use Hypertension as the dependent variable and Serum Cholesterol (the original variable) as the independent variable in the second model. Report the Odds Ratio and significance of the Odds Ratio for the relationship between the dependent and independent variables.
- How does the level of measurement for the independent variable affect the outcome (include the OR and its significance in your response)? How does the level of measurement of the independent variable change your interpretation of the Odds Ratio?
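The link between a logistic regression coefficient and its odds ratio may help with the comparison asked for above. In SPSS output the odds ratio is reported as Exp(B), which is the exponential of the coefficient B. The sketch below, in Python rather than SPSS, shows that arithmetic. The coefficient and standard error used in the example call are hypothetical, not results from the Wk 7 Dataset, and the function names are illustrative.

```python
import math


def odds_ratio(b, se, z=1.96):
    """Return (OR, lower, upper): the odds ratio and an approximate 95% CI from B and its SE."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


def odds_ratio_for_change(b, units):
    """Odds ratio for an increase of `units` in a continuous predictor."""
    return math.exp(b * units)


# Hypothetical coefficient for a continuous predictor (per one unit).
b_continuous, se_continuous = 0.01, 0.004
or_1, lower, upper = odds_ratio(b_continuous, se_continuous)
or_50 = odds_ratio_for_change(b_continuous, 50)
print(f"OR per 1 unit: {or_1:.3f} (95% CI {lower:.3f} to {upper:.3f})")
print(f"OR per 50 units: {or_50:.3f}")
```

For a continuous predictor the odds ratio describes a one-unit increase, so it can look close to 1 even when the effect over a wider range is substantial. For a categorical predictor each odds ratio compares one category against the reference category.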
- Multivariate Logistic Regression
- Run a multivariate binary logistic regression model in SPSS with Hypertension as the dependent variable and Chole_Cat, Age_Cat, Obese, and Sex as the covariates. Include the output in your submission.
- Identify the Odds Ratio and the significance of the Odds Ratio for each of the covariates. How has the relationship between Chole_Cat and Hypertension changed with the addition of the other variables (compare with the output from #2a, the first simple model)?
- Test the assumption that the model fits the data using the Hosmer-Lemeshow goodness-of-fit test. Interpret the chi-square statistic given in the output of this test and state what it means in terms of the assumptions needed to use logistic regression with these data.
- Rerun the logistic regression model from #3a (the multivariate model) and use the Save function to create the following new variables: Predicted Probabilities, Deviance Residuals, and Cook’s Distance. Evaluate the model using these saved variables and the following scatter plots.
- Create a Scatter Plot of the Deviance Residuals (DEV) and the variable ID: Are there any outliers? What does this mean when evaluating your model?
- Create a Scatter Plot of Cook’s Distance (COO) and the variable ID: Are there any influential cases? What does this mean when evaluating your model?
- Create a Scatter Plot of the Deviance Residuals (DEV) and the Predicted Probabilities (PRE). Discuss whether anything in this scatterplot could cause you some concern in terms of your model.
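To make the diagnostics above more concrete, the following Python sketch shows how a deviance residual and a Hosmer-Lemeshow style chi-square statistic can be computed from observed outcomes and predicted probabilities. It is an illustration of the formulas, not a reproduction of SPSS output: the equal-size grouping used here is an assumption and may differ from how SPSS forms its groups, the example values are hypothetical, and Cook's distance is left out because it needs the leverage values from the fitted model.

```python
import math


def deviance_residual(y, p):
    """Signed deviance residual for one case: y is 0 or 1, p is strictly between 0 and 1."""
    log_lik = math.log(p) if y == 1 else math.log(1.0 - p)
    sign = 1.0 if y == 1 else -1.0
    return sign * math.sqrt(-2.0 * log_lik)


def hosmer_lemeshow(y, p, groups=10):
    """Return (chi-square, degrees of freedom) using equal-size groups ordered by p."""
    pairs = sorted(zip(p, y), key=lambda pair: pair[0])
    n = len(pairs)
    chi2 = 0.0
    used = 0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # skip empty groups when n is small
        size = len(chunk)
        observed = sum(yi for _, yi in chunk)  # observed events in the group
        expected = sum(pi for pi, _ in chunk)  # expected events in the group
        mean_p = expected / size
        chi2 += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
        used += 1
    return chi2, used - 2


# Hypothetical outcomes and predicted probabilities, for illustration only.
y_obs = [0, 1, 1, 0]
p_hat = [0.2, 0.3, 0.7, 0.8]
residuals = [deviance_residual(yi, pi) for yi, pi in zip(y_obs, p_hat)]
chi2, df = hosmer_lemeshow(y_obs, p_hat, groups=2)
```

A large chi-square relative to its degrees of freedom (a small p-value) suggests that observed and predicted event counts disagree. For the residual plots, cases with deviance residuals beyond about 2 in absolute value are often examined as possible outliers; that cutoff is a common rule of thumb, not a requirement stated in the assignment.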
(NOTE FROM INSTRUCTOR) Not all of the information generated by an SPSS analysis (in the SPSS output window) needs to be transferred to your paper. Select (copy and paste) only the relevant tables and graphs and use them in your assignment.
Also, as you know, the explanation and interpretation for each graph or table need to be placed next to that table or graph.
It is difficult for the reader of your work to keep going back and forth between your analysis and interpretation if they are not side by side.