Data Mining Work
I need this work done.
The file BostonHousing.xls contains housing data for 506 census tracts of Boston from the 1970 census. The original data consist of 506 observations on 14 variables, with MEDV as the target variable; the file also includes a categorical version of the target, CAT. MEDV. Each variable is described below:
CRIM: per capita crime rate by town
ZN: proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS: proportion of non-retail business acres per town
CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise)
NOX: nitric oxides concentration (parts per 10 million)
RM: average number of rooms per dwelling
AGE: proportion of owner-occupied units built prior to 1940
DIS: weighted distances to five Boston employment centres
RAD: index of accessibility to radial highways
TAX: full-value property-tax rate per USD 10,000
PTRATIO: pupil-teacher ratio by town
B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town
LSTAT: percentage of the population with lower status
MEDV: median value of owner-occupied homes, in thousands of USD
CAT. MEDV: categorical variable for high (1) and low (0) median home prices
Answer the following questions using SPSS Modeler:
1. Partition the data in BostonHousing.xls into training (400 records) and test (106 records) sets.
2. Fit a multiple linear regression model for the median home price (MEDV) using CRIM, CHAS, and RM as predictors, and evaluate the model based on the reported statistics.
3. Write the equation for predicting the median home price, and predict the median price for a home with CRIM = 0.325, CHAS = 1, and RM = 6.5.
4. Fit a logistic regression model with CAT. MEDV as the target (output) variable, using the same predictor variables. Compare the reported statistics with those reported for the multiple linear regression.
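For question 1, SPSS Modeler would normally handle the split with a Partition node. As a cross-check outside Modeler, here is a minimal Python sketch of an exact 400/106 random split of record indices. The function name `partition_indices` and the fixed seed are illustrative assumptions. Modeler's Partition node may be configured by percentages rather than exact counts, so its split is not guaranteed to match this one record for record.

```python
import random

def partition_indices(n_records, n_train, seed=42):
    """Randomly split record indices 0..n_records-1 into training and test sets.

    Hypothetical helper: a simple random partition with a fixed seed so the
    split is reproducible. It is not the algorithm SPSS Modeler uses.
    """
    if not 0 < n_train < n_records:
        raise ValueError("n_train must be between 1 and n_records - 1")
    rng = random.Random(seed)        # private RNG so the global state is untouched
    indices = list(range(n_records))
    rng.shuffle(indices)
    train = sorted(indices[:n_train])
    test = sorted(indices[n_train:])
    return train, test

train_idx, test_idx = partition_indices(506, 400)
print(len(train_idx), len(test_idx))  # 400 106
```

The indices can then be used to select rows of the loaded table (for example, the first sheet of BostonHousing.xls) for fitting and for evaluation.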
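For question 2, the statistics usually used to evaluate a multiple linear regression are the coefficient estimates with their standard errors and t values, R^2, adjusted R^2, the overall F statistic, and the standard error of the estimate. The sketch below computes these with NumPy (assumed installed) by ordinary least squares. It runs on synthetic stand-in data with CRIM-, CHAS- and RM-like columns, not on BostonHousing.xls, and `fit_ols` is a hypothetical helper name.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp by ordinary least squares and return
    the coefficients together with common fit statistics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])   # intercept column first
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sse = float(resid @ resid)                  # residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))    # total sum of squares
    df_resid = n - p - 1
    r2 = 1.0 - sse / sst
    sigma2 = sse / df_resid                     # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    return {
        "coef": coef,                           # b0, b1, ..., bp
        "se": se,
        "t": coef / se,
        "r2": r2,
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / df_resid,
        "f_stat": (r2 / p) / ((1.0 - r2) / df_resid),
        "resid_se": float(np.sqrt(sigma2)),     # standard error of the estimate
    }

# Synthetic stand-in data (NOT the Boston file): columns CRIM, CHAS, RM.
rng = np.random.default_rng(0)
n = 400
X = np.column_stack([
    rng.exponential(3.0, n),      # CRIM-like, right-skewed
    rng.binomial(1, 0.1, n),      # CHAS-like 0/1 dummy
    rng.uniform(4.0, 9.0, n),     # RM-like
])
y = 2.0 - 0.3 * X[:, 0] + 4.0 * X[:, 1] + 8.0 * X[:, 2] + rng.normal(0.0, 1.0, n)

ols_stats = fit_ols(X, y)
print("coefficients:", ols_stats["coef"])
print("R^2 = %.3f, adjusted R^2 = %.3f" % (ols_stats["r2"], ols_stats["adj_r2"]))
```

A coefficient whose |t| is large (small p-value) contributes significantly, and adjusted R^2 penalizes predictors that add little. Scoring the 106 test records and comparing their error with `resid_se` gives a check for overfitting.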
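For question 3, the fitted model has the form MEDV = b0 + b1*CRIM + b2*CHAS + b3*RM, where b0 is the intercept and b1 to b3 are the coefficients reported by the regression (MEDV is in thousands of USD). The hypothetical helper below only evaluates that equation. The coefficients in the usage line are made-up placeholders, not results from BostonHousing.xls.

```python
def predict_medv(coef, crim, chas, rm):
    """Evaluate MEDV = b0 + b1*CRIM + b2*CHAS + b3*RM.

    `coef` is (b0, b1, b2, b3) in that order, read off the regression output.
    The result is in thousands of USD, like MEDV itself.
    """
    b0, b1, b2, b3 = coef
    return b0 + b1 * crim + b2 * chas + b3 * rm

# Placeholder coefficients for illustration only; substitute the values
# reported by your own fitted model.
example_coef = (1.0, 2.0, 3.0, 4.0)
print(predict_medv(example_coef, crim=0.325, chas=1, rm=6.5))
```

With the real coefficients, multiplying the predicted value by 1,000 gives the median price in USD.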
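For question 4, logistic regression is fitted by maximum likelihood, so its output centres on likelihood-based statistics rather than R^2 and F. SPSS logistic output typically includes -2 log likelihood, Cox and Snell and Nagelkerke pseudo R-square, and Wald statistics. The sketch below computes these (plus McFadden's pseudo R^2 and classification accuracy at a 0.5 cutoff) by Newton-Raphson, also known as iteratively reweighted least squares, using NumPy (assumed installed). It uses synthetic stand-in data, not CAT. MEDV from the file, and `fit_logistic` is a hypothetical helper name.

```python
import numpy as np

def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Fit logit P(y=1) = b0 + b1*x1 + ... by Newton-Raphson (IRLS) and
    return coefficients plus likelihood-based fit statistics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # intercept column first
    beta = np.zeros(p + 1)

    def prob_and_hessian(b):
        prob = 1.0 / (1.0 + np.exp(-(design @ b)))
        weights = prob * (1.0 - prob)
        hess = design.T @ (design * weights[:, None])  # information matrix X'WX
        return prob, hess

    for _ in range(max_iter):
        prob, hess = prob_and_hessian(beta)
        step = np.linalg.solve(hess, design.T @ (y - prob))  # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break

    prob, hess = prob_and_hessian(beta)
    prob_c = np.clip(prob, 1e-12, 1 - 1e-12)  # guard against log(0)
    ll_model = float(np.sum(y * np.log(prob_c) + (1 - y) * np.log(1 - prob_c)))
    ybar = y.mean()  # the intercept-only (null) model predicts the base rate
    ll_null = float(n * (ybar * np.log(ybar) + (1 - ybar) * np.log(1 - ybar)))
    se = np.sqrt(np.diag(np.linalg.inv(hess)))
    cox_snell = 1.0 - np.exp(2.0 * (ll_null - ll_model) / n)
    return {
        "coef": beta,
        "se": se,
        "wald": (beta / se) ** 2,
        "minus2ll": -2.0 * ll_model,
        "mcfadden_r2": 1.0 - ll_model / ll_null,
        "cox_snell_r2": cox_snell,
        "nagelkerke_r2": cox_snell / (1.0 - np.exp(2.0 * ll_null / n)),
        "accuracy": float(np.mean((prob >= 0.5) == (y == 1))),
    }

# Synthetic stand-in data (NOT the Boston file): columns CRIM, CHAS, RM.
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([
    rng.exponential(3.0, n),      # CRIM-like, right-skewed
    rng.binomial(1, 0.1, n),      # CHAS-like 0/1 dummy
    rng.uniform(4.0, 9.0, n),     # RM-like
])
true_logit = -13.0 - 0.1 * X[:, 0] + 1.0 * X[:, 1] + 2.0 * X[:, 2]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

logit_stats = fit_logistic(X, y)
for key in ("coef", "minus2ll", "cox_snell_r2", "nagelkerke_r2", "accuracy"):
    print(key, logit_stats[key])
```

The pseudo R^2 values are built from log likelihoods and do not measure the proportion of variance explained, so they should not be compared number-for-number with the linear regression's R^2. A fairer comparison looks at which predictors are significant in both models and, for the logistic model, at classification accuracy on the test partition.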