Mini Project
Use the Breast Cancer dataset, or any other dataset you choose, for this hands-on project.
Use the following learning schemes to analyze the data (for example, wine.arff):
C4.5
– weka.classifiers.trees.J48
Decision List
– weka.classifiers.rules.PART
Alternatively, use any two classifiers you prefer.
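C4.5 (J48) grows its tree top-down, at each node choosing (roughly) the attribute whose split gives the highest gain ratio, that is, information gain divided by split information. The attribute at the root of the J48 tree is therefore one useful clue for question A. The Java sketch below computes both measures for nominal attributes only, on made-up toy data; the class and method names are hypothetical. Real C4.5 also handles numeric attributes (such as those in wine.arff) with threshold splits, plus missing values and pruning, so treat this as an illustration of the idea rather than Weka's implementation. Within Weka itself, the Select attributes tab with InfoGainAttributeEval and the Ranker search method produces a ranked attribute list.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: C4.5-style information gain and gain ratio for nominal attributes. */
class GainRatio {

    // Entropy (in bits) of a distribution given as a collection of counts.
    static double entropy(Collection<Integer> counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Count how often each distinct value occurs in a list.
    static Map<String, Integer> countValues(List<String> values) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) counts.merge(v, 1, Integer::sum);
        return counts;
    }

    // Information gain: class entropy minus the weighted class entropy after splitting on attr.
    static double infoGain(List<String> attr, List<String> classes) {
        double base = entropy(countValues(classes).values());
        Map<String, List<String>> groups = new HashMap<>();
        for (int i = 0; i < attr.size(); i++) {
            groups.computeIfAbsent(attr.get(i), k -> new ArrayList<>()).add(classes.get(i));
        }
        double remainder = 0.0;
        for (List<String> g : groups.values()) {
            remainder += (double) g.size() / classes.size() * entropy(countValues(g).values());
        }
        return base - remainder;
    }

    // Gain ratio: information gain normalized by the entropy of the attribute's own values.
    static double gainRatio(List<String> attr, List<String> classes) {
        double splitInfo = entropy(countValues(attr).values());
        return splitInfo == 0.0 ? 0.0 : infoGain(attr, classes) / splitInfo;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: two nominal attributes and a class label per instance.
        List<String> colour = List.of("red", "red", "green", "green", "red", "green");
        List<String> size = List.of("big", "small", "big", "small", "big", "small");
        List<String> cls = List.of("yes", "yes", "no", "no", "yes", "no");
        // colour separates the classes perfectly, so it should rank above size.
        System.out.printf("colour: gain=%.3f ratio=%.3f%n", infoGain(colour, cls), gainRatio(colour, cls));
        System.out.printf("size:   gain=%.3f ratio=%.3f%n", infoGain(size, cls), gainRatio(size, cls));
    }
}
```

Dividing by split information is what stops C4.5 from favouring attributes with many distinct values, which plain information gain tends to do.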
A) What is the most important descriptor (attribute) in wine.arff or in the dataset you chose?
B) How well were these two schemes able to learn the patterns in the dataset? How would you quantify your answer?
C) Compare the training-set and 10-fold cross-validation scores of the two schemes.
D) Would you trust these two models? Did they really learn what is important for correctly classifying wine (or the classes in your chosen dataset)?
E) Which one would you trust more, even if just very slightly?
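For questions B and C, Weka's classifier output reports, among other statistics, the number of correctly classified instances, the kappa statistic, and a confusion matrix for whichever test option is selected ("Use training set" or "Cross-validation" with 10 folds). The Java sketch below shows how accuracy and Cohen's kappa follow from a confusion matrix. The class and method names are hypothetical, and the matrix values are invented for illustration, not results from any dataset.

```java
/** Hypothetical helper: accuracy and Cohen's kappa from a square confusion matrix. */
class ConfusionMetrics {

    // Accuracy: fraction of instances on the diagonal (correctly classified).
    static double accuracy(int[][] m) {
        long correct = 0;
        long total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) correct += m[i][j];
            }
        }
        return (double) correct / total;
    }

    // Cohen's kappa: observed agreement corrected for the agreement expected by chance
    // from the row (actual) and column (predicted) totals. Assumes rows = actual class.
    static double kappa(int[][] m) {
        int k = m.length;
        long total = 0;
        long[] rowSum = new long[k]; // actual-class totals
        long[] colSum = new long[k]; // predicted-class totals
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                rowSum[i] += m[i][j];
                colSum[j] += m[i][j];
                total += m[i][j];
            }
        }
        double observed = accuracy(m);
        double expected = 0.0;
        for (int i = 0; i < k; i++) {
            expected += (double) rowSum[i] * colSum[i];
        }
        expected /= (double) total * total;
        if (expected == 1.0) {
            return Double.NaN; // undefined: chance agreement is already perfect
        }
        return (observed - expected) / (1.0 - expected);
    }

    public static void main(String[] args) {
        // Hypothetical 2-class confusion matrix (rows = actual, columns = predicted).
        int[][] cm = {{45, 5}, {10, 40}};
        System.out.printf("accuracy = %.3f%n", accuracy(cm)); // prints "accuracy = 0.850"
        System.out.printf("kappa    = %.3f%n", kappa(cm));    // prints "kappa    = 0.700"
    }
}
```

Computing these numbers for the same classifier under both test options makes the comparison in question C concrete: a training-set score well above the cross-validation score suggests the model has memorised the training data rather than learned patterns that generalise.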
Submit the hands-on project report as an MS Word document.
Include screenshots of your answers and analysis results, and explain and discuss each result.
Follow the guidelines in the Hands-On PowerPoint slides when writing the report.