Assignment 2
Learning Objectives
- prepare data for modeling
- create data subsets for training and validation
- experiment with creating and accessing vectors and data frames
Data Files
Tasks
An organization has collected data on customer visits, transactions, operating system, and gender and desires to build a model to predict revenue. For the moment, the goal is to prepare the data for modeling.
- (5 pts) Locate the data set and load the data into R.
- (10 pts) Calculate the following summary statistics: total number of cases, mean number of visits, median revenue, maximum and minimum number of transactions, and most commonly used operating system. Exclude any cases where there is a missing value.
- (15 pts) Create a scatterplot of visits (x-axis) versus revenue (y-axis).
- (30 pts) Impute missing transaction and gender values.
- (15 pts) Split the data set into two equally sized data sets where one can be used for training a model and the other for validation. Add every even-numbered case to the training data set and every odd-numbered case to the validation data set, i.e., rows 1, 3, 5, 7, etc. are validation data while rows 2, 4, 6, etc. are training data.
- (15 pts) Split the data set into two equally sized data sets where one can be used for training a model and the other for validation. Take a random sample of cases equal to 50% of the data set and make that the training data subset. Assign the unselected cases to the validation data subset.
- (10 pts) Calculate the mean revenue for each of the four data sets (the two training and two validation subsets created in the previous two tasks) and compare them.
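The tasks above can be sketched on a small toy data frame. This is only an illustration under stated assumptions, not a model solution: the column names (visits, transactions, os, gender, revenue) and all values are hypothetical, and median/mode imputation is just one possible choice among several.

```r
# Toy data; column names and values are hypothetical, not the assignment data.
customers <- data.frame(
  visits       = c(7, 20, 22, 24, 1, 13, 23, 14),
  transactions = c(0, 1, 1, NA, 0, 1, 2, 1),
  os           = c("Android", "iOS", "iOS", "iOS",
                   "Android", "Android", "iOS", "Android"),
  gender       = c("Male", NA, "Female", "Female",
                   "Male", "Male", "Female", "Male"),
  revenue      = c(0, 577, 850, 1050, 78, 580, 1328, 485),
  stringsAsFactors = FALSE
)

# Most frequent value of a vector; table() ignores NA by default.
Mode <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Summary statistics computed on complete cases only.
complete <- customers[complete.cases(customers), ]
n.cases <- nrow(complete)
mean.visits <- mean(complete$visits)
median.revenue <- median(complete$revenue)
range.transactions <- range(complete$transactions)  # c(min, max)
top.os <- Mode(complete$os)

# Imputation (assumed strategy): median for the numeric column,
# most frequent value for the categorical column.
imputed <- customers
imputed$transactions[is.na(imputed$transactions)] <-
  median(imputed$transactions, na.rm = TRUE)
imputed$gender[is.na(imputed$gender)] <- Mode(imputed$gender)

# Split 1: even-numbered rows for training, odd-numbered rows for validation.
row.ids <- seq_len(nrow(imputed))
train.even <- imputed[row.ids %% 2 == 0, ]
valid.odd <- imputed[row.ids %% 2 == 1, ]

# Split 2: random 50% for training, the remaining rows for validation.
set.seed(42)  # arbitrary seed so the sample is reproducible
train.rows <- sample(row.ids, size = floor(0.5 * nrow(imputed)))
train.random <- imputed[train.rows, ]
valid.random <- imputed[-train.rows, ]

# Mean revenue of each of the four subsets, side by side.
mean.revenue <- sapply(
  list(train.even = train.even, valid.odd = valid.odd,
       train.random = train.random, valid.random = valid.random),
  function(d) mean(d$revenue)
)
print(mean.revenue)
```

Using negative indexing (`imputed[-train.rows, ]`) guarantees that the random training and validation subsets do not overlap and together cover every row.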
Deliverables & Submission Instructions
Submit a single file with the .R extension. State all assumptions and give explanations as comments in the .R file wherever needed to help us assess your submission. Name the submission file LAST_FirstInitial_2.R; for example, John Smith’s file should be named Smith_J_2.R. Note in the comments anything that does not work or that you did not complete. Make sure that whatever you submit works; no credit will be given for code that does not work. Upload the submission to Blackboard. Make sure you follow the R Programming Style Guide.
Scoring
Total Number of Earnable Points: 100
Approximate Time to Complete: 3-4 hours
Due Date: see Calendar or Blackboard