Characteristics and Challenges of "Big Data"
This lesson describes the role of data and the data scientist in decision making and explains the overarching principles of collecting, integrating, and analyzing data. It explains the general concept of "big data" and the common challenges faced when working with large data sets. It also introduces the CRISP-DM framework for data mining.
Objectives
Upon completion of this lesson, you will be able to
- define the discipline of data science
- describe the essential characteristics of "big data"
- list the 6 V's of big data
- identify the key considerations in planning an analytics project
- describe the data science workflow
- differentiate between data, information, and knowledge
- break down a data mining project using CRISP-DM
Required Readings
- Chapter 1 in the textbook
- Provost, F. & Fawcett, T. (2013). Data Science and its Relationship to Big Data and Data-Driven Decision Making. Big Data 1(1).
- Braun, Mikio. Three Things About Data Science You Won't Find in Books. March 23, 2015.
- Lohr, Steve. For Big-Data Scientists, 'Janitor Work' is Key Hurdle to Insights. New York Times, August 17, 2014.
- Lorica, Ben. Data Analysis: Just one component of the Data Science Workflow. O'Reilly Radar, September 8, 2013.
- Lorica, Ben. Data Scientists tackle the analytic lifecycle. O'Reilly Radar, July 14, 2013.
- Shearer, Colin (2000). The CRISP-DM Model: The New Blueprint for Data Mining. Journal of Data Warehousing. Volume 5(4).
Suggested Readings
- Loukides, Mike. What is Data Science? O'Reilly Radar, June 2, 2010.
- Definitions of "Big Data", OpenTracker.com
- Kevin Normandeau. Beyond Volume, Variety and Velocity is the Issue of Big Data Veracity. September 12, 2013.
- Doug Laney. 3D Data Management: Controlling Data Volume, Velocity and Variety. META Group (now part of Gartner), February 6, 2001.
- Suer, Miles. Who Owns Enterprise Data. Informatica Perspectives, Nov 18, 2014.
- Suer, Miles. Analytics Stories: A Healthcare Case Study. Informatica Perspectives, Oct 23, 2014.
- Suer, Miles. Analytics Stories: A Pharmaceuticals Case Study. Informatica Perspectives, Jan 12, 2015.
- Chapman et al. (2000). CRISP-DM 1.0: A Step-by-Step Data Mining Guide. SPSS.