This is a paper I wrote for a recent class I took on Statistics for Business at Ashford University. I thought it was pretty relevant to my blog, so I am sharing it in case anyone has the patience to actually read the whole thing 🙂 Enjoy!
PREDICTIVE ANALYTICS FOR BUSINESS
Applying predictive analytics to business gives companies a competitive advantage and takes much of the guesswork out of strategic decision-making. Predictive analytics is the practice of applying statistical data-mining algorithms to historical business data to predict future customer behaviors or trends. This practice encompasses many methods and processes that are outside the scope of this paper. In this paper, I will focus on the segment of predictive analytics related directly to predicting business outcomes using statistical data-mining practices.
Data Mining Process
The process of predicting business outcomes using data-mining practices begins with defining what type of outcomes the business is trying to predict. “The Knowledge Discovery and Data Mining (KDD) process consists of data selection, data cleaning, data transformation and reduction, mining, interpretation and evaluation, and finally incorporation of the mined ‘knowledge’ with the larger decision making process” (Microsoft Research). An example data-mining directive from a university’s perspective might be to determine what factors influence a student’s propensity to drop out. For this analysis, we must determine which data-mining tasks to perform as well as which algorithms to use. Once the tasks and algorithms are understood, development begins by organizing and preparing the business data for the predictive model. Upon completion of the analysis, we run new data through the predictive model and use a visualization tool to interpret the results.
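To make the KDD steps more concrete, here is a minimal Python sketch of the selection, cleaning, transformation, and mining stages applied to the student-drop example. Everything in it is an assumption made for illustration: the records, the field names, and the choice of "completion rate" as the derived measure are hypothetical, and the "mining" step is only a comparison of group averages rather than a real data-mining algorithm.

```python
# Hypothetical student records; all field names and values are invented.
RAW_RECORDS = [
    {"student_id": 1, "credits_attempted": 12, "credits_earned": 12, "dropped": False, "advisor": "A"},
    {"student_id": 2, "credits_attempted": 12, "credits_earned": 3, "dropped": True, "advisor": "B"},
    {"student_id": 3, "credits_attempted": 9, "credits_earned": None, "dropped": False, "advisor": "A"},
    {"student_id": 4, "credits_attempted": 15, "credits_earned": 6, "dropped": True, "advisor": "C"},
    {"student_id": 5, "credits_attempted": 12, "credits_earned": 11, "dropped": False, "advisor": "B"},
]


def select(records):
    """Data selection: keep only the fields relevant to the question."""
    keep = ("credits_attempted", "credits_earned", "dropped")
    return [{key: record[key] for key in keep} for record in records]


def clean(records):
    """Data cleaning: discard records that have a missing value."""
    return [r for r in records if all(value is not None for value in r.values())]


def transform(records):
    """Transformation and reduction: collapse two fields into one ratio."""
    return [
        {
            "completion_rate": r["credits_earned"] / r["credits_attempted"],
            "dropped": r["dropped"],
        }
        for r in records
    ]


def mine(records):
    """Mining (simplified): average completion rate for each group."""
    def mean(values):
        return sum(values) / len(values)

    dropped = [r["completion_rate"] for r in records if r["dropped"]]
    stayed = [r["completion_rate"] for r in records if not r["dropped"]]
    return {"dropped": mean(dropped), "stayed": mean(stayed)}


# Chain the stages; interpreting the summary is left to the analyst.
summary = mine(transform(clean(select(RAW_RECORDS))))
print(summary)
```

In practice each stage would be far more involved, but keeping the stages as separate functions mirrors the way the KDD process separates data preparation from the mining itself.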
Data Mining Tasks and Algorithms
Data Mining Tasks
The specific data-mining tasks performed depend on the type of question the business is looking to answer. Sometimes the answers to these questions lead to new questions that were previously unknown. Below is a summary of the most prevalent data-mining tasks performed to answer most business-related questions.
Classification. The classification data-mining task is geared towards determining the relationship between a categorical dependent variable and one or more measures or facts. A good example would be to look at the income level of a student’s parents as a factor in predicting whether the student will go to college. The results of this task are often presented in classification tree (also called decision tree) diagrams.
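As a rough illustration of the classification task, the Python sketch below builds the simplest possible classification tree: a single split on parental income, sometimes called a decision stump. The data are hypothetical, and choosing the split by counting misclassifications is a simplification I am assuming here; production tools typically grow deeper trees using measures such as Gini impurity or entropy.

```python
# Hypothetical data: (parents' annual income in $ thousands, attended college?)
STUDENTS = [
    (25, False), (32, False), (41, False), (48, True),
    (55, False), (63, True), (78, True), (96, True),
]


def best_split(rows):
    """Find the income threshold with the fewest misclassifications.

    The rule being evaluated is: predict "attends" when income >= threshold.
    Returns a (threshold, errors) tuple.
    """
    incomes = sorted(income for income, _ in rows)
    # Candidate thresholds are the midpoints between neighboring incomes.
    candidates = [(low + high) / 2 for low, high in zip(incomes, incomes[1:])]
    best = None
    for threshold in candidates:
        errors = sum((income >= threshold) != attended for income, attended in rows)
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best


threshold, errors = best_split(STUDENTS)


def predict(income):
    """Classify a new student using the single learned split."""
    return income >= threshold


print(f"Split at income >= {threshold} ({errors} training error(s))")
print("Income 50 ->", predict(50))
print("Income 30 ->", predict(30))
```

The learned threshold corresponds to the root node of a classification tree diagram, with "attends" and "does not attend" as its two leaves.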
Regression. Regression analysis is a statistical technique in which we use observed data to relate a variable of interest, which is called the dependent (or response) variable, to one or more independent (or predictor) variables (Bowerman, O’Connell, & Orris, 2009, p. 499). Similar to classification, regression analysis can be used to understand how a response changes as a predictor changes; the difference is that the response in regression is a quantitative variable rather than a category. Following the example of a student’s likelihood of attending college, regression analysis could show where a change in the parents’ income level has the greatest effect on a child’s chances of attending college. A common form of regression analysis uses a simple linear regression model, which assumes that a straight line can represent the relationship between the variables (Bowerman, O’Connell, & Orris, 2009, p. 499). Other, more complex models exist; however, the simple linear regression model is often used in business to determine general trends in quantitative metrics.
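The simple linear regression model can be sketched in a few lines of Python using the standard least-squares formulas for the slope and intercept of the fitted line. The monthly sales figures are hypothetical, chosen only to show how a general trend in a quantitative metric might be estimated and extended one period forward.

```python
# Hypothetical data: month number and sales (in $ thousands) for that month.
MONTHS = [1, 2, 3, 4, 5, 6]
SALES = [10, 12, 15, 19, 20, 26]


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(MONTHS, SALES)

# Extend the fitted straight line one month beyond the observed data.
forecast_month_7 = intercept + slope * 7

print(f"sales = {intercept:.2f} + {slope:.2f} * month")
print(f"forecast for month 7: {forecast_month_7:.2f}")
```

The slope is the estimated change in sales per month, which is the "general trend" a business would read off the model. A forecast like this only holds if the straight-line assumption remains reasonable outside the observed range.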