Time Series. Algorithms that aid in time-series analysis are as numerous as they are complex. Because so many are available, we will focus only on the general principles that most of them share. Time series is a distinct type of analysis because the timeline is a continuous scale that is itself relevant: the order and spacing of the observations matter. One issue time-series algorithms must handle is gaps in the associated measures. Take our call center example. If we were to forecast call volume simply by averaging daily calls over the last four weeks, we would have serious gaps on the days the call center was closed (e.g., weekends and holidays). These gaps could greatly skew our results and cause us to give erroneous forecasts to the business stakeholders. By excluding the days on which the call center is closed and applying a smoothing technique to the remaining days, we can get a more accurate projection of call volume. Other factors, such as seasonality, may also influence a time-series algorithm. Most business processes have seasonality, which affects measures such as call volume in the call center. By using a tool known as a correlogram (Statistica), which plots the autocorrelation of a series at successive lags, we can detect and account for seasonality in the quantitative variables (e.g., call volume) we are trying to measure. The net result, after applying the relevant time-series algorithms, is a more accurate and reliable model for forecasting future outcomes over time.
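The sketch below illustrates both ideas in plain Python under simplifying assumptions: the call counts are made up, closed days are marked with `None`, and a trailing moving average stands in for whatever smoothing technique a production model would use. Dropping the closed days before smoothing keeps weekends from pulling the average toward zero, and the correlogram (autocorrelation at lags 1 through 7) shows which lag repeats most strongly.

```python
# Hypothetical daily call counts for four weeks starting on a Monday.
# None marks a day the call center was closed (here: every weekend).
WEEK = [120, 100, 90, 95, 80, None, None]
daily_calls = WEEK * 4


def naive_average(series):
    """Average over calendar days, counting closed days as zero calls."""
    return sum(c or 0 for c in series) / len(series)


def open_days_only(series):
    """Drop closed days so they don't drag the average toward zero."""
    return [c for c in series if c is not None]


def moving_average(values, window):
    """Trailing simple moving average: each point averages the last `window` values."""
    return [sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))]


def autocorrelation(values, lag):
    """Sample autocorrelation at a single lag (one bar of a correlogram)."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    num = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return num / denom


def correlogram(values, max_lag):
    """Autocorrelation for lags 1..max_lag."""
    return {lag: autocorrelation(values, lag) for lag in range(1, max_lag + 1)}


open_calls = open_days_only(daily_calls)
naive_avg = naive_average(daily_calls)           # skewed low by weekend zeros
open_avg = sum(open_calls) / len(open_calls)     # average over open days only
smoothed = moving_average(open_calls, window=5)  # one business week per window
corr = correlogram(open_calls, max_lag=7)

print(f"naive average:    {naive_avg:.1f}")
print(f"open-day average: {open_avg:.1f}")
print(f"strongest lag:    {max(corr, key=corr.get)}")
```

With this data the naive average is about 69.3 calls per day versus 97.0 on open days, and the correlogram peaks at lag 5, which corresponds to one business week once weekends are removed. Holidays would break that exact five-day spacing, so a real series would likely need a business calendar rather than a simple `None` filter.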
Data Preparation
Once we fully understand the types of data-mining tasks and models needed to answer the business stakeholders’ questions, we can begin preparing the data for modeling and analysis.
Qualitative Variables (Dimensions)
Qualitative variables are the dimensions by which predictive models slice and dice their predictions. These dimensions mainly include attributes, such as Product Name and Color, and hierarchies, such as Product Sub-Category and Product Category.
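To make the distinction concrete, here is a minimal sketch in Python with a hypothetical product dimension and made-up predictions: attributes (name, color) let us slice predictions sideways, while the hierarchy (sub-category, category) lets us roll them up to coarser levels. The field names and the `slice_by` helper are illustrative assumptions, not taken from any particular tool.

```python
from collections import defaultdict

# Hypothetical product dimension: each product carries attributes
# (name, color) and sits in a Sub-Category -> Category hierarchy.
PRODUCTS = {
    "P1": {"name": "Road Bike", "color": "Red", "subcategory": "Road Bikes", "category": "Bikes"},
    "P2": {"name": "Mountain Bike", "color": "Black", "subcategory": "Mountain Bikes", "category": "Bikes"},
    "P3": {"name": "Helmet", "color": "Red", "subcategory": "Helmets", "category": "Accessories"},
    "P4": {"name": "Water Bottle", "color": "Blue", "subcategory": "Bottles", "category": "Accessories"},
}

# Hypothetical model output: predicted units sold per product.
PREDICTED_UNITS = {"P1": 40, "P2": 25, "P3": 60, "P4": 15}


def slice_by(predictions, dimension, member_field):
    """Sum predictions over every product that shares the same value of `member_field`."""
    totals = defaultdict(int)
    for product_id, units in predictions.items():
        totals[dimension[product_id][member_field]] += units
    return dict(totals)


for field in ("color", "subcategory", "category"):
    print(field, slice_by(PREDICTED_UNITS, PRODUCTS, field))
```

Slicing by an attribute such as color cuts across the hierarchy (red bikes and red helmets land together), whereas slicing by category aggregates along it.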
Attributes. Attributes of a qualitative variable, or dimension, are the basic, often textual, pieces of information related to business nouns. In a university example, “student taking classes” contains two dimensions: students and classes. These nouns carry many attributes, such as student name and address, and class type and location. Any of these attributes may be used by the data-mining models, so they need to be cleansed and standardized across the many different data sources businesses use (e.g., line-of-business applications).
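As an illustration of cleansing and standardization, the sketch below takes the same student as recorded by two hypothetical source systems (a registrar and a housing application, each with its own field names and formatting) and maps both into one common schema. The lookup tables, field names, and rules are deliberately small assumptions; real cleansing usually needs far larger reference data.

```python
import re

# Hypothetical reference data used for standardization.
STATE_CODES = {"texas": "TX", "tx": "TX", "california": "CA", "ca": "CA"}
STREET_SUFFIXES = {"st": "Street", "street": "Street", "ave": "Avenue", "avenue": "Avenue"}

# The same student as recorded by two hypothetical source systems.
REGISTRAR = [{"student_name": "  SMITH, jane ", "address": "12 Oak St.", "state": "Texas"}]
HOUSING = [{"name": "Jane Smith", "addr": "12 oak street", "st": "TX"}]


def standardize_name(raw):
    """Collapse whitespace, turn 'Last, First' into 'First Last', and title-case it."""
    raw = " ".join(raw.split())
    if "," in raw:
        last, first = (part.strip() for part in raw.split(",", 1))
        raw = f"{first} {last}"
    return raw.title()


def standardize_address(raw):
    """Drop periods and commas, capitalize words, and expand a trailing street suffix."""
    words = [w.capitalize() for w in re.sub(r"[.,]", "", raw).split()]
    if words and words[-1].lower() in STREET_SUFFIXES:
        words[-1] = STREET_SUFFIXES[words[-1].lower()]
    return " ".join(words)


def standardize_state(raw):
    """Map state names or codes to a two-letter code; fall back to upper case."""
    cleaned = raw.strip()
    return STATE_CODES.get(cleaned.lower(), cleaned.upper())


def from_registrar(rec):
    return {"name": standardize_name(rec["student_name"]),
            "address": standardize_address(rec["address"]),
            "state": standardize_state(rec["state"])}


def from_housing(rec):
    return {"name": standardize_name(rec["name"]),
            "address": standardize_address(rec["addr"]),
            "state": standardize_state(rec["st"])}


cleaned = [from_registrar(r) for r in REGISTRAR] + [from_housing(r) for r in HOUSING]
unique_students = {(r["name"], r["address"], r["state"]) for r in cleaned}
print(unique_students)
```

Once both sources agree on the standardized values, the two records collapse into one student, so a model does not count the same person twice. Note that `str.title()` is a blunt tool (for example, it turns "McDonald" into "Mcdonald"), which is one reason real pipelines lean on curated reference data.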