Data mining is the practice of automatically searching large stores of data to discover patterns and trends that go beyond simple analysis. It uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events. Data mining is also known as Knowledge Discovery in Data (KDD).
The main benefits of data mining are:-
-Discovery of patterns
-Prediction of key outcomes
-Creation of actionable information (data mining can derive actionable information from large volumes of data; for example, a town planner might use a model that predicts income based on demographics to develop a plan for low-income housing)
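To make the town-planner example a little more concrete, here is a minimal sketch of prediction. It assumes a single hypothetical demographic attribute per area (for instance, average years of schooling) and uses one-variable ordinary least-squares regression as the model; the function names `fit_line`, `predict` and `low_income_areas`, the data, and the threshold are illustrative assumptions, not part of any real planning model.

```python
from statistics import mean

# Hypothetical illustration: predict an area's income from one demographic
# attribute using simple (one-variable) ordinary least-squares regression.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def predict(model, x):
    """Predicted income for an area with attribute value x."""
    intercept, slope = model
    return intercept + slope * x


def low_income_areas(model, areas, threshold):
    """Names of areas (name -> attribute value) predicted to fall below threshold."""
    return sorted(name for name, x in areas.items() if predict(model, x) < threshold)
```

With the made-up points `fit_line([10, 12, 14, 16], [20, 28, 36, 44])` the fitted model is `(-20.0, 4.0)`, so an area with a value of 11 is predicted at 24.0 (in whatever income unit the data uses). A real planning model would use many demographic attributes and a properly validated method; the single-variable line only shows the idea of turning historical data into a prediction.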
The data mining process includes the following 4 stages:-
1. Problem Definition - This step clearly defines the problem that data mining is meant to solve. Defining the problem is the most important part of the whole process, because it determines which data is required, and all subsequent data extraction and analysis depend on that data. For example, if you need to know which store is performing poorly in a locality, you need store-level data.
2. Data Collection - The relevant data is extracted from the large volume of raw data available. It is then cleaned and sampled. Cleaning removes errors and inconsistencies, such as missing or invalid values, so that they do not distort the analysis.
3. Model Building - In this phase, you select and apply various modeling techniques and calibrate their parameters to optimal values. You then evaluate how well the model satisfies the business goal originally stated in Stage 1.
4. Knowledge Deployment - Knowledge deployment is the use of data mining within a target environment. In the deployment phase, insight and actionable information are derived from the data.
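The four stages can be traced through the store example from Stage 1. The sketch below is a hypothetical illustration rather than a standard method: it assumes raw transactions arrive as `(store_id, locality, amount)` tuples, treats a store as performing poorly when its total sales fall below a cutoff, and calibrates that cutoff against a few hand-labelled stores. Sampling is omitted for brevity, and all function names are made up for this example.

```python
from collections import defaultdict

# Stage 1 (Problem Definition): the hypothetical question "which store is
# performing poorly in this locality?" means we need store-level totals.

def collect_store_level(transactions, locality):
    """Stage 2 (Data Collection): extract, clean, and aggregate raw rows.

    `transactions` is assumed to be an iterable of (store_id, locality, amount).
    """
    totals = defaultdict(float)
    for store_id, loc, amount in transactions:
        if loc != locality:
            continue  # extraction: keep only the locality being studied
        if amount is None or amount < 0:
            continue  # cleaning: drop missing or invalid sale amounts
        totals[store_id] += amount  # aggregate to the store level
    return dict(totals)


def calibrate_threshold(totals, labels, candidates):
    """Stage 3 (Model Building): choose the sales cutoff that best matches labels.

    `labels` maps store_id -> True if the store is known to perform poorly.
    Returns the chosen cutoff and its accuracy on those labelled stores,
    which serves as the check against the business goal from Stage 1.
    """
    def accuracy(cutoff):
        # A labelled store with no clean sales is treated as having zero sales.
        hits = sum((totals.get(s, 0.0) < cutoff) == poor for s, poor in labels.items())
        return hits / len(labels)

    best = max(candidates, key=accuracy)
    return best, accuracy(best)


def flag_poor_performers(totals, cutoff):
    """Stage 4 (Knowledge Deployment): apply the calibrated rule to store totals."""
    return sorted(store for store, total in totals.items() if total < cutoff)
```

For instance, if cleaned totals for one locality come out as `{"A": 150.0, "B": 20.0, "C": 300.0}` and only store B is labelled as poor, trying the candidate cutoffs `[10, 50, 200]` selects 50 with an accuracy of 1.0, and deploying that rule flags `["B"]`. In practice Stage 3 would use a proper classifier or forecast rather than a single cutoff; the single parameter just keeps the "calibrate, then evaluate against the goal" step visible.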
The KDD process is commonly broken down into the following 7 steps:-
1. Data Cleaning – Clean the data and remove noise and inconsistencies
2. Data Integration – Combine multiple data sources
3. Data Selection – Retrieve the data relevant to the analysis from the data set
4. Data Transformation – Transform the data into a format suitable for mining and for the problem statement
5. Data Mining – Apply intelligent methods to extract data patterns
6. Pattern Evaluation – Evaluate the extracted patterns to identify the truly interesting ones
7. Knowledge Presentation – Present the evaluated patterns to users, for example through visualization
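The seven KDD steps can be chained into a small pipeline. The following is a minimal sketch under stated assumptions: two hypothetical sources of shop transactions given as `(transaction_id, item, cashier)` tuples, and, for step 5, simple pair counting with support and confidence thresholds. That is a basic form of association-rule mining chosen here only for illustration; it is one of many possible mining methods. The function names and default thresholds are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def clean(rows):
    """Step 1: drop rows with a missing item and normalise inconsistent spellings."""
    return [(txn, item.strip().lower(), cashier)
            for txn, item, cashier in rows if item and item.strip()]


def integrate(sources):
    """Step 2: combine named sources, tagging ids so they cannot collide."""
    return [(f"{name}:{txn}", item, cashier)
            for name, rows in sources.items() for txn, item, cashier in rows]


def select(rows):
    """Step 3: keep only the attributes relevant to the question (drop cashier)."""
    return [(txn, item) for txn, item, _cashier in rows]


def transform(rows):
    """Step 4: reshape one-row-per-item data into one item set per transaction."""
    baskets = defaultdict(set)
    for txn, item in rows:
        baskets[txn].add(item)
    return list(baskets.values())


def mine(baskets, min_support):
    """Step 5: keep item pairs that occur in at least min_support of baskets."""
    pair_counts, item_counts = Counter(), Counter()
    for basket in baskets:
        item_counts.update(basket)
        pair_counts.update(combinations(sorted(basket), 2))
    n = len(baskets)
    frequent = {pair: c for pair, c in pair_counts.items() if c / n >= min_support}
    return frequent, item_counts


def evaluate(frequent, item_counts, n_baskets, min_confidence):
    """Step 6: turn frequent pairs into rules X -> Y and keep the confident ones."""
    rules = []
    for (a, b), count in frequent.items():
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, count / n_baskets, confidence))
    return sorted(rules)


def present(rules):
    """Step 7: format the surviving rules for a human reader."""
    return [f"{x} -> {y} (support={s:.2f}, confidence={c:.2f})" for x, y, s, c in rules]


def run_kdd(sources, min_support=0.5, min_confidence=0.6):
    """Chain steps 1-7; the default thresholds are arbitrary example values."""
    cleaned = {name: clean(rows) for name, rows in sources.items()}
    baskets = transform(select(integrate(cleaned)))
    frequent, item_counts = mine(baskets, min_support)
    rules = evaluate(frequent, item_counts, len(baskets), min_confidence)
    return present(rules)
```

As a usage example, with the made-up sources `{"a": [("1", " Bread", "ann"), ("1", "milk", "ann"), ("2", "bread", "bob"), ("2", "Butter", "bob"), ("3", "", "ann")], "b": [("1", "BREAD ", "cy"), ("1", "milk", "cy"), ("2", "milk", "cy")]}`, `run_kdd` returns `["bread -> milk (support=0.50, confidence=0.67)", "milk -> bread (support=0.50, confidence=0.67)"]`. Cleaning merges the differently spelled "bread" entries and drops the empty row, and the bread/butter pair is discarded in step 5 because it appears in only one of four baskets.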