Data Mining Notes, Chapter 1: Introduction
2013-12-22 11:20
Textbook: Data Mining: Concepts and Techniques (2nd edition), by Jiawei Han and Micheline Kamber, China Machine Press (2007)
Lecture 1: Introduction
1) Why data mining?
Necessity is the mother of invention
2) What is data mining?
Data mining (knowledge discovery from data)
Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
Alternative names: knowledge discovery (mining) in databases (KDD)
Steps of a KDD Process
Learning the application domain: relevant prior knowledge and goals of application
Creating a target data set: data selection
Data cleaning and preprocessing (may take 60% of the effort!)
Data reduction and transformation: find useful features, dimensionality/variable reduction, invariant representation
Choosing functions of data mining: summarization, classification, regression, association, clustering
Choosing the mining algorithm(s)
Data mining: search for patterns of interest
Pattern evaluation and knowledge presentation: visualization, transformation, removing redundant patterns, etc.
Use of discovered knowledge
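The steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the records, the attribute names, and every function name (`select`, `clean`, `transform`, `mine`, `evaluate`, `run_kdd`) are hypothetical, and the "mining" step is a toy summarization rather than any particular algorithm from the textbook.

```python
from statistics import mean

# Hypothetical raw records: (customer_id, age, monthly_spend).
# None marks a missing value. All values are made up for illustration.
RAW = [
    (1, 23, 120.0),
    (2, None, 80.0),
    (3, 45, 950.0),
    (4, 41, None),
    (5, 52, 1010.0),
    (6, 27, 95.0),
]


def select(records):
    """Data selection: keep only the attributes relevant to the task."""
    return [(age, spend) for _, age, spend in records]


def clean(rows):
    """Data cleaning: drop rows with missing values (one simple policy among many)."""
    return [r for r in rows if None not in r]


def transform(rows):
    """Data transformation: min-max scale each attribute to [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    # "(h - l) or 1" avoids division by zero when an attribute is constant.
    return [
        tuple((v - l) / ((h - l) or 1) for v, l, h in zip(r, lo, hi))
        for r in rows
    ]


def mine(rows):
    """Data mining (toy): split rows at the mean of the scaled spend."""
    cut = mean(r[1] for r in rows)
    return {
        "low_spend": [r for r in rows if r[1] < cut],
        "high_spend": [r for r in rows if r[1] >= cut],
    }


def evaluate(groups):
    """Pattern evaluation: report each group's size and mean scaled age."""
    return {
        name: (len(rows), mean(r[0] for r in rows))
        for name, rows in groups.items()
    }


def run_kdd(records):
    """Chain the steps: selection, cleaning, transformation, mining, evaluation."""
    return evaluate(mine(transform(clean(select(records)))))


if __name__ == "__main__":
    for name, (size, mean_age) in run_kdd(RAW).items():
        print(f"{name}: {size} rows, mean scaled age {mean_age:.2f}")
```

Each function maps to one step of the list, so swapping in a different cleaning policy or mining method only touches one place. Learning the application domain and using the discovered knowledge are left out because they are not code.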
Architecture: Typical Data Mining System
3) On what kind of data?
Traditional databases and applications
Relational database, data warehouse, transactional database
Advanced databases and advanced applications
Object-relational databases
Temporal database, sequence data (incl. biosequences), time-series data
Spatial database and spatiotemporal database
Text databases and multimedia databases
Heterogeneous databases and legacy databases
Data streams and sensor data
Structured data, graphs, social networks and link databases
The World-Wide Web
4) Data Mining Functionalities
Class/concept description: characterization and discrimination
Frequent patterns, association, correlation and causality
Classification and prediction
Cluster analysis
Outlier analysis
Trend and evolution analysis
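As one concrete instance of the functionalities listed above, frequent patterns and association can be illustrated with the standard support and confidence measures. The sketch below is a minimal illustration, not a mining algorithm: the transactions are made up, and the helper names (`count`, `support`, `confidence`) are assumptions introduced here.

```python
# Hypothetical market-basket transactions, one set of items per purchase.
TRANSACTIONS = [
    {"milk", "bread"},
    {"milk", "diapers", "beer"},
    {"bread", "diapers", "beer"},
    {"milk", "bread", "diapers"},
    {"milk", "bread", "beer"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent => consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


if __name__ == "__main__":
    print("support({milk, bread}) =", support({"milk", "bread"}, TRANSACTIONS))
    print("confidence(milk => bread) =", confidence({"milk"}, {"bread"}, TRANSACTIONS))
```

Here {milk, bread} appears in 3 of the 5 transactions and milk in 4, so the rule milk => bread has support 3/5 and confidence 3/4. `confidence` raises `ZeroDivisionError` if the antecedent never occurs. A real system would handle that case and would search the itemset space rather than score a single hand-picked rule.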
5) Are all the patterns interesting?
6) Classification of data mining systems