Building the Unstructured Data Warehouse: Architecture, Analysis, and Design
2014-05-26 13:07
Learn essential techniques from data warehouse legend Bill Inmon on how to build the reporting environment your business needs now!
Answers to many valuable business questions hide in text. How well can your existing reporting environment extract the necessary text from email, spreadsheets, and documents, and put it in a useful format for analytics and reporting? Transforming the traditional data warehouse into an efficient unstructured data warehouse requires additional skills from the analyst, architect, designer, and developer. This book prepares you to implement an unstructured data warehouse; through clear explanations, examples, and case studies, you will learn new techniques and tips for obtaining and analyzing text.
Master these ten objectives:
Build an unstructured data warehouse using the 11-step approach
Integrate text and describe it in terms of homogeneity, relevance, medium, volume, and structure
Overcome challenges including blather, the Tower of Babel, and lack of natural relationships
Avoid the Data Junkyard and combat the Spider's Web
Reuse techniques perfected in the traditional data warehouse and Data Warehouse 2.0, including iterative development
Apply essential techniques for textual Extract, Transform, and Load (ETL) such as phrase recognition, stop word filtering, and synonym replacement
Design the Document Inventory system and link unstructured text to structured data
Leverage indexes for efficient text analysis and taxonomies for useful external categorization
Manage large volumes of data using advanced techniques such as backward pointers
Evaluate technology choices suitable for unstructured data processing, such as data warehouse appliances
The following outline briefly describes each chapter's content:
Chapter 1 defines unstructured data and explains why text is the main focus of this book.
Chapter 2 addresses the challenges one faces when managing unstructured data.
Chapter 3 discusses the DW 2.0 architecture, which leads into the role of the unstructured data warehouse. The unstructured data warehouse is defined and benefits are given. There are several features of the conventional data warehouse that can be leveraged for the unstructured data warehouse, including ETL processing, textual integration, and iterative development.
Chapter 4 focuses on the heart of the unstructured data warehouse: Textual Extract, Transform, and Load (ETL).
Chapter 5 describes the 11 steps required to develop the unstructured data warehouse.
Chapter 6 describes how to inventory documents for maximum analysis value, as well as link the unstructured text to structured data for even greater value.
Chapter 7 goes through each of the different types of indexes necessary to make text analysis efficient. Indexes range from simple indexes, which are fast to create and are good if the analyst really knows what needs to be analyzed before the indexing process begins, to complex combined indexes, which can be made up of any and all of the other kinds of indexes.
Chapter 8 explains taxonomies and how they can be used within the unstructured data warehouse.
Chapter 9 explains ways of coping with large amounts of unstructured data. Techniques such as keeping the unstructured data at its source and using backward pointers are discussed. The chapter explains why iterative development is so important.
Chapter 10 focuses on challenges and some technology choices that are suitable for unstructured data processing. In addition, the data warehouse appliance is discussed.
Chapters 11, 12, and 13 put all of the previously discussed techniques and approaches in context through three case studies.
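To make the textual ETL techniques named in the objectives (phrase recognition, stop word filtering, and synonym replacement) a little more concrete, here is a minimal Python sketch. It is not taken from the book: the word lists, the function names, and the order of the steps are all assumptions chosen for illustration, and a real system would use far larger vocabularies and handle digits, punctuation, and multiple languages.

```python
import re

# Hypothetical vocabularies, invented for this illustration only.
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "is", "was", "in", "on"}
SYNONYMS = {"car": "automobile", "auto": "automobile", "client": "customer"}
PHRASES = {("data", "warehouse"), ("purchase", "order")}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def recognize_phrases(tokens):
    """Merge adjacent tokens that form a known two-word phrase."""
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            merged.append("_".join(pair))
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def textual_etl(text):
    """Apply phrase recognition, stop word filtering, then synonym replacement."""
    tokens = recognize_phrases(tokenize(text))
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [SYNONYMS.get(t, t) for t in tokens]


def build_index(documents):
    """Map each normalized term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for term in textual_etl(text):
            index.setdefault(term, set()).add(doc_id)
    return index
```

With this sketch, `textual_etl("The client bought a car")` yields the normalized terms `customer`, `bought`, and `automobile`, so documents that say "car" and documents that say "auto" end up under the same index entry. Phrases are recognized before stop words are removed so that a phrase containing a stop word would not be broken apart; that ordering is a design choice of this sketch, not a rule stated in the text.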