Data Mining: Concepts and Techniques presents the concepts and techniques for processing gathered data or information for use in various applications. Present an example illustrating such a huge and sparse data cube. Suppose that the data for analysis includes the attribute age. Answer: (a) Draw a star schema diagram for the data warehouse. This step is not required here as the data are already sorted.
Each new incoming image frame is only compared with the previous one, satisfying the time and resource constraints. Give your opinion of which might be more empirically useful and state the reasons behind your answer. Each step away from the center represents the stepping down of a concept hierarchy of the dimension.
Using the data for age given in Exercise 2. This book is a must-have for all instructors, researchers, developers and users in the area of data mining and knowledge discovery. It is especially poor when the percentage of missing values per attribute varies considerably. After describing data mining, this edition explains the methods of knowing, preprocessing, processing, and warehousing data. Answer: A department store, for example, can use data mining to assist with its target marketing mail campaign. However, one unique type of knowledge about stream data is the pattern of spatial change with respect to time. Recent applications pay special attention to spatiotemporal data streams.
Conceptually, it is the length of the vector. Answer: For decision-making queries and frequently asked queries, the update-driven approach is preferable. Use Euclidean distance on the transformed data to rank the data points. We can, for example, use the data in the database to construct a decision tree to induce missing values for a given attribute, and at the same time have human-entered rules on how to correct wrong data types. Suppose that a data warehouse for Big University consists of the following four dimensions: student, course, semester, and instructor, and two measures, count and avg grade.
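The ranking step mentioned above can be sketched in Python. The text does not say which transformation is meant, so this sketch assumes each vector is scaled to unit length (dividing by its Euclidean norm, the "length of the vector") before distances are taken. The point names, the values, and the helper names `norm`, `to_unit`, `euclidean`, and `rank_by_distance` are all hypothetical.

```python
import math

def norm(v):
    # Euclidean norm: conceptually, the length of the vector.
    return math.sqrt(sum(x * x for x in v))

def to_unit(v):
    # Assumed transformation: scale a non-zero vector to unit length.
    n = norm(v)
    return [x / n for x in v]

def euclidean(a, b):
    # Euclidean distance between two vectors of equal dimension.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_distance(points, query):
    # Rank point names from nearest to farthest on the transformed data.
    q = to_unit(query)
    scored = [(euclidean(to_unit(p), q), name) for name, p in points.items()]
    return [name for _, name in sorted(scored)]

# Hypothetical data points and query, for illustration only.
points = {"a": [3.0, 4.0], "b": [1.0, 0.0], "c": [0.0, 2.0]}
query = [6.0, 8.0]
ranking = rank_by_distance(points, query)
print(ranking)
```

Because the vectors are normalized first, a point that lies in the same direction as the query (here `a`) ranks first regardless of its magnitude. Zero vectors are not handled in this sketch.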
Otherwise, update the value and propagate the result up. Based on your observation, describe another possible kind of knowledge that needs to be discovered by data mining methods but has not been listed in this chapter. To construct this spatial data warehouse, we may need to integrate spatial data from heterogeneous sources and systems. For median, keep a small number, p, of centered values e. Spectators may be students, adults, or seniors, with each category having its own charge rate.
Then, the methods involved in mining frequent patterns, associations, and correlations for large data sets are described. Otherwise, the median can easily be calculated from the above set. If a user needs to use spatial measures in a spatial data cube, we can selectively precompute some spatial measures in the spatial data cube. Can they be performed alternatively by data query processing or simple statistical analysis? Most of the results are published, but they are seldom recorded in databases with the experiment details (who, when, how, etc.). The spatial data sensed may not be so accurate, so the algorithms must have high tolerance with respect to noise. For example, the knowledge base may contain concept hierarchies and metadata e. This book is intended for Computer Science students, application developers, business professionals, and researchers who seek information on data mining.
These have become very popular in the past few years. The remaining chapters discuss outlier detection and the trends, applications, and research frontiers in data mining. If the sales amount in a tuple is greater than an existing one in the top-10 list, insert the new sales amount from the new tuple into the list, and discard the smallest one in the list. The resulting computed data cube for the billing database would have large amounts of missing or removed data, resulting in a huge and sparse data cube. Give examples of each data mining functionality, using a real-life database that you are familiar with. We seek to observe whether any new planet is being created or any old planet is disappearing.
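The top-10 update rule described above (insert a new sales amount when it exceeds one already in the list, then discard the smallest) can be sketched with a min-heap. This is a minimal illustration, not the source's implementation: the function name `update_top_k` and the sample amounts are hypothetical, and comparing against the smallest entry is used as an equivalent test for "greater than an existing one".

```python
import heapq

def update_top_k(top_k, amount, k=10):
    # top_k is a min-heap, so top_k[0] is the smallest amount kept so far.
    if len(top_k) < k:
        heapq.heappush(top_k, amount)
    elif amount > top_k[0]:
        # Insert the new amount and discard the smallest in one step.
        heapq.heapreplace(top_k, amount)
    return top_k

# Hypothetical stream of sales amounts, one per incoming tuple.
sales = [120, 45, 300, 75, 980, 15, 610, 230, 55, 410, 700, 5, 860]

top10 = []
for amount in sales:
    update_top_k(top10, amount)

top10_sorted = sorted(top10, reverse=True)
print(top10_sorted)
```

Each update costs O(log k), so the list can be maintained incrementally as new tuples arrive without rescanning earlier data.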
The need for parallel and distributed data mining algorithms has been brought about by the huge size of many databases, the wide distribution of data, and the computational complexity of some data mining methods. How would you support this feature? Answer: (a) Use smoothing by bin means to smooth the above data, using a bin depth of 3. However, there is no commonly accepted subjective similarity measure. Data quality can be assessed in terms of accuracy, completeness, and consistency. Suppose your task as a software engineer at Big-University is to design a data mining system to examine their university course database, which contains the following information: the name, address, and status e. Each cluster that is formed can be viewed as a class of objects.
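Smoothing by bin means with a bin depth of 3, as mentioned above, can be sketched as follows. The data the exercise refers to are not reproduced in this text, so the values below are hypothetical, as is the function name `smooth_by_bin_means`. The sketch assumes equal-depth partitioning of the sorted values, with each value replaced by the mean of its bin.

```python
def smooth_by_bin_means(values, depth=3):
    # Sort, split into consecutive bins of `depth` values, and replace
    # every value by the mean of its bin. The last bin may be shorter.
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), depth):
        bin_values = ordered[start:start + depth]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed

# Hypothetical values, for illustration only.
data = [4, 8, 9, 15, 21, 21, 24, 25, 26]
smoothed = smooth_by_bin_means(data, depth=3)
print(smoothed)
```

With these values the bins are (4, 8, 9), (15, 21, 21), and (24, 25, 26), so the smoothed values are 7, 19, and 25, each repeated three times.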
For the data collected in multiple heterogeneous databases to be used in decision-making processes, any semantic heterogeneity problems among multiple databases must be analyzed and solved so that the data can be integrated and summarized. When at the lowest conceptual level e. Use the formula as shown in the hint to obtain the variance. Another challenge is the parallel, distributed, and incremental processing of data mining algorithms.
Also, some frequently used intermediate mining results can be precomputed and stored in the database or data warehouse system, thereby enhancing the performance of the data mining system. For example, many experimental results regarding protein interactions have been published. Answer: (a) Enumerate three categories of measures, based on the kind of aggregate functions used in computing a data cube. For example, exceptions in credit card transactions can help us detect the fraudulent use of credit cards. If not, insert it into the summary table and propagate the result up.