Information-Theoretic Outlier Detection for Large-Scale Categorical Data
Wu, Shu (1); Wang, Shengrui (2)
Abstract: Outlier detection can usually be considered as a pre-processing step for locating, in a data set, those objects that do not conform to well-defined notions of expected behavior. It is very important in data mining for discovering novel or rare events, anomalies, vicious actions, exceptional phenomena, etc. We are investigating outlier detection for categorical data sets. This problem is especially challenging because of the difficulty of defining a meaningful similarity measure for categorical data. In this paper, we propose a formal definition of outliers and an optimization model of outlier detection, via a new concept of holoentropy that takes both entropy and total correlation into consideration. Based on this model, we define a function for the outlier factor of an object which is solely determined by the object itself and can be updated efficiently. We propose two practical 1-parameter outlier detection methods, named ITB-SS and ITB-SP, which require no user-defined parameters for deciding whether an object is an outlier. Users need only provide the number of outliers they want to detect. Experimental results show that ITB-SS and ITB-SP are more effective and efficient than mainstream methods and can be used to deal with both large and high-dimensional data sets where existing algorithms fail.
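As a rough illustration of the holoentropy idea mentioned in the abstract, the sketch below assumes holoentropy is the sum of the joint entropy and the total correlation. Since total correlation is the sum of the per-attribute entropies minus the joint entropy, that sum reduces to the sum of the per-attribute entropies. The function names `attribute_entropy`, `holoentropy`, and `greedy_outliers` are hypothetical, and the naive greedy loop is only a stand-in: it does not reproduce ITB-SS or ITB-SP, the attribute weighting, or the efficiently updatable outlier factor described in the paper.

```python
from collections import Counter
from math import log2


def attribute_entropy(column):
    """Shannon entropy (in bits) of one categorical attribute."""
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())


def holoentropy(rows):
    """Assumed definition: joint entropy plus total correlation,
    which equals the sum of the per-attribute entropies."""
    if not rows:
        return 0.0
    # zip(*rows) turns the list of objects into a list of attribute columns.
    return sum(attribute_entropy(col) for col in zip(*rows))


def greedy_outliers(rows, k):
    """Naive greedy sketch: k times, remove the object whose removal
    leaves the lowest holoentropy. Assumes k <= len(rows).
    Recomputes everything from scratch, so it is only suitable for tiny data."""
    remaining = list(range(len(rows)))
    outliers = []
    for _ in range(k):
        best = min(
            remaining,
            key=lambda i: holoentropy([rows[j] for j in remaining if j != i]),
        )
        remaining.remove(best)
        outliers.append(best)
    return outliers


# Toy data: five identical objects and one object that differs in every attribute.
rows = [("a", "x")] * 5 + [("b", "y")]
print(greedy_outliers(rows, 1))
```

As in the abstract, the only input besides the data is the number of outliers requested, here `k`.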
Keywords: Outlier Detection; Holoentropy; Total Correlation; Outlier Factor; Attribute Weighting; Greedy Algorithms
WOS Headings: Science & Technology; Technology
Indexed By: SCI
WOS Research Area: Computer Science; Engineering
WOS Subject: Computer Science, Artificial Intelligence; Computer Science, Information Systems; Engineering, Electrical & Electronic
WOS ID: WOS:000314934900009
Citation statistics
Cited Times: 23 [WOS]
Document Type: Journal article
Corresponding Author: Wu, Shu
Affiliation:
1. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit NLPR, Beijing 100190, Peoples R China
2. Univ Sherbrooke, Dept Comp Sci, Sherbrooke, PQ J1K 2R1, Canada
Recommended Citation
GB/T 7714: Wu, Shu, Wang, Shengrui. Information-Theoretic Outlier Detection for Large-Scale Categorical Data[J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25(3): 589-602.
APA: Wu, Shu, & Wang, Shengrui. (2013). Information-Theoretic Outlier Detection for Large-Scale Categorical Data. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 25(3), 589-602.
MLA: Wu, Shu, et al. "Information-Theoretic Outlier Detection for Large-Scale Categorical Data". IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 25.3 (2013): 589-602.
Files in This Item:
File Name/Size: Information-theoreti (1401KB)
DocType: Journal article
Version: Author accepted manuscript
Access: Open access
License: CC BY-NC-SA
File name: Information-theoretic Outlier Detection for large-scale Categorical Data.pdf
Format: Adobe PDF

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.