Wednesday 23 July 2014

OCCT: A One-Class Clustering Tree for Implementing One-to-Many Data Linkage




ABSTRACT:
One-to-many data linkage is an essential task in many domains, yet only a handful of prior publications have addressed it. Furthermore, while data linkage is traditionally performed among entities of the same type, there is also a need for linkage techniques that link matching entities of different types. In this paper we propose a new one-to-many data linkage method that links entities of different natures. The proposed method is based on a one-class clustering tree (OCCT), which characterizes the entities that should be linked together. The tree is built so that it is easy to understand and to transform into association rules; i.e., the inner nodes consist only of features describing the first set of entities, while the leaves of the tree represent features of their matching entities from the second dataset. We propose four splitting criteria and two pruning methods that can be used for inducing the OCCT. The method was evaluated using datasets from three different domains. The results affirm the effectiveness of the proposed method and show that the OCCT yields better precision and recall (with statistically significant differences in most cases) than a C4.5 decision tree-based linkage method.
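Since the evaluation is reported in terms of precision and recall, the following minimal sketch shows how those two measures could be computed for a set of predicted links. The class name `LinkageMetrics` and the "idA->idB" string encoding of a link are assumptions made for illustration; they are not taken from the paper.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: each link is encoded as an "idA->idB" string for brevity.
final class LinkageMetrics {

    // Fraction of predicted links that are correct (0.0 if nothing was predicted).
    static double precision(Set<String> predicted, Set<String> actual) {
        if (predicted.isEmpty()) {
            return 0.0;
        }
        return (double) truePositives(predicted, actual) / predicted.size();
    }

    // Fraction of true links that were found (0.0 if there are no true links).
    static double recall(Set<String> predicted, Set<String> actual) {
        if (actual.isEmpty()) {
            return 0.0;
        }
        return (double) truePositives(predicted, actual) / actual.size();
    }

    // Number of links present in both sets.
    private static int truePositives(Set<String> predicted, Set<String> actual) {
        Set<String> common = new HashSet<>(predicted);
        common.retainAll(actual);
        return common.size();
    }
}
```

With four predicted links of which three are correct, and five true links overall, this gives a precision of 0.75 and a recall of 0.6.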
EXISTING SYSTEM:
Data linkage is the task of identifying different entries (i.e., data items) that refer to the same entity across different data sources. The goal of the data linkage task is to join datasets that do not share a common identifier (i.e., a foreign key). Common data linkage scenarios include: linking data when combining two different databases; data deduplication (a data compression technique for eliminating redundant data), which is commonly done as a preprocessing step for data mining tasks; identifying individuals across different census datasets; linking similar DNA sequences; and matching astronomical objects from different catalogues. It is common to divide data linkage into two types: one-to-one and one-to-many. In one-to-one data linkage, the goal is to associate an entity from one dataset with a single matching entity in another dataset. In one-to-many data linkage, the goal is to associate an entity from the first dataset with a group of matching entities from the other dataset. Most previous work focuses on one-to-one data linkage.
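The difference between the two linkage types can be sketched in a few lines of Java. This is only an illustration: the class `LinkageTypes`, its method names, and the caller-supplied `matches` predicate are assumptions, standing in for whatever matching rule a real system would use when no shared key exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BiPredicate;

// Hypothetical helper, not from the paper: contrasts the two linkage types.
final class LinkageTypes {

    // One-to-one: return the first record of the second dataset that matches a, if any.
    static <A, B> Optional<B> linkOneToOne(A a, List<B> tableB, BiPredicate<A, B> matches) {
        for (B b : tableB) {
            if (matches.test(a, b)) {
                return Optional.of(b);
            }
        }
        return Optional.empty();
    }

    // One-to-many: return every record of the second dataset that matches a.
    static <A, B> List<B> linkOneToMany(A a, List<B> tableB, BiPredicate<A, B> matches) {
        List<B> linked = new ArrayList<>();
        for (B b : tableB) {
            if (matches.test(a, b)) {
                linked.add(b);
            }
        }
        return linked;
    }
}
```

The one-to-one variant stops at a single match, while the one-to-many variant returns the whole group of matching entities, which is the case the OCCT method targets.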
DISADVANTAGES OF EXISTING SYSTEM:
·       It is not secure.
·       It can perform one-to-one data linkage only.
·       It consumes a large amount of time.

PROPOSED SYSTEM:
We propose a new data linkage method aimed at performing one-to-many linkage, which can also be extended to many-to-many linkage. In addition, while data linkage is usually performed among entities of the same type, the proposed data linkage technique can match entities of different types. For example, in a student database we might want to link a student record with the courses she should take (according to features that describe the student and features that describe the courses). The proposed method links the entities using a One-Class Clustering Tree (OCCT). A clustering tree is a tree in which each leaf contains a cluster instead of a single classification. Each cluster is generalized by a set of rules (e.g., a set of conditional probabilities) that is stored in the appropriate leaf.
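The tree structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: all features are treated as categorical strings, the tree is built by hand rather than induced with the paper's splitting criteria, and a leaf scores a candidate by multiplying its stored conditional probabilities (which assumes the second dataset's features are independent within a leaf). The class names `OcctNode`, `OcctInnerNode`, and `OcctLeaf` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; class and method names are hypothetical.
abstract class OcctNode {
    // Score for how well record b (second dataset) matches record a (first dataset).
    abstract double score(Map<String, String> a, Map<String, String> b);
}

// Inner node: splits on one feature of the FIRST dataset only.
final class OcctInnerNode extends OcctNode {
    private final String featureOfA;
    private final Map<String, OcctNode> children = new HashMap<>();

    OcctInnerNode(String featureOfA) {
        this.featureOfA = featureOfA;
    }

    OcctInnerNode addChild(String value, OcctNode child) {
        children.put(value, child);
        return this;
    }

    @Override
    double score(Map<String, String> a, Map<String, String> b) {
        OcctNode child = children.get(a.get(featureOfA));
        // Unseen feature value: this sketch treats it as no evidence for a match.
        return child == null ? 0.0 : child.score(a, b);
    }
}

// Leaf: stores a conditional probability for each feature value of the SECOND dataset.
final class OcctLeaf extends OcctNode {
    private final Map<String, Map<String, Double>> probabilities = new HashMap<>();

    OcctLeaf setProbability(String featureOfB, String value, double p) {
        probabilities.computeIfAbsent(featureOfB, k -> new HashMap<>()).put(value, p);
        return this;
    }

    @Override
    double score(Map<String, String> a, Map<String, String> b) {
        double product = 1.0; // assumes features of b are independent within the leaf
        for (Map.Entry<String, Map<String, Double>> e : probabilities.entrySet()) {
            product *= e.getValue().getOrDefault(b.get(e.getKey()), 0.0);
        }
        return product;
    }
}
```

In the student example, a root `OcctInnerNode("major")` could route a student with major "cs" to a leaf that stores probabilities for course features such as "department" and "level". Courses whose score exceeds a chosen threshold would then be linked to that student, which gives the one-to-many result.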

ADVANTAGES OF PROPOSED SYSTEM:
·       It addresses three major problem domains: data leakage prevention, recommender systems, and fraud detection.
·       It can detect abnormal access to database records that might indicate a potential data leakage or data misuse.

SYSTEM CONFIGURATION:

HARDWARE REQUIREMENTS:

·       Processor    -    Pentium IV
·       Speed        -    1.1 GHz
·       RAM          -    512 MB (min)
·       Hard Disk    -    40 GB
·       Keyboard     -    Standard Windows Keyboard
·       Mouse        -    Two or Three Button Mouse
·       Monitor      -    LCD/LED

SOFTWARE REQUIREMENTS:

·       Operating System    -    Windows XP
·       Coding Language     -    Java
·       Database            -    MySQL
·       Tool                -    NetBeans IDE

REFERENCE:
Ma'ayan Gafny, Asaf Shabtai, Lior Rokach, and Yuval Elovici, "OCCT: A One-Class Clustering Tree for Implementing One-to-Many Data Linkage," IEEE Transactions on Knowledge and Data Engineering, TKDE-2011-09-0577.
