TY  - JOUR
AU  - Mahdieh Ataeyan
AU  - Negin Daneshpour
PY  - 2021/06/09
Y2  - 2024/03/28
TI  - Automated Noise Detection in a Database Based on a Combined Method
JF  - Statistics, Optimization & Information Computing
JA  - Stat., optim. inf. comput.
VL  - 9
IS  - 3
SE  - Research Articles
DO  - 10.19139/soic-2310-5070-879
UR  - http://iapress.org/index.php/soic/article/view/879
AB  - Data quality has diverse dimensions, of which accuracy is the most important. Data cleaning is a preprocessing step in data mining that consists of detecting errors and repairing them. Noise is a common type of error that occurs in databases. This paper proposes an automated method for noise detection based on k-means clustering. First, each attribute (Aj) is temporarily removed from the data and k-means clustering is applied to the remaining attributes. Then the k-nearest neighbors algorithm is applied within each cluster, and a value for Aj is predicted for each record from its nearest neighbors. The proposed method detects noisy attributes using the predicted values. The method can identify several noisy values in a single record and can detect noise in fields of different data types. Experiments show that the method detects, on average, 92% of the noise present in the data. The proposed method is compared with a noise detection method based on association rules; the results indicate that the proposed method improves noise detection by 13% on average.
ER  - 
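
The procedure summarized in the abstract (drop one attribute, cluster on the rest, predict the dropped value from nearest neighbors inside the cluster, compare) can be illustrated with a minimal sketch. This is not the authors' implementation: it handles numeric attributes only, and the function names `kmeans_labels` and `detect_noise`, the median-based prediction, the relative tolerance `rel_tol`, and the sample data are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random
from statistics import median


def kmeans_labels(points, k, iters=20, seed=0):
    """Plain k-means on tuples of floats; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def detect_noise(rows, k=2, n_neighbors=3, rel_tol=0.25):
    """Return (record index, attribute index) pairs whose value looks noisy.

    Assumption: a value is flagged when it differs from the neighbor-based
    prediction by more than rel_tol times the attribute's observed range.
    """
    flagged = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        tol = rel_tol * (max(column) - min(column))
        # Temporarily remove attribute j and cluster on the remaining ones.
        rest = [tuple(v for a, v in enumerate(r) if a != j) for r in rows]
        labels = kmeans_labels(rest, k)
        for i, row in enumerate(rows):
            # Candidate neighbors: other records in the same cluster.
            peers = [p for p in range(len(rows))
                     if p != i and labels[p] == labels[i]]
            nearest = sorted(
                peers, key=lambda p: math.dist(rest[i], rest[p]))[:n_neighbors]
            if not nearest:
                continue  # a singleton cluster gives no basis for prediction
            # Assumption: predict attribute j as the neighbors' median.
            predicted = median(rows[p][j] for p in nearest)
            if abs(row[j] - predicted) > tol:
                flagged.append((i, j))
    return flagged


# Hypothetical data: two groups of records; the last one has a corrupted
# second attribute (90.0 where its neighbors have values near 50).
rows = [
    (1.0, 10.0), (1.1, 10.5), (0.9, 9.5), (1.2, 11.0),
    (5.0, 50.0), (5.1, 50.5), (4.9, 49.5), (5.2, 90.0),
]
flagged = detect_noise(rows)
print(flagged)
```

The median is used here instead of the mean so that a single noisy neighbor does not distort the prediction for clean records in the same cluster; categorical attributes, which the paper also covers, would need a different predictor such as a majority vote.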