The Optimal Inference Rules Selection for Unstructured Data Multi-Classification

  • Mariem Bounabi USMBA
  • Karim EL Moutaouakil
  • Khalid Satori
Keywords: FIS, FTF-IDF, FP-Growth, Text Mining, ML classifiers

Abstract

The Fuzzy Inference System (FIS) is frequently used in Text Mining applications. In text processing, where the volume of processed data is vast, writing inference rules for an FIS by hand remains a real issue. An automated, optimal selection of inference rules (IR) therefore strengthens the FIS process. In this work, we propose applying FP-Growth, an association-rule mining algorithm, to identify IR automatically for fuzzy text vectorization. Once the fuzzy vectors are generated, we apply feature selection algorithms, e.g., Info Gain and Relief, to reduce the dimensionality of the resulting descriptor. To test the performance of the new descriptor, we build multi-class text classification systems using several machine learning algorithms. On benchmark databases, the new technique for producing fuzzy descriptors achieves a significant gain in time, rule precision, and weighting quality. Moreover, across the classification systems, accuracy improves by 10% compared with other approaches.
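
As a rough illustration of the rule-identification step, the sketch below mines frequent itemsets with a compact FP-Growth and keeps the high-confidence rules whose consequent is the output variable. It is a minimal sketch and not the authors' implementation: the linguistic labels (`tf=high`, `idf=low`, `w=medium`), the toy transactions, the thresholds, and the helper names `build_tree`, `fp_growth`, and `output_rules` are all assumptions made for the example.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and links to parent/children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}


def build_tree(weighted, min_count):
    """Build an FP-tree from (items, count) pairs; return header table and supports."""
    freq = defaultdict(int)
    for items, cnt in weighted:
        for it in set(items):
            freq[it] += cnt
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, cnt in weighted:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for it in ordered:
            child = node.children.get(it)
            if child is None:
                child = Node(it, node)
                node.children[it] = child
                header[it].append(child)
            child.count += cnt
            node = child
    return header, freq


def fp_growth(weighted, min_count, suffix=frozenset()):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    header, freq = build_tree(weighted, min_count)
    found = {}
    for item, nodes in header.items():
        itemset = suffix | {item}
        found[itemset] = freq[item]
        # Conditional pattern base: prefix path of every node holding `item`.
        base = []
        for n in nodes:
            path, p = [], n.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, n.count))
        found.update(fp_growth(base, min_count, itemset))
    return found


def output_rules(itemsets, min_conf, out_prefix="w="):
    """Keep rules 'inputs -> output label' whose confidence reaches min_conf."""
    rules = []
    for itemset, count in itemsets.items():
        outs = [i for i in itemset if i.startswith(out_prefix)]
        if len(outs) != 1 or len(itemset) < 2:
            continue
        antecedent = itemset - {outs[0]}
        conf = count / itemsets[antecedent]
        if conf >= min_conf:
            rules.append((sorted(antecedent), outs[0], conf))
    return sorted(rules)


# Hypothetical fuzzified observations: one transaction per (term, document) pair.
transactions = [
    ["tf=high", "idf=high", "w=high"],
    ["tf=high", "idf=high", "w=high"],
    ["tf=high", "idf=low", "w=medium"],
    ["tf=low", "idf=high", "w=medium"],
    ["tf=low", "idf=low", "w=low"],
    ["tf=low", "idf=low", "w=low"],
]
itemsets = fp_growth([(t, 1) for t in transactions], min_count=2)
rules = output_rules(itemsets, min_conf=0.8)
for antecedent, consequent, conf in rules:
    print("IF", " AND ".join(antecedent), "THEN", consequent, f"(conf={conf:.2f})")
```

Each retained rule reads as a candidate IF-THEN inference rule. How the mined rules are ranked, pruned, and loaded into the FIS is described in the full paper and is not reproduced here.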

Published
2022-02-08
How to Cite
Bounabi, M., Karim EL Moutaouakil, & Khalid Satori. (2022). The Optimal Inference Rules Selection for Unstructured Data Multi-Classification. Statistics, Optimization & Information Computing, 10(1), 225-235. https://doi.org/10.19139/soic-2310-5070-1131
Section
Research Articles