Investigation of Imbalanced Big Data Set Classification: Clustering Minority Samples Over Sampling Technique

  • Sachin Patil
  • Shefali Sonavane
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1162)

Abstract

Most real-world data sets exhibit a skewed data distribution, in contrast to well-established benchmark data sets: the number of instances of one class far exceeds the count of the other classes. This uneven dispersal of classes produces imbalanced data sets, which pose an extreme difficulty for learning procedures. Additionally, because of their intrinsically complex data features, the analysis of such imbalanced data sets has opened an avenue for focused research. Imbalanced class distribution is effectively handled by over sampling the minority class data, an approach that is usually independent of the classifier. An over sampling technique, the Clustering Minority Samples Over Sampling Technique (CMSOT), is proposed to enhance the classification of imbalanced data sets. The proposed technique is implemented on Apache Hadoop under a MapReduce environment. The data sets are drawn mainly from the UCI repository. The study examines the effect on true positive rates in relation to the imbalance ratio, together with the improvement in classification obtained from the generated pool of samples. The experimental results, along with the corresponding statistical analysis of the over sampled data sets, clearly indicate the superiority of the proposed technique over the selected benchmark techniques.
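The quantities named in the abstract (imbalance ratio, minority over sampling, true positive rate) can be illustrated with a minimal Python sketch. This is not the CMSOT algorithm, whose details are not given here: the interpolation step is a generic SMOTE-style stand-in, and the function names, toy data, and single-machine setting (no Hadoop or MapReduce) are assumptions made purely for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def oversample_minority(points, n_new, rng):
    """Create n_new synthetic minority points.

    Each new point lies on the segment between two randomly chosen
    minority samples. This is a generic SMOTE-style interpolation used
    as a stand-in; it is NOT the CMSOT procedure from the paper.
    """
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(points, 2)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


def true_positive_rate(y_true, y_pred, positive):
    """Fraction of actual positives that were predicted as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)


# Hypothetical toy data: 18 majority points and 3 minority points.
majority = [(float(i), float(i % 3)) for i in range(18)]
minority = [(0.0, 10.0), (1.0, 11.0), (2.0, 12.0)]
labels = ["maj"] * len(majority) + ["min"] * len(minority)

ir_before = imbalance_ratio(labels)

# Generate enough synthetic minority points to balance the two classes.
rng = random.Random(0)
new_points = oversample_minority(minority, len(majority) - len(minority), rng)
labels_after = labels + ["min"] * len(new_points)

ir_after = imbalance_ratio(labels_after)
```

On this toy data the imbalance ratio drops from 6.0 to 1.0 after over sampling. In practice, the true positive rate on the minority class would be measured on held-out data before and after over sampling to judge whether classification actually improved.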

Keywords

Imbalanced big data sets, Relative difference, Imbalance ratio, Over sampling, Safe-level

Copyright information

© Springer Nature Singapore Pte Ltd. 2021

Authors and Affiliations

  1. Rajarambapu Institute of Technology, Urun Islampur, India
  2. Walchand College of Engineering, Sangli, India
