Feature Selection Technique for improving classification performance in the web-phishing detection process

Anggit Ferdita Nugraha; Dwiky Alfian Tama; Dewi Anisa Istiqomah; Surya Tri Atmaja Ramadhani; Bayu Nadya Kusuma; Vikky Aprelia Windarni

doi:10.34306/conferenceseries.v4i1.667

Authors

Anggit Ferdita Nugraha University of Amikom Yogyakarta
Dwiky Alfian Tama University of Amikom Yogyakarta
Dewi Anisa Istiqomah University of Amikom Yogyakarta
Surya Tri Atmaja Ramadhani University of Amikom Yogyakarta
Bayu Nadya Kusuma University of Amikom Yogyakarta
Vikky Aprelia Windarni University of Amikom Yogyakarta

Keywords:

Web-phishing, Feature Selection, Pearson correlation, Classification

Abstract

Web phishing is a type of cybercrime that can threaten the online activities of website visitors. It uses a fake web page that closely mimics a legitimate website in order to trick its target into disclosing sensitive information. Web phishing attacks also continue to grow in number year after year. As a result, it is vital to design a web phishing detection system that reduces the number of victims and the financial losses these attacks cause. The development of such systems continues to this day, with machine learning being the most frequently used approach. Unfortunately, machine learning-based web phishing detection systems are often built with only a classification step, even though a feature selection step can improve the performance of the resulting classifier. In this paper, an experiment was therefore conducted in which a feature selection procedure based on Pearson correlation was applied before machine learning modelling with three popular algorithms: Naive Bayes, Decision Tree, and Random Forest. Using a web phishing dataset from the UCI Machine Learning Repository, the experiment found that adding the feature selection step increased accuracy to as much as 94.60 percent for Decision Tree and 95.50 percent for Random Forest, while accuracy decreased slightly, by 0.4 percent, for Naive Bayes.
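The abstract does not spell out how the Pearson-based selection is applied, so the following is only a minimal sketch of one common variant: score each feature by the absolute Pearson correlation between its column and the class label, then keep the features whose score meets a cutoff. The function names `pearson` and `select_features`, the `threshold` value of 0.5, and the toy data are all assumptions made for illustration; they are not taken from the paper or from the UCI dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(rows, labels, threshold=0.5):
    """Return (selected column indices, per-column |r| scores).

    The threshold is a hypothetical cutoff, not a value from the paper.
    """
    n_features = len(rows[0])
    scores = [
        abs(pearson([row[j] for row in rows], labels))
        for j in range(n_features)
    ]
    selected = [j for j, score in enumerate(scores) if score >= threshold]
    return selected, scores


# Toy data invented for illustration: three features, six samples.
rows = [
    [1, 1, -1],
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, -1],
    [1, 1, -1],
    [-1, -1, -1],
]
labels = [1, 1, -1, -1, 1, -1]  # 1 and -1 stand for the two classes

selected, scores = select_features(rows, labels)

# Keep only the selected columns; this reduced matrix is what would be
# passed on to the classifier in the second step.
reduced = [[row[j] for j in selected] for row in rows]
```

In this toy example the first column tracks the label exactly, the second is only weakly correlated, and the third is constant, so only the first column survives the cutoff. The reduced matrix would then be used to train a classifier such as Naive Bayes, Decision Tree, or Random Forest, and its accuracy compared against a model trained on all features.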