INTRUSION DETECTION SYSTEM USING DECISION TREE AND APRIORI ALGORITHM

Ms. Trupti Phutane
PG Student, Computer Department, G. H. Raisoni College of Engineering, Pune, India

Prof. Apashabi Pathan
Asst. Professor, Computer Department, G. H. Raisoni College of Engineering, Pune, India

ABSTRACT
The Intrusion Detection System (IDS) has become an important mechanism for protecting a network. Data mining techniques make it possible to search large amounts of data for characteristics, rules and patterns, which helps a network detect intrusions and attacks. Here, we present an intrusion detection model based on the Decision Tree algorithm and the Apriori association rule mining algorithm. Both data mining algorithms are able to predict new types of attacks based on the training data sets, which makes data mining an important approach in IDS. Earlier data mining based network intrusion detection systems gave good accuracy and detection on different types of attacks. In this paper, the improved C5.0 decision tree algorithm is used to detect the different types of attacks with high accuracy and fewer errors, and to increase the performance of the system.

Keywords: Intrusion Detection System; KDD Dataset; Network Security; Decision Tree Algorithm

Cite This Article: Ms. Trupti Phutane, Prof. Apashabi Pathan, Intrusion Detection System Using Decision Tree and Apriori Algorithm. International Journal of Computer Engineering and Technology, 6(7), 2015, pp. 09-18. http://www.iaeme.com/currentissue.asp?JType=IJCET&VType=6&IType=7
1. INTRODUCTION
Data mining has only recently been applied to intrusion detection. Data mining is the process of extracting knowledge from a large collection of data and transforming it into statistically significant structures and events. There are many data mining techniques, such as K-Means, ID3 and NB Tree, which support tasks such as classification, link analysis, clustering, association, rule abduction, deviation analysis and sequence analysis. An intrusion detection model can be built with these techniques by extracting knowledge from large datasets and analyzing it. This approach treats intrusion detection as a data analysis problem, whereas earlier techniques were knowledge engineering processes.

As computer systems and the Internet have grown in size, their complexity and the demands placed on them have grown as well. These demands have led to IDS being used to monitor suspicious activity and network traffic on individual hosts and networks. In a market economy, demand prompts suppliers to meet it and to satisfy their customers, and this has driven a great deal of development in Intrusion Detection Systems. Some of these systems are free open source applications, while the rest are commercial products. As a result, any organization considering implementing an IDS has a range of options available. The goal of this paper is to cover criteria that are helpful in evaluating Network Intrusion Detection Systems.

Organizations and companies use Internet services for communication and as a marketplace for doing business. Both the level of network activity and the rate of network attacks are increasing, affecting the availability, confidentiality and integrity of critical information. Hence, a networked system should use security tools such as firewalls, antivirus software, IDS and honeypots to protect important data from criminal enterprises.

A firewall cannot protect the network against intrusions that are attempted through an open port. Hence, a firewall alone is not enough to protect the network against different types of attacks. In this paper, we present the details of the Apriori association rule mining algorithm and the Decision Tree algorithm as used in an Intrusion Detection System to detect and prevent different types of attacks.
2. LITERATURE REVIEW
In the paper "Intrusion Detection Systems Using Decision Trees and Support Vector Machines", experiments were conducted using a Decision Tree and a Support Vector Machine (SVM), and their performance was compared. The result was that the accuracy of the Decision Tree was better than that of the SVM for the classes Probe, U2R and R2L. In addition, a decision tree supports multi-class classification directly, which the SVM, being a binary classifier, does not.

In the paper "Network Intrusion Detection Using Improved Decision Tree Algorithm", the previously used C4.5 decision tree is reported to give 95.7 percent attack detection accuracy, while the proposed decision tree using C5.0 gives a higher accuracy of 96.9 percent.

In the paper "Improve Intrusion Detection Using Decision Tree with Sampling", the IDS aims to decrease the error rate and improve the accuracy of attack detection in order to identify different types of attacks with a good detection rate.

In the paper "An Efficient Intrusion Detection based on Decision Tree Classifier using Feature Reduction", four machine learning algorithms from data mining are compared and analysed in terms of their performance.

In the paper "Intrusion Detection System in Computer Networks Using Decision Tree and SVM Algorithms", feature selection and decision tree rules are applied to IDS using a hybrid algorithm that combines a decision tree and a support vector machine (SVM).
3. PROBLEM STATEMENT
The problem is to determine the best way to classify and analyse the KDD99 data set so as to obtain high accuracy in the classification of attacks and a low training time, and to find a better way to identify each of the four attack types (Probe, DoS, U2R, R2L), which in turn facilitates the choice of method.
3.1 DECISION TREE
A decision tree classifies data items by the values of their attributes. The tree is constructed from pre-classified data. An attribute is selected that best divides the data items into classes, and the items are partitioned according to its values. The process is repeated for each subset and ends when all the data items in a subset belong to the same class. A node of a decision tree specifies an attribute. Every node has edges, each labeled with a value of the attribute in the parent node. An edge connects either two nodes or a node and a leaf. Leaves are labeled with a decision value for categorization.

Induction of the decision tree uses the training data. The main problem is deciding which attribute best partitions the data into the various classes. The ID3 algorithm resolves this problem with an information theoretic approach. The impurity of a set of data items is measured by entropy, a concept from information theory. The entropy is smaller when all the data items belong to one class and higher when the data items are spread over more classes. The usefulness of each attribute is given by its information gain, which is computed from entropy: it is the decrease in the weighted average impurity (entropy) obtained by splitting on that attribute. The attribute with the largest information gain classifies the data items most efficiently and is therefore chosen. Classification of an unknown object starts at the root of the decision tree and follows the branch matching each attribute value until a leaf node is reached.

Several algorithms implement decision tree induction, including ID3 and its extensions C4.5 and C5.0. CART is also a decision tree algorithm.

The advantages of C4.5 include the ability to choose an appropriate attribute selection measure, handling of continuous attributes, handling of training data with missing attribute values, and improved computational efficiency. C4.5 builds the tree from a set of data items by using the best attribute to divide them into subsets, which are then divided further.
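The entropy and information gain measures described in this section can be illustrated with a short Python sketch. It is a minimal illustration and not the implementation used in this paper: the function names, the attribute names ("protocol", "flag") and the six toy records are assumptions chosen only for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the whole set minus the weighted average entropy
    of the subsets produced by splitting on `attribute`."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    weighted = sum(len(s) / len(labels) * entropy(s)
                   for s in subsets.values())
    return entropy(labels) - weighted


def best_attribute(rows, labels, attributes):
    """Attribute with the largest information gain (the ID3 criterion)."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy records, not taken from KDD99.
rows = [
    {"protocol": "tcp", "flag": "SF"},
    {"protocol": "tcp", "flag": "S0"},
    {"protocol": "udp", "flag": "SF"},
    {"protocol": "icmp", "flag": "SF"},
    {"protocol": "tcp", "flag": "S0"},
    {"protocol": "udp", "flag": "SF"},
]
labels = ["normal", "attack", "normal", "normal", "attack", "normal"]

chosen = best_attribute(rows, labels, ["protocol", "flag"])
```

In this toy set, splitting on "flag" separates the two classes completely, so its information gain equals the entropy of the whole set and it is chosen ahead of "protocol".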
3.2 DECISION TREE AS INTRUSION DETECTION MODEL
The decision tree classifier is compared with the SVM, which is a binary classifier. Because the data has five classes, five different classifiers can be used. For each classifier the data is divided into two classes; for example, the normal patterns and the attack patterns. The attack patterns comprise four classes, namely Probe, DoS, U2R and R2L. The primary aim is to separate normal and attack patterns, and the same process is repeated for all five classes. The classifier is constructed using the training data and tested using the testing data, after which normal and attack data can be classified.

Intrusion detection can be treated as a classification problem in which each connection or user is identified either as normal or as one of the attack types. Decision trees work well with large data sets, which makes them useful in real-time intrusion detection. They construct easily interpretable models that a security officer can inspect with ease, and these models require minimal processing when used in rule-based systems. The generalization accuracy of decision trees is also useful for intrusion detection models, because it enables them to identify new intrusions.
3.3 INTRUSION DETECTION DATA SET
The KDD99 dataset contest uses a version of the DARPA98 dataset. In the KDD99 dataset, each example represents the attribute values of a connection in the network data flow, and each example is labeled either normal or attack. The classes in the KDD99 dataset are categorized into five main classes (one normal class and four main intrusion classes: Probe, DoS, U2R and R2L).

1. Normal connections are generated by simulated daily user behavior such as downloading files and visiting web pages.
2. A Denial of Service (DoS) attack makes the computing power or memory of a victim machine too busy or too full to handle legitimate requests. DoS attacks are classified by the services that the attacker renders unavailable to legitimate users, for example apache2, land, mailbomb and back.
3. Remote to User (R2L) is an attack in which a remote user gains the access of a local user account by sending packets to a machine over a network connection; examples include sendmail and xlock.
4. User to Root (U2R) is an attack in which an intruder starts with access to a normal user account and then becomes a root user by exploiting vulnerabilities of the system. The most common exploits used in U2R attacks are regular buffer overflows, loadmodule, fdformat and ffbconfig.
5. Probing (Probe) is an attack that scans a network to gather information or find known vulnerabilities. An intruder with a map of the machines and services available on a network can use that information to look for exploits.

In the KDD99 dataset these four attack classes (DoS, U2R, R2L and Probe) are divided into 22 different attack types.
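As an illustration of how the 22 attack types group into the four main attack classes, the following Python sketch maps a connection label to its main class. The grouping shown is the one commonly given for the labels in the KDD99 training data and is stated here as an assumption rather than as part of this paper; the names ATTACK_CATEGORIES and main_class are hypothetical.

```python
# Commonly cited grouping of the 22 attack labels in the KDD99 training data
# (an assumption for illustration; verify against the dataset documentation).
ATTACK_CATEGORIES = {
    "DoS": ["back", "land", "neptune", "pod", "smurf", "teardrop"],
    "Probe": ["ipsweep", "nmap", "portsweep", "satan"],
    "R2L": ["ftp_write", "guess_passwd", "imap", "multihop",
            "phf", "spy", "warezclient", "warezmaster"],
    "U2R": ["buffer_overflow", "loadmodule", "perl", "rootkit"],
}

# Invert the table so that each attack label points to its main class.
LABEL_TO_CLASS = {
    label: category
    for category, attack_labels in ATTACK_CATEGORIES.items()
    for label in attack_labels
}


def main_class(label):
    """Map a raw connection label to one of the five main classes.

    Raw labels in the KDD99 files carry a trailing dot (e.g. "smurf."),
    which is stripped before the lookup. Labels outside the table are
    reported as "Unknown" instead of being guessed.
    """
    label = label.strip().rstrip(".")
    if label == "normal":
        return "Normal"
    return LABEL_TO_CLASS.get(label, "Unknown")
```

With this table, main_class("smurf.") yields "DoS" and main_class("normal.") yields "Normal"; a label that is not among the 22 training labels, such as apache2, is reported as "Unknown" by this sketch.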
3.4 Decision Tree and Apriori Algorithms

Decision Tree Algorithm:
Step 1: Connect the client and the server.
Step 2: The IDS accepts input data from the client.
Step 3: Apply the Apriori algorithm.
Step 4: If the training data (attacks) from the KDD Cup set matches the tested data, then the output is the same.
Step 5: Exit.

Apriori Algorithm:
Association rule generation is usually split up into two separate steps:
Step 1: First, minimum support is applied to find all frequent itemsets in a database.
Step 2: Second, these frequent itemsets and the minimum confidence constraint are used to form rules.
4. PROPOSED SYSTEM
We use KDDCUPSET to store the types of attacks. Client packets are compared with the defined packets, and if a new pattern is detected it is stored in KDDCUPSET to prevent further attacks of that kind by other clients. A client that attacks with a new pattern is blocked once the new pattern is detected. KDDCUPSET stores the predefined attacks used for our testing. We take attack patterns from KDDCUPSET, and we can also store new patterns in it.
Figure.1 Decision Tree Algorithm with Apriori Algorithm
When our system detects an attack, an attack file is created for that attack in the form of rows and columns. That file is compared with our dataset, i.e. the KDD dataset. Based on that comparison we detect and prevent the attack and generate rules for it.

In Figure 1, the client is connected to the Intrusion Detection System and sends input packets to it. The KDD Cup dataset is used to store a number of attack files, which have already been tested with the IDS. The Apriori algorithm is then applied to four types of input packets in terms of attack files, i.e. DoS, Probe, U2R and R2L. These packets are known as the training datasets. The four types of attack files are then aggregated. If the training dataset matches the tested dataset, the input packet is matched as an output packet.

The proposed intrusion detection technique is represented in the flowchart (Figure 2). Data preprocessing is done to convert non-numeric values to numeric values. The information obtained from KDD Cup99 can be a combination of many system calls. A system call is a text-based record.
Figure. 2. Flowchart of Proposed Decision tree Approach for Intrusion detection
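The preprocessing step mentioned above, converting non-numeric values to numeric values, could be sketched in Python as follows. This is one simple possibility (integer codes assigned in order of first appearance) and not necessarily the encoding used in this work; the function names, the column positions and the sample records are assumptions for illustration.

```python
def fit_encoders(records, categorical_columns):
    """Build one value -> integer code table per categorical column.

    Codes are assigned in order of first appearance in the training records.
    """
    encoders = {col: {} for col in categorical_columns}
    for record in records:
        for col in categorical_columns:
            table = encoders[col]
            table.setdefault(record[col], len(table))
    return encoders


def encode(record, encoders, unknown=-1):
    """Return a fully numeric copy of `record`.

    Categorical columns are replaced by their integer code (or `unknown`
    for values never seen during fitting); other columns become floats.
    """
    return [
        encoders[i].get(value, unknown) if i in encoders else float(value)
        for i, value in enumerate(record)
    ]


# Hypothetical records: duration, protocol_type, service, flag, src_bytes.
training = [
    [0, "tcp", "http", "SF", 181],
    [0, "udp", "domain_u", "SF", 45],
    [2, "tcp", "smtp", "S0", 0],
]
encoders = fit_encoders(training, categorical_columns=[1, 2, 3])
numeric_rows = [encode(r, encoders) for r in training]
```

Integer codes impose an arbitrary order on the categories. A decision tree that tests one attribute value at a time is comparatively tolerant of this, whereas distance-based methods usually call for a one-hot encoding instead.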
5.1 APRIORI ALGORITHM
Association rule generation is divided into two steps:
1. First, minimum support is applied to find all frequent itemsets in a database.
2. Second, these frequent itemsets and the minimum confidence constraint are used to form rules.

While the second step is straightforward, the first step needs more attention. Finding all frequent itemsets in a database is difficult since it involves searching all possible itemsets (item combinations). The set of possible itemsets is the power set over I and has size $2^n - 1$ (excluding the empty set, which is not a valid itemset). Although the size of the power set grows exponentially in the number of items n in I, efficient search is possible using the downward-closure property of support (also called anti-monotonicity), which guarantees that for a frequent itemset all its subsets are also frequent, and thus for an infrequent itemset all its supersets must also be infrequent. Exploiting this property, efficient algorithms (e.g., Apriori and Eclat) can find all frequent itemsets.

Apriori Algorithm Pseudocode

procedure Apriori(T, minSupport)   // T is the database and minSupport is the minimum support
    L_1 = {frequent items};
    for (k = 2; L_{k-1} != ∅; k++) {
        C_k = candidates generated from L_{k-1}
              // that is, the cartesian product L_{k-1} x L_{k-1}, eliminating any
              // candidate that contains a (k-1)-size itemset that is not frequent
        for each transaction t in database do {
            // increment the count of all candidates in C_k that are contained in t
        }   // end for each
        L_k = candidates in C_k with minSupport
    }   // end for
    return ∪_k L_k;

As is common in association rule mining, given a set of itemsets (for instance, sets of retail transactions, each listing individual items purchased), the algorithm attempts to find subsets which are common to at least a minimum number C of the itemsets. Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data.
The algorithm terminates when no further successful extensions are found.
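A compact Python sketch of the two steps, frequent itemset search followed by rule formation, is given here for illustration. It follows the bottom-up scheme described in this section but is not the code used in this work; the function names, the toy transactions and the thresholds are assumptions, and minimum support is expressed as an absolute transaction count.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset that occurs in at least
    `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]

    # L_1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        previous = set(current)
        # Candidate generation: join L_{k-1} with itself ...
        candidates = {a | b for a in previous for b in previous
                      if len(a | b) == k}
        # ... and prune by downward closure: every (k-1)-subset of a
        # candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in previous
                             for s in combinations(c, k - 1))}
        # Count how many transactions contain each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Form rules lhs -> rhs with confidence = support(itemset) / support(lhs)."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical transactions built from connection attributes.
transactions = [
    {"tcp", "http", "SF"},
    {"tcp", "http", "SF"},
    {"tcp", "private", "S0"},
    {"tcp", "private", "S0"},
    {"udp", "domain", "SF"},
]
frequent = apriori(transactions, min_support=2)
rules = generate_rules(frequent, min_confidence=0.9)
```

Looking up frequent[lhs] in generate_rules is safe because of downward closure: every subset of a frequent itemset is itself frequent and therefore present in the table.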
6. EXPERIMENTAL RESULT AND ANALYSIS
The experimental results show that using the Decision Tree algorithm together with the Apriori algorithm achieves a high detection rate on different types of network attacks and also increases the speed and accuracy of the system. The KDD dataset was used for distinguishing intrusions from normal connections. Two graphs are compared for accuracy and speed: the graph for the existing system is compared with the graph for the proposed system, and the proposed graph shows better results than the existing one.

In the existing graph, the speed percentages for Probe, R2L and U2R were lower, i.e. 85%, 71% and 68% respectively.
Fig 4: Proposed Speed Graph
In the above graph, the speed percentages increase for Probe, R2L and U2R, i.e. to 95%, 93% and 91% respectively.
7. CONCLUSION
Firstly, we have used an intrusion detection model based on a Decision Tree. For detecting various attacks with high accuracy and low false alarm rates, the proposed algorithm gives a result of 96.9 percent. In experiments on the KDD dataset, the proposed algorithm achieved a high detection rate on different types of network attacks. In this paper, we develop an intrusion detection system that classifies behavior as normal or attack using a decision tree and stratified weighted sampling. A decision tree is generated to make the system more accurate for attack detection. In this project, we use the Apriori algorithm and the Decision Tree together, with a preprocessing step applied to the KDD Cup dataset, in a process organised into three phases: a data preprocessing phase, a fusion decision phase and a data call back phase. These strategies support our performance in terms of accuracy rate and error rate. Stratified weighted sampling is used to generate samples from the original datasets, and the decision tree algorithm, which overcomes the limitations of the ID3 algorithm, is then applied to them. Hence the proposed method can be applied to various large datasets, giving accurate results with a lower error rate than the existing algorithm, and CPU and memory utilization is decreased. Thus, the proposed approach is apt and reliable for intrusion detection.
REFERENCES
[1] Manish Kumar, Intrusion Detection System Using Decision Tree Algorithm. Dept. of Master of Computer Applications, M. S. Ramaiah Institute of Technology, Bangalore-560054, 2012.
[2] Safaa O. Al-mamory (University of Babylon, College of Computers and Sciences) and Firas S. Jassim (University of Diyala, College of Sciences), Evaluation of Different Data Mining Algorithms with KDD CUP 99 Data Set. Vol. (21), 2013.
[3] Asim Das and S. Siva Sathya, Association Rule Mining for KDD Intrusion Detection Data Set. Department of Computer Science, Pondicherry University, Pondicherry, India, 2012.
[4] Sandhya Peddabachigari, Ajith Abraham and Johnson Thomas, Intrusion Detection Systems Using Decision Trees and Support Vector Machines. Department of Computer Science, Oklahoma State University, USA, June
[5] Snehal A. Mulay (Department of Information Technology, Bharati Vidyapith's COE, Pune, India), P. R. Devale (HOD, Department of Information Technology, Bharati Vidyapith's COE, Pune, India) and G. V. Garje (HOD, Department of Computer and IT, PVG's COET, Pune, India), Decision Tree based Support Vector Machine for Intrusion Detection. 2010.
[6] Jashan Koshal and Monark Bag, Cascading of C4.5 Decision Tree and Support Vector Machine for Rule Based Intrusion Detection System. Indian Institute of Information Technology Allahabad, Uttar Pradesh-211012, India, Aug 2012.
[7] K. P. Kaliyamurthie (Assistant Professor, Dept of IT, Bharath University, Chennai, Tamil Nadu 600073), D. Parameswari (Assistant Professor, Dept of MCA, Jerusalem College of Engineering, Chennai, Tamil Nadu 600100) and R. M. Suresh (Professor & Head, Dept of CSE, RMD Engineering College, Chennai, Tamil Nadu 601206), Intrusion Detection System using Memtic Algorithm Supporting with Genetic and Decision Tree Algorithms. Mar 2012.
[8] K. V. R. Swamy and K. S. Vijaya Lakshmi, Network Intrusion Detection Using Improved Decision Tree Algorithm. Department of Computer Science and Engineering, V. R. Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India, Sept 2011.
[9] Dewan Md. Farid, Nouria Harbi (ERIC Laboratory, University Lumière Lyon 2, France) and Mohammad Zahidur Rahman (Department of Computer Science and Engineering, Jahangirnagar University, Bangladesh), Combining Naïve Bayes and Decision Tree for Adaptive Intrusion Detection. Apr 2010.
[10] Yogendra Kumar Jain and Upendra, An Efficient Intrusion Detection Based on Decision Tree Classifier Using Feature Reduction. Jan 2012.
[11] Zeinab Kermansaravi (1), Hamid Jazayeriy (1, 2) and Soheil Fateri (1), Intrusion Detection System in Computer Networks Using Decision Tree and SVM Algorithms. (1) Computer Engineering Department, Islamic Azad University, Babol Branch, Babol, Iran; (2) Electrical and Computer Engineering Department, Noshirvani University of Technology, Babol, Iran, June 2013.
[12] Snehal A. Mulay, Intrusion Detection System using Support Vector Machine and Decision Tree. Bharati Vidyapeeth University, Pune.
[13] German Florez, Susan M. Bridges and Rayford B. Vaughn, An Improved Algorithm for Fuzzy Data Mining for Intrusion Detection.