Data Mining Applications Used in Education Sector

The purpose of this work is to study the usage trends of Data Mining (DM) methods in education. It discusses different data mining techniques used for different types of educational data. The related papers were initially selected from the metadata containing words like Online Learning (OL) and Educational Data Mining (EDM). The papers were then filtered on the basis of DM algorithms, the purpose of study, and the types of data used. The findings suggested that EDM is most commonly used for the prediction of students' academic success, and the most common purpose is classification, followed by clustering and association. Further, this research also contains a study conducted on Moodle data to find anomalies. K-means clustering was applied to Moodle data consisting of log and quiz datasets, and the optimal number of clusters was determined. The growth in the number of Internet users has increased learning through the online process. Hence, several activities are performed in OL systems, which generate a massive amount of data to be analysed to obtain useful information. Therefore, this type of research is very beneficial to academicians and instructors to identify learners' behaviors and develop suitable models.


Introduction
The use of information and communication technology (ICT) in education has been growing rapidly in recent years. It is gradually turning conventional classroom teaching environments into online learning (OL) environments. The OL system improves the learning experience of students and reduces the need for the direct involvement of the instructor. As the OL system has become accessible through the internet, students can enroll themselves in courses from anywhere and can be involved in different learning activities. It provides a huge repository of data through the Learning Management System (LMS). To analyse these data, different DM algorithms need to be applied to obtain meaningful information and represent it in a way that facilitates decision making and increases the effectiveness of the learning process. Different types of data are generated from different sources, such as student attendance records, course information, curriculum, and classroom schedule information. Similarly, data of varied and diverse kinds are also produced from various other web-based applications deployed in an educational environment, such as educational games, virtual environments, discussion forums, notice boards, interactive multimedia systems, online tests/quizzes, users' activity logs, and various other learning contents and text.
The paper summarises several works done in the field of educational data mining. It addresses mainly two research questions: what are the different data mining algorithms applied in educational data? and what are the different types of data used in EDM? Hence the aim of this research paper is to disseminate the information about different EDM algorithms and different types of educational data used for the analysis. The related research papers were selected based on the trends of DM methods in educational data. So, the data used in this research were from the education sector, which includes students' personal records, previous academic records, log from the user's interaction with the system, midterm assessment records, and survey questionnaires.
This research paper also includes an analysis of the Moodle data of an undergraduate course called Human-Computer Interaction (COMP 341), offered by the Department of Computer Science and Engineering at Kathmandu University, Nepal. The study was conducted to find outliers in the Moodle data. To achieve this task, an unsupervised learning technique, the k-means clustering method, is used. The paper concludes with research directions, which include the expansion of research that can be conducted in the future. The paper identifies educationists, course instructors, online learning system administrators, and researchers working in the field of EDM as the major stakeholders who will benefit from this research paper.

Data Mining
Data Mining (DM) is the process which deals with the automatic extraction and analysis of data from large sets of data to explore previously unknown patterns (Maimon & Rokach, 2005). It is a core step of Knowledge Discovery in Databases (KDD). KDD is the process of extracting information from data in the context of large data sets. It is iterative and interactive and consists of six steps, as shown in Figure 1. It starts with understanding the domain and ends with discovering the knowledge from the patterns generated by using DM methods. The discovered knowledge from the KDD process is used for different purposes, such as understanding students' behavior, assisting instructors, improving teaching methods, and evaluating and improving e-learning systems (Romero et al., 2008). Hence, DM involves methods to search for new and generalizable relationships and findings rather than attempting to test prior hypotheses (Collins et al., 2002).

Figure 1
Process of Knowledge Discovery in Database (Data Mining, 2019) Journal of Education and Research, Vol. 10, No. 2, 2020

Educational Data Mining
Educational Data Mining (EDM) is the application of DM techniques to educational data (Romero et al., 2008). It is concerned with developing methods that discover useful knowledge from data originating from educational environments. It utilises DM methods to better understand students' performance through educational systems (De Morais et al., 2014; Rana & Garg, 2016). The International Educational Data Mining Society described EDM as a new field of study that aims to prepare techniques for exploration of a specialised form of information received from the educational sector and the application of these techniques to improve understanding about students and the environment in which they imbibe knowledge (Siemens & Baker, 2012). It uses computational approaches to analyse educational data (Romero & Ventura, 2010). According to Romero and Ventura (2010), "EDM seeks to use these data repositories to better understand learners and learning, and to develop computational approaches that combine data and theory to transform practice to benefit learners" (p. 601). They classified the contributions of EDM activities into several categories: analysis and visualization of data, providing feedback for supporting instructors, recommendations for students, predicting student performance, student modeling, detecting undesirable student behaviors, grouping students, social network analysis, constructing courseware, developing concept maps, and planning and scheduling.

Educational Data Mining Techniques
There are several popular methods of EDM that can be applied to educational data, such as classification, clustering, and regression. Classification is a procedure of grouping individual items based on quantitative information (Vasani & Gawali, 2014). Clustering is a technique of grouping students according to their learning and interaction patterns (Romero & Ventura, 2013). This research paper applies the clustering technique to the Moodle data of the course COMP 341. Regression is a DM technique used to predict a range of numeric values (also called continuous values), given a particular dataset. In EDM, regression analyses are used to predict a student's knowledge. Regression has also been applied for predicting whether the student will answer a question correctly enough, and also to create a model that illustrates the student's learning behavior (Romero & Ventura, 2010). Similarly, other methods like
The process of EDM is conducted through the following steps (Rana & Garg, 2016):
1. Data Cleaning: Raw, noisy, and inconsistent data is cleaned by using various data cleaning methods such as smoothing (used to smooth the noisy data).
2. Data Integration: Data from varied sources are combined in a coherent data store.
3. Data Selection: Relevant data required for the analysis is selected from the database.
4. Data Transformation: Selected data is transformed into the forms appropriate for mining.
5. Data Mining: Data patterns are extracted using different intelligent methods of machine learning and artificial intelligence.
Further, Rana and Garg (2016) listed some of the applications of EDM as follows:
• Data Analysis: It includes the analysis of educational data to assist course administrators in decision making related to students' academic performance.
• Students' Performance Prediction: EDM can be used to predict students' performance using attributes such as grades, class performance, and position in the class.
• Grouping Students: Clustering algorithms like hierarchical clustering and k-means clustering can be used to group students according to their respective nature and academic performance.
• Classification: Classification algorithms such as Decision Trees, Naïve Bayes, and Logistic Regression can be used to predict students' performance.
Slater et al. (2017) focused on EDM tools and other tools frequently used to conduct EDM analysis. These authors covered the primary tools used by some of the core research groups and/or organizations in the field. They mentioned prediction as one of the objectives of the analysis. Fauvel and Yu (2016) mentioned that the methods could be split into predictive methods, comprising a) sequential prediction and interpolation and b) supervised learning, and descriptive methods, comprising a) clustering and b) exploratory analysis. Predictive methods are used to obtain the predicted value of one or more variables from a predictor variable or group of variables. They are divided into three types: classification, regression, and prediction of density (Anoopkumar & Rahman, 2015). Classification is used to predict class labels in continuous or discrete form. The most commonly used classification approaches in EDM make use of decision trees and logistic regression (Anoopkumar & Rahman, 2015). Regression is utilised to derive a prediction from continuous variables. The most common regression approaches for EDM are neural networks and linear regression (Anoopkumar & Rahman, 2015). The next is the prediction of density, where predicted values are derived by using a probability density function. Different kernel functions can be used in EDM to estimate the value of density, including Gaussian functions (Anoopkumar & Rahman, 2015).
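To make the prediction of density more concrete, the following is a minimal sketch of kernel density estimation with a Gaussian kernel in plain Python. The function name gaussian_kde, the bandwidth value, and the sample quiz scores are illustrative assumptions and are not taken from Anoopkumar and Rahman (2015).

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x from the samples."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # One Gaussian bump per sample, normalised so the total area is 1.
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density


# Hypothetical quiz scores (out of 20), used only for illustration.
scores = [12.0, 14.0, 15.0, 15.5, 18.0]
density = gaussian_kde(scores, bandwidth=1.0)
print(round(density(15.0), 3))
```

The bandwidth controls how smooth the estimate is; in practice it would be tuned for the dataset rather than fixed at 1.0.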
The computational approach of EDM has led to research on learning analytics (LA). EDM and LA communities have emerged as alternatives to frequentist and Bayesian approaches for working with educational data (Romero & Ventura, 2007; Baker & Siemens, 2014). As the Society for Learning Analytics Research defines, Learning Analytics (LA) are the compilation, quantification, analysis, and notification of information related to students in relation to their individual characteristics so that the process of learning can be well understood and improved upon along with the surroundings in which it takes place (as cited in Siemens & Baker, 2012). The major goal of LA is to improve student learning by providing a better environment (Simon, 2017).

Literature Review
El-Halees (2009) analysed students' learning behavior using EDM. The EDM methods of association rules, classification, clustering, and outlier detection were applied in this research. The study showed the usefulness of DM in higher education to improve student performance. To achieve this task, association rules were discovered from the data for students with excellent final grades, classification rules were discovered using a decision tree, students were clustered into groups using Expectation-Maximization (EM), and outlier analysis was conducted to detect outliers in the data. Initial preprocessing of the data showed that attendance, students' GPAs, and lab grades were directly related to the final grades. The association rules suggested that students who failed in the final term also failed in midterms. A total of 37 outliers were found that could be used by the instructor to find the students who need special attention. The study collected all available usage data from Moodle and applied DM techniques to discover hidden knowledge that can be used to improve students' performance and identify a group of students who need special attention. In 2012, a study of 15 years (1993-2007) of graduate students' data was conducted (Abu Tair & El-Halees, 2012). This research was done to improve graduate students' performance and overcome the problem of their low grades. Baradwaj and Pal (2011) applied the decision tree method using the J48 algorithm for classification to extract knowledge that describes students' performance in the end-semester examination. The findings suggested that the Previous Semester Marks (PSM) had a higher gain ratio than class test, seminar, assignment marks, general proficiency, attendance, and lab work. The knowledge extracted by the decision tree was represented in the form of IF-THEN rules. This study used a data set of 50 students from the sessions 2007 to 2010.
In the same year, Kumar and Vijayalakshmi (2011) considered two algorithms, ID3 and C4.5 (J48), for classification to predict students' performance in the final exam based on the marks obtained in the first semester. For this purpose, the C4.5 decision tree was implemented. The research outcome was the prediction of the number of students who were likely to pass or fail. This research also performed a comparative analysis of C4.5 and ID3 based on accuracy, comparing the result of the tree with the original marks obtained, and on the time taken to derive the tree. C4.5 was found to be more efficient than ID3.
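The decision tree studies above rely on attribute-selection measures: ID3 ranks attributes by information gain, while C4.5 uses the gain ratio. A minimal sketch of both measures is given below. The attribute values and pass/fail labels are hypothetical and do not reproduce the data of Baradwaj and Pal (2011) or Kumar and Vijayalakshmi (2011).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy after splitting on the attribute values (ID3)."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalised by the split information (C4.5)."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


# Hypothetical records: previous semester marks (PSM), attendance, and result.
psm = ["First", "First", "Second", "Second"]
attendance = ["Good", "Poor", "Good", "Poor"]
result = ["Pass", "Pass", "Fail", "Fail"]

print(gain_ratio(psm, result), gain_ratio(attendance, result))
```

In this toy data the PSM attribute separates the classes perfectly, so a tree builder would choose it as the first split, whereas attendance carries no information about the result.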
In 2013, a study was conducted to predict students' performance in university results on the basis of their performance in the unit test, assignment, graduation, percentage, and attendance (Borkar & Rajeswari, 2013). Several methods, like association rule mining, the Apriori algorithm, and the correlation coefficient, were implemented. The finding suggested that, to achieve good university performance, students must be good at their assignments, attendance, and unit tests. To support the research evidence, the generated association rules and correlation coefficient values were analysed. However, the evidence was not strong, as the correlation coefficient values between different attributes were not identical to the associations obtained from the Apriori algorithm. Despite that, the evidence presented was well connected to the claims, as the rules generated by association rule mining and the correlation coefficient values showed that the different attributes of students were dependent, which impacts students' university performance. Ratnapala et al. (2014) used EDM techniques to conduct a quantitative analysis of students' interaction with an e-learning system through instructor-led, non-graded, and graded courses. The finding suggested that differences in the learning environment can change students' online access behavior. The majority of the student population were not self-motivated to do self-learning. A lack of interest and motivation to carry on online learning side by side with the main course of study was found. However, the datasets were not large enough, and the reason to use k-means clustering was also not well justified in this research. Yukselturk et al. (2014) predicted dropout through data mining approaches in an online program. In this research, 3-Nearest Neighbor (3-NN) and Decision Tree (DT) were found to be more sensitive. Though 3-NN and DT were said to be more sensitive, the study does not clarify in what ways, and the accuracy can also be questioned since the dataset was not large enough. Kadiyala and Potluri (2014) used the k-means clustering technique and the decision tree technique for the analysis of students' academic performance. This study collected data of 200 students from their exam results and applied the k-means clustering method to group the students into three categories (i.e., low, medium, and high) based on students' performance in percentage. The result of clustering showed a low-performance group with a percentage less than 60, a medium-performance group with a percentage greater than or equal to 60 and less than 85, and a high-performance group with a percentage greater than or equal to 85. The research also applied a decision tree to classify the patterns of students' performance in order to obtain specific knowledge to improve both the educational system and learners' performance.
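The association rules discussed above for Borkar and Rajeswari (2013) are usually judged by their support and confidence, and Apriori keeps only itemsets whose support reaches a chosen minimum. The sketch below computes these two measures only; it does not implement Apriori's candidate generation. The item names and student records are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical student records, one set of observed attributes per student.
students = [
    {"good_assignment", "good_attendance", "good_result"},
    {"good_assignment", "good_result"},
    {"good_attendance"},
    {"good_assignment", "good_attendance", "good_result"},
]

print(support(students, {"good_assignment"}))
print(confidence(students, {"good_assignment"}, {"good_result"}))
```

A rule such as good_assignment -> good_result would be reported only if both measures exceed thresholds chosen by the analyst.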
In the same year, research was conducted using an EM algorithm for clustering, which showed groups of students with similar performance characteristics, and the J48 classifier for classification, which correctly classified instances with an accuracy of 96.6667% (Prabha & Shanavas, 2014). This research explored the application areas of EDM in the OL system. The findings were that adopting DM tools and techniques in academic institutions helped to improve decision making, improve the services they provide, and increase student grades and retention. In this research, although MATHS TUTOR, an LMS environment for school students in the 6th, 7th, and 8th grades, was designed and implemented in three schools, the dataset was collected only from the 6th grade for analysis. Further, it used only the EM algorithm for clustering and J48 for classification, which can be considered a limitation of this research. A year later, Prabha and Shanavas (2015) conducted research to better understand how students identify the settings in which they learn to improve education outcomes. Different methods such as EDM Classification for Model Construction and Model usage, Prediction to develop Predictive model, and Clustering for Classification of clustering algorithm techniques were used. The finding suggested that classifying student groups according to their knowledge level with test marks makes it easier for the teacher to concentrate on the areas where students are weak. To support the evidence, a prediction model was developed using a classification algorithm (using "if-then" rules). Model construction using EDM and real datasets of 60 students from the 6th grade logged into MATHS TUTOR were considered, which can also be taken as a limitation of this research.
In the same year, Kashyap and Chauhan (2015) conducted research focusing on the comparative analysis of various EDM techniques and Machine Learning (ML) algorithms. Different methods, such as association, clustering, and classification, were considered. The study showed that, for classification, Naïve Bayes was the best algorithm in performance; for clustering, k-means was the best algorithm; and for association, the Apriori algorithm was the best and more accurate than other algorithms. As a continuation, Kashyap and Chauhan (2016) also conducted research on the comparative analysis of different ML techniques and compared the accuracy of different classification techniques. Decision tree algorithms such as C5.0 and ID3 produced accurate results for the classification of the structured educational dataset. To classify the unstructured educational dataset, Support Vector Machine (SVM), Naïve Bayes, and Neural Network (NN) classification produced accurate results in terms of several parameters such as speed and efficiency. Similarly, in bio-medical data analysis, the decision tree algorithm C5.0 provided a better result than the C4.5 algorithm. Further, the Neural Network (NN) classifier produced a more accurate result than the Decision Tree and Naïve Bayes classifiers in terms of efficiency for the analysis of the Mammographic Mass dataset. In Bank Direct Marketing, the SVM classifier provided more accurate results in terms of speed and efficiency than other classifiers. Preethi and Goswami (2015) conducted research to study students' performance using classification methods such as Decision Tree (J48) and Bayesian (Naïve Bayes). The Naïve Bayes classifier had an accuracy of 74%, and the J48 classifier had an accuracy of 73% in classifying instances. The result of the J48 prediction model suggested that a time difference between posts (in mins) greater than 3 had the highest number of predictions of grade "B". The dataset of 100 students was collected from an online examination for this research.
In the same year, Kaur et al. (2015) conducted research on predicting and analyzing students' performance and identifying slow learners among students in academics. Different classification algorithms such as Naïve Bayes, Multi-Layer Perceptron, SMO, J48, and REPTree were considered for this research. Multi-Layer Perceptron was found to be the best classifier, with an accuracy of 75%, higher than the other classifiers. Aziz et al. (2015) applied the Naïve Bayes classifier to extract the hidden pattern of Students' Academic Performance (SAP) to identify the parameters that influence students' academic success. The study showed that Naïve Bayes with 3-fold cross-validation classified the instances with an accuracy of 57.4%. Among six different parameters, family income had a high influence on SAP, with 56.8% probability. It also showed that the average student category was classified better, with an accuracy of 68.5%, than other categories such as poor and good. Pratiyush and Manu (2016) applied one of the supervised learning algorithms, SVM, for the classification task. The main goal of this research was to predict students' placement results in a class labeled Yes or No. The study collected data of 200 students with six independent attributes (Attendance, GPA, Reasoning Aptitude, Quantitative Aptitude, Communication Skills, and Technical Skills) and one dependent attribute (i.e., Placement). This study showed how the classification result of students' placement gives a better perception of how a particular group of students should perform and what they should target in new educational trends to get placed in the future. Regression, Random Forest, and Gradient Boosting Decision Tree (GBDT)). Among these models, GBDT produced the highest accuracy of 88%. Amrieh et al. (2016) applied different classifiers such as Artificial Neural Network (ANN), Naïve Bayesian (NB), and Decision Tree (DT). This research was conducted to propose a new student performance prediction model using a classification technique. The result showed that learners' behaviour (features) and their educational achievement had a strong relationship, and one of the features (i.e., visited resources) was the most effective feature. Using behavioral features, the accuracy of the prediction model improved by up to 22.1% compared with the results when such features were removed, and using ensemble methods, the accuracy improved by up to 25.8%. The accuracy of the prediction model was more than 80% through the testing and validation process. This study applied new data attributes/features called students' behavioral features, which are related to students' interactivity in the OL system. The study also applied different ensemble methods such as Bagging, Boosting, and Random Forest to improve the accuracy of the classifiers.
In the same year, Saa (2016) applied multiple classifier methods such as C4.5, ID3, CART, and CHAID to find a qualitative model to classify and predict students' performance. The study presented a comparative analysis based on accuracy, where CART had an accuracy of 40%, CHAID and C4.5 had accuracies of 34.07% and 35.19%, respectively, ID3 had the lowest accuracy of 33.33%, and the Naïve Bayes classifier had an accuracy of 36.40%. Nichat and Raut (2017) measured student performance using two classification techniques, the Decision Tree Induction Algorithm and Decision Trees. The finding suggests that the C4.5 algorithm was more accurate and took less execution time than ID3 with different data sizes, and that early analysis of students' performance helped in time management. The evidence provided in this research was enough for exploring whether a student's performance was satisfactory or not satisfactory and the student's weaknesses in a particular subject or field, which helped to predict the performance of the student activity. association rules were applied. The research helped teachers to assess students to identify their gaps and find courses that match their levels without being lost in the large volume of videos available on the internet. Almarabeh (2017) applied five classifiers (Naïve Bayes, Bayesian Network, ID3, J48, and Neural Network) to predict and analyse students' performance in the university. The experimental result showed Bayesian Network to be the best classifier, with an accuracy of 92.0%, ahead of Naïve Bayes, J48, Neural Network, and ID3, with accuracies of 91.11%, 91.11%, 90.2%, and 88.0%, respectively. For this study, the students' data consisted of 225 instances and 10 attributes.
Al-Shehri et al. (2017) applied both SVM and KNN for the classification task to find the best prediction model based on accuracy. The model developed was used to predict students' grades. For this study, data of 375 students were collected. The experiment result showed that SVM achieved a slightly better result, with 96% accuracy, than KNN, with 95% accuracy. Kapur et al. (2017) used different classification algorithms such as Decision Tree (J48), Naïve Bayes, Random Forest, Naïve Bayes Multinomial, K-star, and IBk. This research intended to study and compare all the classification algorithms to find well-performing algorithms for the prediction of students' final marks. The experiment result showed that Random Forest had the most correctly classified instances, with an accuracy of 76.666%, higher than the other methods. The dataset used in this study contained 480 entries of students with 16 attributes. Costa et al. (2017) investigated the effectiveness of algorithms used for the early prediction of students who are likely to fail. Four prediction techniques, SVM, Decision Tree via J48, Neural Network, and Naïve Bayes, were applied to a dataset of 424 undergraduate students. The study showed that the SVM technique was more effective than the other three techniques, with an efficiency of 92%. Sarra et al. (2018) studied the usefulness of DM for determining students who are at higher risk of failure and more likely to drop out. For this purpose, they created a profile of students through Bayesian Profile Regression (BPR) on the basis of students' performance, motivation, and resilience, with the data collected through an online questionnaire. The study suggested that BPR can be used for identifying students who are at high risk of dropping out, so that the necessary steps can be taken by the instructor. Hussain et al. (2018) first applied feature selection methods such as correlation-based attribute evaluation, gain-ratio attribute evaluation, information-gain attribute evaluation, relief attribute evaluation, and symmetrical uncertainty attribute evaluation. Then four different classification algorithms, J48, Random Forest, BayesNet, and PART, were implemented. The main aim of this research was to find the highly influential attributes of students' academic performance and to compare the four classification algorithms. The experiment result showed that Random Forest was the best-suited algorithm for the dataset, with an accuracy of 99% with selected attributes (84.33% without selected features), ahead of PART (74.33%), J48 (73%), and BayesNet (65.33%). The dataset consisted of 300 records with 24 attributes. The feature selection methods retained the 12 most influential features.

Data Mining Application in Moodle Data
The data mining process involves successively processing raw data into more refined forms, enabling further processing of the data and the extraction of relevant information. It can be broken down into three core processes: Data Preprocessing, Pattern Recognition, Interpreting Results (Kamath, 2009). The datasets have many outliers or anomalies. Removal of these anomalies can help in the better prediction of student performance and behavior. Hence the study was conducted to detect the anomalies using the k-means clustering method.
The dataset used in this study was obtained from the computer students of Kathmandu University, Nepal, enrolled in the course COMP 341 (Human-Computer Interaction). Two types of data were obtained from Moodle. The system log data from Moodle had 14840 observations, including the column headers. There were nine columns for the 128 active users in the system. The columns present in the data were: Time, User's full name, Affected user, Event context, Component, Event name, Description, Origin, and IP address. The quiz grade dataset from Moodle had 105 observations, including the column headers. The dataset had 10 columns for each user altogether. The columns present in the data were: First name, Surname, ID number, Institution, Department, Email address, Assignment: Mini-Research Project Proposal "assignment" (Real), Quiz: quiz1 (Real), Quiz: COMP 341 MCQ (Real), and Last downloaded from this course. The data was prepared so that a minimal set of features was chosen for clustering. The total log count and the average grade of a student from Quiz 1 and 2 (out of 20) were taken as the most effective features. All kinds of clicks in the Moodle system were extracted from the log data of each student. After extracting the click count for different modules, the total number of clicks was calculated for each student. The data were then narrowed down to "Total.Click" and "Grade" for each student. After pre-processing the data, a scatter plot of Grade vs. Total.Click was generated to study the relationship between these features. Figure 3 showed that most students, despite having low clicks, performed above average, i.e., greater than 18.89. The students with more than 100 clicks always scored more than the average grade. Thus, the scatter plot shows a direct relationship between Total.Click and Grade.
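The feature preparation described above might be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used in the study: it assumes that every log row counts as one click, that a student's full name is the first name followed by the surname, and the sample rows and the function name build_features are hypothetical. Only the column names come from the datasets described above.

```python
from collections import Counter

# Hypothetical sample rows; real exports would have all nine and ten columns.
log_rows = [
    {"User's full name": "Student A", "Event name": "Course viewed"},
    {"User's full name": "Student A", "Event name": "Quiz attempt viewed"},
    {"User's full name": "Student B", "Event name": "Course viewed"},
]
grade_rows = [
    {"First name": "Student", "Surname": "A",
     "Quiz: quiz1 (Real)": 18.0, "Quiz: COMP 341 MCQ (Real)": 20.0},
    {"First name": "Student", "Surname": "B",
     "Quiz: quiz1 (Real)": 16.0, "Quiz: COMP 341 MCQ (Real)": 17.0},
]


def build_features(log_rows, grade_rows):
    """Return {student: {"Total.Click": count, "Grade": mean of the two quizzes}}."""
    # Assumption: each log row is one click by the named user.
    clicks = Counter(row["User's full name"] for row in log_rows)
    features = {}
    for row in grade_rows:
        name = f'{row["First name"]} {row["Surname"]}'
        grade = (row["Quiz: quiz1 (Real)"] + row["Quiz: COMP 341 MCQ (Real)"]) / 2
        features[name] = {"Total.Click": clicks.get(name, 0), "Grade": grade}
    return features


features = build_features(log_rows, grade_rows)
print(features)
```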
A higher number of interactions in the Moodle system resulted in a higher quiz score than normal. Clustering means finding groups of objects such that the objects in one group are similar to one another and different from the objects in other groups. In EDM, clustering can be used to distinguish between student activities and their behaviors. In this study, students are clustered into groups according to their activity and exam scores. K-means clustering is one of the simplest and most commonly used clustering methods for splitting datasets into k groups. It is a popular clustering mechanism based on the distance between objects. The 'k' in k-means is the number of clusters that k-means should generate. The dataset needs to be standardised (i.e., scaled) to make variables comparable. After standardisation, the data is fed to the k-means algorithm. The number of clusters can be increased by one each time to see whether a cluster outside the normal group of clusters is formed.
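The standardisation and clustering steps described above can be illustrated with a minimal sketch of Lloyd's k-means algorithm in plain Python. The study's actual tooling is not stated here, so the function names, the random seed, and the sample (Total.Click, Grade) pairs below are all assumptions made for illustration.

```python
import math
import random


def standardise(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(columns, means)]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as starting centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical (Total.Click, Grade) pairs, used only for illustration.
data = [(20, 19.0), (25, 19.5), (30, 12.0), (35, 11.0), (150, 19.0), (170, 20.0)]
centroids, labels = kmeans(standardise(data), k=3)
print(labels)
```

Because the starting centroids are sampled at random, different seeds can give different groupings; running the algorithm several times and keeping the best result is a common safeguard.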

Figure 3
Comparison of K-Means Clusters for k=2,3,4 and 5
The elbow method and the average silhouette method were used to find the optimal number of clusters. Determining optimal clusters requires an optimal value of 'k'. The elbow method defines clusters such that the total intra-cluster variation (total within-cluster sum of squares) is minimised. On the other hand, the average silhouette approach measures the quality of clustering, i.e., it determines how well each object lies within its cluster. A high average silhouette indicates good clustering (Air Force Institute of Technology, n.d., Average Silhouette Method section, para. 1).
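Both criteria can be computed directly from a set of points and their cluster labels. The sketch below is illustrative only: the function names and the four sample points are assumptions, and the silhouette function expects at least two clusters.

```python
import math


def wcss(points, labels):
    """Total within-cluster sum of squared distances to each cluster mean."""
    total = 0.0
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        centre = [sum(c) / len(members) for c in zip(*members)]
        total += sum(math.dist(p, centre) ** 2 for p in members)
    return total


def average_silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b); requires at least two clusters."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [math.dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
               if l == lab and j != i]
        if not own:  # singleton cluster: silhouette is taken as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical points with a sensible and a poor labelling.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good, poor = [0, 0, 1, 1], [0, 1, 0, 1]
print(wcss(points, good), wcss(points, poor))
print(average_silhouette(points, good) > average_silhouette(points, poor))
```

For the elbow method, wcss would be evaluated on the k-means result for each candidate k and plotted against k; for the silhouette method, the k with the highest average silhouette would be chosen.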

Figure 4
Elbow Method for Determining the Optimal Number of Clusters

Figure 4 suggests that 3 is the optimal number of clusters, as the curve bends at the knee (or elbow) at that value. As both approaches suggested 3 as the optimal number of clusters, the final analysis and clusters were extracted using k=3.

Figure 6
K-Means Clustering with k=3

The k-means algorithm split the students into 3 groups:
i. Cluster 1: Students with a high number of total clicks and high grades.
ii. Cluster 2: Students with a low number of total clicks and below-average grades.
iii. Cluster 3: Students with a moderate number of total clicks and high grades.
In the next step, the distance between each object and its cluster center is calculated, and the objects with the three largest distances are identified as outliers. Table 3 shows that cluster 1 has two outliers and cluster 2 has one outlier.
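The outlier step can be sketched as below: each point's distance to its own cluster center is computed and the three largest distances are flagged. The points, labels, and centers are hypothetical stand-ins for a k-means result, and the function name `top_outliers` is introduced only for this illustration.

```python
import math
from collections import Counter


def top_outliers(points, labels, centres, n=3):
    """Return (distance, index, cluster) for the n points farthest from
    their own cluster centre, largest distance first."""
    scored = [
        (math.dist(p, centres[lab]), idx, lab)
        for idx, (p, lab) in enumerate(zip(points, labels))
    ]
    return sorted(scored, reverse=True)[:n]


# Hypothetical standardised points with their cluster labels and centres.
points = [(0.0, 0.0), (0.1, 0.0), (3.0, 0.0),
          (5.0, 5.0), (5.0, 5.2), (5.0, 9.0)]
labels = [0, 0, 0, 1, 1, 1]
centres = [(0.0, 0.0), (5.0, 5.0)]

outliers = top_outliers(points, labels, centres)
for distance, idx, lab in outliers:
    print(f"point {idx} in cluster {lab}: distance {distance:.2f}")

# Number of flagged points per cluster.
per_cluster = Counter(lab for _, _, lab in outliers)
print(dict(per_cluster))
```

Taking a fixed number of largest distances always flags exactly that many points, so the choice of three is a threshold decision rather than a property of the data.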

Discussion and Conclusion
Clustering the quiz and log data of 128 students enrolled in the course COMP 341 produced some interesting results. The main reason to use clustering was to group the data and to identify outliers. The k-means clustering method was used to find the outliers. Out of 128 data points, 3 were flagged as outliers by this technique. The result shows that 3 students did not conform to the general pattern of the data. In terms of clustering, these data points were far from the clusters that were formed and had a large intra-cluster distance compared with the other data points within their cluster.
Educational Data Mining is one of the emerging and promising areas in the field of Information Technology (IT). Due to the growth in OL users, EDM is growing in popularity, as it is a suitable approach for identifying learners' behaviors and building predictive models. This paper reviews several pieces of research done in the field of EDM and summarises the different DM algorithms used on various types of educational data. It also includes a study of the Moodle data of the course COMP 341, which applies the k-means clustering method to find anomalies in the dataset.
The emerging research on EDM has led researchers towards integrating educational theories and developing an environment for the notification of information, also called Learning Analytics (LA). Liu et al. (2017), in their paper on a data-driven personalization system, mentioned that there was little focus on the pedagogical and pastoral contents of learning. DM algorithms can also be formulated or improved in areas where LA is useful. Learning Analytics focuses on how these algorithms can be deployed and integrated into learning designs. It can provide visible improvements for students (Liu et al., 2017). Simon (2017) mentioned that LA could be data-driven and that learners' log data can be studied to optimise learning. The author further stated the major goals of LA: helping students find more personalised ways to learn, helping teachers improve students' learning, and helping students in their learning by providing a better environment. The paper also described different learning theories, such as behaviourism, cognitivism, constructivism, and social constructivism, and highlighted that integrating learning theories could provide a conceptual framework, allow what is observed to be interpreted, and provide solutions to problems that occur during learning. Wong et al. (2019) reasoned that a good understanding of how learning occurs, how learning can be supported, and how student characteristics influence learning is needed if the goal of LA is to understand and optimise learning. Based on a recent review of papers published in the Review of Educational Research (RER) journal over the last century, Murphy and Knight (2016) found that learning sciences have been guided by three predominant theoretical lenses: behavioral, cognitive, and contextual (Wong et al.,