Identifying students who are likely to have difficulty graduating is valuable to any educational system that cares about its students. If these students can be identified early enough, action can be taken to keep them engaged. For example, they could receive extra attention from teachers, or the material could be restructured to be more engaging for them.
Naive Bayes, Random Forest, and Support Vector Machine models were trained using scikit-learn on a dataset of 395 students, which contained information such as:
Personal habits:
And, of course, a field indicating whether the student graduated or failed to graduate.
The machine learning algorithms used were Naive Bayes, Random Forest, and Support Vector Machine.
These algorithms were evaluated to decide which would be the most appropriate to recommend to a board of directors. Based on the dataset sizes likely to be used for this task, the running time of each algorithm, and the accuracy of each model, the Support Vector Machine was chosen as the best model to recommend.
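A comparison along these lines can be sketched roughly as follows. This is a minimal illustration, not the code from the original notebook: the data is a synthetic stand-in generated with `make_classification` (395 rows to match the dataset size), the 300/95 split is an assumption, and `GaussianNB` is assumed as the Naive Bayes variant since the write-up does not say which one was used.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for the student data (395 rows); NOT the real dataset.
X, y = make_classification(n_samples=395, n_features=30, random_state=0)

# Assumed split: 300 training rows, 95 test rows, stratified by label.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=95, stratify=y, random_state=0
)

# The three model families named in the write-up, with default settings.
models = {
    "Naive Bayes": GaussianNB(),  # variant is an assumption
    "Random Forest": RandomForestClassifier(random_state=0),
    "Support Vector Machine": SVC(random_state=0),
}

results = {}
for name, model in models.items():
    # Time only the training step.
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start

    # Score on the held-out test rows.
    f1 = f1_score(y_test, model.predict(X_test))
    results[name] = (train_seconds, f1)
    print(f"{name}: trained in {train_seconds:.4f}s, test F1 = {f1:.2f}")
```

On synthetic data the numbers say nothing about the real student dataset; the point is only the shape of the comparison, which records both training time and test F1 for each model.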
The Support Vector Machine model was fine-tuned using grid search and cross-validation, attaining a final F1 score of 0.80 on the test set.
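The tuning step might look something like the sketch below, which uses scikit-learn's `GridSearchCV` to cross-validate an `SVC` over a small parameter grid. The grid values, the 5-fold setting, and the synthetic data are all assumptions made for illustration; the original notebook may have searched different parameters.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the student data (395 rows); NOT the real dataset.
X, y = make_classification(n_samples=395, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=95, stratify=y, random_state=0
)

# Hypothetical search grid; these values are illustrative only.
param_grid = {
    "C": [0.1, 1, 10],
    "gamma": [0.001, 0.01, 0.1],
}

# 5-fold cross-validation on the training set, selecting by F1 score.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)

# By default the best parameter combination is refit on the full training
# set, so the search object can predict on the held-out test rows directly.
test_f1 = f1_score(y_test, search.predict(X_test))
print("Best parameters:", search.best_params_)
print(f"Test F1: {test_f1:.2f}")
```

Selecting parameters by cross-validation on the training set alone keeps the test set untouched until the final score is computed, which is what makes a figure like the reported test F1 meaningful.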
Note: This project was part of a submission for my Machine Learning Nanodegree at Udacity.
The full IPython Notebook write-up can be viewed at the following link: