Description of ML techniques
| ML solution | Algorithm | Description | Source |
|---|---|---|---|
| Classification [Supervised] | Logic-based | | |
| | Decision trees | The decision tree consists of nodes, each containing a question related to a particular feature. The algorithm starts at the root node, evaluates that node's question against the features of the instance and then, depending on the answer, moves on to the next node. The classification consists of a series of decisions, one at each node, ending at a leaf node | Thomassey and Fiordaliso (2006:410), Kotsiantis (2007:251), Marsland (2009:133), Narasimha Murty and Susheela Devi (2011:127) |
| | C4.5 decision trees | A refinement of the standard decision tree algorithm that prunes the tree to reduce the number of nodes without losing the ability to classify the instance | Thomassey and Fiordaliso (2006:410), Marsland (2009:143) |
| | Random forests (transfer learning decision forests) | This model uses random forests (see Clustering, unsupervised) where the knowledge produced can subsequently be applied or transferred to a given target task. This generates a classifier that exploits the knowledge from other tasks to improve its ability to perform a target task | Goussies et al. (2014:4312) |
| | Perceptron-based | | |
| | Neural network (dual-use algorithm) | A combination of mathematically generated neurons, which operate similarly to the human brain. These neurons are each assigned a weight based on what the artificial neural network learns, collectively forming part of a mathematical function. A network consisting of an input and an output layer can classify linearly separable classes – this is known as a feed-forward network | SMACC (2017:9) |
| | Statistical | | |
| | Naïve Bayes | The Naïve Bayes algorithm is a probabilistic model which determines the probability of different classes or outcomes based on previously encountered examples. These examples are identified in the training data | Larsson and Segerås (2016:12) |
| | Bayesian belief networks | The network is modelled on Bayes' theorem. Using a graphical model, it considers the probabilistic dependencies among features and plots these probabilistic relationships | Heckerman (2008:33), Narasimha Murty and Susheela Devi (2011:97), Witten et al. (2016:340) |
| | K-nearest neighbour | The algorithm classifies instances or patterns according to the class of the nearest known neighbours, found by comparing the instance being classified with the patterns or features in the training set | Narasimha Murty and Susheela Devi (2011:48) |
| | Support vector machines (dual-use algorithm) | A binary classifier that aims to separate data into two classes based on the case features | Ayodele (2010b:25) |
| Prediction [Supervised] | Conditional random fields | An algorithm that predicts outputs by combining discriminative classification with graphical modelling | Sutton and McCallum (2011:269) |
| Clustering [Unsupervised] | Parallelisation MapReduce k-nearest neighbour | An approach in which two techniques are applied to the existing k-nearest neighbour (KNN) algorithm. Firstly, a clustering algorithm is used to group similar data to reduce the number of samples the KNN algorithm needs to process. Secondly, a MapReduce parallel model is applied to the data set to identify the independent categories of data and run the KNN algorithm multiple times simultaneously (in parallel). This improves the performance of the algorithm on larger samples | Du (2017) |
| | Semi-supervised clustering | Clustering algorithms are unsupervised ML algorithms that work to find a partition in the data set. Semi-supervised clustering assists the algorithm in finding a better-quality partition by providing it with any prior knowledge about the data. The clustering algorithm is then guided by this prior knowledge to find the partition in the data | Jain et al. (2014) |
| | K-means clustering | K-means clustering divides the data into k categories. To perform k-means clustering, the number of clusters, that is k, needs to be specified, and a random initial central data point (centroid) needs to be selected for each cluster. Each data point is then assigned to a cluster based on its distance from the cluster centres. The centres are recalculated and the assignment is repeated until the cluster centres no longer need to move | Marsland (2009:196), Ayodele (2010b:27) |
| | Random forests | This model consists of a number of decision trees, each built from a subsample of features and usually weaker than a full decision tree. The average, or the weighted average, of the trees is determined and used to perform the classification, effectively combining the power of the individual trees, which often produces a higher-quality result | Bucheli and Thompson (2014:4), Dataiku (2017:7) |
| | Self-organising maps | A form of neural network that uses unsupervised learning. The objective of the self-organising map is to produce its own representation or self-organisation of the given data, as outputs are not provided | Kohonen (1990:1464), Hadzic et al. (2007:225) |
| Outlier detection [Unsupervised] | Association rules | Association rules determine the associative relationships between data, where the occurrence of one feature may indicate the possible occurrence of another feature. Instead of predicting a particular class, association rules can predict combinations of features and which features are commonly associated with each other, irrespective of class | Narasimha Murty and Susheela Devi (2011), Witten et al. (2016) |
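
The k-means procedure described in the table (choose k, pick random initial centroids, group points by distance, repeat until the centres stop moving) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the implementation used by any of the cited sources: the function names `k_means` and `squared_distance`, the `seed` and `max_iter` parameters, the use of squared Euclidean distance and the sample points are all hypothetical choices made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed=0, max_iter=100):
    """Group `points` (a list of numeric tuples) into k clusters."""
    rng = random.Random(seed)
    # Random initial centroids, drawn from the data points themselves.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        # Stop once the centres no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centres, groups = k_means(sample, k=2)
    print(centres)
    print(groups)
```

Because the initial centroids are random, different seeds can lead to different partitions on less clearly separated data; practical implementations typically rerun the algorithm from several starting points.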
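
The k-nearest neighbour entry in the table, which classifies an instance by its similarity to patterns in the training set, can be sketched in a few lines of Python. The sketch assumes numeric feature tuples, squared Euclidean distance as the similarity measure and a simple majority vote among the k closest training examples. The name `knn_classify`, the default `k=3` and the labels `"a"` and `"b"` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(training_set, query, k=3):
    """Return the majority label among the k training examples closest to `query`.

    `training_set` is a list of (features, label) pairs.
    """
    # Rank the training examples by their distance to the query instance.
    by_distance = sorted(training_set, key=lambda item: squared_distance(item[0], query))
    # Count the labels of the k nearest neighbours and take the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    training = [
        ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
        ((10, 10), "b"), ((10, 11), "b"), ((11, 10), "b"),
    ]
    print(knn_classify(training, (1, 1)))
    print(knn_classify(training, (9, 9)))
```

Sorting the whole training set on every query is the cost that the parallelised MapReduce variant in the table aims to reduce for larger samples.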