Table A1. Description of ML techniques... | Emerald Publishing

Table A1.

Description of ML techniques

ML solution	Algorithm	Description	Source
Classification [Supervised]	Logic-based
	Decision trees	The decision tree consists of nodes, each containing a question related to a particular feature. The algorithm starts at the root node, checks the instance's features against that node's question and then, depending on the answer, moves on to the next node. Classification consists of a series of such decisions, one at each node, ending at a leaf node	Thomassey and Fiordaliso (2006:410), Kotsiantis (2007:251), Marsland (2009:133), Narasimha Murty and Susheela Devi (2011:127)
	C4.5 decision trees	A refinement of the standard decision tree algorithm that prunes the tree to reduce the number of nodes without losing the ability to classify instances	Thomassey and Fiordaliso (2006:410), Marsland (2009:143)
	Random forests (transfer learning decision forests)	This model uses random forests (see Clustering, unsupervised) where the knowledge produced can subsequently be applied or transferred to a given target task. This generates a classifier that exploits knowledge from other tasks to improve its performance on the target task	Goussies et al. (2014:4312)
	Perceptron-based
	Neural network (dual-use algorithm)	A combination of mathematically generated neurons, which operate similarly to the human brain. The inputs to each neuron are assigned weights based on what the artificial neural network learns, and together these form part of a mathematical function. A feed-forward network consisting of only an input and an output layer can classify linearly separable classes	SMACC (2017:9)
	Statistical
	Naïve Bayes	The Naïve Bayes algorithm is a probabilistic model which determines the probability of different classes or outcomes based on previously encountered examples. These examples are identified in the training data	Larsson and Segerås (2016:12)
	Bayesian belief networks	The network is based on Bayes' theorem. Using a graphical model, it represents the probabilistic dependencies among features and the relationships between them	Heckerman (2008:33), Narasimha Murty and Susheela Devi (2011:97), Witten et al. (2016:340)
	K-nearest neighbour	The algorithm classifies instances or patterns according to the nearest known neighbour class by finding similarities in the instance being classified to patterns or features in the training set	Narasimha Murty and Susheela Devi (2011:48)
	Support vector machines (dual-use algorithm)	A binary classifier that aims to separate data into two classes based on the case features	Ayodele (2010b:25)
Prediction [Supervised]	Conditional random fields	An algorithm that predicts outputs by combining discriminative classification with graphical modelling	Sutton and McCallum (2011:269)
Clustering [Unsupervised]	Parallelisation MapReduce k-nearest neighbour	An approach in which two techniques are applied to the existing k-nearest neighbour (KNN) algorithm. Firstly, a clustering algorithm is used to group similar data to reduce the number of samples the KNN algorithm needs to process. Secondly, a MapReduce parallel model is applied to the data set to identify the independent categories of data and run the KNN algorithm multiple times simultaneously (in parallel). This improves the performance of the algorithm on larger samples	Du (2017)
	Semi-supervised clustering	Clustering algorithms are unsupervised ML algorithms that work to find a partition in the data set. Semi-supervised clustering assists the algorithm in finding a better-quality partition by providing the algorithm with any prior knowledge about the data. The clustering algorithm is then guided by the prior knowledge to find the partition in the data	Jain et al. (2014)
	K-means clustering	K-means clustering divides the data into k categories. To perform k-means clustering, the number of clusters, that is k, needs to be specified, and a random initial central data point (centroid) needs to be selected for each cluster. Each data point is then assigned to the cluster whose centre is nearest, and each centre is recalculated as the mean of its assigned points. These two steps repeat until the cluster centres no longer move	Marsland (2009:196), Ayodele (2010b:27)
	Random forests	This model consists of a number of decision trees, each built from a subsample of features and therefore usually weaker than a full decision tree. The average, or the weighted average, of the trees' outputs is used to perform the classification, effectively combining the power of the individual trees, which often produces a higher-quality result	Bucheli and Thompson (2014:4), Dataiku (2017:7)
	Self-organising maps	A form of neural network that uses unsupervised learning. The objective of the self-organising map is to produce its own representation or self-organisation of the given data as outputs are not provided	Kohonen (1990:1464), Hadzic et al. (2007:225)
Outlier detection [Unsupervised]	Association rules	Association rules determine the associative relationships between data, where the occurrence of one feature may indicate the possible occurrence of another feature. Instead of predicting a particular class, association rules can predict combinations of features and which features are commonly associated with each other, irrespective of class	Narasimha Murty and Susheela Devi (2011), Witten et al. (2016)

Source: Compiled by author from multiple sources as indicated above
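
As an illustration of the k-means clustering entry in the table, the following is a minimal Python sketch of the procedure as described there: choose k initial centroids at random, assign each point to its nearest centroid, recompute the centroids, and repeat until they stop moving. It is not taken from any of the cited sources. The function names (`k_means`, `squared_distance`), the parameters `max_iter` and `seed`, and the choice of squared Euclidean distance are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Illustrative k-means; names and defaults are hypothetical.

    points: list of equal-length numeric tuples
    k: number of clusters, specified in advance
    Returns (centroids, assignments), where assignments[i] is the
    index of the centroid that points[i] was assigned to.
    """
    rng = random.Random(seed)
    # Pick k distinct data points at random as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Stop once the cluster centres no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignments


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centres, labels = k_means(data, k=2)
    print(centres)
    print(labels)
```

Because the starting centroids are random, different seeds can in general lead to different partitions. On the small, well-separated data set above, the two groups of three points are recovered.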