Table 4. Description of ML limitations... | Emerald Publishing

Table 4.

Description of ML limitations

Limitation	Description of limitation	Source	Qualitative accounting objective
Poor interpretability	Users cannot understand how the ML technology generates information, owing to the complexity of the ML model	Ayodele (2010a), Sainani (2014), The Royal Society (2017)	Verifiability and understandability
Overfitting	The risk is that input features with little modelling benefit are included in the training data. These features may increase the sensitivity of the technology to changes in the inputs, even though they could be excluded with no disadvantages. In such instances, the ML model may be too closely linked to the training data used to train it and unable to classify other data sets appropriately. This increases the risk of misleading representations	Hawkins (2004), Sculley et al. (2015), Witten et al. (2016)	Relevance
It takes a long time to train	The risk of increased training or learning times for ML models as the size and complexity of the data sets increase	Ghanem (2012)	Timeliness
Complex, which makes it slow	The risk of increased processing time due to the complexity of the ML model. In the case of a classification technique, for example, the model will be slow to classify data	Kotsiantis (2007), Witten et al. (2016)	Timeliness
Training rate may be slow depending on available labelled data	The training rate depends on how much labelled data is available to train the ML model. The ML model is trained using a labelled data set consisting of examples of input data and labels indicating predicted targets or output data. This type of data is not as prevalent as unstructured data	Ayodele (2010b), Castle (2018), Larsson and Segerås (2016), Marsland (2009), SMACC (2017), Zheng et al. (2017)	Timeliness
Requires independent variables	An ML technique such as Naïve Bayes requires independent variables in the data, which implies that the values of the different features of each variable do not influence one another. However, as each variable has a high number of features, it is unlikely that there are no dependencies among them. This may result in incorrect processing and outputs of the ML model	Marsland (2009), Samoil (2015)	Faithful representation
Training set sensitive	If a feature has a category that was not observed in the training data set, a zero probability is assigned to that category, so the ML model is unable to make a prediction. This is known as the zero-frequency problem	Witten et al. (2016), Samoil (2015), Larsson and Segerås (2016)	Materiality and faithful representation
Computing intensive	The risk of costs exceeding the financial benefits to the business as the ML model requires advanced data integration tools and infrastructure, which may present significant costs to the business to acquire	Gillion (2017), Sapp (2017)	Cost-saving
Excessive output	In the case of association rules ML models, the number of rules discovered may be excessive, which may impact the relevance of the output	Kaur (2014)	Relevance
Requires lots of time	ML models may take a lot of time to produce outputs, and this may be due to how many times the algorithm needs to run to achieve an accurate result	Witten et al. (2016), Kaur (2014), Ayodele (2010b)	Timeliness
Requires adequate data	The risk is that the ML model is inaccurate owing to insufficient data for training	Burrell (2016)	Materiality and faithful representation
Trade-off between accuracy, memory and overfitting	A limitation of some ML models is that training them using large feature data sets results in more accurate predictions but requires more memory and increases the risk of overfitting	Witten et al. (2016), Sutton and McCallum (2011)	Relevance and faithful representation
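
The zero-frequency limitation in the "Training set sensitive" row can be illustrated with a small sketch. It is not taken from the cited sources: the function name, the invoice data and the use of additive (Laplace) smoothing as a remedy are all assumptions made for illustration. The sketch estimates the probability of a category from training counts, first without smoothing and then with it.

```python
from collections import Counter


def category_likelihood(observed, category, all_categories, alpha=0.0):
    """Estimate P(category | class) from the values observed for one class.

    alpha = 0 gives the raw relative frequency. alpha > 0 applies additive
    (Laplace) smoothing, an assumed remedy that the table does not mention.
    """
    counts = Counter(observed)
    numerator = counts[category] + alpha
    denominator = len(observed) + alpha * len(all_categories)
    return numerator / denominator


# Hypothetical training data: payment methods on invoices labelled "valid".
training_values = ["EFT", "EFT", "card", "EFT"]
categories = ["EFT", "card", "cheque"]  # "cheque" never occurs in training

# Without smoothing, the unseen category gets probability zero.
unsmoothed = category_likelihood(training_values, "cheque", categories)

# With alpha = 1, the unseen category gets a small non-zero probability.
smoothed = category_likelihood(training_values, "cheque", categories, alpha=1.0)

print(unsmoothed)
print(smoothed)
```

Because a Naïve Bayes classifier multiplies the per-feature probabilities together, a single zero from an unseen category forces the whole product to zero, which is why the unsmoothed estimate prevents a usable prediction.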

Notes:

We have inserted the Table here for ease of review. We have kept to the standard JFRA convention in the “clean copy” version of the manuscript