Description of ML limitations
| Limitation | Description of limitation | Source | Qualitative accounting objective |
|---|---|---|---|
| Poor interpretability | Users cannot understand how the ML technology generates information because the ML model is too complex | Ayodele (2010a), Sainani (2014), The Royal Society (2017) | Verifiability and understandability |
| Overfitting | Input features that add little modelling value may be included in the training data. Such features can make the technology more sensitive to changes in the inputs, even though they could be excluded without any disadvantage. The ML model may then fit the training data too closely and be unable to classify other data sets appropriately, which increases the risk of misleading representations | Hawkins (2004), Sculley et al. (2015), Witten et al. (2016) | Relevance |
| Long training time | Training (learning) times for ML models may increase as the size and complexity of the data sets increase | Ghanem (2012) | Timeliness |
| Complexity slows processing | The complexity of the ML model may increase processing time. A classification model, for example, may be slow to classify data | Kotsiantis (2007), Witten et al. (2016) | Timeliness |
| Training rate may be slow depending on available labelled data | The training rate depends on how much labelled data is available to train the ML model. Such a model is trained on a labelled data set consisting of examples of input data paired with labels that indicate the target output. Labelled data of this kind are less prevalent than unstructured data | Ayodele (2010b), Castle (2018), Larsson and Segerås (2016), Marsland (2009), SMACC (2017), Zheng et al. (2017) | Timeliness |
| Requires independent variables | An ML technique such as Naïve Bayes assumes that the features in the data are independent of one another given the class, i.e. that the value of one feature does not influence the value of another. When there are many features, however, it is unlikely that no dependencies exist among them. Violating this assumption may result in incorrect processing and outputs of the ML model | Marsland (2009), Samoil (2015) | Faithful representation |
| Training set sensitive | If a feature takes a category that was not observed in the training data set, a probability of zero is assigned to that category and the ML model cannot make a prediction. This is known as the zero-frequency problem | Witten et al. (2016), Samoil (2015), Larsson and Segerås (2016) | Materiality and faithful representation |
| Computationally intensive | Costs may exceed the financial benefits to the business, because the ML model requires advanced data integration tools and infrastructure that may be costly to acquire | Gillion (2017), Sapp (2017) | Cost-saving |
| Excessive output | Association rule ML models may discover an excessive number of rules, which may reduce the relevance of the output | Kaur (2014) | Relevance |
| Requires a lot of time | ML models may take a long time to produce outputs, for example because the algorithm must be run many times to achieve an accurate result | Witten et al. (2016), Kaur (2014), Ayodele (2010b) | Timeliness |
| Requires adequate data | The ML model may be inaccurate because there is insufficient data to train it | Burrell (2016) | Materiality and faithful representation |
| Trade-off between accuracy, memory and overfitting | Training some ML models on data sets with many features produces more accurate predictions but requires more memory and increases the risk of overfitting | Witten et al. (2016), Sutton and McCallum (2011) | Relevance and faithful representation |
Notes:
We have inserted the table here for ease of review. We have kept to the standard JFRA convention in the “clean copy” version of the manuscript.
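To make the "Requires independent variables" and "Training set sensitive" limitations concrete, the following is a minimal sketch of a categorical Naïve Bayes classifier in plain Python. It is an illustration only, not the method of any source cited in the table: the invoice data and the names `train`, `posterior_scores` and `predict` are hypothetical. Multiplying per-feature probabilities is the independence assumption; a category never seen in training drives every class score to zero, which is the zero-frequency problem. Laplace (add-one) smoothing, a commonly used remedy that is not discussed in the table, is included through the `alpha` parameter for comparison.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (vendor status, amount band) -> label.
TRAIN_ROWS = [
    ("known", "low"),
    ("known", "low"),
    ("known", "high"),
    ("new", "high"),
    ("new", "high"),
]
TRAIN_LABELS = ["approve", "approve", "approve", "review", "review"]


def train(rows, labels):
    """Count class frequencies and, per class, how often each category occurs."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> Counter of categories
    categories = defaultdict(set)  # feature index -> categories seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
            categories[i].add(value)
    return class_counts, feature_counts, categories


def posterior_scores(model, row, alpha=0.0):
    """Unnormalised P(class) * product of P(feature value | class).

    Multiplying per-feature probabilities is the naive independence assumption.
    alpha=0 uses raw frequencies; alpha=1 applies Laplace (add-one) smoothing.
    """
    class_counts, feature_counts, categories = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total
        for i, value in enumerate(row):
            k = len(categories[i] | {value})  # distinct categories, including an unseen one
            count = feature_counts[(i, label)][value]  # 0 if never seen with this class
            score *= (count + alpha) / (n_label + alpha * k)
        scores[label] = score
    return scores


def predict(model, row, alpha=0.0):
    """Return the most probable class, or None if every class scores zero."""
    scores = posterior_scores(model, row, alpha)
    if all(score == 0.0 for score in scores.values()):
        return None  # zero-frequency problem: no usable prediction
    return max(scores, key=scores.get)


model = train(TRAIN_ROWS, TRAIN_LABELS)
query = ("known", "medium")  # "medium" never appears in the training data
print(predict(model, query))           # None
print(predict(model, query, alpha=1))  # approve
```

Without smoothing, the unseen "medium" band gives both classes a score of zero, so no prediction is possible; with add-one smoothing the classifier returns "approve". Smoothing removes the zero but does not remove the underlying independence assumption.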
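The "Excessive output" limitation can be illustrated by counting how many association rules X → Y (X and Y non-empty and disjoint) are possible over d items. The sketch below assumes the standard combinatorial count `3^d − 2^(d+1) + 1`, which is not taken from the sources in the table, and checks it against brute-force enumeration. The function names are hypothetical. Practical mining algorithms prune this candidate space with support and confidence thresholds, so real output is smaller, but it can still be large.

```python
from itertools import product


def count_rules_formula(d):
    """Number of possible rules X -> Y over d items (X, Y non-empty, disjoint)."""
    return 3**d - 2**(d + 1) + 1


def count_rules_bruteforce(d):
    """Enumerate every placement of each item: antecedent, consequent or neither."""
    count = 0
    for placement in product(("X", "Y", "-"), repeat=d):
        # A valid rule needs at least one item on each side.
        if "X" in placement and "Y" in placement:
            count += 1
    return count


# The formula agrees with brute-force enumeration for small item sets.
for d in range(1, 8):
    assert count_rules_formula(d) == count_rules_bruteforce(d)

for d in (2, 5, 10, 20):
    print(d, count_rules_formula(d))
```

With 2 items there are only 2 candidate rules, but with 10 items there are 57,002 and with 20 items 3,484,687,250, which shows how quickly rule output can outgrow what a user can review.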