Optimising sample size for normality testing in geotechnical data using power analysis

Sankar Ganesh, K.S.; Ashok Kumar, T.; Vibinesh, M.

doi:10.1108/MLAG-03-2025-0008

Purpose

This study aims to examine the performance of various goodness-of-fit (GoF) tests in assessing the normality of the data set, a crucial step in conducting probabilistic analyses in the geotechnical domain. The evaluation focuses on the efficacy of these tests when applied to small sample sizes and data sets with varying coefficient of variation (CoV). Identifying the most efficient GoF test based on the statistical characteristics of the data can enhance the reliability of results and minimise the risk of misleading conclusions.

Design/methodology/approach

Multiple GoF tests, including Shapiro–Wilk (S-W), Lilliefors (LL), Anderson–Darling (A-D), Jarque–Bera (J-B), chi-square (CSQ), Cramér–von Mises (CVM) and D’Agostino and Pearson Omnibus (DP) tests, were used for normality assessment. A computational power analysis was performed through Monte Carlo random sample simulation to determine the optimal sample size required to achieve the desired statistical power. Furthermore, the performance and sensitivity of each GoF test were assessed systematically by varying the sample size and effect size (d) to establish the relationship between the minimum required sample size and CoV.

Findings

Power analysis revealed that the S-W test was the most effective in detecting departures from normality, followed by the A-D, J-B, LL and DP tests. The degree of skewness and the CoV of the data sets play a crucial role in optimising the sample size requirements. For the S-W test, the minimum required sample size varies with the CoV as follows: (a) CoV < 10% requires at least 665 samples, (b) 10% ≤ CoV < 30% requires 665–145 samples, (c) 30% ≤ CoV < 50% requires 145–70 samples, (d) 50% ≤ CoV < 100% requires 70–25 samples, and (e) CoV ≥ 100% requires at most 25 samples. In comparison, the CSQ and CVM tests demand substantially larger minimum sample sizes, ranging approximately between 115 and 4,400.

Originality/value

This study presents a comparative analysis of GoF tests applied to geotechnical data, determining the required sample size through power analysis, with a target statistical power of 0.8 at a chosen significance level of 0.05. These findings provide practical guidance for selecting appropriate normality tests and the required minimum sample size for geotechnical data with varying CoV.

1. Introduction

In civil engineering practice, uncertain conditions are common and complicate design because of factors such as material heterogeneity, characterisation methods, construction practices and other environmental factors (Phoon and Kulhawy, 1999; Baecher and Christian, 2005; IRC SP-11, 2012; Zbiciak et al., 2025). Particularly in the geotechnical domain, the stochastic nature of geomaterial formation, in-situ stress history, long-term exposure to various climatic and environmental conditions, and weathering result in inherent heterogeneity and spatial variability (Phoon and Kulhawy, 1999; Fenton and Griffiths, 2008; Cherubini and Vessia, 2010; Wang et al., 2015; Liu et al., 2024). Furthermore, data collection introduces additional challenges, including sampling bias, limited accessibility and variations in testing methods, which compound measurement errors and data inconsistencies (Akbas and Kulhawy, 2010; Zhao et al., 2013; Li et al., 2021; Kumar et al., 2023). Therefore, the growing interest in accounting for this variability necessitates the application of statistical methods in geotechnical design to accurately represent soil behaviour, quality control levels and performance assessment (Fenton and Griffiths, 2008; Phoon and Ching, 2018; Harle and Wankhade, 2025).

Fitting an appropriate probability model (using inferential statistics) is one way to model the uncertainty in the parameter of interest. For instance, normal and lognormal distributions are commonly used in science and engineering practice due to their ability to represent a wide range of inherent variability, making them easy to implement for practical applications (Theocharis et al., 2024). In the context of geotechnical practice, the index and engineering properties like Atterberg limits, compaction characteristics, compressibility parameters, hydraulic conductivity, degree of saturation, shear strength parameter and so on, are often fitted to either normal or lognormal distribution (Elkateb et al., 2003; Baecher and Christian, 2005; Zhai and Benson, 2006; Wang et al., 2015; Galeandro et al., 2017; Toraldo et al., 2018; Nguyen et al., 2023; Theocharis et al., 2024; Ganesh and Kumar, 2025). Several reliability-based approaches assume that geotechnical parameters such as shear strength, consolidation characteristics and permeability follow a normal or lognormal distribution (Phoon et al., 1995; Nowak and Collins, 2000; Baecher and Christian, 2005; Fenton and Griffiths, 2008; Saseendran and Dodagoudar, 2020). However, an incorrect selection of the underlying probability model can distort reliability estimates, leading to either underestimation or overestimation of the probability of failure.

Furthermore, the statistical techniques commonly employed in material quality control include parametric tests such as one-sample t-tests, two-sample t-tests, analysis of variance (ANOVA), the F-test of homogeneity of variance, and regression and correlation, which are routinely applied in practice (Burati et al., 2003; IRC SP-11, 2012; Ganesh and Kumar, 2024). A fundamental assumption underlying these parametric tests is that the sampled data originate from a normally (Gaussian) distributed population (Montgomery and Runger, 2010; Ott and Longnecker, 2015). However, recent observations by Phoon et al. (2023) highlight growing concerns among field practitioners, who point out that many statistical models assume that geotechnical data are homogeneous, independent, sufficient and normally distributed, which makes the analysis unrealistic. This gap between the statistical assumptions and field behaviour underscores the importance of conducting appropriate goodness-of-fit (GoF) tests.

Normality assessment methods can be broadly categorised into graphical approaches, numerical measures and formal statistical tests (D’Agostino and Stephens, 1986; Thode, 2002; Henderson, 2006; Razali and Wah, 2010). In recent years, researchers have also explored the use of machine learning algorithms as complementary tools in assessing the distributional characteristics of the data. Studies by Sigut et al. (2006), Lee et al. (2019) and Simić (2021) demonstrated that neural network and deep learning models can effectively identify non-normal patterns in complex data sets. However, there remains a lack of comprehensive studies that explicitly examine the normality characteristics of laboratory/field-derived geotechnical data. Given the proven reliability and widespread adoption of formal statistical tests, this study focuses on the application of the most frequently cited and commonly adopted GoF tests within the geotechnical domain, as discussed in detail in the following paragraph.

The formal statistical tests (also known as GoF tests) can be classified into different categories based on their underlying statistical principles:

empirical distribution tests (Anderson–Darling (A-D) test, Kolmogorov–Smirnov (K-S) test, Lilliefors (LL) test, Cramér–von Mises (CVM) test, etc.);
regression and correlation-based tests (Shapiro–Wilk (S-W) test, Ryan–Joiner test, etc.);
chi-square (CSQ) test; and
moment-based tests (Jarque–Bera (J-B) test, D’Agostino and Pearson (DP) omnibus test, etc.).

Among these tests, a few have been used most frequently to identify the appropriate probability model for laboratory and in-situ geotechnical data. For example, Benson (1993) used the K-S test to determine the best-fitting probability model for a data set comprising the permeability of remoulded soil samples, with CoV values ranging between 27% and 767%. Previous studies by Haldar and Mahadevan (2000), Burati et al. (2003), Baecher and Christian (2005), Ang and Tang (2007), Fenton and Griffiths (2008), Galeandro et al. (2017) and Theocharis et al. (2024) identified the CSQ, K-S and A-D tests as the most frequently used methods for assessing the normality of geotechnical data sets. Similarly, Phoon and Kulhawy (1999), Baecher and Christian (2005), Akbas and Kulhawy (2010), Galeandro et al. (2017), Toraldo et al. (2018), Nguyen et al. (2023) and Theocharis et al. (2024) documented CoV values for geotechnical properties ranging from 2% to 105%, except for permeability, which may exhibit high CoV values (200%–300%).

Furthermore, Uzielli et al. (2007) recommended the S-W test over the K-S test for geotechnical applications, citing Thode’s (2002) findings, which were not explicitly focused on geotechnical properties. More recently, Somani et al. (2019) employed the S-W test to evaluate the normality of soil-like fractions reclaimed from landfill mining. However, Ganesh and Kumar (2025) applied the K-S test to assess whether the index and engineering properties of processed geomaterials (pond ash-bentonite mixtures) followed a normal or lognormal distribution. Although various GoF tests, such as K-S, CSQ, S-W and A-D, are commonly applied, the efficacy of these tests for geotechnical data has not been addressed or is unknown.

From a general statistical perspective, Ramachandran and Tsokos (2009), Montgomery and Runger (2010) and Ott and Longnecker (2015) suggested that a sample size of 30 or more is typically sufficient to approximate the normality of the sample mean distribution, in accordance with the central limit theorem. However, they also reported that this threshold is not fixed, because it is derived from extensive simulation studies and may vary depending on the degree of skewness of the data. In addition, the normality of the data should be examined carefully when the sample size is ≤30. For instance, Tang et al. (2017) found that the minimum sample size required to identify the most suitable probability model for geotechnical data lies in the range of 54–458, using Akaike Information Criterion scores (one of the techniques for assessing distribution fit). Their analysis considered data with CoV varying from 10% to 30%. However, in geotechnical engineering practice, constraints on obtaining more than 30 samples are not uncommon (Wang et al., 2015; Tang et al., 2017).

Earlier studies by Yazici and Yolacan (2007) and Yap and Sim (2011) optimised the sample size for assessing the performance of selected GoF tests using power analysis. However, these investigations primarily relied on synthetic data and did not account for the actual variability observed in geotechnical properties. Therefore, the present study aims to evaluate the efficacy of commonly used GoF tests in assessing the normality of geotechnical data sets with diverse statistical characteristics (CoV = 6%, 7%, 12%, 15%, 17%, 19%, 29%, 37%, 94% and 196%). To ensure accurate statistical decisions, sample size optimisation is performed using power analysis, allowing for an effective comparison of the performance and power of each GoF test. Furthermore, to evaluate the test sensitivity under controlled deviations from normality, skewness is artificially induced into the original data using a nonlinear transformation. Based on the findings, a classification framework is proposed to recommend the minimum sample size requirement as a function of the CoV, thereby improving the practical applicability of normality testing in geotechnical engineering.

2. Methodology

2.1 Experimentally derived geotechnical data

This study evaluates multiple data sets encompassing natural geomaterials, soil-like fractions from legacy waste deposits, and processed geomaterials to assess their normality, as summarised in Table 1. Natural geomaterial samples were collected from depths of 0.6–1.0 m at highway construction sites located in the Karaikal and Nagapattinam regions of Puducherry and Tamil Nadu, India. Fifty field samples were collected from natural ground and classified as Silty Sand (SM) and Clayey Sand (SC) based on the Unified Soil Classification System (USCS), as reported in Table 2. The key geotechnical properties evaluated from these samples included the liquid limit (data set 1), plastic limit (data set 2), compaction characteristics (data sets 3 and 4) and California Bearing Ratio (CBR) (data set 5), following the guidelines of ASTM D1557 (2015), ASTM D698 (2021), ASTM D4318 (2017) and ASTM D1883 (1999).

Table 1

Summary of experimentally derived geotechnical properties evaluated for normality

S. no.	Natural geomaterial					Legacy waste	Pond ash-bentonite mixture
S. no.	Liquid limit (%)	Plastic limit (%)	Dry density (kN/m³)	Moisture content (%)	CBR (%)	% of fines (< 75 µm)	Dry density (Mg/m³)	Moisture content (%)	UCS (kPa)	Permeability (m/s)
1	48	23	15.18	8.0	6.10	5	1.62	19.54	37.7	3.50E-07
2	39	16	16.06	9.0	20.0	6	1.65	21.62	29.7	3.70E-07
3	24	14	16.06	9.0	20.6	9	1.67	18.52	48.1	5.80E-07
4	25	16	16.06	10.0	8.10	9	1.67	19.56	95.2	4.10E-08
5	26	20	16.2	10.0	14.4	9	1.69	19.85	108.5	2.60E-08
6	34	19	16.2	10.0	10.9	9	1.71	19.47	123.3	4.00E-08
7	52	24	16.2	10.5	4.20	10	1.71	19.22	182.6	4.30E-09
8	42	21	16.35	11.0	6.00	11	1.72	16.89	165.1	1.20E-09
9	50	18	16.35	11.0	20.9	11	1.72	19.35	136.7	7.20E-09
10	43	16	16.57	11.0	14.3	12	1.73	19.87	73.5	2.60E-07
11	30	18	16.64	11.0	17.1	12	1.74	16.42	55.3	1.00E-07
12	34	18	16.64	11.0	13.6	13	1.75	19.58	49.9	8.70E-08
13	38	17	17.07	11.0	12.1	13	1.75	18.93	137.6	9.70E-09
14	34	18	17.07	11.0	12.9	13	1.77	17.75	145	1.90E-08
15	40	17	17.22	11.5	13.8	13	1.77	17.79	169.6	7.50E-09
16	39	20	17.22	11.5	4.10	14	1.77	18.51	228.2	2.50E-09
17	37	18	17.37	12.0	10.8	14	1.77	19.85	204.1	1.40E-09
18	42	17	17.51	12.0	13.0	14	1.79	18.06	263.7	4.20E-10
19	32	17	17.51	12.0	8.90	14	1.8	14.27	90.5	1.10E-07
20	34	17	17.51	12.0	13.9	14	1.82	17.99	77.5	5.40E-08
21	30	18	17.51	12.0	4.30	14	1.82	15.97	106.3	7.20E-08
22	25	17	17.51	12.0	13.8	14	1.82	14.88	235.1	4.40E-09
23	35	17	17.66	12.0	17.1	14	1.83	15.52	230.2	2.50E-09
24	43	16	17.66	12.5	13.3	15	1.84	14.21	284.2	7.60E-09
25	38	21	17.66	12.5	18.7	15	1.84	13.87	367.8	6.40E-10
26	31	19	17.8	12.5	19.5	15	1.9	13.87	376	3.40E-10
27	22	16	17.8	12.5	6.10	15	1.92	15.25	475.2	2.60E-10
28	30	15	17.8	12.5	20.3	15	1.94	12.61	168.8	8.40E-08
29	40	16	18.09	12.5	14.8	16	1.94	12.21	106.8	5.40E-08
30	28	15	18.09	12.5	18.3	16	1.94	12.66	139.6	2.00E-08
31	41	17	18.24	12.5	17.8	16	1.94	12.79	425.9	2.20E-09
32	41	12	18.24	13.0	13.4	16	1.95	14.34	385.4	4.80E-09
33	33	13	18.38	13.0	13.3	16	1.97	13.23	362.6	1.70E-09
34	30	13	18.38	13.0	14.1	16	1.97	12.70	881.6	2.30E-10
35	30	19	18.53	13.0	14.8	17	1.98	14.03	974.9	1.10E-10
36	37	15	18.53	13.0	33.8	18	1.99	13.68	786.7	3.00E-10
37	20	16	18.53	13.0	–	18	–	–	–	–
38	22	18	18.53	13.0	–	20	–	–	–	–
39	24	21	18.82	13.0	–	21	–	–	–	–
40	32	–	18.82	13.0	–	21	–	–	–	–
41	43	–	19.11	13.0	–	22	–	–	–	–
42	45	–	19.11	13.0	–	22	–	–	–	–
43	42	–	19.11	13.0	–	23	–	–	–	–
44	41	–	19.26	13.0	–	24	–	–	–	–
45	–	–	19.26	13.5	–	–	–	–	–	–
46	–	–	19.26	13.5	–	–	–	–	–	–
47	–	–	19.26	14.0	–	–	–	–	–	–
48	–	–	20.13	14.5	–	–	–	–	–	–
49	–	–	21.15	15.0	–	–	–	–	–	–
50	–	–	21.88	15.5	–	–	–	–	–	–

Note(s):

CBR = California Bearing Ratio; UCS = unconfined compressive strength

Source(s): Table by authors

Table 2

Summary of descriptive statistics of the geotechnical data used for normality testing

Description	Properties	Soil type	Sample size (n)	Average $(\bar{x})$	SD	CoV (%)	Skewness (b₁)	Kurtosis (b₂)
Data set 1	Liquid limit (%)	SM, SC	44	35.1	7.83	22.3	−0.010	−0.613
Data set 2	Plastic limit (%)	SM, SC	39	17.4	2.56	14.7	0.388	0.596
Data set 3	Dry density (Mg/m³)	SM, SC	50	1.821	0.14	7.4	0.605	0.896
Data set 4	Moisture content (%)	SM, SC	50	12.1	1.47	12.2	−0.490	0.802
Data set 5	California bearing ratio (%)	SM, SC	35^#	13.3	4.91	37.0	−0.378	−0.613
Data set 6	Fines (%)	SM*	44	14.6	4.28	29.2	0.147	0.141
Data set 7	Dry density (Mg/m³)	PA-Bt mixtures	36	1.811	0.11	5.9	0.192	−1.123
Data set 8	Moisture content (%)	PA-Bt mixtures	36	16.5	2.78	16.8	−0.034	−1.454
Data set 9	Unconfined compressive strength (kPa)	PA-Bt mixtures	36	242.5	228.71	94.3	1.954	3.665
Data set 10	Permeability (m/s)	PA-Bt mixtures	36	6.4E-08	1.27E-07	196.8	2.814	8.098

Note(s):

*Soil-like fraction from legacy waste was classified as equivalent to SM; PA-Bt = pond ash-bentonite; ^# Sample size after removing the outlier

Source(s): Table by authors

Soil-like fractions were collected from a municipal solid waste open dumpsite in Karaikal, Puducherry. Forty-four samples collected from depths ranging from 1.0 to 4.0 m were analysed for particle size distribution, focusing on the percentage of fines (silt and clay-sized fractions) passing the 75 µm sieve. Based on the USCS classification (ASTM D2487, 2017), the soil-like fraction was identified as equivalent to Silty Sand (SM). The fines content (%) was designated as data set 6.

The processed geomaterial was prepared by blending pond ash with bentonite at inclusion rates of 10%, 20% and 30%. It was then assessed for its suitability in landfill liner and cover system applications. This data set is reported as a laboratory-derived geotechnical data set. Pond ash was sourced from the ash discharge point at the Neyveli Thermal Power Station, Tamil Nadu, and the bentonite was obtained from a commercial supplier. A total of 36 placement conditions, defined by variations in dry density and moisture content, were designated as data sets 7 and 8, respectively. In addition, the unconfined compressive strength (ASTM D2166, 2016) and permeability (ASTM D5084, 2016) of the processed geomaterials were determined and designated as data sets 9 and 10, respectively. The geotechnical properties determined from laboratory testing were subsequently used for statistical normality analysis.

2.2 Selected goodness-of-fit test for normality evaluation

Among the GoF tests, this study uses the K-S test with Lilliefors’ correction, commonly referred to as the LL test, because the classical K-S test assumes that the mean and variance are known before the analysis, whereas in geotechnical applications these parameters are invariably estimated from the sample data. Alongside the LL test, the analyses include the A-D, CSQ and S-W tests, which are most commonly used for normality assessment of geotechnical data sets. The present study further incorporates the J-B, DP and CVM tests, which involve relatively straightforward computational procedures. The selected normality tests, their test statistics and their respective rejection criteria are summarised in Table 3. The LL test measures the maximum deviation between the empirical cumulative distribution function of the sample and the hypothesised cumulative distribution function, whose parameters are estimated from the sample data. The test is known to be less sensitive in the tails. The A-D test addresses this limitation by assigning greater weight to tail discrepancies (Stephens, 1974). The CVM test offers a balanced approach, integrating squared differences across the entire data range.
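The LL statistic described above can be illustrated with a short sketch. This is a Python illustration using only the standard library, not the MATLAB lillietest routine used in the study; the function name lilliefors_statistic and the two example samples are hypothetical. It computes only the statistic D_n with the mean and standard deviation estimated from the sample; comparison with a tabulated critical value is not shown.

```python
import math
import statistics


def lilliefors_statistic(data):
    """Return D_n = max |F_x(x) - S_n(x)| with the normal CDF fitted to the sample."""
    x = sorted(data)
    n = len(x)
    # Parameters are estimated from the sample (the Lilliefors setting).
    fitted = statistics.NormalDist(statistics.mean(x), statistics.stdev(x))
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = fitted.cdf(xi)
        # The empirical CDF jumps from (i - 1)/n to i/n at x_(i); check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical example samples: evenly spaced standard-normal quantiles,
# and the same quantiles exponentiated to give a right-skewed sample.
std_normal = statistics.NormalDist()
symmetric = [std_normal.inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
skewed = [math.exp(z) for z in symmetric]

d_symmetric = lilliefors_statistic(symmetric)
d_skewed = lilliefors_statistic(skewed)
print(f"D_n (symmetric) = {d_symmetric:.3f}, D_n (skewed) = {d_skewed:.3f}")
```

The skewed sample yields a markedly larger D_n than the symmetric one, which is the behaviour the rejection criterion relies on.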

Table 3

Overview of selected goodness-of-fit tests, associated test statistics and corresponding rejection criteria

S. no.	Test of normality	Test statistic	Criteria for rejection
1	Lilliefors test (Lilliefors, 1967)	$D_{n} = \max |F_{x}(x) - S_{n}(x)|$; $S_{n}$ = cumulative frequency function; $F_{x}$ = assumed theoretical CDF; $D_{n}$ = maximum difference; $D_{n}^{\alpha}$ = critical value at significance level α	If $D_{n} < D_{n}^{\alpha}$, the proposed theoretical distribution is acceptable at the specified significance level α
2	J-B test (Jarque and Bera, 1987)	$JB = n \left( \frac{(\sqrt{b_{1}})^{2}}{6} + \frac{(b_{2} - 3)^{2}}{24} \right)$; $\sqrt{b_{1}}$ = skewness; $b_{2}$ = kurtosis	The test statistic is compared with 5.9915
3	A-D test (Anderson and Darling, 1954)	$A^{2} = -\sum_{i=1}^{n} \frac{(2i - 1) \{ \ln F_{x}(x_{(i)}) + \ln [1 - F_{x}(x_{(n+1-i)})] \}}{n} - n$; $A^{*} = A^{2} \left( 1.0 + \frac{0.75}{n} + \frac{2.25}{n^{2}} \right)$; $c_{\alpha} = a_{\alpha} \left( 1 + \frac{b_{0}}{n} + \frac{b_{1}}{n^{2}} \right)$; n = number of samples; $a_{\alpha}$, $b_{0}$, $b_{1}$ are constants*	The test statistic is compared with the related critical value
4	DP test (D’Agostino and Pearson, 1973)	$DP = z^{2}(\sqrt{b_{1}}) + z^{2}(b_{2})$; $\sqrt{b_{1}}$ = skewness; $b_{2}$ = kurtosis	The p-value of the DP statistic is compared with the significance level (α)
5	CSQ test (Pearson, 1900)	$CSQ = \sum_{j=1}^{c} \frac{(O_{j} - E_{j})^{2}}{E_{j}}$; $O_{j}$ = observed values; $E_{j}$ = expected values	The CSQ value is compared with 5.991
6	CVM test (Cramer, 1928; Smirnov, 1936; Stephens, 1974)	$CVM = \frac{1}{12n} + \sum_{i=1}^{n} \left( z_{i} - \frac{2i - 1}{2n} \right)^{2}$; $Z_{i} = \frac{X_{(i)} - \bar{X}}{S}$ is the standardised observation; $\bar{X}$ is the sample mean	The calculated CVM statistic is compared with the critical value at the chosen significance level
7	S-W test (Shapiro and Wilk, 1965)	$b = a_{1}(x_{n} - x_{1}) + a_{2}(x_{n-1} - x_{2}) + \dots$; $W = \frac{b^{2}}{(n - 1) S^{2}}$; $a_{1}, a_{2}, \dots$ are coefficients based on sample size; S = standard deviation of the sample data	The calculated W is compared with the tabulated W

Source(s): Table by authors

Moment-based tests, including the J-B test (Jarque and Bera, 1987) and the DP omnibus test (D’Agostino and Pearson, 1973), assess the distributional characteristics based on measures of skewness and kurtosis. In addition, the CSQ test (Pearson, 1900), a classical frequency-based method, was also employed. The S-W test (Shapiro and Wilk, 1965), which leverages order statistics and variance ratios, was included due to its strong performance in small-sample analyses. This comparative evaluation aims to provide insights into the performance of these GoF tests under different probabilistic models, thereby facilitating the selection of an appropriate test for geotechnical engineering applications.
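As an illustration of a moment-based test, the following Python sketch evaluates the J-B statistic from Table 3 for the unconfined compressive strength values listed in Table 1 (data set 9). It is not the MATLAB jbtest routine used in the study; the function name jarque_bera is hypothetical, and the skewness and kurtosis are assumed here to be the simple (uncorrected) moment ratios, so they may differ slightly from the values reported in Table 2 if those follow a bias-corrected convention.

```python
import statistics

# UCS values (kPa) for the pond ash-bentonite mixtures, copied from Table 1.
ucs = [37.7, 29.7, 48.1, 95.2, 108.5, 123.3, 182.6, 165.1, 136.7, 73.5,
       55.3, 49.9, 137.6, 145, 169.6, 228.2, 204.1, 263.7, 90.5, 77.5,
       106.3, 235.1, 230.2, 284.2, 367.8, 376, 475.2, 168.8, 106.8, 139.6,
       425.9, 385.4, 362.6, 881.6, 974.9, 786.7]


def jarque_bera(x):
    """Return (skewness, kurtosis, JB) using uncorrected central moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5          # sqrt(b1) in Table 3
    kurt = m4 / m2 ** 2            # b2 in Table 3 (equals 3 for a normal distribution)
    jb = n * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)
    return skew, kurt, jb


skew, kurt, jb = jarque_bera(ucs)
ucs_mean = statistics.mean(ucs)
reject_normality = jb > 5.9915     # critical value quoted in Table 3
print(f"mean = {ucs_mean:.1f} kPa, skewness = {skew:.2f}, kurtosis = {kurt:.2f}, JB = {jb:.1f}")
```

For this strongly right-skewed data set, the statistic exceeds the 5.9915 threshold, so the sketch rejects normality at α = 0.05.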

2.3 Protocol to assess the efficacy of the selected GoF tests using power analysis

The power analysis was conducted through extensive simulations in MATLAB to evaluate the effectiveness of the selected GoF tests. The Lilliefors (lillietest function), A-D (adtest function), chi-square goodness-of-fit (chi2gof function) and Jarque–Bera (jbtest function) tests are built-in MATLAB functions and can be used directly for the analysis. However, the Shapiro–Wilk (swtest function), Cramér–von Mises (cmtest function) and D’Agostino–Pearson (DagosPtest function) tests were sourced from the MATLAB File Exchange (BenSaïda, 2025a, 2025b; Trujillo-Ortiz, 2025).

The statistical power of a test is defined as the probability of correctly rejecting the null hypothesis (H₀: the distribution is normal) when the alternative hypothesis (H_A: the distribution is not normal) is true. Mathematically, it is expressed as Power = 1 − β, where β is the probability of a Type II error (failing to reject H₀ when H_A is true). A high power therefore corresponds to a low Type II error rate and supports reliable statistical inference.

In this analysis, Monte Carlo simulations were used to increase the sample size by generating random realisations of synthetic data based on the moments and marginal distribution of the original sample data set. Key simulation parameters, including the significance level, number of iterations and sample sizes, were then defined. The simulations were conducted for sample sizes ranging from 5 to 10,000 in user-defined increments of 5. A significance level (α) of 0.05 was selected for hypothesis testing across the simulations.
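The data-generation step can be sketched as follows. The text does not state which marginal distribution was fitted to each data set, so this Python sketch assumes a lognormal marginal purely for illustration and matches it to a target mean and CoV through the standard relations σ_ln² = ln(1 + CoV²) and μ_ln = ln(mean) − σ_ln²/2. The function names are hypothetical; the mean of 242.5 kPa and CoV of 94.3% are those of data set 9 in Table 2.

```python
import math
import random
import statistics


def lognormal_params(mean, cov):
    """Lognormal parameters (mu_ln, sigma_ln) matching a target mean and CoV (as a fraction)."""
    sigma2 = math.log(1.0 + cov ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)


def surrogate_sample(mean, cov, n, rng):
    """Draw n synthetic observations from the assumed lognormal marginal."""
    mu, sigma = lognormal_params(mean, cov)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


# Sample-size grid described in the text: 5 to 10,000 in steps of 5.
sample_sizes = range(5, 10001, 5)

rng = random.Random(1)  # fixed seed so the sketch is reproducible
synthetic = surrogate_sample(mean=242.5, cov=0.943, n=20000, rng=rng)
synthetic_mean = statistics.mean(synthetic)
synthetic_cov = statistics.stdev(synthetic) / synthetic_mean
print(f"mean = {synthetic_mean:.1f}, CoV = {100 * synthetic_cov:.1f}%")
```

With a large synthetic sample, the recovered mean and CoV should be close to the targets, which is a quick check that the moment matching is set up correctly.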

The power of the selected GoF tests was then calculated following the protocol described in Steele (2003) and Arnastauskaite et al. (2021):
Step 1: Collection of sample data – the analysis begins with an experimentally obtained data set consisting of observations x₁, x₂,…, x_n.
Step 2: Statistical parameter estimation – sample moments and the corresponding parameters of the fitted marginal distributions are computed from the collected data.
Step 3: Monte Carlo data generation – using these estimated parameters, surrogate data sets are generated under the specified alternative distribution through Monte Carlo simulation.
Step 4: Applying the hypothesis test – the test statistic and its p-value are computed for each surrogate data set. If the p-value exceeds the chosen significance level (α = 0.05), the null hypothesis (H₀: the distribution is normal) is not rejected; otherwise, H₀ is rejected.
Step 5: Simulation and repetition – steps 2–4 are repeated for k iterations (in this case, k = 10,000) to ensure a robust evaluation of the test performance.
Step 6: Calculating the power of the test – the power (1 − β) is determined as count/k, where count represents the number of times the null hypothesis (H₀) is correctly rejected under the assumed alternative distribution across the k iterations.
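The protocol above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the alternative distribution is assumed to be lognormal with a hypothetical σ_ln = 0.8 (a CoV of roughly 95%), only the J-B test is applied, using the 5.9915 threshold from Table 3 instead of a p-value, and k is reduced to 500 iterations to keep the run short. All function names are hypothetical.

```python
import random

JB_CRITICAL = 5.9915  # threshold quoted in Table 3 for alpha = 0.05


def jb_statistic(x):
    """J-B statistic from uncorrected sample skewness and kurtosis."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return n * ((m3 / m2 ** 1.5) ** 2 / 6 + (m4 / m2 ** 2 - 3) ** 2 / 24)


def lognormal_sampler(n, rng, mu=0.0, sigma=0.8):
    """Step 3: draw one surrogate data set under the assumed alternative."""
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def estimate_power(sampler, n, k, rng):
    """Steps 3 to 6: power = (number of rejections of H0) / k."""
    count = 0
    for _ in range(k):
        sample = sampler(n, rng)
        if jb_statistic(sample) > JB_CRITICAL:  # Step 4: H0 rejected
            count += 1
    return count / k


rng = random.Random(42)  # fixed seed for reproducibility
power_by_n = {n: estimate_power(lognormal_sampler, n, k=500, rng=rng)
              for n in (10, 25, 100)}
for n, power in power_by_n.items():
    print(f"n = {n:4d}: estimated power = {power:.2f}")
```

Scanning the sample-size grid with such a loop and recording the smallest n at which the estimated power reaches the target (0.8 in this study) gives the minimum required sample size for the chosen test and alternative.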

For each realisation, seven different normality tests were applied: the S-W test, LL test, A-D test, CVM test, CSQ test, J-B test and DP omnibus test. After computing the power values, the performance of each test was evaluated using its behaviour at the maximum sample size to establish its comparative effectiveness. A higher power indicates a stronger ability to identify deviations from normality, making the corresponding test more reliable for assessing data normality.

3. Results and discussion

3.1 Index and engineering properties of natural/processed geomaterial assessed for normality

The normality assessment for the geotechnical properties, summarised in Table 1, was conducted with a focus on their application in highway subgrade construction and in landfill liner and cover materials. Specifically, data sets 1 to 5 were evaluated for their suitability as highway subgrade material with a target CBR value of at least 10%. Data set 6 was assessed to evaluate the suitability of soil-like fractions as backfill material, considering the percentage of fines. Data sets 7 to 10 were analysed for their suitability as liner and cover material, targeting an unconfined compressive strength of ≥ 200 kPa and a permeability (k) of ≤ 10⁻⁷ cm/s to ensure effective containment performance. Conducting parametric statistical tests on these physical and mechanical properties provides valuable insight into the feasibility of utilising these materials in subgrade and landfill containment systems while addressing inherent variability and uncertainty.

3.2 Descriptive statistics of the geotechnical data used for the analysis

Table 2 provides a comprehensive summary of the descriptive statistics of the geotechnical data analysed for normality assessment. Each data set was first organised, and outliers were identified and removed using the statistical hypothesis test proposed by Grubbs (1969) to ensure data consistency and unbiasedness. Based on the result of Grubbs’ two-tailed test (significance level, α = 0.01), only data set 5 (CBR) contained an outlier (CBR = 33.8%). This data point deviated significantly from the sample mean and was excluded before performing the normality and power analyses. The variability in the data, assessed using the coefficient of variation (CoV), ranged from 6% to 196%. According to Harr’s (1987) classification, the variability is categorised based on the sample CoV. Specifically, data sets 3, 4 and 7 exhibit low variability (CoV < 10%), while data sets 1, 2, 6 and 8 display moderate variability (15% < CoV < 30%). In contrast, CBR (data set 5), UCS (data set 9) and permeability (data set 10) demonstrate high variability (CoV > 30%). High CoV values are common in the geotechnical domain (Benson, 1993; Baecher and Christian, 2005).

In addition, Table 2 reports the sample averages, standard deviations, skewness and kurtosis values, which provide qualitative insights into the data distribution characteristics. The skewness coefficient indicates asymmetry in the data distribution: data sets 1, 4 and 8 exhibit negative skewness, indicating a left-skewed distribution, while data sets 2, 3, 5, 6, 7, 9 and 10 display positive skewness, reflecting a right-skewed distribution. Furthermore, the kurtosis analysis, which measures the sharpness of the distribution, reveals that data sets 1 to 8 exhibit a flat, short-tailed distribution (platykurtic, kurtosis < 3). In contrast, data sets 9 and 10 are characterised by sharp peaks and long tails (leptokurtic, kurtosis > 3). Ideally, a data set with skewness near zero and a kurtosis value of approximately three indicates a symmetrical normal distribution (Newell and Hancock, 1984; Thode, 2002). Although the skewness and kurtosis (third and fourth standardised central moments, respectively) serve as functional parameters for assessing normality, relying solely on these metrics may lead to misleading conclusions. Hence, to ensure robustness in the statistical decisions on the normality assessment, the theory-driven normality tests (GoF) discussed in Section 2.2 were applied.

3.3 Normality outcomes for original data sets and necessity of conducting power analysis

Table 4 presents a summary of the hypothesis decisions by various GoF tests in assessing the normality of the original data sets, based on the analytical equations provided in Table 3. The assessment of normality was conducted by comparing the test statistics against the corresponding critical values. Accordingly, both the LL and S-W tests rejected the null hypothesis of normality for data sets 4, 5, 9 and 10. In contrast, the moment-based J-B test failed to reject normality for data sets 1 to 8 and rejected it only for data sets 9 and 10. In addition, the CSQ test rejected the null hypothesis for data sets 2, 4, 5, 7, 8 and 10. However, the CVM test failed to reject the null hypothesis for data sets 1 to 9, indicating variations in sensitivity across different normality tests. A consistent finding across most of the GoF tests was the rejection of normality for data sets 9 and 10.

Table 4

Summary of hypothesis inference on the normality tests conducted on original and skewed data sets using selected GoF tests

Data set	LL	A-D	CSQ	J-B	DP	CVM	S-W
Original data set
1	✓	✓	✓	✓	✓	✓	✓
2	✗	✓	✗	✓	✓	✓	✓
3	✓	✓	✓	✓	✓	✓	✓
4	✗	✗	✗	✓	✓	✓	✗
5	✗	✓	✗	✓	✓	✓	✗
6	✗	✓	✓	✓	✓	✓	✓
7	✓	✗	✗	✓	✓	✓	✓
8	✓	✗	✗	✓	✗	✓	✗
9	✗	✗	✓	✗	✗	✓	✗
10	✗	✗	✗	✗	✗	✗	✗
Skewed data set (k = 2)
1	✓	✓	✓	✓	✓	✓	✓
2	✗	✗	✓	✗	✓	✓	✗
3	✓	✓	✓	✗	✗	✓	✗
4	✗	✗	✗	✓	✓	✓	✓
5	✗	✓	✗	✓	✓	✓	✗
6	✗	✗	✓	✗	✓	✓	✗
7	✓	✗	✗	✓	✓	✓	✗
8	✗	✗	✗	✓	✓	✓	✗
9	✗	✗	✗	✗	✗	✗	✗
10	✗	✗	✗	✗	✗	✗	✗
Skewed data set (k = 3)
1	✓	✗	✓	✗	✓	✓	✗
2	✗	✗	✓	✗	✗	✓	✗
3	✓	✗	✓	✗	✗	✓	✗
4	✗	✗	✗	✓	✓	✓	✗
5	✗	✗	✗	✓	✓	✓	✗
6	✗	✗	✗	✗	✗	✗	✗
7	✓	✗	✗	✓	✓	✓	✗
8	✗	✗	✗	✓	✓	✓	✗
9	✗	✗	✗	✗	✗	✗	✗
10	✗	✗	✗	✗	✗	✗	✗

Note(s):

✓ Fails to reject H₀ (Normality assumption accepted);

✗ Rejecting H₀ (Normality assumption rejected)

Source(s): Table by authors

Notably, these data sets (9 and 10) exhibit high CoV of 94% and 196%, respectively, and are characterised by long-tailed distributions. Since the lognormal distribution is more commonly used in geotechnical engineering, Monte Carlo simulation studies were conducted to further investigate the boundary conditions under which data conform to a lognormal distribution. To achieve this, a random variable was generated synthetically while systematically varying CoV and kurtosis, keeping skewness constant (zero). The generated data sets were then subjected to GoF tests to determine whether they adhered to a normal (Gaussian) or lognormal distribution. Figure 1 visually represents the progressive shift in data distribution from normal to lognormal as variability increases. The analysis revealed a critical threshold: when a data set exhibits a CoV greater than 30% with a kurtosis value exceeding 3, the lognormal distribution provides a superior fit. These findings reinforce the importance of considering CoV and kurtosis when selecting an appropriate probability distribution for geotechnical data.

Figure 1

Scatter plot comparing the coefficient of kurtosis against the coefficient of variation, with data points categorized and fitted lines indicating lognormal and normal distributions.

The scatter plot displays the relationship between the coefficient of kurtosis, ranging from zero to ten on the vertical axis, and the coefficient of variation, which extends from zero to seventy percent on the horizontal axis. The data points are represented by various symbols that may indicate different categories or groups. Two distinct regions are highlighted with red dashed boxes, one indicating data fitted to a lognormal distribution and another showing data fitted to a normal distribution. The plot includes annotations pointing to these fitted distributions for clarity, without interpreting the data itself. The axes are labelled clearly, and the increments on the horizontal axis are marked to facilitate understanding of the data spread.

Effect of the coefficient of variation and kurtosis value on fitting the data to the lognormal distribution

Source: Figure by authors

Since the data sets analysed varied in sample size, and different GoF tests use different statistical measures, drawing a definitive conclusion on the superior performance of any particular GoF test is challenging. Generally, the assumption of normality becomes less critical for sample sizes n ≥ 30 (Uttley, 2019), as supported by the central limit theorem. Furthermore, there are reported cases where small-sample data sets pass normality tests regardless of whether they are genuinely drawn from a normally distributed population. Such an outcome is not necessarily meaningful, as it often results from the test’s lack of statistical power (Öztuna et al., 2006; Ghasemi and Zahediasl, 2012). For small sample sizes, normality tests often lack the sensitivity needed to detect deviations from normality, increasing the likelihood of Type II errors (false acceptance of normality). Consequently, without conducting a power analysis, the appropriate sample size remains uncertain, which may lead to biased conclusions regarding normality.

3.4 Identification of best-fit distributions and selection of surrogate models for power analysis

Before conducting the power analysis through Monte Carlo simulations, it is necessary to define the alternative distribution corresponding to the alternative hypothesis. Accordingly, each data set was evaluated against several theoretical distributions: Normal (N), Lognormal (LN), Weibull (WB), Gamma (GA), Generalised Extreme Value (GEV), Gumbel (GB) and Exponential (Exp), as presented in Figure 2. The GoF for each distribution was quantified using mean p-values derived from the seven selected GoF tests. The test results revealed that data sets 1, 4 and 10 were best fitted by the WB distribution, data sets 3, 7 and 9 by the LN distribution, data sets 5, 6 and 8 by the GEV distribution, and data set 2 by the GA distribution. The GB and Exp distributions were found to be the least suitable for the selected data sets. Notably, the analysis indicated that data sets with higher CoV were well fitted by the LN, WB, GEV and GA distributions. Consequently, these four models were adopted as surrogate models for the subsequent power analysis to evaluate the performance of the GoF tests and determine the sample size.

Figure 2

A bar graph displays the mean p-value of various best fit distributions across ten datasets, with each distribution represented by different colored bars.

The image illustrates a bar graph depicting the mean p value for different best fit distributions across ten datasets. The vertical axis is labelled MEAN P VALUE, ranging from zero to one, while the horizontal axis is marked with Dataset 1 through Dataset 10. Each dataset displays multiple coloured bars, representing different types of distributions, Normal, grey, Lognormal, green, Weibull, blue, Gamma, red, Generalized Extreme Value, gold, Gumbel, orange, and Exponential, dark grey. Each distribution type has an associated legend in the top right corner for identification. The bars are arranged to highlight the comparisons across the datasets without presenting any trends, making it clear which dataset corresponds to each distribution.

Best-fitting probability model for the original data sets based on the mean P-value estimated from the seven GoF tests

Source: Figure by authors

3.5 Power evaluation of selected GoF test through Monte Carlo simulation

The power analysis revealed key insights into the performance of normality tests applied to geotechnical data. Figure 3(a)–(d) illustrates the variation in the power values with increasing sample sizes for data set 5 (CBR), evaluated under the alternative distributions LN, WB, GA and GEV. The results indicated that the power value consistently increases with sample size and asymptotically approaches 1.0 as the sample size becomes sufficiently large. Moreover, the sample size required to achieve a power value of 1.0 varied considerably across the alternative distributions, ranging from 90 to over 1,000. For instance, at a sample size of 100, using the S-W GoF test, the power values for LN, GA and GEV approached 1.0, whereas the power value for WB was considerably lower, at around 0.1. This observation highlights the strong dependence of power and the corresponding sample size requirements on the shape and location parameters of the underlying alternative marginal distributions.

Figure 3

This image displays multiple plots showing the relationship between power and sample size across different statistical methods, organised in a grid format.

The image consists of a grid of eight plots showing the power, one minus beta, against sample size for various statistical methods. The plots are categorised into four rows labelled a to d, with two subplots per category. The categories include L N, W B, G A, and G E V. Each plot features multiple lines representing different methods, L L, A D, C S Q, J B, D P, C V M, and S W, indicated by distinct markers and colours. The horizontal axis represents sample size, ranging from one to ten thousand, with logarithmic scaling. The vertical axis represents power with values ranging from zero to one. The upper left corner contains legend keys identifying the methods. Graph grid lines enhance readability. Each plot's data points resemble cumulative distribution functions, reflecting the performance of statistical tests as influenced by sample size.

Power curves for the original data set (CBR – data set 5) and the skewness-induced data sets. Subplots (a)–(d) correspond to the original data, while (a₁)–(d₁) and (a₂)–(d₂) represent the data sets transformed with exponents k = 2 and k = 3, respectively

Source: Figure by authors

In addition, the type of GoF test used had a marked influence on the power values obtained across the different sample sizes. As observed from Figure 3, the S-W, A-D, LL and J-B tests exhibited relatively higher power values, whereas the CSQ and CVM tests demonstrated comparatively lower power. Consequently, to determine whether the observed differences in power values among the selected normality tests were statistically significant under various distributional conditions (lognormal, gamma, Weibull and GEV), a non-parametric Friedman test was performed. Following Corder and Foreman (2014), the null hypothesis (H₀) postulates that all the GoF tests yield equivalent power values, while the alternative hypothesis (H_A) proposes that at least one test performs significantly differently. Table 5 summarises the computed power values for data set 5 across various GoF tests and sample sizes. Three representative sample sizes were included in the comparative analysis, and the test statistic for each case was calculated using equation (1):

F_{r} = \left[\frac{12}{l m (m + 1)} \sum_{i = 1}^{m} R_{i}^{2}\right] - 3 l (m + 1)

(1)

Table 5

Friedman test results comparing mean power values of the GoF tests across different parametric surrogate distributions

PDF	Effect size	LL	J-B	A-D	S-W	DP	CVM	CSQ	F_r (test statistic)	F_c (critical)
Sample size (n) = 25
LN	0.22	0.382	0.497	0.521	0.597	0.412	0.010	0.046	22.6	11.6
WB	0.15	0.044	0.029	0.046	0.051	0.018	0.000	0.001
GA	0.18	0.167	0.229	0.217	0.273	0.176	0.000	0.007
GEV	0.16	0.164	0.216	0.234	0.304	0.156	0.000	0.007
Sample size (n) = 50
LN	0.22	0.3820	0.4968	0.5210	0.5966	0.4116	0.0102	0.0456	22.4	11.6
WB	0.15	0.0440	0.0294	0.0464	0.0506	0.0184	0.0000	0.0014
GA	0.18	0.1672	0.2286	0.2170	0.2728	0.1764	0.0004	0.0070
GEV	0.16	0.1638	0.2158	0.2340	0.3036	0.1560	0.0000	0.0072
Sample size (n) = 100
LN	0.22	0.3820	0.4968	0.5210	0.5966	0.4116	0.0102	0.0456	22.7	11.6
WB	0.15	0.0440	0.0294	0.0464	0.0506	0.0184	0.0000	0.0014
GA	0.18	0.1672	0.2286	0.2170	0.2728	0.1764	0.0004	0.0070
GEV	0.16	0.1638	0.2158	0.2340	0.3036	0.1560	0.0000	0.0072

Source(s): Table by authors

Here, m denotes the number of GoF tests considered, l represents the number of distributional conditions (blocks), and R_i corresponds to the sum of ranks assigned to each GoF test. Based on the ranking procedure across the distributional block, the calculated Friedman test statistic values (F_r) were 22.6, 22.4 and 22.7, respectively. The critical value for the present case was obtained as 11.6 (where α = 0.05, l = 4, m = 7) from the updated and extended critical values proposed by López-Vázquez and Hochsztain (2019). Since the computed test statistic value exceeds the critical value, it can be inferred that a statistically significant difference exists among the GoF tests with respect to their power values. Given that the significance level (α) was maintained as 0.05 throughout the power analysis, the changes in the power values must come from the sample size (shown in Figure 3) and effect size. The effect size measures how much the observed data deviates from a perfect normal distribution (Cohen, 1992); a detailed discussion of effect size is presented in the subsequent section.
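As a check on the ranking procedure, the sketch below recomputes the Friedman statistic from the n = 25 rows of Table 5, using the standard form of the statistic with m(m + 1) in the denominator, l = 4 blocks and m = 7 tests. The helper names are illustrative and not taken from the study. Under these inputs the computation reproduces the tabulated value of F_r ≈ 22.6.

```python
# Power values at n = 25 copied from Table 5; columns follow TESTS.
TESTS = ["LL", "J-B", "A-D", "S-W", "DP", "CVM", "CSQ"]
POWER_N25 = {
    "LN":  [0.382, 0.497, 0.521, 0.597, 0.412, 0.010, 0.046],
    "WB":  [0.044, 0.029, 0.046, 0.051, 0.018, 0.000, 0.001],
    "GA":  [0.167, 0.229, 0.217, 0.273, 0.176, 0.000, 0.007],
    "GEV": [0.164, 0.216, 0.234, 0.304, 0.156, 0.000, 0.007],
}


def average_ranks(values):
    """Ascending ranks (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(blocks):
    """Return (F_r, rank sums R_i) for l blocks (rows) and m treatments (columns)."""
    l = len(blocks)
    m = len(blocks[0])
    rank_sums = [0.0] * m
    for row in blocks:
        for i, rank in enumerate(average_ranks(row)):
            rank_sums[i] += rank
    sum_sq = sum(r * r for r in rank_sums)
    f_r = 12.0 / (l * m * (m + 1)) * sum_sq - 3.0 * l * (m + 1)
    return f_r, rank_sums


f_r, rank_sums = friedman_statistic(list(POWER_N25.values()))
print(dict(zip(TESTS, rank_sums)), round(f_r, 1))
```

The S-W test receives the highest rank in every block (rank sum 28) and the CVM test the lowest (rank sum 4), which is what drives the statistic well above the critical value of 11.6.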

3.6 Assessing the influence of effect size on the power values of selected GoF tests

The required sample size to arrive at a power value of 1.0 was influenced not only by the specific GoF test but also by the statistical characteristics of the data set, particularly the CoV (effect size). Hence, to further examine the sensitivity of the test methods in detecting deviations from normality, all the selected GoF tests were applied to the original data sets with induced skewness, thereby varying the effect size. For this purpose, the original data were systematically transformed to introduce a controlled level of skewness and mean shift, thereby enabling a detailed evaluation of each test’s decision-making capability. Skewness was induced using a nonlinear power transformation of the form Y = X^k, where Y is the transformed random variable, X is the original random variable, and k is the exponent. In this study, values of k = 2 and k = 3 were adopted to capture the sensitivity of different GoF tests under varying effect sizes. Figure 4 illustrates the standard normal probability plots of the original data set and the data sets induced with skewness.

Figure 4

Three graphs showing probability distributions for original and skewed data with varying z values, labelled a, b, and c, illustrating changes in distribution shape.

The image consists of three graphs, labelled a, b, and c, depicting probability distributions. Graph a presents the original data with a bell shaped curve plotted against z values ranging from negative three point five to positive three point five, with probability on the vertical axis. Graph b illustrates skewed data with a parameter k equal to two, maintaining the z value range and showing a shift in the distribution shape. Graph c further represents skewed data with a parameter k equal to three, again within the same z value range, demonstrating an increased skew to the right. Each graph includes vertical bars representing distribution frequency, with arrows indicating the transformation from original to skewed data. The graphs utilise a consistent scale on both axes for easy comparison.

Standard normal probability plots for: (a) original data set 2 (plastic limit; CoV =15%), (b) skewed data set (k = 2) and (c) skewed data set (k = 3)

Source: Figure by authors

By systematically introducing controlled skewness into the data sets, the CoV (%) approximately doubled for k = 2 and tripled for k = 3 compared to the CoV values of the original data sets. For k = 2, the descriptive statistics of the original data sets were transformed to: CoV (%) = {43, 30, 15, 23, 88, 56, 12, 33, 198, 315}; Skewness = {0.45, 0.88, 0.87, −0.03, 2.85, 0.94, 0.27, 0.12, 2.98, 4.21}; and Kurtosis = {−0.19, 1.23, 1.56, 0.67, 12.27, 0.57, −1.14, −1.37, 8.49, 19.41}. Likewise, for k = 3, the transformed statistics were: CoV (%) = {62, 46, 23, 34, 153, 84, 18, 47, 276, 392}; Skewness = {0.92, 1.34, 1.16, 0.42, 4.51, 1.49, 0.34, 0.28, 3.44, 5.13}; and Kurtosis = {0.79, 2.33, 2.43, 1.06, 23.83, 1.67, −1.14, −1.18, 11.52, 28.09}.
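The roughly k-fold growth of the CoV under Y = X^k can be motivated by a first-order Taylor (delta-method) approximation. This is a supplementary sketch rather than part of the original analysis, and it assumes a positive variable X with mean μ_X, standard deviation σ_X and a small CoV:

```latex
Y = X^{k}, \qquad \mathrm{E}[Y] \approx \mu_X^{k}, \qquad \sigma_Y \approx k \, \mu_X^{k-1} \, \sigma_X

\mathrm{CoV}_Y = \frac{\sigma_Y}{\mathrm{E}[Y]} \approx \frac{k \, \mu_X^{k-1} \, \sigma_X}{\mu_X^{k}} = k \, \mathrm{CoV}_X
```

For data set 2 (original CoV = 15%), the approximation gives 30% and 45%, close to the reported 30% and 46%. It degrades for highly variable data: for data set 10 (original CoV = 196%), the reported values of 315% and 392% fall well short of a two- and three-fold increase.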

Consequently, the effect size was quantified using Cohen’s d, expressed as $d = \frac{|\mu - \mu_{0}|}{s}$, where the numerator |μ − μ₀| indicates the difference between the means under the null and alternative hypotheses, and s denotes the pooled standard deviation. Generally, a larger effect size (data exhibiting higher CoV) represents a more pronounced deviation from normality, making it easier to detect non-normality. Conversely, a smaller effect size indicates a subtle deviation, requiring a larger sample size to achieve the same statistical power. The computed values indicated an increase in effect size for data sets with the introduction of skewness (following power transformation). For instance, for data set 5 fitted with LN, the effect size of the original data set was 0.223, which increased to 0.371 and 0.698 for transformation exponents k = 2 and k = 3, respectively. Figure 5 presents the responses of various GoF tests at different levels of effect size (d) for both the original and transformed data sets. The effect size ranged from 0.08 for the original data sets to approximately 0.7 for the skewed data sets. Across all sample sizes (n = 25, 50 and 100), the S-W test demonstrated superior performance and higher sensitivity to the effect size variations, followed by the A-D and J-B tests. Moreover, the results revealed that increasing the sample size led to a reduction in the effect size, thereby validating the central limit theorem. Even at small effect sizes, a power value of 1.0 was achieved at larger sample sizes, confirming the dependency of test sensitivity on sample size.

Figure 5

Plot showing statistical power across different sample sizes (n) for various tests, including LN, WB, GA, and GEV. Each subplot illustrates power trends based on effect size.

The image depicts four main sections of statistical power analysis in the context of sample sizes, arranged in a grid format. The first figure, a, represents the log normal distribution, L N, with sample sizes of 25, 50, and 100. Each subplot, such as a 1 and a 2, includes distinct curves for different statistical tests identified by symbols and colours, L L, A D, C S Q, J B, D P, C V M, and S W. The second section, b, illustrates the W B test under similar sample size conditions. Subsequent panels, c and d, showcase the G A and G E V tests respectively, with emphasis on the relationship between power, one minus beta, and effect size, d, across various tests. Each plot includes grid lines for clarity and distinct axes for measuring effect size and power, while sample size indicators guide interpretation. The layout allows for easy comparison between sample settings and their corresponding statistical power outcomes.

Sensitivity of the selected GoF tests illustrated through their power responses across varying sample sizes (n) and effect size conditions

Source: Figure by authors

Furthermore, the results summarised in Table 4 reveal a substantial shift in statistical outcome. For moderately skewed data (k = 2), the A-D, S-W and LL tests demonstrated higher sensitivity to the nonlinear transformation, with several data sets that initially satisfied the normality assumption being rejected once skewness was introduced. In contrast, the CVM test showed minimal sensitivity under the same conditions, with data set 9 being the only additional rejection of normality. As expected, the moment-based test also responded to the transformation, owing to its direct dependence on skewness and kurtosis. With a further increase in the skewness (k = 3), nearly all the GoF tests became more sensitive in detecting deviations from normality, as shown in Table 4.

3.7 Optimising the minimum sample size required for normality evaluation

It can be seen from Figures 3 and 5 that, even with a sample size of 100, most of the selected GoF tests did not achieve a power value of 1.0 for most of the data sets. This suggests that evaluating the effectiveness of the GoF tests solely based on achieving a power value of 1.0 may require an impractically large sample size. Consequently, researchers have emphasised the importance of setting an optimal power threshold to balance statistical reliability and sample size limitations. Cohen (1992) reported that selecting a power value less than 0.8 introduces an unacceptably high risk of committing a Type II error, which can potentially lead to erroneous conclusions in statistical analysis. Conventionally, with a significance level (α) of 0.05, a power of 0.80 results in a Type II error probability (β) of 0.2, yielding a β:α ratio of 4:1 (Cohen, 1992). Moreover, Murphy and Myors (2004) noted that a power level of 0.8 implies that the probability of correctly detecting an effect is four times greater than that of failing to do so, whereas a power level of 0.9 increases this ratio to nine. Considering these statistical principles, the effectiveness of the GoF tests in this study was assessed based on a minimum power value of 0.8, ensuring a balance between minimising Type II errors and maintaining a feasible sample size.
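The ratios quoted above follow directly from the definition of power as 1 − β; the short worked arithmetic below is included only to make that step explicit:

```latex
\beta = 1 - 0.80 = 0.20, \qquad \frac{\beta}{\alpha} = \frac{0.20}{0.05} = 4

\frac{1 - \beta}{\beta} = \frac{0.80}{0.20} = 4 \quad (\text{power} = 0.8), \qquad \frac{1 - \beta}{\beta} = \frac{0.90}{0.10} = 9 \quad (\text{power} = 0.9)
```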

Based on the power value of 0.8 and the chosen significance level of 0.05, the minimum sample sizes required for normality assessment are summarised in Table 6. Furthermore, the sample size obtained from the power analysis indicates that the CSQ and CVM tests require substantially larger samples, typically between 115 and 4,400, which explains their inability to identify non-normality for data sets 1–9, given the original data set’s available sample size of 36–50. In contrast, the S-W and J-B tests exhibit higher efficiency, with minimum sample size requirements ranging from 8 to 65 for data sets exhibiting strong deviations (higher variability) from normality (e.g. data sets 9 and 10). However, for data sets 1–8, the required sample sizes fall within the range of 50–2,500, indicating that any normality decision drawn from these limited samples is statistically unreliable. In addition, for LL, A-D and DP tests, the required sample size approximately ranges from 12 to 5,000.

Table 6

Summary of minimum sample size requirements for various GoF tests when assessing normality under different surrogate alternative distributions

Data set	LL	A-D	J-B	DP	S-W	LL	A-D	J-B	DP	S-W
Original data
	LN (α = 0.05, Power = 0.8)					WB (α = 0.05, Power = 0.8)
1	262	144	130	245	120	1135	741	693	1175	625
2	642	414	328	557	309	476	311	274	501	245
3	2511	1667	1222	2071	1213	227	148	134	226	117
4	834	551	422	725	405	300	195	177	307	153
5	70	46	48	73	39	3696	1986	1214	1532	1019
6	133	90	85	134	72	5170	3495	2496	2352	2394
7	3963	2551	1853	3229	1909	180	117	107	174	94
8	472	309	257	416	237	476	312	274	501	245
9	25	19	23	29	17	49	65	41	65	28
10	12	11	11	14	10	13	12	13	15	11
	GA (α = 0.05, Power = 0.8)					GEV (α = 0.05, Power = 0.8)
1	611	390	315	545	287	5045	3025	1745	1875	1485
2	767	956	727	1174	717	941	616	574	998	528
3	5045	3779	2732	4746	2685	5045	3779	2732	4746	2685
4	2050	1313	982	1679	955	2033	1145	794	1146	636
5	180	115	109	182	87	157	94	103	189	70
6	337	215	185	319	161	5432	3173	2416	2420	2284
7	5133	5151	4405	5199	4334	5089	3673	2183	2108	1927
8	1195	721	549	946	528	571	330	287	496	217
9	49	32	39	62	27	22	17	20	22	15
10	14	12	15	20	12	10	9	9	13	8
Transformed data (k = 2)
	LN (α = 0.05, Power = 0.8)					WB (α = 0.05, Power = 0.8)
1	70	45	48	72	39	821	467	395	652	294
2	169	106	99	162	86	6123	3875	2269	2155	2025
3	649	417	335	562	311	582	388	348	618	312
4	218	146	126	205	109	1403	1017	971	1147	896
5	23	17	21	26	14	102	62	72	131	48
6	40	27	32	44	24	200	115	122	227	85
7	1012	646	493	846	486	304	194	176	317	155
8	121	83	79	123	63	6378	3873	2265	2154	2004
9	13	12	13	14	12	14	13	15	19	12
10	9	8	9	13	7	9	8	9	13	7
	GA (α = 0.05, Power = 0.8)					GEV (α = 0.05, Power = 0.8)
1	163	105	100	171	80	376	239	210	378	184
2	380	283	203	357	178	212	135	126	207	107
3	1187	933	722	1164	683	198	134	118	197	104
4	540	340	273	471	251	3411	2987	1255	1156	1217
5	57	37	44	70	31	814	571	509	905	480
6	94	61	67	106	49	115	78	76	109	61
7	1660	1464	1490	1850	1044	1075	943	893	1160	811
8	292	187	159	277	140	5206	3630	2083	2095	1785
9	16	13	17	24	13	12	11	12	14	11
10	11	9	11	14	10	9	7	7	13	7
Transformed data (k = 3)
	LN (α = 0.05, Power = 0.8)					WB (α = 0.05, Power = 0.8)
1	34	24	29	39	22	132	80	89	164	62
2	77	50	52	80	42	500	294	258	755	195
3	288	189	165	270	148	5206	3630	2083	2095	1785
4	101	59	59	90	50	4935	1107	1098	1166	989
5	14	13	14	15	12	35	32	32	48	21
6	22	15	20	24	14	55	35	44	73	30
7	460	286	237	393	218	616	407	366	647	332
8	57	40	43	61	34	498	290	256	445	194
9	11	10	11	13	10	12	11	12	14	10
10	8	7	7	11	6	8	7	7	10	6
	GA (α = 0.05, Power = 0.8)					GEV (α = 0.05, Power = 0.8)
1	78	50	57	90	42	73	51	52	72	43
2	174	109	104	177	83	87	61	60	85	51
3	625	422	336	563	318	123	85	79	123	68
4	249	159	142	244	120	710	469	422	764	389
5	31	22	29	42	20	45	34	36	45	30
6	47	31	38	59	26	36	26	29	36	23
7	1085	665	527	877	485	399	262	232	408	200
8	137	84	86	146	68	339	233	197	351	171
9	13	11	13	15	11	10	8	9	13	8
10	10	7	8	12	8	8	6	6	11	6

Source(s): Table by authors

The above findings are consistent with the observations of Yap and Sim (2011), who reported that lognormal alternatives generally require smaller sample sizes than Weibull alternatives to achieve a given power. However, while Yap and Sim (2011) reported that achieving a power of 1.0 typically requires sample sizes in the range of 100–2,000 (for different alternative surrogate distributions), the corresponding values obtained in this study are higher, at approximately 100–5,000. This difference is plausible: synthetic data sets exhibit a controlled level of variability, so deviations from normality can be detected with fewer observations, whereas field- or laboratory-derived geotechnical data sets inherently exhibit greater variability and uncertainty and therefore need larger sample sizes to achieve equivalent power.
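The way a minimum sample size follows from a target power can be sketched with a small Monte Carlo loop. This is only an illustration under stated assumptions, not the authors' procedure: it uses the Jarque–Bera statistic with its asymptotic chi-square critical value (an approximation at small n) so that it runs on the Python standard library alone, and the lognormal parameters, trial counts and function names are hypothetical.

```python
import math
import random


def jarque_bera_stat(x):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


# Asymptotic 5% critical value of a chi-square variable with 2 degrees of
# freedom, i.e. -2 * ln(0.05), about 5.99. Assumption: alpha = 0.05.
CRIT_05 = -2.0 * math.log(0.05)


def estimate_power(sampler, n, trials=2000, seed=0):
    """Fraction of simulated samples of size n for which normality is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [sampler(rng) for _ in range(n)]
        if jarque_bera_stat(sample) > CRIT_05:
            rejections += 1
    return rejections / trials


def min_sample_size(sampler, target=0.8, sizes=range(10, 201, 10)):
    """Smallest n in `sizes` whose estimated power reaches `target`, else None."""
    for n in sizes:
        if estimate_power(sampler, n) >= target:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical lognormal alternative; the parameters are illustrative only.
    def alternative(rng):
        return rng.lognormvariate(0.0, 0.5)

    for n in (20, 50, 100):
        print(n, estimate_power(alternative, n))
    print("minimum n for power 0.8:", min_sample_size(alternative))
```

Swapping in another test statistic, or a surrogate distribution fitted to a real data set, changes only `jarque_bera_stat` and `sampler`; the search over sample sizes stays the same.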

3.8 Establishing the relationship between the coefficient of variation and minimum required sample size

As outlined in the preceding sections, data sets with higher variability consistently required smaller sample sizes to detect deviations from normality. For instance, in the original data (k = 1), data sets 9 and 10 required only 8–28 samples, whereas data sets 3 and 7 required between 94 and 2,685 samples when assessed using the S-W test. Similarly, increasing k to 2 and 3 (inducing skewness) led to greater variability, which further reduced the sample size required for normality assessment. Given the wide range of both CoV and sample size, a common (base-10) logarithmic transformation was applied to both the independent (predictor) and dependent (response) variables. The resulting log-log relationship between the CoV and the required sample size is illustrated in Figure 6: as the CoV increased, the sample size required for detecting non-normality decreased. A strong negative linear correlation (r = −0.93) was observed for the LL, A-D, J-B, DP and S-W GoF tests, indicating a strong linear dependency on CoV in log-log space. Subsequently, a predictive model was developed to estimate the minimum sample size (n) as a function of CoV. Linear log-log models were fitted for the LL, A-D, J-B, DP and S-W tests, expressed as:

$$\log_{10}(n_{LL}) = 4.688 - 1.536\,\log_{10}(\mathrm{CoV}) \tag{2}$$

$$\log_{10}(n_{AD}) = 4.437 - 1.473\,\log_{10}(\mathrm{CoV}) \tag{3}$$

$$\log_{10}(n_{JB}) = 4.236 - 1.362\,\log_{10}(\mathrm{CoV}) \tag{4}$$

$$\log_{10}(n_{DP}) = 4.514 - 1.417\,\log_{10}(\mathrm{CoV}) \tag{5}$$

$$\log_{10}(n_{SW}) = 4.220 - 1.397\,\log_{10}(\mathrm{CoV}) \tag{6}$$

Figure 6

Relation between CoV and required sample size fitting using a log-log linear regression model. Five panels (LL, A-D, J-B, DP and S-W) plot log N against log CoV; each panel shows the data points, the fitted regression line with its equation and R² value, and the upper and lower 95% confidence limits.

Source: Figure by authors

where the regression coefficient (b₁) varies from −1.362 to −1.536. Since this model was developed from the given sample, it is necessary to verify whether it accurately represents the population model (μ_y = β₀ + β₁X + ε) (Ott and Longnecker, 2015). For this purpose, a hypothesis test was conducted on the population regression coefficient (β₁) using the test statistic $t_{s} = (b_{1} - 0) / SE_{b_{1}}$. The hypotheses were formulated as: Null Hypothesis (H₀): β₁ = 0 (mean sample size is not linearly related to CoV); and Alternate Hypothesis (H_A): β₁ ≠ 0 (mean sample size is linearly related to CoV). For all the regression equations (2)–(6), the calculated t_s lies between −25 and −28, with SE_b1 approximately equal to 0.06, and was compared against the critical t-value of −2. Since the magnitude of t_s exceeded that of the critical t-value, the null hypothesis was rejected, indicating that β₁ ≠ 0. This confirms that variation in the sample size can be reliably explained by CoV.
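The slope test described above can be sketched in a few lines. The example below is an illustration rather than a reproduction of the authors' regression: its inputs are the rounded S-W mean estimates from Table 7 (CoV = 10, 30, 50 and 100%), not the raw simulation results, so the fit is far tighter than the one reported. The function name is hypothetical, and the critical value 4.303 is the two-sided 5% t-value for the 2 degrees of freedom that four points leave.

```python
import math


def loglog_slope_test(cov_percent, n_required):
    """Fit log10(n) = b0 + b1 * log10(CoV) by least squares and return
    (b0, b1, SE_b1, t_s) with t_s = (b1 - 0) / SE_b1."""
    x = [math.log10(c) for c in cov_percent]
    y = [math.log10(n) for n in n_required]
    m = len(x)
    x_bar = sum(x) / m
    y_bar = sum(y) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual variance uses m - 2 degrees of freedom (two fitted parameters).
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (m - 2) / sxx)
    t_s = (b1 - 0.0) / se_b1
    return b0, b1, se_b1, t_s


if __name__ == "__main__":
    # Rounded S-W mean estimates taken from Table 7 (illustrative input only).
    cov = [10, 30, 50, 100]
    n_sw = [665, 145, 70, 25]
    b0, b1, se_b1, t_s = loglog_slope_test(cov, n_sw)
    t_crit = 4.303  # two-sided 5% critical value for 2 degrees of freedom
    print(f"b0={b0:.3f} b1={b1:.3f} SE={se_b1:.4f} t_s={t_s:.1f}")
    print("reject H0 (beta1 = 0):", abs(t_s) > t_crit)
```

With the full set of simulated sample sizes the degrees of freedom are much larger, which is why the text compares t_s against a critical value of about −2.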

However, the log-log equations (2)–(6) are not convenient for the practical prediction of sample sizes. Accordingly, the back-transformed predictive equations (7)–(11) are suggested for operational application, as they allow direct estimation of the minimum sample size (n) for reliable normality assessment when the CoV is provided in percentage terms:

$$n_{LL} = 4.88 \times 10^{4}\,(\mathrm{CoV})^{-1.536} \tag{7}$$

$$n_{AD} = 2.73 \times 10^{4}\,(\mathrm{CoV})^{-1.473} \tag{8}$$

$$n_{JB} = 1.72 \times 10^{4}\,(\mathrm{CoV})^{-1.362} \tag{9}$$

$$n_{DP} = 3.27 \times 10^{4}\,(\mathrm{CoV})^{-1.417} \tag{10}$$

$$n_{SW} = 1.66 \times 10^{4}\,(\mathrm{CoV})^{-1.397} \tag{11}$$
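As a worked check of the back-transformation, taking the antilogarithm of equation (6) gives equation (11), and a CoV of 10% gives the sample size listed for that boundary in Table 7. The steps below assume, as stated above, that CoV is entered in percent.

```latex
\begin{aligned}
\log_{10}(n_{SW}) &= 4.220 - 1.397\,\log_{10}(\mathrm{CoV}) \\
n_{SW} &= 10^{4.220}\,(\mathrm{CoV})^{-1.397}
        \approx 1.66 \times 10^{4}\,(\mathrm{CoV})^{-1.397} \\
n_{SW}\big|_{\mathrm{CoV}=10\%} &\approx 1.66 \times 10^{4} \times 10^{-1.397} \approx 665
\end{aligned}
```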

Since the S-W test exhibited higher power at lower sample sizes and performed well with all types of data sets, it is given primary consideration in optimising the sample size with respect to CoV. Table 7 presents recommended sample sizes for different CoV levels, providing practical guidance for data collection based on the variability of the data. For CoV values ranging from 10% to 100%, the mean fit indicates that 665 down to 25 samples are adequate to draw conclusions on normality using the S-W test. However, it is important to note that these values reflect the general trend that higher CoV often coincides with greater skewness. Data sets with similar CoV can still differ in skewness; in such cases, the required sample size shifts towards the 95% lower confidence bound when skewness is higher and towards the 95% upper confidence bound when skewness is lower, as presented in Table 7. Furthermore, for the J-B, A-D, LL and DP tests, the required sample sizes for 10% ≤ CoV ≤ 100% were estimated to be 750–30, 920–30, 1,400–40 and 1,250–50, respectively.
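For routine use, equations (7)–(11) can be wrapped in a small helper. The sketch below hard-codes the published coefficients; the function and dictionary names are hypothetical, and the result is the mean estimate only, without the confidence bounds given in Table 7.

```python
import math

# Coefficients (a, b) of n = a * CoV**b from equations (7)-(11); CoV in percent.
COEFFS = {
    "LL": (4.88e4, -1.536),
    "A-D": (2.73e4, -1.473),
    "J-B": (1.72e4, -1.362),
    "DP": (3.27e4, -1.417),
    "S-W": (1.66e4, -1.397),
}


def required_sample_size(cov_percent, test="S-W"):
    """Mean estimate of the minimum sample size for a given CoV (in percent)."""
    if cov_percent <= 0:
        raise ValueError("CoV must be positive (in percent)")
    a, b = COEFFS[test]
    return a * cov_percent ** b


if __name__ == "__main__":
    for cov in (10, 30, 50, 100):
        sizes = {t: math.ceil(required_sample_size(cov, t)) for t in COEFFS}
        print(f"CoV = {cov}%: {sizes}")
```

The regressions were fitted for CoV between 10% and 100%, so values outside that range are extrapolations and should be treated with caution.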

Table 7

Recommended GoF tests and corresponding minimum sample size requirements based on data set CoV

Coefficient of variation	Required sample size (Power ≥ 0.8, α = 0.05)			GoF test
	Mean estimate	Lower CB (95%)	Upper CB (95%)	
< 10%	>665	>334	>1323	S-W
10% ≤ CoV < 30%	665-145	334-64	1323-322	S-W
30% ≤ CoV < 50%	145-70	64-30	322-167	S-W
50% ≤ CoV < 100%	70-25	30-10	167-68	S-W
≥ 100%	≤25	≤10	≤68	S-W
10% ≤ CoV ≤ 100%	750-30	394-14	1418-78	J-B
10% ≤ CoV ≤ 100%	920-30	456-12	1865-81	A-D
10% ≤ CoV ≤ 100%	1400-40	715-16	2817-106	LL
10% ≤ CoV ≤ 100%	1250-50	634-19	2466-121	DP


Note(s):

CB = confidence bound

Source(s): Table by authors

4. Recommendations on the use of a specific GoF test for normality assessment

In statistical analysis, sample size plays a crucial role in obtaining reliable point estimates of the parameters of interest, which should satisfy key statistical properties such as unbiasedness, consistency, efficiency and sufficiency (Haldar and Mahadevan, 2000; Ramachandran and Tsokos, 2009). In particular, sufficiency refers to the ability of the samples to capture all available information about the population parameter (e.g. shear strength, permeability, CBR, Atterberg limits). The present study found that the S-W test performed well when the sample size ranged from 145 to 665 and the sample CoV was between 10% and 30%. However, when normality was assessed with a sample size of 20 or 30, the GoF tests still indicated normality, even for data that were non-normal. Therefore, acceptance of normality at such a small sample size should not be taken as evidence that the sample adequately represents the population.

In geotechnical quality control, standard statistical tools include confidence interval estimation, control charts (e.g. range charts, standard deviation charts and average charts), and hypothesis testing. When quality control measures focus on the mean (point estimate), decisions heavily rely on the central tendency of the data. As such, data sets with 35 samples and 20 samples may yield different statistical conclusions. In cases where the mean is a critical parameter, the LL test can be used, particularly for assessing normality, as it evaluates the overall agreement between the empirical distribution and the theoretical distribution, with sensitivity to deviations in the central portion of the distribution.

Conversely, in reliability-based analysis of geotechnical structures, where the probability of failure is often governed by the most critical values in the tail region, the A-D test is generally more appropriate. Its increased sensitivity to deviations in the tails enhances the accuracy of assessing whether critical geotechnical data conform to assumed probability models. In the present study, for example, the nonlinear power transformation introduced skewness primarily in the tail region, which was effectively detected by the A-D test for the skewed data (k = 2 and k = 3; Table 4), whereas the LL and CVM tests were the least sensitive to these deviations. For the chi-square test, the literature offers mixed conclusions regarding its use for assessing univariate distributions. According to Thode (2002), the chi-square test is not recommended because it is challenging to implement compared to other tests. Moreover, this test may not be attractive to beginners, as there is a high chance of incorrectly rejecting the normality assumption due to computational procedures. In contrast, the DP (sample size = 50–1,250) and J-B (sample size = 30–750) tests are simpler to apply and may serve effectively as complementary tests.

5. Summary and conclusions

The performance of selected GoF tests in assessing the normality of experimentally derived geotechnical data, including percentage of fines, liquid limit, plastic limit, dry density, moisture content, CBR, unconfined compressive strength and permeability, was evaluated using power analysis. Based on statistical estimates and accepted marginal distributions, a Monte Carlo simulation was used to generate synthetic random samples for assessing normality and estimating the power of the selected GoF tests. The analysis provides insights into the sensitivity and reliability of these tests when applied to geotechnical data sets with diverse statistical characteristics. Based on the study, the following conclusions are drawn:

The initial findings from the power analysis indicate that when the available sample size is smaller than the minimum required, the GoF tests may still accept normality (e.g. data set 1) due to a lack of statistical power, thereby increasing the risk of Type II error, i.e. falsely accepting normality. Conversely, rejecting normality under the same small-sample conditions (e.g. data set 5) remains meaningful, as the conclusion reflects a detectable deviation from normality. Since Type I error is controlled at α = 0.05, rejection under reduced power provides reliable evidence against the null hypothesis of normality.
For the original data sets, the selected GoF tests achieved a power value of 1.0 at sample sizes ranging from 25 to over 2,000. This considerable variation in the sample size is mainly attributable to the use of different surrogate distributions (LN, WB, GA and GEV) in the alternative hypothesis during the power analysis. Additional analyses indicated that the power values were governed not only by sample size but also by the effect size. Among all the selected GoF tests, the S-W, A-D and J-B tests demonstrated the highest capacity to detect non-normality and required comparatively smaller sample sizes. Notably, the minimum sample sizes derived for field-based geotechnical data sets were considerably larger than those typically reported for synthetic data sets, reflecting the higher inherent variability and uncertainty associated with real-world soil data.
The minimum sample size required to achieve the desired statistical power value of 0.8 at a chosen significance level of 0.05 was determined for each GoF test. Subsequent analyses were conducted to examine the relationship between optimum sample size and the data's CoV. A clear negative correlation (r = −0.93) was observed, indicating that higher variability reduces the sample size required to detect deviations from normality. Using the fitted linear log-log regression model, the estimated sample sizes required to detect non-normality reliably were approximately 665–25 for S-W, 750–30 for J-B, 920–30 for A-D, 1,400–40 for LL and 1,250–50 for DP, applicable within the CoV range of 10%–100%.
Furthermore, the J-B and DP tests, moment-based methods not frequently used in geotechnical data analysis, also exhibited strong performance, followed by the S-W and A-D tests. The LL test exhibited moderate sensitivity and required a slightly higher sample size than the moment-based methods. The CSQ and CVM tests were found to be more effective for larger sample sizes, further reinforcing the importance of sample size considerations in statistical assessments of geotechnical properties.
Notably, the GoF tests used for normality assessment require only key statistical parameters such as mean, skewness, kurtosis and CoV. As such, the performance and reliability of these tests are influenced primarily by the sample size and statistical characteristics of the geotechnical data. The findings of this study are therefore applicable not only to the data sets analysed but also to other non-geotechnical data sets with CoV ranging from 10% to 100%.

Acknowledgements

The authors thank the anonymous reviewers for their valuable comments and suggestions.

References

Akbas

,

S.O.

and

Kulhawy

,

F.H.

(

2010

), “

Characterization and estimation of geotechnical variability in Ankara clay: a case history

”,

Geotechnical and Geological Engineering

, Vol.

28

No.

5

, pp.

619

-

631

, doi:

https://doi.org/10.1007/s10706-010-9320-x

.

Google Scholar

Crossref

Anderson

,

T.W.

and

Darling

,

D.A.

(

1954

), “

A test of goodness of fit

”,

Journal of the American Statistical Association

, Vol.

49

No.

268

, pp.

765

-

769

, doi:

https://doi.org/10.1080/01621459.1954.10501232

.

Google Scholar

Crossref

Ang

,

A.H.S.

and

Tang

,

W.H.

(

2007

),

Probability Concepts in Engineering: Emphasis on Applications to Civil and Environmental Engineering

,

John Willey and Sons

,

Hoboken, NJ

.

Google Scholar

Arnastauskaite

,

J.

,

Ruzgas

,

T.

and

Brazenas

,

M.

(

2021

), “

An exhaustive power comparison of normality tests

”,

Mathematics

, Vol.

9

No.

7

, p.

788

, doi:

https://doi.org/10.3390/math9070788

.

Google Scholar

Crossref

ASTM D1557

(

2015

), “

Standard test methods for laboratory compaction characteristics using modified effort [56,000 ft-lbf/ft3 (2,700 kN-m/m3)]

”,

West Conshohocken, PA

.

ASTM D1883

(

1999

),

Standard Test Methods for CBR (CA Bearing Ratio) of Laboratory Compacted Soils

,

ASTM International

,

West Conshohocken, PA

.

ASTM D2166

(

2016

), “

Standard test method for unconfined compressive strength of cohesive soil

”,

West Conshohocken, PA

.

ASTM D2487

(

2017

), “

Standard practice for classification of soils for engineering purposes (unified soil classification system)

”,

West Conshohocken, PA

.

ASTM D4318

(

2017

), “

Standard test methods for liquid limit, plastic limit, and plasticity index of soils

”,

West Conshohocken, PA

.

ASTM D5084

(

2016

), “

Standard test methods for measurement of hydraulic conductivity of saturated porous materials using a flexible wall permeameter

”,

West Conshohocken, PA

.

ASTM D698

(

2021

), “

Standard test methods for laboratory compaction characteristics using standard effort [12,400 ft-lbf/ft3 (600 kN-m/m3)]

”,

West Conshohocken, PA

.

Baecher

,

G.B.

and

Christian

,

J.T.

(

2005

),

Reliability and Statistics in Geotechnical Engineering

,

Wiley

,

Hoboken, NJ

.

Google Scholar

BenSaïda

,

A.

(

2025

a), “

Shapiro-Wilk and Shapiro-Francia normality tests”, MATLAB Central file exchange

”,

available at:

Shapiro-Wilk and Shapiro-Francia normality tests”, MATLAB Central file exchangeLink to the cited article.

Google Scholar

BenSaïda

,

A.

(

2025

b), “

Cramer-von mises test”, MATLAB Central file exchange

”,

available at:

Cramer-von mises test”, MATLAB Central file exchangeLink to the cited article.

Google Scholar

Benson

,

C.H.

(

1993

), “

Probability distributions for hydraulic conductivity of compacted soil liners

”,

Journal of Geotechnical Engineering

, Vol.

119

No.

3

, pp.

471

-

486

, doi:

https://doi.org/10.1061/(ASCE)0733-9410(1993)119:3(471)

.

Google Scholar

Crossref

Burati

,

J.L.

,

Weed

,

R.M.

,

Hughes

,

C.S.

and

Hill

,

H.S.

(

2003

), “

Optimal procedures for quality assurance specifications (No. FHWA-RD-02-095

)”,

Turner-Fairbank Highway Research Center

.

Google Scholar

Cherubini

,

C.

and

Vessia

,

G.

(

2010

), “

Reliability-based pile design in sandy soils by CPT measurements

”,

Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards

, Vol.

4

No.

1

, pp.

2

-

12

, doi:

https://doi.org/10.1080/17499510902798156

.

Google Scholar

Crossref

Cohen

,

J.

(

1992

), “

A power primer: quantitative methods in psychology

”,

Psychological Bulletin

, Vol.

112

No.

1

, pp.

155

-

159

.

Google Scholar

Crossref

PubMed

Corder

,

G.W.

and

Foreman

,

D.I.

(

2014

),

Nonparametric statistics: A step-bystep approach

,

John Wiley & Sons, Hoboken

,

New Jersey

.

Google Scholar

Cramer

,

H.

(

1928

), “

On the composition of elementary errors

”,

Scandinavian Actuarial Journal

, Vol.

1928

No.

1

, pp.

13

-

74

.

Google Scholar

Crossref

D’Agostino

,

R.

and

Pearson

,

E.S.

(

1973

), “

Tests for departure from normality. Empirical results for the distributions of b 2 and √b1

”,

Biometrika

, Vol.

60

No.

3

, pp.

613

-

622

, doi:

https://doi.org/10.1093/biomet/60.3.613

.

Google Scholar

Crossref

D’Agostino

,

R.B.

and

Stephens

,

M.A.

(

1986

),

Goodness-of-Fit Techniques

,

Marcel Dekker

,

New York, NY

.

Google Scholar

Elkateb

,

T.

,

Chalaturnyk

,

R.

and

Robertson

,

P.K.

(

2003

), “

An overview of soil heterogeneity: quantification and implications on geotechnical field problems

”,

Canadian Geotechnical Journal

, Vol.

40

No.

1

, pp.

1

-

15

, doi:

https://doi.org/10.1139/t02-090

.

Google Scholar

Crossref

Fenton

,

G.A.

and

Griffiths

,

D.V.

(

2008

),

Risk Assessment in Geotechnical Engineering

,

John Wiley and Sons

,

New York, NY

.

Google Scholar

Crossref

Galeandro

,

A.

,

Doglioni

,

A.

and

Simeone

,

V.

(

2017

), “

Statistical analyses of inherent variability of soil strength and effects on engineering geology design

”,

Bulletin of Engineering Geology and the Environment

, Vol.

76

No.

2

, pp.

587

-

600

, doi:

https://doi.org/10.1007/s10064-016-0859-5

.

Google Scholar

Crossref

Ganesh

,

K.S.S.

and

Kumar

,

T.A.

(

2024

), “

Statistical assessment of liquid limit of pond Ash-Based processed geomaterial

”,

Proceedings of the 2nd International Conference on Geotechnical Issues in Energy, Infrastructure and Disaster Management

.

Patna, India

, pp.

461

-

469

, doi:

https://doi.org/10.1007/978-981-97-1757-6_33

Google Scholar

Ganesh

,

K.S.S.

and

Kumar

,

T.A.

(

2025

), “

Statistical analysis of pond Ash-Bentonite mixtures: implications for liner and cover material design

”,

Quarterly Journal of Engineering Geology and Hydrogeology

, Vol.

58

No.

2

, doi:

https://doi.org/10.1144/qjegh2024-175

.

Google Scholar

Ghasemi

,

A.

and

Zahediasl

,

S.

(

2012

), “

Normality tests for statistical analysis: a guide for non-statisticians

”,

International Journal of Endocrinology and Metabolism

, Vol.

10

No No.

2

, pp.

486

-

489

, doi:

https://doi.org/10.5812/ijem.3505

.

Google Scholar

Crossref

PubMed

Grubbs

,

F.E.

(

1969

), “

Procedures for detecting outlying observations in samples

”,

Technometrics

, Vol.

11

No.

1

, pp.

1

-

21

, doi:

https://doi.org/10.1080/00401706.1969.10490657

.

Google Scholar

Crossref

Haldar

,

A.

and

Mahadevan

,

S.

(

2000

),

Probability, Reliability and Statistical Methods in Engineering Design

,

Wiley

,

New York, NY

.

Google Scholar

Harle

,

S.M.

and

Wankhade

,

R.L.

(

2025

), “

Machine learning techniques for predictive modelling in geotechnical engineering: a succinct review

”,

Discover Civil Engineering

, Vol.

2

No.

1

, pp.

1

-

21

, doi:

https://doi.org/10.1007/s44290-025-00224-w

.

Google Scholar

Crossref

Harr

,

M.E.

(

1987

),

Reliability-Based Design in Civil Engineering

,

McGraw-Hill

,

New York, NY

.

Google Scholar

Henderson

,

A.R.

(

2006

), “

Testing experimental data for univariate normality

”,

Clinica Chimica Acta

, Vol.

366

Nos

1-2

, pp.

112

-

129

, doi:

https://doi.org/10.1016/j.cca.2005.11.007

.

Google Scholar

Crossref

IRC SP-11

(

2012

),

Handbook of Quality Control for Construction of Roads and Runways

,

Indian Road Congress

,

New Delhi, India

.

Jarque

,

C.M.

and

Bera

,

A.K.

(

1987

), “

A test for normality of observations and regression residuals

”,

International Statistical Review / Revue Internationale de Statistique

, Vol.

55

No.

2

, pp.

163

-

172

, doi:

https://doi.org/10.2307/1403192

.

Google Scholar

Crossref

Kumar

,

T.A.

,

Saseendran

,

R.

and

Sundaravel

,

V.

(

2023

), “

Engineering characterization of intermediate geomaterials-A review

”,

Geomechanics and Engineering

, Vol.

33

No.

5

, pp.

453

-

462

, doi:

https://doi.org/10.12989/gae.2023.33.5.453

.

Google Scholar

Lee

,

S.

,

Lee

,

G.

and

Jeon

,

G.

(

2019

), “

Statistical approaches based on deep learning regression for verification of normality of blood pressure estimates

”,

Sensors

, Vol.

19

No.

9

, p.

2137

, doi:

https://doi.org/10.3390/s19092137

.

Google Scholar

Crossref

PubMed

Li

,

Z.

,

Gong

,

W.

,

Li

,

T.

,

Juang

,

C.H.

,

Chen

,

J.

and

Wang

,

L.

(

2021

), “

Probabilistic back analysis for improved reliability of geotechnical predictions considering parameters uncertainty, model bias, and observation error

”,

Tunnelling and Underground Space Technology

, Vol.

115

, p.

104051

.

Google Scholar

Crossref

Liu

,

H.

,

Su

,

H.

,

Sun

,

L.

and

Dias-da-Costa

,

D.

(

2024

), “

State-of-the-art review on the use of AI-enhanced computational mechanics in geotechnical engineering

”,

Artificial Intelligence Review

, Vol.

57

No.

8

, p.

196

, doi:

https://doi.org/10.1007/s10462-024-10836-w

.

Google Scholar

Crossref

Lilliefors

,

H.W.

(

1967

), “

On the Kolmogorov-Smirnov test for normality with mean and variance unknown

”,

Journal of the American Statistical Association

, Vol.

62

No.

318

, pp.

399

-

402

, doi:

https://doi.org/10.1080/01621459.1967.10482916

.

Google Scholar

Crossref

López-Vázquez

,

C.

and

Hochsztain

,

E.

(

2019

), “

Extended and updated tables for the Friedman rank test

”,

Communications in Statistics - Theory and Methods

, Vol.

48

No.

2

, pp.

268

-

281

.

Google Scholar

Crossref

Montgomery

,

D.C.

and

Runger

,

G.C.

(

2010

),

Applied Statistics and Probability for Engineers

,

John Wiley and Sons

,

New York, NY

.

Google Scholar

Murphy

,

K.R.

and

Myors

,

B.

(

2004

),

Statistical Power Analysis: A Simple and General Model for Traditional and Modern Hypothesis Tests

,

Lawrence Erlbaum Associates, Inc

.,

Mahwah, NJ

, doi:

https://doi.org/10.4324/9780203843093

.

Google Scholar

Crossref

Newell

,

K.M.

and

Hancock

,

P.A.

(

1984

), “

Forgotten moments: a note on skewness and kurtosis as influential factors in inferences extrapolated from response distributions

”,

Journal of Motor Behavior

, Vol.

16

No.

3

, pp.

320

-

335

, doi:

https://doi.org/10.1080/00222895.1984.10735324

.

Google Scholar

Crossref

PubMed

Nguyen

,

T.S.

,

Ngamcharoen

,

K.

and

Likitlersuang

,

S.

(

2023

), “

Statistical characterisation of the geotechnical properties of Bangkok subsoil

”,

Geotechnical and Geological Engineering

, Vol.

41

No.

3

, pp.

2043

-

2063

, doi:

https://doi.org/10.1007/s10706-023-02390-z

.

Google Scholar

Crossref

Nowak

,

A.S.

and

Collins

,

K.R.

(

2000

), “

Reliability of structures

”,

CRC press

,

London

.

Google Scholar

Ott

,

R.L.

and

Longnecker

,

M.

(

2015

),

An Introduction to Statistical Methods and Data Analysis

,

Cengage Learning Inc

,

Boston, USA

.

Google Scholar

Öztuna

,

D.

,

Elhan

,

A.H.

and

Tüccar

,

E.

(

2006

), “

Investigation of four different normality tests in terms of type 1 error rate and power under different distributions

”,

Turkish Journal of Medical Sciences

, Vol.

36

No.

3

, pp.

171

-

176

,

available at:

Investigation of four different normality tests in terms of type 1 error rate and power under different distributionsLink to the cited article.

Google Scholar

Pearson

,

K.

(

1900

), “

On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonable supposed to have arisen from random sampling

”,

Philosophical Magazine

, Vol.

50

No.

302

, p.

157

.

Google Scholar

Phoon

,

K.K.

and

Ching

,

J.

(Eds.) (

2018

),

Risk and Reliability in Geotechnical Engineering

,

CRC Press

,

Boca Raton, USA

.

Google Scholar

Crossref

Phoon

,

K.K.

and

Kulhawy

,

F.H.

(

1999

), “

Characterization of geotechnical variability

”,

Canadian Geotechnical Journal

, Vol.

36

No.

4

, pp.

612

-

624

, doi:

https://doi.org/10.1139/t99-038

.

Google Scholar

Crossref

Phoon

,

K.K.

,

Kulhawy

,

F.H.

and

Grigoriu

,

M.D.

(

1995

), “

Reliability-based design of foundations for transmission line structures

”,

Electric Power Research Institute (EPRI TR-105000), Research Project 1493-04

,

Cornell University

,

New York, NY

.

Google Scholar

Phoon

,

K.K.

,

Shuku

,

T.

and

Ching

,

J.

(

2023

),

Uncertainty, Modeling, and Decision Making in Geotechnics

,

CRC Press

,

Boca Raton, USA

.

Google Scholar

Crossref

Razali

,

N.M.

and

Wah

,

Y.B.

(

2010

), “

Power comparisons of some selected normality tests

”,

Proceedings of the Regional Conference on Statistical Sciences

, pp.

126

-

38

.

Google Scholar

Ramachandran

,

K.M.

and

Tsokos

,

C.P.

(

2009

),

Mathematical Statistics with Applications in R

,

Elsevier Academic Press

,

USA

.

Google Scholar

Saseendran

,

R.

and

Dodagoudar

,

G.R.

(

2020

), “

Reliability analysis of slopes stabilised with piles using response surface method

”,

Geomechanics and Engineering

, Vol.

21

No

6

, pp.

513

-

525

.

Google Scholar

Shapiro

,

S.S.

and

Wilk

,

M.B.

(

1965

), “

An analysis of variance test for normality

”,

Biometrika

, Vol.

52

Nos

3-4

, pp.

591

-

611

, doi:

https://doi.org/10.1093/biomet/52.3-4.591

.

Google Scholar

Crossref

Sigut

,

J.

,

Piñeiro

,

J.

,

Estévez

,

J.

and

Toledo

,

P.

(

2006

), “

A neural network approach to normality testing

”,

Intelligent Data Analysis

, Vol.

10

No.

6

, pp.

509

-

519

, doi:

https://doi.org/10.3233/IDA-2006-10603

.

Google Scholar

Crossref

Simić

,

M.

(

2021

), “

Testing for normality with neural networks

”,

Neural Computing and Applications

, Vol.

33

No.

23

, pp.

16279

-

16313

, doi:

https://doi.org/10.1007/s00521-021-06229-7

.

Google Scholar

Crossref

Somani

,

M.

,

Datta

,

M.

,

Ramana

,

G.V.

and

Sreekrishnan

,

T.R.

(

2019

), “

Leachate characteristics of aged soil-like material from MSW dumps: sustainability of landfill mining

”,

Journal of Hazardous, Toxic, and Radioactive Waste

, Vol.

23

No.

4

, p.

4019014

, doi:

https://doi.org/10.1061/(ASCE)HZ.2153-5515.0000452

.

Google Scholar

Crossref

Steele

,

C.M.

(

2003

), “

The power of categorical Goodness-Of-Fit statistics

”, (Ph.D. Dissertation,

Griffith University)

,

Victoria, Australia

, doi:

https://doi.org/10.25904/1912/1481

.

Google Scholar

Stephens

,

M.A.

(

1974

), “

EDF statistics for goodness of fit and some comparisons

”,

Journal of the American Statistical Association

, Vol.

69

No.

347

, pp.

730

-

737

, doi:

https://doi.org/10.1080/01621459.1974.10480196

.

Google Scholar

Crossref

Smirnov

,

N.V.

(

1936

), “

Sui la distribution de w2 (Criterium de M.R.v. Mises)”, C.R. (Paris

),

202

, pp.

449

-

452

.

Google Scholar

Tang

,

X.S.

,

Li

,

D.Q.

,

Cao

,

Z.J.

and

Phoon

,

K.K.

(

2017

), “

Impact of sample size on geotechnical probabilistic model identification

”,

Computers and Geotechnics

, Vol.

87

, pp.

229

-

240

, doi:

https://doi.org/10.1016/j.compgeo.2017.02.019

.

Google Scholar

Crossref

Theocharis

,

A.I.

,

Zevgolis

,

I.E.

,

Roumpos

,

C.

and

Koukouzas

,

N.C.

(

2024

), “

Probability distributions of geotechnical properties for heterogeneous lignite mine spoils

”,

International Journal of Geotechnical Engineering

, Vol.

18

No.

5

, pp.

528

-

536

, doi:

https://doi.org/10.1080/19386362.2024.2398328

.

Google Scholar

Crossref

Thode

,

H.C.

(

2002

),

Testing for Normality

,

Marcel Dekker, Inc

,.,

New York, NY

.

Google Scholar

Crossref

Toraldo

,

C.

,

Modoni

,

G.

,

Ochmański

,

M.

and

Croce

,

P.

(

2018

), “

The characteristic strength of jet-grouted material

”,

Géotechnique

, Vol.

68

No.

3

, pp.

262

-

279

, doi:

https://doi.org/10.1680/jgeot.16.P.320

.

Google Scholar

Crossref

Trujillo-Ortiz

,

A.

(

2025

), “

“DagosPtest”, MATLAB Central file exchange

”,

available at:

“Link to the cited article.

Google Scholar

Uttley

,

J.

(

2019

), “

Power analysis, sample size, and assessment of statistical assumptions—improving the evidential value of lighting research

”,

LEUKOS

, Vol.

15

Nos

2-3

, pp.

143

-

162

, doi:

https://doi.org/10.1080/15502724.2018.1533851

.

Google Scholar

Crossref

Uzielli

,

M.

,

Lacasse

,

S.

,

Nadim

,

F.

and

Phoon

,

K.K.

(

2007

), “

Soil variability analysis for geotechnical practice

”,

Proceedings of the 2nd International Workshop on Characterisation and Engineering Properties of Natural Soils

,

Taylor and Francis

,

London

, pp.

1653

-

1752

.

Google Scholar

Wang

,

Y.

,

Zhao

,

T.

and

Cao

,

Z.

(

2015

), “

Site-specific probability distribution of geotechnical properties

”,

Computers and Geotechnics

, Vol.

70

, pp.

159

-

168

, doi:

https://doi.org/10.1016/j.compgeo.2015.08.002

.

Google Scholar

Crossref

Yazici, B. and Yolacan, S. (2007), “A comparison of various tests of normality”, Journal of Statistical Computation and Simulation, Vol. 77 No. 2, pp. 175-183, doi: https://doi.org/10.1080/10629360600678310.

Yap, B.W. and Sim, C.H. (2011), “Comparisons of various types of normality tests”, Journal of Statistical Computation and Simulation, Vol. 81 No. 12, pp. 2141-2155, doi: https://doi.org/10.1080/00949655.2010.520163.

Zhai, H. and Benson, C.H. (2006), “The log-normal distribution for hydraulic conductivity of compacted clays: two or three parameters?”, Geotechnical and Geological Engineering, Vol. 24 No. 5, pp. 1149-1162, doi: https://doi.org/10.1007/s10706-005-1135-9.

Zhao, H.F., Zhang, L.M., Xu, Y. and Chang, D.S. (2013), “Variability of geotechnical properties of a fresh landslide soil deposit”, Engineering Geology, Vol. 166, pp. 1-10, doi: https://doi.org/10.1016/j.enggeo.2013.08.006.

Zbiciak, A., Volchok, D., Kozyra, Z., Michalczyk, R. and Al Garssi, N. (2025), “Fuzzy-Modulus-Based layered elastic analysis of asphalt pavements for enhanced fatigue life prediction”, Materials, Vol. 18 No. 13, p. 3034, doi: https://doi.org/10.3390/ma18133034.

