Purpose

This study aims to examine the performance of various goodness-of-fit (GoF) tests in assessing the normality of the data set, a crucial step in conducting probabilistic analyses in the geotechnical domain. The evaluation focuses on the efficacy of these tests when applied to small sample sizes and data sets with varying coefficient of variation (CoV). Identifying the most efficient GoF test based on the statistical characteristics of the data can enhance the reliability of results and minimise the risk of misleading conclusions.

Design/methodology/approach

Multiple GoF tests, including Shapiro–Wilk (S-W), Lilliefors (LL), Anderson–Darling (A-D), Jarque–Bera (J-B), chi-square (CSQ), Cramér–von Mises (CVM) and D’Agostino and Pearson Omnibus (DP) tests, were used for normality assessment. A computational power analysis was performed through Monte Carlo random sample simulation to determine the optimal sample size required to achieve the desired statistical power. Furthermore, the performance and sensitivity of each GoF test were assessed systematically by varying the sample size and effect size (d) to establish the relationship between the minimum required sample size and CoV.

Findings

Power analysis revealed that the S-W test was the most effective in detecting departures from normality, followed by the A-D, J-B, LL and DP tests. The degree of skewness and CoV in the data sets plays a crucial role in optimising the sample size requirements. For the S-W test, the minimum required sample size varies with the CoV as follows: (a) CoV < 10% requires at least 665 samples, (b) 10% ≤ CoV < 30% requires 665–145 samples, (c) 30% ≤ CoV < 50% requires 145–70 samples, (d) 50% ≤ CoV < 100% requires 70–25 samples, and (e) CoV ≥ 100% requires at most 25 samples. In comparison, the CSQ and CVM tests demand substantially larger minimum sample sizes, ranging approximately between 115 and 4,400.

Originality/value

This study presents a comparative analysis of GoF tests applied to geotechnical data, determining the required sample size through power analysis, with a target statistical power of 0.8 at a chosen significance level of 0.05. These findings provide practical guidance for selecting appropriate normality tests and the required minimum sample size for geotechnical data with varying CoV.

In civil engineering practice, the uncertain conditions are not uncommon and lead to complexities in the design due to factors such as material heterogeneity, characterisation methods, construction practices and other environmental factors (Phoon and Kulhawy, 1999; Baecher and Christian, 2005; IRC SP-11, 2012; Zbiciak et al., 2025). Particularly in the geotechnical domain, the stochastic nature of geomaterial formation, in-situ stress history, long-term exposure to various climatic and environmental conditions, and weathering result in inherent heterogeneity and spatial variability (Phoon and Kulhawy, 1999; Fenton and Griffiths, 2008; Cherubini and Vessia, 2010; Wang et al., 2015; Liu et al., 2024). Furthermore, the data collection introduces additional challenges, including sampling bias, limited accessibility and variations in testing methods, which compound measurement errors and data inconsistencies (Akbas and Kulhawy, 2010; Zhao et al., 2013; Li et al., 2021; Kumar et al., 2023). Therefore, growing interest towards the integration of variability necessitates the application of statistical methods in geotechnical design to accurately represent soil behaviour, quality control levels and performance assessment (Fenton and Griffiths, 2008; Phoon and Ching, 2018; Harle and Wankhade, 2025).

Fitting an appropriate probability model (using inferential statistics) is one way to model the uncertainty in the parameter of interest. For instance, normal and lognormal distributions are commonly used in science and engineering practice due to their ability to represent a wide range of inherent variability, making them easy to implement for practical applications (Theocharis et al., 2024). In the context of geotechnical practice, the index and engineering properties like Atterberg limits, compaction characteristics, compressibility parameters, hydraulic conductivity, degree of saturation, shear strength parameter and so on, are often fitted to either normal or lognormal distribution (Elkateb et al., 2003; Baecher and Christian, 2005; Zhai and Benson, 2006; Wang et al., 2015; Galeandro et al., 2017; Toraldo et al., 2018; Nguyen et al., 2023; Theocharis et al., 2024; Ganesh and Kumar, 2025). Several reliability-based approaches assume that geotechnical parameters such as shear strength, consolidation characteristics and permeability follow a normal or lognormal distribution (Phoon et al., 1995; Nowak and Collins, 2000; Baecher and Christian, 2005; Fenton and Griffiths, 2008; Saseendran and Dodagoudar, 2020). However, an incorrect selection of the underlying probability model can distort reliability estimates, leading to either underestimation or overestimation of the probability of failure.

Furthermore, the commonly employed statistical techniques in material quality control include parametric tests such as one-sample t-tests, two-sample t-tests, analysis of variance (ANOVA) and the F-test of homogeneity of variance, regression and correlation, which are routinely applied in practice (Burati et al., 2003; IRC SP-11, 2012; Ganesh and Kumar, 2024). A fundamental assumption underlying these parametric tests is that the sampled data originates from a normally (Gaussian) distributed population (Montgomery and Runger, 2010; Ott and Longnecker, 2015). However, recent observations by Phoon et al. (2023) highlight growing concerns among field practitioners, who insist that many statistical models assume that geotechnical data are homogenous, independent, sufficient and normally distributed, which makes analysis unrealistic. This gap between the statistical assumptions and field behaviour underscores the importance of conducting appropriate goodness-of-fit (GoF) tests.

Normality assessment methods can be broadly categorised into graphical approaches, numerical measures and formal statistical tests (D’Agostino and Stephens, 1986; Thode, 2002; Henderson, 2006; Razali and Wah, 2010). In recent years, researchers have also explored the use of machine learning algorithms as complementary tools in assessing the distributional characteristics of the data. Studies by Sigut et al. (2006), Lee et al. (2019) and Simić (2021) demonstrated that neural network and deep learning models can effectively identify non-normal patterns in complex data sets. However, there remains a lack of comprehensive studies that explicitly examine the normality characteristics of laboratory/field-derived geotechnical data. Given the proven reliability and widespread adoption of formal statistical tests, this study focuses on the application of the most frequently cited and commonly adopted GoF tests within the geotechnical domain, as discussed in detail in the following paragraph.

The formal statistical tests (also known as GoF tests) can be classified into different categories based on their underlying statistical principles:

  • empirical distribution test (Anderson–Darling (A-D) test, Kolmogorov–Smirnov (K-S) test, Lilliefors (LL) test, Cramér–von Mises (CVM) test, etc.);

  • regression and correlation-based tests (Shapiro–Wilk (S-W) test, Ryan–Joiner test, etc.);

  • chi-square (CSQ) test; and

  • moment-based tests (Jarque–Bera (J-B) test, DP omnibus test, etc.).

Among these tests, a few have been used most frequently to identify an appropriate probability model for laboratory and in-situ geotechnical data. For example, Benson (1993) used the K-S test to determine the best-fitting probability model for the data set comprising permeability of remoulded soil samples, with CoV values ranging between 27% and 767%. Previous studies by Haldar and Mahadevan (2000), Burati et al. (2003), Baecher and Christian (2005), Ang and Tang (2007), Fenton and Griffiths (2008), Galeandro et al. (2017) and Theocharis et al. (2024) identified the CSQ, K-S and A-D tests as the most frequently used methods for assessing the normality of geotechnical data sets. Similarly, Phoon and Kulhawy (1999), Baecher and Christian (2005), Akbas and Kulhawy (2010), Galeandro et al. (2017), Toraldo et al. (2018), Nguyen et al. (2023) and Theocharis et al. (2024) documented the CoV values for geotechnical properties ranging from 2% to 105%, except for permeability, which may exhibit high CoV values (200%–300%).

Furthermore, Uzielli et al. (2007) recommended the S-W test over the K-S test for geotechnical applications, citing Thode’s (2002) findings, which were not explicitly focused on geotechnical properties. More recently, Somani et al. (2019) employed the S-W test to evaluate the normality of soil-like fractions reclaimed from landfill mining. However, Ganesh and Kumar (2025) applied the K-S test to assess whether the index and engineering properties of processed geomaterials (pond ash-bentonite mixtures) followed a normal or lognormal distribution. Although various GoF tests, such as K-S, CSQ, S-W and A-D, are commonly applied, the efficacy of these tests for geotechnical data has not been addressed or is unknown.

From a general statistical perspective, Ramachandran and Tsokos (2009), Montgomery and Runger (2010) and Ott and Longnecker (2015) suggested that a sample size of 30 or more is typically sufficient to approximate the normality of the sample mean distribution, in accordance with the central limit theorem. However, they also reported that this threshold is not fixed because it is derived from extensive simulation studies and may vary depending on the degree of skewness of the data. In addition, the normality of the data should be examined carefully when approximating it with a sample size of ≤30. For instance, Tang et al. (2017) optimised the required minimum sample size to be in the range of 54–458 for identifying the most suitable probability model for geotechnical data using Akaike Information Criterion scores (one of the techniques to assess appropriate distribution fit). Their analysis considered data with CoV varying from 10% to 30%. However, in geotechnical engineering practice, most studies have addressed the constraints of obtaining a sample size of more than 30, which is not uncommon (Wang et al., 2015; Tang et al., 2017).

Earlier studies by Yazici and Yolacan (2007) and Yap and Sim (2011) optimised the sample size for assessing the performance of the selected GoF tests using power analysis. However, these investigations primarily relied on synthetic data and did not account for the actual variability observed in geotechnical properties. Therefore, the present study aims to evaluate the efficacy of commonly used GoF tests in assessing the normality of geotechnical data sets with diverse statistical characteristics (CoV = 6%, 7%, 12%, 15%, 17%, 19%, 29%, 37%, 94% and 196%). To ensure accurate statistical decisions, sample size optimisation is performed using power analysis, allowing for an effective comparison of the performance and power of each GoF test. Furthermore, to evaluate the test sensitivity under controlled deviations from normality, skewness is artificially induced into the original data using a nonlinear transformation. Based on the findings, a classification framework is proposed to recommend the minimum sample size requirement as a function of the CoV, thereby improving the practical applicability of normality testing in geotechnical engineering.

This study evaluates multiple data sets encompassing natural geomaterials, soil-like fractions from legacy waste deposits, and processed geomaterials to assess their normality, as summarised in Table 1. Natural geomaterial samples were collected from depths of 0.6–1.0 m at the highway construction sites located in the Karaikal and Nagapattinam regions of Puducherry and Tamil Nadu, India. Fifty field samples were collected from natural ground earth, which are classified as Silty Sand (SM) and Clayey Sand (SC) based on the Unified Soil Classification System (USCS), as reported in Table 2. The key geotechnical properties evaluated from these samples included the liquid limit (data set 1), plastic limit (data set 2), compaction characteristics (data sets 3 and 4) and California Bearing Ratio (CBR) (data set 5), following the guidelines of ASTM D1557 (2015), ASTM D698 (2021), ASTM D4318 (2017) and ASTM D1883 (1999).

Table 1

Summary of experimentally derived geotechnical properties evaluated for normality

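Column groups: Natural geomaterial (columns 2–6); Legacy waste (column 7); Pond ash-bentonite mixture (columns 8–11). A dash (-) marks an empty cell.

S. no. | Liquid limit (%) | Plastic limit (%) | Dry density (mg/m3) | Moisture content (%) | CBR (%) | % of fines (< 75 µm) | Dry density (mg/m3) | Moisture content (%) | UCS (kPa) | Permeability (m/s)
1 | 48 | 23 | 15.18 | 8.0 | 6.10 | 5 | 1.62 | 19.54 | 37.7 | 3.50E-07
2 | 39 | 16 | 16.06 | 9.0 | 20.0 | 6 | 1.65 | 21.62 | 29.7 | 3.70E-07
3 | 24 | 14 | 16.06 | 9.0 | 20.6 | 9 | 1.67 | 18.52 | 48.1 | 5.80E-07
4 | 25 | 16 | 16.06 | 10.0 | 8.10 | 9 | 1.67 | 19.56 | 95.2 | 4.10E-08
5 | 26 | 20 | 16.2 | 10.0 | 14.4 | 9 | 1.69 | 19.85 | 108.5 | 2.60E-08
6 | 34 | 19 | 16.2 | 10.0 | 10.9 | 9 | 1.71 | 19.47 | 123.3 | 4.00E-08
7 | 52 | 24 | 16.2 | 10.5 | 4.20 | 10 | 1.71 | 19.22 | 182.6 | 4.30E-09
8 | 42 | 21 | 16.35 | 11.0 | 6.00 | 11 | 1.72 | 16.89 | 165.1 | 1.20E-09
9 | 50 | 18 | 16.35 | 11.0 | 20.9 | 11 | 1.72 | 19.35 | 136.7 | 7.20E-09
10 | 43 | 16 | 16.57 | 11.0 | 14.3 | 12 | 1.73 | 19.87 | 73.5 | 2.60E-07
11 | 30 | 18 | 16.64 | 11.0 | 17.1 | 12 | 1.74 | 16.42 | 55.3 | 1.00E-07
12 | 34 | 18 | 16.64 | 11.0 | 13.6 | 13 | 1.75 | 19.58 | 49.9 | 8.70E-08
13 | 38 | 17 | 17.07 | 11.0 | 12.1 | 13 | 1.75 | 18.93 | 137.6 | 9.70E-09
14 | 34 | 18 | 17.07 | 11.0 | 12.9 | 13 | 1.77 | 17.75 | 145 | 1.90E-08
15 | 40 | 17 | 17.22 | 11.5 | 13.8 | 13 | 1.77 | 17.79 | 169.6 | 7.50E-09
16 | 39 | 20 | 17.22 | 11.5 | 4.10 | 14 | 1.77 | 18.51 | 228.2 | 2.50E-09
17 | 37 | 18 | 17.37 | 12.0 | 10.8 | 14 | 1.77 | 19.85 | 204.1 | 1.40E-09
18 | 42 | 17 | 17.51 | 12.0 | 13.0 | 14 | 1.79 | 18.06 | 263.7 | 4.20E-10
19 | 32 | 17 | 17.51 | 12.0 | 8.90 | 14 | 1.8 | 14.27 | 90.5 | 1.10E-07
20 | 34 | 17 | 17.51 | 12.0 | 13.9 | 14 | 1.82 | 17.99 | 77.5 | 5.40E-08
21 | 30 | 18 | 17.51 | 12.0 | 4.30 | 14 | 1.82 | 15.97 | 106.3 | 7.20E-08
22 | 25 | 17 | 17.51 | 12.0 | 13.8 | 14 | 1.82 | 14.88 | 235.1 | 4.40E-09
23 | 35 | 17 | 17.66 | 12.0 | 17.1 | 14 | 1.83 | 15.52 | 230.2 | 2.50E-09
24 | 43 | 16 | 17.66 | 12.5 | 13.3 | 15 | 1.84 | 14.21 | 284.2 | 7.60E-09
25 | 38 | 21 | 17.66 | 12.5 | 18.7 | 15 | 1.84 | 13.87 | 367.8 | 6.40E-10
26 | 31 | 19 | 17.8 | 12.5 | 19.5 | 15 | 1.9 | 13.87 | 376 | 3.40E-10
27 | 22 | 16 | 17.8 | 12.5 | 6.10 | 15 | 1.92 | 15.25 | 475.2 | 2.60E-10
28 | 30 | 15 | 17.8 | 12.5 | 20.3 | 15 | 1.94 | 12.61 | 168.8 | 8.40E-08
29 | 40 | 16 | 18.09 | 12.5 | 14.8 | 16 | 1.94 | 12.21 | 106.8 | 5.40E-08
30 | 28 | 15 | 18.09 | 12.5 | 18.3 | 16 | 1.94 | 12.66 | 139.6 | 2.00E-08
31 | 41 | 17 | 18.24 | 12.5 | 17.8 | 16 | 1.94 | 12.79 | 425.9 | 2.20E-09
32 | 41 | 12 | 18.24 | 13.0 | 13.4 | 16 | 1.95 | 14.34 | 385.4 | 4.80E-09
33 | 33 | 13 | 18.38 | 13.0 | 13.3 | 16 | 1.97 | 13.23 | 362.6 | 1.70E-09
34 | 30 | 13 | 18.38 | 13.0 | 14.1 | 16 | 1.97 | 12.70 | 881.6 | 2.30E-10
35 | 30 | 19 | 18.53 | 13.0 | 14.8 | 17 | 1.98 | 14.03 | 974.9 | 1.10E-10
36 | 37 | 15 | 18.53 | 13.0 | 33.8 | 18 | 1.99 | 13.68 | 786.7 | 3.00E-10
37 | 20 | 16 | 18.53 | 13.0 | - | 18 | - | - | - | -
38 | 22 | 18 | 18.53 | 13.0 | - | 20 | - | - | - | -
39 | 24 | 21 | 18.82 | 13.0 | - | 21 | - | - | - | -
40 | 32 | - | 18.82 | 13.0 | - | 21 | - | - | - | -
41 | 43 | - | 19.11 | 13.0 | - | 22 | - | - | - | -
42 | 45 | - | 19.11 | 13.0 | - | 22 | - | - | - | -
43 | 42 | - | 19.11 | 13.0 | - | 23 | - | - | - | -
44 | 41 | - | 19.26 | 13.0 | - | 24 | - | - | - | -
45 | - | - | 19.26 | 13.5 | - | - | - | - | - | -
46 | - | - | 19.26 | 13.5 | - | - | - | - | - | -
47 | - | - | 19.26 | 14.0 | - | - | - | - | - | -
48 | - | - | 20.13 | 14.5 | - | - | - | - | - | -
49 | - | - | 21.15 | 15.0 | - | - | - | - | - | -
50 | - | - | 21.88 | 15.5 | - | - | - | - | - | -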
Note(s):

CBR = California Bearing Ratio; UCS = unconfined compressive strength

Source(s): Table by authors
Table 2

Summary of descriptive statistics of the geotechnical data used for normality testing

Description | Properties | Soil type | Sample size (n) | Average (x̄) | SD | CoV (%) | Skewness (b1) | Kurtosis (b2)
Data set 1 | Liquid limit (%) | SM, SC | 44 | 35.1 | 7.83 | 22.3 | −0.010 | −0.613
Data set 2 | Plastic limit (%) | SM, SC | 39 | 17.4 | 2.56 | 14.7 | 0.388 | 0.596
Data set 3 | Dry density (mg/m3) | SM, SC | 50 | 1.821 | 0.14 | 7.4 | 0.605 | 0.896
Data set 4 | Moisture content (%) | SM, SC | 50 | 12.1 | 1.47 | 12.2 | −0.490 | 0.802
Data set 5 | California bearing ratio (%) | SM, SC | 35# | 13.3 | 4.91 | 37.0 | −0.378 | −0.613
Data set 6 | Fines (%) | SM* | 44 | 14.6 | 4.28 | 29.2 | 0.147 | 0.141
Data set 7 | Dry density (mg/m3) | PA-Bt mixtures | 36 | 1.811 | 0.11 | 5.9 | 0.192 | −1.123
Data set 8 | Moisture content (%) | PA-Bt mixtures | 36 | 16.5 | 2.78 | 16.8 | −0.034 | −1.454
Data set 9 | Unconfined compressive strength (kPa) | PA-Bt mixtures | 36 | 242.5 | 228.71 | 94.3 | 1.954 | 3.665
Data set 10 | Permeability (m/s) | PA-Bt mixtures | 36 | 6.4E-08 | 1.27E-07 | 196.8 | 2.814 | 8.098
Note(s):

*Soil-like fraction from legacy waste was classified as equivalent to SM; PA-Bt = pond ash-bentonite; # Sample size after deducting the outlier

Source(s): Table by authors

Soil-like fractions were collected from a municipal solid waste open dumpsite in Karaikal, Puducherry. Forty-four samples collected from depths ranging from 1.0 to 4.0 m were analysed for particle size distribution, focusing on the percentage of fines (silt and clay-sized fractions) that passed through the 75 µm sieve. Based on the USCS classification (ASTM D2487, 2017), the soil-like fraction was identified as equivalent to Silty Sand (SM). The fines content (%) was designated as data set 6.

The processed geomaterial was prepared by blending pond ash with bentonite at varying inclusion rates of 10%, 20% and 30%. It was then assessed for its suitability as a landfill liner and cover system application. This data set is reported as a laboratory-derived geotechnical data set. Pond ash was sourced from the ash discharge point at the Neyveli Thermal Power Station, Tamil Nadu, and the bentonite was obtained from a commercial supplier. A total of 36 placement conditions, defined by variations in dry density and moisture content, were designated as data sets 7 and 8, respectively. In addition, the unconfined compressive strength (ASTM D2166, 2016) and permeability (ASTM D5084, 2016) of the processed geomaterials were determined and categorised as Data sets 9 and 10. The geotechnical properties determined from laboratory testing were subsequently used for statistical normality analysis.

Among the GoF tests, this study uses the K-S test with Lilliefors’ correction, commonly referred to as the LL test, because the classical K-S test assumes that the mean and variance are known before the analysis. In contrast, these parameters are invariably estimated from sample data in geotechnical applications. Alongside the LL test, analyses include the A-D, CSQ and S-W tests, which are most commonly used for normality assessment in geotechnical data sets. The present study further incorporates J-B, DP and CVM tests, which involve relatively straightforward computational procedures. A detailed description of these normality tests, along with their respective rejection criteria, is summarised in Table 3. The LL test measures the maximum deviation between the empirical cumulative distribution function and the hypothesised cumulative distribution function, calculated based on the sample data. The test is known to be less sensitive in the tails. The A-D test adjusts for this limitation by assigning greater weight to tail discrepancies (Stephens, 1974). The CVM test offers a balanced approach, integrating squared differences across the entire data range to provide a meaningful assessment of the distribution.
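The LL statistic described above can be illustrated with a minimal Python sketch (the study itself used MATLAB's lillietest). The function name `lilliefors_statistic` is hypothetical, and the sample is simply the first ten liquid-limit values of Table 1, used for illustration only. The sketch computes the statistic alone; the decision still requires Lilliefors' critical values, which are not reproduced here.

```python
from statistics import NormalDist, mean, stdev

def lilliefors_statistic(data):
    """Largest gap between the empirical CDF and a normal CDF whose
    mean and standard deviation are estimated from the same sample."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))  # parameters estimated, not assumed known
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = fitted.cdf(xi)
        # The empirical CDF jumps from (i - 1)/n to i/n at xi: check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Illustrative input only: first ten liquid-limit values (%) from Table 1.
sample = [48, 39, 24, 25, 26, 34, 52, 42, 50, 43]
d_n = lilliefors_statistic(sample)
print(f"D_n = {d_n:.3f}")
```

Because the sample is standardised by its own mean and standard deviation, the statistic is unchanged by a linear rescaling of the data (for example, a change of units).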

Table 3

Overview of selected goodness-of-fit tests, associated test statistics and corresponding rejection criteria

S. no. | Test of normality | Test statistic | Criteria for rejection
1 | Lilliefors test (Lilliefors, 1967) | $D_n = \max \lvert F_x(x) - S_n(x) \rvert$, where $S_n$ = cumulative frequency function, $F_x$ = assumed theoretical CDF, $D_n$ = maximum difference and $D_n^{\alpha}$ = critical values at various $\alpha$ | If $D_n < D_n^{\alpha}$, the proposed theoretical distribution is acceptable at the specified significance level $\alpha$
2 | J-B test (Jarque and Bera, 1987) | $JB = n\left(\frac{b_1^2}{6} + \frac{(b_2 - 3)^2}{24}\right)$, where $b_1$ = skewness and $b_2$ = kurtosis | The test statistic is compared with 5.9915
3 | A-D test (Anderson and Darling, 1954) | $A^2 = -\sum_{i=1}^{n}\left[(2i-1)\{\ln F_x(x_i) + \ln[1 - F_x(x_{(n+1-i)})]\}/n\right] - n$; $A^* = A^2\left(1.0 + \frac{0.75}{n} + \frac{2.25}{n^2}\right)$; $c_\alpha = a_\alpha\left(1 + \frac{b_0}{n} + \frac{b_1}{n^2}\right)$, where $n$ = no. of samples and $a_\alpha$, $b_0$, $b_1$ are constants | The test statistic is compared with the related critical value
4 | DP test (D’Agostino and Pearson, 1973) | $DP = z^2(b_1) + z^2(b_2)$, where $b_1$ = skewness and $b_2$ = kurtosis | The p-value of the DP statistic is compared with the significance level ($\alpha$)
5 | CSQ test (Pearson, 1900) | $CSQ = \sum_{j=1}^{c}\frac{(O_j - E_j)^2}{E_j}$, where $O_j$ = observed values and $E_j$ = expected values | The CSQ value is compared with 5.991
6 | CVM test (Cramer, 1928; Smirnov, 1936; Stephens, 1974) | $CVM = \frac{1}{12n} + \sum_{i=1}^{n}\left(z_i - \frac{2i-1}{2n}\right)^2$, $Z_i = \frac{X_{(i)} - \bar{X}}{S}$, where $Z_i$ is the standardised observation and $\bar{X}$ is the sample mean | The calculated CVM statistic is compared with the critical value at a chosen significance level
7 | S-W test (Shapiro and Wilk, 1965) | $b = a_1(x_n - x_1) + a_2(x_{n-1} - x_2) + \dots$, $W = \frac{b^2}{(n-1)S^2}$, where $a_1, a_2, \dots$ are coefficients based on sample size and $S$ = standard deviation of the sample data | The calculated W is compared with the tabulated W
Source(s): Table by authors
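As an illustration of how the A-D entries in Table 3 translate into a calculation, the following Python sketch evaluates A² and the small-sample adjusted A* for the 39 plastic-limit values of Table 1. The function name `anderson_darling` is hypothetical, the normal CDF is fitted with the sample mean and standard deviation, and the critical value c_α is not computed because the constants a_α, b_0 and b_1 are not listed in the text.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling(data):
    """Return (A^2, A*) for a normal distribution fitted to the sample."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    total = 0.0
    for i in range(1, n + 1):
        f_low = fitted.cdf(x[i - 1])   # F_x(x_i)
        f_high = fitted.cdf(x[n - i])  # F_x(x_(n+1-i))
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    a2 = -total / n - n
    a_star = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)  # adjustment from Table 3
    return a2, a_star

# Plastic limit (%), data set 2 of Table 1 (n = 39).
plastic_limit = [23, 16, 14, 16, 20, 19, 24, 21, 18, 16, 18, 18, 17,
                 18, 17, 20, 18, 17, 17, 17, 18, 17, 17, 16, 21, 19,
                 16, 15, 16, 15, 17, 12, 13, 13, 19, 15, 16, 18, 21]
a2, a_star = anderson_darling(plastic_limit)
print(f"A^2 = {a2:.3f}, A* = {a_star:.3f}")
```

The (2i − 1) weights pair each small observation with its mirror-image large observation, which is how the statistic places more emphasis on the tails than the LL statistic does.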

Moment-based tests, including the J-B test (Jarque and Bera, 1987) and the DP omnibus test (D’Agostino and Pearson, 1973), confirmed the distributional characteristics based on measures of skewness and kurtosis. In addition, the CSQ test (Pearson, 1900), a classical data frequency-based method, was also employed. The S-W test (Shapiro and Wilk, 1965), which leverages order statistics and variance ratios, was included due to its strong performance in small sample analyses. This comparative evaluation aims to provide insights into the performance of these GoF tests under different probabilistic models, thereby facilitating the selection of an appropriate test for geotechnical engineering applications.
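A short Python sketch of the moment-based approach, using the J-B formula and the 5.9915 threshold from Table 3, is given below for the UCS values of data set 9. The helper names are hypothetical, and the moments are computed with the simple 1/n (population-style) definitions, which is an assumption; the skewness and kurtosis reported in Table 2 may use a bias-adjusted definition and need not match these values exactly.

```python
def sample_moments(data):
    """Return skewness b1 and kurtosis b2 from 1/n central moments."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((v - m) ** 2 for v in data) / n
    m3 = sum((v - m) ** 3 for v in data) / n
    m4 = sum((v - m) ** 4 for v in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

def jarque_bera(data):
    """J-B statistic: n * (b1^2 / 6 + (b2 - 3)^2 / 24)."""
    b1, b2 = sample_moments(data)
    return len(data) * (b1 ** 2 / 6.0 + (b2 - 3.0) ** 2 / 24.0)

# UCS (kPa), data set 9 of Table 1 (n = 36).
ucs = [37.7, 29.7, 48.1, 95.2, 108.5, 123.3, 182.6, 165.1, 136.7,
       73.5, 55.3, 49.9, 137.6, 145, 169.6, 228.2, 204.1, 263.7,
       90.5, 77.5, 106.3, 235.1, 230.2, 284.2, 367.8, 376, 475.2,
       168.8, 106.8, 139.6, 425.9, 385.4, 362.6, 881.6, 974.9, 786.7]

b1, b2 = sample_moments(ucs)
jb = jarque_bera(ucs)
reject_normality = jb > 5.9915  # threshold quoted in Table 3
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}, JB = {jb:.2f}, reject H0: {reject_normality}")
```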

The power analysis was conducted through extensive simulations using MATLAB to evaluate the effectiveness of the selected GoF tests. The Lilliefors (lillietest function), A-D (adtest function), chi-square goodness-of-fit (chi2gof function) and Jarque–Bera (jbtest function) tests are built-in MATLAB functions and can be used directly for analysis. However, the Shapiro–Wilk (swtest function), Cramér–von Mises (cmtest function) and D’Agostino–Pearson (DagosPtest function) tests were sourced from the MATLAB File Exchange (BenSaïda, 2025a, 2025b; Trujillo-Ortiz, 2025).

The statistical power of a test is defined as the probability of correctly rejecting the null hypothesis (H0: The distribution is normal) when the alternative hypothesis (HA: The distribution is not normal) is true. Mathematically, it is expressed as Power = 1 − β, where β represents the probability of a Type II error (failing to reject H0 when HA is true). A high power value minimises the Type II error and ensures reliable statistical inference.

In this analysis, the Monte Carlo simulations were carried out to increase the sample size by generating the random realisations of synthetic data based on the moments and marginal distribution of the original sample data set. Key simulation parameters, including significance level, number of iterations, and sample sizes, were then defined. The simulations were conducted for multiple sample sizes, ranging from 5 to 10,000, increasing in increments of 5 (user-defined). A significance level (α) of 0.05 was selected for hypothesis testing across the simulations.

The power of the selected GoF tests was then calculated by following the protocol described in Steele (2003) and Arnastauskaite et al. (2021). Step 1: Collection of sample data – the analysis begins with an experimentally obtained data set consisting of observations x1, x2,…, xn. Step 2: Statistical parameter estimation – sample moments and the corresponding parameters of the fitted marginal distributions are computed from the collected data. Step 3: Monte Carlo data generation – using these estimated parameters, surrogate data sets are generated under the specified alternative distribution through Monte Carlo simulation. Step 4: Applying the hypothesis test – the test statistic and its p-value are computed for the generated sample. If the p-value exceeds the chosen significance level (α = 0.05), the null hypothesis (H0: the distribution is normal) is not rejected; otherwise, it is rejected. Step 5: Simulation and repetition – steps 2–4 are repeated for k iterations (in this case, k = 10,000) to ensure a robust evaluation of the test performance. Step 6: Calculating the power of the test – the power (1 − β) is determined as count/k, where count represents the number of times the null hypothesis (H0) is correctly rejected under the assumed alternative distribution across the k iterations.
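The protocol above can be sketched in Python as follows. This is a simplified illustration rather than the MATLAB workflow used in the study: it assumes a lognormal alternative matched to the mean and CoV of data set 9, uses only the J-B test with the 5.9915 threshold from Table 3, runs k = 500 iterations instead of 10,000 to keep the run short, and caps the search at n = 200. All function names are hypothetical.

```python
import math
import random

JB_CRITICAL = 5.9915  # threshold quoted for the J-B test in Table 3

def jarque_bera(data):
    """J-B statistic from 1/n central moments (an assumed definition)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((v - m) ** 2 for v in data) / n
    m3 = sum((v - m) ** 3 for v in data) / n
    m4 = sum((v - m) ** 4 for v in data) / n
    b1, b2 = m3 / m2 ** 1.5, m4 / m2 ** 2
    return n * (b1 ** 2 / 6.0 + (b2 - 3.0) ** 2 / 24.0)

def lognormal_params(mean, cov):
    """Step 2: convert a mean and CoV into the lognormal mu and sigma."""
    sigma2 = math.log(1.0 + cov ** 2)
    return math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2)

def estimate_power(n, mean, cov, k=500, seed=0):
    """Steps 3 to 6: fraction of k simulated samples for which H0 is rejected."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean, cov)
    count = 0
    for _ in range(k):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        if jarque_bera(sample) > JB_CRITICAL:
            count += 1
    return count / k

def minimum_sample_size(mean, cov, target=0.8, step=5, max_n=200, k=500):
    """Smallest n on a grid of `step` whose estimated power reaches `target`."""
    for n in range(step, max_n + 1, step):
        if estimate_power(n, mean, cov, k=k, seed=n) >= target:
            return n
    return None  # target not reached within max_n

# Mean and CoV of data set 9 (Table 2); lognormal alternative is an assumption.
n_min = minimum_sample_size(mean=242.5, cov=0.943)
print("smallest n with estimated power >= 0.8:", n_min)
```

With so few iterations the estimate is noisy, so the value printed here should not be compared with the sample sizes reported in the study, which are based on 10,000 iterations and on each of the seven tests.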

For each realisation, seven different normality tests were applied: the S-W test, LL test, A-D test, CVM test, CSQ test, J-B test and DP omnibus test. After computing the power values, the performance of each test was evaluated using its behaviour at the maximum sample size to establish its comparative effectiveness. A higher power indicates a stronger ability to identify deviations from normality, making the corresponding test more reliable for assessing data normality.

The normality assessment for the geotechnical properties, summarised in Table 1, was conducted with a focus on their application in highway subgrade construction and landfill liner and cover materials. Specifically, data sets 1 to 5 were evaluated for their suitability as highway subgrade material with a target CBR value of at least 10%. Meanwhile, data set 6 was assessed to evaluate the suitability of soil-like fractions as backfill material, considering the percentage of fines. Data sets 7 to 10 were analysed for their suitability as liner and cover material, targeting an unconfined compressive strength of ≥ 200 kPa and permeability (k) of ≤ 10⁻⁷ cm/s to ensure effective containment performance. Finally, conducting statistical parametric tests on these physical and mechanical properties provides valuable insight into the feasibility of utilising these materials in the subgrade and landfill containment systems while addressing inherent variability and uncertainty.

Table 2 provides a comprehensive summary of the descriptive statistics of geotechnical data analysed for normality assessment. To proceed further, each data set was initially organised, and outliers were identified and removed by performing a statistical hypothesis test proposed by Grubbs (1969) to ensure data consistency and unbiasedness. Based on the result of Grubbs’ two-tailed test (significance level, α = 0.01), it was observed that only data set 5 (CBR) contained an outlier (CBR = 33.8%), which was identified and excluded from the data set. The excluded data point exhibited statistically significant deviations from the sample mean and was removed before performing the normality and power analyses. The variability in the data, assessed using the coefficient of variation (CoV), ranged from 6% to 196%. According to Harr’s (1987) classification, the variability is categorised based on the sample CoV. Specifically, data sets 3, 4 and 7 exhibit low variability (CoV < 10%), while data sets 1, 2, 6 and 8 display moderate variability (15% < CoV < 30%). In contrast, CBR (data set 5), UCS (data set 9) and permeability (data set 10) demonstrate high variability (CoV > 30%). Higher CoV is common for the geotechnical domain (Benson, 1993; Baecher and Christian, 2005).
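The outlier screening and CoV calculation described above can be reproduced for the CBR data with a short Python sketch. The function names are hypothetical. Only the Grubbs statistic G (the largest absolute deviation from the mean divided by the sample standard deviation) is computed; comparing G with its critical value at α = 0.01 requires a Student t quantile, which is outside this standard-library sketch.

```python
from statistics import mean, stdev

# CBR (%), Table 1, before outlier removal (n = 36).
cbr = [6.10, 20.0, 20.6, 8.10, 14.4, 10.9, 4.20, 6.00, 20.9, 14.3,
       17.1, 13.6, 12.1, 12.9, 13.8, 4.10, 10.8, 13.0, 8.90, 13.9,
       4.30, 13.8, 17.1, 13.3, 18.7, 19.5, 6.10, 20.3, 14.8, 18.3,
       17.8, 13.4, 13.3, 14.1, 14.8, 33.8]

def grubbs_statistic(data):
    """Return the most extreme observation and its Grubbs statistic G."""
    m, s = mean(data), stdev(data)
    suspect = max(data, key=lambda v: abs(v - m))
    return suspect, abs(suspect - m) / s

def cov_percent(data):
    """Coefficient of variation in percent (sample SD / mean)."""
    return 100.0 * stdev(data) / mean(data)

suspect, g = grubbs_statistic(cbr)
cleaned = [v for v in cbr if v != suspect]

print(f"suspect value = {suspect}, G = {g:.2f}")
print(f"CoV before removal = {cov_percent(cbr):.1f}%")
print(f"CoV after removal  = {cov_percent(cleaned):.1f}% (n = {len(cleaned)})")
```

The suspect value identified is CBR = 33.8%, and removing it brings the mean and CoV of the remaining 35 values in line with the entries for data set 5 in Table 2.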

In addition, Table 2 reports the sample averages, standard deviations, skewness and kurtosis values, which provide qualitative insights into the data distribution characteristics. The skewness coefficient indicates asymmetry in the data distribution: data sets 1, 4, 5 and 8 exhibit negative skewness, indicating a left-skewed distribution, while data sets 2, 3, 6, 7, 9 and 10 display positive skewness, reflecting a right-skewed distribution. Furthermore, the kurtosis analysis, which measures the sharpness of the distribution, reveals that data sets 1 to 8 exhibit a flat, short-tailed distribution (platykurtic, kurtosis < 3). In contrast, data sets 9 and 10 are characterised by sharp peaks and long tails (leptokurtic, kurtosis > 3). Ideally, a data set with skewness near zero and a kurtosis value of approximately three indicates a perfectly symmetrical normal distribution (Newell and Hancock, 1984; Thode, 2002). Although the skewness and kurtosis (third and fourth central moments, respectively) serve as functional parameters for assessing normality, relying solely on these metrics may lead to misleading conclusions. Hence, to ensure robustness in the statistical decisions on the normality assessment, the theory-driven normality tests (GoF) discussed in Section 2.2 shall be applied.

Table 4 presents a summary of the hypothesis decisions by various GoF tests in assessing the normality of the original data sets, based on the analytical equations provided in Table 3. The assessment of normality was conducted by comparing the test statistics against the corresponding critical values. Accordingly, the LL and S-W tests rejected the null hypothesis of normality for data sets 4, 5, 9 and 10. In contrast, the moment-based J-B test accepted normality only for data sets 1 to 8. In addition, the CSQ test rejected the null hypothesis for data sets 2, 4, 5, 7, 8 and 10. However, the CVM test failed to reject the null hypothesis for data sets 1 to 9, indicating variations in sensitivity across different normality tests. A consistent finding across all the GoF tests was the rejection of normality for data sets 9 and 10.

Table 4

Summary of hypothesis inference on the normality tests conducted on original and skewed data sets using selected GoF tests

Data set | LL | A-D | CSQ | J-B | DP | CVM | S-W
Original data set
1
2
3
4
5
6
7
8
9
10
Skewed data set (k = 2)
1
2
3
4
5
6
7
8
9
10
Skewed data set (k = 3)
1
2
3
4
5
6
7
8
9
10
Note(s):

✓ Fails to reject H0 (Normality assumption accepted);

✗ Rejecting H0 (Normality assumption rejected)

Source(s): Table by authors

Notably, these data sets (9 and 10) exhibit high CoV of 94% and 196%, respectively, and are characterised by long-tailed distributions. Since the lognormal distribution is more commonly used in geotechnical engineering, Monte Carlo simulation studies were conducted to further investigate the boundary conditions under which data conform to a lognormal distribution. To achieve this, a random variable was generated synthetically while systematically varying CoV and kurtosis, keeping skewness constant (zero). The generated data sets were then subjected to GoF tests to determine whether they adhered to a normal (Gaussian) or lognormal distribution. Figure 1 visually represents the progressive shift in data distribution from normal to lognormal as variability increases. The analysis revealed a critical threshold: when a data set exhibits a CoV greater than 30% with a kurtosis value exceeding 3, the lognormal distribution provides a superior fit. These findings reinforce the importance of considering CoV and kurtosis when selecting an appropriate probability distribution for geotechnical data.

Figure 1
[Scatter plot of the coefficient of kurtosis (vertical axis, 0 to 10) against the coefficient of variation (horizontal axis, 0% to 70%). Red dashed boxes mark the region where the data fit a lognormal distribution and the region where the data fit a normal distribution.]

Effect of the coefficient of variation and kurtosis value on fitting the data to the lognormal distribution

Source: Figure by authors


Since the data sets analysed varied in sample size, and different GoF tests rely on different statistical measures, drawing a definitive conclusion on the superior performance of any particular GoF test is challenging. Generally, the assumption of normality becomes less critical for sample sizes n ≥ 30 (Uttley, 2019), as supported by the central limit theorem. Furthermore, there are reported cases where small-sample data sets that are genuinely drawn from a normally distributed population may still pass normality tests. However, this outcome is not necessarily meaningful, as it often results from the test’s lack of statistical power (Öztuna et al., 2006; Ghasemi and Zahediasl, 2012). For small sample sizes, normality tests often lack the sensitivity needed to detect deviations from normality, increasing the likelihood of Type II errors (false acceptance of normality). Consequently, without a power analysis, the appropriate sample size remains uncertain, which may lead to biased conclusions regarding normality.

Before conducting the power analysis through Monte Carlo simulation, it is necessary to define the alternative distribution corresponding to the alternative hypothesis. Accordingly, each data set was evaluated against several theoretical distributions, namely Normal (N), Lognormal (LN), Weibull (WB), Gamma (GA), Generalised Extreme Value (GEV), Gumbel (GB) and Exponential (Exp), as presented in Figure 2. The GoF for each distribution was quantified using the mean P-value derived from the seven selected GoF tests. The results revealed that data sets 1, 4 and 10 were best fitted by the WB distribution, data sets 3, 7 and 9 by the LN distribution, data sets 5, 6 and 8 by the GEV distribution, and data set 2 by the GA distribution. The GB and Exp distributions were found to be the least suitable for the selected data sets. Notably, the analysis indicated that data sets with higher CoV were well fitted by the LN, WB, GEV and GA distributions. Consequently, these four models were adopted as surrogate models for the subsequent power analysis to evaluate the performance of the GoF tests and determine the required sample size.
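The screening of candidate distributions described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the study averages the P-values of seven GoF tests, whereas this sketch swaps in a single Kolmogorov-Smirnov P-value per candidate as a stand-in, and P-values computed with fitted parameters are known to be optimistic. The function name `rank_candidates`, the choice to fix the location at zero for the positive-support families, and the synthetic sample are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Candidate models named in the text; the SciPy mapping is an assumption.
CANDIDATES = {
    "N": stats.norm,
    "LN": stats.lognorm,
    "WB": stats.weibull_min,
    "GA": stats.gamma,
    "GEV": stats.genextreme,
    "GB": stats.gumbel_r,
    "Exp": stats.expon,
}
POSITIVE_SUPPORT = ("LN", "WB", "GA", "Exp")


def rank_candidates(x):
    """Fit each candidate by maximum likelihood; return KS P-values, best first."""
    scores = {}
    for name, dist in CANDIDATES.items():
        # Positive-support families are fitted with the location fixed at zero.
        if name in POSITIVE_SUPPORT:
            params = dist.fit(x, floc=0)
        else:
            params = dist.fit(x)
        scores[name] = stats.kstest(x, dist.cdf, args=params).pvalue
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical positive, right-skewed sample (not one of the study's data sets).
    sample = rng.lognormal(mean=2.0, sigma=0.4, size=200)
    for name, p in rank_candidates(sample).items():
        print(f"{name:>3}: P = {p:.3f}")
```

Replacing the single KS P-value with the mean over several tests would bring the sketch closer to the ranking shown in Figure 2.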

Figure 2
A bar graph displays the mean p-value of various best fit distributions across ten datasets, with each distribution represented by different colored bars. The image illustrates a bar graph depicting the mean p value for different best fit distributions across ten datasets. The vertical axis is labelled MEAN P VALUE, ranging from zero to one, while the horizontal axis is marked with Dataset 1 through Dataset 10. Each dataset displays multiple coloured bars, representing different types of distributions, Normal, grey, Lognormal, green, Weibull, blue, Gamma, red, Generalized Extreme Value, gold, Gumbel, orange, and Exponential, dark grey. Each distribution type has an associated legend in the top right corner for identification. The bars are arranged to highlight the comparisons across the datasets without presenting any trends, making it clear which dataset corresponds to each distribution.

Best-fitting probability model for the original data sets based on the mean P-value estimated from the seven GoF tests

Source: Figure by authors

The power analysis revealed key insights into the performance of normality tests applied to geotechnical data. Figure 3(a)–(d) illustrates the variation in power with increasing sample size for data set 5 (CBR), evaluated under the alternative distributions LN, WB, GA and GEV. The results indicated that the power consistently increases with sample size and asymptotically approaches 1.0 as the sample size becomes sufficiently large. Moreover, the sample size required to achieve a power of 1.0 varied considerably across the alternative distributions, ranging from 90 to over 1,000. For instance, at a sample size of 100, using the S-W test, the power values for LN, GA and GEV approached 1.0, whereas the power for WB was considerably lower, at around 0.1. This observation highlights the strong dependence of power, and of the corresponding sample size requirements, on the shape and location parameters of the underlying alternative marginal distributions.
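A Monte Carlo power estimate of the kind described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the alternative is a lognormal distribution parameterised by an arbitrary mean of 10 and a chosen CoV, only the S-W test is evaluated, and the names `lognormal_sample`, `sw_power` and `min_sample_size`, the number of replications and the seed are all hypothetical.

```python
import numpy as np
from scipy import stats


def lognormal_sample(rng, n, mean, cov):
    """Draw n lognormal values with the requested arithmetic mean and CoV."""
    sigma2 = np.log(1.0 + cov**2)       # variance of ln(X)
    mu = np.log(mean) - 0.5 * sigma2    # mean of ln(X)
    return rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=n)


def sw_power(n, cov, n_sim=500, alpha=0.05, seed=0):
    """Share of simulated non-normal samples for which S-W rejects normality."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        x = lognormal_sample(rng, n, mean=10.0, cov=cov)
        _, p_value = stats.shapiro(x)
        if p_value < alpha:
            rejections += 1
    return rejections / n_sim


def min_sample_size(cov, sizes, target=0.8):
    """Smallest n in `sizes` whose estimated power reaches `target`, else None."""
    for n in sizes:
        if sw_power(n, cov) >= target:
            return n
    return None


if __name__ == "__main__":
    for n in (10, 25, 50, 100, 200):
        print(n, sw_power(n, cov=0.5))
```

Repeating the loop for other tests and other surrogate distributions would give power curves of the kind plotted in Figure 3; the replication count controls the Monte Carlo noise in each point.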

Figure 3
This image displays multiple plots showing the relationship between power and sample size across different statistical methods, organised in a grid format. The image consists of a grid of eight plots showing the power, one minus beta, against sample size for various statistical methods. The plots are categorised into four rows labelled a to d, with two subplots per category. The categories include L N, W B, G A, and G E V. Each plot features multiple lines representing different methods, L L, A D, C S Q, J B, D P, C V M, and S W, indicated by distinct markers and colours. The horizontal axis represents sample size, ranging from one to ten thousand, with logarithmic scaling. The vertical axis represents power with values ranging from zero to one. The upper left corner contains legend keys identifying the methods. Graph grid lines enhance readability. Each plot's data points resemble cumulative distribution functions, reflecting the performance of statistical tests as influenced by sample size.

Power curves for the original data set (CBR – data set 5) and the skewness-induced data sets. Subplots (a)–(d) correspond to the original data, while (a1)–(d1) and (a2)–(d2) represent the data sets transformed with exponents k =2 and k =3, respectively

Source: Figure by authors

The type of GoF test used also had a marked influence on the power values obtained across the different sample sizes. As observed from Figure 3, the S-W, A-D, LL and J-B tests exhibited relatively higher power values, whereas the CSQ and CVM tests demonstrated comparatively lower power. To determine whether the observed differences in power among the selected normality tests were statistically significant under the various distributional conditions (lognormal, gamma, Weibull and GEV), a non-parametric Friedman test was performed. Following Corder and Foreman (2014), the null hypothesis (H0) states that all the GoF tests yield equivalent power values, while the alternative hypothesis (HA) states that at least one test performs significantly differently. Table 5 summarises the computed power values for data set 5 across the various GoF tests and sample sizes. Three representative sample sizes were included in the comparative analysis, and the test statistic for each case was calculated using equation (1):

$$F_r = \frac{12}{l\,m\,(m+1)} \sum_{i=1}^{m} R_i^{2} - 3\,l\,(m+1) \tag{1}$$
Table 5

Friedman test results comparing mean power values of the GoF tests across different parametric surrogate distributions

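PDF | Effect size | LL | J-B | A-D | S-W | DP | CVM | CSQ | Fr (test statistic) | Fc (critical)
Sample size (n) = 25
LN | 0.22 | 0.382 | 0.497 | 0.521 | 0.597 | 0.412 | 0.010 | 0.046 | 22.6 | 11.6
WB | 0.15 | 0.044 | 0.029 | 0.046 | 0.051 | 0.018 | 0.000 | 0.001 | |
GA | 0.18 | 0.167 | 0.229 | 0.217 | 0.273 | 0.176 | 0.000 | 0.007 | |
GEV | 0.16 | 0.164 | 0.216 | 0.234 | 0.304 | 0.156 | 0.000 | 0.007 | |
Sample size (n) = 50
LN | 0.22 | 0.3820 | 0.4968 | 0.5210 | 0.5966 | 0.4116 | 0.0102 | 0.0456 | 22.4 | 11.6
WB | 0.15 | 0.0440 | 0.0294 | 0.0464 | 0.0506 | 0.0184 | 0.0000 | 0.0014 | |
GA | 0.18 | 0.1672 | 0.2286 | 0.2170 | 0.2728 | 0.1764 | 0.0004 | 0.0070 | |
GEV | 0.16 | 0.1638 | 0.2158 | 0.2340 | 0.3036 | 0.1560 | 0.0000 | 0.0072 | |
Sample size (n) = 100
LN | 0.22 | 0.3820 | 0.4968 | 0.5210 | 0.5966 | 0.4116 | 0.0102 | 0.0456 | 22.7 | 11.6
WB | 0.15 | 0.0440 | 0.0294 | 0.0464 | 0.0506 | 0.0184 | 0.0000 | 0.0014 | |
GA | 0.18 | 0.1672 | 0.2286 | 0.2170 | 0.2728 | 0.1764 | 0.0004 | 0.0070 | |
GEV | 0.16 | 0.1638 | 0.2158 | 0.2340 | 0.3036 | 0.1560 | 0.0000 | 0.0072 | |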
Source(s): Table by authors

Here, m denotes the number of GoF tests considered, l represents the number of distributional conditions (blocks), and Ri corresponds to the sum of ranks assigned to each GoF test. Based on the ranking procedure across the distributional block, the calculated Friedman test statistic values (Fr) were 22.6, 22.4 and 22.7, respectively. The critical value for the present case was obtained as 11.6 (where α = 0.05, l = 4, m =7) from the updated and extended critical values proposed by López-Vázquez and Hochsztain (2019). Since the computed test statistic value exceeds the critical value, it can be inferred that a statistically significant difference exists among the GoF tests with respect to their power values. Given that the significance level (α) was maintained as 0.05 throughout the power analysis, the changes in the power values must come from the sample size (shown in Figure 3) and effect size. The effect size measures how much the observed data deviates from a perfect normal distribution (Cohen, 1992); a detailed discussion of effect size is presented in the subsequent section.
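As a check on the ranking procedure, the Friedman statistic can be recomputed from the n = 25 power values of Table 5. The sketch below assumes the standard rank-sum form of the statistic, Fr = 12 / (l m (m + 1)) × ΣRi² − 3 l (m + 1), with the symbols m, l and Ri as defined above; the function name `friedman_statistic` is hypothetical and the power values are transcribed from the table.

```python
import numpy as np
from scipy.stats import rankdata

tests = ["LL", "J-B", "A-D", "S-W", "DP", "CVM", "CSQ"]

# Power values for n = 25 (Table 5); rows are the blocks LN, WB, GA, GEV.
power = np.array([
    [0.382, 0.497, 0.521, 0.597, 0.412, 0.010, 0.046],
    [0.044, 0.029, 0.046, 0.051, 0.018, 0.000, 0.001],
    [0.167, 0.229, 0.217, 0.273, 0.176, 0.000, 0.007],
    [0.164, 0.216, 0.234, 0.304, 0.156, 0.000, 0.007],
])


def friedman_statistic(table):
    """Return (Fr, rank sums) for a blocks-by-treatments table."""
    l, m = table.shape
    # Rank the tests within each block (1 = lowest power).
    ranks = np.apply_along_axis(rankdata, 1, table)
    rank_sums = ranks.sum(axis=0)
    fr = 12.0 / (l * m * (m + 1)) * np.sum(rank_sums**2) - 3.0 * l * (m + 1)
    return fr, rank_sums


fr, rank_sums = friedman_statistic(power)
print(dict(zip(tests, rank_sums.tolist())))
print(round(fr, 1))  # 22.6, matching the n = 25 value reported in Table 5
```

The S-W test receives the highest rank in every block (rank sum 28) and CVM the lowest (rank sum 4), which is what drives the statistic well above the critical value of 11.6.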

The sample size required to reach a power value of 1.0 was influenced not only by the specific GoF test but also by the statistical characteristics of the data set, particularly the CoV (effect size). Hence, to further examine the sensitivity of the test methods in detecting deviations from normality, all the selected GoF tests were applied to the original data sets with induced skewness, thereby varying the effect size. For this purpose, the original data were systematically transformed to introduce a controlled level of skewness and mean shift, thereby enabling detailed evaluation of each test's decision-making capability. Skewness was induced using a nonlinear power transformation of the form Y = X^k, where Y is the transformed random variable, X is the original random variable, and k is the exponent. In this study, values of k = 2 and k = 3 were adopted to capture the sensitivity of the different GoF tests under varying effect sizes. Figure 4 illustrates the standard normal probability plots of the original data set and the data sets with induced skewness.

Figure 4
Three graphs showing probability distributions for original and skewed data with varying z values, labelled a, b, and c, illustrating changes in distribution shape. The image consists of three graphs, labelled a, b, and c, depicting probability distributions. Graph a presents the original data with a bell shaped curve plotted against z values ranging from negative three point five to positive three point five, with probability on the vertical axis. Graph b illustrates skewed data with a parameter k equal to two, maintaining the z value range and showing a shift in the distribution shape. Graph c further represents skewed data with a parameter k equal to three, again within the same z value range, demonstrating an increased skew to the right. Each graph includes vertical bars representing distribution frequency, with arrows indicating the transformation from original to skewed data. The graphs utilise a consistent scale on both axes for easy comparison.

Standard normal probability plots for: (a) original data set 2 (plastic limit; CoV =15%), (b) skewed data set (k =2) and (c) skewed data set (k =3)

Source: Figure by authors

By systematically introducing controlled skewness into the data sets, the CoV (%) approximately doubled for k = 2 and tripled for k = 3 compared with the original CoV values. For k = 2, the descriptive statistics of the original data sets were transformed to: CoV (%) = {43, 30, 15, 23, 88, 56, 12, 33, 198, 315}; Skewness = {0.45, 0.88, 0.87, −0.03, 2.85, 0.94, 0.27, 0.12, 2.98, 4.21}; and Kurtosis = {−0.19, 1.23, 1.56, 0.67, 12.27, 0.57, −1.14, −1.37, 8.49, 19.41}. Likewise, for k = 3, the transformed statistics were: CoV (%) = {62, 46, 23, 34, 153, 84, 18, 47, 276, 392}; Skewness = {0.92, 1.34, 1.16, 0.42, 4.51, 1.49, 0.34, 0.28, 3.44, 5.13}; and Kurtosis = {0.79, 2.33, 2.43, 1.06, 23.83, 1.67, −1.14, −1.18, 11.52, 28.09}.
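The effect of the power transformation Y = X^k on CoV and skewness can be reproduced with a short sketch. The sample below is synthetic (a hypothetical positive variable with a CoV of about 15%, chosen only to resemble a low-variability soil property) and is not one of the study's data sets; the helper names `cov_percent` and `summarise` are likewise assumptions. For small CoV, a first-order (delta-method) argument gives CoV(X^k) ≈ k · CoV(X), which is consistent with the roughly two-fold and three-fold increases reported above.

```python
import numpy as np
from scipy import stats


def cov_percent(x):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * np.std(x, ddof=1) / np.mean(x)


def summarise(x, exponents=(1, 2, 3)):
    """Return (k, CoV %, skewness, excess kurtosis) for Y = X**k."""
    rows = []
    for k in exponents:
        y = x**k
        rows.append((k, cov_percent(y), stats.skew(y), stats.kurtosis(y)))
    return rows


rng = np.random.default_rng(42)
# Hypothetical stand-in sample: mean 20, standard deviation 3 (CoV about 15 %).
x = rng.normal(loc=20.0, scale=3.0, size=2000)
x = x[x > 0]  # the power transformation is applied to positive values only

rows = summarise(x)
for k, cov, skew, kurt in rows:
    print(f"k={k}: CoV={cov:5.1f}%  skewness={skew:5.2f}  kurtosis={kurt:5.2f}")
```

For strongly variable data the first-order approximation breaks down, which may explain why the CoV of the high-variability data sets does not scale exactly with k.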

Consequently, the effect size was quantified using Cohen’s d, expressed as d = |μ − μ0|/s, where the numerator |μ − μ0| indicates the difference between the means under the null and alternative hypotheses, and s denotes the pooled standard deviation. Generally, a larger effect size (data exhibiting higher CoV) represents a more pronounced deviation from normality, making it easier to detect non-normality. Conversely, a smaller effect size indicates a subtle deviation, requiring a larger sample size to achieve the same statistical power. The computed values indicated an increase in effect size for data sets with the introduction of skewness (following power transformation). For instance, for data set 5 fitted with LN, the effect size of the original data set was 0.223, which increased to 0.371 and 0.698 for transformation exponents k = 2 and k = 3, respectively, in the transformed data. Figure 5 presents the responses of various GoF tests at different levels of effect size (d) for both the original and transformed data sets. The effect size ranged from 0.08 for the original data sets to approximately 0.7 for the skewed data sets. Across all sample sizes (n = 25, 50, and 100), the S-W test demonstrated superior performance and higher sensitivity to the effect size variations, followed by the A-D and J-B tests. Moreover, the results revealed that increasing the sample size led to a reduction in the effect size, thereby validating the central limit theorem. Even at small effect sizes, a power value of 1.0 was achieved at larger sample sizes, confirming the dependency of test sensitivity on sample size.

Figure 5
Plot showing statistical power across different sample sizes (n) for various tests, including LN, WB, GA, and GEV. Each subplot illustrates power trends based on effect size. The image depicts four main sections of statistical power analysis in the context of sample sizes, arranged in a grid format. The first figure, a, represents the log normal distribution, L N, with sample sizes of 25, 50, and 100. Each subplot, such as a 1 and a 2, includes distinct curves for different statistical tests identified by symbols and colours, L L, A D, C S Q, J B, D P, C V M, and S W. The second section, b, illustrates the W B test under similar sample size conditions. Subsequent panels, c and d, showcase the G A and G E V tests respectively, with emphasis on the relationship between power, one minus beta, and effect size, d, across various tests. Each plot includes grid lines for clarity and distinct axes for measuring effect size and power, while sample size indicators guide interpretation. The layout allows for easy comparison between sample settings and their corresponding statistical power outcomes.

Sensitivity of the selected GoF tests illustrated through their power responses across varying sample sizes (n) and effect size conditions

Source: Figure by authors

Furthermore, the results summarised in Table 4 reveal a substantial shift in statistical outcome. For moderately skewed data (k = 2), the A-D, S-W and LL tests demonstrated higher sensitivity to the nonlinear transformation, with several data sets that initially satisfied the normality assumption being rejected once skewness was introduced. In contrast, the CVM test showed minimal sensitivity under the same conditions, rejecting normality only for data set 9. As expected, the moment-based tests also responded to the transformation, owing to their direct dependence on skewness and kurtosis. With a further increase in skewness (k = 3), nearly all the GoF tests became more sensitive in detecting deviations from normality, as shown in Table 4.

It can be seen from Figures 3 and 5 that, even with a sample size of 100, most of the selected GoF tests did not achieve a power value of 1.0 for most of the data sets. This suggests that evaluating the effectiveness of a GoF test solely on the basis of achieving a power value of 1.0 may require an impractically large sample size. Consequently, researchers have emphasised the importance of setting an optimal power threshold to balance statistical reliability and sample size limitations. Cohen (1992) reported that selecting a power value less than 0.8 introduces an unacceptably high risk of committing a Type II error, which can potentially lead to erroneous conclusions in statistical analysis. Conventionally, with a significance level (α) of 0.05, a power of 0.80 results in a Type II error probability (β) of 0.2, yielding a β:α ratio of 4:1 (Cohen, 1992). Moreover, Murphy and Myors (2004) noted that a power level of 0.8 implies that the probability of correctly detecting an effect is four times greater than that of failing to do so, whereas a power level of 0.9 increases this likelihood to nine times. Considering these statistical principles, the effectiveness of the GoF tests in this study was assessed based on a minimum power value of 0.8, ensuring a balance between minimising Type II errors and maintaining a feasible sample size.
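The ratios quoted above follow directly from the definition β = 1 − power. Written out as a short worked calculation (the values are those stated in the text; only the layout is added here):

```latex
\beta = 1 - 0.80 = 0.20, \qquad \frac{\beta}{\alpha} = \frac{0.20}{0.05} = 4

\frac{1-\beta}{\beta}\bigg|_{\text{power}=0.80} = \frac{0.80}{0.20} = 4, \qquad
\frac{1-\beta}{\beta}\bigg|_{\text{power}=0.90} = \frac{0.90}{0.10} = 9
```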

Based on the power value of 0.8 and the chosen significance level of 0.05, the minimum sample sizes required for normality assessment are summarised in Table 6. Furthermore, the sample size obtained from the power analysis indicates that the CSQ and CVM tests require substantially larger samples, typically between 115 and 4,400, which explains their inability to identify non-normality for data sets 1–9, given the original data set’s available sample size of 36–50. In contrast, the S-W and J-B tests exhibit higher efficiency, with minimum sample size requirements ranging from 8 to 65 for data sets exhibiting strong deviations (higher variability) from normality (e.g. data sets 9 and 10). However, for data sets 1–8, the required sample sizes fall within the range of 50–2,500, indicating that any normality decision drawn from these limited samples is statistically unreliable. In addition, for LL, A-D and DP tests, the required sample size approximately ranges from 12 to 5,000.

Table 6

Summary of minimum sample size requirements for various GoF tests when assessing normality under different surrogate alternative distributions

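Data set | LL | A-D | J-B | DP | S-W | LL | A-D | J-B | DP | S-W
Original data
 | LN (α = 0.05, Power = 0.8) | | | | | WB (α = 0.05, Power = 0.8) | | | |
1 | 262 | 144 | 130 | 245 | 120 | 1135 | 741 | 693 | 1175 | 625
2 | 642 | 414 | 328 | 557 | 309 | 476 | 311 | 274 | 501 | 245
3 | 2511 | 1667 | 1222 | 2071 | 1213 | 227 | 148 | 134 | 226 | 117
4 | 834 | 551 | 422 | 725 | 405 | 300 | 195 | 177 | 307 | 153
5 | 70 | 46 | 48 | 73 | 39 | 3696 | 1986 | 1214 | 1532 | 1019
6 | 133 | 90 | 85 | 134 | 72 | 5170 | 3495 | 2496 | 2352 | 2394
7 | 3963 | 2551 | 1853 | 3229 | 1909 | 180 | 117 | 107 | 174 | 94
8 | 472 | 309 | 257 | 416 | 237 | 476 | 312 | 274 | 501 | 245
9 | 25 | 19 | 23 | 29 | 17 | 49 | 65 | 41 | 65 | 28
10 | 12 | 11 | 11 | 14 | 10 | 13 | 12 | 13 | 15 | 11
 | GA (α = 0.05, Power = 0.8) | | | | | GEV (α = 0.05, Power = 0.8) | | | |
1 | 611 | 390 | 315 | 545 | 287 | 5045 | 3025 | 1745 | 1875 | 1485
2 | 767 | 956 | 727 | 1174 | 717 | 941 | 616 | 574 | 998 | 528
3 | 5045 | 3779 | 2732 | 4746 | 2685 | 5045 | 3779 | 2732 | 4746 | 2685
4 | 2050 | 1313 | 982 | 1679 | 955 | 2033 | 1145 | 794 | 1146 | 636
5 | 180 | 115 | 109 | 182 | 87 | 157 | 94 | 103 | 189 | 70
6 | 337 | 215 | 185 | 319 | 161 | 5432 | 3173 | 2416 | 2420 | 2284
7 | 5133 | 5151 | 4405 | 5199 | 4334 | 5089 | 3673 | 2183 | 2108 | 1927
8 | 1195 | 721 | 549 | 946 | 528 | 571 | 330 | 287 | 496 | 217
9 | 49 | 32 | 39 | 62 | 27 | 22 | 17 | 20 | 22 | 15
10 | 14 | 12 | 15 | 20 | 12 | 10 | 9 | 9 | 13 | 8
Transformed data (k = 2)
 | LN (α = 0.05, Power = 0.8) | | | | | WB (α = 0.05, Power = 0.8) | | | |
1 | 70 | 45 | 48 | 72 | 39 | 821 | 467 | 395 | 652 | 294
2 | 169 | 106 | 99 | 162 | 86 | 6123 | 3875 | 2269 | 2155 | 2025
3 | 649 | 417 | 335 | 562 | 311 | 582 | 388 | 348 | 618 | 312
4 | 218 | 146 | 126 | 205 | 109 | 1403 | 1017 | 971 | 1147 | 896
5 | 23 | 17 | 21 | 26 | 14 | 102 | 62 | 72 | 131 | 48
6 | 40 | 27 | 32 | 44 | 24 | 200 | 115 | 122 | 227 | 85
7 | 1012 | 646 | 493 | 846 | 486 | 304 | 194 | 176 | 317 | 155
8 | 121 | 83 | 79 | 123 | 63 | 6378 | 3873 | 2265 | 2154 | 2004
9 | 13 | 12 | 13 | 14 | 12 | 14 | 13 | 15 | 19 | 12
10 | 9 | 8 | 9 | 13 | 7 | 9 | 8 | 9 | 13 | 7
 | GA (α = 0.05, Power = 0.8) | | | | | GEV (α = 0.05, Power = 0.8) | | | |
1 | 163 | 105 | 100 | 171 | 80 | 376 | 239 | 210 | 378 | 184
2 | 380 | 283 | 203 | 357 | 178 | 212 | 135 | 126 | 207 | 107
3 | 1187 | 933 | 722 | 1164 | 683 | 198 | 134 | 118 | 197 | 104
4 | 540 | 340 | 273 | 471 | 251 | 3411 | 2987 | 1255 | 1156 | 1217
5 | 57 | 37 | 44 | 70 | 31 | 814 | 571 | 509 | 905 | 480
6 | 94 | 61 | 67 | 106 | 49 | 115 | 78 | 76 | 109 | 61
7 | 1660 | 1464 | 1490 | 1850 | 1044 | 1075 | 943 | 893 | 1160 | 811
8 | 292 | 187 | 159 | 277 | 140 | 5206 | 3630 | 2083 | 2095 | 1785
9 | 16 | 13 | 17 | 24 | 13 | 12 | 11 | 12 | 14 | 11
10 | 11 | 9 | 11 | 14 | 10 | 9 | 7 | 7 | 13 | 7
Transformed data (k = 3)
 | LN (α = 0.05, Power = 0.8) | | | | | WB (α = 0.05, Power = 0.8) | | | |
1 | 34 | 24 | 29 | 39 | 22 | 132 | 80 | 89 | 164 | 62
2 | 77 | 50 | 52 | 80 | 42 | 500 | 294 | 258 | 755 | 195
3 | 288 | 189 | 165 | 270 | 148 | 5206 | 3630 | 2083 | 2095 | 1785
4 | 101 | 59 | 59 | 90 | 50 | 4935 | 1107 | 1098 | 1166 | 989
5 | 14 | 13 | 14 | 15 | 12 | 35 | 32 | 32 | 48 | 21
6 | 22 | 15 | 20 | 24 | 14 | 55 | 35 | 44 | 73 | 30
7 | 460 | 286 | 237 | 393 | 218 | 616 | 407 | 366 | 647 | 332
8 | 57 | 40 | 43 | 61 | 34 | 498 | 290 | 256 | 445 | 194
9 | 11 | 10 | 11 | 13 | 10 | 12 | 11 | 12 | 14 | 10
10 | 8 | 7 | 7 | 11 | 6 | 8 | 7 | 7 | 10 | 6
 | GA (α = 0.05, Power = 0.8) | | | | | GEV (α = 0.05, Power = 0.8) | | | |
1 | 78 | 50 | 57 | 90 | 42 | 73 | 51 | 52 | 72 | 43
2 | 174 | 109 | 104 | 177 | 83 | 87 | 61 | 60 | 85 | 51
3 | 625 | 422 | 336 | 563 | 318 | 123 | 85 | 79 | 123 | 68
4 | 249 | 159 | 142 | 244 | 120 | 710 | 469 | 422 | 764 | 389
5 | 31 | 22 | 29 | 42 | 20 | 45 | 34 | 36 | 45 | 30
6 | 47 | 31 | 38 | 59 | 26 | 36 | 26 | 29 | 36 | 23
7 | 1085 | 665 | 527 | 877 | 485 | 399 | 262 | 232 | 408 | 200
8 | 137 | 84 | 86 | 146 | 68 | 339 | 233 | 197 | 351 | 171
9 | 13 | 11 | 13 | 15 | 11 | 10 | 8 | 9 | 13 | 8
10 | 10 | 7 | 8 | 12 | 8 | 8 | 6 | 6 | 11 | 6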
Source(s): Table by authors

The above findings are consistent with observations by Yap and Sim (2011), who reported that lognormal alternatives generally require smaller sample sizes to achieve a given power compared to Weibull alternatives. However, while Yap and Sim (2011) reported that achieving a power of 1.0 typically requires sample sizes in the range of 100–2,000 (for different alternative surrogate distributions), the corresponding values obtained in this study are comparatively higher, i.e. approximately 100–5,000. This observed difference is justifiable, as synthetic data sets exhibit a controlled level of variability, enabling deviations from normality to be detected with a minimum number of observations. However, field- or laboratory-derived geotechnical data sets inherently exhibit greater variability and uncertainty, thereby necessitating larger sample sizes to achieve equivalent power levels.

As outlined in the preceding sections, a consistent observation was that data sets with higher variability require a smaller sample size to detect deviations from normality. For instance, in the original data (k = 1), data sets 9 and 10 required only 8–28 samples, whereas data sets 3 and 7 required between 94 and 2,685 samples when assessed using the S-W test. Similarly, increasing k to 2 and 3 (inducing skewness) led to greater variability, which further reduced the sample size needed for normality assessment. Given the wide disparity in both CoV and sample sizes, a common logarithmic transformation was applied to both the independent (predictor) and dependent (response) variables. The resulting log-log relationship between the CoV and the required sample size is illustrated in Figure 6. As depicted in Figure 6, the sample size required for detecting non-normality decreased as the CoV increased. A strong negative linear correlation was observed for the LL, A-D, J-B, DP and S-W GoF tests, with a correlation coefficient of r = −0.93, indicating a strong linear dependency on CoV. Subsequently, a predictive model was developed to estimate the minimum sample size (n) as a function of CoV. Linear log-log models were fitted for the LL, A-D, J-B, DP and S-W tests, expressed as:

(2)
(3)
(4)
(5)
(6)
Figure 6
A multi-panel graph displays relationships between log N and log CoV for different categories. Each panel includes data points, trend lines, and confidence limits. The image features a multi panel graph presenting the relationship between log N and log C o V across five distinct panels labelled L L, A D, J B, D P, and S W. Each panel contains scatter plots of different data points represented by various symbols, along with fitted regression lines shown in blue. The dashed red lines indicate the upper and lower 95 percent confidence limits for each dataset. Each panel includes an equation indicating the linear relationship, along with an R 2 value to show the goodness of fit. The y axis represents log N and the x axis represents log C o V, with numerical labels spanning from zero to four on the y axis and from zero to three on the x axis. The data is arranged vertically across the five panels, making navigation straightforward from top to bottom.

Relation between CoV and required sample size fitting using a log-log linear regression model

Source: Figure by authors

where the regression coefficient (b1) varies from −1.362 to −1.536. Since this model was developed from the given sample, it is necessary to verify whether it accurately represents the population model (μy = β0 + β1X + ε) (Ott and Longnecker, 2015). For this purpose, a hypothesis test was conducted on the population regression coefficient (β1) using the test statistic ts = (b1 − 0)/SEb1. The hypotheses were formulated as: Null Hypothesis (H0): β1 = 0 (mean sample size is not linearly related to CoV); and Alternative Hypothesis (HA): β1 ≠ 0 (mean sample size is linearly related to CoV). For all the regression equations (2)–(6), the calculated ts lies between −25 and −28, with SEb1 almost equal to 0.06, and was compared against the critical t-value of −2. Since ts far exceeded the critical t-value in magnitude, the null hypothesis was rejected, indicating that β1 ≠ 0. This confirms that variation in the sample size can be reliably explained by CoV.
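The slope test can be illustrated on a small subset of the reported values. The sketch below regresses log10(n) on log10(CoV) using only the k = 2 CoV values listed earlier and the corresponding S-W sample sizes for the LN alternative as read from Table 6. Because the authors' equations pool more cases, the slope and ts obtained here are expected to differ from the reported ones; the subset and the variable names are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# CoV (%) of the ten data sets after the k = 2 transformation (from the text)
cov = np.array([43, 30, 15, 23, 88, 56, 12, 33, 198, 315], dtype=float)
# S-W minimum sample sizes, LN alternative, k = 2 (read from Table 6)
n_sw = np.array([39, 86, 311, 109, 14, 24, 486, 63, 12, 7], dtype=float)

# Ordinary least squares on the log-log scale
fit = stats.linregress(np.log10(cov), np.log10(n_sw))

# Test statistic for H0: beta1 = 0, and the two-sided critical value
t_s = (fit.slope - 0.0) / fit.stderr
t_crit = stats.t.ppf(0.975, df=len(cov) - 2)
reject_h0 = abs(t_s) > t_crit

print(f"b1 = {fit.slope:.3f}, SE = {fit.stderr:.3f}, r = {fit.rvalue:.3f}")
print(f"ts = {t_s:.2f}, critical |t| = {t_crit:.2f}, reject H0: {reject_h0}")
```

With only ten points the critical value is larger than the rounded value of 2 quoted in the text, but the conclusion is the same: the slope is negative and clearly different from zero.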

Moreover, the direct use of the log-log equations (2)–(6) is not convenient for the practical prediction of the sample sizes. Accordingly, the back-transformed predictive equations (7)–(11) are suggested for operational application, as they allow direct estimation of the minimum sample size (n) for reliable normality assessment when the CoV is provided in percentage terms:

(7)
(8)
(9)
(10)
(11)

Since the S-W test exhibited higher power at lower sample sizes and performed well with all types of data sets, it is given primary consideration in the context of sample size optimisation with CoV. Table 7 presents recommended sample sizes for different CoV levels, providing practical guidance for data collection based on the variability of the data. For CoV values ranging from 10% to 100%, the mean fit approximation indicates that 25–665 samples are adequate to draw conclusions on normality using the S-W test. However, it is important to note that these values reflect the general trend that higher CoV often coincides with greater skewness. Data sets with similar CoV can still differ in skewness; in such cases, the same estimated sample size acts as a 95% lower confidence bound when skewness is higher, and as a 95% upper confidence bound when skewness is lower, as presented in Table 7. Furthermore, for the J-B, A-D, LL and DP tests, the required sample sizes for 10% ≤ CoV ≤ 100% were estimated to be 750–30, 920–30, 1,400–40 and 1,250–50, respectively.

Table 7

Recommended GoF tests and corresponding minimum sample size requirements based on data set CoV

Required sample size (Power ≥ 0.8, α = 0.05)
Coefficient of variation | Mean estimate | Lower CB (95%) | Upper CB (95%) | GoF test
< 10% | >665 | >334 | >1323 | S-W
10% ≤ CoV < 30% | 665–145 | 334–64 | 1323–322 | S-W
30% ≤ CoV < 50% | 145–70 | 64–30 | 322–167 | S-W
50% ≤ CoV < 100% | 70–25 | 30–10 | 167–68 | S-W
≥ 100% | ≤25 | ≤10 | ≤68 | S-W
10% ≤ CoV ≤ 100% | 750–30 | 394–14 | 1418–78 | J-B
10% ≤ CoV ≤ 100% | 920–30 | 456–12 | 1865–81 | A-D
10% ≤ CoV ≤ 100% | 1400–40 | 715–16 | 2817–106 | LL
10% ≤ CoV ≤ 100% | 1250–50 | 634–19 | 2466–121 | DP
Note(s):

CB = confidence bound

Source(s): Table by authors

In statistical analysis, sample size plays a crucial role in obtaining reliable point estimates of the parameters of interest, which should satisfy key statistical properties such as unbiasedness, consistency, efficiency and sufficiency (Haldar and Mahadevan, 2000; Ramachandran and Tsokos, 2009). In particular, sufficiency refers to the ability of the samples to capture all available information about the population parameter (e.g. shear strength, permeability, CBR, Atterberg limits). The present study found that the S-W test performed well when the sample size ranged from 665 to 145 and the sample CoV was between 10% and 30%. However, when normality was assessed with a sample size of 20 or 30, the tests still indicated normality, even for data that were non-normal. Therefore, acceptance of normality at such a small sample size should not be taken as evidence that the sample adequately represents the population.

In geotechnical quality control, standard statistical tools include confidence interval estimation, control charts (e.g. range charts, standard deviation charts and average charts), and hypothesis testing. When quality control measures focus on the mean (point estimate), decisions heavily rely on the central tendency of the data. As such, data sets with 35 samples and 20 samples may yield different statistical conclusions. In cases where the mean is a critical parameter, the LL test can be used, particularly for assessing normality, as it evaluates the overall agreement between the empirical distribution and the theoretical distribution, with sensitivity to deviations in the central portion of the distribution.

Conversely, in reliability-based analysis of geotechnical structures, where the probability of failure is often governed by the most critical values distributed in the tail region, the A-D test is generally more appropriate. Its increased sensitivity to deviations in the tails enhances the accuracy of assessing whether critical geotechnical data conform to assumed probability models. In the present study, for example, the nonlinear power transformation introduced skewness primarily in the tail region, which was effectively detected by the A-D test for the skewed data (k = 2 and k = 3) (Table 4), whereas the LL and CVM tests were the least sensitive to these deviations. For the chi-square test, the literature offers mixed conclusions regarding its use for assessing univariate distributions. According to Thode (2002), the chi-square test is not recommended because it is challenging to implement compared to other tests. Moreover, this test may not be attractive to beginners, as there is a high chance of incorrectly rejecting the normality assumption due to computational procedures. In contrast, the DP (sample size = 50–1,250) and J-B (sample size = 30–750) tests are simpler to apply and may serve effectively as complementary tests.

The performance of selected GoF tests in assessing the normality of experimentally derived geotechnical data, including percentage of fines, liquid limit, plastic limit, dry density, moisture content, CBR, unconfined compressive strength and permeability, was evaluated using power analysis. Based on statistical estimates and accepted marginal distributions, a Monte Carlo simulation was used to generate synthetic random samples for assessing normality and estimating the power of the selected GoF tests. The analysis provides insights into the sensitivity and reliability of these tests when applied to geotechnical data sets with diverse statistical characteristics. Based on the study, the following conclusions are drawn:

  • The initial findings from the power analysis indicate that when the available sample size is smaller than the minimum required, the GoF tests may still accept normality (e.g. data set 1) due to a lack of statistical power, thereby increasing the risk of Type II error, i.e. falsely accepting normality. Conversely, rejecting normality under the same small-sample conditions (e.g. data set 5) remains meaningful, as the conclusion reflects a detectable deviation from normality. Since Type I error is controlled at α = 0.05, rejection under reduced power provides reliable evidence against the null hypothesis of normality.

  • For the original data sets, the selected GoF tests achieved a power value of 1.0 at sample sizes ranging from 25 to over 2,000. This considerable variation in the sample size is mainly attributable to the use of different surrogate distributions (LN, WB, GA and GEV) in the alternative hypothesis during the power analysis. Additional analyses indicated that the power values were not only governed by sample size but also by the effect size. Among all the selected GoF tests, the S-W, A-D and J-B tests demonstrated the highest predictive capacity to detect non-normality and required comparatively smaller sample sizes. Notably, the minimum sample sizes derived for field-based geotechnical data sets were considerably larger than those typically reported for synthetic data sets, reflecting the higher inherent variability and uncertainty associated with real-world soil data.

  • The minimum sample size required to achieve the desired statistical power of 0.8 at a chosen significance level of 0.05 was determined for each GoF test. Subsequent analyses were conducted to examine the relationship between optimum sample size and the data’s CoV. A clear negative correlation (r = −0.93) was observed, indicating that higher variability reduces the sample size required to detect deviations from normality. Using the fitted linear log-log regression model, the estimated sample sizes required to detect non-normality reliably were approximately 665–25 for S-W, 750–30 for J-B, 920–30 for A-D, 1,400–40 for LL and 1,250–50 for DP, applicable within the CoV range of 10%–100%.

  • Furthermore, the J-B and DP tests, which are moment-based methods not frequently used in geotechnical data analysis, also exhibited strong performance, followed by the S-W and A-D tests. The LL test exhibited moderate sensitivity and required a slightly larger sample size than the moment-based methods. The CSQ and CVM tests were found to be more effective for larger sample sizes, further reinforcing the importance of sample size considerations in statistical assessments of geotechnical properties.

  • Notably, the GoF tests used for normality assessment require only key statistical parameters such as the mean, skewness, kurtosis and CoV. As such, the performance and reliability of these tests are influenced primarily by the sample size and statistical characteristics of the geotechnical data. The findings of this study are therefore applicable not only to the data sets analysed but also to other non-geotechnical data sets with CoV ranging from 10% to 100%.
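The Monte Carlo power estimate referred to in the conclusions above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the procedure used in the study: the function names are hypothetical, only the J-B test is implemented (with its asymptotic chi-square critical region, which is approximate for small samples), and the alternative hypothesis is taken to be a lognormal distribution whose shape is set by the target CoV.

```python
import math
import random

def jarque_bera_pvalue(sample):
    """Asymptotic p-value of the Jarque-Bera statistic (chi-square, 2 d.o.f.)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return math.exp(-jb / 2.0)   # survival function of chi-square with 2 d.o.f.

def lognormal_sample(n, mean, cov, rng):
    """Draw n lognormal values with the given mean and CoV (as a fraction)."""
    sigma = math.sqrt(math.log(1.0 + cov ** 2))
    mu = math.log(mean) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]

def estimate_power(n, cov, reps=500, alpha=0.05, seed=1):
    """Fraction of simulated non-normal samples for which normality is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        jarque_bera_pvalue(lognormal_sample(n, 1.0, cov, rng)) < alpha
        for _ in range(reps)
    )
    return rejections / reps

# Power grows with sample size, and faster when CoV (and hence skewness) is larger.
for cov in (0.3, 1.0):
    for n in (25, 50, 100, 200):
        print(f"CoV = {cov:.0%}, n = {n:>3}: power ≈ {estimate_power(n, cov):.2f}")
```

Scanning n upward until the estimated power first reaches 0.8 gives a minimum sample size for the chosen CoV, which is the quantity the regression on CoV summarises. Other tests or surrogate distributions could be substituted for the two helper functions without changing the loop.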

The authors thank the anonymous reviewers for their valuable comments and suggestions.

Akbas, S.O. and Kulhawy, F.H. (2010), “Characterization and estimation of geotechnical variability in Ankara clay: a case history”, Geotechnical and Geological Engineering, Vol. 28 No. 5, pp. 619-631.

Anderson, T.W. and Darling, D.A. (1954), “A test of goodness of fit”, Journal of the American Statistical Association, Vol. 49 No. 268, pp. 765-769.

Ang, A.H.S. and Tang, W.H. (2007), Probability Concepts in Engineering: Emphasis on Applications to Civil and Environmental Engineering, John Wiley and Sons, Hoboken, NJ.

Arnastauskaite, J., Ruzgas, T. and Brazenas, M. (2021), “An exhaustive power comparison of normality tests”, Mathematics, Vol. 9 No. 7, p. 788.

ASTM D1557 (2015), “Standard test methods for laboratory compaction characteristics using modified effort [56,000 ft-lbf/ft3 (2,700 kN-m/m3)]”, West Conshohocken, PA.

ASTM D1883 (1999), Standard Test Methods for CBR (California Bearing Ratio) of Laboratory Compacted Soils, ASTM International, West Conshohocken, PA.

ASTM D2166 (2016), “Standard test method for unconfined compressive strength of cohesive soil”, West Conshohocken, PA.

ASTM D2487 (2017), “Standard practice for classification of soils for engineering purposes (unified soil classification system)”, West Conshohocken, PA.

ASTM D4318 (2017), “Standard test methods for liquid limit, plastic limit, and plasticity index of soils”, West Conshohocken, PA.

ASTM D5084 (2016), “Standard test methods for measurement of hydraulic conductivity of saturated porous materials using a flexible wall permeameter”, West Conshohocken, PA.

ASTM D698 (2021), “Standard test methods for laboratory compaction characteristics using standard effort [12,400 ft-lbf/ft3 (600 kN-m/m3)]”, West Conshohocken, PA.

Baecher, G.B. and Christian, J.T. (2005), Reliability and Statistics in Geotechnical Engineering, Wiley, Hoboken, NJ.

BenSaïda, A. (2025a), “Shapiro-Wilk and Shapiro-Francia normality tests”, MATLAB Central file exchange.

BenSaïda, A. (2025b), “Cramer-von Mises test”, MATLAB Central file exchange.

Benson, C.H. (1993), “Probability distributions for hydraulic conductivity of compacted soil liners”, Journal of Geotechnical Engineering, Vol. 119 No. 3, pp. 471-486.

Burati, J.L., Weed, R.M., Hughes, C.S. and Hill, H.S. (2003), “Optimal procedures for quality assurance specifications (No. FHWA-RD-02-095)”, Turner-Fairbank Highway Research Center.

Cherubini, C. and Vessia, G. (2010), “Reliability-based pile design in sandy soils by CPT measurements”, Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards, Vol. 4 No. 1, pp. 2-12.

Cohen, J. (1992), “A power primer: quantitative methods in psychology”, Psychological Bulletin, Vol. 112 No. 1, pp. 155-159.

Corder, G.W. and Foreman, D.I. (2014), Nonparametric Statistics: A Step-by-Step Approach, John Wiley & Sons, Hoboken, NJ.

Cramer, H. (1928), “On the composition of elementary errors”, Scandinavian Actuarial Journal, Vol. 1928 No. 1, pp. 13-74.

D’Agostino, R. and Pearson, E.S. (1973), “Tests for departure from normality. Empirical results for the distributions of b2 and √b1”, Biometrika, Vol. 60 No. 3, pp. 613-622.

D’Agostino, R.B. and Stephens, M.A. (1986), Goodness-of-Fit Techniques, Marcel Dekker, New York, NY.

Elkateb, T., Chalaturnyk, R. and Robertson, P.K. (2003), “An overview of soil heterogeneity: quantification and implications on geotechnical field problems”, Canadian Geotechnical Journal, Vol. 40 No. 1, pp. 1-15.

Fenton, G.A. and Griffiths, D.V. (2008), Risk Assessment in Geotechnical Engineering, John Wiley and Sons, New York, NY.

Galeandro, A., Doglioni, A. and Simeone, V. (2017), “Statistical analyses of inherent variability of soil strength and effects on engineering geology design”, Bulletin of Engineering Geology and the Environment, Vol. 76 No. 2, pp. 587-600.

Ganesh, K.S.S. and Kumar, T.A. (2024), “Statistical assessment of liquid limit of pond ash-based processed geomaterial”, Proceedings of the 2nd International Conference on Geotechnical Issues in Energy, Infrastructure and Disaster Management, Patna, India, pp. 461-469.

Ganesh, K.S.S. and Kumar, T.A. (2025), “Statistical analysis of pond ash-bentonite mixtures: implications for liner and cover material design”, Quarterly Journal of Engineering Geology and Hydrogeology, Vol. 58 No. 2.

Ghasemi, A. and Zahediasl, S. (2012), “Normality tests for statistical analysis: a guide for non-statisticians”, International Journal of Endocrinology and Metabolism, Vol. 10 No. 2, pp. 486-489.

Grubbs, F.E. (1969), “Procedures for detecting outlying observations in samples”, Technometrics, Vol. 11 No. 1, pp. 1-21.

Haldar, A. and Mahadevan, S. (2000), Probability, Reliability and Statistical Methods in Engineering Design, Wiley, New York, NY.

Harle, S.M. and Wankhade, R.L. (2025), “Machine learning techniques for predictive modelling in geotechnical engineering: a succinct review”, Discover Civil Engineering, Vol. 2 No. 1, pp. 1-21.

Harr, M.E. (1987), Reliability-Based Design in Civil Engineering, McGraw-Hill, New York, NY.

Henderson, A.R. (2006), “Testing experimental data for univariate normality”, Clinica Chimica Acta, Vol. 366 Nos 1-2, pp. 112-129.

IRC SP-11 (2012), Handbook of Quality Control for Construction of Roads and Runways, Indian Road Congress, New Delhi, India.

Jarque, C.M. and Bera, A.K. (1987), “A test for normality of observations and regression residuals”, International Statistical Review / Revue Internationale de Statistique, Vol. 55 No. 2, pp. 163-172.

Kumar, T.A., Saseendran, R. and Sundaravel, V. (2023), “Engineering characterization of intermediate geomaterials-A review”, Geomechanics and Engineering, Vol. 33 No. 5, pp. 453-462.

Lee, S., Lee, G. and Jeon, G. (2019), “Statistical approaches based on deep learning regression for verification of normality of blood pressure estimates”, Sensors, Vol. 19 No. 9, p. 2137.

Li, Z., Gong, W., Li, T., Juang, C.H., Chen, J. and Wang, L. (2021), “Probabilistic back analysis for improved reliability of geotechnical predictions considering parameters uncertainty, model bias, and observation error”, Tunnelling and Underground Space Technology, Vol. 115, p. 104051.

Liu, H., Su, H., Sun, L. and Dias-da-Costa, D. (2024), “State-of-the-art review on the use of AI-enhanced computational mechanics in geotechnical engineering”, Artificial Intelligence Review, Vol. 57 No. 8, p. 196.

Lilliefors, H.W. (1967), “On the Kolmogorov-Smirnov test for normality with mean and variance unknown”, Journal of the American Statistical Association, Vol. 62 No. 318, pp. 399-402.

López-Vázquez, C. and Hochsztain, E. (2019), “Extended and updated tables for the Friedman rank test”, Communications in Statistics - Theory and Methods, Vol. 48 No. 2, pp. 268-281.

Montgomery, D.C. and Runger, G.C. (2010), Applied Statistics and Probability for Engineers, John Wiley and Sons, New York, NY.

Murphy, K.R. and Myors, B. (2004), Statistical Power Analysis: A Simple and General Model for Traditional and Modern Hypothesis Tests, Lawrence Erlbaum Associates, Inc., Mahwah, NJ.

Newell, K.M. and Hancock, P.A. (1984), “Forgotten moments: a note on skewness and kurtosis as influential factors in inferences extrapolated from response distributions”, Journal of Motor Behavior, Vol. 16 No. 3, pp. 320-335.

Nguyen, T.S., Ngamcharoen, K. and Likitlersuang, S. (2023), “Statistical characterisation of the geotechnical properties of Bangkok subsoil”, Geotechnical and Geological Engineering, Vol. 41 No. 3, pp. 2043-2063.

Nowak, A.S. and Collins, K.R. (2000), Reliability of Structures, CRC Press, London.

Ott, R.L. and Longnecker, M. (2015), An Introduction to Statistical Methods and Data Analysis, Cengage Learning Inc, Boston, USA.

Öztuna, D., Elhan, A.H. and Tüccar, E. (2006), “Investigation of four different normality tests in terms of type 1 error rate and power under different distributions”, Turkish Journal of Medical Sciences, Vol. 36 No. 3, pp. 171-176.

Pearson, K. (1900), “On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling”, Philosophical Magazine, Vol. 50 No. 302, p. 157.

Phoon, K.K. and Ching, J. (Eds.) (2018), Risk and Reliability in Geotechnical Engineering, CRC Press, Boca Raton, USA.

Phoon, K.K. and Kulhawy, F.H. (1999), “Characterization of geotechnical variability”, Canadian Geotechnical Journal, Vol. 36 No. 4, pp. 612-624.

Phoon, K.K., Kulhawy, F.H. and Grigoriu, M.D. (1995), “Reliability-based design of foundations for transmission line structures”, Electric Power Research Institute (EPRI TR-105000), Research Project 1493-04, Cornell University, New York, NY.

Phoon, K.K., Shuku, T. and Ching, J. (2023), Uncertainty, Modeling, and Decision Making in Geotechnics, CRC Press, Boca Raton, USA.

Razali, N.M. and Wah, Y.B. (2010), “Power comparisons of some selected normality tests”, Proceedings of the Regional Conference on Statistical Sciences, pp. 126-38.

Ramachandran, K.M. and Tsokos, C.P. (2009), Mathematical Statistics with Applications in R, Elsevier Academic Press, USA.

Saseendran, R. and Dodagoudar, G.R. (2020), “Reliability analysis of slopes stabilised with piles using response surface method”, Geomechanics and Engineering, Vol. 21 No. 6, pp. 513-525.

Shapiro, S.S. and Wilk, M.B. (1965), “An analysis of variance test for normality”, Biometrika, Vol. 52 Nos 3-4, pp. 591-611.

Sigut, J., Piñeiro, J., Estévez, J. and Toledo, P. (2006), “A neural network approach to normality testing”, Intelligent Data Analysis, Vol. 10 No. 6, pp. 509-519.

Simić, M. (2021), “Testing for normality with neural networks”, Neural Computing and Applications, Vol. 33 No. 23, pp. 16279-16313.

Somani, M., Datta, M., Ramana, G.V. and Sreekrishnan, T.R. (2019), “Leachate characteristics of aged soil-like material from MSW dumps: sustainability of landfill mining”, Journal of Hazardous, Toxic, and Radioactive Waste, Vol. 23 No. 4, p. 4019014.

Steele, C.M. (2003), “The power of categorical goodness-of-fit statistics”, Ph.D. Dissertation, Griffith University, Victoria, Australia.

Stephens, M.A. (1974), “EDF statistics for goodness of fit and some comparisons”, Journal of the American Statistical Association, Vol. 69 No. 347, pp. 730-737.

Smirnov, N.V. (1936), “Sui la distribution de w2 (Criterium de M.R.v. Mises)”, C.R. (Paris), 202, pp. 449-452.

Tang, X.S., Li, D.Q., Cao, Z.J. and Phoon, K.K. (2017), “Impact of sample size on geotechnical probabilistic model identification”, Computers and Geotechnics, Vol. 87, pp. 229-240.

Theocharis, A.I., Zevgolis, I.E., Roumpos, C. and Koukouzas, N.C. (2024), “Probability distributions of geotechnical properties for heterogeneous lignite mine spoils”, International Journal of Geotechnical Engineering, Vol. 18 No. 5, pp. 528-536.

Thode, H.C. (2002), Testing for Normality, Marcel Dekker, Inc., New York, NY.

Toraldo, C., Modoni, G., Ochmański, M. and Croce, P. (2018), “The characteristic strength of jet-grouted material”, Géotechnique, Vol. 68 No. 3, pp. 262-279.

Trujillo-Ortiz, A. (2025), “DagosPtest”, MATLAB Central file exchange.

Uttley, J. (2019), “Power analysis, sample size, and assessment of statistical assumptions – improving the evidential value of lighting research”, LEUKOS, Vol. 15 Nos 2-3, pp. 143-162.

Uzielli, M., Lacasse, S., Nadim, F. and Phoon, K.K. (2007), “Soil variability analysis for geotechnical practice”, Proceedings of the 2nd International Workshop on Characterisation and Engineering Properties of Natural Soils, Taylor and Francis, London, pp. 1653-1752.

Wang, Y., Zhao, T. and Cao, Z. (2015), “Site-specific probability distribution of geotechnical properties”, Computers and Geotechnics, Vol. 70, pp. 159-168.

Yazici, B. and Yolacan, S. (2007), “A comparison of various tests of normality”, Journal of Statistical Computation and Simulation, Vol. 77 No. 2, pp. 175-183.

Yap, B.W. and Sim, C.H. (2011), “Comparisons of various types of normality tests”, Journal of Statistical Computation and Simulation, Vol. 81 No. 12, pp. 2141-2155.

Zhai, H. and Benson, C.H. (2006), “The log-normal distribution for hydraulic conductivity of compacted clays: two or three parameters?”, Geotechnical and Geological Engineering, Vol. 24 No. 5, pp. 1149-1162.

Zhao, H.F., Zhang, L.M., Xu, Y. and Chang, D.S. (2013), “Variability of geotechnical properties of a fresh landslide soil deposit”, Engineering Geology, Vol. 166, pp. 1-10.

Zbiciak, A., Volchok, D., Kozyra, Z., Michalczyk, R. and Al Garssi, N. (2025), “Fuzzy-modulus-based layered elastic analysis of asphalt pavements for enhanced fatigue life prediction”, Materials, Vol. 18 No. 13, p. 3034.

Jiang, S.H., Huang, J., Griffiths, D.V. and Deng, Z.P. (2022), “Advances in reliability and risk analyses of slopes in spatially variable soils: a state-of-the-art review”, Computers and Geotechnics, Vol. 141, p. 104498.

Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms are set out in the CC BY 4.0 licence.
