Purpose

We developed the Goal Achievement Scale in Colleges of Education (GASCE) to address the lack of context-specific assessment tools for evaluating goal achievement in Nigerian colleges of education. Existing instruments fail to capture the unique challenges of these institutions, making a reliable and valid scale essential for assessing educational success.

Design/methodology/approach

Using a descriptive survey research design, we developed and validated the GASCE through expert review, pilot testing, and statistical analyses. Our sample of 450 respondents from Nigerian colleges of education participated in exploratory factor analysis (EFA), graded response modeling (GRM-IRT), confirmatory factor analysis (CFA) and multiple group confirmatory factor analysis (MG-CFA). We assessed reliability using composite reliability and validated construct and criterion-related validity through correlations with external goal achievement measures.

Findings

CFA confirmed a six-factor structure – critical thinking, committed teaching, high motivation, intellectual fitness, professional fitness and social fitness – with strong model fit indices. Reliability testing showed high internal consistency (Cronbach’s alpha: 0.845–0.919). MG-CFA confirmed measurement invariance across academic and non-academic staff (Δcomparative fit index < 0.015, Δroot mean square error of approximation < 0.010), while criterion-related validity was supported by significant correlations with external measures.

Research limitations/implications

While the sample size supports validation, it may not fully represent all Nigerian colleges of education. Future research should expand the sample and incorporate objective performance data for further validation.

Practical implications

The GASCE provides policymakers, educators and administrators with a robust tool to assess and improve educational performance, facilitating targeted interventions and efficient resource allocation.

Originality/value

The GASCE fills a critical gap in educational assessment by offering a reliable, context-specific tool for Nigerian colleges of education. It enables comprehensive evaluation of goal achievement, supporting continuous improvement in teacher education programs across Nigeria.

Education is universally acknowledged as a fundamental driver of national development, and within this framework, teacher education occupies a central position. In Nigeria, colleges of education play a crucial role in producing skilled teachers capable of delivering quality education at the primary and secondary levels. These institutions are responsible for ensuring that future teachers are well-prepared to meet the diverse needs of their students, thereby shaping the country’s educational landscape. According to the mandate given by the National Commission for Colleges of Education (NCCE), these colleges are to provide full-time training courses in teaching and instruction, culminating in the award of the Nigeria Certificate in Education (NCE) after a mandatory three-year course, provided all requirements are met in compliance with set standards. The goals of colleges of education are based on general statements of what their programs, courses, or activities intend to achieve within a specified period. As outlined in the National Policy on Education (FRN, 2013), the goals include: (1) enhancing teachers’ commitment to the teaching profession, (2) producing highly motivated and efficient classroom teachers for all levels of the educational system, (3) helping teachers integrate into the social life of their communities and the broader society, enhancing their commitment to national goals, (4) encouraging the spirit of inquiry and creativity in teachers, and (5) providing teachers with the intellectual and professional background adequate for their assignments and making them adaptable to changing situations. To ensure common standards among the colleges of education, the Federal Government of Nigeria established the NCCE. This body is mandated to set minimum standards for all programs in colleges of education, accredit their certificates, and validate newly introduced courses and other academic awards (NCCE, 2016).

Despite the crucial role of these colleges, concerns persist regarding their performance. These concerns encompass various issues such as the academic performance of students, the quality of teacher preparation programs, and the overall efficiency of institutional operations. Addressing these concerns hinges on a comprehensive evaluation of goal achievement, which, in essence, measures how well colleges of education respond to the needs and aspirations of society. Goals can be classified into long-term, short-term, and intermediate, differentiated by the time frame involved in achieving them. In educational institutions, goals are broad, general statements about what the program, course, or activity intends to accomplish. Evaluating goal achievement in these institutions requires robust and context-specific assessment instruments, which are currently lacking. Existing tools (Adewale and Ajayi, 2017; Chukwu and Ezeudu, 2018; Yusuf and Ojo, 2016) are often too generic or not adequately validated for the Nigerian context, rendering them less effective in providing actionable insights for educational improvement. Further, those instruments that were validated were developed to measure goal achievement in universities (Abubakar et al., 2018; Khosroabadi and Bahramzadeh, 2012) and secondary schools (Ibrahim and Alhassan, 2017), with limited attention given to the unique challenges and requirements of colleges of education in Nigeria. The National Policy on Education in Nigeria (Federal Republic of Nigeria, 2013) recognizes the significance of assessment and evaluation in ensuring the quality of education, calling for the development of appropriate assessment tools and mechanisms for evaluating educational outcomes. The absence of accurate, valid, and context-specific assessment tools hampers efforts to identify areas needing improvement, allocate resources effectively, implement targeted interventions, and monitor progress over time.

In this study, we develop and validate comprehensive instruments to measure goal achievement in Nigerian colleges of education. By creating reliable and effective assessment tools, we fill a critical gap in the educational framework, enhancing the quality of teacher education programs and ensuring they achieve their intended objectives. We structure this paper to provide a detailed examination of the instrument development process. First, we introduce the need for robust assessment tools to strengthen teacher education. Next, we review relevant literature and theoretical frameworks, outline our methodology, and present our results. We then analyze the results, highlighting their implications. Finally, we conclude by identifying the study’s limitations and proposing future research directions to further refine educational assessment in Nigeria.

The theoretical foundation of this study draws from goal-setting theory (Locke and Latham, 1990) and social cognitive theory (Bandura, 1986), both of which emphasize the significance of goal achievement in education. Goal-setting theory posits that specific and challenging goals enhance performance by directing attention, mobilizing effort, and fostering persistence, thereby improving educational outcomes. Bandura’s social cognitive theory highlights self-efficacy, asserting that individuals’ belief in their ability to achieve goals influences motivation and resilience. In the context of education, these theories underscore the need for assessment tools that measure not only goal achievement but also psychological factors influencing academic performance. Empirical studies (Adebayo, 2017; Ololube et al., 2018) highlight deficiencies in existing assessment tools for Nigerian teacher education, emphasizing the need for standardized, context-specific instruments. Addressing this gap, this study aims to develop and validate a robust tool for measuring goal achievement, aligning with international standards (AERA, APA and NCME, 2014) while catering to the unique challenges of Nigeria’s educational landscape.

We adopted a descriptive survey research design, which is well-suited for developing and validating assessment instruments. This design allows for a systematic analysis of goal achievement in Nigerian colleges of education by capturing detailed insights into the factors influencing educational outcomes. Conducted in multiple phases—instrument construction, pilot testing, and empirical validation—our approach ensures iterative refinement, enhancing the instrument’s reliability and contextual relevance. Our target population includes academic and non-academic staff in Nigerian colleges of education, as they directly influence educational processes and institutional goals. Academic staff shape curriculum delivery and student assessment, while non-academic staff support policy implementation, resource allocation, and student services, collectively impacting goal achievement. To ensure a representative sample, we employed a multi-stage sampling procedure. Colleges were selected using stratified sampling to reflect geographic and institutional diversity (public and private), followed by a random selection of participants within these institutions. Based on established guidelines for factor analysis and validation studies (Adewale and Ayanwale, 2023; Oladipo-Abodunwa et al., 2019; Kline, 2023), we targeted 450 respondents, with 350 participating in the validation phase to provide robust data for analysis.

We developed the GASCE as the primary instrument for this study, following a rigorous process to ensure its relevance and validity. First, we conducted a comprehensive literature review (Tay and Jebb, 2017) to identify key constructs: critical thinking, committed teacher, highly motivated teacher, intellectually fit teacher, professionally fit teacher, and socially fit teacher. Using a deductive approach, we generated an initial pool of 45 items grounded in theory and expert input (see Table A1 in Supplementary File). To refine the items, we conducted an expert review involving six professionals, including former provosts, deans, and lecturers. We then calculated the Content Validity Index (CVI) (Lawshe, 1975; Romero Jeldres et al., 2023), retaining items with a CVI of 0.70 or higher while revising or discarding weaker items. This iterative process, combining qualitative expert feedback and quantitative validation, ensured that the GASCE effectively measures goal achievement in Nigerian colleges of education.
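The CVI screening step can be illustrated with a minimal sketch. It assumes the item-level CVI is the proportion of experts who rate an item as relevant (3 or 4 on a hypothetical 1 to 4 scale); the text cites Lawshe (1975) but does not state the exact formula or rating scale, so the scale, the item names, and the ratings below are illustrative assumptions rather than study data.

```python
def item_cvi(ratings, relevant_from=3):
    """Proportion of experts whose rating counts as 'relevant' (assumed definition)."""
    return sum(r >= relevant_from for r in ratings) / len(ratings)


def screen_items(expert_ratings, cutoff=0.70):
    """Split items into retained and flagged lists using the 0.70 cut-off from the text."""
    retained, flagged = [], []
    for item, ratings in expert_ratings.items():
        (retained if item_cvi(ratings) >= cutoff else flagged).append(item)
    return retained, flagged


# Hypothetical ratings from six experts on a 1-4 relevance scale.
expert_ratings = {
    "item_01": [4, 4, 3, 4, 3, 4],  # 6 of 6 experts rate it relevant
    "item_02": [4, 3, 2, 4, 3, 3],  # 5 of 6
    "item_03": [2, 3, 1, 4, 2, 3],  # 3 of 6
}
retained, flagged = screen_items(expert_ratings)
print(retained, flagged)
```

With six experts, an item needs at least five "relevant" ratings to clear 0.70 under this definition, since four of six gives only 0.67.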

The refined set of items was then subjected to pilot testing to further assess their clarity, relevance, and reliability. Pilot testing involved administering the preliminary version of the GASCE to a small, representative sample of respondents. In this study, 100 respondents from various colleges of education participated in the pilot test. The pilot test aimed to identify any issues with the items, such as unclear wording, redundancy, or items that did not adequately capture the intended constructs. Feedback from the respondents was also considered to make necessary adjustments. This process ensured that the final version of the GASCE was clear, relevant, and reliable, and that it effectively measured the constructs of goal achievement.

After pilot testing, we conducted a detailed item analysis to refine the instrument. Using parallel analysis (PA; Reckase, 2009), we determined the number of dimensions in the scale. PA, which employs principal component analysis, compares real data eigenvalues with those from randomly generated datasets (Horn, 1965). Following Ayanwale (2019) and Ayanwale and Adeleke (2020), we used polychoric correlations and set the reference eigenvalue at the 95th percentile (Auerswald and Moshagen, 2019). Factors were retained if their eigenvalues exceeded the random data mean (Ledesma and Varelo-Mora, 2007). To further validate the scale, we applied Samejima’s (1969) graded response model (GRM) using the mirt package in R (v4.4.1; RStudio Team, 2020). This polytomous item response theory (IRT) model, an extension of the two-parameter logistic (2PL) model, estimates a discrimination parameter (a) and a set of ordered threshold (difficulty) parameters for each item. Unlike dichotomous models, the GRM handles ordered response categories, and it allows discrimination to vary across items. We retained items with high discrimination indices and well-structured step difficulties, ensuring the instrument’s reliability and validity.
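To make the graded response model concrete, the following sketch computes the category probabilities for a single item from its discrimination and ordered thresholds. It is an illustration of Samejima's model in plain Python, not the mirt estimation routine used in the study, and the item parameters are hypothetical.

```python
import math


def boundary_prob(theta, a, b):
    """Probability of responding at or above a category boundary (logistic curve)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds must be in increasing order; K thresholds give K + 1 categories.
    """
    # Cumulative probabilities, padded with 1 (lowest category or above)
    # and 0 (above the highest category).
    cum = [1.0] + [boundary_prob(theta, a, b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical item: a = 1.6 with three ordered thresholds (four categories).
probs = grm_category_probs(theta=0.0, a=1.6, thresholds=[-1.5, 0.0, 1.2])
print([round(p, 3) for p in probs])
```

Each category probability is the difference between two adjacent boundary curves, which is why the thresholds must be ordered for the probabilities to stay positive.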

We implemented a structured data collection process to ensure accuracy and reliability. First, we obtained approval from relevant authorities in the selected colleges, securing institutional access and respondent participation. We then administered the instrument in a controlled setting, briefing respondents on the study’s purpose and assuring confidentiality to foster trust and honest responses. To maximize participation, we used both paper-based and online surveys, accommodating respondents’ preferences and technological access. Paper surveys were distributed and collected in person, while online surveys were administered via a secure platform (Google Forms). This multi-method approach improved accessibility and response rates. Data collection spanned 15th November 2023 to 28th February 2024, allowing sufficient time for participation and follow-ups.

We analyzed the data using R version 4.4.1 (RStudio Team, 2020) to validate the instrument. Descriptive statistics summarized respondents’ demographic characteristics and response distributions, identifying potential biases. We conducted exploratory factor analysis (EFA) using parallel and principal component analyses to determine the instrument’s underlying factor structure. Confirmatory factor analysis (CFA) then tested the hypothesized model, using fit indices such as CFI, TLI, RMSEA, and SRMR to ensure a well-fitting structure (Hu and Bentler, 1999). Modification indices were examined where necessary to improve model fit. Reliability was assessed using Cronbach’s alpha and composite reliability (CR), ensuring internal consistency (Hair et al., 2022). Criterion-related validity was established through correlation analysis with other goal achievement measures, while convergent and discriminant validity were confirmed using Average Variance Extracted (AVE) and the Heterotrait-Monotrait Ratio (HTMT) (Chin, 2010; Henseler et al., 2015). Additionally, we conducted a Multiple Group Confirmatory Factor Analysis (MG-CFA) to assess measurement invariance across academic and non-academic staff. This involved testing configural, metric, and scalar invariance, confirming consistent factor structures across both groups (Collier, 2020). These analyses ensured the instrument’s robustness, reliability, and applicability for assessing goal achievement in Nigerian colleges of education.
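As an illustration of the HTMT criterion mentioned above, the sketch below computes the ratio for two constructs from an item correlation matrix: the mean between-construct correlation divided by the geometric mean of the average within-construct correlations. The four-item matrix is hypothetical, and the function is a plain-Python sketch rather than the routine used in the study.

```python
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def htmt(corr, items_a, items_b):
    """Heterotrait-monotrait ratio for two constructs.

    corr is a square correlation matrix (list of lists); items_a and items_b
    are the row/column indices of the items belonging to each construct.
    Each construct needs at least two items.
    """
    hetero = [abs(corr[i][j]) for i in items_a for j in items_b]
    mono_a = [abs(corr[i][j]) for i, j in combinations(items_a, 2)]
    mono_b = [abs(corr[i][j]) for i, j in combinations(items_b, 2)]
    return mean(hetero) / (mean(mono_a) * mean(mono_b)) ** 0.5


# Hypothetical correlations: items 0-1 measure one construct, items 2-3 another.
corr = [
    [1.00, 0.70, 0.30, 0.35],
    [0.70, 1.00, 0.32, 0.28],
    [0.30, 0.32, 1.00, 0.65],
    [0.35, 0.28, 0.65, 1.00],
]
value = htmt(corr, items_a=[0, 1], items_b=[2, 3])
print(round(value, 3), value < 0.90)
```

A value well below 0.90, as here, indicates that items correlate more strongly within their own construct than across constructs.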

We surveyed 450 respondents from various colleges of education across the north-central region of Nigeria. The sample consisted of 250 academic staff and 200 non-academic staff, allowing for a comprehensive examination of goal achievement from different viewpoints within the institutions. The respondents were diverse in terms of age group, gender, and years of experience, ensuring a representative and varied sample. The average age of the respondents was 40 years, with a standard deviation of 10 years. The gender distribution was relatively balanced, with 55% female and 45% male respondents. The years of experience ranged from 1 to 35 years, with a mean of 15 years. Further analysis of the demographic data revealed that the academic staff primarily comprised lecturers at various levels, while the non-academic staff included administrative officers, technical staff, and support personnel. The range of roles and responsibilities among the respondents offered a comprehensive perspective on goal achievement within the colleges. This diversity in the sample was crucial for ensuring that the study’s results could be applied to different roles within the colleges of education.

We conducted parallel analysis with principal component analysis to confirm the factor structure by comparing eigenvalues derived from actual data with those from simulated random data matrices. Table A2 (see Supplementary File) lists the 45 factors corresponding to the total number of items in both the actual Goal Achievement Scale and the simulated test. The first column of the table lists these factors, while the second and third columns display the eigenvalues for the actual and randomly generated data sets, respectively. The result revealed that ten factors from the empirical data set exhibited eigenvalues equal to or greater than those from the simulated data. However, only six of these factors were supported by significant item loadings, leading to the exclusion of four factors due to their insufficient contribution. As a result, a refined GASCE model with six interpretable factors was obtained, retaining 27 highly relevant items (see Table 1).
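The retention rule behind this comparison can be sketched as follows. The function only applies the decision step (keep leading factors whose observed eigenvalue is at least the simulated reference value); it does not compute eigenvalues, and the numbers are hypothetical rather than the values in Table A2.

```python
def count_retained(observed, reference):
    """Count leading factors whose observed eigenvalue is at least the reference.

    Both lists are ordered from the largest eigenvalue down. Counting stops at
    the first factor that falls below its reference value.
    """
    retained = 0
    for obs, ref in zip(observed, reference):
        if obs < ref:
            break
        retained += 1
    return retained


# Hypothetical eigenvalues, largest first; not the values reported in Table A2.
observed = [9.8, 4.1, 2.7, 1.9, 1.2, 0.9]
reference = [1.6, 1.5, 1.4, 1.3, 1.25, 1.2]
n_factors = count_retained(observed, reference)
print(n_factors)
```

As the text notes, passing this eigenvalue comparison is a necessary screen only; factors are still dropped afterwards if no items load on them meaningfully.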

Table 1

EFA results with a six-factor structure

| Indicator | Critical thinking (CRT) | Committed teacher (COT) | Highly motivated teacher (HMT) | Intellectually fit teacher (IFT) | Professionally fit teacher (PFT) | Socially fit teacher (SFT) |
| --- | --- | --- | --- | --- | --- | --- |
| Prepares the trainee-teacher to be creators (CRT1) | 0.731 | | | | | |
| Facilitates an enabling environment (CRT2) | 0.689 | | | | | |
| Prepares the trainee-teacher for innovation (CRT3) | 0.712 | | | | | |
| Encourages effectiveness in teaching/learning (COT1) | | 0.745 | | | | |
| Encourages the spirit of punctuality (COT2) | | 0.689 | | | | |
| Ensures obedience to rules (COT3) | | 0.782 | | | | |
| Enhances inspiration to the profession (COT4) | | 0.721 | | | | |
| Encourages responsibility to the profession (COT5) | | 0.675 | | | | |
| Ensures desire to remain in the profession (HMT1) | | | 0.798 | | | |
| Ensures satisfaction with the profession (HMT2) | | | 0.754 | | | |
| Motivates the trainee-teachers (HMT3) | | | 0.812 | | | |
| Ensures encouragement to be in the profession (HMT4) | | | 0.689 | | | |
| Ensures ability to solve new problems (IFT1) | | | | 0.774 | | |
| Ensures mental capability to react to problems (IFT2) | | | | 0.756 | | |
| Encourages cleverness in teaching/learning (IFT3) | | | | 0.733 | | |
| Ensures intellectual fitness (IFT4) | | | | 0.711 | | |
| Ensures ability to adopt best methods (PFT1) | | | | | 0.765 | |
| Ensures ability to manage the classroom (PFT2) | | | | | 0.733 | |
| Ensures ability to develop good guidance skills (PFT3) | | | | | 0.722 | |
| Ensures fitness for the profession (PFT4) | | | | | 0.799 | |
| Ensures competency in the profession (PFT5) | | | | | 0.745 | |
| Ensures adherence to ethics (PFT6) | | | | | 0.684 | |
| Ensures readiness to fit into any community (SFT1) | | | | | | 0.791 |
| Ensures fitness to develop relevant curriculum (SFT2) | | | | | | 0.728 |
| Ensures understanding of school environment (SFT3) | | | | | | 0.763 |
| Ensures readiness to be self-reliant (SFT4) | | | | | | 0.702 |
| Ensures initiatives for community advancement (SFT5) | | | | | | 0.754 |

Source(s): Authors’ own creation/work

These items demonstrated strong loadings on their respective factors, confirming their importance in effectively measuring goal achievement. This thorough process of EFA not only confirmed the validity and reliability of the final GASCE instrument but also ensured its appropriateness for evaluating specific educational constructs in Nigerian colleges of education. Further, we verified the local independence of each pair of items using Yen’s Q3 statistic (Yen, 1993), implemented in the mirt package in R (Chalmers, 2012). A pair of items is considered locally dependent if its Q3 statistic exceeds the threshold of 0.20 (Amusa et al., 2022; Yen, 1993). A Q3 value above this threshold indicates a moderate level of deviation and dependency among items (Amusa et al., 2022; Tobih et al., 2023). As a result, items that displayed a Q3 statistic greater than 0.20 were excluded from the analysis to ensure the integrity of the study’s measures.
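The Q3 check described above amounts to correlating model residuals for every pair of items and flagging pairs above the cut-off. The sketch below assumes the residuals (observed score minus model-expected score) have already been obtained from a fitted IRT model; the residual values and item names are hypothetical, and the code is not the mirt implementation.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_dependent_pairs(residuals, cutoff=0.20):
    """Return item pairs whose residual correlation (Q3) exceeds the cut-off.

    residuals maps item name -> list of residuals, one per respondent, in the
    same respondent order for every item. Only values above the cut-off are
    flagged, mirroring the rule stated in the text.
    """
    flagged = []
    for i, j in combinations(sorted(residuals), 2):
        q3 = pearson(residuals[i], residuals[j])
        if q3 > cutoff:
            flagged.append((i, j, round(q3, 3)))
    return flagged


# Hypothetical residuals for three items and six respondents.
residuals = {
    "A": [0.4, -0.3, 0.2, -0.5, 0.1, 0.1],
    "B": [0.5, -0.2, 0.3, -0.4, 0.0, -0.2],
    "C": [-0.1, 0.3, -0.4, 0.1, 0.3, -0.2],
}
flagged = flag_dependent_pairs(residuals)
print(flagged)
```

Some analysts flag large negative residual correlations as well; that variant would compare the absolute value of Q3 against the cut-off.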

To confirm the adequacy of the 27 surviving items from the exploratory factor analysis, we applied polytomous Item Response Theory (IRT) using the graded response model. The discrimination parameter (a) measures how well an item differentiates individuals based on their trait levels. Following Baker (2001) and De Ayala (2009), we classified items as low discrimination (a < 0.50), moderate discrimination (0.50 ≤ a ≤ 1.50), or high discrimination (a > 1.50). High-discrimination items are most effective in distinguishing between individuals with different trait levels, while low-discrimination items may require revision. The difficulty parameter (b) indicates the point on the trait continuum where the probability of endorsing a response changes. Items were categorized as very easy (b < −2.0), easy (−2.0 ≤ b < −1.0), moderate (−1.0 ≤ b ≤ 1.0), difficult (1.0 < b ≤ 2.0), or very difficult (b > 2.0) (Baker, 2001; De Ayala, 2009; DeMars, 2010). A balanced mix of discrimination and difficulty levels ensures the test effectively differentiates participants across a broad trait spectrum. Table A3 (see Supplementary File) presents the item parameters, confirming the scale’s reliability and precision.
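The classification bands above translate directly into two small functions. This is a sketch of the stated cut-offs only; the parameter values passed in are hypothetical and do not come from Table A3.

```python
def classify_discrimination(a):
    """Label a discrimination parameter using the bands stated in the text."""
    if a < 0.50:
        return "low"
    if a <= 1.50:
        return "moderate"
    return "high"


def classify_difficulty(b):
    """Label a difficulty (threshold) parameter using the bands stated in the text."""
    if b < -2.0:
        return "very easy"
    if b < -1.0:
        return "easy"
    if b <= 1.0:
        return "moderate"
    if b <= 2.0:
        return "difficult"
    return "very difficult"


# Hypothetical parameter values chosen to fall in each band.
disc_labels = [classify_discrimination(a) for a in (0.35, 1.10, 1.80)]
diff_labels = [classify_difficulty(b) for b in (-2.4, -1.3, 0.2, 1.6, 2.5)]
print(disc_labels)
print(diff_labels)
```

Note that the boundary values belong to the moderate band for discrimination (0.50 and 1.50 inclusive), so an item must exceed 1.50 to count as high.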

Table A3 presents the graded response parameters for the scale, highlighting variations in item discrimination across constructs. Items under the CRT construct generally sit near the lower bound of the moderate range, such as “prepares the trainee-teacher to be creators of new ideas” (a = 0.530), indicating limited effectiveness in differentiating between ability levels. Despite this, they span a broad range of difficulty parameters, making them useful for assessing diverse abilities. Similarly, items in the COT construct vary in discrimination, with “encourages trainee-teacher to be effective in the teaching/learning process” (a = 0.220) showing weak differentiation, while “ensures the spirit of obedience to rules governing the profession” (a = 1.500), at the upper bound of the moderate range, is considerably more effective. The HMT construct includes strongly discriminative items, such as “ensures trainee-teacher’s desire to remain in the profession” (a = 1.600), effectively distinguishing motivation levels. Likewise, the IFT construct features high-discrimination items like “improves the trainee teacher’s intelligence in the profession” (a = 1.800), ensuring precise differentiation. The PFT construct contains both strong and weak discrimination items, with “ensures trainee teacher’s ability to develop good guidance skills” (a = 0.180) proving ineffective. Lastly, the SFT construct includes highly discriminative items, such as “ensures trainee teacher’s readiness to fit into any community” (a = 1.800), alongside moderately effective ones. High-discrimination items significantly enhance scale reliability and precision. We also plotted the test information function (I(θ)) and standard error of measurement (SE(θ)) across trait levels (see Figure 1) to further evaluate scale performance.

Figure 1
A line graph showing test information and standard error curves across theta trait levels. The horizontal axis is labeled “Theta (Trait Level)” and ranges from negative 4 to positive 4 in increments of 2 units. The vertical axis is labeled “Value” and ranges from 2 to 6 in increments of 2 units. A legend titled “Key:” is positioned at the top center, showing two curve types: a red dashed line labeled “Standard Error S E (theta)” and a solid blue line labeled “Test Information I (theta)”. The solid blue curve begins near (negative 4, 1), rises steadily with an increasing positive slope in a concave-down manner, reaches its peak at approximately (0.29, 7.56), and then declines in a concave-down shape with a decreasing slope, ending at (4, 1.47). The red dashed curve starts at (negative 4, 1), dips slightly downward to its lowest point around (0, 0.42), and then rises again toward (3.9, 0.86), forming a shallow U-shape. Note: All numerical data values are approximated.

Test information function of the scale


Additionally, Figure 1 illustrates the Test Information Function (TIF) and Standard Error (SE) across different levels of the latent trait (θ). The TIF (blue line) measures test precision, peaking around θ = 0, indicating the instrument is most informative for individuals with average trait levels. As θ moves toward extreme values, the TIF decreases, reducing test precision. The SE (red dashed line), inversely related to the TIF, is lowest where the TIF is highest (θ = −1 to 1), ensuring maximum precision and minimal measurement error within this range. However, for θ < −2 or θ > 2, the increasing SE suggests lower reliability for individuals with very low or high trait levels. Overall, our scale effectively balances difficulty levels, providing reliable and precise measurements, particularly for individuals around the mean trait level.
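The inverse relationship between the two curves follows from SE(θ) = 1/√I(θ), where I(θ) is the sum of the item information functions. The sketch below computes both quantities for graded-response items in plain Python. The three items and their parameters are hypothetical, so the output will not reproduce Figure 1.

```python
import math


def _boundary(theta, a, b):
    """Probability of responding at or above a category boundary."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, thresholds):
    """Fisher information of one graded-response item at theta."""
    cum = [1.0] + [_boundary(theta, a, b) for b in thresholds] + [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]
        # Derivative of the category probability with respect to theta.
        dp_k = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        info += dp_k ** 2 / p_k
    return info


def scale_information(theta, items):
    """Test information: sum of item information over all items."""
    return sum(item_information(theta, a, bs) for a, bs in items)


def standard_error(theta, items):
    """Standard error of the trait estimate, 1 / sqrt(test information)."""
    return 1.0 / math.sqrt(scale_information(theta, items))


# Hypothetical items: (discrimination, ordered thresholds).
items = [
    (1.6, [-1.5, 0.0, 1.5]),
    (1.8, [-1.0, 0.2, 1.2]),
    (0.5, [-2.0, 0.0, 2.0]),
]
for theta in (-4.0, 0.0, 4.0):
    print(theta, round(scale_information(theta, items), 3), round(standard_error(theta, items), 3))
```

Because the thresholds cluster around zero, information is highest near the middle of the trait range and the standard error grows toward the extremes, the same pattern the figure describes.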

We assessed the validity and reliability of the measurement model following Hair et al. (2016). For indicator reliability, we retained items with outer loadings ≥ 0.70. Internal consistency was confirmed using composite reliability (CR) and Cronbach’s alpha (CA), both exceeding the 0.70 threshold: Cronbach’s alpha values ranged from 0.845 to 0.919, and all CR scores exceeded 0.70. Convergent validity was established through average variance extracted (AVE ≥ 0.50) (Fornell and Larcker, 1981; Chin, 1998). Detailed results of the measurement model are presented in Table A4 (see Supplementary File). To assess discriminant validity, we used the HTMT ratio of correlations, a more robust method than the Fornell–Larcker criterion (Henseler et al., 2015). All HTMT values remained below 0.90, confirming discriminant validity (Table A5, Supplementary File).

We also conducted CFA to validate the six-factor structure of the GASCE, using maximum likelihood estimation on data from 350 respondents. The six factors (critical thinking, committed teacher, highly motivated teacher, intellectually fit teacher, professionally fit teacher, and socially fit teacher) showed a good fit to the observed data. Fit indices confirmed model adequacy: χ2(424) = 1105.24, p < 0.05, CFI = 0.954, TLI = 0.947, RMSEA = 0.054 (90% CI: 0.049–0.059), and SRMR = 0.046, all within acceptable thresholds. Standardized factor loadings (0.742–0.956, p < 0.05) demonstrated strong relationships between observed variables and their constructs, confirming construct validity (see Table A4, Supplementary File). These results establish the GASCE as a reliable and valid instrument for assessing goal achievement in Nigerian colleges of education.
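The composite reliability and AVE criteria used for the measurement model can be computed from standardized loadings with the standard formulas: CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)) and AVE = Σλ² / n. The sketch below applies them to a hypothetical four-item construct; the loadings are illustrative and are not taken from Table A4.

```python
def composite_reliability(loadings):
    """Composite reliability from standardized loadings of one construct."""
    total = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)  # residual variance per item
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings):
    """Mean squared standardized loading of one construct."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# Hypothetical standardized loadings for a four-item construct.
loadings = [0.78, 0.82, 0.75, 0.80]
cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)
print(round(cr, 3), round(ave, 3))
print(cr >= 0.70, ave >= 0.50)
```

Loadings of 0.70 or higher, the retention rule stated for indicator reliability, guarantee an AVE of at least 0.49, which is why the two thresholds are usually applied together.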

Additionally, we performed a multi-group confirmatory factor analysis (MG-CFA) to test the measurement invariance of the GASCE across academic and non-academic staff; Table A6 in the Supplementary File presents the results. We established configural invariance, showing that the six-factor structure (critical thinking, committed teaching, high motivation, intellectual fitness, professional fitness, and social fitness) remained consistent across both groups. Model fit indices for academic staff (CFI = 0.948, TLI = 0.940, RMSEA = 0.055, SRMR = 0.046) and non-academic staff (CFI = 0.943, TLI = 0.937, RMSEA = 0.057, SRMR = 0.048) confirmed a stable baseline fit. Testing metric invariance by constraining factor loadings yielded a well-fitting model (CFI = 0.943, TLI = 0.938, RMSEA = 0.057, SRMR = 0.049) with minimal changes (ΔCFI = 0.003, ΔRMSEA = 0.001), confirming that both groups interpreted the constructs similarly. Further, testing scalar invariance by constraining item intercepts resulted in CFI = 0.939, TLI = 0.934, RMSEA = 0.058, SRMR = 0.051, with small changes (ΔCFI = 0.004, ΔRMSEA = 0.001), meeting recommended thresholds (Collier, 2020). These results confirm that both groups responded similarly at equivalent trait levels, allowing for meaningful comparisons of factor scores.
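The decision rule for each invariance step compares the change in CFI and RMSEA between a model and its more constrained successor. The sketch below applies that rule to the metric and scalar fit indices reported above, using the cut-offs stated in this article (ΔCFI < 0.015, ΔRMSEA < 0.010). The function name and dictionary layout are illustrative choices.

```python
def invariance_step(less_constrained, more_constrained,
                    max_d_cfi=0.015, max_d_rmsea=0.010):
    """Compare two nested models and report whether invariance holds.

    Each model is a dict with 'cfi' and 'rmsea'. Differences are rounded to
    three decimals, matching the precision of the reported fit indices.
    """
    d_cfi = round(less_constrained["cfi"] - more_constrained["cfi"], 3)
    d_rmsea = round(more_constrained["rmsea"] - less_constrained["rmsea"], 3)
    return {
        "d_cfi": d_cfi,
        "d_rmsea": d_rmsea,
        "holds": d_cfi < max_d_cfi and d_rmsea < max_d_rmsea,
    }


# Fit indices reported in the text for the metric and scalar models.
metric = {"cfi": 0.943, "rmsea": 0.057}
scalar = {"cfi": 0.939, "rmsea": 0.058}
result = invariance_step(metric, scalar)
print(result)
```

The computed differences match the reported ΔCFI = 0.004 and ΔRMSEA = 0.001 for the scalar step.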

We established the criterion-related validity of the GASCE by examining its correlation with relevant external measures to confirm its effectiveness in assessing goal achievement constructs. Criterion-related validity evaluates how well an instrument predicts an outcome based on established measures. We analyzed the correlation between the GASCE’s six factors—critical thinking, committed teacher, highly motivated teacher, intellectually fit teacher, professionally fit teacher, and socially fit teacher—and external benchmarks, including student academic performance, professional development records, and teacher evaluations. Each factor showed significant correlations with corresponding external measures, supporting its validity. The critical thinking factor correlated strongly with students’ problem-solving abilities (r = 0.78, p < 0.05), while the committed teacher factor aligned with teacher commitment indices (r = 0.74, p < 0.05). The highly motivated teacher factor correlated with motivation assessments (r = 0.76, p < 0.05), and the intellectually fit teacher factor showed strong links to intellectual engagement and academic achievement (r = 0.79, p < 0.05). Similarly, the professionally fit teacher factor correlated with professional competency assessments (r = 0.75, p < 0.05), and the socially fit teacher factor was significantly associated with social competence ratings (r = 0.73, p < 0.05). These results confirm that the GASCE effectively measures its intended constructs, ensuring its utility in evaluating goal achievement in Nigerian colleges of education.

Our results provide strong evidence supporting the validity and reliability of the GASCE. Through rigorous validation, we confirmed its six-factor structure: critical thinking, committed teaching, high motivation, intellectual fitness, professional fitness, and social fitness. The CFA and MG-CFA results demonstrated a good model fit, strong factor loadings, and alignment with theoretical frameworks on goal achievement and teacher effectiveness (Bandura, 1986; Locke and Latham, 1990). We also verified local item independence using Yen’s Q3 statistic, indicating that each item contributed uniquely to the scale without redundancy. The GRM reinforced the reliability of the instrument by confirming appropriate difficulty and discrimination parameters, enhancing its precision in measuring goal achievement. Additionally, MG-CFA supported configural, metric, and scalar invariance of the GASCE across academic and non-academic staff. This result indicates that both groups interpreted the constructs consistently, had equivalent factor loadings, and responded comparably at similar trait levels. Changes in fit indices stayed within the recommended thresholds (ΔCFI < 0.015, ΔRMSEA < 0.010; Chen, 2007), supporting fair multi-group comparisons. For example, academic staff scored higher in critical thinking, reflecting their roles, while social fitness scores remained similar across groups, indicating shared perceptions. These findings align with Vandenberg and Lance (2000), reinforcing the GASCE’s fairness and applicability in diverse educational settings.

Our study also builds on prior research emphasizing the need for context-specific educational assessment tools. Scholars such as Adebayo (2017), Ololube et al. (2018), and Adeyemi (2016) have identified gaps in existing evaluation mechanisms, highlighting the need for tailored, reliable instruments. The GASCE fills this gap by providing a validated and standardized tool for assessing goal achievement in Nigerian colleges of education. Moreover, our methodological rigor aligns with global best practices in educational scale development (Hair et al., 2011; Henseler et al., 2015), supporting the GASCE’s credibility and potential for broader application beyond the Nigerian context.
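The invariance decision described above reduces to comparing changes in fit between nested models. The following minimal sketch is an illustration, not the software used in the study. It applies the thresholds stated in the text to the combined-sample CFI and RMSEA values reported in Table A6. The function name `invariance_steps` and the data layout are assumptions made for this example.

```python
# Combined-sample fit indices as reported in Table A6
fits = {
    "configural": {"CFI": 0.946, "RMSEA": 0.056},
    "metric": {"CFI": 0.943, "RMSEA": 0.057},
    "scalar": {"CFI": 0.939, "RMSEA": 0.058},
}

# Thresholds as stated in the text
MAX_DELTA_CFI = 0.015
MAX_DELTA_RMSEA = 0.010

def invariance_steps(fits, order=("configural", "metric", "scalar")):
    """Compare each model with the previous, less constrained one."""
    results = []
    for prev, curr in zip(order, order[1:]):
        d_cfi = round(fits[prev]["CFI"] - fits[curr]["CFI"], 3)        # drop in CFI
        d_rmsea = round(fits[curr]["RMSEA"] - fits[prev]["RMSEA"], 3)  # rise in RMSEA
        supported = d_cfi < MAX_DELTA_CFI and d_rmsea < MAX_DELTA_RMSEA
        results.append((curr, d_cfi, d_rmsea, supported))
    return results

for level, d_cfi, d_rmsea, supported in invariance_steps(fits):
    print(f"{level}: dCFI = {d_cfi}, dRMSEA = {d_rmsea}, supported = {supported}")
```

The computed differences reproduce the ΔCFI and ΔRMSEA columns of Table A6 (0.003 and 0.001 for the metric model, 0.004 and 0.001 for the scalar model).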

Our validated GASCE has significant implications for enhancing educational quality in Nigerian colleges of education. It provides educators and administrators with a comprehensive tool to assess performance, identify weaknesses, and implement targeted improvements. For instance, low scores in the highly motivated teacher factor can prompt strategies to boost teacher engagement and morale. Additionally, the GASCE offers a standardized framework for benchmarking institutions, enabling policymakers to evaluate program effectiveness and foster accountability. By facilitating fair comparisons, it promotes best practices and continuous improvement. The data from GASCE also supports strategic planning, resource allocation, and policy formulation, ensuring that educational programs align with institutional and national goals. Ultimately, the GASCE drives systemic improvements, strengthening teacher education and overall educational outcomes in Nigeria.

In conclusion, we have developed and validated the GASCE as a reliable and effective tool for assessing goal achievement in Nigerian colleges of education. Through rigorous validation, including content, construct, and criterion-related analyses, we showed that the GASCE accurately measures critical thinking, commitment, motivation, intellectual fitness, professional competence, and social fitness. High reliability indices confirm its internal consistency. The GASCE provides policymakers, educators, and administrators with actionable insights to drive targeted interventions and continuous improvement in teacher education. By addressing a critical gap in educational assessment, we contribute to enhancing educational quality in Nigeria. Furthermore, our work sets a foundation for future research on context-specific assessment tools, inspiring similar efforts in other regions and educational settings. The GASCE marks a significant step forward in educational assessment, offering a comprehensive framework to improve teacher education and student outcomes.

Despite strong results, this study has limitations. The sample size, though adequate for validation, may not fully represent all Nigerian colleges of education. Future research should expand the sample to enhance generalizability. Additionally, reliance on self-reported data may introduce biases; incorporating objective performance measures would strengthen validation. The study’s cross-sectional design limits insights into long-term changes in goal achievement, making longitudinal studies a valuable next step. Further, while the GASCE provides quantitative insights, it does not capture the relational dynamics of teaching and learning. Qualitative research could explore these interactions to better tailor educational strategies. Expanding the GASCE’s application to other educational contexts and integrating it with other assessment tools could also create a more comprehensive evaluation framework, improving its utility for measuring goal achievement.

We extend our sincere gratitude to the management and staff of the various colleges of education across the northern central region of Nigeria for their cooperation and participation in this research. We also thank the experts for their invaluable feedback during the development of the Goal Achievement Scale in Colleges of Education (GASCE) and extend special thanks to our research assistants for their dedication in data collection.

Funding: No funds, grants or other support were received.

Ethical statement: We strictly adhered to ethical guidelines to ensure research integrity and participant confidentiality. We obtained informed consent from all respondents after clearly explaining the study’s purpose and their right to withdraw at any time, which fostered trust and cooperation. To protect privacy, we anonymized responses, removed personal identifiers, and stored data securely. The Faculty of Education at the University of Ilorin approved this study (Ethical Reference: UILFE 2-2023-241) on 13 September 2023, in compliance with the ethical standards of the Declaration of Helsinki. Throughout the research, we took proactive measures to minimize risks and safeguard participants’ rights and well-being.

Competing interest: The authors declare no competing interest.

Data availability statement: Data used are available on request.

Abubakar, A., Hilman, H. and Kaliappen, N. (2018), “New tools for measuring global academic performance”, Sage Open, Vol. 8 No. 3, 2158244018790787.
Adebayo, F.A. (2017), “Quality teacher education in Nigeria: a panacea for national development”, European Journal of Education Studies, Vol. 3 No. 8, pp. 42-52.
Adewale, A.A. and Ajayi, I.A. (2017), “Goal setting and attainment: a strategy for quality assurance in teacher education”, Journal of Teacher Education and Educators, Vol. 6 No. 1, pp. 1-18.
Adewale, A.O. and Ayanwale, M.A. (2023), “Causal modelling of head teachers’ leadership behaviour and administrative effectiveness in public basic school: validation of the measurement instrument”, Pertanika Journal of Social Sciences and Humanities, No. 2, pp. 855-883.
Adeyemi, B.A. (2016), “Evaluation of teacher education programs in Nigeria: the gap between policy and practice”, Journal of Educational Research and Practice, Vol. 6 No. 1, pp. 45-58.
AERA, APA and NCME (2014), Standards for Educational and Psychological Testing: National Council on Measurement in Education, American Educational Research Association, Washington DC.
Amusa, J.O., Ayanwale, M.A., Oladejo, I.A. and Ayedun, F. (2022), “Undergraduate physics test dimensionality and conditional independence: perspective from latent traits model (ltm) package of R language”, The International Journal of Assessment and Evaluation, Vol. 29 No. 2, pp. 47-61.
Auerswald, M. and Moshagen, M. (2019), “How to determine the number of factors to retain in exploratory factor analysis: a comparison of extraction methods under realistic conditions”, Psychological Methods, Vol. 24 No. 4, pp. 468-491.
Ayanwale, M.A. (2019), Efficacy of Item Response Theory in the Validation and Score Ranking of Dichotomous and Polytomous Response Mathematics Achievement Tests in Osun State, Nigeria, Nigeria.
Ayanwale, M.A. and Adeleke, J.O. (2020), “Efficacy of item response theory in the validation and score ranking of dichotomous response mathematics achievement test”, Bulgarian Journal of Science and Education Policy, Vol. 14 No. 2, pp. 260-285, available at: http://bjsep.org/
Baker, F.B. (2001), The Basics of Item Response Theory, 2nd ed., ERIC Clearinghouse on Assessment and Evaluation, College Park, MD.
Bandura, A. (1986), Social Foundations of Thought and Action: A Social Cognitive Theory, Prentice-Hall, Englewood Cliffs, NJ.
Chalmers, R.P. (2012), “mirt: a multidimensional item response theory package for the R environment”, Journal of Statistical Software, Vol. 48 No. 6, pp. 1-29.
Chen, F.F. (2007), “Sensitivity of goodness of fit indexes to lack of measurement invariance”, Structural Equation Modeling: A Multidisciplinary Journal, Vol. 14 No. 3, pp. 464-504.
Chin, W.W. (1998), “The partial least squares approach to structural equation modeling”, in Marcoulides, G.A. (Ed.), Modern Methods for Business Research, Lawrence Erlbaum Associates, Mahwah, NJ, pp. 295-336.
Chin, W.W. (2010), “How to write up and report PLS analyses”, in Vinzi, V.E., Chin, W.W., Henseler, J. and Wang, H. (Eds), Handbook of Partial Least Squares: Concepts, Methods and Applications.
Chukwu, A.N. and Ezeudu, F.O. (2018), “Goal setting and attainment in Nigerian tertiary educational institutions: a review of literature”, International Journal of Social Sciences and Humanities Reviews, Vol. 8 No. 4, pp. 10-21.
Collier, J.E. (2020), Applied Structural Equation Modeling Using AMOS: Basic to Advanced Techniques, Routledge, New York.
De Ayala, R.J. (2009), The Theory and Practice of Item Response Theory, Guilford Press, New York.
DeMars, C. (2010), Item Response Theory: Understanding Statistics Measurement, Oxford University Press, Oxford.
Federal Republic of Nigeria (2013), Nigeria Certificate in Education Minimum Standards for General Education, National Commission for Colleges of Education, Abuja.
Fornell, C. and Larcker, D.F. (1981), “Structural equation models with unobservable variables and measurement error: algebra and statistics”, Journal of Marketing Research, Vol. 18 No. 3, pp. 382-388.
Hair, J.F., Ringle, C.M. and Sarstedt, M. (2011), “PLS-SEM: indeed a silver bullet”, Journal of Marketing Theory and Practice, Vol. 19 No. 2, pp. 139-151.
Hair, J.F., Hult, G.T.M., Ringle, C.M. and Sarstedt, M. (2016), A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM), 2nd ed., Sage Publications, Thousand Oaks, CA.
Hair, J.F., Jr, Hult, G.T.M., Ringle, C.M. and Sarstedt, M. (2022), A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM), 3rd ed., SAGE Publications, available at: https://uk.sagepub.com/en-gb/afr/a-primer-on-partialleast-squares-structural-equation-modeling-plssem/book244583
Henseler, J., Ringle, C.M. and Sarstedt, M. (2015), “A new criterion for assessing discriminant validity in variance-based structural equation modeling”, Journal of the Academy of Marketing Science, Vol. 43 No. 1, pp. 115-135.
Horn, J.L. (1965), “A rationale and test for the number of factors in factor analysis”, Psychometrika, Vol. 30 No. 2, pp. 179-185.
Hu, L.T. and Bentler, P.M. (1999), “Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives”, Structural Equation Modeling: A Multidisciplinary Journal, Vol. 6 No. 1, pp. 1-55.
Ibrahim, S.A. and Alhassan, A.B. (2017), “Assessment of goal setting and achievement among secondary school students in Nigeria”, International Journal of Educational Research, Vol. 5 No. 9, pp. 169-180.
Khosroabadi, S. and Bahramzadeh, H. (2012), “A survey for assessing university performance”, Management Science Letters, Vol. 2 No. 8, pp. 3061-3066.
Kline, R.B. (2023), Principles and Practice of Structural Equation Modeling, Guilford Publications, New York.
Lawshe, C.H. (1975), “A quantitative approach to content validity”, Personnel Psychology, Vol. 28 No. 4, pp. 563-575.
Ledesma, R.D. and Varelo-Mora, P. (2007), “Determining the number of factors to retain in EFA: an easy to use computer program for carrying out parallel analysis”, Practical Assessment, Research & Evaluation, Vol. 12, pp. 1-11.
Locke, E.A. and Latham, G.P. (1990), A Theory of Goal Setting and Task Performance, Prentice-Hall, NJ.
NCCE (2016), Nigeria Certificate in Education Minimum Standards for Vocational and Technical Education, Department of Academic Program. National Commission for Colleges of Education, available at: http://www.fcezaria.edu.ng/ncce/v&tcp.pdf
Oladipo-Abodunwa, T.O., Adeleke, J.O. and Ayanwale, M.A. (2019), “Student mathematics engagement: development and validation of a measurement instrument”, The African Journal of Behavioural and Scale Development Research, Vol. 1 No. 2, pp. 17-23.
Ololube, N.P., Agbor, C.N., Major, B., Agabi, O.G. and Wali, O.R. (2018), “Quality assurance practices in higher education: the case of the University of Port Harcourt, Nigeria”, International Journal of Educational Research, Vol. 6 No. 3, pp. 157-172.
Reckase, M.D. (2009), Multidimensional Item Response Theory, Springer, New York.
Romero Jeldres, M., Díaz Costa, E. and Faouzi Nadim, T. (2023), “A review of Lawshe’s method for calculating content validity in the social sciences”, Frontiers in Education, Vol. 8, 1271335.
RStudio Team (2020), RStudio: Integrated Development for R. RStudio, PBC, Boston, MA, available at: http://www.rstudio.com/
Samejima, F. (1969), “Estimation of latent ability using a response pattern of graded scores”, Psychometrika, Vol. 34 No. S1, pp. 1-97.
Tay, L. and Jebb, A.T. (2017), “Scale development”, in Rogelberg, S. (Ed.), The SAGE Encyclopedia of Industrial and Organizational Psychology, 2nd ed., SAGE Publications, pp. 1292-1295.
Tobih, D.O., Ayanwale, M.A., Ajayi, O.A. and Bolaji, M.V. (2023), “The use of measurement frameworks to explore the qualities of test items”, International Journal of Evaluation and Research in Education, Vol. 12 No. 2, pp. 914-923.
Vandenberg, R.J. and Lance, C.E. (2000), “A review and synthesis of the measurement invariance literature: suggestions, practices, and recommendations for organizational research”, Organizational Research Methods, Vol. 3 No. 1, pp. 4-70.
Yen, W.M. (1993), “Scaling performance assessments: strategies for managing local item dependence”, Journal of Educational Measurement, Vol. 30 No. 3, pp. 187-213.
Yusuf, M.O. and Ojo, S.A. (2016), “Goal setting in Nigerian teacher education institutions: issues and prospects”, Journal of Educational and Social Research, Vol. 6 No. 2, pp. 129-135.
Table A1

The 45 items initially developed for the goal achievement scale

 
 
Table A2

Dimensions underlying the goal achievement scale

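| Factor | Eigenvalue (real data) | Eigenvalue (simulated) | Factor | Eigenvalue (real data) | Eigenvalue (simulated) |
| --- | --- | --- | --- | --- | --- |
| 1 | 19.7 | 0.75 | 23 | −0.09 | −0.02 |
| 2 | 1.90 | 0.64 | 24 | −0.11 | −0.04 |
| 3 | 0.92 | 0.58 | 25 | −0.14 | −0.06 |
| 4 | 0.79 | 0.53 | 26 | −0.15 | −0.08 |
| 5 | 0.63 | 0.49 | 27 | −0.18 | −0.11 |
| 6 | 0.56 | 0.45 | 28 | −0.19 | −0.13 |
| 7 | 0.48 | 0.41 | 29 | −0.24 | −0.15 |
| 8 | 0.39 | 0.38 | 30 | −0.26 | −0.17 |
| 9 | 0.35 | 0.35 | 31 | −0.27 | −0.19 |
| 10 | 0.32 | 0.32 | 32 | −0.28 | −0.21 |
| 11 | 0.27 | 0.29 | 33 | −0.31 | −0.23 |
| 12 | 0.24 | 0.26 | 34 | −0.32 | −0.25 |
| 13 | 0.17 | 0.23 | 35 | −0.33 | −0.27 |
| 14 | 0.14 | 0.20 | 36 | −0.35 | −0.29 |
| 15 | 0.09 | 0.18 | 37 | −0.37 | −0.31 |
| 16 | 0.05 | 0.15 | 38 | −0.39 | −0.33 |
| 17 | 0.04 | 0.12 | 39 | −0.4 | −0.36 |
| 18 | 0.02 | 0.10 | 40 | −0.43 | −0.38 |
| 19 | 0.01 | 0.08 | 41 | −0.44 | −0.40 |
| 20 | −0.02 | 0.05 | 42 | −0.45 | −0.42 |
| 21 | −0.07 | 0.03 | 43 | −0.46 | −0.45 |
| 22 | −0.08 | 0.00 | 44 | −0.49 | −0.47 |
| | | | 45 | −0.53 | −0.51 |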

Source(s): Authors’ own creation/work

Table A3

Graded response model parameters for the retained items

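| Construct | Indicator | a | b1 | b2 | b3 | b4 | b5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Critical thinking | CRT1 | 0.530 | −2.000 | −1.000 | 0.000 | 1.000 | 2.000 |
| | CRT2 | 0.310 | −1.500 | −0.500 | 0.500 | 1.500 | 2.500 |
| | CRT3 | 0.250 | −1.800 | −0.800 | 0.200 | 1.200 | 2.200 |
| Committed teacher | COT1 | 0.220 | −1.500 | −0.500 | 0.500 | 1.500 | 2.500 |
| | COT2 | 0.460 | −2.000 | −1.000 | 0.000 | 1.000 | 2.000 |
| | COT3 | 1.500 | −1.700 | −0.700 | 0.300 | 1.300 | 2.300 |
| | COT4 | 1.700 | −1.800 | −0.800 | 0.200 | 1.200 | 2.200 |
| | COT5 | 0.900 | −2.200 | −1.200 | −0.200 | 0.800 | 1.800 |
| Highly motivated teacher | HMT1 | 1.600 | −1.700 | −0.700 | 0.300 | 1.300 | 2.300 |
| | HMT2 | 1.503 | −1.500 | −0.500 | 0.500 | 1.500 | 2.500 |
| | HMT3 | 1.700 | −2.000 | −1.000 | 0.000 | 1.000 | 2.000 |
| | HMT4 | 0.840 | −1.500 | −0.500 | 0.500 | 1.500 | 2.500 |
| Intellectually fit teacher | IFT1 | 1.500 | −1.700 | −0.700 | 0.300 | 1.300 | 2.300 |
| | IFT2 | 1.600 | −1.500 | −0.500 | 0.500 | 1.500 | 2.500 |
| | IFT3 | 1.700 | −1.800 | −0.800 | 0.200 | 1.200 | 2.200 |
| | IFT4 | 1.500 | −1.500 | −0.500 | 0.500 | 1.500 | 2.500 |
| Professionally fit teacher | PFT1 | 1.600 | −1.700 | −0.700 | 0.300 | 1.300 | 2.300 |
| | PFT2 | 1.700 | −1.500 | −0.500 | 0.500 | 1.500 | 2.500 |
| | PFT3 | 0.180 | −1.800 | −0.800 | 0.200 | 1.200 | 2.200 |
| | PFT4 | 1.500 | −1.500 | −0.500 | 0.500 | 1.500 | 2.500 |
| | PFT5 | 1.600 | −1.700 | −0.700 | 0.300 | 1.300 | 2.300 |
| | PFT6 | 0.720 | −1.500 | −0.500 | 0.500 | 1.500 | 2.500 |
| Socially fit teacher | SFT1 | 1.800 | −1.800 | −0.800 | 0.200 | 1.200 | 2.200 |
| | SFT2 | 0.570 | −1.500 | −0.500 | 0.500 | 1.500 | 2.500 |
| | SFT3 | 1.600 | −1.700 | −0.700 | 0.300 | 1.300 | 2.300 |
| | SFT4 | 0.700 | −1.500 | −0.500 | 0.500 | 1.500 | 2.500 |
| | SFT5 | 1.500 | −1.700 | −0.700 | 0.300 | 1.300 | 2.300 |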

Source(s): Authors’ own creation/work
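To illustrate how the discrimination (a) and threshold (b1 to b5) parameters in Table A3 translate into response probabilities, the sketch below applies the standard graded response model (Samejima, 1969) with a logistic link. It is illustrative and is not the estimation code used in the study. The function name `grm_category_probs` is hypothetical, and the logistic form without a scaling constant is an assumption.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Graded response model: probability of each ordered response category.

    theta: latent trait level; a: discrimination; thresholds: increasing b values.
    """
    # Cumulative probabilities P(X >= k | theta), bracketed by 1 and 0
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    # Each category probability is the difference of adjacent cumulative values
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Parameters for CRT1 and COT4 as listed in Table A3
crt1 = grm_category_probs(0.0, 0.530, [-2.0, -1.0, 0.0, 1.0, 2.0])
cot4 = grm_category_probs(0.0, 1.700, [-1.8, -0.8, 0.2, 1.2, 2.2])

print([round(p, 3) for p in crt1])
print([round(p, 3) for p in cot4])
```

Five thresholds imply six ordered response categories. Under this model, an item with a larger a value concentrates probability on fewer categories at a given trait level, so it separates respondents more sharply.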

Table A4

Validity and reliability results of the GASCE

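| Construct | Indicators | Factor loading | Cronbach alpha | Composite reliability | Average variance extracted |
| --- | --- | --- | --- | --- | --- |
| Critical thinking | CRT1 | 0.850 | | | |
| | CRT2 | 0.943 | | | |
| | CRT3 | 0.828 | 0.845 | 0.907 | 0.766 |
| Committed teacher | COT1 | 0.805 | | | |
| | COT2 | 0.784 | | | |
| | COT3 | 0.910 | | | |
| | COT4 | 0.928 | | | |
| | COT5 | 0.865 | 0.912 | 0.934 | 0.740 |
| Highly motivated teacher | HMT1 | 0.846 | | | |
| | HMT2 | 0.922 | | | |
| | HMT3 | 0.889 | 0.907 | 0.931 | 0.772 |
| | HMT4 | 0.856 | | | |
| Intellectually fit teacher | IFT1 | 0.742 | | | |
| | IFT2 | 0.877 | | | |
| | IFT3 | 0.933 | | | |
| | IFT4 | 0.819 | 0.916 | 0.939 | 0.755 |
| Professionally fit teacher | PFT1 | 0.829 | | | |
| | PFT2 | 0.844 | | | |
| | PFT3 | 0.874 | | | |
| | PFT4 | 0.780 | | | |
| | PFT5 | 0.894 | | | |
| | PFT6 | 0.838 | 0.919 | 0.937 | 0.712 |
| Socially fit teacher | SFT1 | 0.813 | | | |
| | SFT2 | 0.802 | | | |
| | SFT3 | 0.827 | | | |
| | SFT4 | 0.772 | | | |
| | SFT5 | 0.776 | 0.858 | 0.898 | 0.637 |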

Source(s): Authors’ own creation/work
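As a worked check, composite reliability and average variance extracted (AVE) can be recomputed from the factor loadings using the conventional formulas associated with Fornell and Larcker (1981). The sketch below is illustrative. It assumes the loadings in Table A4 are standardized and that error variances equal one minus the squared loading. The function names are hypothetical.

```python
def composite_reliability(loadings):
    """Composite reliability from standardized factor loadings."""
    total = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)  # assumed error variance per item
    return total ** 2 / (total ** 2 + error)

def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

# Critical thinking loadings (CRT1 to CRT3) from Table A4
crt = [0.850, 0.943, 0.828]
cr = round(composite_reliability(crt), 3)
ave = round(average_variance_extracted(crt), 3)
print(cr, ave)  # 0.907 0.766
```

Under these assumptions, the computed values match the composite reliability (0.907) and AVE (0.766) reported for the critical thinking construct.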

Table A5

Divergent validity: Heterotrait-Monotrait (HTMT) ratio of correlations

| Constructs | Committed teachers | Critical thinking | Highly motivated teachers | Intellectually fit teachers | Professionally fit teachers | Socially fit teachers |
| --- | --- | --- | --- | --- | --- | --- |
| Committed teachers | | | | | | |
| Critical thinking | 0.152 | | | | | |
| Highly motivated teachers | 0.274 | 0.238 | | | | |
| Intellectually fit teachers | 0.819 | 0.117 | 0.342 | | | |
| Professionally fit teachers | 0.847 | 0.181 | 0.302 | 0.047 | | |
| Socially fit teachers | 0.881 | 0.869 | 0.468 | 0.888 | 0.123 | |

Source(s): Authors’ own creation/work

Table A6

Summary of MGCFA fit indices for the two categories of respondents

| Invariance level | Fit indices (academic staff) | Fit indices (non-academic staff) | Combined fit indices | ΔCFI | ΔRMSEA | Thresholds |
| --- | --- | --- | --- | --- | --- | --- |
| Configural | χ² = 1045.32, df = 424, CFI = 0.948, TLI = 0.940, RMSEA = 0.055, SRMR = 0.046 | χ² = 987.18, df = 424, CFI = 0.943, TLI = 0.937, RMSEA = 0.057, SRMR = 0.048 | χ² = 2027.68, df = 848, CFI = 0.946, TLI = 0.939, RMSEA = 0.056, SRMR = 0.047 | | | Baseline fit |
| Metric | | | CFI = 0.943, TLI = 0.938, RMSEA = 0.057, SRMR = 0.049 | 0.003 | 0.001 | ΔCFI < 0.015, ΔRMSEA < 0.010 |
| Scalar | | | CFI = 0.939, TLI = 0.934, RMSEA = 0.058, SRMR = 0.051 | 0.004 | 0.001 | ΔCFI < 0.015, ΔRMSEA < 0.010 |

Source(s): Authors’ own creation/work

Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode
