Confirmatory Factor Analysis of the School-Based Assessment Evaluation Scale Among Teachers

The school-based assessment (SBA) system is a holistic assessment system conducted in schools by subject teachers to assess students' cognitive (intellectual), affective (emotional and spiritual) and psychomotor (physical) aspects. It is in line with the National Philosophy of Education and the Standards-based School Curriculum in Malaysia. In order to evaluate the implementation of SBA, a measurement scale was validated. A questionnaire was used as the instrument for data collection. A total of 776 primary and secondary school teachers were selected as respondents using stratified random sampling. The data were analyzed with SPSS and AMOS version 18. The aim of this paper was to explore different factor structures of the SBA evaluation scale using second-order Confirmatory Factor Analysis (CFA). Results indicated that the SBA evaluation scale was valid and reliable. The input measurement model was validated with two factors (‘personnel qualifications’ and ‘physical infrastructure’), the process measurement model was validated with six factors (‘attitude’, ‘understanding’, ‘skills’, ‘challenges’, ‘moderation’ and ‘monitoring’) and the product measurement model was validated with two factors (‘students’ attitude’ and ‘students’ motivation’). This study provides support for using a valid instrument in evaluating the implementation of SBA in schools. Furthermore, the CFA procedures used supported the conceptual framework set out earlier. Thus, it shows clearly the importance of evaluating any education system along all the dimensions outlined in the evaluation model proposed by Daniel Stufflebeam.


Introduction
According to the aspiration of the Malaysian National Education Philosophy, education in Malaysia is an ongoing effort towards further developing the potential of individuals in a holistic and integrated manner so that well-balanced individuals can be produced. To sustain such an effort, reform of the education system is required. Recently, the assessment system in Malaysian education was reviewed, and the conclusion was that assessment had previously focused only on the summative type, with public examinations administered to all students in years six, nine and eleven (Ong, 2010). In recent years, formative assessment has been introduced for certain subjects at certain levels of schooling in all government schools.
The traditional concept of assessment has been found to be less effective in improving students' learning. This is because it focuses more on public examinations, which has made students examination-oriented (Wiliam, 2001), and it evaluates students purely on academic achievement, based on knowledge and skills demonstrated in a very time-limited situation (Fan, 2011). It is also seen as negatively affecting students' emotions and confidence levels (Stiggins, 2005); hence, it has produced more passive students and teachers (Mercurio, 2008).
Similarly, the Malaysian public examination is a method that orients the public to focus on the examination (Cheah, 2010). As such, a new system of assessment that is capable of determining the full potential of students and improving their learning is greatly needed. This is why formative assessment is becoming increasingly popular. In general, there are two main forms of SBA: formative and summative. Formative SBA is a school-based assessment intended to promote students' learning (Lembaga Peperiksaan Malaysia, 2011). It is conducted alongside the teaching and learning process using various methods of gathering information, such as worksheets, observations, quizzes, checklists, assessment reports, homework or tests. Summative SBA, on the other hand, is also school-based but provides a record of a student's overall achievement at the end of the month, semester or year using monthly or semester tests (Harlen, 2004; Lembaga Peperiksaan Malaysia, 2011). SBA as now practised in Malaysia includes both types of assessment, formative and summative (Lembaga Peperiksaan, 2010). Furthermore, SBA that focuses more on formative than on summative assessment has been conducted in countries such as Australia, New Zealand, Hong Kong, Finland, United Kingdom, USA, Canada, Africa, Sweden, Scandinavia and Singapore (Assessment Support Material, 2001). Australia implemented SBA in the late 1960s (Mercurio, 2008), while Finland and Sweden introduced it in the early 1970s (Darling-Hammond and McCloskey, 2008). Malaysia took a significant decision when SBA was formally implemented in all government schools in 2011, with the Year One students becoming the first cohort to undergo SBA.
The need for a valid and reliable measurement model to evaluate the implementation of this assessment system is becoming increasingly important. Hence, the instrument used to assess teachers' perception of a particular concept needs to be evaluated before it is administered. This is to make sure that the questionnaire is valid and reliable; in other words, that it measures what it is supposed to measure, and that the test scores are free of measurement error as far as possible (Muijs, 2011). Validity and reliability of the questionnaire are the most important things to consider when dealing with measurement (Barroon and Abd Rahman, 2015). The relationship between the two is that a test can be reliable without being valid, but it cannot be valid if it is not reliable (Jackson, 2003). There are various types of reliability, but in this study three are considered: internal reliability, construct reliability (CR) and average variance extracted (AVE). For validity, the study considers convergent validity, construct validity and discriminant validity. Internal reliability refers to the degree to which all of the items measure the same underlying construct (Pallant, 2007), whereas construct reliability assesses the extent to which a measuring instrument accurately measures the theoretical construct that it was designed to measure (Jackson, 2003). Construct validity is the extent to which a set of items actually reflects the theoretical latent construct those items are designed to measure (Hair et al., 2006), whereas discriminant validity requires that individual measured items represent only one latent construct.
When a questionnaire is valid and reliable, a researcher can have confidence in the results obtained with it during data collection. Hence, the purpose of this study is to develop an instrument to evaluate teachers' perception of the factors concerning SBA implementation by exploring the different factor structures of the evaluation scale using second-order Confirmatory Factor Analysis.

Research Methodology
Questionnaires were distributed by postal mail and by hand to primary and secondary schools in ten major districts in Kelantan, a state in the north-east of Peninsular Malaysia. Teachers were selected as respondents because they are the most involved in and the most concerned with the system compared to other parties. A total of 776 usable questionnaires were obtained for analysis. This sample size meets the recommendation of Kline (2005), who suggested that a sample of more than 200 participants is enough to run SEM analysis. Similarly, 500 participants are regarded as the minimum sample size required for a study involving more than seven latent constructs, some of which have fewer than three items (Hair et al., 2010). The issues of unidimensionality, reliability and validity are examined for all measurement models. Unidimensionality is achieved when the factor loading of each item on its respective latent construct is 0.5 or more (Zainuddin, 2012). Three types of reliability are considered: internal reliability, construct reliability (CR) and average variance extracted (AVE). Three categories of validity are also determined: convergent validity, construct validity and discriminant validity. The requirements are shown in Table 1. In this study, AMOS version 18 and SPSS version 21 are used to facilitate the analysis. AMOS is used to assess the relationship between the latent and observed variables of a measurement model. The technique used is confirmatory factor analysis. It is a theory-driven technique which determines the goodness-of-fit between the model and the sample data (Byrne, 2010). This type of analysis is preferable when the researcher already has some knowledge of the latent structure. In this study, the maximum likelihood estimation method is used to generate parameter estimates of the measurement models. This estimation method is more practical due to its ability to deal with complex models and its robustness to non-normal data (Brown, 2006). Several fit indices are used in this study to discern how well the specified model reproduces the covariance matrix among the indicator items (Hair et al., 2006). They fall under three main groups of measures: practical fit measures (chi-square statistic or χ²/df), absolute fit indices (GFI, AGFI or RMSEA) and incremental fit indices (TLI or CFI). According to Hair et al. (2010), a study should report at least three fit indices, with at least one from each category. In addition, the accepted values listed in Table 2 have to be met for a model to be regarded as fitting well.
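The two reliability measures named above, CR and AVE, are usually computed directly from the standardized factor loadings of a construct. The following is a minimal sketch, assuming the widely used loading-based formulas (AVE as the mean squared loading, CR as the squared sum of loadings over that quantity plus the summed error variances). The function names and the example loadings are hypothetical and are not taken from this study; only the 0.5 loading requirement for unidimensionality comes from the text.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of one construct."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def construct_reliability(loadings):
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances).

    The error variance of each item is taken as 1 - loading^2,
    which assumes standardized loadings.
    """
    total = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return total ** 2 / (total ** 2 + error)


def is_unidimensional(loadings, minimum=0.5):
    """Unidimensionality check: every loading is 0.5 or more (Zainuddin, 2012)."""
    return all(l >= minimum for l in loadings)


# Hypothetical standardized loadings for a single construct (illustration only).
example_loadings = [0.70, 0.80, 0.90]

ave = average_variance_extracted(example_loadings)
cr = construct_reliability(example_loadings)
unidimensional = is_unidimensional(example_loadings)

print(f"AVE = {ave:.3f}, CR = {cr:.3f}, unidimensional = {unidimensional}")
```

In practice the loadings would be read from the standardized estimates of each final measurement model and the resulting values compared with the requirements in Table 1.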

Research Findings and Discussion
About three-quarters (74.7 percent) of the participants are female and about one-quarter (24.6 percent) are male. The majority (93.6 percent) of them are Malay. Nearly half of them have 10 to 20 years of teaching experience. Overall, most of them have between 0 and 3 years of experience in practising SBA.

Input Evaluation
The input evaluation, as a 2nd-order measurement model, is proposed to measure the personnel, resources and procedures used in achieving the SBA objectives (Stufflebeam, 1971a). Three factors are involved: material and personal needs (mat), appropriateness of qualification (appr) and suitability of physical infrastructure and ICT (suit). These factors are measured by three items, two items and three items respectively, as shown in Figure 1 (initial model). A total of eight items are used to measure input evaluation. The model yields a chi-square (χ²) statistic of 157.756 with 17 degrees of freedom. The model was over-identified, but as a hierarchical model its higher-order structure would be just-identified. To resolve the just-identification issue, equality constraints are placed on particular parameters to yield a more accurate estimate. The goodness-of-fit statistics are χ²/df = 9.280; GFI = 0.952; AGFI = 0.898; NFI = 0.928; CFI = 0.935; TLI = 0.892 and RMSEA = 0.103. This measurement model provides a poor fit, and thus modifications such as deleting a construct or items are carried out to gain a better fit. Modification indices are then examined to decide which measurement errors between items should be correlated. According to Arbuckle and Wothke (1999), such changes have to be guided by theory or common sense to avoid producing absurd parameter estimates. In the final measurement model (Figure 1), four items are left to measure input evaluation. The remaining items are listed in Table 3. These items (a17, a18, a19 and a20) have factor loadings ranging from 0.53 to 0.92, indicating that the meaning of the factors has been preserved. The goodness-of-fit statistics indicate that this final measurement model has a very good fit (see Table 6). Finally, the issues of unidimensionality, validity and reliability have been addressed and are shown in Table 5.
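The normed chi-square and RMSEA reported for the initial input model can be checked by hand from the chi-square statistic, its degrees of freedom and the sample size. The sketch below is illustrative only: it assumes the chi-square statistic is read as 157.756 and uses the common point-estimate formula for RMSEA with N - 1 in the denominator, which may differ in detail from the computation in the software used. The function names are hypothetical.

```python
import math


def normed_chi_square(chi_square, df):
    """Chi-square divided by its degrees of freedom."""
    return chi_square / df


def rmsea(chi_square, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (N - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Values for the initial input model, with N = 776 respondents.
chi_square, df, n = 157.756, 17, 776

ratio = normed_chi_square(chi_square, df)
error = rmsea(chi_square, df, n)

print(f"chi-square/df = {ratio:.3f}, RMSEA = {error:.3f}")
```

Under these assumptions the sketch gives a ratio of about 9.28 and an RMSEA of about 0.103, in line with the values reported for the initial model.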

Process Dimension
The process evaluation, as a 2nd-order measurement model, is proposed to measure the process implemented in achieving the objectives of the programme (Stufflebeam, 1971a). Twelve major constructs are proposed: belief, feeling, readiness, understanding, skills, in-house training, administration, moderation, monitoring, challenges, role and the importance of SBA, with a total of fifty-two items. When this measurement model is run, the result shows that it does not fit the data on the implementation process. Therefore, principal component analysis (PCA) and confirmatory factor analysis (CFA) are conducted. The models have also been modified based on theory. Finally, four measurement models are produced: process1, with three 2nd-order constructs identified as attitude, understanding and courses (skills); process2, with two 2nd-order constructs, moderation and monitoring; process3, with two 2nd-order constructs identified as role and importance (crucial) of SBA; and lastly the 1st-order construct, challenges. Model modification has been applied to obtain the best-fitting models. The final measurement models for the process dimension are shown in Figure 2, and the issues of unidimensionality, validity and reliability are addressed in Table 5.

Product Dimension
The product evaluation, as a 2nd-order measurement model, is proposed to measure the programme outcomes. Three factors are taken into consideration: students' attitude towards SBA ('att'), students' knowledge of SBA ('know') and students' motivation towards learning ('mot'). These factors are measured by three items, two items and three items respectively, giving a total of eight items. The model yields a chi-square (χ²) statistic of 138.876 with 17 degrees of freedom. The goodness-of-fit statistics are χ²/df = 8.169; GFI = 0.960; AGFI = 0.915; NFI = 0.971; CFI = 0.974; TLI = 0.958 and RMSEA = 0.096. This measurement model presents a poor fit; hence, model modification such as deleting a construct or items was conducted to achieve a better fit. Modification indices are then examined to decide which measurement errors between items should be correlated. As a consequence, five items remain in the final product measurement model (Figure 3), as shown in Table 4. The remaining items (e32i, e32ii, e34i, e34ii and e34iii) have high factor loadings ranging from 0.84 to 0.95, indicating that the meaning of the factors has been preserved. The goodness-of-fit statistics show that the measurement model has a very good fit (as shown in Table 6). Finally, the issues of unidimensionality, validity and reliability are addressed in Table 5.
Table 6 shows the characteristics of the six final measurement models. In general, the fit index values indicate well-fitting models; all unstandardized estimates are statistically significant, with critical ratios of more than 1.96; all standard errors are in good order; all standardized estimates are above moderate strength; and the multivariate kurtosis values have improved and reached the required level. All multivariate kurtosis values are less than 50.0, indicating multivariate normality of the data set. However, there is a high correlation between process3 and product (r = 0.939) and also between process3 and process1 (r = 0.923). This indicates multicollinearity, so the process3 model has been deleted.
The literature has been reviewed to look for gaps in existing research on SBA implementation. Most evaluations look at only some dimensions, which does not give a fully rounded indication of the effectiveness of the system; examples include teachers' attitude towards SBA (Majid, 2011), teachers' leadership (Boon and Shaharuddin, 2011), and teachers' knowledge and best practices in SBA (Juliana, 2008), among others. To date, studies that combine all four dimensions of evaluation are non-existent. Therefore, in this study, an instrument was developed and its psychometric properties were measured. Selecting a validated instrument is easy, but obtaining one that suits the study objectives and the study context is quite difficult. In this case, the aim is to develop and validate an instrument to measure teachers' perception of SBA implementation in the Malaysian context. Finding a validated instrument for this purpose is not easy, as many factors need to be considered in this context. The evidence shows that the final model for the evaluation of SBA consists of five factors (input, process1, process2, challenges and product evaluation).
The model is hierarchical, so both first-order and second-order factors are involved. Input comprises two 1st-order factors (personnel's qualification, and physical infrastructure and ICT), process1 consists of three 1st-order factors (attitude, understanding and skills), process2 consists of two 1st-order factors (moderation and monitoring process), challenges consists of six strongly loading items, and product is made up of two 1st-order factors (students' attitude and students' motivation towards learning). As all the fit index values indicate well-fitting models, all unstandardized estimates are statistically significant, all standard errors are in good order and all standardized estimates are above moderate strength, the results imply good reliability and validity of the instrument. Hence, the questionnaire is suitable for assessing the perception of school teachers of the implementation of the SBA system in schools in the Malaysian context.
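The multicollinearity screen that led to the removal of process3 can be illustrated with a short sketch. The two correlations are those reported above; the cutoff of 0.85 and the function name are assumptions made for illustration, since the text does not state the threshold that was applied.

```python
from collections import Counter


def flag_high_correlations(correlations, cutoff=0.85):
    """Return the pairs of constructs whose absolute correlation exceeds the cutoff."""
    return {pair: r for pair, r in correlations.items() if abs(r) > cutoff}


# Inter-construct correlations reported in the text.
reported = {
    ("process3", "product"): 0.939,
    ("process3", "process1"): 0.923,
}

flagged = flag_high_correlations(reported)  # cutoff of 0.85 is an assumed value

# Count how often each construct appears in a flagged pair; the construct
# involved in the most flagged pairs is the natural candidate for removal.
involvement = Counter(name for pair in flagged for name in pair)
most_involved = involvement.most_common(1)[0][0]

print(most_involved, dict(involvement))
```

Here process3 appears in both flagged pairs, which is consistent with the decision to delete that model rather than process1 or product.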

Conclusion
The psychometric properties of a new extended SBA evaluation scale for assessing teachers' perception of the SBA system are presented. The instrument was developed after reviewing the relevant literature and consulting experts in measurement and evaluation. The findings demonstrate that the instrument has adequate psychometric properties (it is valid and reliable) and is fit to be used for the main study, as it was tested with a fairly large sample and analyzed using CFA. Furthermore, the CFA procedures used in this study supported the conceptual framework set out earlier. Thus, the study shows clearly the importance of evaluating any system along all the dimensions outlined in the evaluation model by Stufflebeam. Hence, the findings of the present study have expanded the existing body of knowledge on the development of a measurement scale to evaluate the implementation of the SBA system in schools.
Nevertheless, this study has several limitations. First, the sample was drawn only from teachers and not from other stakeholders, and therefore the development and validation of the instrument might be limited. Furthermore, the data come only from the perceptions of the teachers, without observation of their actual practices. Secondly, some items included in the survey were deleted during the CFA procedure. Deleting items was necessary to make sure that the models fit; although the hypothesized models are acceptable, there might be other variables that are more influential than those we have chosen. In addition, there might be other models that fit the data but that we have not tested. Finally, the sample of this study was collected only from government schools in one state in the north-east of Malaysia. Although the education system might not differ much across the country, cultural differences might limit the generalization of the findings to other states. The model reported here might be useful in the Asian context, especially in those countries that are becoming increasingly interested in aligning their assessment systems more closely with classroom learning, providing effective feedback and validly describing students' learning. The current instrument could be of great value to them.
In contrast, for those countries whose assessment systems are still examination-oriented, it is expected that there will be greater disagreement and discrepancy among teachers in accepting SBA, as teachers might see this new way of assessment as relevant but not to the extent of improving students' attitude, knowledge and motivation towards learning. The results of this survey indicate that students' knowledge of SBA is not consistent with the official Malaysian government policy concerning the objectives of the National Education Assessment System on improving students' learning. Based on this survey, we would expect practitioners or teachers to be exposed more vigorously to some form of professional development, so that they are equipped with sufficient skills, especially in the use of feedback. If students do not understand the function of feedback in improving their learning, it will be difficult to achieve the desired objectives of the SBA implementation.

Figure 1: Measurement Model for Input Evaluation (Initial Model [Left] and Final Model [Right])

Figure 2: Four Measurement Models for Process Evaluation (Final Model)

Figure 3: Measurement Model for Product Evaluation (Final Model)

Table 1: Requirement for the Reliability and Validity of the Measurement Model

Table 3: Input Evaluation Items and Their Descriptions

Table 4: Product Evaluation Items and Their Descriptions
Students' motivation towards learning:
SBA encourages students to read more books than before
Students are becoming more interested in my subject than before
SBA helps students to understand more on their strengths

Nor Hasnida Che Md Ghazali. Confirmatory Factor Analysis of the School-Based Assessment Evaluation Scale Among Teachers. Ta'dib: Journal of Islamic Education, Volume 21, Number 1, June 2016. P-ISSN: 1410-6973; E-ISSN: 2443-2512. Available online at http://jurnal.radenfatah.ac.id/index.php/tadib

Table 6: Final Characteristics of the Measurement Models