Assessing the performance of the Caregiver Reported Early Development Instruments (CREDI) in rural India

Abstract Although many education and health programs aim to improve early childhood development, it is challenging to assess developmental levels of infants and small children through large household surveys. The Caregiver Reported Early Development Instruments (CREDI) has been proposed as an adaptable, practical, and low-cost instrument for measuring the developmental status of children under 3 years of age at scale, as it is relatively short and collected by caregiver report. This study employed the CREDI to measure the development of a sample of 994 children ages 22-35 months in rural India and compared the results to those obtained using the Bayley Scales of Infant and Toddler Development (Bayley-III), a reliable and widely used instrument, albeit one not always suited to large-scale data collection efforts given its length, cost, and complexity of administration. The CREDI validation exercise showed that caregivers can provide assessments in keeping with the more interactive (hence more time-consuming and training-intensive) Bayley-III instrument. Notably, there was no indication that concordance of the instruments differed by education of the caregiver. This finding is important because it points to feasible alternative tools for measuring child development outcomes through large-scale surveys.


Introduction
Billions of dollars have been invested in a range of interventions to improve nurturing care. For example, the World Bank and the Inter-American Development Bank have invested US$5 B combined in early childhood development (ECD) since 2000 across every region of the developing world. 1 It is hoped that such investments can close long-term equity gaps in the productivity and earnings of beneficiaries. [2][3][4] Given the potential importance of such programs as well as the amount invested, there is a need for rigorous evaluation, especially to assess the impact of interventions at scale 5,6 and to monitor any resulting progress in developmental outcomes at the population level. 7 These evaluation efforts require instruments to measure ECD that are feasible for use in a wide range of survey conditions, particularly for children under the age of 3 years. 1,7,8 However, many of the frequently used diagnostic instruments, such as the Bayley Scales of Infant Development, 9 while reliable 8,10 and sensitive to differences due to interventions, [11][12][13] are designed for use by clinically trained professionals in specific contexts (primarily high-income and westernized 14) and are difficult and expensive to adapt to field settings in low-income environments. 15 Moreover, most of these tests are proprietary, involving expensive test kits and administration fees (copyrights), and their administration is long and requires the presence of the child, all of which makes them impractical for use at large scale. 16
There is, thus, a need for tools that assess young children's development that are reliable, valid, adaptable, and feasible for use at scale, both for program evaluation and population monitoring. One such instrument is the Caregiver Reported Early Development Instruments (CREDI), a relatively new, open source instrument for assessing ECD outcomes of children 0-36 months of age in culturally diverse settings. 17 The CREDI has both a short form, designed for large-scale multipurpose surveys and population-level monitoring, and a long form intended for research and evaluation. The long form of the CREDI was used in this study. Both forms differ from other commonly used instruments in that they rely entirely on caregiver report and were specifically designed for administration as part of household surveys in low-resourced areas in a broad array of culturally diverse settings. To date, the CREDI has been piloted in 17 countries. 18 The reliance on caregiver response is clearly an advantage in terms of ease of implementation compared with direct assessment of children, offering more flexibility on the time and place of administration and the tester profile, as well as substantially reducing test training and administration times and requirements. But these gains are only practical if the information obtained is deemed reliable and valid. In the process of piloting the CREDI, researchers have verified that the instrument had the same relation with a latent development construct whether the sample came from a low-income country or a high-income setting and also validated the instrument against other instruments commonly used for measuring child development. 18 For example, an exercise in Brazil ascertained its concurrent validity with a directly administered measure, the Inter-American Development Bank Regional Project on Child Development Indicators (PRIDI). 19,20 Similarly, a study in Tanzania compared the cognitive scale, which also includes language items, to the third edition of the Bayley Scales, the Bayley-III. 15,21 A recent review of 27 tools for early child development that covered at least three domains rated the CREDI highly in regard to validity and reliability. 22
The current study, undertaken in two low-income, mostly rural, districts in Madhya Pradesh, India, adds evidence on the reliability and validity of the CREDI by comparing results from the long form of the CREDI to those observed using the Bayley-III, often used as a standard to which other instruments have been compared. 16 As in previous studies, we ascertain the internal consistency of both tests, as well as the concurrence of the caregiver observations given by the CREDI with the direct observation of the child's abilities obtained from the Bayley-III, focusing on cognitive, language, and fine motor development. We not only investigate the covariation of the CREDI and the Bayley-III with adjustments for age, as in previous studies, but also explore correlations after controlling for the common role of socioeconomic covariates. We expect that the same socioeconomic conditions will be associated with both instruments and conjecture that, after controlling for these common socioeconomic factors, both measures will still indicate similar patterns of child development. We further examine whether the relative performance of the CREDI differs by the education of the caregiver and other characteristics that may affect the caregiver response. This is important for understanding the determinants of developmental heterogeneity observed within a sample population. Finally, we assess whether the CREDI conveys additional information on the relationship of cognitive development to nutrition and care indicators not identified with the Bayley-III instrument. Through this analysis, we hope to contribute to establishing the reliability and validity of a more practical (namely, convenient to use) tool.

Study setting
The research was undertaken in the context of the endline survey of a Cluster Randomized Control Trial (CRCT) that evaluated the impacts on child development of the expansion of daycare services provided by the Indian Integrated Child Development Services (ICDS) in the districts of Dhar and Singrauli in Madhya Pradesh. 23 These districts occupy the 32nd and 26th wealth percentiles, respectively, when ranking from poorest to wealthiest, based on India's National Family Health Survey (NFHS-4), which is representative at the district level. 24 The study design was a repeated cross section of children 18-42 months. A baseline survey was undertaken in September-December 2014 for the purposes of the CRCT and the endline survey was completed between January and February 2018. Balance between the treatment and control groups was verified using the baseline data, including balance in ECD outcomes, which were assessed using the Ages and Stages Questionnaire (third edition, ASQ-3). 25 Both the baseline and endline samples were designed to be representative of all households in the respective districts with at least one child under 5 years of age, and study size was based on power calculations for the CRCT. As per the CRCT study protocols, the sample included 200 community-level clusters divided equally between the treatment and control communities. In each community, 15 children were randomly selected from a listing of all children aged 18-42 months residing within a center's catchment area and, therefore, eligible for the program. As a few communities had fewer than 15 children in the age bracket, the endline survey was completed with 2856 households.
Ann. N.Y. Acad. Sci. 1492 (2021) 58-72 © 2020 The Authors. Annals of the New York Academy of Sciences published by Wiley Periodicals LLC on behalf of New York Academy of Sciences
The full survey, including the long form of the CREDI, was administered to the primary caregivers of the target children (referred to as index children hereafter) in each household. The Bayley-III was administered to the subset of index children who were 22-35 months of age at endline since the maximum age for which the CREDI is designed is 36 months. All told, the analysis sample for comparing the CREDI and the Bayley-III comprises 994 index children within the age range of 22-35 months.
The initial approval by IFPRI's IRB (# 00005121) was amended before endline data collection to accommodate the administration of the CREDI and the Bayley-III (# 00007490). Caregivers provided written agreement to undertake the data collection. The trial, Making Integration the Operative Concept in the Indian Integrated Child Development Services, was registered at the AEA RCT Registry as AEARCTR-0000967. Written consent for trial participation was also obtained from caregivers.

Instruments and training
As indicated, the study compares assessments of child development using the long form of the CREDI (hereafter, referred to simply as the CREDI) with results using the Bayley-III, which is often considered the gold standard or, at least, one of the most reliable instruments for assessing the development of very young children (under 42 months). 8,10 The long form contains a total of 109 questions arranged in increasing order of difficulty. However, the number of responses for the CREDI depends on the child's age and development; there is a different starting point for various age groups and the interview then continues until there are five consecutive "no" answers. CREDI scores were generated by processing the caregiver responses in the software program R via the credi package following the instructions of the CREDI team that developed the measure. 26 The credi package calculates scores for each subcategory (motor, cognitive, language, and socioemotional) as well as an overall development score. The Bayley-III measures cognition, receptive language, expressive language, fine motor, and gross motor development by direct observation of the child's performance in a series of tester-administered items, arranged in increasing order of difficulty. Basal and ceiling rules determine the number of items to administer to each child.
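The CREDI stopping rule described above can be illustrated with a minimal Python sketch. The function name, the `start_index` argument, and the representation of answers as booleans are hypothetical and chosen only for illustration; the actual age-specific starting points are defined in the CREDI materials and are not reproduced here.

```python
from typing import List


def administer_items(answers: List[bool], start_index: int,
                     stop_after: int = 5) -> List[bool]:
    """Record answers from `start_index` onward, stopping once
    `stop_after` consecutive "no" (False) answers have been given.

    `answers` holds the caregiver's would-be response to every item,
    ordered by increasing difficulty (hypothetical input format).
    """
    recorded: List[bool] = []
    consecutive_no = 0
    for answer in answers[start_index:]:
        recorded.append(answer)
        # A "yes" resets the run of "no" answers; a "no" extends it.
        consecutive_no = 0 if answer else consecutive_no + 1
        if consecutive_no == stop_after:
            break
    return recorded
```

Under such a rule, the number of recorded responses varies across children, which is why the number of responses depends on both age (the starting point) and development (where the run of "no" answers occurs).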
The Bayley-III uses the Greenspan Social-Emotional Growth Chart for the assessment of socioemotional development, which is collected by caregiver report. 27 However, in the current study, we did not collect the Bayley-III socioemotional scale since a comparison with the CREDI socioemotional domain would only compare two caregiver reports, an undertaking that was not deemed germane to the task at hand. Instead, we preferred to minimize respondent's fatigue and optimize testing time. Similarly, we did not assess gross motor development with the Bayley-III because of time and logistical constraints. The gross motor scale requires the use of certain materials (e.g., standardized steps not provided by the publisher), which were difficult to transport in the relatively remote study setting. Thus, for the purpose of the current analysis, only the cognitive, language (both receptive language and expressive language), and fine motor scales of the Bayley-III were used.
Both the CREDI and Bayley-III instruments were translated into Hindi and back translated. Piloting indicated small differences in the dialects of Hindi in the two districts and minor accommodations were made to reflect these. RehabInsights, a firm with prior experience adapting and implementing the Bayley-III in the Indian context, was responsible for adapting this instrument to local conditions and for training a separate team of dedicated testers ("testers" henceforth). Many pictures in the standard instrument were adapted after pretesting as the originals were not familiar to residents in the study area.
The CREDI was administered during household interviews conducted by staff of Oxford Policy Management Ltd. Thirty-five interviewers were trained for 10 days on the main survey instrument, which included the CREDI. The Bayley-III was conducted at the day care center (the Anganwadi center) of the ICDS, wherever possible. If the index child was not available for the test at the Anganwadi center, testing was done at the child's home (and this was controlled for in the analysis). Twenty percent of tests were undertaken in this manner. The protocol was to conduct both tests on the same day whenever possible. The fact that one test was with the caregiver made this feasible.
Bayley-III testers were provided an extensive 6-week training from November 27, 2017 to January 10, 2018. The training period was divided into three parts: 2 weeks of theory training were followed by 1 week of practical training and 3 weeks of field practice. Each tester had a chance to practice on a minimum of 20 children. During these practice sessions, interobserver reliabilities were calculated among the testers and supervisors. Although a total of 18 testers were trained, 10 were ultimately selected, based on their performance, to continue with data collection.
We seek to assess the association of the CREDI instrument with other aspects of childcare and child outcomes. Thus, additional information collected during the household endline included data on anthropometry for the index child, collected following standard WHO protocols, as well as information on service provision from the ICDS. Caregiver characteristics, including educational attainment, were also collected. Maternal symptoms of depression were measured using the Center for Epidemiologic Studies Short Depression Scale (CESD-10), an adaptation of the 20-question scale in common use in a range of settings, including rural India. 28,29 We categorized mothers as having high depressive symptoms if their total score on the CESD-10 was 10 or higher (out of a total of 30) and as having low depressive symptoms otherwise. 29 In addition, the survey collected data on household assets and a household wealth index was constructed using principal component analysis. 30 Finally, the survey collected information on the Family Care Indicators (FCI), a commonly used measure of the quality of the home environment. 31
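One common way to build an asset-based wealth index with principal component analysis is sketched below in Python. This is an illustrative construction under stated assumptions, not the study's own code: the function names are hypothetical, each asset column is assumed to vary across households, and the index is taken to be the first principal component of the standardized asset indicators, which is then cut into quartiles.

```python
import numpy as np


def wealth_index(assets: np.ndarray) -> np.ndarray:
    """First principal component score of standardized asset columns.

    `assets` is an (n_households, n_assets) array; every column is
    assumed to have nonzero variance (assumption of this sketch).
    """
    z = (assets - assets.mean(axis=0)) / assets.std(axis=0)
    # Rows of vt are the principal directions of the standardized data.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[0]
    # The sign of a principal component is arbitrary; orient it so that
    # owning more assets raises the index.
    if loadings.sum() < 0:
        loadings = -loadings
    return z @ loadings


def wealth_quartile(index: np.ndarray) -> np.ndarray:
    """Assign each household to a quartile coded 1 (lowest) to 4."""
    cuts = np.quantile(index, [0.25, 0.50, 0.75])
    return np.searchsorted(cuts, index, side="right") + 1
```

Because asset indicators are often binary, ties in the index can make the quartile groups slightly unequal in size.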

Analytic framework
The CREDI instrument is designed for children from birth up to 36 months. However, the CRCT required that the index child be old enough to have spent time in day care. Since many sampled clusters had relatively few children in the 22-35 months age bracket, which is included in CREDI coverage, 24.7% of the index children were between 36 and 42 months.
Given that the performance of the CREDI at the boundary of ages for which it is designed is relevant to program evaluations, where ages are often recorded with error, we included all children (22- …
In our analysis, we used the R software provided by the CREDI team to calculate a raw scaled (factor) score for each domain. Raw scores from both instruments were then internally age-standardized using age-conditional means and standard deviations (SDs) following the nonparametric method proposed elsewhere 16 with two modifications: we used age in months (instead of in days) and we did not remove interviewer/tester effects before standardizing. The R package also provides a "norm-referenced standardized score (Z-score)" for each domain. Specifically, the referencing subtracts the average raw score of children of the same age in months in a global reference population from the observed raw score and then divides the difference by the age-specific SD. A Z-score of 0 thus means that the child has exactly the same score on that particular domain as the average same-age child in the CREDI reference population. A score of −1 means that the child's raw score is 1 SD below the same-age average of the reference population.
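The idea of internal age standardization can be sketched in Python as follows. This is a deliberately simplified illustration: it uses the plain mean and SD within each age-in-months cell, whereas the study follows a nonparametric method for the age-conditional moments, and the function name is hypothetical. Each age cell is assumed to contain at least two children with differing scores.

```python
import numpy as np


def age_standardize(raw_scores, age_months) -> np.ndarray:
    """Standardize raw scores within each age-in-months cell.

    Simplification: cell means and SDs are used directly instead of
    smoothed (nonparametric) age-conditional moments.
    """
    raw = np.asarray(raw_scores, dtype=float)
    age = np.asarray(age_months)
    z = np.empty_like(raw)
    for a in np.unique(age):
        mask = age == a
        z[mask] = (raw[mask] - raw[mask].mean()) / raw[mask].std(ddof=1)
    return z


# Two age cells with different means and spreads map onto the same scale.
z = age_standardize([40, 42, 44, 50, 54, 58], [24, 24, 24, 30, 30, 30])
print(z)  # [-1.  0.  1. -1.  0.  1.]
```

By construction, scores standardized this way have mean 0 within each age cell of the sample, so they describe a child's position relative to same-age peers in the survey rather than relative to a global reference population.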
The Bayley-III also provides norm-referenced scores. 32 However, to utilize these for motor development, it is necessary to have observations on both gross and fine motor development. Thus, the majority of the results reported in the main body of this study use results that are referenced by survey age conditional means; for comparison, we report one key table as an appendix using the global norm-referenced scores for cognitive and language domains based on the R software package for the CREDI and the Bayley-III administration manual.
We first investigated the reliability of the measures by exploring the internal consistency of each test's domains using Cronbach's alpha (α) for the entire sample and separately by maternal education (no education versus some education). We stratified by education because the education level of the respondent is a possible key mediating factor for the CREDI, but not for the Bayley-III. However, any comparison of scores across subgroups, such as educational categories, is conditional on the assumption that groups view the survey items the same; that is, the measurement structure of the latent cognitive domain is "invariant." 33 Thus, we applied a structural equation modeling approach for latent index construction to the CREDI cognitive, language, and motor domains to assess measurement invariance.
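For readers unfamiliar with the statistic, Cronbach's alpha for k items is k/(k − 1) × (1 − sum of item variances / variance of the total score). A minimal Python sketch follows; the function name is hypothetical, and the sketch assumes a complete respondents-by-items matrix, so it does not address items skipped under start, basal, or ceiling rules.

```python
import numpy as np


def cronbach_alpha(items) -> float:
    """Cronbach's alpha for a complete (n_respondents, n_items) matrix."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)


# Three items that move together perfectly give alpha = 1.
print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))  # 1.0
```

Computing the statistic separately for subgroups, such as mothers with and without formal education, only requires applying the function to the corresponding rows.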
We next investigated validity. We assessed the performance and covariation of the developmental scores from the two instruments in a variety of ways. We started by conducting graphical descriptive analysis on the scores from the two instruments, using nonparametric regressions. To account for the possibility that any observed correlation would reflect the expected improvement of unadjusted scores as a child ages, we present graphical analysis with raw scores as well as with age adjusted scores. We then computed the concurrence between scores by domain using Pearson correlations (r) on both raw (unadjusted) and internally age-standardized scores (adjusted).
As such concurrence may be driven by the fact that both indicators are strongly associated with common factors, we subsequently decompose the association in scores to covariation because of observed characteristics and covariation in residuals. To do this, the analysis first investigated the conditional associations in each test between the internally age-standardized aggregate development scores and predictors, such as maternal education and household asset wealth, which have been shown in numerous previous studies to be significantly related to child development. 34 We then controlled for the influences of these predictors by estimating the residuals from the scores regressed on socioeconomic covariates. We investigated the cross-test correlations of these residuals, which provide evidence of the concurrence of the information on child development in each domain of each test net of the linear influence of key socioeconomic factors. As the education level of the respondent might influence the reliability of her assessment, we investigated the stability of the associations between the two scales across different levels of caregiver formal education.
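The residual-correlation step can be sketched in Python as below. This is a minimal illustration under simplifying assumptions: the function names are hypothetical, ordinary least squares with an intercept is used, and categorical covariates are assumed to be already expanded into indicator columns. It does not reproduce the study's exact specification or inference.

```python
import numpy as np


def residualize(y: np.ndarray, covariates: np.ndarray) -> np.ndarray:
    """Residuals from an OLS regression of y on covariates plus intercept."""
    design = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def residual_correlation(score_a: np.ndarray, score_b: np.ndarray,
                         covariates: np.ndarray) -> float:
    """Pearson correlation of two scores net of the linear influence
    of the shared covariates."""
    res_a = residualize(score_a, covariates)
    res_b = residualize(score_b, covariates)
    return float(np.corrcoef(res_a, res_b)[0, 1])
```

If two scores were correlated only because both depend on the same covariates, this residual correlation would be close to zero; a correlation that remains positive indicates shared information beyond those covariates.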
Finally, we related the developmental scores to child height-for-age (HAZ) as well as a measure of the home stimulation environment, two outcomes that have been shown in numerous settings to be related to assessments of child development. 6,29 The strength of association of each development score to these outcomes was assessed.

Results
Table 1 conveys the mean values for the characteristics of the 994 children in the analysis sample and their caregivers. We report the mean for the raw scores as well as the global norm-referenced scores but do not report the locally normed scores as they are centered at 0 by construction. Thirteen percent of the Bayley-III assessments were observed by a study supervisor and only 1% of the assessments involved a break in which the child had something to drink. Both factors may influence performance in the Bayley-III and consequently are controlled for in subsequent analysis. The mean age of an assessed child is almost 29 months, and 51% of the children are boys. The asset wealth score of the household is assigned to a quartile indicator, and as expected, each quartile indicator contains roughly 25% (ranging from 23% to 27%) of the sample. The educational attainment of the mothers in the study sample ranges from no formal education to post-secondary completion. Given the rural concentration of the sample, the education distribution skews toward relatively low levels of educational attainment, with 59% of all mothers reporting primary level education completed or less. Only 11% of mothers report secondary attainment or greater.
Note: The table includes the CREDI and Bayley-III raw scores. The wealth index is generated from a principal components analysis (PCA) of underlying household assets. The variables "Was the test observed?" and "Were there breaks?" account for whether an interviewer/tester was present during the survey and whether the child took a break during the survey. For the categorical variables "Wealth index" and "Mother's education," we report the proportion for each category.
Figure 1 relays the nonparametric local polynomial association between the CREDI and Bayley-III raw scores, separately for the language, cognitive, and motor domains. These were constructed using the lpolyci nonparametric regression command in Stata with the standard kernel default. The Pearson's correlation coefficient in developmental scores is estimated to be 0.40 for the language domain, 0.28 for the cognitive domain, and 0.31 for the motor domain. Figure 1 indicates a generally positive association throughout the distribution of CREDI scores, with the possible exception of the upper range of CREDI scores for motor and cognitive development (but not language), where the association with the Bayley-III appears to weaken.
… associations between the two measures, but now the scores have been age standardized. The slopes of the associations of scores between the CREDI and the Bayley-III do not differ by domain. The correlation coefficients between the two scores are still precisely estimated and positive, although lower in magnitude, at 0.21 for the cognitive domain, 0.33 for the language domain, and 0.20 for the motor domain.
Table 2 indicates the internal consistency of the scores for each domain for both the CREDI and the Bayley-III using Cronbach's alpha (α). The internal consistency of every domain either approaches or exceeds the conventional cutoff of 0.7, indicating a degree of consistency 35 in accordance with that observed in similar studies. 15 Moreover, estimated internal consistency in the CREDI was no different for uneducated compared with educated mothers. This is notable since the CREDI is a caregiver-reported instrument for which interpretation and recall of children's abilities may be influenced by the education level of the respondent. There are also no differences in consistency by the age of the child (results not shown).
As education is an important mediating factor, we assessed metric invariance with respect to the education group of the caregiver and found that only the cognitive domain exhibits metric invariance (P < 0.01). However, the requirements for testing for measurement or metric invariance are difficult to meet, and strict forms of measurement invariance rarely hold. 33 The approach employed tests for invariant factor loadings on the individual question items; thus, the test is sensitive to the number of individual items. Since the CREDI has numerous individual items, we also explored metric invariance with respect to a random subset of 10 individual items in the language and motor domains. When we do so, we observe metric invariance for the motor domain (P < 0.05) but not language. Thus, there is some evidence for measurement invariance in the CREDI, at least the cognitive and motor domains, and that differences in scores across education group represent real differences in the latent construct.
Various socioeconomic factors are widely recognized to affect child development. Table 3 investigates the relationship between development and key socioeconomic covariates for both the CREDI and the Bayley-III. As the scores are standardized by age and normalized, each coefficient expresses the change in SDs of the score associated with a unit change in the characteristic. For both measures, all developmental domains (cognitive, motor, and language) are positively associated with wealth and caregiver education, and these associations are generally statistically significant. For example, standardized scores for the wealthiest quartile in all domains are at least 0.6 SD higher than for the poor, and in the case of Bayley-III language more than 1 SD higher. Children whose mothers had 6-8 years of education (that is, early middle school level) had scores in the neighborhood of 0.2 SD higher than those whose mothers had not gone to school. While the point estimates (and statistical precision) were not larger when the mother had more secondary education, the point estimates were larger for the children whose mothers had schooling beyond secondary. Most germane to the question of CREDI performance vis-à-vis the Bayley-III, coefficients for measures of education and assets are similar between the two instruments; the exception is gender. In the CREDI cognitive domain, boys scored significantly lower than girls, while the opposite was the case for the Bayley-III.
The results in the first three rows of Table 4 indicate correlations of residuals after regressing locally age-standardized scores on control variables. Correlations with residuals using raw scores of the CREDI and the Bayley-III are also included in parentheses for comparison. The first set includes only controls for tester/interviewer as well as an indicator for the few tests that were paused to allow the child to rest. This parallels a similar analysis in Tanzania. 15 The next set of columns relays the correlations of residuals from regressions that sequentially add the caregiver's education and subsequently the household socioeconomic quartile. As expected, the correlations decline as additional adjustments for common determinants are included; adding both education and socioeconomic quartile reduces the correlations by approximately 25%. The correlations with all the adjustments are, nevertheless, significantly different from zero (and positive). The relatively smaller correlations for motor development may reflect the fact that the CREDI motor scale includes both gross and fine motor items while only the Bayley-III fine motor scale was collected.
The next three rows of Table 4 reveal that the correlations between caregiver-reported cognitive, language, and motor development and the Bayley-III assessments do not differ appreciably by level of education after controlling for child and caregiver characteristics. This addresses a concern that less educated caregivers might report the activities of their child in a manner that conforms less closely to independent observations than do reports by those with more education. In particular, there is no support for either the view that uneducated caregivers might be less accurate in reporting the skills of their child relative to the Bayley-III or, conversely, the view that the most educated caregivers might see skills in their child that are less apparent to the staff administering the Bayley-III. As indicated further in Table 4, however, mothers who report relatively high numbers of symptoms of depression seem to report skills that are less in accord with the Bayley-III results than those reported by caregivers with fewer symptoms. For example, the correlation of cognitive scores as reported by mothers with low depressive symptoms with the Bayley-III cognitive scores is 19% larger than it is for mothers with more depressive symptoms.
It is possible that greater contact with service providers, as occurred in the treatment group of the CRCT, might make caregivers more aware of development milestones and, thus, improve their ability to assess the development of their child. The next row of Table 4 indicates that there is only a small difference in the association of the results in the two indicators by treatment status in the CRCT. That is, caregivers of participants in the day care program report development of their child that is only slightly more in concordance with the results in the Bayley-III. Finally, the bottom panel of the table shows that there are slight differences in the correlations based on where the Bayley-III test took place. The correlations of residuals are quite similar to those reported in Table 4 when the global population reference groups are used to standardize results instead of when age-standardizing using the survey data (Table S1, online only). Table 5 presents regressions that explore the joint association of both the CREDI and the Bayley-III with two variables widely believed to be strongly associated with a child's level of development, and often used as proxies: early life nutrition and disease exposure, as summarized in the HAZ score, and the home environment, as measured by the FCI. Although no direction of causality is implied by the regressions, they indicate that information in both the CREDI and Bayley-III scores is significantly and, at times, independently associated with these two measures. Interestingly, the FCI is positively related to both the Bayley-III and CREDI language domains (and not other domains), indicated by the fact that both the coefficient of the CREDI and the Bayley-III language domains are significant at P < 0.01 in the regression for FCI. While the conditional association is greater in magnitude for the CREDI than the Bayley-III, the two associations are not statistically significantly different from each other. 
This suggests that there may be independent elements in the two language domains that identify complementary but distinct aspects of language development. Regarding HAZ, it is the CREDI cognitive domain and the Bayley-III fine motor domain that are positively related to HAZ. This suggests that, while the CREDI motor domain adds little over the (more associated) Bayley-III motor domain with regard to predicting HAZ, the opposite holds for the cognitive domain: the conditional covariation of HAZ loads onto the CREDI cognitive score and not the Bayley-III. We take this as suggestive evidence that the Bayley-III and CREDI index may measure somewhat distinct elements of child development, rather than entirely the same construct.
Note: All coefficients derive from ordinary least squares (OLS) regressions with standard errors clustered at the survey enumeration (village) level. CREDI and Bayley-III scores are locally age standardized. Age is added in the analysis to correct for any residual variance. The wealth index is generated from a PCA of underlying household assets. The variables "Was the test observed?" and "Were there breaks?" account for whether an interviewer/tester was present during the survey and whether the child took a break during the survey. All regressions include interviewer/tester fixed effects. Standard errors in parentheses. * P < 0.10; ** P < 0.05; *** P < 0.01.

Discussion
Results from this application of the CREDI indicate that the instrument appears to perform well with regard to a recognized standard in rural northern India. This is in line with the evidence from other studies 18 and supports the view that the CREDI is useful within a range of settings, including low income/low education communities. In many respects, the correlations that removed the com- [...] indicating that both measures reflect the child's environment in a similar but not identical manner.

Note: All coefficients derive from OLS regressions with standard errors clustered at the survey enumeration (village) level. CREDI and Bayley-III scores are locally age standardized. Age is added in the analysis to correct for any residual variance. The wealth index is generated from a PCA of underlying household assets. The variables "Was the test observed?" and "Were there breaks?" account for whether an interviewer/tester was present during the survey and whether the child took a break during the survey. All regressions include interviewer/tester fixed effects. Standard errors in parentheses. *** P < 0.01.
This conclusion is also supported by the fact that household wealth and caregiver education are associated with age-standardized CREDI scores over the three domains of cognition, language, and motor development, in accord with international experience. 18,36 Moreover, the association with the CREDI is quantitatively close to the association of these covariates with the standardized Bayley-III scores. That the two measures show concordance reinforces previous evidence that the CREDI is useful for assessing child development in a community, 18 [...] as to inference from the context of a single culture and restricted age coverage. At the same time, the fact that the two instruments have independent associations with nutrition and indicators of the quality of the home care environment implies that they provide complementary information and may be most useful in tandem rather than as substitutes.
The reason for the gender difference in the results of these two instruments, however, is not readily apparent. The CREDI results here are more in keeping with past studies using the instrument 18 than are those for the Bayley-III: past Bayley-III studies generally show that girls outperform boys, in contrast to the results reported here. 37 [...] to a gender-related bias on the part of the Bayley-III assessors. Alternatively, or additionally, young boys and girls in this region of India may differ in their reticence to perform for the Bayley-III assessor; such differences in desire to perform have been noted in various settings. 39,40 The current study, however, was not designed to assess the reasons the CREDI differs from the Bayley-III on this one pattern. Such gender differences are, nevertheless, an area worthy of future investigation.
The results here also address one potential drawback of the CREDI: that relatively uneducated caregivers might be less likely than their more educated neighbors to provide information that corresponds to the observations of a trained researcher. 8 In this study, caregiver (respondent) education does not appear to be a barrier to the validity of information conveyed in caregiver-reported child development assessments. While the difference in the association with the Bayley-III with regard to maternal depression is modest in our study, it may reflect such a barrier, and would need to be further investigated in future studies.
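One common way to check whether concordance differs between two independent subgroups (for example, less and more educated caregivers) is to compare the subgroup correlations with Fisher's z transformation. The Python sketch below shows that calculation; it is not necessarily the approach used in this study, and the correlations and subgroup sizes are hypothetical placeholders rather than results from the paper.

```python
from math import atanh, erfc, sqrt


def compare_correlations(r1, n1, r2, n2):
    """Two-sided Fisher z-test for equality of two independent correlations.

    Returns the z statistic and its two-sided p-value under a normal
    approximation. Requires n1 > 3 and n2 > 3.
    """
    z1, z2 = atanh(r1), atanh(r2)          # Fisher transformation
    se = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z_stat = (z1 - z2) / se
    p_value = erfc(abs(z_stat) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z_stat, p_value


# Hypothetical inputs: CREDI/Bayley-III correlation among less educated
# caregivers (r = 0.40, n = 500) versus more educated caregivers
# (r = 0.45, n = 494). These numbers are invented for illustration.
z_stat, p_value = compare_correlations(0.40, 500, 0.45, 494)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
```

A large p-value here would be consistent with, but would not prove, similar concordance across the two groups. With village-clustered survey data, a regression with an interaction term and clustered standard errors would be a more appropriate test than this simple independent-samples comparison.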
We did not have multiple tests of the same child over time, so the study did not add to the evidence that the CREDI is reliable as defined by the within-subject intertemporal reliability of results. Another limitation of the study is that we are unable to compare the two instruments as indicators of project impact. Although the data were collected as part of a CRCT, neither cognitive nor motor domains were affected by the treatment using the full CREDI sample. 23 Thus, the comparison provides little insight about whether these two measures are equally capable of capturing effects of an intervention. For example, although we see a small difference in the association of CREDI and Bayley-III scores by treatment status, neither of these outcomes is influenced by the intervention; the null hypothesis of no effect could not be rejected at the P < 0.10 level. Therefore, the current study does not address whether the CREDI is as useful for program impact evaluation as other established methods. The fact that we are unable to determine whether the CREDI would be subject to social desirability bias when used in an assessment of a program that aimed to provide center-based early childcare points to an important area for future research.
Another drawback is the restricted age range. We cannot say anything about children younger than 22 months, for whom the assessment of child development is more complex and the number of easy-to-use instruments further limited; this is an area for future investigation. Additionally, future research can assess the longitudinal predictive reliability of the CREDI relative to other caregiver responses, as well as compared with direct professional assessment. As any instrument for assessing child development will be called upon for these tasks, such research will help determine the role of the CREDI as a component of an ECD toolkit. As mentioned, the goal of this study was to assess caregiver response vis-à-vis direct child assessment. As such, the socioemotional domain was not covered in this paper. It remains, however, a potentially important dimension of the CREDI instrument.
All told, the results suggest that the CREDI may be a suitable alternative to expert assessment in a variety of low-income contexts and, moreover, may be a useful complement to expert assessments in other studies. Indeed, items in the CREDI have already contributed to the Global Scale for Early Development (GSED), a new effort, currently in development, aimed at generating two globally applicable instruments (that is, internationally standardized and validated instruments) for the assessment of ECD for children under age 3 years at population (short form) and programmatic (long form) levels. Led by the WHO, the GSED group encompasses the harmonization of the CREDI and the instruments and methodologies developed by two other groups: the Young Child Development group and the Global Child Development group. 41 The instrument has also been used in the design of the Early Child Development Index, which is part of UNICEF's Multiple Indicator Cluster Surveys.

Conclusion
That the CREDI appears to perform well with regard to a recognized standard, and at a lower cost than that standard, in disparate contexts, suggests that the CREDI may be a suitable alternative to expert assessment in large-scale surveys in a variety of low-income contexts. It exhibits adequate validity with respect to the more resource-intensive Bayley-III assessment. Given its comparative simplicity, the CREDI can be relatively easily included in environmental studies and as a practical indicator for a range of child-oriented projects. Furthermore, it has potential to serve as a component for a