Implementing aeioTU: quality improvement alongside an efficacy study—learning while growing

Effectiveness trials of increasing childhood development interventions across low‐ and middle‐income countries have shown significant variability. The strength and consistency of benefits for children are dependent on program quality, and this requires paying attention to program implementation. In this paper, we summarize findings on program quality and teacher practices and perceptions for the aeioTU program, a center‐based Reggio‐inspired program in Colombia, now serving more than 13,000 children. The research found engaged, committed staff who valued the emergent approach and understood the children as requiring opportunities to express themselves, being the source for the curriculum, and having relationships with the materials around them. Although the average classroom quality was low in 2011, it increased significantly by 2014, particularly in the language and reasoning and interactions items. Indicator‐level analyses showed that higher‐order interactions and language processes were observed in a large proportion of classrooms by 2014. Teachers' self‐reports on the environment and their teaching and learning showed high levels of quality by 2013. These findings illustrate the significance of process data for program improvement, especially when a program is young. Program quality can be raised after teachers improve their skills, have experience enacting a curriculum, and after training has been strengthened in response to information, while simultaneously scaling up the program.


Introduction
A growing body of research has shown the importance of early interventions to ameliorate risks, reduce disparities, and improve child developmental trajectories. 1,2 Effectiveness trials across low- and middle-income countries (LMICs) have shown variations in effects across children's various developmental dimensions. 3,4 The strength, consistency, and persistence of the benefits children (and families) receive from the variety of early interventions that have been tried in the early years are strongly dependent on quality. 2,4 While quality may mean different things in different contexts, 5 quality requires that attention is paid to the coherence of a program, its content, its participants, trainings, monitoring, and evaluation (see Ref. 6), among other things. Understanding the quality of a program also requires understanding its implementation, e.g., any course corrections done early on, whether it was evaluated, how data were used, and what changes were made, for scalability and replication (see Ref. 7). The study discussed in this paper is unique in that it reports a process evaluation of a program as it was growing to scale, with the goal of informing the scale-up in a continuous improvement cycle.
Colombia advanced a comprehensive strategy on early childhood development (ECD) in 2011. 6,8,9 This national strategy, known as De Cero a Siempre (DCAS), aims to increase access to and improve the quality of early childhood services provided to poor children. Its objective is to deliver high-quality integrated early childhood services for 1.2 million children in poverty under age 6. 10 This strategy includes partnerships with the private sector for direct provision, as well as for advocacy. One such partnership is with the aeioTU program, an initiative from Fundación Carulla started in 2009 that provides center-based care services for children 0-5 years of age, as well as professional development and technical assistance in ECD. It partners with the state to provide and scale early childhood services throughout the country, with the mission of helping vulnerable children across the country to fulfill their developmental potential.
doi: 10.1111/nyas.13662
Early childhood initiatives such as aeioTU have grown across LMICs. 4 While the previous decade had extensive experiences supporting early childhood through initiatives centered on nutrition or cash transfers, 3 the last decade has seen an increase in studies on home-visitation and child care initiatives. 4 A meta-analysis of interventions outside the United States and Canada found that interventions with a mix of education and nutrition had larger effects on cognition than cash transfers or nutritional interventions alone. 3 This suggests that both inadequate nutrition and inadequate cognitive stimulation contribute to poor cognitive development, conceivably in synergy. Among the studies included, the Turkish Early Enrichment program offered childhood enrichment and maternal education and found positive effects on general ability (0.54 standard deviations, SD) and aggression (0.30 SD). 11 The Kingston study, which also combined supplementation and stimulation, found strong cognitive effects (0.18-0.62 SD) at age 7. 12 Engle et al. 13 reviewed research on parenting education and center-based early interventions in low-income countries and found significant effects on children's cognitive and socio-emotional development across programs. They concluded that center-based early interventions improve children's cognitive performance and school readiness. They found larger effects for disadvantaged children and children in higher quality programs. Britto et al. 4 reviewed the evidence on the effects of early education programs in developing countries and concluded that, regardless of the type of program, quality is a key predictor of its effectiveness, in particular, factors such as positive interactions, individualized attention, and positive emotional climate. In Colombia, a recent study of center-based care found no or negative impacts on vocabulary and social development from low-quality center-based programs. 14 Consistent with the role of quality, Attanasio et al. 15 reported positive effects of a quality enhancement of a parenting program targeting poor parents of children younger than age 2. The quality improvement included the use of a structured early stimulation curriculum, along with pre-service and in-service training of paraprofessional home visitors. Stimulation and parenting interventions for children under 5 in low-income countries have been recently reviewed by Baker-Henningham and Lopez-Boo, 16 who conclude that there were generally positive program effects on cognitive, socio-emotional, and nutritional development.
AeioTU's effectiveness was investigated between 2010 and 2016, and its results are still in the process of being published. The study measured children's anthropometry, language, cognitive, motor, and socio-emotional development, executive function, and parenting. Children were identified through a census in the catchment area for two centers opening in northern Colombia. Each aeioTU center in the study had capacity for about 320 children ages 0-5. Eligible children were underserved and poor. Families were randomly assigned, using computer-generated random lists, to either treatment or an ordered waiting list, stratified by age and gender. Children were first assessed in late 2010, prior to random assignment and to the centers starting operation, assessed again about 8 months into treatment, and then yearly after that.
Nores et al. 17 found that 8 months into treatment, the program had already produced positive effects on infants and toddlers of 0.16 SD (P = 0.006) for language, 0.08 SD (P = 0.086) for cognitive development, and an effect on overall development (P = 0.048). Effects were observed only for girls. The effects for girls reached up to 0.30 SD in language. No intervention effects were observed for nutritional outcomes or for the other dimensions of socio-emotional development and home environment. These findings make understanding the implementation of this program relevant, as no other program has shown comparable effects in such a short period of time. For example, effects of effective interventions in Colombia are 0.20-0.25 SD only after at least a year of intervention. 14-16,19
Our research and evaluation assessed processes with a twofold purpose: (1) to understand the quality of the program as implemented and (2) to support a process of continuous improvement in the program. We understood we were evaluating the program "before it was proud" as defined by Campbell (1987, as cited in Ref. 20) and therefore saw our role as one that would measure not only program impact but also program quality, and inform program improvement as aeioTU expanded country-wide. Below, we provide an account of the results of the evaluation in terms of program quality and the practices, perceptions, and knowledge of teachers, as well as the measures for program improvement that were taken in response to the findings as part of such a continuous improvement cycle process (Fig. 1). 21 As is argued in Nores and Fernandez, 6 continuous improvement is a central piece for scaling up any program and for strengthening quality. Our discussion below focuses on program quality and teachers' self-efficacy, which are central to effectiveness. 4,22,23
Researchers often make a distinction between process and structural quality. Structural quality refers to the resources and elements in the space and environment that facilitate processes for learning, activities, and others, such as class size, child-adult ratio, teacher qualifications, and physical environment. 8 Process quality is associated with the quality and types of teacher-child interaction and child-child interactions, as well as processes related to the use of time, enactment of a curriculum, integration of curriculum, and scaffolding of children's learning, among others. 8 Process quality captures critical aspects that promote child development. 24,25 Use of observational Western measures of quality in non-Western countries is not without debate; 26 however, the expanding use of these measures is one source for understanding quality, despite their limitations. 27,28

Context
Colombia has 65% of its 4.3 million children between age 0 and 5 living in socioeconomically disadvantaged conditions, based on the national census and Colombia's proxy means test for social programming, known as SISBEN. 29 Therefore, it has made a strong effort to increase early childhood care and education services, which currently reach about 40% of children. 30 12 months, and in language these are about 1 standard deviation by the age of 5 relative to high-income children. 32 In 2011, the government launched its DCAS national intersectoral early childhood strategy with the goal of providing quality comprehensive services for 1.2 million children. As part of this strategy, center-based care grew from serving about 125,000 children in 2011 to about 370,000 children in 2014. This initiative became institutionalized into a national law in 2016 (see Nores and Fernandez 8).

The aeioTU program
AeioTU has been part of the DCAS early childhood national strategy from the very beginning, partnering with the national government to serve children through direct service provision. Starting its operations in 2009, it now serves over 13,300 children across Colombia. In 2010, we initiated an impact study in two centers opening in the northern coastal region serving poor children (as measured by Colombia's SISBEN). Aligned with this impact study, process measures were included to understand implementation and support aeioTU as it scaled up.
AeioTU is a Reggio Emilia "inspired" educational program, as Reggio Emilia does not define itself as a curriculum but rather a philosophy. 33 The aeioTU program started by providing center-based care to low-income children ages 0-5; later, it began to provide home-visiting services to infants and toddlers, and center-based care to children ages 2-5 across most of its centers. AeioTU developed its own curriculum based on learning through strategies such as play, exploration, and projects, and which builds upon child-initiated activities and balances these with teacher-directed activities. A day in an aeioTU center sees a child moving between fully outfitted classrooms and experientially-based classrooms; the latter may focus on areas such as art, music, natural sciences, and light and shade, among others.
The Reggio Emilia approach encompasses a set of philosophical and pedagogical assumptions, methods of school organization, and principles of environmental design to foster children's intellectual development. 33 It is built upon a constructivist framework inspired by John Dewey, Jean Piaget, Lev Vygotsky, and Jerome Bruner. 34 This model sees children as co-constructing knowledge through their interactions with each other, with adults, and with the environment. 35 The curriculum, therefore, is emergent in nature, as it is based on the interests of the children and the goal of co-construction. A recent evaluation of Reggio Emilia in three cities in Italy showed little or no effects on children and adolescents in relation to other center-based care, and positive program effects in relation to not having attended center-based care. 36
Children in aeioTU receive 70% of their nutritional intake needs in morning and afternoon snacks and a lunch. The program is year-round (11 months, about 213 days) and serves children 9 h per day. All the centers are supported by a central aeioTU office in charge of operations and financing. Educational requirements have been higher, generally, than those required by Colombia's national guidelines (technical or professional degree), and aeioTU provides preservice (about 120 h) and in-service training (about 130 h per year), which is quite uncommon in other center-based services in the country. 12,37 Training and development are planned year-round, mostly at the center, and led by the center's director, with central office support and using materials developed by the central office in conjunction with input from staff across various aeioTU centers. This is complemented with regional training. Training combines theory-based, didactic, individual and collective, in-person and online series of supports. Training is aligned with onsite individualized feedback and in-classroom observations. Staffing is done by the central office. 
Each center included (at the beginning of our study) teachers and assistant teachers in ratios of 8:2 for infants, 12:2 for toddlers, and 24:2 for preschoolers; kitchen personnel (one person per 50 children); cleaning personnel (one person per 50 children); one security person per center; one administrative assistant to the center director; the center director (or Pedagogista); one nutritionist; and a resident artist (shared between two centers). AeioTU's requirements for teachers' qualifications were a technical degree (2 years of tertiary education) or undergraduate degree (5 years of tertiary education with a professional degree). However, the partnership with the Instituto Colombiano de Bienestar Familiar (National Family Welfare Agency, ICBF), in charge of early childhood services in Colombia, threatened this requirement, as their guidelines provided an exception to hire madres comunitarias, women with lower qualifications who provided home-based care under the national program (the hogares comunitarios). Screening for hiring included personality tests, background checks, and health checks. Wages for teachers varied between US$240 and $630 per month before benefits, depending on location, experience, and degree at the time our evaluation started; these were aligned with national guidelines. Meals in the centers were also included as part of the work. Child:teacher ratios are now fixed at 25:1.5 for children ages 3-5, a change due to costs over time.
As part of internal implementation practices, aeioTU included several instruments to monitor children and support teachers. These included daily reports on children's experiences, biweekly reports to families about the projects children engaged in, quarterly reports of children's progress, a collection of children's productions, a planning tool for teachers, observation tools to assess how children progress, and center documentation of children's projects and processes.

Methods
Early in 2011 and in mid-2014, we conducted observations of classroom quality. Given the limitations of these measures, we complemented them in 2012 with teacher surveys on teaching practices, perceptions, knowledge, and qualifications. The observations included classrooms in the effectiveness study, as well as other classrooms in aeioTU centers in other cities. Teacher surveys were conducted only with teachers in the two centers in the effectiveness study. Participation in the teacher surveys was optional.
The education and experience survey was developed by the research team and was administered to collect information on the characteristics of the teachers in 2012. A total of 39 questions were asked, ranging from teachers' sociodemographic data to stress, job satisfaction, and beliefs about teaching young children. In parallel, we included the Patient Health Questionnaire-9 (PHQ-9) depression screener 38 to detect depression in the teachers, which has been shown to be associated with program quality. 39
The ECERS-R (Early Childhood Environment Rating Scale-Revised) 40 was administered in May 2011 and March-April 2014, in 3-hour visits to each classroom. ECERS-R is an observation scale through which routines, environment, and interactions that occur in a preschool classroom are observed and coded. Quality of education and care has been found to be positively related to infant and child cognitive and social development, 40-42 as well as to social-emotional skills. 43,44 The ECERS-R provides a global measure of preschool classroom quality, with 43 items that cover a broad range of quality considerations from safety to teacher-child interaction to parent involvement. The total ECERS-R score is an average of the scores on the 43 items. A rating of 1 indicates inadequate quality, 3 indicates minimal quality, 5 indicates good quality, and 7 indicates excellent quality. This measure has been used extensively in the field and has well-established validity and reliability. It has been used in a wide range of countries with different cultures and economic contexts, including countries in the region similar to Colombia, for example Ecuador, Peru, Bolivia, Chile, and Brazil. 24,45,46 We used the official Spanish translation of the ECERS-R. The ECERS-R is based on seven subscales: Space and Furnishings, Personal Care Routines, Language-Reasoning, Activities, Interaction, Program Structure, and Parents and Staff. 
Most of the items examine the quality of what children actually experience in the program. Assessors were trained by a group of expert observers from the National Institute for Early Education Research at Rutgers University. Observers were shadow scored in practice assessments until they reached 80% or higher reliability with trainers.
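As a minimal illustration of the aggregation described above (the total ECERS-R score as the average of the item scores), the following Python sketch computes subscale means and a total score. The item scores and the helper names `subscale_means` and `total_score` are hypothetical and are not taken from the study data; only two of the seven subscales are shown.

```python
from statistics import mean
from typing import Dict, List


def subscale_means(items: Dict[str, List[int]]) -> Dict[str, float]:
    """Average the 1-7 item scores within each subscale."""
    return {name: mean(scores) for name, scores in items.items()}


def total_score(items: Dict[str, List[int]]) -> float:
    """Average across all items (not across subscale means)."""
    all_scores = [s for scores in items.values() for s in scores]
    return mean(all_scores)


# Hypothetical item scores for one classroom (illustrative only).
classroom = {
    "Language-Reasoning": [2, 3, 2, 3],
    "Interaction": [4, 3, 4, 3, 4],
}

print(subscale_means(classroom))
print(round(total_score(classroom), 2))
```

Averaging over items rather than over subscale means follows the definition given in the text; the two approaches differ whenever subscales contain different numbers of items.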
The Teacher Survey of Early Education Quality (TSEEQ), 47 introduced in September 2012, is a self-report survey for ECD teachers on classroom practices. Teachers are asked to reflect on several aspects of the curriculum and classroom practices, including: literacy, math, science, physical education, art curriculum, curriculum in general, instruction, assessment, physical environment, interaction and emotional climate, leadership and supervision, and family involvement. There are 105 questions on the survey mostly on a 5-point Likert Scale, with a few exceptions that include questions on either a 3-point Likert Scale response or a yes/no response. A modified version was used with teachers in infant classrooms.
Below, we report descriptive analyses for the surveys; total and subscale scores for the ECERS-R, with Kolmogorov-Smirnov tests for equality of distributions across years; and a comparison of Environment Rating Scales (ERS) scores with those of other programs in the region.
In addition, we report indicator-level descriptive analyses for the language and reasoning and the interactions subscales, which are more closely related to children's learning and development, including their socio-emotional development. 48,49 The ERS scores are composed of subscales, which in turn are composed of items, which in turn are composed of indicators. Indicators are arranged into four progressive levels: "inadequate", "minimal", "good", and "excellent". Observations entail a binary system of coding each of these indicators as either "occurring" or "not", to arrive at a final score. This means that scores can be derived from the presence of various combinations of indicators met across different items, and so two classrooms could receive the same total score regardless of which specific indicators are met. If a higher order interaction or process is present but not co-occurring with the other indicators under the same scoring level, the ERS score does not consider that level as attained, and instead a lower score is provided for that item. For example, item 16 "Encouraging children to communicate" is composed of nine indicators. Indicators 3.1, 3.2, and 3.3 (level 3) are required to co-occur for a "3" to be considered attained, and if at least half, but not all, of these indicators are observed the score is a "2" instead. No attention is paid to which specific indicators are met, only the number of them. If fewer than 50% are observed, then the score is a "1" and no indicator observed at this level or above is considered in the total score, even if observed. Indicator-level analyses look at the percentage of classrooms meeting the different indicators in the scale at any level. This type of indicator-level analysis captures which higher order indicators are met and provides better guidance on what specifically needs to improve.
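The scoring logic summarized above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the official ERS scoring procedure: the handling of level-1 ("inadequate") indicators is omitted, and the function names `item_score` and `percent_meeting`, as well as the three example classrooms, are hypothetical.

```python
from typing import Dict, List

# Scoring levels above "inadequate": 3 = minimal, 5 = good, 7 = excellent.
LEVELS = (3, 5, 7)


def item_score(indicators: Dict[int, List[bool]]) -> int:
    """Score one item from binary indicators grouped by level.

    Assumption: level-1 indicators are ignored for brevity. Scoring stops
    at the first level that is not fully met, so indicators observed at
    higher levels do not count toward the score.
    """
    score = 1
    for level in LEVELS:
        met = indicators[level]
        if all(met):
            score = level          # every indicator at this level observed
            continue
        if sum(met) * 2 >= len(met):
            score = level - 1      # at least half observed: in-between score
        break                      # stop: higher levels are not considered
    return score


def percent_meeting(classrooms: List[Dict[int, List[bool]]],
                    level: int, index: int) -> float:
    """Indicator-level view: share of classrooms meeting one indicator."""
    hits = sum(1 for c in classrooms if c[level][index])
    return 100.0 * hits / len(classrooms)


# Hypothetical classrooms (illustrative only).
room_a = {3: [True, True, False], 5: [True, True], 7: [True, False]}
room_b = {3: [True, True, True], 5: [True, False], 7: [False, False]}
room_c = {3: [False, False, True], 5: [True, True], 7: [True, True]}

for room in (room_a, room_b, room_c):
    print(item_score(room))
print(percent_meeting([room_a, room_b, room_c], level=5, index=0))
```

In this hypothetical example, the third classroom meets every "good" and "excellent" indicator yet scores lowest on the item, because fewer than half of its level-3 indicators are met. The indicator-level percentage, by contrast, shows the first "good" indicator present in all three classrooms, which is the kind of information the total score conceals.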

Teacher sociodemographic characteristics
As part of our 2012 data collection, we included rich details about the sociodemographic characteristics of teachers, including education, qualifications, and working conditions. The majority of the teachers described themselves as "Mestizo." Mestizo refers to a mixture of white and Afro-Colombian, which is the most common self-reported ethnicity in the country. Also, at least 70% had specialized in ECD (N = 24; see Supplementary Table S1 online).
Most of the respondents to the survey were classroom or area teachers. Additionally, most of the teachers (63.6%) reported working 41-50 hours per week, and 20.5% reported working more hours than scheduled in a week. Although not shown in the tables, we computed an average annual gross salary of US$4,778 (converted at the average exchange rate of September 2012), and 100% reported having health insurance. The largest share of teachers (72.7%) were working with large class sizes (21-25 children), but most (79.5%) reported working with one additional teacher in the classroom. This is consistent with the ratios established for infant, preschool, and area teachers (this information is reported in more detail in Supplementary Table S2 online).
Most teachers reported being satisfied with their salaries (72.7%), responsibilities (81.8% "do not agree with their job being difficult"), and working environment among colleagues (100.0% agreed "that personnel frequently share ideas among each other"). Similarly, most or all of the teachers reported feeling committed to the early childhood field (95.4%) and content with their career choice (i.e., 0.0% agreed with "feeling stuck in early childhood"). Generally, teachers enjoyed their work (97.7%), found their work relevant (79.6%), and appreciated the training and education opportunities offered to them (89.5%). These questions were asked on a 5-point Likert-type scale that ranged between "strongly disagree" and "strongly agree" (summarized in Supplementary Table S3 online).
Central to the implementation of the Reggio philosophy is understanding and working with an emergent curriculum. The research team asked teachers about their beliefs regarding an emergent curriculum. While 29.5% of teachers agreed that implementing the emergent curriculum is interesting but difficult, 61.3% of teachers disagreed with this statement. The biggest challenge reported by teachers was the requirement of documenting individual and daily work (59.2%). Other challenges, such as follow-up, time management, training, and alternative curricula, were not reported to be of concern. In general, 93.2% of teachers reported feeling that the emergent curriculum was interesting and valuable and worth the effort and time required to implement. Furthermore, teachers disagreed with "having a structured curriculum" (77.2%) or a more directive one (63.7%). Overall, teachers showed commitment to and understanding of the importance of the curriculum they were implementing. These questions were asked on a 5-point Likert-type scale that ranged between "strongly disagree" and "strongly agree", and are aggregated in Table 1.
The use of an emergent curriculum depends on teachers' mastery and understanding of child development. Table 1 also reports how teachers agreed or disagreed with statements about child development and learning. While teachers did not necessarily agree with children having "some control over their own learning process" (75%), they answered favorably to questions about children "having the opportunity to express themselves limitlessly", with the ideas of the children being the source for the curriculum, and with children having a relationship with the materials around them and exploring these, all concepts embedded in the aeioTU model. These questions were asked in a 5-point Likert-type question that ranged between "strongly disagree" and "strongly agree" and were aggregated into three levels.
The final section of the survey asked teachers questions about stress and depression. 38,50 These factors are important as they can affect teachers' demeanor and approach to working with young children. Some teachers reported feeling the symptoms of stress and depression. The PHQ-9 scores, though, showed that the majority of the teachers (84.1%) did not report signs of depression.

Classroom observations
As mentioned above, the ECERS-R is an observation and rating instrument for preschool classrooms serving children aged 3-5; a rating of 1 indicates inadequate quality, 3 indicates minimal quality, 5 indicates good quality, and 7 indicates excellent quality. (Infant and toddler classrooms were also observed using the Infant and Toddler ERS measure (ITERS), but the sample was N = 3 in 2011, and they were not observed in 2014; therefore, they are excluded from this report.) The ECERS-R was completed on a total of 17 classrooms in 2011 and 30 classrooms in 2014. Cronbach's α for the scale was 0.93 in both years. Table 2 reports the mean, range, and 90th percentile for the ECERS-R and each of its subscales. The program's average quality score was 2.3 in 2011 and 2.9 in 2014. The language and reasoning and interaction activities were among the highest scoring, 2.04 and 3.16, respectively, and they grew to 2.80 and 3.89, respectively. These two areas focus on the activities related to child learning, and less so on program structure. The highest rated classrooms
After the first set of scores, aeioTU put in place a set of reforms (described in detail further below) to improve quality as the program was scaled up. These investments were reflected in the quality indicators. Figure 1 illustrates improvements in the distribution of quality in aeioTU classrooms, comparing 2011 and 2014. By 2014, quality had risen in both lower- and higher-scoring classrooms, with several classrooms reaching good quality levels. The increase in the total score was statistically significant (Kolmogorov-Smirnov, P = 0.040). For the subscales, there were statistically significant (or marginally so) improvements for personal care routines (P = 0.004), language and reasoning (P = 0.093), activities (P = 0.068), interactions (P = 0.083), and program structure (P = 0.001).
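For readers unfamiliar with the Kolmogorov-Smirnov comparison of two score distributions, the following Python sketch shows the general idea: the test statistic D is the largest gap between the two empirical cumulative distributions. The scores below are hypothetical, the function name `ks_two_sample` is our own, and the P value uses the asymptotic approximation, which is only rough for samples as small as those in this study; statistical software may apply exact or corrected procedures.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def ks_two_sample(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sample Kolmogorov-Smirnov statistic and asymptotic P value."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)

    # D: largest absolute difference between the two empirical CDFs.
    d = 0.0
    for x in sorted(set(a_sorted) | set(b_sorted)):
        cdf_a = bisect_right(a_sorted, x) / n
        cdf_b = bisect_right(b_sorted, x) / m
        d = max(d, abs(cdf_a - cdf_b))

    # Asymptotic Kolmogorov distribution (an approximation).
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        return d, 1.0  # the series is numerically unstable here; P is ~1
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Hypothetical classroom total scores (illustrative only).
scores_2011 = [1.8, 2.0, 2.1, 2.3, 2.4, 2.6, 2.9]
scores_2014 = [2.2, 2.5, 2.8, 2.9, 3.1, 3.3, 3.6, 4.0]

d_stat, p_value = ks_two_sample(scores_2011, scores_2014)
print(round(d_stat, 3), round(p_value, 3))
```

Because the test compares whole distributions rather than means alone, it is sensitive to shifts among both lower- and higher-scoring classrooms.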
While these scores are low relative to high-quality programs in developing countries, 24 they are not low in relation to other programs using ERS scales in the region. Figure 2 compares the 2014 aeioTU ECERS-R scores with ERS scores from four other programs in the region. 24 Higher quality is harder to find in the region, and the counterfactual to these programs is so low that their impact, as in the case of aeioTU, comes from providing a substantial improvement over alternative programs.

Interactions and indicator-level data
Research has indicated that global measures of classroom quality, like the ECERS-R, may not be the most highly predictive of student outcomes and that, rather, interaction-specific constructs are more predictive of student outcomes than the total score. 51 The ECERS-R does contain several items that are "interaction-specific." For example, "Language and Reasoning" documents and quantifies evidence on specific teacher-child interactions that build on children's vocabulary. This subscale has been noted by researchers as the most predictive subscale of the ECERS-R as related to child outcomes. The following descriptive analyses look into this and into the interactions subscales at a much more granular level, in order to describe which indicators were and were not observed in 2011 and how this changed over time. As part of a continuous improvement cycle with programs, knowing which indicators are met concretely identifies areas for improvement, namely the frequency with which classrooms are meeting each indicator under each "minimal, good, and excellent" level of quality. Such detailed information supports professional development efforts more strategically.
items (those in the "excellent" level). These findings are critical in light of research showing that these kinds of interactions are imperative for child development. 52 Figure 3 shows an increase in staff who balanced listening and talking, and a fourfold increase in staff who linked children's spoken communication with written language. These increases are not observed in the total score, as they would be held back by the percentages of classrooms attaining level 3 ("minimal") indicators. This is aligned with growth in the percentage of classrooms where children talked about logical relationships and concepts, which also increased significantly, as well as in the appropriate introduction of concepts (Supplementary Fig. S2 online). 
Modest growth was also observed in the percentage of classrooms where staff talked about logical relationships as children engaged with materials, and in the percentage of classrooms where children were encouraged to explain their reasoning (Supplementary Fig. S2 online). Figure 4 shows another example where indicators demonstrated large increases between 2011 and 2014 that would not be fully reflected in the total scores, as the increases were in some, but not all, of the indicators under each level. For example, there was a large increase in the percentage of classrooms where language was centrally used with children and staff, encouraging communication between children, as well as in individual conversations with children. Figure 5 focuses on staff-child interactions; indicators in this area were already met for a high percentage of the classrooms in 2011. Nevertheless, by 2014, all indicators were observed in 60% of the classrooms, and all but the last indicator were met by over 80% of the classrooms. Increases between 2011 and 2014 are also evident for interactions among children (Supplementary Fig. S3 online). In this case, almost all higher indicators were observed in more than 60% of the classrooms.
A last set of indicators described has to do with discipline (Supplementary Fig. S4 online). This is another area where various higher order indicators were met by over half the classrooms, but the scores in the ERS were held back because some, but not all, indicators were met. Staff reacting consistently to children's behavior and involving children in solving conflicts and problems are two areas that are central to the aeioTU model.

TSEEQ
The TSEEQ asks teachers questions related to the physical environment and routine aspects of teaching (including content areas), planning, interactions, and leadership; it was used in September of 2012. Out of 26 teachers, 14 completed the TSEEQ (54%). This measure, developed to capture self-reported classroom quality, is summarized by seven composites that vary between 1 (minimal quality) and 5 (high quality). Figure 6 reports TSEEQ scores by subscale. Average quality across these was 4.10 (0.38 SD) on a 5-point scale. That is, self-reported quality for responding teachers as measured by the TSEEQ was good, particularly in terms of assessment of children, teaching, and interactions. Internal consistency (Cronbach's α) for the TSEEQ total scores was 0.96 (0.64 for assessment, 0.56 for physical structure, 0.82 for family participation, 0.82 for teaching, 0.91 for planning, 0.78 for interactions, and 0.76 for leadership).
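For readers less familiar with the internal-consistency statistic reported here, the following is a minimal sketch of how Cronbach's α is conventionally computed from a respondents-by-items score matrix, using α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name and the toy responses are illustrative assumptions; they are not the TSEEQ data or the authors' analysis code.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for both items and totals.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")

    item_columns = list(zip(*responses))
    sum_item_variances = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum_item_variances / total_variance)


# Toy data (hypothetical): three respondents rating two items on a 1-5 scale.
toy_responses = [[1, 2], [2, 1], [3, 3]]
toy_alpha = cronbach_alpha(toy_responses)
print(round(toy_alpha, 3))
```

Values closer to 1 indicate that the items in a composite vary together across respondents. Composites with few items, or a small sample such as the 14 responding teachers, can yield lower and less stable estimates.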

Discussion
The evaluation of the aeioTU program was conceived as utilization-based, combining both implementation and child outcome parts. The implementation evaluation was meant to provide information on how the program was working and information that could guide program improvement, particularly as it was progressively scaling up. This was central, as aeioTU was committed to scale and quality. The evaluation relied on a combination of qualitative and quantitative measures. The outcome evaluation focused on the impacts of the program on participating children, 17 as well as on program improvement strategies, as it highlighted the need for supports in specific areas. The implementation evaluation focused on processes, from understanding the teachers, the training they received, and the quality of interactions in their classrooms, to their perception of their work. Therefore, it provided the basis for reforms in program policies and practices, as the information from the research team informed the aeioTU program officers, directors, and teachers. These reforms occurred in cycles as the information was being collected by the research team and shared with the program. Our report shows that quality, as measured by the ERS, was quite low in the first year of observations, and that the actions implemented by the program were meaningful enough to increase quality significantly over a period of 3 years. Analyses at the indicator level show that higher level indicators were present in a significant percentage of classrooms by 2014. Teacher surveys revealed that teachers had a strong commitment to the Reggio-inspired emergent curriculum, while they felt challenged by the amount of documentation required to follow children's progress. Moreover, the survey on self-reported quality practices in early childhood (TSEEQ) showed that teachers' perceptions of their practices, and their access to materials, resources, and supports, were high by 2013.
The results of our study illustrate that when a program is young, process and outcome data for children can be central to informing program improvement. Quality can only be raised after teachers have improved their skills and gained experience enacting a curriculum, and in this case, after trainings were strengthened in response to the areas observed to need further support. Other necessary improvements may be structural. Overall, this means that quality may not be as high as expected initially, and that implementation evaluation of the programs and processes can support continuous quality improvement. Below, we discuss the program improvements that were triggered by this evaluation, as well as general lessons for the field.

Program improvement
AeioTU has a centralized system of supports with curricular leadership, which allowed for decisions to be made from the central administration to support not only the centers in the study but also to put in place a system to strengthen quality. The results of the implementation process evaluation summarized here, as well as the impact evaluation mentioned earlier, have fed a continuous improvement process for aeioTU since 2011. There have been program improvements along two main sets of actions based on the information provided by the evaluation: (1) changes in pre- and in-service training; and (2) program improvements in relation to structural supports, curriculum, child evaluation, and fidelity control systems. These improvements have occurred as aeioTU continued to progressively scale up, and they are now part of the scaled aeioTU systems serving more than 13,000 children; they continue to be used and revisited every year.

Improvements in pre- and in-service training.
The results of the study were shared internally within the aeioTU teams of teachers and directors to initiate processes that would strengthen quality not only in the centers that were included in the evaluation but also across all the centers. The pre- and in-service training processes were reorganized as follows.
A professional development experience was implemented early in the school year (acercamiento a la experiencia educativa, which translates as "getting close to the education experience") to revisit the key elements of implementing the aeioTU Reggio-inspired experience. This emphasizes the central fundamentals and components of the experience and relates them to everyday pedagogical practices. This professional development experience is meant to refresh and strengthen the knowledge and skills of existing staff, and to build and integrate new staff.
A set of 11 professional development workshops were developed on (1) positive discipline in the aeioTU experience and the importance of relationships and interactions; (2) play as a learning strategy in the aeioTU experience; (3) pedagogical tools and their use in classrooms, sensory areas, and common areas; (4) assessment of children's development and learning in the aeioTU experience; (5) sensory areas, their dynamics, intention, and contribution to the development and learning of children; (6) documentation as a learning tool in the aeioTU educational experience; (7) transition moments and their contributions to children's learning and development; (8) thought and language abilities through the learning strategies in the aeioTU educational experience; (9)
• Internal pedagogical committees were established in all centers to assess the child monitoring and evaluation processes, as well as the implementation of learning strategies;
• Security processes were established to strengthen monitoring and observation processes for classrooms to assess risks related to the infrastructure and the appropriate use of materials; this was supported by protocols on personal care routines for children.
Structural, curricular, and evaluation improvements. Another set of decisions had to do with revisiting the adequacy of materials, as well as solidifying a curriculum to better scaffold teachers in centers and classrooms. After the first year's process data became available, the support center put together a curriculum and pedagogy quality support team to work with the centers to support and monitor quality. As part of this support process, three additional quality support roles were added: zonal coordinators (one per department); regional pedagogical coordinators (one per region); and one curriculum professional and two arts and culture professionals. This expanded quality team initiated eight activities. The first, following findings in the ERS analysis, was the revision and subsequent improvement of learning materials and furniture in centers. Second, quality control processes were organized and standardized across all schools and centers. Third, curriculum tools and strategies were revised in response to what was being observed in the impact evaluation in terms of children's language and cognitive development. Software to monitor children's development (ConecTU) was developed and implemented across all aeioTU centers. Internal observation and evaluation tools inspired by the ERS measures (Faro Operación Sana and Faro Pedagógico) were created to assess the quality of processes in all centers. A toolbox (called Cartografía Curricular, see Supplementary Image S1 online) was developed to scaffold teaching and learning processes.
This toolbox includes a book on how to create pedagogical spaces that promote exploration and learning; a book on learning strategies (e.g., thought abilities, play, exploration, and projects); a book on aeioTU at home (supports the home visiting program developed afterward serving mothers and children younger than age 2); a positive discipline tool; a catalog on pedagogical tools for the different ages; didactic tools for language and mathematical processing; a support plan for children with special needs; a planning tool for teachers; and tools to strengthen the use of transition times throughout the day and throughout the different spaces. In addition, leadership and process trainings were integrated across the systems in a systemic approach.
The understanding and appropriation of the documents and instruments of the Cartografía Curricular takes place in in-person trainings, where teachers are trained on the material; its content; and what, why, and how to use it in practice. This is followed by onsite visits, observations, and collaborative and one-on-one meetings, providing multiple opportunities for feedback. This is the core of the aeioTU program, which can be scaled to other regions.

Lessons learned
There are no specific curricular guidelines for early education in Colombia. The Board for Early Childhood has emphasized the principle of curricular freedom, and national standards are intentionally broad. That means that teachers and other providers of services are expected to adapt the learning standards to their own classrooms and contexts. 8 AeioTU is based on the concept of an emergent curriculum. Traditional teaching in Colombia is not. While teachers were highly receptive to the conceptualization and the model, they also appeared to struggle with elements of its implementation. Classroom observations revealed that some of the ERS scores were lower because children's choices were sometimes controlled. Consequently, the aeioTU curriculum support staff invested in further scaffolding for their teachers in ways that published curricula such as High Scope 53 or Tools of the Mind 54 commonly do, with a variety of tools and structure to help support the practice in the classrooms. This became a necessary ingredient in the development of a strong curriculum that could be enacted across all of the aeioTU centers. This is now the "toolbox" described earlier, published as a set, and is the core of the aeioTU model.
The process of understanding and working through the strengths and weaknesses of the model allowed the program to build a stronger system of supports for all its centers. This system now also provides professional development and technical assistance to other early education providers in Colombia, in partnership with the public sector or through market-based arrangements. Professional development and technical assistance are now central components of the aeioTU strategy to promote high-quality early education services in Colombia beyond its own centers. Scaffolding its own model has also allowed aeioTU to generate the inputs to support other ECD programs across Colombia and the region. The curriculum, the tools for monitoring progress of children, and the tools to support teachers' processes in the classroom and with children are transferable across contexts.
An example of this work is the support and scaffolding provided in the State of Quintana Roo, Mexico, where aeioTU supported the processes necessary to start operations in two government-sponsored centers. Similarly, in 2015 and 2016, aeioTU provided training and professional development to 300 and 600 early childhood (non-aeioTU) centers in Colombia, respectively, as part of a quality improvement effort and under contract with the ICBF. Pre- and post-treatment evaluations of classroom quality showed classroom improvements in these programs, which serve 60,000 children.
AeioTU provides a case study of a program that progressively scaled up while carefully increasing quality. Careful monitoring of quality and continuous strengthening of the professional development and scaffolding provided to teachers are not unique to the aeioTU program. Other programs have implemented similar continuous improvement cycles as they have scaled up, particularly in developed countries. This is true of the Abbott preschool program in the United States in the state of New Jersey, 55,56 which is now a top-ranked preschool program in the United States. 57 A study of professional development that combined didactic and coaching components in Chile's Un Buen Comienzo showed similar effects in classroom quality (although these did not translate into child outcomes). 58 Ultimately, good quality early childhood interventions require time to mature, processes to measure what is working and what is not, and capacity to implement changes to improve quality continuously. Scaling while increasing quality, as aeioTU did and can do, requires robust processes of assessment, information, and responsive decision-making.

Acknowledgments
This paper was invited to be published individually and as one of several others as a special issue of Ann. N.Y. Acad. Sci. (1419: 1-271, 2018). The special issue was developed and coordinated by Aisha K. Yousafzai, Frances Aboud, Milagros Nores, and Pia Britto with the aim of presenting current evidence and evaluations on implementation processes, and to identify gaps and future research directions to advance effectiveness and scale-up of interventions that promote young children's development. A workshop was held on December 4 and 5, 2017 at and sponsored by the New York Academy of Sciences to discuss and develop the content of this paper and the others of the special issue. Funding for open access of the special issue is gratefully acknowledged from UNICEF and the New Venture Fund.
We are very thankful to aeioTU for their commitment to early childhood and opening the doors to our evaluation team; the Jacobs Foundation and the UBS Optimum Foundation for their generous grants that made this evaluation possible; the Inter-American Development Bank for their financial contribution; iQuartil for their wonderful work managing data collection on site; and our data collectors who worked under very difficult conditions. Any views expressed are those of the authors and do not necessarily represent those of the funders. This research was approved by the IRB at Rutgers University (USA; Internal Review Board, Protocol No. E09-568c) and Universidad de los Andes (Colombia; Acta No. 032-2009).

Supporting information
Additional supporting information may be found in the online version of this article.
Image S1. Cartografía curricular.