The Dynamic Effects of Teach for America in Hard-to-Staff Schools

Sally Hudson∗

November 28, 2015†
Abstract

Randomized evaluations show that Teach for America (TFA) teachers outperform colleagues in boosting achievement at hard-to-staff schools. Despite this cross-sectional evidence, TFA's long-run effects remain unknown, a key concern for policymakers. High turnover among TFA recruits – who commit to serve for just two years – may undercut the long-run returns to TFA hiring relative to non-TFA teachers, who improve steeply with experience. To assess this potential tradeoff, I measure the short- and long-run effects of TFA hiring in North Carolina, where schools have employed TFA teachers since the program's founding in 1990. I identify TFA hiring effects by exploiting quasi-random variation in teacher hiring shocks across grades within schools. In the short run, TFA rookies increase math scores markedly relative to the non-TFA teachers schools might otherwise hire; TFA's initial advantage in reading is modest. When schools replace exiting TFA teachers with new TFA recruits, these gains more than offset the costs of lost experience, increasing long-run achievement. On the other hand, when TFA supply fluctuates, schools may have to replace exiting TFA teachers with inexperienced and lower-performing non-TFA hires. On net, short-run achievement gains from one-shot TFA hiring still exceed the costs.
∗ Department of Economics, Massachusetts Institute of Technology. E-mail: [email protected]. I am grateful to my advisors, Joshua Angrist, David Autor, and Heidi Williams, for their invaluable guidance and support. I thank Raegen Miller, Rachel Perera, and Jason Atwood of Teach for America (TFA) for sharing their data and institutional knowledge; Kara Bonneau and Clara Muschkin of the North Carolina Education Research Data Center for assembling and distributing the state's administrative records; and Mathematica Policy Research (MPR) and the Institute for Education Sciences (IES) for facilitating access to their replication files. I also thank Annice Correia, Andrew Dorner, and Mark Leary for their administrative and technical support. This project has benefitted from the feedback of many. Special thanks to Alex Bartik, Esther Duflo, Jon Gruber, Peter Hull, Ben Olken, Manisha Padi, Jim Poterba, Brendan Price, Ashish Shenoy, Michael Stepner, Melanie Wasserman, and Jeremy West for their generous comments. I gratefully acknowledge financial support from the National Science Foundation (NSF) Graduate Research Fellowship (under Grant No. 1122374) and the George and Obie Shultz Fund. The findings and conclusions expressed here are mine alone and do not necessarily reflect the views of the IES, MPR, NSF, or TFA.
† The most recent draft can be found online at economics.mit.edu/files/11048.
1 Introduction

Many districts struggle to attract effective teachers to low-performing schools. Teach for America (TFA),
a highly-selective recruiting program, offers a potential solution. TFA trains novice teachers for five weeks and then places them in hard-to-staff schools where they commit to teach for at least two years. Despite their inexperience, TFA teachers outperform colleagues in two randomized trials, boosting grade school math achievement by nearly .10σ and matching non-TFA teachers in reading (Chiang et al., 2014; Glazerman et al., 2006).1 Non-experimental studies that use student covariates to mitigate selection bias report similar effects when comparing TFA to other rookie teachers in Florida (Hansen et al., 2014) and North Carolina (Henry et al., 2014; Xu et al., 2007), with more modest differences in New York City (Kane et al., 2008; Boyd et al., 2006). These short-term gains notwithstanding, TFA's critics argue that the program's temporary staffing model has long-run costs (see, e.g., Heilig and Jez, 2014). Teacher quality improves steeply with experience (Rockoff, 2004; Rivkin et al., 2005; and, more recently, Papay and Kraft, 2015), so high turnover in the TFA pool may reduce overall teacher quality relative to policies that emphasize hiring and retaining traditional teachers.

This paper documents the long-run consequences of TFA hiring, taking turnover into account. TFA's long-run impacts depend on how returns to tenure and experience vary with retention. I therefore integrate these key mediating variables into a model of time-varying TFA treatment effects. I estimate the long-run effects of TFA hiring using two decades of TFA placement records from North Carolina. North Carolina schools have been hiring TFA teachers since the program's founding in 1990. Superintendent Ray Spain of rural Warren County championed TFA in an editorial in the Washington Post. "Like so many rural districts," he wrote, "mine faces a true teacher shortage – particularly in subjects like math, science and special education.
Teach For America helps to address this – offering our principals access to a national pipeline of diverse, accomplished candidates" (Mathews and Spain, 2013). But, barely an hour away, in the mid-sized city of Durham, the school board recently voted to end its TFA contract, citing concerns over turnover. "The majority of the board felt like TFA, while they have quality candidates for sure, is more of a short-term solution. The corps members don't tend to stay," said Board chair Heidi Carter. "The model of TFA is just not one that we believe supports the quality of a long-term, high-quality teaching force" (Luther, 2014). As other states debate ending, expanding, and scaling back TFA's presence, evidence on the program's long-run impacts is immediately policy relevant.2

I identify TFA hiring effects by exploiting the constant and capricious turnover in hard-to-staff schools. The average TFA school in North Carolina loses more than a third of its non-TFA teachers every year, so whether schools hire fourth-grade teachers one year or fifth-grade teachers the next is potentially random from the perspective of cohorts advancing through the grades. I show that within school-years, cohorts' exposure to both TFA and non-TFA hiring is uncorrelated with their prior achievement trends. Cross-cohort variation in hiring shocks therefore identifies the causal effect of teacher hiring on student achievement.3 I find that TFA math teachers outperform the average counterfactual hire by .10σ in the hiring year. Impacts are larger when comparing TFA with other rookie hires (.15σ) and only slightly smaller relative to veterans (.08σ), who account for two-thirds of all non-TFA hiring in my sample of TFA schools. TFA reading teachers perform on par with the average veteran hire and .03σ better than other rookies.

I extend this empirical strategy to measure TFA's long-run impacts. The results indicate that TFA teachers maintain their .10σ math advantage at every tenure level, with positive, though small and imprecise, effects in reading. When schools replace exiting TFA teachers with new TFA recruits, these gains more than offset turnover, boosting steady-state achievement by .07σ in math and .03σ in reading. TFA may not provide a consistent source of staff, however. The program's ranks have been thinning as the economic recovery improves employment options for recent college graduates (Rich, 2015).

1 A third trial that focused on kindergarten and pre-kindergarten teachers released preliminary findings in 2015 that show no differential impacts (Clark et al., 2015). I revisit the results from all three studies in Section 2.2.
Schools that hire TFA may therefore risk having to replace TFA teachers with other new hires, who do perform worse than the counterfactual retained veteran. I show that these potential losses are small in magnitude. Short-run gains from one-shot TFA hiring exceed negative effects on future cohorts.

This study contributes to a growing literature on efforts to improve teacher quality in hard-to-staff schools. State and federal agencies have long offered salary supplements, debt forgiveness, and housing subsidies to recruit teachers to high-need districts (Jacobson, 2006). While some studies show these incentives help attract and retain teachers (see, e.g., Clotfelter et al., 2008; Steele et al., 2010), evidence that teacher incentives improve student achievement is rare. Programs that pay teachers based on student test scores have typically had little impact in U.S. schools (see, e.g., Goodman and Turner, 2013; Fryer, 2013; Springer et al., 2010),4 though Dee and Wyckoff (2015) find that large performance bonuses in Washington D.C. boost retention among qualifying teachers and improve their performance in later years. My quasi-experimental strategy allows me to directly measure TFA's impacts on the hard-to-staff schools that it serves.

The next section provides additional background on TFA and briefly reviews findings from the randomized trials. I detail my empirical strategy in Section 3 and describe the North Carolina data in Section 4. Section 5 presents TFA effects in the hiring year, and Section 6 reports long-run impacts. Section 7 concludes.

2 Pittsburgh was the first U.S. school district to terminate a TFA recruiting contract (Belculfine, 2013). Durham was the second. To date, no others have followed suit, though Minneapolis is reducing its reliance on TFA teachers (Brandt, 2014). West Virginia, meanwhile, recently adopted new teacher licensing standards to facilitate TFA's expansion (Vorhees and Marra, 2015).
3 Chetty et al. (2014) use similar cross-cohort variation to develop a test for bias in conventional measures of teacher quality.
2 Background

2.1 TFA's Recruiting Model

TFA has recruited more than 57,000 teachers since it first launched in 1990. The program's founder, Wendy Kopp, patterned TFA's two-year service model after the Peace Corps, so TFA teachers are known as corps members (Kopp, 2001). In the 2014-2015 school year, nearly 11,000 corps members served more than 500 districts in 37 states. These corps members account for a small fraction of the nation's overall teaching force. U.S. K-12 public schools employ roughly 3.1 million full-time equivalent teachers and hire nearly 200,000 first-time instructors every year, making TFA less than three percent of the annual inflow. TFA is a major supplier to some high-poverty urban and rural districts, however. New York City, Chicago, and Dallas each employ roughly 500 corps members per year, and several rural North Carolina districts fill more than one in five positions through TFA. Roughly one in three corps members work in charter schools, which are publicly funded schools that maintain greater flexibility to make staffing decisions than traditional public schools.

TFA selects corps members through an intensive screening process, which Clark et al. (2015) detail. The vetting has four stages: an online application, a writing exercise, a phone interview, and a day-long, in-person interview that includes a mock teaching lesson. At each stage, reviewers assess applicants' leadership and organizational skills; academic achievement and critical thinking; capacity to work in diverse environments; and commitment to reducing educational inequality. TFA combines qualitative scores from each assessment in a quantitative model that guides selection decisions. The model uses student achievement data from past corps members to predict which applicants will become effective teachers.5

Selected applicants are typically high-achieving recent college graduates. Between 2009 and 2012, the average corps member scored above the 90th percentile on the SAT and earned a 3.6 GPA in college. Nearly all were first-time teachers, and fewer than one in five had post-college work experience. TFA actively recruits minority and low-income candidates. A third of corps members are racial or ethnic minorities, one third are first-generation college-goers, and one in four are Pell Grant recipients. All told, TFA accepted just 15 percent of its nearly 50,000 annual applicants between 2009 and 2012.

TFA assigns each accepted applicant to a region, subject, and grade level.6 These placements depend on both applicants' preferences and districts' staffing needs. Roughly 75 percent of accepted applicants join TFA after learning their assignment. Those who matriculate then apply for teaching positions in their assigned regions. Though corps members are not guaranteed jobs – they must apply and interview with other candidates – just one percent of corps members failed to obtain teaching positions in 2012.7 In exchange for the new recruits, districts pay TFA a finder's fee that ranges from $2,000-$5,000 per corps member for each of the first two service years. In addition, districts pay corps members salaries and benefits like any other employee. Since corps members are mostly uncertified, inexperienced bachelor's degree-holders, they typically start at the entry-level salary for all teachers in the district.

Roughly 87 percent of corps members complete their two-year teaching commitments. Nearly all those who leave early do so before completing the first year.

4 Fryer et al. (2012) describe a noteworthy exception in Chicago. Performance-based teacher pay has been more consistently effective in other countries (Duflo et al., 2012; Muralidharan and Sundararaman, 2011; Lavy, 2002), though Glewwe et al. (2010) find effects are confined to the incentivized outcomes, with no impact on broader measures of student learning, as Holmstrom and Milgrom (1991) forewarn.
TFA does not maintain administrative data on corps members' employment after the two service years, so the best evidence on long-run retention comes from alumni surveys. Donaldson and Johnson (2011) find that 40 percent of survey respondents stayed in their initial school for at least three years, and 15 percent were retained through year five. Many more corps members continue teaching after leaving their initial school. More than 60 percent of respondents taught somewhere in K-12 schools for at least three years, and nearly one in four were still teaching after five years. Retention rates for non-TFA teachers nationwide are roughly twice as high – half of all new K-12 instructors teach for at least five years (Ingersoll and May, 2012) – but comparable data on traditional teachers who start their careers at low-performing schools is limited (see Donaldson and Johnson, 2011). Administrative records from North Carolina allow me to present a more complete picture of teacher turnover in the hard-to-staff schools that TFA serves.

5 Dobbie (2011) independently verifies that these scores predict variation in achievement impacts among corps members.
6 Regions are collections of districts in the same vicinity, roughly comparable in size to three-digit ZIP codes.
7 Accepted corps members who were not offered teaching positions typically failed state or district certification exams.
2.2 Experimental Evidence

Three randomized controlled trials (RCTs) compare corps members to other teachers within the same schools (Glazerman et al., 2006; Chiang et al., 2014; Clark et al., 2015). The samples cover a wide range of districts, grades, and subjects, but all three trials were similar in design. They recruited schools employing TFA teachers and divided each school into grades. Within select grades, schools agreed to randomly assign students to teachers and test each student at the end of one year. In total, the trials included roughly 160 TFA teachers and 190 controls from 103 schools. Both trials that studied grade school math teachers found corps members outperformed control instructors (Glazerman et al.; Chiang et al.). Glazerman et al. find no differential impacts among grade school reading teachers, and Clark et al. estimate positive TFA reading effects in early elementary grades.

Though random assignment delivers unbiased estimates of average TFA impacts within the RCT samples, the RCT data cannot be used to identify the long-run effects of hiring corps members. Cross-sectional comparisons only capture long-run effects in a steady state, and the experimental data do not represent the steady-state teaching staff at any school. Indeed, most schools selected just two or three teachers to participate. Among the chosen corps members, second-years outnumber first-years nearly two to one, a sure sign the samples do not reflect a steady state. When faced with non-random sample selection, researchers typically investigate treatment effects in subgroups. Comparing TFA and control teachers by tenure, for example, may shed light on how teacher quality evolves over time, but the RCT samples are too thin to estimate tenure-specific treatment effects.8 Since both departure rates and returns to experience are highest in teachers' early years, capturing year-to-year dynamics is crucial for understanding the impact of teacher hiring policies at hard-to-staff schools. Even small gaps in retention rates for inexperienced hires can generate meaningful differences in average teacher quality, as Staiger and Rockoff (2010) discuss. I therefore turn my attention to identifying the long-run effects of TFA hiring using observational methods.

8 Each trial reports subgroup effects by experience – not tenure – in coarse bins of three or five years.
3 Empirical Framework

The ideal experiment for measuring TFA's long-run effects would recruit schools with vacant teaching positions and randomly select some schools to receive corps members. The remaining schools would hire from their usual non-TFA applicant pool. Over time, teachers would leave, and TFA-treated schools would receive more corps members, while control schools would hire non-TFA replacements. Comparing achievement across schools over time would reveal the long-run effects of continually employing TFA teachers. In this section, I model how TFA's long-run effects depend on the relationship between teacher retention and the returns to tenure. In the absence of the ideal experiment, I can combine data on retention rates with tenure-specific estimates of TFA treatment effects to measure the long-run effects of interest.

3.1 A Motivating Model

Consider a school with just one teaching position, which is vacant in the baseline year zero. Let Z ∈ {0, 1} indicate whether the school is selected to receive corps members. The tenure of future teachers may depend on the school's TFA treatment status. Let T_y(z) denote the potential tenure of the year y teacher as a function of the initial hiring choice. A type z teacher with t years of tenure generates achievement A(z, t). The causal effect of hiring corps members on year y achievement can then be written

    δ_y ≡ A(1, T_y(1)) − A(0, T_y(0)).    (1)

The goal of this paper is to estimate E[δ_y] over time horizons y. Since all newly-hired teachers have zero tenure, by definition, the average effect of TFA hiring in the hiring year is simply E[δ_0] = E[A(1, 0) − A(0, 0)]. Achievement in the next year depends on how teacher tenure evolves in response to the hiring decision:

    A(Z, T_1(Z)) = T_1(Z) A(Z, 1) + [1 − T_1(Z)] A(Z, 0)    (2)
                 = A(Z, 0) + T_1(Z) [A(Z, 1) − A(Z, 0)].    (3)

The first term in equation (2) captures year one achievement if the initial hire is retained. The teacher has one year of tenure, and the school obtains achievement A(Z, 1). If the initial hire leaves, the school hires a new teacher with zero tenure and obtains achievement A(Z, 0), as the second term shows. Rearranging terms in the next line provides another intuitive formulation. Achievement in year one is baseline achievement for a zero-tenure teacher plus the return to one year of tenure if the teacher is retained.

Define τ_t^z ≡ A(z, t) − A(z, 0) as the return to t years of tenure for a type z teacher.9 Substituting this expression into equation (3) and differencing across teacher types yields

    E[δ_1] = E[{A(1, 0) + T_1(1) τ_1^1} − {A(0, 0) + T_1(0) τ_1^0}]
           = E[δ_0] + E[T_1(1) τ_1^1] − E[T_1(0) τ_1^0].    (4)

In words, the average return to TFA hiring after one year depends on the average difference in initial performance, E[δ_0], and the difference in average returns to tenure weighted by the probability of retention, E[T_1(z) τ_1^z]. Average returns over longer time horizons take the same generic form. I focus here on identifying these components.
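To fix ideas, the accounting in equation (4) can be evaluated with a few hypothetical numbers. None of the values below are estimates from this paper, and the factoring of the joint terms assumes retention is independent of returns to tenure:

```python
# Hypothetical illustration of equation (4): the one-year effect of TFA hiring
# combines the initial hiring effect with retention-weighted returns to tenure.
# All numbers are made up for illustration; they are not estimates from the paper.

delta_0 = 0.10                   # initial TFA advantage in the hiring year (test-score SD)
p_tfa, p_non = 0.82, 0.60        # hypothetical retention rates into year 1
tau_tfa, tau_non = 0.05, 0.08    # hypothetical returns to one year of tenure

# E[delta_1] = E[delta_0] + E[T1(1) * tau_1^1] - E[T1(0) * tau_1^0],
# factoring the joint terms under the independence assumption noted above.
delta_1 = delta_0 + p_tfa * tau_tfa - p_non * tau_non

print(round(delta_1, 3))  # 0.093
```

Even though the non-TFA return to tenure is larger here, the TFA advantage persists because the initial gap dominates the small retention-weighted difference.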
3.2 Identifying TFA Hiring Effects

The primary challenge to identifying the terms in equation (4) is correlation between teachers' observed traits and their students' potential achievement. This is the usual selection bias problem when measuring how teacher quality varies with credentials like tenure and TFA affiliation. Teachers with different credentials serve different schools and, within schools, different students, so the raw correlation between teacher traits and student outcomes confounds teacher quality with students' latent ability.

I address selection bias by leveraging quasi-random variation in teaching assignments at high-turnover schools. Each year, North Carolina's TFA schools lose more than a third of their teachers, on average. Another third transfer between grades within schools from year to year. I show that in these chaotic staffing environments, teacher credentials are as good as randomly assigned across cohorts in each year. Specifically, I document that within school-years, cohorts' exposure to teachers with varying tenure and TFA status is uncorrelated with their prior achievement trends. Cross-cohort variation in teaching assignments therefore identifies the initial TFA hiring effect, E[δ_0], and returns to tenure, E[τ_1^z]. If returns to tenure were constant across schools, then the joint terms, E[T_1(z) τ_1^z], could be decomposed into the product of retention rates, E[T_1(z)], and returns to tenure, E[τ_1^z], so that

    E[δ_1] = E[δ_0] + E[T_1(1)] · E[τ_1^1] − E[T_1(0)] · E[τ_1^0].

Given data on retention rates, all of the components of equation (4) can then be identified. In principle, however, returns to tenure may vary across schools, and that variation may well be correlated with retention rates. Teachers that expect to improve may be more likely to stay.10 To address this second identification challenge, I divide schools into quartiles by retention rates so that retention is nearly constant within groups, and the decomposition E[T_1(z) τ_1^z] ≈ E[T_1(z)] · E[τ_1^z] holds approximately. I then estimate returns to tenure separately for each quartile, q, and average effects across quartiles to obtain the joint terms, E[T_1(z) τ_1^z]:

    E[T_1(z) τ_1^z] ≈ (1/4) Σ_{q=1}^{4} E[T_1(z) | Q = q] · E[τ_1^z | Q = q].    (5)

9 Defined in this way, the return to tenure combines both returns to total experience and returns to school-specific experience. Though labor economists have devoted considerable attention to disentangling these forces (see, e.g., Abraham and Farber, 1987), their sum is the relevant return to the employer.
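A small simulation illustrates why the quartile decomposition in equation (5) helps: when returns to tenure are correlated with retention, the naive product of unconditional means is biased, while quartile-level products absorb most of the correlation. All data below are simulated, not drawn from the paper's sample:

```python
import numpy as np

# Illustration of equation (5) on simulated school-level data: averaging
# retention-times-return products within retention quartiles recovers the joint
# term E[T1 * tau] even when returns to tenure are correlated with retention.

rng = np.random.default_rng(0)
n_schools = 400
retention = rng.uniform(0.4, 0.9, n_schools)                    # E[T1] by school
tau = 0.06 + 0.05 * retention + rng.normal(0, 0.01, n_schools)  # correlated returns

# Assign schools to retention quartiles, Q = 0, 1, 2, 3
quartile = np.digitize(retention, np.quantile(retention, [0.25, 0.5, 0.75]))

# Equation (5): mean over quartiles of E[T1 | Q] * E[tau | Q]
approx = np.mean([retention[quartile == q].mean() * tau[quartile == q].mean()
                  for q in range(4)])

exact = np.mean(retention * tau)       # the joint term itself
naive = retention.mean() * tau.mean()  # ignores the correlation entirely

print(abs(approx - exact) < abs(naive - exact))  # True: quartiles absorb most of the bias
```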
In practice, this step has little empirical consequence. Turnover rates are fairly uniformly high in my sample of North Carolina’s TFA schools, and returns to tenure vary little across the higher and lower turnover schools among them.
3.3 Estimating TFA Hiring Effects

I estimate TFA hiring effects using regressions of the following general form:

    Ā_csy = β F̄_csy + γ N̄_csy + α_sy + φ_cy + ε_csy.    (6)

The outcome, Ā_csy, is the average test score for cohort c at school s in year y, and the treatments, F̄_csy and N̄_csy, are the shares of students taught by first-year TFA teachers and non-TFA hires, respectively. The omitted treatment is the share of students taught by returning teachers, so β̂ and γ̂ compare new hires to returners, and δ̂_0 = β̂ − γ̂ captures the TFA hiring difference. The school-year effects, α_sy, generate cross-cohort comparisons within school-years, and cohort-year effects, φ_cy, boost precision. In principle, these regressions could also include cohort-school effects – the third pairwise interaction between the panel dimensions. In practice, cohort-school effects reduce power without altering the tests of the identifying assumptions.11

If the hiring treatments are as good as randomly assigned in equation (6), then hiring should be uncorrelated with cohorts' pre-treatment characteristics conditional on school-year and cohort-year effects. Section 5 reports several identification tests in this spirit. In particular, I present event study plots which show that hiring shocks in year y affect lead scores, Ā_cs,y+Δ, but not lagged scores, Ā_cs,y−Δ. These plots provide graphical evidence that, within school-years, cohorts would follow parallel achievement paths in the absence of hiring shocks.

Equation (6) also has an instrumental variables (IV) interpretation. Consider the underlying student-level micro-data in each cohort-school-year. Let F_icsy and N_icsy denote binary treatments that identify students, i, taught by newly-hired teachers. The average treatments, F̄_csy and N̄_csy, are the first-stage fitted values from a two-stage least squares (2SLS) system that uses the full set of cohort-school-year indicators to instrument for these student-level treatments:

    A_icsy = β F_icsy + γ N_icsy + μ_sy + ν_cy + η_icsy
    F_icsy = λ^f_csy + μ^f_sy + ν^f_cy + η^f_icsy
    N_icsy = λ^n_csy + μ^n_sy + ν^n_cy + η^n_icsy

As Angrist (1998) shows, 2SLS estimation of this system delivers the same estimates of β̂ and γ̂ as weighted least squares (WLS) estimation of equation (6), weighting each cohort-school-year cell by its size. Viewing equation (6) through this IV lens offers another angle on identification. Cohort-school-year instruments isolate the variation in student-teacher assignments that comes from cell-level hiring shocks. If students respond to cell-level shocks by sorting to teachers within cells – say, to avoid being placed in the new hire's class – those margins of adjustment are embedded in the IV intent-to-treat effects. Leveraging cross-cohort variation therefore bypasses the primary selection bias problem in traditional analyses of teacher quality – the non-random sorting of students to teachers12 – and, arguably, captures the more policy-relevant parameter.

10 Some schools may attract teachers with steeper tenure profiles. Some schools may better develop the teachers they manage to hire. Either scenario would result in correlation between retention rates and returns to tenure across schools.
11 Because achievement data are only available for grades 3-8, most cohort-schools are observed for just three years: grades 3-5 in elementary school and grades 6-8 in middle schools.
12 Rothstein (2010) and Chetty et al. (2014) provide detailed discussions of these concerns.
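The grouped-data equivalence invoked above can be checked numerically. The sketch below uses simulated data and a single treatment rather than the paper's two; variable names and the simulation design are illustrative only:

```python
import numpy as np

# Numeric check (on simulated data) of the grouped-data result cited in the text:
# 2SLS with a full set of cell indicators as instruments equals WLS on cell means,
# weighting each cell by its size.

rng = np.random.default_rng(1)
n_cells, cell_size = 30, 40
cell = np.repeat(np.arange(n_cells), cell_size)
treat = (rng.random(n_cells * cell_size) < rng.random(n_cells)[cell]).astype(float)
y = 0.10 * treat + rng.normal(0, 1, treat.size) + rng.normal(0, 0.2, n_cells)[cell]

# 2SLS: the first stage replaces the student-level treatment with its cell mean
# (the fitted value from regressing treatment on cell dummies).
cell_mean_treat = np.bincount(cell, treat) / np.bincount(cell)
X = np.column_stack([np.ones(y.size), cell_mean_treat[cell]])
beta_2sls = np.linalg.lstsq(X, y, rcond=None)[0][1]

# WLS on cell means, weighted by cell size (cells are equal-sized here,
# so the weights are uniform, but they are included for clarity).
y_bar = np.bincount(cell, y) / np.bincount(cell)
w = np.sqrt(np.bincount(cell).astype(float))
Xg = np.column_stack([np.ones(n_cells), cell_mean_treat])
beta_wls = np.linalg.lstsq(Xg * w[:, None], y_bar * w, rcond=None)[0][1]

print(np.isclose(beta_2sls, beta_wls))  # True: the two estimates coincide
```

The equality is exact up to floating-point error because the second-stage regressor is constant within cells, so the individual-level objective differs from the cell-level weighted objective only by a term that does not depend on the coefficients.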
I extend the specification in (6) to estimate the returns to tenure for both TFA and non-TFA teachers:

    Ā_csy = Σ_{t=1}^{3} (β_t F̄^t_csy + γ_t N̄^t_csy) + α_sy + φ_cy + ε_csy.    (7)

Here, F̄^t_csy and N̄^t_csy are the share of teachers in each cohort-school-year cell with t years of tenure. I top code TFA tenure at three years because corps members with four or more years are too rare to precisely estimate their impacts. Consequently, the omitted treatment in this group is non-TFA teachers with four or more years of tenure. I document that exposure to the tenure treatments satisfies the same identification tests used to evaluate the estimate from equation (6); tenure appears to be as good as randomly assigned across cohorts, just like hiring. I take this empirical strategy to two decades of teacher hiring records from North Carolina.
4 Setting and Data

North Carolina was one of four states that hired teachers from TFA's founding corps in 1990. Over the next 25 years, TFA recruited nearly 2,700 teachers to the state. The North Carolina corps grew in lockstep with TFA's expansion nationwide, as Figure A1 shows. In the 2014-2015 school year, 570 active corps members and more than 170 TFA alumni taught in North Carolina public schools.13 Roughly one-third of the North Carolina corps serves in Charlotte-Mecklenburg Schools, which ranks among the 20 largest districts in the country. Half are concentrated in rural districts to the northeast of the Research Triangle, and the remainder serve a cluster of four mid-sized cities to the west: Durham, Greensboro, High Point, and Winston-Salem.

4.1 Data Sources and Variable Definitions

Data for this project come from two sources: TFA's corps member rosters and administrative records from the North Carolina Department of Public Instruction (DPI). The DPI data are archived at the North Carolina Education Research Data Center at Duke University. The database includes records from all North Carolina public schools between the 1994-1995 and 2013-2014 school years. I matched 95 percent of corps members from these years to their state personnel records using Social Security Numbers.14 For concision, I hereafter refer to academic years by the fall in which they began, so that 2013 denotes the 2013-2014 school year.

I identify teachers using school master schedules, which list personnel assignments for every course offered in the school day. I define teachers as new hires in the first year they appear as full instructors in a school's schedule; time spent training as a teaching assistant does not count toward tenure by this definition. Each course record contains a subject code and gives enrollment counts by grade. I can therefore calculate the share of enrollment taught by newly-hired teachers within school-subject-grade-year cells. These are the treatment variables in equation (6). The online Appendix B further details their construction. In addition to the master schedules, DPI personnel data include teacher demographics, salaries, licenses, and postsecondary education.

I measure achievement using student-level test scores from annual state exams in math and reading for grades 3-8. I normalize scores statewide within subject, grade, and year using the first exam administration for each student. The test records also contain standard demographic variables, including race, sex, limited English proficiency (LEP), subsidized lunch, special education, and gifted status.

13 TFA's presence in North Carolina mirrors its share of the national teaching force: corps members make up roughly three percent of first-time teachers annually.
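The standardization step described above amounts to a groupby-and-transform over subject-grade-year cells. A minimal sketch, with hypothetical column names and toy scores (the actual DPI file layout differs):

```python
import pandas as pd

# Sketch of the score standardization described in the text: normalize each
# student's first-administration score within subject, grade, and year cells.
# Column names and values are hypothetical.

exams = pd.DataFrame({
    "subject": ["math", "math", "math", "read", "read", "read"],
    "grade":   [3, 3, 3, 3, 3, 3],
    "year":    [2005] * 6,
    "score":   [310.0, 325.0, 340.0, 298.0, 305.0, 312.0],
})

cells = exams.groupby(["subject", "grade", "year"])["score"]
exams["z_score"] = (exams["score"] - cells.transform("mean")) / cells.transform("std")

print(exams["z_score"].round(2).tolist())  # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```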
4.2 Sample Restrictions and Summary Statistics

I construct my analysis sample starting with the raw exam files from 1997-2012.15 Table A1 details the sample selection criteria. I keep all records with non-missing scores from the first test administration for each student-subject-grade-year. I then drop charter schools, which do not participate in the state-run salary database and therefore have limited personnel data. Less than five percent of North Carolina corps members taught in charter schools during this period. Among the remaining schools, I select the 153 that hired at least one first-year TFA corps member. Not all of those corps members taught math or reading in grades 3-8; I include the 24 schools that hired 212 TFA teachers for other grades and subjects to boost precision. I drop less than one percent of the remaining test records from schools with missing master schedule data for the corresponding subject, grade, and year. The resulting analysis sample contains 147 schools and 1,676,107 exams from more than 300,000 distinct students.

14 The majority of missing records come from charter schools, which do not process salaries through the state. The state salary database provides the only crosswalk between DPI personnel identifiers and SSNs in some years. I therefore exclude charter schools from my analysis, as I describe below.
15 The tested grades were inconsistent prior to 1997, and master schedules were unavailable for 2013.
In keeping with TFA’s mission, corps members served schools with mostly low-income, low-performing students. Nearly two-thirds of students in the analysis sample qualified for subsidized lunches, as shown in column 3 of Table A1. Average test scores were one-third of a standard deviation below the state mean, putting the average TFA school in the bottom quartile of schools statewide.16 The students were also disproportionately nonwhite. More than 60 percent were racial minorities in a state where nonwhites account for less than 40 percent of all public school enrollment. The sampled schools hired 635 corps members to teach math and reading in grades 3-8. Column 3 of Table 1 describes these teachers. Like most TFA recruits, North Carolina corps members were recent graduates of selective colleges. They averaged less than half a year removed from college graduation, and nearly 80 percent attended “highly competitive” schools, as measured by the Barron’s Profile of American Colleges.17 Corps members were also inexperienced. Nearly 95 percent were first-time teachers, and less than 10 percent held formal teaching credentials.18 The non-TFA hires – more than 9,000 total – were older and more experienced, as column 4 reveals. They were 10 years out of college, on average, and less than one-in-three were first-time teachers. Existing TFA evaluations have neglected this fact. Researchers typically compare TFA to other inexperienced teachers, but the counterfactual hire is rarely a rookie. At North Carolina TFA schools, more than one in four non-TFA hires had already taught for 10 or more years. In stark contrast to corps members, however, less than one in four of the counterfactual hires graduated from a “highly competitive” college. Turnover rates at these hard-to-staff schools were high, even among non-TFA teachers. 
Figure 1 shows that barely 60 percent of non-TFA hires returned for a second year, and only 20 percent stayed through year five – much lower than national figures on persistence in teaching would suggest. Initial retention was higher among corps members due to their two-year commitment: 82 percent returned for a second year, but just one in five re-upped for a third. Nearly all corps members left their placement schools within five years. These findings underscore the importance of measuring annual returns to tenure when assessing teacher hiring options for hard-to-staff schools. Most new hires leave after two years with or without TFA. 16 The distribution of average school performance is more compressed than the distribution of student achievement.
Less than 10 percent of schools have average scores below the first student quartile. 17 “Highly competitive” includes 191 colleges in the top three Barron’s ranks. There are five North Carolina colleges among them: Davidson, Duke, Elon, Wake Forest, and the University of North Carolina at Chapel Hill, which is the state’s flagship public institution. The other four are private schools. 18 DPI grants TFA corps members provisional licenses for their two-year commitment. Many go on to earn permanent licenses through the state’s lateral entry licensing program, which allows teachers to complete coursework and testing requirements while they teach.
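The retention shares plotted in Figure 1 can be computed directly from an employment panel: a teacher hired in year y counts as retained k years later if she still appears at the hiring school in year y + k. A minimal sketch on toy data (the spell records and layout are illustrative, not the paper's data):

```python
from collections import defaultdict

# (teacher_id, school_id, year) employment spells; toy data
spells = [
    ("t1", "s1", 2000), ("t1", "s1", 2001),                      # stays 2 years
    ("t2", "s1", 2000),                                           # leaves after 1
    ("t3", "s1", 2000), ("t3", "s1", 2001), ("t3", "s1", 2002),  # stays 3 years
]

# index years of service by (teacher, school) pair
years = defaultdict(set)
for t, s, y in spells:
    years[(t, s)].add(y)

def retention_share(hire_year, k):
    """Share of teachers first observed in hire_year still employed k years later."""
    hires = [key for key, ys in years.items() if min(ys) == hire_year]
    stayed = [key for key in hires if hire_year + k in years[key]]
    return len(stayed) / len(hires)

share_year2 = retention_share(2000, 1)  # returned for a second year
share_year3 = retention_share(2000, 2)  # still employed in a third year
```

With the actual personnel files, the same calculation would be applied to each hire cohort and averaged across cohorts, separately for TFA and non-TFA hires.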
5 Short-Run Effects of Teacher Hiring on Student Achievement

5.1 Identification

The summary statistics in Section 4 illustrate that high-turnover schools have low-performing students.
Within this sample of hard-to-staff schools, newly-hired teachers served especially low-scoring children. This trend can be seen in panels A and B of Figure 2, which depict the negative correlation between students' test scores in one year and their exposure to teacher hiring shocks in the next year. Each graph plots average lagged reading scores in cohort-school-year cells, Ā_cs,y−1, against the cell share of students taught by newly-hired reading teachers, F̄_csy and N̄_csy.19 Each dot depicts the average for hiring share bins of width .05, with best fit lines estimated on the underlying cell-level data. The steep negative slope in panel A indicates that students in cells with newly-hired corps members performed much worse in the prior school year than students in cells staffed solely by returning teachers. The average lagged score in cells without new hires was -.24σ, while cells taught solely by TFA hires averaged -.71σ, nearly three times as far below the state mean. Non-TFA hiring was also negatively correlated with lagged achievement, though less steeply so. The gap in lagged achievement between cells with no new hires and all new hires was -.10σ, as the slope in panel B indicates. These graphs show that naive comparisons of test scores among newly-hired and returning teachers likely overstate returners' performance advantage: students exposed to new hires would have had lower test scores in the absence of hiring shocks, too. Most of the gap in lagged achievement between cells with and without new hires can be attributed to persistent differences in achievement across schools. Appendix Figure A2 presents these within-school results.
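The binned scatters in Figure 2 can be built by grouping cohort-school-year cells into hiring-share bins of width .05, averaging lagged scores within each bin, and fitting the slope on the underlying cell-level data. A minimal sketch on synthetic cells (all variable names and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 1000
hire_share = rng.uniform(0, 1, n_cells)   # cell share of students with new hires
lagged_score = -0.2 - 0.6 * hire_share + rng.normal(0, 0.3, n_cells)

# group cells into hiring-share bins of width .05 and average within bins
bin_id = np.floor(hire_share / 0.05)
bin_centers = np.array([0.05 * b + 0.025 for b in np.unique(bin_id)])
bin_means = np.array([lagged_score[bin_id == b].mean() for b in np.unique(bin_id)])

# best fit line estimated on the underlying cell-level data, not the bins
slope, intercept = np.polyfit(hire_share, lagged_score, 1)
```

Plotting `bin_means` against `bin_centers` with the fitted line reproduces the construction of panels A and B; the slope comes from the cells, not the bins, so the binning only affects the display.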
The vertical axis again measures average lagged reading scores in cohort-school-year cells, and the horizontal axis plots binned residuals from regressions of the hiring shares on school fixed effects.20 The resulting best fit lines are the coefficients from a regression of average lagged scores on hiring shares that controls for school fixed effects:

Ā_cs,y−1 = β F̄_csy + γ N̄_csy + µ_s + ν_csy.

19 Appendix Figure B1 presents corresponding results that use lagged math scores to measure baseline achievement.
20 The regression that generates the residual TFA hiring share includes the non-TFA hiring share, and vice versa:

F̄_csy = λ^f N̄_csy + µ^f_s + ν^f_csy
N̄_csy = λ^n F̄_csy + µ^n_s + ν^n_csy.    (8)
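The residualized hiring shares behind Appendix Figure A2 can be constructed by demeaning each share within schools and then partialling out the other share, following the Frisch-Waugh-Lovell logic of the system in equation (8). A minimal sketch on synthetic data (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_schools = 600, 30
school = rng.integers(0, n_schools, n)
tfa_share = rng.uniform(0, 0.3, n) + 0.01 * school      # varies across schools
non_tfa_share = rng.uniform(0, 0.6, n)

def demean_by(x, groups):
    """Within transformation: subtract group means (absorbs group fixed effects)."""
    out = x.astype(float).copy()
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= x[mask].mean()
    return out

# step 1: absorb school fixed effects from both hiring shares
tfa_resid = demean_by(tfa_share, school)
non_tfa_resid = demean_by(non_tfa_share, school)

# step 2: partial the other hiring share out of the treatment of interest
lam = (tfa_resid @ non_tfa_resid) / (non_tfa_resid @ non_tfa_resid)
tfa_resid = tfa_resid - lam * non_tfa_resid
```

By construction the final residual is mean zero and orthogonal to the other (school-demeaned) hiring share, which is what lets the binned-residual slope recover the multivariate fixed-effects coefficient.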
Adding school fixed effects reduces the TFA-hiring gradient substantially, from -.60σ to -.22σ, but the negative slope remains. Even within these hard-to-staff schools, corps members served cohorts and years in which incoming students were especially low achieving. Within school-years, however, hiring shocks were uncorrelated with cohorts' prior achievement. Panels C and D of Figure 2 plot average lagged reading scores against binned hiring share residuals from models that control for both school-year and cohort-year effects:

Ā_cs,y−1 = β F̄_csy + γ N̄_csy + α_sy + ϕ_cy + ε_csy.    (9)
Both graphs show tight linear fits through the binned residuals with precisely estimated zero slopes: -.006σ for TFA hires and -.005σ for non-TFA hires. These graphs establish that, with respect to prior achievement, hiring shocks were as good as randomly assigned across cohorts within school-years.
Cohorts were balanced on a wide range of other pre-treatment characteristics within school-years. Table 2 presents these balance tests, stacking data from both math and reading exams in the full analysis sample.21 Columns 2 and 3 report the unconditional correlation between the hiring shares and cohort demographics. Cells that hired corps members had far more non-white and low-income students, as column 2 reports. They served more limited English proficient students and fewer gifted children – all strong predictors of lower achievement. Non-TFA hires also served cells with lower predicted performance, though the gaps were not as stark. Columns 4 and 5 show that school fixed effects reduce but do not eliminate the residual correlation between hiring shocks and student demographics. Within school-years, however, covariates and hiring treatments were uncorrelated, as columns 6 and 7 confirm. The coefficients in these columns are precisely estimated zeros, and a Wald test of joint significance across all 10 traits fails to reject covariate balance with p = .73. Evidence that hiring treatments are conditionally independent of cohort observables bolsters the argument that hiring shocks are also conditionally independent of unobserved potential achievement. Cross-cohort comparisons of test scores following hiring shocks should therefore capture the causal effect of teacher hiring on student achievement.

21 The stacked model interacts all controls with a subject indicator to ensure that the identifying variation still comes from comparisons across cohorts, c, rather than across subjects, j, within cohorts:

X̄_cjsy = β F̄_cjsy + γ N̄_cjsy + α_jsy + ϕ_cjy + ε_cjsy.    (10)
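The joint balance test reported in Table 2 is a Wald test that all hiring-share coefficients are zero, computed from the stacked coefficient vector and its school-clustered covariance matrix. A minimal sketch of the test statistic alone, with a placeholder coefficient vector and covariance matrix standing in for the estimates:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
k = 20                                # 10 covariates x 2 hiring shares
b = rng.normal(0, 0.005, k)           # stacked balance coefficients (placeholder)
V = np.diag(np.full(k, 0.005 ** 2))   # clustered covariance matrix (placeholder)

# Wald statistic: W = b' V^{-1} b, chi-squared with k df under the null of balance
W = float(b @ np.linalg.solve(V, b))
p_chi2 = float(1 - stats.chi2.cdf(W, df=k))

# with G clusters, an F(k, G - 1) version is a common finite-sample variant
G = 147                               # schools = clusters
F_stat = W / k
p_F = float(1 - stats.f.cdf(F_stat, k, G - 1))
```

In practice `b` and `V` would come from the stacked balance regressions with standard errors clustered by school, matching the degrees of freedom reported in Table 2.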
5.2 Effects in the Hiring Year

When schools hire non-TFA teachers, test scores drop in exposed cohorts. Panel A of Figure 3 presents
this result in event study form. Each point plots the estimated coefficient from a regression of cohorts' average scores in year y ± ∆ on the non-TFA hiring share the cohort was exposed to in year y:

Ā_cjs,y±∆ = β F̄_cjsy + γ N̄_cjsy + α_jsy + ϕ_cjy + ε_cjsy.    (11)
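The event-study points in Figure 3 come from separate regressions of cohort scores at each horizon y ± ∆ on the year-y hiring shares. A minimal sketch of that loop on synthetic data, simplified to a single treatment with the fixed effects omitted:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
hire_share = rng.uniform(0, 1, n)                 # year-y hiring exposure
noise = rng.normal(0, 0.2, (6, n))

# true effects at horizons y-3 ... y+2: the penalty hits from year y onward
true_effect = {-3: 0.0, -2: 0.0, -1: 0.0, 0: -0.08, 1: -0.05, 2: -0.04}
scores = {d: true_effect[d] * hire_share + noise[i]
          for i, d in enumerate(sorted(true_effect))}

def ols_slope(y, x):
    """Bivariate OLS slope of y on x."""
    x_c, y_c = x - x.mean(), y - y.mean()
    return float(x_c @ y_c / (x_c @ x_c))

# one regression per horizon, as in the event-study panels of Figure 3
event_study = {d: ols_slope(scores[d], hire_share) for d in sorted(true_effect)}
```

The pre-period coefficients hover around zero while the post-period coefficients recover the imposed penalties, mirroring the flat pre-trends and post-hiring drop the figure is designed to reveal.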
These results pool effects on math and reading, so all specifications control for subject-school-year and subject-cohort-year effects as in equation (10). The three points to the left of the dotted line show effects on lagged scores; they confirm that exposure to teacher hiring shocks was uncorrelated with students' prior achievement trends. The first point to the right of the dotted line shows the contemporaneous effect of non-TFA hiring. A one-unit increase in a cohort's year y non-TFA hiring share was associated with a .082σ drop in average year y test scores relative to cohorts with all returning teachers. This hiring penalty appears to have some lasting effect on cohorts' achievement: test scores in exposed cohorts are significantly lower both one and two years after the initial hiring shock.
Cohorts exposed to new corps members do not pay the hiring penalty. This result can be seen in Panel B of Figure 3, which plots the coefficients on the TFA-hiring share from equation (11). As in the previous graph, the first three points confirm that students exposed to corps members are on the same achievement trend as other cohorts prior to exposure. In contrast to Panel A, however, exposure to newly-hired corps members has no significant effect on test scores – either positive or negative. Since the omitted treatment in equation (11) is exposure to returning teachers, these results imply that newly-hired corps members perform as well, on average, as all other teachers with one or more years of tenure. The effect of TFA teachers relative to other new hires appears in Panel C, which shows that TFA teachers outperform other newly-hired teachers by roughly .05σ in the first post-hiring year.
Table 3 shows that TFA's hiring advantage comes entirely from its differential effect on math achievement. Newly-hired corps members performed as well as the average returning math teacher, as the -.005σ estimate in Panel A of column 3 indicates.
In contrast, a one-unit increase in cohorts' non-TFA hiring share was associated with average math scores that were .107σ lower, leaving an average TFA hiring difference of .102σ. TFA hires had no measurable advantage in reading, however, as Panel B reveals. Both TFA and non-TFA hiring were associated with small reductions in cohorts' reading scores relative to cells with all returning teachers, but the difference between them was a small and insignificant .009σ.
Table 3 also presents a series of robustness checks for the estimated hiring effects. Traditional value-added estimates of teacher quality rely on student covariates to mitigate selection bias. Since student traits and hiring shares are uncorrelated within school-years, however, controlling for student covariates should have little effect on the cross-cohort estimates. Columns 4-6 confirm this prediction. Column 4 adds the lagged score controls common to value-added models of teacher quality (see, e.g., Chetty et al., 2014). Each specification controls for grade-interacted cubics in lagged math and reading scores. These measures of past performance are strong predictors of future scores, boosting the model R2 from roughly .10 to .60 in both Panels A and B, and yet the treatment effect estimates move little. The TFA hiring advantage increases slightly from .102σ to .110σ in math and from .009σ to .013σ in reading. The results are similarly robust to the inclusion of demographic controls: sex, race, parental education, subsidized lunch, limited English proficiency, special education, and gifted status. Substituting student fixed effects for lagged score and demographic controls in column 6 produces similar results. On balance, the TFA hiring advantage is a robust .1σ for math with zero difference in reading. In results not shown here, I find that impacts are larger when comparing TFA with other rookie hires (.15σ) and only slightly smaller relative to veterans (.08σ). TFA reading teachers perform on par with the average veteran hire and .03σ better than other rookies.
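The lagged-score controls used in the value-added robustness checks are grade-interacted cubics in prior math and reading scores. A minimal sketch of how such a control matrix can be assembled (synthetic student records; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
grades = np.arange(3, 9)              # tested grades 3-8
grade = rng.choice(grades, n)
lag_math = rng.normal(0, 1, n)
lag_read = rng.normal(0, 1, n)

# one cubic in each lagged score, interacted with each grade indicator
cols = []
for g in grades:
    in_grade = (grade == g).astype(float)
    for score in (lag_math, lag_read):
        for power in (1, 2, 3):
            cols.append(in_grade * score ** power)
controls = np.column_stack(cols)      # n x (6 grades x 2 subjects x 3 powers)
```

Each student contributes nonzero entries only in her own grade's block, so the cubic relationship between lagged and current scores is allowed to differ freely by grade, as in the value-added specifications.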
6 The Long-Run Effects of TFA Hiring

6.1 The Tenure Profile of Teacher Effects

Figure 4 reveals that corps members maintain their math advantage at every tenure level. Panel A plots the tenure profile of math effects for both TFA and non-TFA hires, estimated via equation (7). Non-TFA hires, shown in black, reduce math scores by .11σ relative to non-TFA teachers with four or more years of tenure, the omitted reference group. Experienced corps members, meanwhile, perform substantially better than veteran non-TFA teachers: TFA math instructors outperform comparison hires by roughly .1σ at each step in the tenure profile. Panel B shows no differential impacts on reading achievement at any tenure level.
Table 4 validates the observational results against the experimental estimates. Column 1 reports the average TFA effect in the experimental sample, pooling data from Glazerman et al. and Clark et al.22 Column 2 restricts the experimental sample to the observational grades (3-8). As columns 3-5 show, weighting the tenure-specific observational effects by the experimental tenure distribution reproduces the average TFA effect in the observational sample.
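The validation exercise in Table 4 reweights tenure-specific observational effects by the tenure distribution of the experimental sample and compares the weighted average to the experimental benchmark. A minimal sketch with placeholder numbers (the effects and weights below are illustrative, not the paper's estimates):

```python
# observational TFA effects by tenure level (0 = rookie year); placeholders
effect_by_tenure = {0: 0.10, 1: 0.11, 2: 0.09, 3: 0.10}

# share of experimental-sample corps members observed at each tenure level
exp_weights = {0: 0.55, 1: 0.35, 2: 0.08, 3: 0.02}
assert abs(sum(exp_weights.values()) - 1.0) < 1e-9

# reweighted average: what the observational tenure profile implies for
# a sample with the experimental tenure mix
avg_effect = sum(effect_by_tenure[t] * w for t, w in exp_weights.items())
```

If the observational design is unbiased, this reweighted average should match the experimental estimate up to sampling error, which is the comparison Table 4 performs.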
6.2 Hiring Simulations

Figure 5 uses the validated estimates to simulate expected teacher quality when schools continually fill vacancies with either TFA corps members or non-TFA teachers. Each point averages the tenure-specific teacher effects reported in panels A and B of Figure 4 by the probability that schools employ a teacher of the given tenure level in each year. As in Figure 4, effects are normalized relative to the average impact of non-TFA teachers with three or more years of tenure. When schools replace exiting TFA teachers with new TFA recruits, the recruits' performance advantages more than offset the costs of turnover, boosting steady-state achievement by .07σ in math and .03σ in reading.
Figure 6 simulates a one-shot deviation from a policy of hiring non-TFA rookie teachers. The black lines plot expected teacher quality when schools continually fill vacancies with non-TFA rookie teachers. The green lines plot expected teacher quality if a school hires one corps member and then returns to the non-TFA rookie policy whenever the corps member leaves. Achievement gains during a corps member's service years far exceed the negative effects on future cohorts that receive replacement rookies.
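The simulations in Figures 5 and 6 track the tenure distribution of a classroom's teacher over time: each year the incumbent either returns with one more year of tenure or exits and is replaced by a rookie, and expected quality is the tenure-effect average weighted by that distribution. A minimal sketch under a continual-replacement policy (retention rates and effects are illustrative placeholders):

```python
# probability a teacher with tenure t returns for year t + 1 (placeholders)
retain = {0: 0.82, 1: 0.20, 2: 0.15, 3: 0.10}
# effect on scores by tenure, relative to veteran non-TFA teachers (placeholders)
effect = {0: 0.10, 1: 0.11, 2: 0.09, 3: 0.10}
MAX_T = 3  # top-code tenure

def step(dist):
    """One year of turnover: returners age one tenure step, exits become rookies."""
    new = {t: 0.0 for t in range(MAX_T + 1)}
    for t, p in dist.items():
        new[min(t + 1, MAX_T)] += retain[t] * p   # returner, tenure top-coded
        new[0] += (1 - retain[t]) * p             # replaced by a new hire
    return new

dist = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}  # year 0: a brand-new hire
quality = []
for _ in range(10):
    quality.append(sum(effect[t] * p for t, p in dist.items()))
    dist = step(dist)
```

Iterating `step` until the distribution converges gives the steady-state tenure mix; swapping in TFA versus non-TFA retention and effect profiles reproduces the policy comparison of Figure 5, and switching profiles after the first exit mimics the one-shot deviation of Figure 6.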
7 Conclusion

Randomized evaluations find TFA corps members have positive impacts on student achievement in hard-to-staff schools. Average effects in these select cross-sections potentially mask TFA's turnover dynamics. This paper develops and estimates a model of time-varying TFA treatment effects that incorporates teacher tenure as a key mediating variable. I identify tenure-specific treatment effects by exploiting quasi-random variation in teaching assignments across grades within school-years.
I find that first-year corps members perform as well as long-tenured teachers in math instruction. Impacts are larger when comparing TFA with other rookie hires (.15σ) and only slightly smaller relative to veterans (.08σ), who account for two-thirds of all non-TFA hiring in my sample of TFA schools. TFA reading teachers perform on par with the average veteran hire and .03σ better than other rookies.
I extend this empirical strategy to measure TFA's long-run impacts. The results indicate that TFA teachers maintain their .10σ math advantage at every tenure level, with positive, though small and imprecise, effects in reading. When schools replace exiting TFA teachers with new TFA recruits, these gains more than offset turnover, boosting steady-state achievement by .07σ in math and .03σ in reading. TFA may not provide a consistent source of staff, however. The program's ranks have been thinning as the economic recovery improves employment options for recent college graduates (Rich, 2015). Schools that hire TFA may therefore risk having to replace TFA teachers with other new hires, who do perform worse than the counterfactual retained veteran. I show that these potential losses are small in magnitude: short-run gains from one-shot TFA hiring exceed the negative effects on future students.

22 Including data from Chiang et al. produces nearly identical findings. Those results are currently undergoing disclosure review with the Institute for Education Sciences.
References

Abraham, Katharine G. and Henry S. Farber, “Job Duration, Seniority, and Earnings,” American Economic Review, 1987, 77 (3), 278–297.
Angrist, Joshua D., “Grouped Data Estimation and Testing in Simple Labor Supply Models,” 1988.
Belculfine, Le, “Pittsburgh school board reverses on Teach for America contract,” Pittsburgh Post-Gazette, December 2013.
Boyd, Donald, Pamela Grossman, Hamilton Lankford, Susanna Loeb, and James Wyckoff, “How Changes in Entry Requirements Alter the Teacher Workforce and Affect Student Achievement,” Education Finance and Policy, 2006, 1 (2), 176–216.
Brandt, Steve, “Minneapolis trims use of much-debated Teach for America,” Star Tribune, May 2014.
Chetty, Raj, John N. Friedman, and Jonah E. Rockoff, “Measuring the Impacts of Teachers I: Evaluating Bias in Teacher Value-Added Estimates,” American Economic Review, 2014, 104 (9), 2593–2632.
Chiang, Hanley S., Melissa A. Clark, and Sheena McConnell, “Supplying Disadvantaged Schools with Effective Teachers: Experimental Evidence on Secondary Math Teachers from Teach For America,” 2014.
Clark, Melissa A., Eric Isenberg, Albert Y. Liu, Libby Makowsky, and Marykate Zukiewicz, “Impacts of the Teach For America Investing in Innovation Scale-Up,” Technical Report, Mathematica Policy Research 2015.
Clotfelter, Charles T., Elizabeth Glennie, Helen F. Ladd, and Jacob L. Vigdor, “Would higher salaries keep teachers in high-poverty schools? Evidence from a policy intervention in North Carolina,” Journal of Public Economics, 2008, 92 (5-6), 1352–1370.
Dee, Thomas S. and James H. Wyckoff, “Incentives, Selection, and Teacher Performance: Evidence from IMPACT,” Journal of Policy Analysis and Management, 2015, 34 (2), 267–297.
Dobbie, Will, “Teacher Characteristics and Student Achievement: Evidence from Teacher Surveys,” 2011.
Donaldson, Morgaen L. and Susan Moore Johnson, “TFA Teachers: How Long Do They Teach? Why Do They Leave?,” Phi Delta Kappan, October 2011.
Duflo, Esther, Rema Hanna, and Stephen P. Ryan, “Incentives Work: Getting Teachers to Come to School,” American Economic Review, 2012, 102 (4), 1241–1278.
Fryer, Roland G., “Teacher Incentives and Student Achievement: Evidence from New York City Public Schools,” Journal of Labor Economics, 2013, 31 (2), 373–407.
Fryer, Roland G., Steven D. Levitt, John List, and Sally Sadoff, “Enhancing the Efficacy of Teacher Incentives Through Loss Aversion: A Field Experiment,” 2012.
Glazerman, Steven, Daniel P. Mayer, and Paul T. Decker, “Alternative Routes to Teaching: The Impacts of Teach for America on Student Achievement and Other Outcomes,” Journal of Policy Analysis and Management, 2006, 25 (1), 75–96.
Glewwe, Paul, Nauman Ilias, and Michael Kremer, “Teacher Incentives,” American Economic Journal: Applied Economics, 2010, 2 (3), 205–27.
Goodman, Sarena F. and Lesley J. Turner, “The Design of Teacher Incentive Pay and Educational Outcomes: Evidence from the New York City Bonus Program,” Journal of Labor Economics, 2013, 31 (2), 409–420.
Hansen, Michael, Ben Backes, Victoria Brady, and Zeyu Xu, “Examining Spillover Effects from Teach for America Corps Members in Miami-Dade County Public Schools,” 2014.
Heilig, Julian Vasquez and Su Jin Jez, “Teach for America: A Return to the Evidence,” Technical Report, National Education Policy Center, Boulder, CO 2014.
Henry, Gary T., Kevin C. Bastian, C. Kevin Fortner, David C. Kershaw, Kelly M. Purtell, Charles L. Thompson, and Rebecca A. Zulli, “Teacher preparation policies and their effects on student achievement,” Education Finance and Policy, 2014, 9 (3), 1–40.
Holmstrom, Bengt and Paul Milgrom, “Multitask Principal-Agent Analyses: Incentive Contracts, Asset Ownership, and Job Design,” Journal of Law, Economics, & Organization, 1991, 7 (2), 24–52.
Ingersoll, Richard M. and Henry May, “The Magnitude, Destinations, and Determinants of Mathematics and Science Teacher Turnover,” Educational Evaluation and Policy Analysis, 2012, 34 (4), 435–464.
Jacobson, Linda, “Teacher Pay Incentives Popular But Unproven,” Education Week, September 2006.
Kane, Thomas J., Jonah E. Rockoff, and Douglas O. Staiger, “What Does Certification Tell Us About Teacher Effectiveness? Evidence from New York City,” Economics of Education Review, 2008, 27 (6), 615–631.
Kopp, Wendy, One Day, All Children ... : The Unlikely Triumph of Teach for America and What I Learned Along the Way, New York: Public Affairs, 2001.
Lavy, Victor, “Evaluating the Effect of Teachers’ Group Performance Incentives on Pupil Achievement,” Journal of Political Economy, 2002, 110 (6), 1286–1317.
Luther, Joel, “Durham schools split with Teach for America,” Duke Chronicle, 2014.
Mathews, Jay and Ray V. Spain, “North Carolina superintendent defends Teach For America,” Washington Post, March 2013.
Muralidharan, Karthik and Venkatesh Sundararaman, “Teacher Performance Pay: Experimental Evidence from India,” Journal of Political Economy, 2011, 119 (1), 39–77.
Papay, John P. and Matthew A. Kraft, “Productivity returns to experience in the teacher labor market: Methodological challenges and new evidence on long-term career improvement,” Journal of Public Economics, 2015.
Rich, Motoko, “Fewer Top Graduates Want to Join Teach for America,” New York Times, February 2015.
Rivkin, Steven G., Eric A. Hanushek, and John F. Kain, “Teachers, Schools, and Academic Achievement,” Econometrica, 2005, 73 (2), 417–458.
Rockoff, Jonah E., “The Impact of Individual Teachers on Student Achievement: Evidence from Panel Data,” American Economic Review, 2004, 94 (2), 247–252.
Rothstein, Jesse, “Teacher Quality in Educational Production: Tracking, Decay, and Student Achievement,” Quarterly Journal of Economics, 2010, 125 (1), 175–214.
Springer, Matthew G., Dale Ballou, Laura Hamilton, Vi-Nhuan Le, J.R. Lockwood, Daniel F. McCaffrey, Matthew Pepper, and Brian M. Stecher, “Teacher Pay for Performance: Experimental Evidence from the Project on Incentives in Teaching,” 2010.
Staiger, Douglas O. and Jonah E. Rockoff, “Searching for Effective Teachers with Imperfect Information,” Journal of Economic Perspectives, August 2010, 24 (3), 97–118.
Steele, Jennifer L., Richard J. Murnane, and John B. Willett, “Do Financial Incentives Help Low-Performing Schools Attract and Keep Academically Talented Teachers? Evidence from California,” Journal of Policy Analysis and Management, 2010, 29 (3), 451–478.
Vorhees, Beth and Ashton Marra, “Is Teach for America Right for West Virginia?,” West Virginia Public Broadcasting, May 2015.
Xu, Zeyu, Jane Hannaway, and Colin Taylor, “Making a Difference? The Effects of Teach for America in High School,” 2007.
Figure 1
Teacher Retention at TFA Schools
Observational Sample

[Figure: survival curves plotting the share of teachers still employed at the hiring school (vertical axis, 0 to 1) against years after hiring (horizontal axis, 1 to 5), shown separately for TFA hires and non-TFA hires.]
Notes: This graph plots survival curves for newly-hired math and reading teachers in grades 3-8 at North Carolina public schools that hired TFA corps members between 1997 and 2012.
Figure 2
Student Exposure to Teacher Hiring
Observational Sample

[Figure: four binned scatter plots of average lagged score against teacher hiring shares. Panel A, TFA hires: share of students with TFA hires, slope = -0.60σ. Panel B, non-TFA hires: share of students with non-TFA hires, slope = -0.10σ. Panel C, TFA hires: residualized TFA hire share, slope = -0.006σ. Panel D, non-TFA hires: residualized non-TFA hire share, slope = -0.005σ.]
Notes: These figures depict the relationship between students’ exposure to teacher hiring and their prior achievement. Each graph plots average lagged reading scores in cohort-school-year cells against the cell share of students taught by newly-hired reading teachers. Each dot depicts the average for bins of width .05 on the horizontal axis, with best fit lines estimated on the underlying cell-level data. Panels A and B present the unconditional correlations, which show that new hires served cells with lower lagged achievement. Panels C and D show that there is no correlation between teacher hiring and lagged achievement across cells within school-years. These panels plot average lagged scores against binned residuals from regressions of the hiring shares on school-year and cohort-year effects, as in equation (9). The plotted bins are censored at the 1st and 99th percentiles of the underlying cell-level data to restrict the horizontal axis range.
Figure 3
New Hire Effects on Test Scores
Observational Estimates, Pooled Subjects

[Figure: event-study plots of effects on test scores (vertical axis) against year relative to the hiring shock, y − 3 through y + 2 (horizontal axis). Panel A: non-TFA hires. Panel B: TFA hires. Panel C: TFA difference.]
Notes: These graphs plot the effects of newly-hired teachers on test scores. Green lines depict effects for TFA hires, black lines plot effects for non-TFA hires, and orange lines show the TFA difference. Each point is the coefficient from a regression of cohorts’ average score in year y ± ∆ on the share of students taught by newly-hired teachers in year y. These estimates pool effects on math and reading, so all specifications control for school-subject-year and cohort-subject-year effects as in equation (10). Standard errors are clustered by school, and whiskers indicate 95 percent confidence intervals.
Figure 4
The Tenure Profile of Teacher Effects
Observational Estimates

[Figure: effects on test scores (vertical axis) by years of tenure (horizontal axis) for TFA hires, non-TFA hires, and the TFA difference. Panel A: math, short run. Panel B: reading, short run. Panel C: math, long run. Panel D: reading, long run.]
Notes: These graphs plot teacher effects by tenure, estimated by equation (7). The reference group in panels A and B is non-TFA teachers with four or more years of tenure; in panels C and D, the reference group is non-TFA teachers with 10 or more years of tenure. Panels C and D are estimated on the subsample of data from 2004-2012 so that teachers hired prior to 1994 can be top-coded at 10 years of tenure. TFA tenure is top-coded at three years in all specifications due to limited observations above that value. Standard errors are clustered by school, and dashed lines plot 95 percent confidence intervals. Figure A3 shows that these tenure treatments are uncorrelated with lagged achievement.
Figure 5
Expected Teacher Quality

[Figure: expected effects on test scores (vertical axis) by years since initial hire, 0 through 6 (horizontal axis). Panel A: math. Panel B: reading. Panel C: TFA difference. Lines show TFA hires, non-TFA hires, and the TFA difference.]
Notes: These graphs plot expected teacher quality when schools continually fill vacancies with either TFA corps members or nonTFA teachers. Each point averages the tenure-specific teacher effects reported in panels A and B of Figure 4 by the probability that schools employ a teacher of the given tenure-level in each year. As in Figure 4, effects are normalized relative to the average impact of non-TFA teachers with three or more years of tenure.
Figure 6
One-Shot TFA Hiring Simulation

[Figure: expected effects on test scores (vertical axis) by years since initial hire, 0 through 6 (horizontal axis). Panel A: math. Panel B: reading. Panel C: TFA difference.]
Notes: These graphs simulate a one-shot deviation from a policy of hiring non-TFA rookie teachers. The black lines plot expected teacher quality when schools continually fill vacancies with non-TFA rookie teachers. The green lines plot expected teacher quality if a school hires one corps member and then returns to the non-TFA rookie policy whenever the corps member leaves. Each point averages tenure-specific teacher effects by the probability that schools employ a teacher of the given tenure level in each year. As in Figure 4, effects are normalized relative to the average impact of non-TFA teachers with three or more years of tenure.
Table 1
Teacher Descriptive Statistics
Observational Sample

Columns 1-2 describe all schools; columns 3-5 describe TFA schools.

|                                     | New Hires (1) | Returning Teachers (2) | TFA Hires (3) | Non-TFA Hires (4) | Returning Teachers (5) |
|-------------------------------------|---------------|------------------------|---------------|-------------------|------------------------|
| share of full-time teacher-years    | .22           | .78                    | .02           | .26               | .72                    |
| female                              | .85           | .89                    | .77           | .81               | .85                    |
| non-white                           | .21           | .18                    | .19           | .41               | .40                    |
| years since college graduation      | 10.24         | 16.65                  | .49           | 9.89              | 15.24                  |
| highly competitive college graduate | .31           | .30                    | .79           | .26               | .23                    |
| traditional teaching license        | .77           | .86                    | .08           | .69               | .77                    |
| years of experience credit          | 6.67          | 13.33                  | .06           | 6.25              | 11.93                  |
| zero experience credit              | .30           | .00                    | .94           | .33               | .00                    |
| annual state salary                 | 36,491        | 43,587                 | 30,980        | 35,993            | 42,411                 |
| sample schools                      | 2,309         | 2,366                  | 123           | 146               | 147                    |
| sample teachers                     | 77,413        | 76,404                 | 635           | 9,475             | 8,278                  |
Notes: This table describes the personnel who taught math and reading in grades 3-8 at North Carolina public schools between 1997 and 2012. Column 1 reports means for newly-hired teachers -- those working in their first year at a given school -- and column 2 describes teachers who returned to a prior employer. Columns 3-5 describe the subsample at schools that hired TFA corps members during this period. Means are weighted by full-time equivalence, and sample sizes count distinct teachers. College rankings come from the Barron's Profile of American Colleges (2009); "highly competitive" includes the top three ranks. Traditional teaching licenses are credentials earned through accredited North Carolina colleges or interstate reciprocity; I use the initial license earned for each teacher. Experience credit is a proxy for teaching experience that includes years in non-classroom positions that accrue credit on the state salary schedule. Salaries are scaled in 2010 dollars.
Table 2
Student Covariates and Exposure to Teacher Hiring
Observational Sample

Column 1 reports the no-hire mean. Columns 2-3 include no controls; columns 4-5 control for school, cohort, and year effects; columns 6-7 control for school-year and cohort-year effects.

|                             | No Hire Mean (1) | TFA Hires (2)   | Non-TFA Hires (3) | TFA Hires (4)   | Non-TFA Hires (5) | TFA Hires (6) | Non-TFA Hires (7) |
|-----------------------------|------------------|-----------------|-------------------|-----------------|-------------------|---------------|-------------------|
| lagged math score           | -.33 [.95]       | -.496*** (.087) | -.068** (.034)    | -.131** (.053)  | -.004 (.016)      | -.006 (.033)  | .010 (.013)       |
| lagged reading score        | -.31 [.95]       | -.536*** (.082) | -.099*** (.033)   | -.122*** (.046) | -.016 (.013)      | -.018 (.028)  | -.005 (.011)      |
| female                      | .49              | .002 (.007)     | .000 (.002)       | .004 (.007)     | .001 (.002)       | .005 (.008)   | -.001 (.002)      |
| White                       | .33              | -.465*** (.060) | -.078*** (.025)   | -.041*** (.015) | -.011 (.007)      | -.003 (.005)  | -.000 (.002)      |
| Black                       | .53              | .350*** (.055)  | .079*** (.024)    | .038** (.016)   | .016*** (.006)    | .012* (.007)  | .000 (.002)       |
| Hispanic                    | .10              | .106*** (.034)  | -.004 (.009)      | .018 (.012)     | -.005 (.004)      | -.003 (.006)  | -.002 (.001)      |
| subsidized lunch            | .65              | .322*** (.047)  | .020 (.024)       | .055** (.024)   | .010 (.012)       | -.004 (.006)  | -.005* (.003)     |
| limited English proficiency | .06              | .064*** (.021)  | -.005 (.006)      | .015* (.008)    | -.004 (.003)      | -.004 (.005)  | -.001 (.001)      |
| special education           | .09              | .001 (.009)     | .001 (.003)       | .006 (.007)     | -.003 (.002)      | -.002 (.005)  | -.002 (.002)      |
| gifted                      | .10              | -.086*** (.022) | -.036*** (.008)   | -.015 (.010)    | -.005* (.003)     | -.004 (.006)  | -.004* (.002)     |
| F(20, 146)                  | -                | 6.08            |                   | 2.20            |                   | .54           |                   |
| p                           | -                | .00             |                   | .07             |                   | .70           |                   |
| sample schools              | 144              | 147             |                   | 147             |                   | 147           |                   |
| exams                       | 334,875          | 1,676,107       |                   | 1,676,107       |                   | 1,676,107     |                   |
Notes: This table presents tests of student covariate balance in the presence and absence of newly-hired teachers. Column 1 reports the sample mean for school-cohort-year cells with no new hires. Standard deviations for continuous values are in brackets. The remaining columns present regressions of the given covariate on the cell share of students taught by TFA and non-TFA hires. The specification in columns 2 and 3 includes no other controls. Columns 4 and 5 control for school, cohort, and year effects. Columns 6 and 7 control for second-order interactions of these terms: school-year and cohort-year. The sample stacks exams in math and reading, so all controls are interacted with a subject indicator. I impute missing values using cell means. Table A2 presents corresponding estimates for the subsample with non-missing data for all variables. Standard errors are clustered by school and reported in parentheses. Degrees of freedom in the Wald tests of joint significance reflect this clustering.
Table 3
New Hire Effects on Test Scores, Observational Estimates

                     No           School,        School-Year     Lagged         Lagged Scores,  Student
                     Controls     Cohort,        and             Scores         Demog.          Fixed
                                  and Year       Cohort-Year                                    Effects
                     (1)          (2)            (3)             (4)            (5)             (6)

A. Math
TFA hire share       -.525*** (.079) -.054 (.045)   -.005 (.034)    -.000 (.034)   -.001 (.033)    -.006 (.032)
non-TFA hire share   -.198*** (.031) -.119*** (.016) -.107*** (.010) -.110*** (.014) -.110*** (.013) -.096*** (.012)
TFA difference       -.328*** (.072) .065 (.044)    .102*** (.033)  .110*** (.035) .110*** (.034)  .091*** (.032)
R2                   .004         .062           .096            .625           .642            .874
exams                840,995      840,995        840,995         840,995        840,995         840,995

B. Reading
TFA hire share       -.615*** (.084) -.151*** (.047) -.043 (.027)    -.038 (.027)   -.036 (.024)    -.038* (.020)
non-TFA hire share   -.138*** (.035) -.071*** (.016) -.052*** (.010) -.051*** (.011) -.053*** (.010) -.043*** (.009)
TFA difference       -.476*** (.085) -.080 (.050)   .009 (.029)     .013 (.026)    .017 (.024)     .005 (.021)
R2                   .003         .062           .086            .593           .615            .861
exams                835,112      835,112        835,112         835,112        835,112         835,112
Notes: This table reports estimates from regressions of student test scores on the new hire share of teachers in each school-cohort-year cell. Specifications are estimated separately by subject: panel A reports effects on math scores, and panel B reports effects on reading. Regression controls vary across columns. Column 1 has no controls; column 2 adds fixed effects for school, cohort, and year; and columns 3-6 control for school-year and cohort-year effects. Columns 4-6 add student-level controls. Lagged score controls are grade-interacted cubics in lagged scores from each subject. Demographic controls are binary indicators for sex, race, parental education, subsidized lunch, limited English proficiency, special education, and gifted status. All controls include dummies for missing values, which are imputed using cell means. Column 6 uses student fixed effects in lieu of lagged score and demographic controls. Standard errors are clustered by school and reported in parentheses.
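The fixed-effects specifications in columns 2-6 can be illustrated with a within (demeaning) estimator: subtracting group means removes the group intercepts exactly, leaving only within-group variation to identify the hire-share coefficients. The sketch below uses a single set of group effects and synthetic data; the paper's specifications use richer interacted effects (school-year, cohort-year) and school-clustered standard errors.

```python
import numpy as np

def within_ols(y, X, groups):
    """Within (fixed-effects) estimator: demean y and X inside each group,
    then run OLS on the demeaned data. Group intercepts drop out exactly."""
    yd = y.astype(float).copy()
    Xd = X.astype(float).copy()
    for g in np.unique(groups):
        m = groups == g
        yd[m] -= yd[m].mean()
        Xd[m] -= Xd[m].mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xd, yd, rcond=None)
    return beta

# Synthetic cells: scores depend on hire shares plus a confounding cell effect
rng = np.random.default_rng(1)
groups = np.repeat(np.arange(40), 5)             # 40 school-year cells
tfa_share = rng.uniform(0, 0.3, size=200)
non_tfa_share = rng.uniform(0, 0.6, size=200)
cell_effect = 0.5 * rng.normal(size=40)[groups]  # constant within each cell
score = 0.1 * tfa_share - 0.1 * non_tfa_share + cell_effect
beta = within_ols(score, np.column_stack([tfa_share, non_tfa_share]), groups)
```

Because the cell effect is constant within each group, demeaning removes it entirely, and the estimator recovers the hire-share coefficients despite the confounding.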
Table 4
Validating Observational Estimates

                   Experimental                  Observational
                   All Grades    Obs. Grades     X-Cohort,       X-Cohort,        OLS,
                                                 All Years       Matched Years    Matched Years
                   (1)           (2)             (3)             (4)              (5)

A. Math
TFA difference     .058** (.028) .094** (.042)   .082*** (.027)  .083*** (.030)   .099*** (.017)
N                  3,476         1,480           840,995         367,088          262,509

B. Reading
TFA difference     .040 (.027)   -.001 (.032)    .034 (.022)     .040** (.020)    .008 (.008)
N                  3,476         1,480           835,112         364,193          246,316
Notes: Column 1 reports the average TFA effect in the experimental sample, pooling data from all studies and grades. Column 2 restricts the experimental sample to the observational grades (3-8). Columns 3-5 estimate the corresponding average TFA effect in the observational sample, weighting the tenure-specific observational effects by the experimental tenure distribution. Column 3 uses the cross-cohort estimates from the full observational sample (1997-2012). Column 4 uses cross-cohort estimates from the student-teacher matched years (2006-2012), and column 5 uses the OLS estimates from the student-teacher matched subsample. See Table 3 for specification details.
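The reweighting described in these notes averages tenure-specific observational effects using the experimental tenure distribution as weights, so the observational and experimental averages are comparable. A toy illustration — the effects and weights below are hypothetical numbers, not estimates from the paper:

```python
import numpy as np

# Hypothetical tenure-specific TFA effects (student test-score SD units)
effects = np.array([0.10, 0.09, 0.12])   # tenure = 1, 2, 3+ years

# Hypothetical share of experimental TFA teachers at each tenure level
weights = np.array([0.6, 0.3, 0.1])

# Average observational effect implied by the experimental tenure distribution
avg_effect = float(np.average(effects, weights=weights))
```

Weighting by the experimental distribution ensures that any difference between columns reflects the estimates themselves rather than a different mix of rookie and veteran TFA teachers.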
Figure A1
Teach for America's Expansion
[Figure: annual counts of new corps members, 1990-2015. Left axis: new corps members nationwide (0-6,000); right axis: new corps members in North Carolina (0-300). Series: Total and North Carolina.]
Figure A2
Within-School Exposure to Teacher Hiring, Observational Sample
[Figure: binned scatter plots of average lagged reading scores against residualized new-hire shares. Panel A. TFA Hires (slope = 0.001σ); Panel B. Non-TFA Hires (slope = −0.217σ).]
Notes: These figures depict the relationship between students' exposure to teacher hiring and their prior achievement. Each graph plots average lagged reading scores in cohort-school-year cells against the cell share of students taught by newly-hired reading teachers. Each dot depicts the average for bins of width .05 on the horizontal axis, with best-fit lines estimated on the underlying cell-level data. Panels A and B of Figure 2 present the unconditional correlation between teacher hiring and lagged achievement. The graphs shown here plot average lagged test scores against binned residuals from regressions of the hiring shares on school effects, as in equation (8).
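The binned-scatter construction these notes describe — means within fixed-width bins on the horizontal axis, with the best-fit line estimated on the underlying cell-level data rather than the bin means — can be sketched as follows. Function and variable names here are illustrative, not the paper's code.

```python
import numpy as np

def binscatter(x, y, width=0.05):
    """Average y within fixed-width bins of x; fit the line on the raw data."""
    edges = np.arange(x.min(), x.max() + width, width)
    bin_idx = np.digitize(x, edges)
    centers, means = [], []
    for b in np.unique(bin_idx):
        m = bin_idx == b
        centers.append(x[m].mean())
        means.append(y[m].mean())
    # Best-fit line on the unbinned (cell-level) data, not the bin means
    slope, intercept = np.polyfit(x, y, deg=1)
    return np.array(centers), np.array(means), slope, intercept

# Synthetic example: lagged scores declining in the hire share
rng = np.random.default_rng(2)
hire_share = rng.uniform(0, 1, size=500)
lagged_score = -0.2 * hire_share + rng.normal(0, 0.05, 500)
cx, cy, slope, intercept = binscatter(hire_share, lagged_score)
```

Fitting the line on the raw data keeps the slope estimate independent of the binning choice; the bins serve only to make the scatter legible.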
Figure A3
Testing for Effects on Lagged Achievement by Teacher Tenure, Observational Sample
[Figure: estimated effects on lagged test scores plotted by years of tenure. Panel A. Math, Short Run; Panel B. Reading, Short Run; Panel C. Math, Long Run; Panel D. Reading, Long Run. Series: TFA Hires, Non-TFA Hires, and TFA Difference.]
Notes: These graphs plot effects on lagged achievement by tenure. They serve as a placebo test for the estimates in Figure 4. See Figure 4 for specification details.
Table A1
Sample Restrictions, Observational Sample

                               All          TFA          Analysis Sample
                               Schools      Schools      All Years    Teacher-Matched Years
                                                         X-Cohort     X-Cohort     OLS
                               (1)          (2)          (3)          (4)          (5)

normalized score               .00          -.31         -.31         -.35         -.29
female                         .49          .49          .49          .49          .50
non-Hispanic white             .58          .29          .29          .23          .24
non-Hispanic black             .28          .55          .55          .54          .53
Hispanic                       .08          .11          .11          .17          .16
other race                     .06          .05          .05          .06          .06
parent has bachelor's degree   .29          .22          .22          ---          ---
subsidized lunch               .47          .64          .64          .70          .69
limited English proficiency    .04          .06          .06          .09          .08
special education              .10          .09          .09          .08          .07
gifted                         .14          .09          .09          .08          .10

sample schools                 2,354        153          147          145          144
exams                          19,918,388   1,679,031    1,676,107    731,281      495,264
years                          1997-2012    1997-2012    1997-2012    2006-2012    2006-2012
Notes: This table describes North Carolina public school students who were tested in math and reading in grades 3-8 between 1997 and 2012. Column 1 contains all tests with non-missing scores from the first administration for each student-subject-year. Column 2 describes tests from schools that hired TFA corps members, excluding charter schools. The cross-cohort analysis sample in column 3 contains the subset from school-subject-cohort-year cells with non-missing personnel data. Columns 4 and 5 further restrict the cross-cohort sample to the years 2006-2012, during which students can be matched to their specific course instructors. Column 5 describes students with unique teacher matches for the tested subject and non-missing lagged scores in both subjects. I impute missing values using cell means. The following data are missing for all tests: parental education (2006-2012), subsidized lunch status (1997 and 2007), and gifted status (1997-1998). I impute these values using school means for all other years.
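The imputation described in these notes — missing covariate values filled with their cell means, with an indicator flagging the originally missing observations — can be sketched in pandas. The column names and values below are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "school": [1, 1, 1, 2, 2],
    "year":   [2006, 2006, 2006, 2006, 2006],
    "lunch":  [1.0, 0.0, np.nan, 1.0, np.nan],
})

# Dummy flagging originally missing values (kept as a regression control)
df["lunch_missing"] = df["lunch"].isna().astype(int)

# Replace missing values with the mean of the school-year cell
df["lunch"] = df.groupby(["school", "year"])["lunch"].transform(
    lambda s: s.fillna(s.mean()))
```

Carrying the missing-value dummy alongside the imputed covariate lets the regression absorb any systematic difference between imputed and observed records instead of forcing them to share a coefficient.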
Table A2
Student Covariates and Exposure to Teacher Hiring, Observational Sample

                              No Hire      No Controls                      School, Subject, and Year        Second-Order Interactions
                              Mean         TFA Hires       non-TFA Hires   TFA Hires       non-TFA Hires    TFA Hires      non-TFA Hires
                              (1)          (2)             (3)             (4)             (5)              (6)            (7)

lagged math score             -.24 [.96]   -.556*** (.080) -.102*** (.032) -.114** (.049)  -.032** (.013)   .017 (.029)    .004 (.011)
lagged reading score          -.24 [.98]   -.599*** (.077) -.121*** (.033) -.134*** (.040) -.037*** (.012)  -.003 (.023)   -.004 (.009)
female                        .50          .008 (.007)     -.001 (.002)    .010 (.008)     -.003 (.003)     .008 (.007)    .000 (.002)
White                         .34          -.486*** (.063) -.078*** (.026) -.040*** (.015) -.010 (.007)     -.003 (.005)   -.002 (.002)
Black                         .52          .342*** (.057)  .077*** (.025)  .025 (.016)     .014** (.006)    .007 (.008)    .000 (.003)
Hispanic                      .09          .132*** (.037)  -.002 (.009)    .029** (.014)   -.004 (.004)     .002 (.007)    -.002 (.001)
subsidized lunch              .64          .352*** (.050)  .022 (.024)     .063** (.026)   .010 (.013)      -.006 (.006)   -.004 (.003)
limited English proficiency   .04          .079*** (.021)  -.003 (.005)    .021** (.009)   -.003 (.003)     -.001 (.005)   -.001 (.002)
special education             .08          .002 (.008)     -.001 (.003)    .006 (.006)     -.004** (.002)   -.004 (.005)   -.003* (.002)
gifted                        .11          -.094*** (.023) -.041*** (.009) -.018 (.011)    -.006* (.003)    -.004 (.007)   -.003 (.002)

F(20, 146)                    -            7.18                            4.77                             .51
p                                          .00                             .00                              .73
sample schools                144          147                             147                              147
exams                         282,887      1,418,826                       1,418,826                        1,418,826
Notes: This table replicates the covariate balance tests from Table 2 on the subsample of students with non-missing data for all variables. Grade 3 students with missing lagged scores account for one-third of the excluded observations.
Figure B1
Exposure to Teacher Hiring, Observational Sample
[Figure: binned scatter plots of average lagged math scores against new-hire shares. Panels A-B plot the unconditional shares of students with TFA and non-TFA hires (slopes = −0.58σ and −0.12σ); Panels C-D plot the residualized TFA and non-TFA hire shares (slopes = 0.006σ and 0.020σ).]
Notes: These figures depict the relationship between students’ exposure to teacher hiring and their prior math achievement. Figure 2 presents corresponding results for reading achievement, along with estimation notes.