The Dynamic Effects of Teach for America in Hard-to-Staff Schools

Sally Hudson∗

November 28, 2015†
Abstract

Randomized evaluations show that Teach for America (TFA) teachers outperform colleagues in boosting achievement at hard-to-staff schools. Despite this cross-sectional evidence, TFA's long-run effects remain unknown, a key concern for policymakers. High turnover among TFA recruits – who commit to serve for just two years – may undercut the long-run returns to TFA hiring relative to non-TFA teachers, who improve steeply with experience. To assess this potential trade-off, I measure the short- and long-run effects of TFA hiring in North Carolina, where schools have employed TFA teachers since the program's founding in 1990. I identify TFA hiring effects by exploiting quasi-random variation in teacher hiring shocks across grades within schools. In the short run, TFA rookies increase math scores markedly relative to the non-TFA teachers schools might otherwise hire; TFA's initial advantage in reading is modest. When schools replace exiting TFA teachers with new TFA recruits, these gains more than offset the costs of lost experience, increasing long-run achievement. On the other hand, when TFA supply fluctuates, schools may have to replace exiting TFA teachers with inexperienced and lower-performing non-TFA hires. On net, short-run achievement gains from one-shot TFA hiring still exceed the costs.
∗ Department of Economics, Massachusetts Institute of Technology. Email: [email protected]. I am grateful to my advisors, Joshua Angrist, David Autor, and Heidi Williams, for their invaluable guidance and support. I thank Raegen Miller, Rachel Perera, and Jason Atwood of Teach for America (TFA) for sharing their data and institutional knowledge; Kara Bonneau and Clara Muschkin of the North Carolina Education Research Data Center for assembling and distributing the state's administrative records; and Mathematica Policy Research (MPR) and the Institute for Education Sciences (IES) for facilitating access to their replication files. I also thank Annice Correia, Andrew Dorner, and Mark Leary for their administrative and technical support. This project has benefitted from the feedback of many. Special thanks to Alex Bartik, Esther Duflo, Jon Gruber, Peter Hull, Ben Olken, Manisha Padi, Jim Poterba, Brendan Price, Ashish Shenoy, Michael Stepner, Melanie Wasserman, and Jeremy West for their generous comments. I gratefully acknowledge financial support from the National Science Foundation (NSF) Graduate Research Fellowship (under Grant No. 1122374) and the George and Obie Shultz Fund. The findings and conclusions expressed here are mine alone and do not necessarily reflect the views of the IES, MPR, NSF, or TFA.
† The most recent draft can be found online at economics.mit.edu/files/11048.
1 Introduction

Many districts struggle to attract effective teachers to low-performing schools. Teach for America (TFA), a highly selective recruiting program, offers a potential solution. TFA trains novice teachers for five weeks and then places them in hard-to-staff schools where they commit to teach for at least two years. Despite their inexperience, TFA teachers outperform colleagues in two randomized trials, boosting grade school math achievement by nearly .10σ and matching non-TFA teachers in reading (Chiang et al., 2014; Glazerman et al., 2006).1 Non-experimental studies that use student covariates to mitigate selection bias report similar effects when comparing TFA to other rookie teachers in Florida (Hansen et al., 2014) and North Carolina (Henry et al., 2014; Xu et al., 2007), with more modest differences in New York City (Kane et al., 2008; Boyd et al., 2006). These short-term gains notwithstanding, TFA's critics argue that the program's temporary staffing model has long-run costs (see, e.g., Heilig and Jez, 2014). Teacher quality improves steeply with experience (Rockoff, 2004; Rivkin et al., 2005; and, more recently, Papay and Kraft, 2015), so high turnover in the TFA pool may reduce overall teacher quality relative to policies that emphasize hiring and retaining traditional teachers.

This paper documents the long-run consequences of TFA hiring, taking turnover into account. TFA's long-run impacts depend on how returns to tenure and experience vary with retention. I therefore integrate these key mediating variables in a model of time-varying TFA treatment effects. I estimate the long-run effects of TFA hiring using two decades of TFA placement records from North Carolina. North Carolina schools have been hiring TFA teachers since the program's founding in 1990. Superintendent Ray Spain of rural Warren County championed TFA in an editorial in the Washington Post. "Like so many rural districts," he wrote, "mine faces a true teacher shortage – particularly in subjects like math, science and special education.
Teach For America helps to address this – offering our principals access to a national pipeline of diverse, accomplished candidates" (Mathews and Spain, 2013). But, barely an hour away, in the mid-sized city of Durham, the school board recently voted to end its TFA contract, citing concerns over turnover. "The majority of the board felt like TFA, while they have quality candidates for sure, is more of a short-term solution. The corps members don't tend to stay," said Board chair Heidi Carter. "The model of TFA is just not one that we believe supports the quality of a long-term, high-quality teaching force" (Luther, 2014). As other states debate ending, expanding, and scaling back TFA's presence, evidence on the program's long-run impacts is immediately policy relevant.2

1. A third trial that focused on kindergarten and pre-kindergarten teachers released preliminary findings in 2015 that show no differential impacts (Clark et al., 2015). I revisit the results from all three studies in Section 2.2.

I identify TFA hiring effects by exploiting the constant and capricious turnover in hard-to-staff schools. The average TFA school in North Carolina loses more than a third of its non-TFA teachers every year, so whether schools hire fourth-grade teachers one year or fifth-grade teachers the next is potentially random from the perspective of cohorts advancing through the grades. I show that within school-years, cohorts' exposure to both TFA and non-TFA hiring is uncorrelated with their prior achievement trends. Cross-cohort variation in hiring shocks therefore identifies the causal effect of teacher hiring on student achievement.3

I find that TFA math teachers outperform the average counterfactual hire by .10σ in the hiring year. Impacts are larger when comparing TFA with other rookie hires (.15σ) and only slightly smaller relative to veterans (.08σ), who account for two-thirds of all non-TFA hiring in my sample of TFA schools. TFA reading teachers perform on par with the average veteran hire and .03σ better than other rookies.

I extend this empirical strategy to measure TFA's long-run impacts. The results indicate that TFA teachers maintain their .10σ math advantage at every tenure level, with positive, though small and imprecise, effects in reading. When schools replace exiting TFA teachers with new TFA recruits, these gains more than offset turnover, boosting steady-state achievement by .07σ in math and .03σ in reading. TFA may not provide a consistent source of staff, however. The program's ranks have been thinning as the economic recovery improves employment options for recent college graduates (Rich, 2015).
Schools that hire TFA may therefore risk having to replace TFA teachers with other new hires, who do perform worse than the counterfactual retained veteran. I show that these potential losses are small in magnitude. Short-run gains from one-shot TFA hiring exceed negative effects on future cohorts.

This study contributes to a growing literature on efforts to improve teacher quality in hard-to-staff schools. State and federal agencies have long offered salary supplements, debt forgiveness, and housing subsidies to recruit teachers to high-need districts (Jacobson, 2006). While some studies show these incentives help attract and retain teachers (see, e.g., Clotfelter et al., 2008; Steele et al., 2010), evidence that teacher incentives improve student achievement is rare. Programs that pay teachers based on student test scores have typically had little impact in U.S. schools (see, e.g., Goodman and Turner, 2013; Fryer, 2013; Springer et al., 2010),4 though Dee and Wyckoff (2015) find that large performance bonuses in Washington D.C. boost retention among qualifying teachers and improve their performance in later years. My quasi-experimental strategy allows me to directly measure TFA's impacts on the hard-to-staff schools that it serves.

The next section provides additional background on TFA and briefly reviews findings from the randomized trials. I detail my empirical strategy in Section 3 and describe the North Carolina data in Section 4. Section 5 presents TFA effects in the hiring year, and Section 6 reports long-run impacts. Section 7 concludes.

2. Pittsburgh was the first U.S. school district to terminate a TFA recruiting contract (Belculfine, 2013). Durham was the second. To date, no others have followed suit, though Minneapolis is reducing its reliance on TFA teachers (Brandt, 2014). West Virginia, meanwhile, recently adopted new teacher licensing standards to facilitate TFA's expansion (Vorhees and Marra, 2015).
3. Chetty et al. (2014) use similar cross-cohort variation to develop a test for bias in conventional measures of teacher quality.
2 Background
2.1 TFA's Recruiting Model

TFA has recruited more than 57,000 teachers since it first launched in 1990. The program's founder, Wendy Kopp, patterned TFA's two-year service model after the Peace Corps, so TFA teachers are known as corps members (Kopp, 2001). In the 2014-2015 school year, nearly 11,000 corps members served more than 500 districts in 37 states. These corps members account for a small fraction of the nation's overall teaching force. U.S. K-12 public schools employ roughly 3.1 million full-time equivalent teachers and hire nearly 200,000 first-time instructors every year, making TFA less than three percent of the annual inflow. TFA is a major supplier to some high-poverty urban and rural districts, however. New York City, Chicago, and Dallas each employ roughly 500 corps members per year, and several North Carolina rural districts fill more than one in five positions through TFA. Roughly one in three corps members work in charter schools, which are publicly-funded schools that maintain greater flexibility to make staffing decisions than traditional public schools.

TFA selects corps members through an intensive screening process, which Clark et al. (2015) detail. The vetting has four stages: an online application, a writing exercise, a phone interview, and a day-long, in-person interview that includes a mock teaching lesson. At each stage, reviewers assess applicants' leadership and organizational skills; academic achievement and critical thinking; capacity to work in diverse environments; and commitment to reducing educational inequality. TFA combines qualitative scores from each assessment in a quantitative model that guides selection decisions. The model uses student achievement data from past corps members to predict which applicants will become effective teachers.5

Selected applicants are typically high-achieving recent college graduates. Between 2009 and 2012, the average corps member scored above the 90th percentile on the SAT and earned a 3.6 GPA in college. Nearly all were first-time teachers, and fewer than one in five had post-college work experience. TFA actively recruits minority and low-income candidates. A third of corps members are racial or ethnic minorities, one third are first-generation college-goers, and one in four are Pell Grant recipients. All told, TFA accepted just 15 percent of its nearly 50,000 annual applicants between 2009 and 2012.

TFA assigns each accepted applicant to a region, subject, and grade level.6 These placements depend on both applicants' preferences and districts' staffing needs. Roughly 75 percent of accepted applicants join TFA after learning their assignment. Those who matriculate then apply for teaching positions in their assigned regions. Though corps members are not guaranteed jobs – they must apply and interview with other candidates – just one percent of corps members failed to obtain teaching positions in 2012.7

In exchange for the new recruits, districts pay TFA a finder's fee that ranges from $2,000–$5,000 per corps member for each of the first two service years. In addition, districts pay corps members salaries and benefits like any other employee. Since corps members are mostly uncertified, inexperienced bachelor's degree holders, they typically start at the entry-level salary for all teachers in the district. Roughly 87 percent of corps members complete their two-year teaching commitments. Nearly all those who leave early do so before completing the first year.

4. Fryer et al. (2012) describe a noteworthy exception in Chicago. Performance-based teacher pay has been more consistently effective in other countries (Duflo et al., 2012; Muralidharan and Sundararaman, 2011; Lavy, 2002), though Glewwe et al. (2010) find effects are confined to the incentivized outcomes, with no impact on broader measures of student learning, as Holmstrom and Milgrom (1991) forewarn.
TFA does not maintain administrative data on corps members' employment after the two service years, so the best evidence on long-run retention comes from alumni surveys. Donaldson and Johnson (2011) find that 40 percent of survey respondents stayed in their initial school for at least three years, and 15 percent were retained through year five. Many more corps members continue teaching after leaving their initial school. More than 60 percent of respondents taught somewhere in K-12 schools for at least three years, and nearly one in four were still teaching after five years. Retention rates for non-TFA teachers nationwide are roughly twice as high – half of all new K-12 instructors teach for at least five years (Ingersoll and May, 2012) – but comparable data on traditional teachers who start their careers at low-performing schools is limited (see Donaldson and Johnson, 2011). Administrative records from North Carolina allow me to present a more complete picture of teacher turnover in the hard-to-staff schools that TFA serves.

5. Dobbie (2011) independently verifies that these scores predict variation in achievement impacts among corps members.
6. Regions are collections of districts in the same vicinity, roughly comparable in size to three-digit ZIP codes.
7. Accepted corps members who were not offered teaching positions typically failed state or district certification exams.
2.2 Experimental Evidence

Three randomized controlled trials (RCTs) compare corps members to other teachers within the same schools (Glazerman et al., 2006; Chiang et al., 2014; Clark et al., 2015). The samples cover a wide range of districts, grades, and subjects, but all three trials were similar in design. They recruited schools employing TFA teachers and divided each school into grades. Within select grades, schools agreed to randomly assign students to teachers and test each student at the end of one year. In total, the trials included roughly 160 TFA teachers and 190 controls from 103 schools. Both trials that studied grade school math teachers found corps members outperformed control instructors (Glazerman et al.; Chiang et al.). Glazerman et al. find no differential impacts among grade school reading teachers, and Clark et al. estimate positive TFA reading effects in early elementary grades.

Though random assignment delivers unbiased estimates of average TFA impacts within the RCT samples, the RCT data cannot be used to identify the long-run effects of hiring corps members. Cross-sectional comparisons only capture long-run effects in a steady state, and the experimental data do not represent the steady-state teaching staff at any school. Indeed, most schools selected just two or three teachers to participate. Among the chosen corps members, second-years outnumber first-years by nearly two to one, a sure sign the samples do not reflect a steady state. When faced with non-random sample selection, researchers typically investigate treatment effects in subgroups. Comparing TFA and control teachers by tenure, for example, may shed light on how teacher quality evolves over time, but the RCT samples are too thin to estimate tenure-specific treatment effects.8 Since both departure rates and returns to experience are highest in teachers' early years, capturing year-to-year dynamics is crucial for understanding the impact of teacher hiring policies at hard-to-staff schools. Even small gaps in retention rates for inexperienced hires can generate meaningful differences in average teacher quality, as Staiger and Rockoff (2010) discuss. I therefore turn my attention to identifying the long-run effects of TFA hiring using observational methods.

8. Each trial reports subgroup effects by experience – not tenure – in coarse bins of three or five years.
3 Empirical Framework

The ideal experiment for measuring TFA's long-run effects would recruit schools with vacant teaching positions and randomly select some schools to receive corps members. The remaining schools would hire from their usual non-TFA applicant pool. Over time, teachers would leave, and TFA-treated schools would receive more corps members, while control schools would hire non-TFA replacements. Comparing achievement across schools over time would reveal the long-run effects of continually employing TFA teachers. In this section, I model how TFA's long-run effects depend on the relationship between teacher retention and the returns to tenure. In the absence of the ideal experiment, I can combine data on retention rates with tenure-specific estimates of TFA treatment effects to measure the long-run effects of interest.
3.1 A Motivating Model

Consider a school with just one teaching position, which is vacant in the baseline year zero. Let Z ∈ {0, 1} indicate whether the school is selected to receive corps members. The tenure of future teachers may depend on the school's TFA treatment status. Let T_y(z) denote the potential tenure of the year y teacher as a function of the initial hiring choice. A type z teacher with t years of tenure generates achievement A(z, t). The causal effect of hiring corps members on year y achievement can then be written

    δ_y ≡ A(1, T_y(1)) − A(0, T_y(0)).    (1)
The goal of this paper is to estimate E[δ_y] over time horizons y. Since all newly-hired teachers have zero tenure, by definition, the average effect of TFA hiring in the hiring year is simply E[δ_0] = E[A(1, 0) − A(0, 0)]. Achievement in the next year depends on how teacher tenure evolves in response to the hiring decision:

    A(Z, T_1(Z)) = T_1(Z) A(Z, 1) + [1 − T_1(Z)] A(Z, 0)    (2)
                 = A(Z, 0) + T_1(Z) [A(Z, 1) − A(Z, 0)]    (3)
The first term in equation (2) captures year one achievement if the initial hire is retained. The teacher has one year of tenure, and the school obtains achievement A(Z, 1). If the initial hire leaves, the school hires a new teacher with zero tenure and obtains achievement A(Z, 0), as the second term shows. Rearranging terms in the next line provides another intuitive formulation. Achievement in year one is baseline achievement for a zero-tenure teacher plus the return to one year of tenure if the teacher is retained.

Define τ_t^z ≡ A(z, t) − A(z, 0) as the return to t years of tenure for a type z teacher.9 Substituting this expression into equation (3) and differencing across teacher types yields

    E[δ_1] = E[{A(1, 0) + T_1(1)τ_1^1} − {A(0, 0) + T_1(0)τ_1^0}]
           = E[δ_0] + E[T_1(1)τ_1^1] − E[T_1(0)τ_1^0].    (4)

In words, the average return to TFA hiring after one year depends on the average difference in initial performance, E[δ_0], and the difference in average returns to tenure weighted by the probability of retention, E[T_1(z)τ_1^z]. Average returns over longer time horizons take the same generic form. I focus here on identifying these components.
3.2 Identifying TFA Hiring Effects

The primary challenge to identifying the terms in equation (4) is correlation between teachers' observed traits and their students' potential achievement. This is the usual selection bias problem when measuring how teacher quality varies with credentials like tenure and TFA affiliation. Teachers with different credentials serve different schools and, within schools, different students, so the raw correlation between teacher traits and student outcomes confounds teacher quality with students' latent ability.

I address selection bias by leveraging quasi-random variation in teaching assignments at high-turnover schools. Each year, North Carolina's TFA schools lose more than a third of their teachers, on average. Another third transfer between grades within schools from year to year. I show that in these chaotic staffing environments, teacher credentials are as good as randomly assigned across cohorts in each year. Specifically, I document that within school-years, cohorts' exposure to teachers with varying tenure and TFA status is uncorrelated with their prior achievement trends. Cross-cohort variation in teaching assignments therefore identifies the initial TFA hiring effect, E[δ_0], and returns to tenure, E[τ_1^z]. If returns to tenure were constant across schools, then the joint terms, E[T_1(z)τ_1^z], could be decomposed into the product of retention rates, E[T_1(z)], and returns to tenure, E[τ_1^z], so that

    E[δ_1] = E[δ_0] + E[T_1(1)] · E[τ_1^1] − E[T_1(0)] · E[τ_1^0].

9. Defined in this way, the return to tenure combines both returns to total experience and returns to school-specific experience. Though labor economists have devoted considerable attention to disentangling these forces (see, e.g., Abraham and Farber, 1987), their sum is the relevant return to the employer.
Given data on retention rates, all of the components of equation (4) can then be identified. In principle, however, returns to tenure may vary across schools, and that variation may well be correlated with retention rates. Teachers that expect to improve may be more likely to stay.10 To address this second identification challenge, I divide schools into quartiles by retention rates so that retention is nearly constant within groups, and the decomposition E[T_1(z)τ_1^z] ≈ E[T_1(z)] · E[τ_1^z] holds approximately. I then estimate returns to tenure separately for each quartile, q, and average effects across quartiles to obtain the joint terms, E[T_1(z)τ_1^z]:

    E[T_1(z)τ_1^z] ≈ (1/4) Σ_{q=1}^{4} E[T_1(z) | Q = q] · E[τ_1^z | Q = q].    (5)

In practice, this step has little empirical consequence. Turnover rates are fairly uniformly high in my sample of North Carolina's TFA schools, and returns to tenure vary little across the higher- and lower-turnover schools among them.
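A minimal numerical sketch of the quartile approximation in equation (5), using fabricated school-level data in which retention and returns to tenure are positively correlated – the case that motivates the grouping:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical school-level data: retention probabilities T and returns to
# tenure tau that covary across schools, the case in which
# E[T * tau] != E[T] * E[tau].
n = 40_000
T = rng.uniform(0.3, 0.9, n)                     # retention probability
tau = 0.05 + 0.10 * T + rng.normal(0, 0.01, n)   # steeper returns where retention is high

joint_true = np.mean(T * tau)   # the object E[T_1(z) tau_1^z]
naive = T.mean() * tau.mean()   # biased by cov(T, tau) when estimated unconditionally

# Equation (5): condition on retention quartiles Q so T is nearly constant
# within groups, then average the within-quartile products.
quartile = np.searchsorted(np.quantile(T, [0.25, 0.5, 0.75]), T)
approx = np.mean([T[quartile == q].mean() * tau[quartile == q].mean()
                  for q in range(4)])

# The quartile decomposition recovers the joint term far more closely
# than the unconditional product.
print(abs(naive - joint_true) > abs(approx - joint_true))  # True
```

Because retention varies much less within quartiles than across them, the within-group covariance term that biases the naive product shrinks by roughly the square of the bin width.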
3.3 Estimating TFA Hiring Effects

I estimate TFA hiring effects using regressions of the following general form:

    Ā_csy = β F̄_csy + γ N̄_csy + α_sy + ϕ_cy + ε_csy.    (6)

The outcome, Ā_csy, is the average test score for cohort c at school s in year y, and the treatments, F̄_csy and N̄_csy, are the shares of students taught by first-year TFA teachers and non-TFA hires, respectively. The omitted treatment is the share of students taught by returning teachers, so β̂ and γ̂ compare new hires to returners, and δ̂_0 = β̂ − γ̂ captures the TFA hiring difference. The school-year effects, α_sy, generate cross-cohort comparisons within school-years, and cohort-year effects, ϕ_cy, boost precision. In principle, these regressions could also include cohort-school effects – the third pairwise interaction between the panel dimensions. In practice, cohort-school effects reduce power without altering the tests of the identifying assumptions.11

If the hiring treatments are as good as randomly assigned in equation (6), then hiring should be uncorrelated with cohorts' pre-treatment characteristics conditional on school-year and cohort-year effects. Section 5 reports several identification tests in this spirit. In particular, I present event study plots which show that hiring shocks in year y affect lead scores, Ā_cs,y+∆, but not lagged scores, Ā_cs,y−∆. These plots provide graphical evidence that, within school-years, cohorts would follow parallel achievement paths in the absence of hiring shocks.

Equation (6) also has an instrumental variables (IV) interpretation. Consider the underlying student-level microdata in each cohort-school-year. Let F_icsy and N_icsy denote binary treatments that identify students, i, taught by newly-hired teachers. The average treatments, F̄_csy and N̄_csy, are the first-stage fitted values from a two-stage least squares (2SLS) system that uses the full set of cohort-school-year indicators to instrument for these student-level treatments:

    A_icsy = β F_icsy + γ N_icsy + μ_sy + ν_cy + η_icsy
    F_icsy = λ^f_csy + μ^f_sy + ν^f_cy + η^f_icsy
    N_icsy = λ^n_csy + μ^n_sy + ν^n_cy + η^n_icsy

As Angrist (1988) shows, 2SLS estimation of this system delivers the same estimates of β̂ and γ̂ as weighted least squares (WLS) estimation of equation (6), weighting each cohort-school-year cell by its size. Viewing equation (6) through this IV lens offers another angle on identification. Cohort-school-year instruments isolate the variation in student-teacher assignments that comes from cell-level hiring shocks. If students respond to cell-level shocks by sorting to teachers within cells – say, to avoid being placed in the new hire's class – those margins of adjustment are embedded in the IV intent-to-treat effects. Leveraging cross-cohort variation therefore bypasses the primary selection bias problem in traditional analyses of teacher quality – the non-random sorting of students to teachers12 – and, arguably, captures the more policy-relevant parameter.

10. Some schools may attract teachers with steeper tenure profiles. Some schools may better develop the teachers they manage to hire. Either scenario would result in correlation between retention rates and returns to tenure across schools.
11. Because achievement data are only available for grades 3-8, most cohort-schools are observed for just three years: grades 3-5 in elementary school and grades 6-8 in middle schools.
12. Rothstein (2010) and Chetty et al. (2014) provide detailed discussions of these concerns.
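The grouping equivalence can be illustrated numerically. The sketch below strips equation (6) down to a single treatment with no fixed effects – variable names and data are hypothetical – and checks that OLS of student outcomes on the cell-mean treatment coincides with cell-size-weighted WLS on cell averages:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated microdata: cells of unequal size with a cell-level hiring share
# and student outcomes. (A stripped-down stand-in for equation (6): one
# treatment, no fixed effects, purely to illustrate the grouping result.)
cells = 300
sizes = rng.integers(10, 60, cells)
F_cell = rng.random(cells)                 # cell share taught by new hires
cell_id = np.repeat(np.arange(cells), sizes)
F_i = F_cell[cell_id]                      # treatment attached to each student
A_i = 1.0 - 0.3 * F_i + rng.normal(0, 1, F_i.size)

def ols_slope(x, y, w=None):
    """Slope from a (weighted) bivariate least squares fit."""
    w = np.ones_like(x) if w is None else w
    xm, ym = np.average(x, weights=w), np.average(y, weights=w)
    return np.sum(w * (x - xm) * (y - ym)) / np.sum(w * (x - xm) ** 2)

# (a) OLS of student outcomes on the cell-mean treatment (the IV fitted value).
beta_micro = ols_slope(F_i, A_i)

# (b) WLS of cell-average outcomes on cell treatments, weighted by cell size.
A_cell = np.bincount(cell_id, weights=A_i) / sizes
beta_wls = ols_slope(F_cell, A_cell, w=sizes.astype(float))

print(np.isclose(beta_micro, beta_wls))  # True: the two estimators coincide
```

The equality is exact (up to floating point) because summing student deviations within a cell collapses to the cell size times the cell-mean deviation, which is precisely the WLS weighting.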
I extend the specification in (6) to estimate the returns to tenure for both TFA and non-TFA teachers:

    Ā_csy = Σ_{t=1}^{3} (β_t F̄^t_csy + γ_t N̄^t_csy) + α_sy + ϕ_cy + ε_csy.    (7)

Here, F̄^t_csy and N̄^t_csy are the shares of teachers in each cohort-school-year cell with t years of tenure. I top code TFA tenure at three years because corps members with four or more years are too rare to precisely estimate their impacts. Consequently, the omitted treatment in this group is non-TFA teachers with four or more years of tenure. I document that exposure to the tenure treatments satisfies the same identification tests used to evaluate the estimate from equation (6); tenure appears to be as good as randomly assigned across cohorts, just like hiring. I take this empirical strategy to two decades of teacher hiring records from North Carolina.
4 Setting and Data

North Carolina was one of four states that hired teachers from TFA's founding corps in 1990. Over the next 25 years, TFA recruited nearly 2,700 teachers to the state. The North Carolina corps grew in lockstep with TFA's expansion nationwide, as Figure A1 shows. In the 2014-2015 school year, 570 active corps members and more than 170 TFA alumni taught in North Carolina public schools.13 Roughly one-third of the North Carolina corps serves in Charlotte-Mecklenburg Schools, which ranks among the 20 largest districts in the country. Half are concentrated in rural districts to the northeast of the Research Triangle, and the remainder serve a cluster of four mid-sized cities to the west: Durham, Greensboro, High Point, and Winston-Salem.
4.1 Data Sources and Variable Definitions

Data for this project come from two sources: TFA's corps member rosters and administrative records from the North Carolina Department of Public Instruction (DPI). The DPI data are archived at the North Carolina Education Research Data Center at Duke University. The database includes records from all North Carolina public schools between the 1994-1995 and 2013-2014 school years. I matched 95 percent of corps members from these years to their state personnel records using Social Security Numbers.14 For concision, I hereafter refer to academic years by the fall in which they began, so that 2013 denotes the 2013-2014 school year.

I identify teachers using school master schedules, which list personnel assignments for every course offered in the school day. I define teachers as new hires in the first year they appear as full instructors in a school's schedule; time spent training as a teaching assistant does not count toward tenure by this definition. Each course record contains a subject code and gives enrollment counts by grade. I can therefore calculate the share of enrollment taught by newly-hired teachers within school-subject-grade-year cells. These are the treatment variables in equation (6). The online Appendix B further details their construction. In addition to the master schedules, DPI personnel data include teacher demographics, salaries, licenses, and postsecondary education.

I measure achievement using student-level test scores from annual state exams in math and reading for grades 3-8. I normalize scores statewide within subject, grade, and year using the first exam administration for each student. The test records also contain standard demographic variables, including race, sex, limited English proficiency (LEP), subsidized lunch, special education, and gifted status.

13. TFA's presence in North Carolina mirrors its share of the national teaching force: corps members make up roughly three percent of first-time teachers annually.
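The standardization step can be sketched in a few lines of pandas. The column names below are illustrative rather than the DPI schema, and the population standard deviation (ddof = 0) is an assumption; the paper does not specify the degrees-of-freedom convention:

```python
import numpy as np
import pandas as pd

# Toy exam records; column names are hypothetical, not the DPI schema.
exams = pd.DataFrame({
    "subject": ["math"] * 4 + ["read"] * 4,
    "grade":   [3, 3, 4, 4] * 2,
    "year":    [2005] * 8,
    "score":   [310.0, 330.0, 340.0, 360.0, 250.0, 270.0, 255.0, 265.0],
})

# Standardize statewide within subject-grade-year cells: mean 0, sd 1 in
# each cell, using the population sd (ddof=0, an assumed convention).
grp = exams.groupby(["subject", "grade", "year"])["score"]
exams["z"] = (exams["score"] - grp.transform("mean")) / grp.transform("std", ddof=0)

# Each subject-grade-year cell of z now has mean ~0 and sd ~1.
```

In the paper's application the grouping keys would also need to distinguish the first exam administration for each student, since retest scores are excluded from the norming sample.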
4.2
Sample Restrictions and Summary Statistics I construct my analysis sample starting with the raw exam files from 19972012.15 Table A1 details the
sample selection criteria. I keep all records with nonmissing scores from the first test administration for each studentsubjectgradeyear. I then drop charter schools, which do not participate in the staterun salary database and therefore have limited personnel data. Less than five percent of North Carolina corps members taught in charter schools during this period. Among the remaining schools, I select the 153 that hired at least one firstyear TFA corps member. Not all of those corps members taught math or reading in grades 38; I include the 24 schools that hired 212 TFA teachers for other grades and subjects to boost precision. I drop less than one percent of the remaining test records from schools with missing master schedule data for the corresponding subject, grade, and year. The resulting analysis sample contains 147 schools and 1,676,107 exams from more than 300,000 distinct students. 14 The
majority of missing records come from charter schools, which do not process salaries through the state. The state salary database provides the only crosswalk between DPI personnel identifiers and SSNs in some years. I therefore exclude charter schools from my analysis, as I describe below. 15 The tested grades were inconsistent prior to 1997, and master schedules were unavailable for 2013.
11
In keeping with TFA’s mission, corps members served schools with mostly lowincome, lowperforming students. Nearly twothirds of students in the analysis sample qualified for subsidized lunches, as shown in column 3 of Table A1. Average test scores were onethird of a standard deviation below the state mean, putting the average TFA school in the bottom quartile of schools statewide.16 The students were also disproportionately nonwhite. More than 60 percent were racial minorities in a state where nonwhites account for less than 40 percent of all public school enrollment. The sampled schools hired 635 corps members to teach math and reading in grades 38. Column 3 of Table 1 describes these teachers. Like most TFA recruits, North Carolina corps members were recent graduates of selective colleges. They averaged less than half a year removed from college graduation, and nearly 80 percent attended “highly competitive” schools, as measured by the Barron’s Profile of American Colleges.17 Corps members were also inexperienced. Nearly 95 percent were firsttime teachers, and less than 10 percent held formal teaching credentials.18 The nonTFA hires – more than 9,000 total – were older and more experienced, as column 4 reveals. They were 10 years out of college, on average, and less than oneinthree were firsttime teachers. Existing TFA evaluations have neglected this fact. Researchers typically compare TFA to other inexperienced teachers, but the counterfactual hire is rarely a rookie. At North Carolina TFA schools, more than one in four nonTFA hires had already taught for 10 or more years. In stark contrast to corps members, however, less than one in four of the counterfactual hires graduated from a “highly competitive” college. Turnover rates at these hardtostaff schools were high, even among nonTFA teachers. 
Figure 1 shows that barely 60 percent of non-TFA hires returned for a second year, and only 20 percent stayed through year five – much lower than national figures on persistence in teaching would suggest. Initial retention was higher among corps members because of their two-year commitment: 82 percent returned for a second year, but just one in five re-upped for a third. Nearly all corps members left their placement schools within five years. These findings underscore the importance of measuring annual returns to tenure when assessing teacher hiring options for hard-to-staff schools: most new hires leave within two years, with or without TFA.
16 The distribution of average school performance is more compressed than the distribution of student achievement.
Less than 10 percent of schools have average scores below the first student quartile.
17 "Highly competitive" includes 191 colleges in the top three Barron's ranks. Five North Carolina colleges are among them: Davidson, Duke, Elon, Wake Forest, and the University of North Carolina at Chapel Hill, the state's flagship public institution. The other four are private schools.
18 DPI grants TFA corps members provisional licenses for their two-year commitment. Many go on to earn permanent licenses through the state's lateral entry licensing program, which allows teachers to complete coursework and testing requirements while they teach.
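Because expected years of service equal the sum of the survival curve, the retention patterns above imply short expected stays for both hiring pools. A back-of-the-envelope sketch, where the year-by-year survival shares are illustrative values consistent with the patterns just described, not estimates from the paper:

```python
# Share of each hiring cohort still at the school in years 1-5 (illustrative,
# not the paper's estimates): both curves start at 1 and decline with exits.
tfa     = [1.00, 0.82, 0.20, 0.08, 0.03]   # two-year commitment, then exit
non_tfa = [1.00, 0.60, 0.45, 0.30, 0.20]   # steadier year-by-year attrition

# Expected years of service over a five-year horizon = sum of survival shares.
print(round(sum(tfa), 2), round(sum(non_tfa), 2))  # 2.13 2.55
```

Under these assumed curves, both pools average only a bit over two years at the school, which is why the comparison hinges on tenure-specific effects rather than on retention alone.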
5  Short-Run Effects of Teacher Hiring on Student Achievement
5.1  Identification

The summary statistics in Section 4 show that high-turnover schools have low-performing students. Within this sample of hard-to-staff schools, newly-hired teachers served especially low-scoring children. This pattern can be seen in panels A and B of Figure 2, which depict the negative correlation between students' test scores in one year and their exposure to teacher hiring shocks in the next. Each graph plots average lagged reading scores in cohort-school-year cells, Ā_cs,y−1, against the cell share of students taught by newly-hired reading teachers, F̄_csy and N̄_csy.19 Each dot depicts the average for hiring-share bins of width .05, with best-fit lines estimated on the underlying cell-level data.
The steep negative slope in panel A indicates that students in cells with newly-hired corps members performed much worse in the prior school year than students in cells staffed solely by returning teachers. The average lagged score in cells without new hires was −.24σ, while cells taught solely by TFA hires averaged nearly three times lower (−.71σ). Non-TFA hiring was also negatively correlated with lagged achievement, though less steeply so: the gap in lagged achievement between cells with no new hires and all new hires was .10σ, as the slope in panel B indicates. These graphs show that naive comparisons of test scores among newly-hired and returning teachers likely overstate returning teachers' performance advantage. Students exposed to new hires would have had lower test scores even in the absence of hiring shocks.
Most of the gap in lagged achievement between cells with and without new hires can be attributed to persistent differences in achievement across schools. Appendix Figure A2 presents these within-school results.
The vertical axis again measures average lagged reading scores in cohort-school-year cells, and the horizontal axis plots binned residuals from regressions of the hiring shares on school fixed effects.20 The resulting best-fit lines are the coefficients from a regression of average lagged scores on hiring shares that controls for school fixed effects:

Ā_cs,y−1 = β F̄_csy + γ N̄_csy + μ_s + ν_csy.

19 Appendix Figure B1 presents corresponding results that use lagged math scores to measure baseline achievement.
20 The regression that generates the residual TFA hiring share includes the non-TFA hiring share, and vice versa:

F̄_csy = λ^f N̄_csy + μ^f_s + ν^f_csy
N̄_csy = λ^n F̄_csy + μ^n_s + ν^n_csy.     (8)
Adding school fixed effects reduces the TFA-hiring gradient substantially, from .60σ to .22σ, but the negative slope remains. Even within these hard-to-staff schools, corps members served cohorts and years in which incoming students were especially low achieving.
Within school-years, however, hiring shocks were uncorrelated with cohorts' prior achievement. Panels C and D of Figure 2 plot average lagged reading scores against binned hiring-share residuals from models that control for both school-year and cohort-year effects:

Ā_cs,y−1 = β F̄_csy + γ N̄_csy + α_sy + φ_cy + ε_csy.     (9)
Both graphs show tight linear fits through the binned residuals with precisely estimated zero slopes: −.006σ for TFA hires and −.005σ for non-TFA hires. These graphs establish that, with respect to prior achievement, hiring shocks were as good as randomly assigned across cohorts within school-years.
Cohorts were also balanced on a wide range of other pretreatment characteristics within school-years. Table 2 presents these balance tests, stacking data from both math and reading exams in the full analysis sample.21 Columns 2 and 3 report the unconditional correlations between the hiring shares and cohort demographics. Cells that hired corps members had far more nonwhite and low-income students, as column 2 reports. They also served more limited English proficient students and fewer gifted children – all strong predictors of lower achievement. Non-TFA hires likewise served cells with lower predicted performance, though the gaps were not as stark. Columns 4 and 5 show that school fixed effects reduce but do not eliminate the residual correlation between hiring shocks and student demographics. Within school-years, however, covariates and hiring treatments were uncorrelated, as columns 6 and 7 confirm. The coefficients in these columns are precisely estimated zeros, and a Wald test of joint significance across all 10 traits fails to reject covariate balance with p = .73.
Evidence that hiring treatments are conditionally independent of cohort observables bolsters the argument that hiring shocks are also conditionally independent of unobserved potential achievement. Cross-cohort comparisons of test scores following hiring shocks should therefore capture the causal effect of teacher hiring on student achievement.
21 The stacked model interacts all controls with a subject indicator to ensure that the identifying variation still comes from comparisons across cohorts, c, rather than across subjects, j, within cohorts:

X̄_cjsy = β F̄_cjsy + γ N̄_cjsy + α_jsy + φ_cjy + ε_cjsy.     (10)
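Mechanically, the balance check behind panels C and D is a regression of lagged scores on hiring shares with school-year fixed effects. A toy sketch on synthetic data (all names and magnitudes are illustrative assumptions, not the paper's data): when hiring shares vary randomly across cohorts within a school-year, and lagged achievement varies only across school-years, the dummies absorb all of the sorting and the hiring coefficient is exactly zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_schools, n_cohorts, n_years = 40, 6, 8

rows = []
for s in range(n_schools):
    for y in range(n_years):
        sy_shock = rng.normal()        # school-year achievement shock
        for c in range(n_cohorts):
            share = rng.uniform()      # hiring share, random within school-year
            rows.append((s, y, share, sy_shock))

s_, y_, share, lag_score = (np.array(v, dtype=float) for v in zip(*rows))

# dummy-variable OLS: lagged score on hiring share plus school-year effects
sy = s_ * n_years + y_
dummies = (sy[:, None] == np.unique(sy)[None, :]).astype(float)
X = np.column_stack([share, dummies])
beta = np.linalg.lstsq(X, lag_score, rcond=None)[0]
print(abs(beta[0]) < 1e-6)  # True: hiring uncorrelated with lagged scores
```

The coefficient is zero (to floating-point precision) because the lagged score lies entirely in the span of the school-year dummies; any cross-school sorting of new hires toward weak schools is absorbed by the fixed effects.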
5.2  Effects in the Hiring Year

When schools hire non-TFA teachers, test scores drop in exposed cohorts. Panel A of Figure 3 presents this result in event-study form. Each point plots the estimated coefficient from a regression of cohorts' average scores in year y ± Δ on the non-TFA hiring share the cohort was exposed to in year y:

Ā_cjs,y±Δ = β F̄_cjsy + γ N̄_cjsy + α_jsy + φ_cjy + ε_cjsy.     (11)
These results pool effects on math and reading, so all specifications control for subject-school-year and subject-cohort-year effects, as in equation (10). The three points to the left of the dotted line show effects on lagged scores; they confirm that exposure to teacher hiring shocks was uncorrelated with students' prior achievement trends. The first point to the right of the dotted line shows the contemporaneous effect of non-TFA hiring: a one-unit increase in a cohort's year-y non-TFA hiring share was associated with a .082σ drop in average year-y test scores relative to cohorts with all returning teachers. This hiring penalty appears to have some lasting effect on cohorts' achievement. Test scores in exposed cohorts are significantly lower both one and two years after the initial hiring shock.
Cohorts exposed to new corps members do not pay the hiring penalty. This result can be seen in panel B of Figure 3, which plots the coefficients on the TFA hiring share from equation (11). As in the previous graph, the first three points confirm that students exposed to corps members were on the same achievement trend as other cohorts prior to exposure. In contrast to panel A, however, exposure to newly-hired corps members has no significant effect on test scores, either positive or negative. Since the omitted treatment in equation (11) is exposure to returning teachers, these results imply that newly-hired corps members perform as well, on average, as all other teachers with one or more years of tenure. The effect of TFA teachers relative to other new hires appears in panel C, which shows that TFA teachers outperform other newly-hired teachers by roughly .05σ in the first post-hiring year.
Table 3 shows that TFA's hiring advantage comes entirely from its differential effect on math achievement. Newly-hired corps members performed as well as the average returning math teacher, as the −.005σ estimate in panel A of column 3 indicates. In contrast, a one-unit increase in cohorts' non-TFA hiring share was associated with average math scores that were .107σ lower, leaving an average TFA hiring difference of .102σ. TFA hires had no measurable advantage in reading, however, as panel B reveals. Both TFA and
non-TFA hiring were associated with small reductions in cohorts' reading scores relative to cells with all returning teachers, but the difference between them was a small and insignificant .009σ.
Table 3 also presents a series of robustness checks for the estimated hiring effects. Traditional value-added estimates of teacher quality rely on student covariates to mitigate selection bias. Since student traits and hiring shares are uncorrelated within school-years, however, controlling for student covariates should have little effect on the cross-cohort estimates. Columns 4-6 confirm this prediction. Column 4 adds the lagged-score controls common to value-added models of teacher quality (see, e.g., Chetty et al., 2014): each specification controls for grade-interacted cubics in lagged math and reading scores. These measures of past performance are strong predictors of future scores, boosting the model R² from roughly .10 to .60 in both panels A and B, and yet the treatment effect estimates move little. The TFA hiring advantage increases slightly, from .102σ to .110σ in math and from .009σ to .013σ in reading. The results are similarly robust to the inclusion of demographic controls: sex, race, parental education, subsidized lunch, limited English proficiency, special education, and gifted status. Substituting student fixed effects for lagged-score and demographic controls in column 6 produces similar results. On balance, the TFA hiring advantage is a robust .1σ in math, with no measurable difference in reading. In results not shown here, I find that impacts are larger when comparing TFA with other rookie hires (.15σ) and only slightly smaller relative to veterans (.08σ). TFA reading teachers perform on par with the average veteran hire and .03σ better than other rookies.
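The logic behind these robustness checks is that controls which are (conditionally) uncorrelated with the hiring shares improve precision without moving the point estimate. A toy sketch with synthetic data (variable names and magnitudes are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
hire = rng.normal(size=n)                  # hiring share, as-good-as-random
lag = rng.normal(size=n)                   # lagged score, independent of hiring
score = 0.10 * hire + 0.70 * lag + rng.normal(scale=0.5, size=n)

def coef_on_first(y, *cols):
    """OLS coefficient on the first listed regressor (with an intercept)."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

short = coef_on_first(score, hire)         # no controls
long_ = coef_on_first(score, hire, lag)    # adds the strong predictor
print(abs(short - long_) < 0.01)           # True: estimate barely moves
```

Because the control is orthogonal to the treatment, the point estimate is unchanged up to sampling noise, even though the control explains most of the outcome variance, mirroring the pattern across columns 3-6 of Table 3.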
6  The Long-Run Effects of TFA Hiring
6.1  The Tenure Profile of Teacher Effects

Figure 4 reveals that corps members maintain their math advantage at every tenure level. Panel A plots the tenure profile of math effects for both TFA and non-TFA hires, estimated via equation (7). Non-TFA hires, shown in black, reduce math scores by .11σ relative to non-TFA teachers with four or more years of tenure, the omitted reference group. Experienced corps members, meanwhile, perform substantially better than veteran non-TFA teachers: TFA math instructors outperform comparison hires by roughly .1σ at each step in the tenure profile. Panel B shows no differential impacts on reading achievement at any tenure level.
Table 4 validates the observational results against the experimental estimates. Column 1 reports the average TFA effect in the experimental sample, pooling data from Glazerman et al. (2006) and Clark et al. (2015).22 Column 2 restricts the experimental sample to the observational grades (3-8). As columns 3-5 show, weighting the tenure-specific observational effects by the experimental tenure distribution reproduces the average TFA effect in the observational sample.
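The validation exercise is just a reweighting: average the tenure-specific effects using the experimental sample's tenure distribution. A minimal sketch with illustrative numbers (the effects and weights below are assumptions, not values from Table 4):

```python
# Tenure-specific TFA effects (in sigma) and the share of experimental-sample
# corps members observed at each tenure level; both vectors are illustrative.
effects = [0.00, 0.10, 0.10]     # tenure of 1, 2, and 3+ years
weights = [0.55, 0.35, 0.10]     # experimental tenure distribution (sums to 1)

avg_effect = sum(w * e for w, e in zip(weights, effects))
print(round(avg_effect, 3))  # 0.045
```

If the reweighted observational profile reproduces the experimental average, the tenure-specific estimates are consistent with the randomized benchmark.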
6.2  Hiring Simulations

Figure 5 uses the validated estimates to simulate expected teacher quality when schools continually fill vacancies with either TFA corps members or non-TFA teachers. Each point averages the tenure-specific teacher effects reported in panels A and B of Figure 4, weighted by the probability that schools employ a teacher of the given tenure level in each year. As in Figure 4, effects are normalized relative to the average impact of non-TFA teachers with three or more years of tenure. When schools replace exiting TFA teachers with new TFA recruits, these gains more than offset turnover, boosting steady-state achievement by .07σ in math and .03σ in reading.
Figure 6 simulates a one-shot deviation from a policy of hiring non-TFA rookie teachers. The black lines plot expected teacher quality when schools continually fill vacancies with non-TFA rookies; the green lines plot expected teacher quality if a school hires one corps member and then returns to the non-TFA rookie policy whenever the corps member leaves. Achievement gains during a corps member's service years far exceed the negative effects on future cohorts taught by replacement rookies.
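The steady-state simulation can be read as a small Markov chain over the tenure of the teacher occupying a classroom slot, where a leaver is replaced by a rookie of the same type. A rough sketch, in which the retention rates and tenure effects are illustrative assumptions loosely consistent with the magnitudes reported above, not the paper's estimates:

```python
import numpy as np

# Tenure states: 1 = rookie, 2 = second year, 3 = three-plus years.
# Retention rates and per-tenure effects (test-score sigma, relative to
# veteran non-TFA teachers) are illustrative assumptions.
def stationary_effect(retain, effect):
    r1, r2, r3 = retain
    # Transition matrix for the tenure of the teacher in a slot:
    # a leaver at any tenure is replaced by a new rookie of the same type.
    P = np.array([[1 - r1, r1,  0.0],
                  [1 - r2, 0.0, r2 ],
                  [1 - r3, 0.0, r3 ]])
    # stationary distribution: left eigenvector of P for eigenvalue 1
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi = pi / pi.sum()
    return float(pi @ np.array(effect))

tfa     = stationary_effect((0.82, 0.20, 0.20), (0.00, 0.10, 0.10))
non_tfa = stationary_effect((0.60, 0.75, 0.85), (-0.11, -0.05, 0.00))
print(round(tfa - non_tfa, 3))  # steady-state TFA advantage under these inputs
```

Under these inputs the TFA policy holds a positive steady-state effect despite churning through rookies, because its rookies are as good as veterans, while the non-TFA policy's advantage from accumulated tenure never closes the gap. The qualitative mechanism, not the numbers, is the point.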
7  Conclusion

Randomized evaluations find that TFA corps members have positive impacts on student achievement in hard-to-staff schools. Average effects in these select cross-sections, however, potentially mask TFA's turnover dynamics. This paper develops and estimates a model of time-varying TFA treatment effects that incorporates teacher tenure as a key mediating variable. I identify tenure-specific treatment effects by exploiting quasi-random variation in teaching assignments across grades within school-years.
I find that first-year corps members perform as well as long-tenure teachers in math instruction. Impacts are larger when comparing TFA with other rookie hires (.15σ) and only slightly smaller relative to veterans (.08σ), who account for two-thirds of all non-TFA hiring in my sample of TFA schools. TFA reading teachers perform on par with the average veteran hire and .03σ better than other rookies.
I extend this empirical strategy to measure TFA's long-run impacts. The results indicate that TFA teachers maintain their .10σ math advantage at every tenure level, with positive, though small and imprecise, effects in reading. When schools replace exiting TFA teachers with new TFA recruits, these gains more than offset turnover, boosting steady-state achievement by .07σ in math and .03σ in reading.
TFA may not provide a consistent source of staff, however. The program's ranks have been thinning as the economic recovery improves employment options for recent college graduates (Rich, 2015). Schools that hire TFA therefore risk having to replace TFA teachers with other new hires, who do perform worse than the counterfactual retained veteran. I show that these potential losses are small in magnitude: short-run gains from one-shot TFA hiring exceed the negative effects on future students.
22 Including data from Chiang et al. (2014) produces nearly identical findings. Those results are currently undergoing disclosure review with the Institute for Education Sciences.
References

Abraham, Katharine G. and Henry S. Farber, "Job Duration, Seniority, and Earnings," American Economic Review, 1987, 77 (3), 278–297.
Angrist, Joshua D., "Grouped Data Estimation and Testing in Simple Labor Supply Models," 1988.
Belculfine, Le, "Pittsburgh school board reverses on Teach for America contract," Pittsburgh Post-Gazette, December 2013.
Boyd, Donald, Pamela Grossman, Hamilton Lankford, Susanna Loeb, and James Wyckoff, "How Changes in Entry Requirements Alter the Teacher Workforce and Affect Student Achievement," Education Finance and Policy, 2006, 1 (2), 176–216.
Brandt, Steve, "Minneapolis trims use of much-debated Teach for America," Star Tribune, May 2014.
Chetty, Raj, John N. Friedman, and Jonah E. Rockoff, "Measuring the Impacts of Teachers I: Evaluating Bias in Teacher Value-Added Estimates," American Economic Review, 2014, 104 (9), 2593–2632.
Chiang, Hanley S., Melissa A. Clark, and Sheena McConnell, "Supplying Disadvantaged Schools with Effective Teachers: Experimental Evidence on Secondary Math Teachers from Teach For America," 2014.
Clark, Melissa A., Eric Isenberg, Albert Y. Liu, Libby Makowsky, and Marykate Zukiewicz, "Impacts of the Teach For America Investing in Innovation Scale-Up," Technical Report, Mathematica Policy Research, 2015.
Clotfelter, Charles T., Elizabeth Glennie, Helen F. Ladd, and Jacob L. Vigdor, "Would higher salaries keep teachers in high-poverty schools? Evidence from a policy intervention in North Carolina," Journal of Public Economics, 2008, 92 (5-6), 1352–1370.
Dee, Thomas S. and James H. Wyckoff, "Incentives, Selection, and Teacher Performance: Evidence from IMPACT," Journal of Policy Analysis and Management, 2015, 34 (2), 267–297.
Dobbie, Will, "Teacher Characteristics and Student Achievement: Evidence from Teacher Surveys," 2011.
Donaldson, Morgaen L. and Susan Moore Johnson, "TFA Teachers: How Long Do They Teach? Why Do They Leave?," Phi Delta Kappan, October 2011.
Duflo, Esther, Rema Hanna, and Stephen P. Ryan, "Incentives Work: Getting Teachers to Come to School," American Economic Review, 2012, 102 (4), 1241–1278.
Fryer, Roland G., "Teacher Incentives and Student Achievement: Evidence from New York City Public Schools," Journal of Labor Economics, 2013, 31 (2), 373–407.
Fryer, Roland G., Steven D. Levitt, John List, and Sally Sadoff, "Enhancing the Efficacy of Teacher Incentives Through Loss Aversion: A Field Experiment," 2012.
Glazerman, Steven, Daniel P. Mayer, and Paul T. Decker, "Alternative Routes to Teaching: The Impacts of Teach for America on Student Achievement and Other Outcomes," Journal of Policy Analysis and Management, 2006, 25 (1), 75–96.
Glewwe, Paul, Nauman Ilias, and Michael Kremer, "Teacher Incentives," American Economic Journal: Applied Economics, 2010, 2 (3), 205–227.
Goodman, Sarena F. and Lesley J. Turner, "The Design of Teacher Incentive Pay and Educational Outcomes: Evidence from the New York City Bonus Program," Journal of Labor Economics, 2013, 31 (2), 409–420.
Hansen, Michael, Ben Backes, Victoria Brady, and Zeyu Xu, "Examining Spillover Effects from Teach for America Corps Members in Miami-Dade County Public Schools," 2014.
Heilig, Julian Vasquez and Su Jin Jez, "Teach for America: A Return to the Evidence," Technical Report, National Education Policy Center, Boulder, CO, 2014.
Henry, Gary T., Kevin C. Bastian, C. Kevin Fortner, David C. Kershaw, Kelly M. Purtell, Charles L. Thompson, and Rebecca A. Zulli, "Teacher preparation policies and their effects on student achievement," Education Finance and Policy, 2014, 9 (3), 1–40.
Holmstrom, Bengt and Paul Milgrom, "Multitask Principal-Agent Analyses: Incentive Contracts, Asset Ownership, and Job Design," Journal of Law, Economics, & Organization, 1991, 7 (2), 24–52.
Ingersoll, Richard M. and Henry May, "The Magnitude, Destinations, and Determinants of Mathematics and Science Teacher Turnover," Educational Evaluation and Policy Analysis, 2012, 34 (4), 435–464.
Jacobson, Linda, "Teacher Pay Incentives Popular But Unproven," Education Week, September 2006.
Kane, Thomas J., Jonah E. Rockoff, and Douglas O. Staiger, "What Does Certification Tell Us About Teacher Effectiveness? Evidence from New York City," Economics of Education Review, 2008, 27 (6), 615–631.
Kopp, Wendy, One Day, All Children...: The Unlikely Triumph of Teach for America and What I Learned Along the Way, New York: Public Affairs, 2001.
Lavy, Victor, "Evaluating the Effect of Teachers' Group Performance Incentives on Pupil Achievement," Journal of Political Economy, 2002, 110 (6), 1286–1317.
Luther, Joel, "Durham schools split with Teach for America," Duke Chronicle, 2014.
Mathews, Jay and Ray V. Spain, "North Carolina superintendent defends Teach For America," Washington Post, March 2013.
Muralidharan, Karthik and Venkatesh Sundararaman, "Teacher Performance Pay: Experimental Evidence from India," Journal of Political Economy, 2011, 119 (1), 39–77.
Papay, John P. and Matthew A. Kraft, "Productivity returns to experience in the teacher labor market: Methodological challenges and new evidence on long-term career improvement," Journal of Public Economics, 2015.
Rich, Motoko, "Fewer Top Graduates Want to Join Teach for America," New York Times, February 2015.
Rivkin, Steven G., Eric A. Hanushek, and John F. Kain, "Teachers, Schools, and Academic Achievement," Econometrica, 2005, 73 (2), 417–458.
Rockoff, Jonah E., "The Impact of Individual Teachers on Student Achievement: Evidence from Panel Data," American Economic Review, 2004, 94 (2), 247–252.
Rothstein, Jesse, "Teacher Quality in Educational Production: Tracking, Decay, and Student Achievement," Quarterly Journal of Economics, 2010, 125 (1), 175–214.
Springer, Matthew G., Dale Ballou, Laura Hamilton, Vi-Nhuan Le, J.R. Lockwood, Daniel F. McCaffrey, Matthew Pepper, and Brian M. Stecher, "Teacher Pay for Performance: Experimental Evidence from the Project on Incentives in Teaching," 2010.
Staiger, Douglas O. and Jonah E. Rockoff, "Searching for Effective Teachers with Imperfect Information," Journal of Economic Perspectives, August 2010, 24 (3), 97–118.
Steele, Jennifer L., Richard J. Murnane, and John B. Willett, "Do Financial Incentives Help Low-Performing Schools Attract and Keep Academically Talented Teachers? Evidence from California," Journal of Policy Analysis and Management, 2010, 29 (3), 451–478.
Vorhees, Beth and Ashton Marra, "Is Teach for America Right for West Virginia?," West Virginia Public Broadcasting, May 2015.
Xu, Zeyu, Jane Hannaway, and Colin Taylor, "Making a Difference? The Effects of Teach for America in High School," 2007.
Figure 1: Teacher Retention at TFA Schools (Observational Sample)

[Figure: survival curves for TFA and non-TFA hires. Vertical axis: share of teachers still employed at hiring school (0 to 1). Horizontal axis: years after hiring (1 to 5).]

Notes: This graph plots survival curves for newly-hired math and reading teachers in grades 3-8 at North Carolina public schools that hired TFA corps members between 1997 and 2012.
Figure 2: Student Exposure to Teacher Hiring (Observational Sample)

[Figure: four binned scatter plots of average lagged reading scores against teacher hiring shares. Panel A: TFA hires, slope = −0.60σ. Panel B: non-TFA hires, slope = −0.10σ. Panel C: residualized TFA hire share, slope = −0.006σ. Panel D: residualized non-TFA hire share, slope = −0.005σ.]

Notes: These figures depict the relationship between students' exposure to teacher hiring and their prior achievement. Each graph plots average lagged reading scores in cohort-school-year cells against the cell share of students taught by newly-hired reading teachers. Each dot depicts the average for bins of width .05 on the horizontal axis, with best-fit lines estimated on the underlying cell-level data. Panels A and B present the unconditional correlations, which show that new hires served cells with lower lagged achievement. Panels C and D show that there is no correlation between teacher hiring and lagged achievement across cells within school-years. These panels plot average lagged scores against binned residuals from regressions of the hiring shares on school-year and cohort-year effects, as in equation (9). The plotted bins are censored at the 1st and 99th percentiles of the underlying cell-level data to restrict the horizontal axis range.
Figure 3: New Hire Effects on Test Scores (Observational Estimates, Pooled Subjects)

[Figure: three event-study panels plotting effects on test scores (vertical axis) against event time, y − 3 through y + 2 (horizontal axis). Panel A: non-TFA hires. Panel B: TFA hires. Panel C: TFA difference.]

Notes: These graphs plot the effects of newly-hired teachers on test scores. Green lines depict effects for TFA hires, black lines plot effects for non-TFA hires, and orange lines show the TFA difference. Each point is the coefficient from a regression of cohorts' average score in year y ± Δ on the share of students taught by newly-hired teachers in year y. These estimates pool effects on math and reading, so all specifications control for school-subject-year and cohort-subject-year effects, as in equation (10). Standard errors are clustered by school, and whiskers indicate 95 percent confidence intervals.
Figure 4: The Tenure Profile of Teacher Effects (Observational Estimates)

[Figure: four panels plotting effects on test scores (vertical axis) against years of tenure (horizontal axis), with separate lines for TFA hires, non-TFA hires, and the TFA difference. Panel A: Math, Short Run. Panel B: Reading, Short Run. Panel C: Math, Long Run. Panel D: Reading, Long Run.]

Notes: These graphs plot teacher effects by tenure, estimated by equation (7). The reference group in panels A and B is non-TFA teachers with four or more years of tenure; in panels C and D, the reference group is non-TFA teachers with 10 or more years of tenure. Panels C and D are estimated on the subsample of data from 2004-2012 so that teachers hired prior to 1994 can be top-coded at 10 years of tenure. TFA tenure is top-coded at three years in all specifications due to limited observations above that value. Standard errors are clustered by school, and dashed lines plot 95 percent confidence intervals. Figure A3 shows that these tenure treatments are uncorrelated with lagged achievement.
Figure 5: Expected Teacher Quality

[Figure: three panels plotting expected teacher effects on test scores (vertical axis) against years since initial hire (horizontal axis), with separate lines for TFA hires, non-TFA hires, and the TFA difference. Panel A: Math. Panel B: Reading. Panel C: TFA Difference.]

Notes: These graphs plot expected teacher quality when schools continually fill vacancies with either TFA corps members or non-TFA teachers. Each point averages the tenure-specific teacher effects reported in panels A and B of Figure 4 by the probability that schools employ a teacher of the given tenure level in each year. As in Figure 4, effects are normalized relative to the average impact of non-TFA teachers with three or more years of tenure.
Figure 6: One-Shot TFA Hiring Simulation

[Figure: three panels plotting expected teacher effects on test scores (vertical axis) against years since initial hire (horizontal axis), with separate lines for TFA hires, non-TFA hires, and the TFA difference. Panel A: Math. Panel B: Reading. Panel C: TFA Difference.]

Notes: These graphs simulate a one-shot deviation from a policy of hiring non-TFA rookie teachers. The black lines plot expected teacher quality when schools continually fill vacancies with non-TFA rookie teachers. The green lines plot expected teacher quality if a school hires one corps member and then returns to the non-TFA rookie policy whenever the corps member leaves. Each point averages tenure-specific teacher effects by the probability that schools employ a teacher of the given tenure level in each year. As in Figure 4, effects are normalized relative to the average impact of non-TFA teachers with three or more years of tenure.
Table 1: Teacher Descriptive Statistics (Observational Sample)

                                          All Schools               TFA Schools
                                        New        Returning    TFA       Non-TFA    Returning
                                        Hires      Teachers     Hires     Hires      Teachers
                                        (1)        (2)          (3)       (4)        (5)
share of full-time teacher-years        .22        .78          .02       .26        .72
female                                  .85        .89          .77       .81        .85
nonwhite                                .21        .18          .19       .41        .40
years since college graduation          10.24      16.65        .49       9.89       15.24
highly competitive college graduate     .31        .30          .79       .26        .23
traditional teaching license            .77        .86          .08       .69        .77
years of experience credit              6.67       13.33        .06       6.25       11.93
zero experience credit                  .30        .00          .94       .33        .00
annual state salary                     36,491     43,587       30,980    35,993     42,411
sample: schools                         2,309      2,366        123       146        147
sample: teachers                        77,413     76,404       635       9,475      8,278

Notes: This table describes the personnel who taught math and reading in grades 3-8 at North Carolina public schools between 1997 and 2012. Column 1 reports means for newly-hired teachers – those working in their first year at a given school – and column 2 describes teachers who returned to a prior employer. Columns 3-5 describe the subsample at schools that hired TFA corps members during this period. Means are weighted by full-time equivalence, and sample sizes count distinct teachers. College rankings come from the Barron's Profile of American Colleges (2009); "highly competitive" includes the top three ranks. Traditional teaching licenses are credentials earned through accredited North Carolina colleges or interstate reciprocity; I use the initial license earned for each teacher. Experience credit is a proxy for teaching experience that includes years in non-classroom positions that accrue credit on the state salary schedule. Salaries are scaled in 2010 dollars.
Table 2: Student Covariates and Exposure to Teacher Hiring (Observational Sample)

                              No Hire    No Controls           School, Cohort,       School-Year and
                              Mean                             and Year              Cohort-Year
                                         TFA        Non-TFA    TFA        Non-TFA    TFA        Non-TFA
                              (1)        (2)        (3)        (4)        (5)        (6)        (7)
lagged math score             −.33       −.496***   −.068**    −.131**    .004       .006       .010
                              [.95]      (.087)     (.034)     (.053)     (.016)     (.033)     (.013)
lagged reading score          −.31       −.536***   −.099***   −.122***   .016       .018       .005
                              [.95]      (.082)     (.033)     (.046)     (.013)     (.028)     (.011)
female                        .49        .002       .000       .004       .001       .005       .001
                                         (.007)     (.002)     (.007)     (.002)     (.008)     (.002)
White                         .33        −.465***   −.078***   −.041***   .011       .003       .000
                                         (.060)     (.025)     (.015)     (.007)     (.005)     (.002)
Black                         .53        .350***    .079***    .038**     .016***    .012*      .000
                                         (.055)     (.024)     (.016)     (.006)     (.007)     (.002)
Hispanic                      .10        .106***    .004       .018       .005       .003       .002
                                         (.034)     (.009)     (.012)     (.004)     (.006)     (.001)
subsidized lunch              .65        .322***    .020       .055**     .010       .004       .005*
                                         (.047)     (.024)     (.024)     (.012)     (.006)     (.003)
limited English proficiency   .06        .064***    .005       .015*      .004       .004       .001
                                         (.021)     (.006)     (.008)     (.003)     (.005)     (.001)
special education             .09        .001       .001       .006       .003       .002       .002
                                         (.009)     (.003)     (.007)     (.002)     (.005)     (.002)
gifted                        .10        −.086***   −.036***   .015       .005*      .004       .004*
                                         (.022)     (.008)     (.010)     (.003)     (.006)     (.002)
F(20, 146)                    –          6.08                  2.20                  .54
p                                        .00                   .07                   .70
sample: schools               144        147                   147                   147
sample: exams                 334,875    1,676,107             1,676,107             1,676,107

Notes: This table presents tests of student covariate balance in the presence and absence of newly-hired teachers. Column 1 reports the sample mean for school-cohort-year cells with no new hires. Standard deviations for continuous variables are in brackets. The remaining columns report regressions of the given covariate on the cell share of students taught by TFA and non-TFA hires. The specification in columns 2 and 3 includes no other controls. Columns 4 and 5 control for school, cohort, and year effects. Columns 6 and 7 control for second-order interactions of these terms: school-year and cohort-year. The sample stacks exams in math and reading, so all controls are interacted with a subject indicator. I impute missing values using cell means. Table A2 presents corresponding estimates for the subsample with non-missing data for all variables. Standard errors are clustered by school and reported in parentheses. Degrees of freedom in the Wald tests of joint significance reflect this clustering.
Table 3
New Hire Effects on Test Scores (Observational Estimates)

Columns: (1) no controls; (2) school, cohort, and year effects; (3) school-year and cohort-year effects; (4) adds lagged-score controls; (5) adds student demographics; (6) student fixed effects.

A. Math
                        (1)       (2)       (3)       (4)       (5)       (6)
TFA hire share        .525***   .054      .005      .000      .001      .006
                     (.079)    (.045)    (.034)    (.034)    (.033)    (.032)
non-TFA hire share    .198***   .119***   .107***   .110***   .110***   .096***
                     (.031)    (.016)    (.010)    (.014)    (.013)    (.012)
TFA difference        .328***   .065      .102***   .110***   .110***   .091***
                     (.072)    (.044)    (.033)    (.035)    (.034)    (.032)
R²                    .004      .062      .096      .625      .642      .874
exams                840,995 in all columns

B. Reading
TFA hire share        .615***   .151***   .043      .038      .036      .038*
                     (.084)    (.047)    (.027)    (.027)    (.024)    (.020)
non-TFA hire share    .138***   .071***   .052***   .051***   .053***   .043***
                     (.035)    (.016)    (.010)    (.011)    (.010)    (.009)
TFA difference        .476***   .080      .009      .013      .017      .005
                     (.085)    (.050)    (.029)    (.026)    (.024)    (.021)
R²                    .003      .062      .086      .593      .615      .861
exams                835,112 in all columns
Notes: This table reports estimates from regressions of student test scores on the new hire share of teachers in each school-cohort-year cell. Specifications are estimated separately by subject: panel A reports effects on math scores, and panel B reports effects on reading. Regression controls vary across columns. Column 1 has no controls; column 2 adds fixed effects for school, cohort, and year; and columns 3–6 control for school-year and cohort-year effects. Columns 4–6 add student-level controls. Lagged score controls are grade-interacted cubics in lagged scores from each subject. Demographic controls are binary indicators for sex, race, parental education, subsidized lunch, limited English proficiency, special education, and gifted status. All controls include dummies for missing values, which are imputed using cell means. Column 6 uses student fixed effects in lieu of lagged score and demographic controls. Standard errors are clustered by school and reported in parentheses.
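The fixed-effects specifications described above can be sketched with the within transformation: demean both variables inside each group, then run OLS on the demeaned data. This is a minimal single-regressor illustration with made-up variable names, not the paper's replication code.

```python
import numpy as np

def within_ols(y, x, groups):
    """Slope from a regression of y on x with group fixed effects,
    estimated by demeaning both variables within groups."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    yd, xd = y.copy(), x.copy()
    for g in np.unique(groups):
        m = groups == g
        yd[m] -= y[m].mean()   # subtract the group mean of y
        xd[m] -= x[m].mean()   # and of x
    return (xd @ yd) / (xd @ xd)
```

By the Frisch-Waugh-Lovell theorem, this is numerically equivalent to including a dummy variable for every group (e.g., every school) in the regression.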
Table 4
Validating Observational Estimates

Columns: (1) experimental sample, all grades; (2) experimental sample, observational grades (3–8); (3) observational cross-cohort estimates, all years; (4) observational cross-cohort estimates, teacher-matched years; (5) observational OLS estimates, teacher-matched years.

                   (1)       (2)       (3)       (4)       (5)
A. Math
TFA difference    .058**    .094**    .082***   .083***   .099***
                 (.028)    (.042)    (.027)    (.030)    (.017)
N                 3,476     1,480    840,995   367,088   262,509

B. Reading
TFA difference    .040      .001      .034      .040**    .008
                 (.027)    (.032)    (.022)    (.020)    (.008)
N                 3,476     1,480    835,112   364,193   246,316
Notes: Column 1 reports the average TFA effect in the experimental sample, pooling data from all studies and grades. Column 2 restricts the experimental sample to the observational grades (3–8). Columns 3–5 estimate the corresponding average TFA effect in the observational sample, weighting the tenure-specific observational effects by the experimental tenure distribution. Column 3 uses the cross-cohort estimates from the full observational sample (1997–2012). Column 4 uses cross-cohort estimates from the student-teacher matched years (2006–2012), and column 5 uses the OLS estimates from the student-teacher matched subsample. See Table 3 for specification details.
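The reweighting described above reduces to a weighted average: tenure-specific observational effects are averaged with weights equal to each tenure level's share in the experimental sample. A minimal sketch with hypothetical numbers, not the paper's estimates:

```python
import numpy as np

# Hypothetical tenure-specific TFA effects (first-year, second-year teachers)
effects_by_tenure = np.array([0.10, 0.06])
# Hypothetical experimental tenure distribution: 70% rookies, 30% second-years
exp_tenure_shares = np.array([0.70, 0.30])

# Average effect comparable to the pooled experimental estimate
avg_effect = np.average(effects_by_tenure, weights=exp_tenure_shares)  # ≈ .088
```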
Figure A1
Teach for America's Expansion
[Line graph, 1990–2015, of new corps members nationwide (left axis, 0 to 6,000) and in North Carolina (right axis, 0 to 300).]
Figure A2
Within-School Exposure to Teacher Hiring (Observational Sample)
[Binned scatter plots of average lagged score against residualized hire shares. Panel A: TFA hires (slope = 0.001σ). Panel B: non-TFA hires (slope = −0.217σ).]
Notes: These figures depict the relationship between students' exposure to teacher hiring and their prior achievement. Each graph plots average lagged reading scores in cohort-school-year cells against the cell share of students taught by newly hired reading teachers. Each dot depicts the average for bins of width .05 on the horizontal axis, with best-fit lines estimated on the underlying cell-level data. Panels A and B of Figure 2 present the unconditional correlation between teacher hiring and lagged achievement. The graphs shown here plot average lagged test scores against binned residuals from regressions of the hiring shares on school effects, as in equation (8).
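A binned scatter of this kind can be sketched as follows: group the horizontal-axis variable into bins of fixed width, average the vertical-axis variable within each bin, and fit the line on the underlying (unbinned) cell-level data. Variable names are illustrative.

```python
import numpy as np

def binscatter(x, y, width=0.05):
    """Bin means of y over fixed-width bins of x, plus the OLS
    slope/intercept fit on the underlying, unbinned data."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    bins = np.floor(x / width) * width + width / 2  # bin midpoints
    centers = np.unique(bins)
    means = np.array([y[bins == c].mean() for c in centers])
    slope, intercept = np.polyfit(x, y, 1)  # best-fit line on all data
    return centers, means, slope, intercept
```

Fitting the line on the unbinned data (rather than on the bin means) keeps the slope identical to the one an OLS regression on the cells would report.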
Figure A3
Testing for Effects on Lagged Achievement by Teacher Tenure (Observational Sample)
[Four panels plot effects on lagged test scores by years of teacher tenure for TFA hires, non-TFA hires, and the TFA difference: A. Math, Short Run; B. Reading, Short Run; C. Math, Long Run; D. Reading, Long Run.]
Notes: These graphs plot effects on lagged achievement by tenure. They serve as a placebo test for the estimates in Figure 4. See Figure 4 for specification details.
Table A1
Sample Restrictions (Observational Sample)

Columns: (1) all schools; (2) TFA schools; (3) cross-cohort analysis sample, all years; (4) cross-cohort analysis sample, teacher-matched years; (5) OLS analysis sample, teacher-matched years.

                                  (1)        (2)        (3)        (4)        (5)
normalized score                  .00        .31        .31        .35        .29
female                            .49        .49        .49        .49        .50
non-Hispanic white                .58        .29        .29        .23        .24
non-Hispanic black                .28        .55        .55        .54        .53
Hispanic                          .08        .11        .11        .17        .16
other race                        .06        .05        .05        .06        .06
parent has bachelor's degree      .29        .22        .22         –          –
subsidized lunch                  .47        .64        .64        .70        .69
limited English proficiency       .04        .06        .06        .09        .08
special education                 .10        .09        .09        .08        .07
gifted                            .14        .09        .09        .08        .10
schools                         2,354        153        147        145        144
exams                      19,918,388  1,679,031  1,676,107    731,281    495,264
years                       1997–2012  1997–2012  1997–2012  2006–2012  2006–2012
Notes: This table describes North Carolina public school students who were tested in math and reading in grades 3–8 between 1997 and 2012. Column 1 contains all tests with non-missing scores from the first administration for each student-subject-year. Column 2 describes tests from schools that hired TFA corps members, excluding charter schools. The cross-cohort analysis sample in column 3 contains the subset from school-subject-cohort-year cells with non-missing personnel data. Columns 4 and 5 further restrict the cross-cohort sample to the years 2006–2012, during which students can be matched to their specific course instructors. Column 5 describes students with unique teacher matches for the tested subject and non-missing lagged scores in both subjects. I impute missing values using cell means. The following data are missing for all tests: parental education (2006–2012), subsidized lunch status (1997 and 2007), and gifted status (1997–1998). I impute these values using school means for all other years.
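The cell-mean imputation described here can be sketched in a few lines: replace each missing value with the mean of the non-missing observations in the same cell. The function and variable names are illustrative, not from the paper's code.

```python
import numpy as np

def impute_cell_means(values, cells):
    """Replace NaNs in `values` with the mean of the non-missing
    observations in the same cell (e.g., a school-cohort-year cell)."""
    values = np.asarray(values, float).copy()
    for c in np.unique(cells):
        m = cells == c
        cell_mean = np.nanmean(values[m])  # mean over non-missing entries
        values[m] = np.where(np.isnan(values[m]), cell_mean, values[m])
    return values
```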
Table A2
Student Covariates and Exposure to Teacher Hiring (Observational Sample)
[Replicates the balance tests of Table 2 on the subsample with non-missing data for all variables: column 1 reports the no-hire mean, and columns 2–7 report differences by TFA and non-TFA hire share under the three specifications of Table 2, for lagged math and reading scores, sex, race, subsidized lunch, limited English proficiency, special education, and gifted status. Joint F(20, 146) statistics are 7.18 (p = .00), 4.77 (p = .00), and .51 (p = .73); samples comprise 144–147 schools and 282,887–1,418,826 exams.]
Notes: This table replicates the covariate balance tests from Table 2 on the subsample of students with non-missing data for all variables. Grade 3 students with missing lagged scores account for one-third of the excluded observations.
Figure B1
Exposure to Teacher Hiring (Observational Sample)
[Binned scatter plots of average lagged math score against teacher hiring exposure. Panels A and B: share of students with TFA and non-TFA hires (slopes = −0.12σ and −0.58σ). Panels C and D: residualized TFA and non-TFA hire shares (slopes = 0.020σ and 0.006σ).]
Notes: These figures depict the relationship between students’ exposure to teacher hiring and their prior math achievement. Figure 2 presents corresponding results for reading achievement, along with estimation notes.