Education Bill: Accountability for Appearances

A key fallacy in the Bush education bill is that testing equals reform. The bill mandates yearly testing in grades three through eight. As Arianna Huffington pointed out in a recent column, "introducing real reform into the public education system is so extraordinarily difficult that the political establishment invariably chooses to settle for the appearance of reform." In place of real reform, this bill offers "high-stakes, standardized, shallow, discriminatory, meaningless and underfunded testing...." The misguided emphasis on testing has led to protests, boycotts, and other civil disobedience by parents, teachers, and students. Moreover, prominent educators have warned that the testing could actually drive down standards rather than raise achievement.

In May, parents in Scarsdale, NY, organized a protest in which two-thirds of the town's eighth graders boycotted a standardized test. In Massachusetts, a social studies teacher refused to give a state-mandated test and was suspended. In Marin County, CA, a member of the school board urged parents to prohibit their children from taking state tests. In Whitefish Bay, WI, the state discontinued using standardized tests as a graduation requirement because so many parents protested.

The education bills passed by the House and the Senate give schools ten to twelve years to bring all students -- regardless of economic status or language spoken at home -- to an undefined level of proficiency, and require that they report the performance of students in various demographic groups. The problems of defining proficiency and how to measure it are left to the states. Since 1994 states have been legally required to establish and use objective criteria to measure reading and math skills. Many states, however, use cheaper and politically popular relative scoring. Although relative scoring is technically illegal, the federal government has tended not to punish states for using it.

Speaking at the National Press Club in July, Roy Romer, superintendent of schools for Los Angeles, CA, and Harold O. Levy, chancellor of the New York City school system, agreed that the structure of the bills would tempt school administrators to lower their definitions of proficiency so that they would not suffer reductions of federal aid. Further, Mr. Romer observed that despite the bills' requirements that children be transferred out of schools that did not meet the state-defined standards, in Los Angeles that is not logistically possible. Mr. Levy reported watching the congressional debate "with some amusement." Although many of the ideas under discussion as means of improving the public school environment are in use in New York City, Levy ascribed those programs' successes to their linkage with other factors, especially teacher quality and training. This is ironic in the current context because funds previously targeted to assist with teacher training and reducing class size have been lumped together and cut. The New York City and Los Angeles school systems are responsible for educating six percent of the nation's school children.

A New York Times investigation into state testing procedures and results bore out Levy and Romer's concerns. A panel in California recommended 70% correct answers as a passing grade, but state schools superintendent Delaine Eastin arbitrarily decided that 55% would be a passing grade in math, and 60% in English. Ninety-eight percent of students taking a standardized test in Ohio this spring passed, while in California fewer than half passed a similar test. In addition to inconsistency in standards from state to state, critics of testing have cited problems with large-scale testing processes. In Minnesota 47,000 students received lower scores than they should have, which prevented some from graduating. Similar problems occurred in Arizona, Michigan, and Washington.

Observers expect the problems to get worse as increasing reliance is placed on testing while adequate funds are not provided. The Bush bills budget $320 million for testing, while the National Association of State Boards of Education has estimated that $7 billion would be required. Bush likes to say that the Department of Education will receive an 11.5% budget increase, which would be the largest for any agency in 2002. The Center on Budget and Policy Priorities reports, however, that when gimmicks and accounting fictions -- including understating the school-age population -- are adjusted for, the actual increase is more like 2.9%, which is smaller than the increases the department received in each of the last four years.

Some observers have suggested that the U.S. education model was flawed from the beginning, designed to produce compliant assembly-line workers. Clearly such a model would not value creativity or critical thinking. Moreover, parents have been regarded as obstacles to, rather than partners in, the education of their children. The "unprecedented new choices" for parents in Bush's proposals amount to merely allowing them to use taxpayer money to hire tutors.

Bush's experience does prove that standardized testing provides political campaign rhetoric. During the presidential campaign Bush proudly invoked the phrase "the Texas miracle," which referred to significant gains by Texas children, particularly minorities, on state standardized tests. The claim was made that during Bush's tenure the gap between white and minority test scores shrank significantly. A test used throughout the country called the NAEP, however, indicated that the gap between white and black fourth graders actually widened. Experts quoted by the Washington Post in April 2000 suggested that many of the gains in Texas resulted from intense drilling to pass the test and from lowering of test difficulty. Walter Haney of Boston College asserted that the so-called system of accountability in Texas in fact forced tens of thousands of high school students to quit each year, which in turn boosted the test scores of the remaining students. Researchers from Harvard University found that white, middle-class students were reading literature and honing problem-solving skills, while poor and minority children were squandering class time practicing for the standardized state tests.

Under the Texas system teachers' and administrators' careers can plateau if their students' Texas Assessment of Academic Skills (TAAS) scores do not improve. Students cannot graduate if they fail the tests. Bush supporters insist that the improvements were real, but an unnamed U.S. Department of Education official confirmed that the NAEP did not indicate the reported narrowing in scores between black and white students. A RAND Corporation researcher suggested that students who drilled intensively for the TAAS tests gained only a superficial understanding of the subject matter, and were not equipped to pass tests other than TAAS. Blatant cheating has also been reported. Three teachers and an administrator in Houston, TX, were forced to resign after they corrected students' TAAS tests. Administrators in Austin were caught tampering with tests as well. Further, researcher Haney and the Intercultural Development Research Association have reported drastic undercounting of high school dropout rates. Dropout rates are significant because they contribute to a school's ranking into one of four categories, from low-performing to exemplary.

The most damning exposé of "the Texas miracle" can be found in a RAND Corporation study released in October 2000. In brief, the questions and conclusions of the study were as follows:

  1. Have the reading and math skills of Texas students improved since the full statewide implementation of the TAAS program in 1994 (e.g., are fourth graders reading better today than fourth graders a few years ago); and, if their skills did improve: (a) how much improvement occurred and (b) was the amount of improvement in reading the same as it was in math? The study showed that only fourth grade math scores exceeded the national average over a four-year period. Gains as indicated by the TAAS were in general much larger than those indicated by the NAEP.
  2. Are the gains in reading and math on the TAAS consistent with what would be expected given NAEP scores in Texas and the rest of the country? The study found that increases were almost identical to nationwide increases over the period measured.
  3. Has Texas narrowed the gap in average reading and math skills between whites and students of color? As measured by the TAAS, the gap between black and white students started out smaller than indicated by the NAEP, and grew smaller. As measured by the NAEP, the gap was initially quite large (36th percentile vs. 67th percentile) and widened over time. Over the period measured an increasing number of students with disabilities did not participate in the TAAS, while nationally that trend was reversed. Also, as mentioned earlier, the number of dropouts and students not being promoted increased in Texas. Both of these factors would have contributed to the skewed results that were found.
  4. Do other tests given in Texas at a sample of 20 schools produce results that are consistent with those obtained with the TAAS? Schools in relatively more affluent areas performed better on non-TAAS tests than did schools in less affluent areas.

The RAND study concluded in part:

To sum up, states that use high-stakes exams may encounter a plethora of problems that would undermine the interpretation of the scores obtained. Some of these problems include the following: (1) students being coached to develop skills that are unique to the specific types of questions that are asked on the statewide exam (i.e., as distinct from what is generally meant by reading, math, or the other subjects tested); (2) narrowing the curriculum to improve scores on the state exam at the expense of other important skills and subjects that are not tested; (3) an increase in the prevalence of activities that substantially reduce the validity of the scores; and (4) results being biased by various features of the testing program (e.g., if a significant percentage of students top out or bottom out on the test, it may produce results that suggest that the gap among racial and ethnic groups is closing when no such change is occurring).

There are a number of strategies that states might try to lessen the risk of inflated and misleading gains in scores. They can reduce the pressure to "raise scores at any cost" by using one set of measures to make decisions about individual students and another set (employing sampling and third-party administration) to make decisions about teachers, schools, and educational programs. States can replace their traditional paper-and-pencil multiple-choice exams with computer based "adaptive" tests that are tailored to each student's abilities, that draw on "banks" of thousands of questions, and that are delivered over the Internet into the school building (for details, see Bennett, 1998; Hamilton, Klein, & Lorie, 2000). States can also periodically conduct audit testing to validate score gains. They can study the positive and negative effects of the testing program on curriculum and instruction, and whether these effects are similar for different groups of students. For instance, what knowledge, skills, and abilities are and are not being developed when the focus is concentrated on preparing students to do well on a particular statewide, high-stakes exam? However, given the findings reported above for Texas, it is evident that something needs to be done to ensure that high-stakes testing programs, such as the TAAS, produce results that merit public confidence and thereby provide a sound basis for educational policy decisions.


