·         Maggie Law & Vivien Petras                                              SIMS 247: Info Visualization & Presentation

·         Assignment #2                                                                    Instructor: Marti Hearst

·         Due: The Ides of March, 2002                                              Spring 2002 * F 2:00 – 5:00

 

 

 

Many of our images use color to denote a characteristic. For an online version of this paper, please see http://www.sims.berkeley.edu/~mlaw/is247/a2/is247_assignment2.html.

 

 

 

The Dataset

 

The focus of our analysis is a multilevel dataset (downloaded from the website of the Centre for Multilevel Modelling, http://multilevel.ioe.ac.uk/intro/datasets.html) that examines school effectiveness in Inner London schools. The original data, collected by the Inner London Education Authority (ILEA), represents 140 secondary schools over the 3-year period of 1985-1987, and consists of 15,362 student records with 10 variables. ILEA reports that the data are the result of a random 50% sample, and have been used to evaluate and compare the effectiveness of schools.

 

For the sake of this analysis, we chose to focus on a smaller (presumably more manageable) subset of this data. Arbitrarily, we used student records from 1985 only, of which there are 4,915 representing 96 of the original 140 schools.

 

Our data consists of 5 nominal, 1 ordinal and 3 interval data types. The format of the set is as follows:

 

·         School

·         Exam Score -- Numeric score

·         Percent of students eligible for free school meals -- % FSM

·         Percent of students in school in VR band 1 -- % VR1 band

·         Gender -- Male (0); Female (1)

·         VR band of student[1] -- VR1 (2); VR2 (3); VR3 (1)

·         Ethnic group of student -- ESWI[2] (1); African (2); Arab (3); Bangladeshi (4); Caribbean (5); Greek (6); Indian (7); Pakistani (8); S.E. Asian (9); Turkish (10); Other (11)

·         School gender -- Mixed (1); Male (2); Female (3)

·         School denomination -- State-Maintained (1); Church of England (2); Roman Catholic (3)
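Because several variables are stored as numeric codes, and the VR band codes in particular are not in rank order, a small lookup table can help when reading individual records. The following is a minimal Python sketch: the code values are copied from the list above, while the dictionary and function names are illustrative assumptions and are not part of the ILEA data files.

```python
# Code-to-label lookups copied from the variable list above.
# The dictionary and function names are hypothetical helpers, not part of the dataset.
GENDER = {0: "Male", 1: "Female"}
VR_BAND = {2: "VR1", 3: "VR2", 1: "VR3"}  # codes are not in rank order
SCHOOL_GENDER = {1: "Mixed", 2: "Male", 3: "Female"}
DENOMINATION = {1: "State-Maintained", 2: "Church of England", 3: "Roman Catholic"}


def vr_rank(code):
    """Return 1 for VR1 (assumed strongest) through 3 for VR3 (assumed weakest)."""
    return int(VR_BAND[code][2])


def describe(gender, vr_band, school_gender, denomination):
    """Translate one student's coded fields into readable labels."""
    return (GENDER[gender], VR_BAND[vr_band],
            SCHOOL_GENDER[school_gender], DENOMINATION[denomination])


print(describe(1, 2, 3, 1))
```

Sorting the raw codes with `sorted([1, 2, 3], key=vr_rank)` puts them in band order (VR1, VR2, VR3) rather than numeric order.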

 

It’s worth noting that, in the absence of more specific details about exam scores or the VR (verbal reasoning) tests, our analysis makes several qualitative assumptions. Specifically, we treat higher exam scores as indicators of superior academic performance; likewise, we assume that students in VR band 1 are academically superior to those in VR band 2, and those in VR band 2 superior to those in VR band 3.

 

Without knowing more about the contents of the two types of tests used in these datasets, we were at least able to observe that there is a correlation between them -- the highest exam scores, across all students, were most heavily associated with high verbal reasoning skills (see Figs. 1-3 below). Additionally, Figure 4 shows that schools with low percentages of students in the VR1 band (best) category seem to have lower exam scores. Nevertheless, we concede up-front that a more formal analysis of these data would require a closer evaluation of the tests as true reflectors of academic abilities.

 

 

 

Fig. 1 – VR1 band students (variable VR_BAND value 2) had best exam scores overall.

 

 

 

 

Fig. 2 – VR2 band students (variable VR_BAND value 3) had generally low to mediocre exam scores.

 

 

Fig. 3 – VR3 band students (variable VR_BAND value 1) had generally low exam scores -- none higher than 38.

 

 

 

Fig. 4 – Students in schools with low percentages of students in the VR1 band seem to have lower exam results in general.

 

 

Hypotheses and Findings

 

We began our analysis with a list of six hypotheses. A treatment of each hypothesis is outlined below, along with related questions (for some) that arose from our discussion. We used two data visualization tools for our analysis: Inxight’s Eureka and MDG Corporation’s ParallAX. (Spotfire did not seem an appropriate visualization tool for this dataset, given the heavy emphasis on nominal data types.) The visualizations created from one or both of these tools illustrate our final conclusions for each hypothesis.

 

 

Hypothesis #1: Girls did better in all-girls schools.

Hypothesis #2: In general (across all types of schools), girls do better than boys.

 

As these are related hypotheses, we shall discuss them in parallel. Our theory behind hypothesis #1 is that an all-girls learning environment is more accommodating than a mixed school. (An obvious ancillary question is: did boys do better in all-boys schools?) Hypothesis #2 simply seems self-evident to us. In fact, our dataset supports it. The average exam score for all girls was 21.88, compared with 18.37 for boys.

 

Using Eureka, we were able to determine that the average exam score for girls attending all-girls schools was higher (22.7) than for girls attending mixed schools (20.82). As it turns out, we determined that boys did better in all-boys schools than in mixed schools, as reflected in average exam scores (19.18 vs. 17.43). The overall average exam score in mixed schools for both girls and boys was 19.17[3].
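The comparison above amounts to grouping records by student gender and school gender and averaging the exam scores within each group. A minimal Python sketch of that calculation follows, assuming the records are available as (gender, school gender, exam score) tuples. The function name and the sample rows are made up for illustration and are not drawn from the dataset.

```python
from collections import defaultdict


def mean_score_by_group(records):
    """Return {(gender, school_gender): mean exam score} for the given records."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of scores, record count]
    for gender, school_gender, score in records:
        entry = totals[(gender, school_gender)]
        entry[0] += score
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up rows: gender 1 = female, 0 = male; school_gender 1 = mixed, 3 = all-girls
sample = [(1, 3, 24), (1, 3, 22), (1, 1, 20), (1, 1, 18), (0, 1, 16)]
means = mean_score_by_group(sample)
print(means)
```

Comparing `means[(1, 3)]` with `means[(1, 1)]` gives the same kind of all-girls versus mixed-school comparison that we read off Eureka's averages display.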

 

We found it difficult to “eyeball” these – or any other averages-based – conclusions in either Eureka or ParallAX. Subtle degrees of variation within ranges proved inconclusive when simply observing the visualizations in either tool. While we were able to determine if the top score within one range was greater than the top score within another, we had to consult Eureka’s averages calculations (displayed in the bottom corners of the display) in order to conclude definitively whether scores in one range overall were better than those in another range.

 

 

 

Fig. 5 – A filter applied to isolate records for students whose exam scores are higher than 58 indicates that girls outperform boys.

 

 

From the visualization illustrated in Fig. 5, we observed the following:

 

·         There are more girls (1) than boys (0) in this group

·         All-girls schools (3) dominate this sample

·         Most boys in this sample are in all-boys schools (2)

·         No girl in this group attends a mixed-gender school (1)

 

We also noted that:

 

·         This sample is overwhelmingly of native-ethnicity (1)

·         Most representative schools are state-maintained (1)

·         With only one exception (at 70%), the schools these kids attend tend to have lower percentages of school meal eligibility (%FSM)

·         With only one exception (the second student in the list), all of these highest-scoring students are in VR band 1 (VR band variable value = 2).

 

There was a curious outlier in our data, which stood out more in our ParallAX visualization than in Eureka. Unless this data point is erroneous, there is a female student who attends an all-boys school (Fig. 6).

 

 

 

Fig. 6 – One female student attends an all-boys school (the poor dear).

 

 

Hypothesis #3: Schools with higher percentages of students eligible for school meals show lower exam scores.

 

Our theory is that eligibility for school meals correlates directly with low family income. From this, we derive a hypothesis that poorer kids don’t do as well as richer, more privileged ones.

 

We split the schools into three groups, representing low (11-32%), medium (33-53%) and high (54-70%) percentages of kids eligible for school meals (%FSM). The average exam score in each range supports our hypothesis: 22.1 (low %FSM), 18.54 (medium %FSM) and 17.22 (high %FSM).
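The three-way split described above can be expressed as a simple binning step followed by a per-bin average. The sketch below is a hedged illustration in Python: the range boundaries come from the text, but the function names, the (%FSM, exam score) record layout and the sample rows are assumptions made for the example.

```python
from collections import defaultdict


def fsm_group(pct_fsm):
    """Map a school's %FSM to the low / medium / high ranges used in the text."""
    if pct_fsm <= 32:
        return "low"      # 11-32% in this dataset
    if pct_fsm <= 53:
        return "medium"   # 33-53%
    return "high"         # 54-70%


def mean_score_by_fsm_group(records):
    """Average exam score per %FSM group for (pct_fsm, exam_score) records."""
    scores = defaultdict(list)
    for pct_fsm, exam_score in records:
        scores[fsm_group(pct_fsm)].append(exam_score)
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}


# Made-up rows, not taken from the ILEA data
sample = [(15, 30), (28, 20), (40, 18), (50, 20), (60, 15), (70, 17)]
fsm_means = mean_score_by_fsm_group(sample)
print(fsm_means)
```

Keeping the binning in its own function makes it easy to try different cut points and see whether the downward trend in averages depends on where the boundaries fall.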

 

In ParallAX, we are able to visualize clear correlations of high exam scores with low %FSM (Fig. 7) and low exam scores with high %FSM (Fig. 8) in support of our claim.

 

It has been pointed out to us that Figs. 7 and 8 also indicate a meaningful pattern with respect to ethnic groups, implying that students with higher exam scores attending schools with lower percentages of school meal eligibility are less representative of the full ethnic array than those in lower exam score / higher %FSM categories. While we recognize that such a pattern is apparent in our ParallAX visualizations, we disagree that meaningful conclusions can be drawn from this. In the next section (hypothesis #4) we note proportional representations of each ethnic group in the dataset, using numbers to draw attention to what a ParallAX visualization does not make obvious. Specifically, native students far dominate the dataset (64%) with nearly every other group representing just 1-2% of the whole. One should also note that Fig. 7 represents data for only 56 students (see “Total size” count in lower right-hand corner of screen) while Fig. 8 represents 302 students. A larger set could make it more likely for a wider range of ethnic groups to be included.

 

One of the strongest conclusions we drew in the course of this research is that, just as data visualizations such as those presented in this paper can reveal fascinating patterns otherwise lost in the granularity of hard data, they can also mislead researchers. In this case, ParallAX fails to make visually obvious both the relative ethnic representations in the entire dataset and the substantially different sizes of the two groups represented in Figs. 7 and 8. A critical eye and a careful understanding of the limitations of the visualization tools used are necessary elements of any visualization project.

 

 

 

Fig. 7 – Correlation between students with high exam scores and schools with low percentages of school meal eligibility (%FSM).

 

 

 

Fig. 8 – Correlation between students with low exam scores and schools with high percentages of school meal eligibility (%FSM).

 

 

Hypothesis #4: Native students (ESWI group) do better than other ethnic groups.

 

We believe that native students will enjoy a more accommodating academic environment than kids of non-native ethnicities, and will therefore display stronger academic performance (based on exam scores). An ancillary question arises: to which ethnic group can the poorest performance be attributed?

 

We found that this is only marginally true. The average exam score for native students was 20.35, and for non-native students 20.01. We calculated average exam scores for each individual ethnic group as well (overall group size and percentage noted parenthetically):

 

·         ESWI ( 3187 | 64% ) = 20.35

·         African ( 107 | 2% ) = 21.46

·         Arab ( 15 | <1% ) = 18.4

·         Bangladeshi ( 56 | 1% ) = 18.55

·         Caribbean ( 859 | 17% ) = 16.84

·         Greek ( 72 | 1% ) = 22.0

·         Indian ( 135 | 2% ) = 26.12

·         Pakistani ( 72 | 1% ) = 25.4

·         S.E. Asian ( 62 | 1% ) = 28.15

·         Turkish ( 67 | 1% ) = 15.82

·         Other ( 283 | 5% ) = 23.87
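As a consistency check, the aggregate non-native average can be recovered from the per-group figures as a size-weighted mean. The Python sketch below uses only the counts and averages listed above; the variable and function names are illustrative. Because the listed averages are themselves rounded, the result should be read as approximate.

```python
# (group, number of students, average exam score), copied from the list above
groups = [
    ("ESWI", 3187, 20.35),
    ("African", 107, 21.46),
    ("Arab", 15, 18.4),
    ("Bangladeshi", 56, 18.55),
    ("Caribbean", 859, 16.84),
    ("Greek", 72, 22.0),
    ("Indian", 135, 26.12),
    ("Pakistani", 72, 25.4),
    ("S.E. Asian", 62, 28.15),
    ("Turkish", 67, 15.82),
    ("Other", 283, 23.87),
]


def weighted_mean(rows):
    """Average of the group means, weighted by the number of students in each group."""
    total_students = sum(n for _, n, _ in rows)
    return sum(n * avg for _, n, avg in rows) / total_students


non_native = [row for row in groups if row[0] != "ESWI"]
total = sum(n for _, n, _ in groups)

print(total)                                # 4915
print(round(weighted_mean(non_native), 2))  # 20.01
```

The group sizes sum to the 4,915 records in the 1985 subset, and the weighted mean of the ten non-ESWI groups reproduces the 20.01 figure quoted above.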

 

We are skeptical of deriving conclusions from these numbers; the groups differ widely in size, and several contain very few students. Having said this, we observed that the Arab student population fared particularly poorly, as measured by exam scores (Fig. 9).

 

 

 

Fig. 9 – Arab students showed a pattern of consistently low exam scores.

 

 

We are also able to discern a greater relative proportion of VR1 band students among the native group, in comparison with the aggregate of all other ethnic groups (Fig. 10).

 

 

 

Fig. 10 – A comparison of VR band distribution of native (underlay) and non-native (overlay) students indicates a higher proportion of native students in the VR band 1 category. Grey lines denote students in the VR3 band, blue half-length lines denote VR1 band and full blue lines denote VR2 band.

 

 

Hypothesis #5: The percentage of students eligible for school meals is higher among the other ethnic groups than among native students.

 

This hypothesis cleverly combines the previous two. First, we hypothesized (and found support for the claim) that schools with higher percentages of students eligible for school meals show lower exam scores, then that non-native students don’t perform as well academically as native kids. Here we hypothesize that the schools with high rates of school meal programs (which presumably don’t do as well) enroll proportionally more non-native students.

 

We also theorized that families of non-native kids are more financially challenged in comparison to those of native kids, and are therefore eligible for school meals in higher percentages. The averages support our hypothesis: students in the other ethnic groups attend schools with an average of 34.24% FSM eligibility, compared with a 31.26% average for native students.

 

An ancillary question: Is there a higher percentage of meal-eligible students in state-maintained schools than in the two other (religious denomination) school types? For this dataset, the answer is yes: eligibility ranges up to 70% in state-maintained schools (average 34.29%), yet only up to 44% in Church of England schools (average 26.52%) and up to 58% in Roman Catholic schools (average 30.05%).

 

 

Hypothesis #6: Church of England and Roman Catholic denomination schools do better than state maintained schools.

 

Our theory is, generally, that private (religious) schools produce better academic performance than public (state) schools. The numbers support this – the average exam scores for each school type are as follows:

 

·         State: 19.26

·         Church of England: 20.95

·         Roman Catholic: 22.84

 

Using our visualization tools, we were not as certain. On the one hand, we could support this hypothesis visually by illustrating that all 45 students in the very highest %FSM sample, with correspondingly low exam scores, attend state-maintained schools (Fig. 11).

 

 

 

Fig. 11 – All 45 students in the highest %FSM sample, with correlating low exam scores, attend state-maintained schools.

 

 

Beyond this, however, little conclusive information is gleaned from visualizations in either tool. As Fig. 12 illustrates, it is only marginally apparent to the casual eye that more religious schools cluster toward the higher exam scores.

 

 

Fig. 12 – An only marginally conclusive visualization: the correlation between higher exam scores (column #3) and religious denomination schools (right-most column). Note the tighter cluster of blue colored lines toward the top of the school column.

 

 

Additional Thoughts about the Data

 

In the course of our analysis, we envisioned actual applications of these tools for school evaluation. For example, how might a committee use visualization techniques to determine which schools to close down (based on performance) in the face of budget cuts? By isolating records that fall within the very lowest exam score (1) and the lowest percentage of VR1 band students (8), we are left with just 3 student records, all from the same school (Fig. 13). This school would likely be an obvious closure candidate. (Perhaps notably, these three records are native boys attending a mixed-gender, state-maintained school.)
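The isolation step described above is a filter on two minima at once. The following Python sketch shows one way to express it, assuming each record is a dictionary with hypothetical keys `school`, `exam` and `pct_vr1`; the sample rows are invented and do not come from the dataset.

```python
def isolate_lowest(records):
    """Keep records that sit at both the minimum exam score and the minimum %VR1."""
    min_score = min(r["exam"] for r in records)
    min_vr1 = min(r["pct_vr1"] for r in records)
    return [r for r in records
            if r["exam"] == min_score and r["pct_vr1"] == min_vr1]


# Invented rows for illustration only
sample = [
    {"school": "A", "exam": 1, "pct_vr1": 8},
    {"school": "A", "exam": 1, "pct_vr1": 8},
    {"school": "B", "exam": 1, "pct_vr1": 20},
    {"school": "C", "exam": 30, "pct_vr1": 8},
]
flagged = isolate_lowest(sample)
flagged_schools = {r["school"] for r in flagged}
print(flagged_schools)
```

Note that the result can be empty when no single record reaches both minima, so a filter of this kind is a starting point for discussion rather than a decision rule.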

 

 

Fig. 13 – Isolating the very lowest exam score and school VR1 band percentage draws negative attention to a single school.

 

 

Comments about Tools

 

We found that both tools we worked with have certain quirks that make them better for certain types of visualizations, and also add challenges to their learning curves.

 

For example, we were initially misled by visualizations in ParallAX that displayed large numbers of records on top of an identical path. Since the tool displays these records as a single, uniform line, it can seem to the casual observer that there is just one record being represented; in fact, the line could represent thousands of records.
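One way to guard against this overplotting effect is to count how many records share each identical path before interpreting a line. The sketch below is a minimal Python illustration and not a feature of ParallAX: the function name and the sample tuples (gender, school gender, denomination) are assumptions made for the example.

```python
from collections import Counter


def path_counts(records):
    """Count how many records trace each identical path across all axes."""
    return Counter(tuple(record) for record in records)


# Invented rows: (gender, school_gender, denomination)
sample = [(1, 3, 1), (1, 3, 1), (1, 3, 1), (0, 2, 3)]
counts = path_counts(sample)
print(counts.most_common(1))
```

Here a single drawn line for the path `(1, 3, 1)` would stand for three records, which is exactly the information a uniform line hides.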

 

Eureka’s requirement that each filtered view be saved as a discrete .txt file seemed somewhat inconvenient and unruly to us. An alternate approach might be one similar to Excel’s workbook/worksheet model, in which multiple views are captured in a single file. We also encountered what appeared to be a bug in its categorizing feature.

 

It was helpful that both applications kept numerical calculations and stats within fairly close reach, usually in one or the other bottom corner of the screen. It was not so helpful, however, that we needed to go through several steps each time we wanted to read the average or median for a certain category of a variable. For example, in Eureka, in order to see the average exam score for all male students in male-only schools, one needs to perform 2 filter functions (opening a new window each time) before pointing to the exam score variable column and reading the average. For interval data types it was also necessary to categorize the data before we performed the filtering. It would have been much simpler if the program also calculated a new average whenever a category of a variable was merely highlighted (providing the average for the highlighted region without having to go through those filtering steps).

 

Both visualization tools, we thought, provided techniques for quickly manipulating and isolating points and subsets. This is especially true when compared with what would be involved in testing our hypotheses in, for example, Excel. Despite some bugs and clumsy behavior, we were pleased with our experience using both ParallAX and Eureka. It did strike us, however, that the best overall understanding of our dataset came from combining the visual cues enabled by our tools with certain statistical calculations (averages, record counts, medians, and so on) that provided valuable context.

 

 

 



[1] Note that this labeling is unintuitive. We have attempted to compensate with clarifications throughout our report.

[2] ESWI: Students born in England, Scotland, Wales or Ireland.

[3] Naturally, we didn’t test those average scores for statistical significance. However, for most of them the difference seems big enough to make an educated guess that they in fact are meaningful.