Maggie Law & Vivien Petras
SIMS 247: Info Visualization & Presentation
Assignment #2 · Instructor: Marti Hearst
Due: The Ides of March, 2002 · Spring 2002 * F 2:00 – 5:00
Many of our images use color
to denote a characteristic. For an online version of this paper, please see http://www.sims.berkeley.edu/~mlaw/is247/a2/is247_assignment2.html.
Our dataset has nine variables: 5 nominal, 1 ordinal and 3 interval. Each record has the following fields:
· School
· Exam Score -- Numeric score
· Percent of students eligible for free school meals -- % FSM
· Percent of students in school in VR band 1 -- % VR1 band
· Gender -- Male (0); Female (1)
· VR band of student[1] -- VR1 (2); VR2 (3); VR3 (1)
· Ethnic group of student -- ESWI[2] (1); African (2); Arab (3); Bangladeshi (4); Caribbean (5); Greek (6); Indian (7); Pakistani (8); S.E. Asian (9); Turkish (10); Other (11)
· School gender -- Mixed (1); Male (2); Female (3)
· School denomination -- State-Maintained (1); Church of England (2); Roman Catholic (3)
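The coding scheme above can be written down as lookup tables, which makes the unintuitive VR band coding harder to misread. The following is only a minimal Python sketch: the code values are copied from the list above, but the field names (`gender`, `vr_band`, and so on) are hypothetical and the example record is invented for illustration.

```python
# Code tables copied from the variable list in this report.
GENDER = {0: "Male", 1: "Female"}
# Note the unintuitive coding: band VR1 is stored as 2, VR2 as 3, VR3 as 1.
VR_BAND = {2: "VR1", 3: "VR2", 1: "VR3"}
ETHNIC_GROUP = {
    1: "ESWI", 2: "African", 3: "Arab", 4: "Bangladeshi", 5: "Caribbean",
    6: "Greek", 7: "Indian", 8: "Pakistani", 9: "S.E. Asian",
    10: "Turkish", 11: "Other",
}
SCHOOL_GENDER = {1: "Mixed", 2: "Male", 3: "Female"}
SCHOOL_DENOMINATION = {
    1: "State-Maintained", 2: "Church of England", 3: "Roman Catholic",
}

# Hypothetical field names; the actual column headers of the data file may differ.
CODE_TABLES = {
    "gender": GENDER,
    "vr_band": VR_BAND,
    "ethnic_group": ETHNIC_GROUP,
    "school_gender": SCHOOL_GENDER,
    "school_denomination": SCHOOL_DENOMINATION,
}


def decode(record):
    """Return a copy of record with coded fields replaced by their labels."""
    return {
        field: CODE_TABLES[field][value] if field in CODE_TABLES else value
        for field, value in record.items()
    }


# An invented record, for illustration only (not taken from the dataset).
example = {"exam_score": 30, "gender": 1, "vr_band": 2, "school_gender": 3}
print(decode(example))
```

Fields without a code table, such as the exam score, pass through unchanged.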
It’s
worth noting that, in the absence of more specific details about exam scores or
the VR (verbal reasoning) tests, our analysis makes several qualitative
assumptions. Specifically, we treat higher exam scores as indicators of
superior academic performance; likewise, we assume that students in VR band 1
are academically superior to those in VR band 2, and those in VR band 2
superior to those in VR band 3.
Without
knowing more about the contents of the two types of tests used in these
datasets, we were at least able to observe that there is a correlation between
them -- the highest exam scores, across all students, were most heavily
associated with high verbal reasoning skills (see Figs. 1-3 below).
Additionally, Figure 4 shows that schools with low percentages of students in
the VR1 (best) band seem to have lower exam scores. Nevertheless, we concede
up front that a more formal analysis of these data would require a closer
evaluation of the tests as true reflectors of academic abilities.

Fig. 1 – VR1 band students (variable VR_BAND value 2) had best exam scores overall.

Fig. 2 – VR2 band students (variable VR_BAND value 3) had generally low to mediocre exam scores.

Fig. 3 – VR3 band students (variable VR_BAND value 1) had generally low exam scores -- none higher than 38.

Fig. 4 – Students in schools with low percentages of students in the VR1 band seem to have lower exam results in general.
Hypothesis #1: Girls did better in all-girls schools.
Hypothesis #2: In general (across all types of schools), girls do better than boys.
As
these are related hypotheses, we shall discuss them in parallel. Our theory
behind hypothesis #1 is that an all-girls learning environment is more
accommodating than a mixed school. (An obvious ancillary question is: did boys
do better in all-boys schools?) Hypothesis #2 simply seems self-evident to us.
In fact, our dataset supports it. The average exam score for all girls was
21.88, compared with 18.37 for boys.
Using Eureka, we determined that the average exam score for girls attending
all-girls schools (22.7) was higher than for girls attending mixed schools
(20.82). Boys likewise did better in all-boys schools than in mixed schools,
as reflected in average exam scores (19.18 vs. 17.43). The overall average
exam score in mixed schools for both girls and boys was 19.17[3].
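The averages above were read from Eureka. The same kind of grouped average could be computed in a few lines of Python, as in the rough sketch below. The records are invented toy values, not taken from the dataset, and the tuple layout is an assumption made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Invented toy records as (gender, school_gender, exam_score); NOT from the dataset.
# Codes follow this report: gender 0 = male, 1 = female;
# school gender 1 = mixed, 2 = all-boys, 3 = all-girls.
records = [
    (1, 3, 30), (1, 3, 20), (1, 1, 22), (1, 1, 18),
    (0, 2, 24), (0, 2, 16), (0, 1, 19), (0, 1, 15),
]


def mean_score_by(records, key):
    """Average exam score per group; `key` maps a record to its group."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec[2])
    return {group: mean(scores) for group, scores in groups.items()}


# Hypothesis #2: compare the sexes across all school types.
by_gender = mean_score_by(records, key=lambda r: r[0])
# Hypothesis #1: compare each sex in single-sex vs. mixed schools.
by_gender_and_school = mean_score_by(records, key=lambda r: (r[0], r[1]))
```

Grouping by a key function keeps one helper usable for both hypotheses; only the key changes.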
We
found it difficult to “eyeball” these – or any other averages-based –
conclusions in either Eureka or ParallAX. Subtle degrees of variation within
ranges proved inconclusive when simply observing the visualizations in either
tool. While we were able to determine if the top score within one range was
greater than the top score within another, we had to consult Eureka’s averages
calculations (displayed in the bottom corners of the display) in order to conclude definitively whether scores in
one range overall were better than those in another range.

Fig. 5 – A filter applied to isolate records for students whose exam scores are higher than 58 indicates that girls outperform boys.
From
the visualization illustrated in Fig. 5, we observed the following:
· There are more girls (1) than boys (0) in this group
· All-girls schools (3) dominate this sample
· Most boys in this sample are in all-boys schools (2)
· No girl in this group attends a mixed-gender school (1)
We
also noted that:
· This sample is overwhelmingly of native ethnicity (1)
· Most of the schools represented are state-maintained (1)
· With only one exception (at 70%), the schools these kids attend tend to have lower percentages of school meal eligibility (%FSM)
· With only one exception (the second student in the list), all of these highest-scoring students are in VR band 1 (VR band variable value = 2).
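The observations above came from applying a filter in Eureka and reading the display. As a hedged illustration of the same idea, the sketch below filters on the exam score and tallies the remaining records by category. The records and field names are invented for the example and do not reproduce the dataset.

```python
from collections import Counter

# Invented toy records (NOT from the dataset); field names are hypothetical.
records = [
    {"exam_score": 61, "gender": 1, "school_gender": 3},
    {"exam_score": 59, "gender": 1, "school_gender": 3},
    {"exam_score": 60, "gender": 0, "school_gender": 2},
    {"exam_score": 40, "gender": 1, "school_gender": 1},
    {"exam_score": 12, "gender": 0, "school_gender": 1},
]


def tally_above(records, threshold, field):
    """Count the values of `field` among records scoring above `threshold`."""
    return Counter(r[field] for r in records if r["exam_score"] > threshold)


top_by_gender = tally_above(records, 58, "gender")
top_by_school_gender = tally_above(records, 58, "school_gender")
```

Counting rather than eyeballing makes statements such as "more girls than boys in this group" checkable.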
There
was a curious outlier in our data, which stood out more in our ParallAX
visualization than in Eureka. Unless this data point is erroneous, there is a
female student who attends an all-boys school (Fig. 6).

Fig. 6 – One female student attends an all-boys school (the poor dear).
Hypothesis #3: Schools with higher percentages of students eligible for school meals show lower exam scores.
Our
theory is that eligibility for school meals correlates directly with low family
income. From this, we derive a hypothesis that poorer kids don’t do as well as
richer, more privileged ones.
In
ParallAX, we are able to visualize clear correlations of high exam scores with
low %FSM (Fig. 7) and low exam scores with high %FSM (Fig. 8) in support of our
claim.
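Our support for this hypothesis is visual. One way to put a number on such a relationship, which we did not do in this analysis, is the Pearson correlation coefficient. The sketch below is a plain-Python illustration with invented values; it says nothing about the actual strength of the relationship in the dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented toy values, NOT from the dataset: scores fall as %FSM rises.
pct_fsm = [10, 20, 30, 40, 50]
exam_score = [30, 26, 22, 18, 14]
r = pearson_r(pct_fsm, exam_score)
```

A value near -1 would indicate that exam scores fall steadily as %FSM rises; a value near 0 would indicate no linear relationship.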
It has been pointed out to us that Figs. 7 and 8 also indicate a
meaningful pattern with respect to ethnic groups, implying that students with
higher exam scores attending schools with lower percentages of school meal
eligibility are less representative of the full ethnic array than those in
lower exam score / higher %FSM categories. While we recognize that such a
pattern is apparent in our ParallAX visualizations, we disagree that meaningful
conclusions can be drawn from this. In the next section (hypothesis #4) we note
proportional representations of each ethnic group in the dataset, using numbers
to draw attention to what a ParallAX visualization does not make obvious.
Specifically, native students far dominate the dataset (64%) with nearly every
other group representing just 1-2% of the whole. One should also note that Fig.
7 represents data for only 56 students (see “Total size” count in lower
right-hand corner of screen) while Fig. 8 represents 302 students. A larger set
could make it more likely for a wider range of ethnic groups to be included.
One of our strong conclusions from this research is that, just as data
visualizations such as those presented in this paper can reveal fascinating
patterns otherwise lost in the granularity of hard data, they can also mislead
researchers. In this case, ParallAX fails to make visually obvious both the
relative ethnic representations in the entire dataset and the substantially
different sizes of the two groups represented in Figs. 7 and 8. A critical eye
and a careful understanding of the limitations of the visualization tools used
are necessary elements of any visualization project.

Fig. 7 – Correlation between students with high exam scores and schools with low percentages of school meal eligibility (%FSM).

Fig. 8 – Correlation between students with low exam scores and schools with high percentages of school meal eligibility (%FSM).
Hypothesis #4: Native students (ESWI group) do better than other ethnic groups.
We believe that native students will enjoy a more accommodating academic
environment than kids of non-native ethnicities, and will therefore display
stronger academic performance (based on exam scores). An ancillary question
arises: to which ethnic group can the poorest performance be attributed?
We
found that this is only marginally true. The average exam score for native
students was 20.35, and for non-native students 20.01. We calculated average
exam scores for each individual ethnic group as well (overall group size and
percentage noted parenthetically):
· ESWI (3187 | 64%) = 20.35
· African (107 | 2%) = 21.46
· Arab (15 | <1%) = 18.4
· Bangladeshi (56 | 1%) = 18.55
· Caribbean (859 | 17%) = 16.84
· Greek (72 | 1%) = 22.0
· Indian (135 | 2%) = 26.12
· Pakistani (72 | 1%) = 25.4
· S.E. Asian (62 | 1%) = 28.15
· Turkish (67 | 1%) = 15.82
· Other (283 | 5%) = 23.87
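The overall non-native average of 20.01 is consistent with these per-group figures once each group is weighted by its size. The short check below uses only the counts and averages listed above; because those averages are already rounded, the result is approximate.

```python
# (group, count, average exam score) for the non-native groups,
# copied from the list in this report.
non_native = [
    ("African", 107, 21.46), ("Arab", 15, 18.4), ("Bangladeshi", 56, 18.55),
    ("Caribbean", 859, 16.84), ("Greek", 72, 22.0), ("Indian", 135, 26.12),
    ("Pakistani", 72, 25.4), ("S.E. Asian", 62, 28.15), ("Turkish", 67, 15.82),
    ("Other", 283, 23.87),
]

total_students = sum(count for _, count, _ in non_native)

# Weighted mean: every student counts once, so large groups dominate.
weighted_mean = sum(count * avg for _, count, avg in non_native) / total_students

# Unweighted mean: every group counts once, regardless of its size.
unweighted_mean = sum(avg for _, _, avg in non_native) / len(non_native)

print(total_students, round(weighted_mean, 2), round(unweighted_mean, 2))
```

The weighted mean rounds to 20.01, matching the figure quoted above, while the unweighted mean of the ten group averages is about 21.66. The gap arises because the Caribbean group, the largest non-native group by far, has one of the lowest averages.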
We are skeptical of deriving conclusions from these numbers: group sizes
differ enormously, from 15 Arab students to 3,187 ESWI students, so the
averages for the smaller groups rest on very few records. Having said this, we
observed that the Arab student population fared particularly poorly, as
measured by exam scores (Fig. 9).

Fig. 9 – Arab students showed a pattern of consistently low exam scores.
We
are also able to discern a greater relative proportion of VR1 band students
among the native group, in comparison with the aggregate of all other ethnic
groups (Fig. 10).

Fig. 10 – A comparison of VR band distribution of native (underlay) and non-native (overlay) students indicates a higher proportion of native students in the VR band 1 category. Grey lines denote students in the VR3 band, blue half-length lines denote VR1 band and full blue lines denote VR2 band.
Hypothesis #5: The percentage of students eligible for school meals is higher among the other ethnic groups than among native students.
This hypothesis cleverly combines the previous two. First, we hypothesized
(and found visual support for the claim) that schools with higher percentages
of students eligible for school meals show lower exam scores, then that
non-native students don’t perform as well academically as native kids (which
the averages supported only marginally). Here we hypothesize that the schools
with high rates of school meal eligibility (which presumably don’t do as well)
have proportionally more non-native students.
We also theorized that families of non-native kids are more financially
challenged in comparison to those of native kids, and are therefore eligible
for school meals in higher percentages. The averages support our hypothesis:
non-native students attend schools with an average of 34.24% FSM eligibility,
compared with a 31.26% average for native students.
An ancillary question: Is there a higher percentage of meal-eligible students
in state-maintained schools than in the two other (religious denomination)
school types? For this dataset, the answer is yes. Eligibility reaches up to
70% in state-maintained schools (average 34.29%), but only up to 44% in Church
of England schools (average 26.52%) and up to 58% in Catholic schools (average
30.05%).
Hypothesis #6: Church of England and Roman Catholic denomination schools do better than state-maintained schools.
Our theory is, generally, that private (religious) schools produce better
academic performance than public (state) schools. The numbers support this;
the average exam scores for each school type are as follows:
· State: 19.26
· Church of England: 20.95
· Roman Catholic: 22.84
Using our visualization tools, we were not as certain. On the one hand, we
could support this hypothesis visually by illustrating that all 45 students in
the very highest %FSM sample, with correspondingly low exam scores, attend
state-maintained schools (Fig. 11).

Fig. 11 – All 45 students in the highest %FSM sample, with correlating low exam scores, attend state-maintained schools.
Beyond
this, however, little conclusive information is gleaned from visualizations in
either tool. As Fig. 12 illustrates, it is only marginally apparent to the
casual eye that more religious schools cluster toward the higher exam scores.

Fig. 12 – An only marginally conclusive visualization: the correlation between higher exam scores (column #3) and religious denomination schools (right-most column). Note the tighter cluster of blue colored lines toward the top of the school column.
In
the course of our analysis, we envisioned actual applications of these tools
for school evaluation. For example, how might a committee use visualization
techniques to determine which schools to close down (based on performance) in
the face of budget cuts? By isolating records that fall within the very lowest
exam score (1) and the lowest percentage of VR1 band students (8), we are left
with just 3 student records, all from the same school (Fig. 13). This school
would likely be an obvious closure candidate. (Perhaps notably, these three
records are native boys attending a mixed-gender, state-maintained school.)

Fig. 13 – Isolating the very lowest exam score and school VR1 band percentage draws negative attention to a single school.
We
found that both tools we worked with have certain quirks that make them better
for certain types of visualizations, and also add challenges to their learning
curves.
For
example, we were initially misled by visualizations in ParallAX that displayed
large numbers of records on top of an identical path. Since the tool displays
these records as a single, uniform line, it can seem to the casual observer
that there is just one record being represented; in fact, the line could
represent thousands of records.
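One way to guard against this is to count how many records share each path before reading the picture. The sketch below is a minimal illustration in Python, assuming each record is reduced to a tuple of its categorical values; the records are invented and the choice of fields is hypothetical.

```python
from collections import Counter

# Invented toy records (NOT from the dataset):
# (gender, school_gender, school_denomination).
records = [
    (1, 3, 1), (1, 3, 1), (1, 3, 1),
    (0, 2, 1), (0, 2, 1),
    (0, 1, 2),
]

# Each distinct tuple is drawn as one polyline across the parallel axes;
# the count is how many records are stacked on that single line.
path_counts = Counter(records)

for path, n in path_counts.most_common():
    print(path, n)
```

Here six records collapse into three visible lines, the heaviest of which hides three records.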
Eureka’s
requirement that each filtered view be saved as a discrete .txt file seemed
somewhat inconvenient and unruly to us. An alternate approach might be one
similar to Excel’s workbook/worksheet model, in which multiple views are
captured in a single file. We also encountered what appeared to be a bug in its
categorizing feature.
It
was helpful that both applications kept numerical calculations and stats within
fairly close reach, usually in one or the other bottom corner of the screen. It
was not so helpful, however, that we needed to go through several steps each
time we wanted to read the average or median for a certain category of a
variable. For example, in Eureka, in order to see the average exam score for
all male students in male-only schools, one needs to perform 2 filter functions
(opening a new window each time) before pointing to the exam score variable
column and reading the average. For interval data types it was also necessary
to categorize the data before we performed the filtering. It would have been
much less complex if the program also calculated a new average when a category
of a variable was only highlighted (providing the average for the highlighted
region without having to go through those filtering steps).
Both
visualization tools, we thought, provided techniques for quickly manipulating
and isolating points and subsets. This is especially true when comparing what
would be involved with proving our hypotheses in, for example, Excel. Despite
some bugs and clumsy behavior, we were pleased with our experience using both
ParallAX and Eureka. It did strike us, however, that the best overall
understanding of our dataset came from combining the visual cues enabled by our
tools with certain statistical calculations (averages, record counts, medians,
and so on) that provided valuable context.
[1] Note that this labeling is unintuitive. We have attempted to compensate with clarifications throughout our report.
[2] ESWI: Students born in England, Scotland, Wales or Ireland.
[3] Naturally, we didn’t test these average scores for statistical significance. For most of them, however, the difference seems large enough to support an educated guess that it is in fact meaningful.