Exploring SAT Scores of New York City: Visualization & Analysis

Part One of this project focused entirely on preparing our data for analysis. While not the most exciting task, data preparation is necessary to some degree for nearly every real-world data set. For the second half of this project, our goal is to understand how differences in school, borough and district relate to a school's mean SAT score, and whether those relationships are statistically meaningful. We will use several visualizations, some basic correlation analysis and hypothesis testing to help us discover trends and examine their significance.

Preliminary Analysis

Up to this point, we haven't set any research questions for our project other than the broad inquiry of whether we can identify any outside factors influencing SAT scores in New York City. Although we have so far explored the datasets only in isolation, we should now define some narrower research questions to keep our analysis focused. As with all research, we may discover interesting findings outside of our intended scope along the way; however, having concrete questions to answer will keep us from getting lost amid the numerous possibilities. In this notebook we will explore five potential influencing factors: sex/gender, culture, socioeconomic status, concentration of advanced/high-achieving students, and physical location of the school.


The research questions we will attempt to answer are as follows:


  • Is a school's average SAT score correlated with... :
    • ...the location of a school (borough or district)?
    • ...the proportion of males to females attending that school?
    • ...any cultural aspects of its student population (e.g., ethnicity, English proficiency)?
    • ...the socioeconomic spread of its students?
    • ...the proportion of high-achieving or advanced placement students?
  • Are any of these correlations strong and significant enough to warrant looking into further?
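
As a rough illustration of what "correlated" means in these questions, the sketch below computes a Pearson correlation coefficient by hand for a few made-up schools. The numbers and the names pearson_r, frl_percent_demo and sat_score_demo are invented for illustration and are not taken from our dataset; in the notebook itself, the pandas Series.corr method performs the same calculation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for five schools (NOT real data)
frl_percent_demo = [90.0, 80.0, 70.0, 60.0, 50.0]
sat_score_demo = [1100.0, 1150.0, 1225.0, 1250.0, 1400.0]

r = pearson_r(frl_percent_demo, sat_score_demo)
print(round(r, 3))  # a value close to -1 indicates a strong negative relationship
```

A coefficient near +1 or -1 indicates a strong linear relationship and a value near 0 indicates little or none. Whether a given coefficient is statistically significant is a separate question, which is why the last research question asks about both strength and significance.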

Setting the Stage with Maps

Just as in our first notebook, we know we are likely to use NumPy and Pandas, so we will start by importing those and setting up Pandas to display up to 500 rows or columns of a table. This time, rather than having to read in nine separate datasets, we only need to take a look at our clean data.

In [1]:
import numpy as np
import pandas as pd

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)

full = pd.read_csv('sat_data_clean.csv')
full.head()
Out[1]:
DBN SCHOOL NAME Num of SAT Test Takers SAT Critical Reading Avg. Score SAT Math Avg. Score SAT Writing Avg. Score sat_score is_suppressed AP Test Takers Total Exams Taken Number of Exams with scores 3 4 or 5 NUMBER OF STUDENTS / SEATS FILLED NUMBER OF SECTIONS AVERAGE CLASS SIZE SIZE OF SMALLEST CLASS SIZE OF LARGEST CLASS schoolyear frl_percent total_enrollment grade9 grade10 grade11 grade12 ell_num ell_percent sped_num sped_percent ctt_num selfcontained_num asian_num asian_per black_num black_per hispanic_num hispanic_per white_num white_per male_num male_per female_num female_per Total Cohort Total Grads - % of cohort Total Regents - % of cohort Total Regents - % of grads Advanced Regents - % of cohort Advanced Regents - % of grads Regents w/o Advanced - % of cohort Regents w/o Advanced - % of grads Local - % of cohort Local - % of grads Still Enrolled - % of cohort Dropped Out - % of cohort borough grade_span_min grade_span_max city postcode total_students school_type language_classes advancedplacement_courses online_ap_courses online_language_courses start_time end_time number_programs Location 1 Community Board Council District lat lon has_lang has_ap has_online_lang has_online_ap has_spanish has_french has_chinese has_russian rr_s rr_t rr_p N_s N_t N_p saf_p_11 com_p_11 eng_p_11 aca_p_11 saf_t_11 com_t_11 eng_t_11 aca_t_11 saf_s_11 com_s_11 eng_s_11 aca_s_11 saf_tot_11 com_tot_11 eng_tot_11 aca_tot_11 school_dist District YTD % Attendance (Avg) YTD Enrollment(Avg) is_consort is_CTE is_allgirls is_intl is_consort_intl is_specialized is_allboys
0 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES 29.0 355.0 404.0 363.0 1122.0 0 0.0 0.0 0.0 65.384615 2.923077 22.700000 20.115385 25.307692 20112012.0 88.6 422.0 98.0 79.0 80.0 50.000000 94.0 22.3 105.0 24.9 34.0 35.0 59.0 14.0 123.0 29.1 227.0 53.8 7.0 1.7 259.0 61.4 163.0 38.6 56.000000 61.500000 41.675000 69.125000 0.000000 0.000000 41.675000 69.125000 19.850000 30.875000 20.275000 11.950000 Manhattan 6.000000 12.0 New York 10002.00000 323.00000 Traditional Chinese (Mandarin), Spanish Psychology Chinese Language and Culture, Spanish Literatu... Chinese (Mandarin), Spanish 830.000000 330.000000 1.000000 220 Henry Street\nNew York, NY 10002\n(40.7137... 3.000000 1.000000 40.713764 -73.985260 1.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 89.0 70.0 39.0 379.000000 26.0 151.0 7.8 7.7 7.4 7.6 6.3 5.3 6.1 6.5 6.000000 5.600000 6.100000 6.700000 6.7 6.2 6.6 7.0 1 DISTRICT 01 91.18 12367 0 0 0 0 0 0 0
1 01M448 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 91.0 383.0 423.0 366.0 1172.0 0 39.0 49.0 10.0 103.923077 4.423077 23.800000 20.115385 28.000000 20112012.0 71.8 394.0 109.0 97.0 93.0 95.000000 83.0 21.1 86.0 21.8 55.0 10.0 115.0 29.2 89.0 22.6 181.0 45.9 9.0 2.3 226.0 57.4 168.0 42.6 97.714286 60.485714 37.157143 62.471429 8.657143 14.071429 28.528571 48.385714 23.314286 37.528571 26.957143 9.828571 Manhattan 9.000000 12.0 New York 10002.00000 299.00000 Traditional Chinese, Spanish Calculus AB, Chinese Language and Culture, Eng... No courses Chinese (Cantonese), Chinese (Mandarin), Spanish 815.000000 315.000000 3.000000 200 Monroe Street\nNew York, NY 10002\n(40.712... 3.000000 1.000000 40.712332 -73.984797 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 84.0 95.0 10.0 385.000000 37.0 46.0 7.9 7.4 7.2 7.3 6.6 5.8 6.6 7.3 6.000000 5.700000 6.300000 7.000000 6.8 6.3 6.7 7.2 1 DISTRICT 01 91.18 12367 0 0 0 0 0 0 0
2 01M450 EAST SIDE COMMUNITY SCHOOL 70.0 377.0 402.0 370.0 1149.0 0 19.0 21.0 0.0 53.535714 2.464286 21.928571 20.464286 23.250000 20112012.0 71.8 598.0 101.0 93.0 77.0 86.000000 30.0 5.0 158.0 26.4 91.0 19.0 58.0 9.7 143.0 23.9 331.0 55.4 62.0 10.4 327.0 54.7 271.0 45.3 79.571429 70.385714 66.000000 93.828571 0.000000 0.000000 66.000000 93.828571 4.357143 6.171429 17.614286 10.742857 Manhattan 6.000000 12.0 New York 10009.00000 649.00000 Consortium School No Language Classes Calculus AB, English Literature and Composition No courses American Sign Language, Arabic, Chinese (Manda... 830.000000 330.000000 1.000000 420 East 12 Street\nNew York, NY 10009\n(40.72... 3.000000 2.000000 40.729783 -73.983041 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 98.0 28.0 516.257511 42.0 150.0 8.7 8.2 8.1 8.4 7.3 8.0 8.0 8.8 6.725751 6.166953 6.719313 7.429828 7.9 7.9 7.9 8.4 1 DISTRICT 01 91.18 12367 1 0 0 0 0 0 0
3 01M458 FORSYTH SATELLITE ACADEMY 7.0 414.0 401.0 359.0 1174.0 0 0.0 0.0 0.0 28.600000 1.200000 23.000000 22.600000 23.400000 20112012.0 72.8 224.0 131.0 49.0 44.0 147.334928 9.0 4.0 20.0 8.9 3.0 0.0 5.0 2.2 77.0 34.4 133.0 59.4 8.0 3.6 97.0 43.3 127.0 56.7 175.547166 62.915625 45.121491 66.876392 10.910809 14.105702 34.210494 52.770587 17.803826 33.140172 24.645990 9.779497 Manhattan 8.457766 12.0 NaN 10725.96477 772.02168 Traditional No courses No courses No courses No courses 816.252033 315.111111 1.821138 NaN 6.782016 22.237057 40.719022 -73.982377 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 40.0 100.0 23.0 66.000000 10.0 37.0 8.1 7.0 6.7 7.6 8.5 8.2 8.9 8.9 6.800000 6.100000 6.100000 6.800000 7.8 7.1 7.2 7.8 1 DISTRICT 01 91.18 12367 0 0 0 0 0 0 0
4 01M509 MARTA VALLE HIGH SCHOOL 44.0 390.0 433.0 384.0 1207.0 0 0.0 0.0 0.0 49.851852 2.296296 19.370370 17.370370 21.481481 20112012.0 80.7 367.0 143.0 100.0 51.0 73.000000 41.0 11.2 95.0 25.9 28.0 36.0 34.0 9.3 116.0 31.6 209.0 56.9 6.0 1.6 170.0 46.3 197.0 53.7 73.571429 49.914286 31.385714 61.157143 10.571429 19.628571 20.814286 41.514286 18.514286 38.842857 29.857143 14.342857 Manhattan 9.000000 12.0 New York 10002.00000 401.00000 Traditional French, Spanish English Literature and Composition, Studio Art... No courses Spanish 800.000000 330.000000 1.000000 145 Stanton Street\nNew York, NY 10002\n(40.72... 3.000000 1.000000 40.720569 -73.985673 1.0 1.0 1.0 0.0 1.0 1.0 0.0 0.0 90.0 100.0 21.0 306.000000 29.0 69.0 7.7 7.4 7.2 7.3 6.4 5.3 6.1 6.8 6.400000 5.900000 6.400000 7.000000 6.9 6.2 6.6 7.0 1 DISTRICT 01 91.18 12367 0 0 0 0 0 0 0

Up to this point, we have only worked with numbers and words. However, data is often best understood and communicated through images. Since we have geodata at our disposal, we can draw up some maps to gain a high-level overview of the schools and school districts that is difficult to picture by simply reading the numbers.

For this task we will use Folium, a Python library that lets us create interactive maps powered by the Leaflet.js JavaScript library. Our first map will show New York City with a marker identifying each school. Zooming in reveals individual schools, while zooming out groups them into clusters labeled with the number of schools in the area.

In [2]:
import folium
from folium import plugins # Needed for adding a marker cluster & heatmap 

schools_map = folium.Map(location=[full['lat'].mean(), full['lon'].mean()], zoom_start=10)
marker_cluster = plugins.MarkerCluster().add_to(schools_map)
for _, row in full.iterrows():
    # Add a marker that displays the DBN and school name when clicked
    popup_text = '{0}: {1}'.format(row['DBN'], row['SCHOOL NAME'])
    folium.Marker([row['lat'], row['lon']], popup=popup_text).add_to(marker_cluster)

schools_map.save('schools.html')
schools_map
Out[2]:

We can also create a heatmap with Folium for a different view of the concentration of schools across the city. Once again, zooming in provides a finer level of detail.

In [3]:
schools_heatmap = folium.Map(location=[full['lat'].mean(), full['lon'].mean()], zoom_start=10)
# HeatMap expects a list of [lat, lon] pairs, one per school
schools_heatmap.add_child(plugins.HeatMap(full[['lat', 'lon']].values.tolist()))
schools_heatmap.save("heatmap.html")
schools_heatmap
Out[3]:

Now that we have a general idea of where schools are located throughout the city, we can take a closer look at where each district lies. To do this, we will create a map using the GeoJSON file we downloaded at the beginning of our first notebook.

One advantage that Folium provides is the ability to apply a fully-integrated choropleth layer over our map. Using our GeoJSON file to define the district boundaries within the choropleth, we can view values of any column from our data in a color spectrum on our map. However, in order to do this, we need to re-format our data to match the format of our JSON data. Opening up the GeoJSON file we see that each district is indicated by a string containing the (non-zero-padded) district number.

{"type":"Feature","properties":{"school_dist":"6"...
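
To make the format requirement concrete, the sketch below parses a tiny hand-written GeoJSON fragment and checks which district labels would match its school_dist property. The fragment, the sample labels and the helper name normalize_district are hypothetical stand-ins for illustration; only the school_dist property and its non-zero-padded string format come from the actual file.

```python
import json

# Hand-written stand-in for the real GeoJSON file (geometry omitted for brevity)
geojson_text = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"school_dist": "6"}, "geometry": null},
  {"type": "Feature", "properties": {"school_dist": "10"}, "geometry": null}
]}
"""

def normalize_district(value):
    """Convert 6, 6.0, '06' or '6' to the non-zero-padded string '6'."""
    return str(int(float(value)))

geo = json.loads(geojson_text)
# Collect the district keys exactly as the GeoJSON spells them
geo_keys = {feature["properties"]["school_dist"] for feature in geo["features"]}

# Hypothetical labels in formats a data table might plausibly contain
raw_labels = ["06", 10, 6.0]

# Without normalization, only labels whose str() form is already correct match
unmatched_without_fix = [v for v in raw_labels if str(v) not in geo_keys]

# After normalization, every label lines up with a GeoJSON key
matched = [normalize_district(v) in geo_keys for v in raw_labels]
print(unmatched_without_fix, matched)
```

A choropleth joins the data to the map by comparing these keys as strings, so a label such as "06" or 6.0 would leave its district unshaded even though the underlying number is the same.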

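The following code groups our full dataset by school district, averages every numeric column within each district, and converts the school_dist column into non-zero-padded strings (so that, for example, district 6 is written "6" rather than "06") to match the GeoJSON file.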

In [4]:
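# Average every numeric column per district; numeric_only=True skips text columns
# such as school names, which newer pandas versions refuse to average
district_data = full.groupby('school_dist').mean(numeric_only=True).reset_index()
# The GeoJSON identifies districts by non-zero-padded strings such as "6"
district_data['school_dist'] = district_data['school_dist'].apply(lambda x: str(int(x)))
district_data.to_csv('district_data.csv', index=False) # Saving to CSV for later use in Tableau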
district_data
Out[4]:
school_dist Num of SAT Test Takers SAT Critical Reading Avg. Score SAT Math Avg. Score SAT Writing Avg. Score sat_score is_suppressed AP Test Takers Total Exams Taken Number of Exams with scores 3 4 or 5 NUMBER OF STUDENTS / SEATS FILLED NUMBER OF SECTIONS AVERAGE CLASS SIZE SIZE OF SMALLEST CLASS SIZE OF LARGEST CLASS schoolyear frl_percent total_enrollment grade9 grade10 grade11 grade12 ell_num ell_percent sped_num sped_percent ctt_num selfcontained_num asian_num asian_per black_num black_per hispanic_num hispanic_per white_num white_per male_num male_per female_num female_per Total Cohort Total Grads - % of cohort Total Regents - % of cohort Total Regents - % of grads Advanced Regents - % of cohort Advanced Regents - % of grads Regents w/o Advanced - % of cohort Regents w/o Advanced - % of grads Local - % of cohort Local - % of grads Still Enrolled - % of cohort Dropped Out - % of cohort grade_span_min grade_span_max postcode total_students start_time end_time number_programs Community Board Council District lat lon has_lang has_ap has_online_lang has_online_ap has_spanish has_french has_chinese has_russian rr_s rr_t rr_p N_s N_t N_p saf_p_11 com_p_11 eng_p_11 aca_p_11 saf_t_11 com_t_11 eng_t_11 aca_t_11 saf_s_11 com_s_11 eng_s_11 aca_s_11 saf_tot_11 com_tot_11 eng_tot_11 aca_tot_11 YTD % Attendance (Avg) YTD Enrollment(Avg) is_consort is_CTE is_allgirls is_intl is_consort_intl is_specialized is_allboys
0 1 73.333333 423.777778 468.444444 413.555556 1305.777778 0.000000 37.444444 52.555556 25.000000 96.341444 4.242394 22.438021 19.084220 25.581993 20112012.0 63.722222 557.222222 120.788457 117.000000 98.666667 100.926103 80.222222 16.422222 59.333333 13.277778 23.777778 11.333333 143.222222 21.388889 106.444444 24.333333 175.555556 40.244444 126.444444 13.255556 272.777778 48.577778 284.444444 51.422222 101.886193 65.130942 52.135483 74.430631 14.394217 19.026348 37.745055 55.399510 13.000901 25.588670 23.471221 9.189309 8.092340 12.0 10244.099368 697.007227 826.528455 333.370370 1.495935 4.260672 8.412352 40.719022 -73.982377 0.555556 0.555556 0.444444 0.111111 0.555556 0.222222 0.444444 0.000000 74.333333 88.666667 36.333333 412.806390 32.111111 210.111111 8.311111 7.711111 7.588889 7.977778 7.511111 6.711111 7.344444 7.966667 7.069528 6.262995 6.835479 7.469981 7.688889 6.955556 7.300000 7.833333 91.18 12367.0 0.111111 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
1 2 95.215385 421.584615 438.292308 416.707692 1276.584615 0.107692 57.800000 90.461538 61.261538 117.518143 4.617674 24.057153 20.407704 27.104752 20112012.0 63.734085 587.791556 168.652068 163.679870 141.849770 132.143762 66.172991 14.974137 69.735521 12.796629 33.063881 13.984510 102.799009 13.738872 143.916752 25.882137 268.596103 48.605874 65.752444 10.462198 267.564821 46.200168 320.226701 53.799819 159.481007 65.260013 47.308477 67.554182 12.139508 14.893343 35.170046 52.661096 17.964940 32.470529 22.565430 9.715520 8.690254 12.0 10185.284178 655.235772 825.242777 317.733333 1.497186 4.838650 7.381429 40.739540 -73.991099 0.738462 0.630769 0.123077 0.123077 0.676923 0.138462 0.076923 0.000000 79.600000 85.061538 37.553846 456.573192 31.507692 183.000000 8.296923 7.578462 7.458462 7.789231 7.444615 6.592308 7.206154 7.627692 7.057319 6.347184 6.804913 7.486613 7.593846 6.824615 7.152308 7.629231 89.01 60823.0 0.107692 0.092308 0.015385 0.015385 0.015385 0.015385 0.000000
2 3 83.647059 419.235294 422.941176 411.647059 1253.823529 0.235294 59.941176 95.588235 55.411765 123.533350 4.830710 22.370030 18.967461 25.351618 20112012.0 59.542092 594.732418 167.535065 162.423032 146.153971 140.667042 40.896732 9.948170 63.518170 14.257699 20.257888 15.941043 61.407974 8.195686 172.858170 34.708170 224.102745 45.628340 127.876993 10.443699 249.806667 47.012405 344.925621 52.987542 180.200594 62.733718 47.259706 69.996154 10.440779 12.987728 36.813394 57.011203 15.472511 30.008920 25.572306 8.886548 8.487578 12.0 10230.283756 733.829906 819.191774 308.326797 1.947394 7.288828 11.128546 40.781574 -73.977370 0.705882 0.588235 0.117647 0.000000 0.647059 0.294118 0.058824 0.000000 76.235294 78.352941 28.764706 425.352941 25.764706 168.710523 8.207187 7.497328 7.479142 7.750270 6.788235 6.041176 6.605882 7.052941 6.776471 6.188235 6.641176 7.335294 7.247059 6.564706 6.923529 7.394118 89.28 21962.0 0.058824 0.058824 0.000000 0.000000 0.000000 0.058824 0.000000
3 4 99.428571 393.142857 405.285714 392.714286 1191.142857 0.000000 55.285714 71.285714 41.428571 93.291096 3.764391 23.987716 20.732331 26.678992 20112012.0 70.342857 532.571429 133.285714 148.000000 119.428571 108.857143 23.857143 6.171429 59.428571 14.942857 26.714286 11.428571 59.000000 6.000000 139.285714 29.514286 321.000000 62.185714 8.857143 1.385714 210.285714 39.371429 322.285714 60.628571 113.755102 67.693878 44.028571 61.207143 11.591837 14.705102 32.444898 46.502041 23.669388 38.792857 19.652041 9.871429 8.493967 12.0 10129.423539 624.860240 832.321719 310.587302 1.260163 10.397431 10.033865 40.793572 -73.942534 0.857143 0.857143 0.142857 0.000000 0.857143 0.142857 0.142857 0.000000 88.857143 92.571429 46.000000 484.142857 31.714286 191.714286 8.185714 7.542857 7.400000 7.800000 7.514286 6.614286 7.014286 7.471429 6.728571 6.014286 6.557143 7.371429 7.471429 6.728571 6.971429 7.542857 91.13 14252.0 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000
4 5 63.600000 406.700000 410.000000 400.500000 1217.200000 0.100000 34.800000 42.200000 23.000000 90.196566 3.739961 23.214686 20.062172 26.188577 20112012.0 63.200000 533.400000 130.100000 121.600000 102.630876 93.433493 27.100000 5.820000 59.800000 11.810000 26.538409 18.699773 23.300000 5.230000 307.900000 53.450000 176.700000 35.830000 22.500000 4.870000 258.100000 46.720000 275.300000 53.280000 105.890386 63.161815 48.935012 70.744921 13.426567 16.826736 35.506504 53.919594 14.232075 29.258392 22.605269 10.098399 7.937330 12.0 10238.889431 658.506504 812.875610 327.033333 1.346341 8.634605 12.371117 40.817077 -73.949251 0.700000 0.600000 0.000000 0.000000 0.700000 0.300000 0.000000 0.000000 81.700000 79.200000 38.500000 387.300000 25.400000 165.900000 8.200000 7.470000 7.450000 7.700000 6.850000 6.100000 6.490000 7.080000 6.270000 5.920000 6.380000 7.210000 7.100000 6.510000 6.770000 7.340000 89.08 13170.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.100000 0.000000
5 6 66.818182 382.818182 396.000000 378.272727 1157.090909 0.090909 51.909091 73.545455 26.090909 117.696183 4.579324 24.731261 20.856773 28.028701 20112012.0 82.637778 629.859192 169.281465 157.017412 127.664432 122.060896 158.385859 28.501717 81.255354 14.870990 35.706198 20.726860 17.539596 1.593333 94.689899 13.939899 501.431515 82.480162 12.900808 1.349354 335.883030 53.400990 293.975960 46.598929 168.690270 65.100609 43.938340 64.956802 9.115303 12.377916 34.826848 52.582734 21.152389 35.035143 21.574151 10.324447 8.132524 12.0 10098.905888 641.911062 808.295639 325.010101 1.347376 10.778548 11.861283 40.848970 -73.932502 0.909091 0.818182 0.000000 0.090909 0.909091 0.090909 0.000000 0.000000 83.000000 80.818182 53.181818 450.454545 31.363636 244.909091 8.636364 8.127273 7.900000 8.181818 7.627273 6.800000 7.218182 7.772727 7.090909 6.227273 6.818182 7.600000 7.790909 7.054545 7.309091 7.872727 91.34 25733.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
6 7 49.941176 380.764706 385.411765 375.000000 1141.176471 0.117647 22.941176 35.470588 8.294118 83.143887 3.624158 22.618375 19.326739 25.450079 20112012.0 77.976471 434.764706 105.647059 108.470588 88.764706 88.235294 62.941176 14.517647 77.176471 16.205882 29.058824 27.588235 6.764706 1.494118 134.235294 31.305882 287.705882 65.711765 3.411765 0.805882 241.176471 51.535294 193.588235 48.464706 97.800534 56.699765 35.330119 54.655052 6.278709 8.772800 29.050324 45.885097 21.374092 45.353620 28.415565 10.918708 8.343004 12.0 10516.991711 537.887454 820.588714 314.026144 1.546150 2.889886 15.055778 40.816815 -73.919971 0.705882 0.647059 0.176471 0.176471 0.529412 0.176471 0.000000 0.000000 77.882353 79.235294 40.058824 332.588235 24.705882 157.117647 8.476471 7.817647 7.641176 7.976471 7.017647 6.282353 6.823529 7.341176 6.917647 6.323529 6.811765 7.547059 7.470588 6.800000 7.094118 7.617647 86.75 19717.0 0.117647 0.058824 0.000000 0.058824 0.000000 0.000000 0.000000
7 8 48.380952 387.761905 390.476190 381.571429 1159.809524 0.285714 27.666667 35.428571 10.428571 94.160941 3.866522 22.984436 19.775820 25.689651 20112012.0 75.572910 555.519153 189.304021 146.875384 128.564406 125.429027 56.975661 12.468466 108.124656 18.160085 44.084199 39.618831 30.517672 3.407302 160.389418 31.517989 332.023492 61.955407 28.943704 2.437418 282.163175 47.329608 273.355767 52.670307 162.621731 54.316886 33.923229 56.545056 5.541562 8.147207 28.381401 48.398495 20.398792 43.460466 29.796418 12.762213 8.364863 12.0 10566.224674 626.293973 815.476965 305.947090 1.455672 7.250292 18.947450 40.823004 -73.864576 0.619048 0.476190 0.142857 0.047619 0.619048 0.095238 0.047619 0.000000 67.173860 79.277102 34.702026 355.643572 29.833131 133.098995 8.148675 7.621647 7.497401 7.783552 7.122553 6.326920 6.882695 7.421044 6.654833 6.134948 6.678030 7.331412 7.298531 6.694813 7.004691 7.505218 87.15 31625.0 0.000000 0.000000 0.047619 0.000000 0.000000 0.000000 0.000000
8 9 52.818182 373.772727 381.090909 373.772727 1128.636364 0.045455 19.227273 24.363636 5.818182 80.463122 3.531670 23.223048 19.946652 26.492725 20112012.0 75.631818 426.318182 116.818182 104.136364 81.363636 76.106133 79.681818 20.786364 72.818182 17.350000 24.818182 26.863636 6.500000 1.400000 145.590909 33.690909 268.636364 63.568182 3.045455 0.709091 216.318182 51.513636 209.954545 48.477273 87.078510 66.398787 43.687812 64.303079 5.856128 8.060139 37.825057 56.237082 22.733184 35.721907 21.238717 9.254737 8.268888 12.0 10480.633161 479.001971 811.932003 327.555556 1.211013 3.843820 16.339732 40.836349 -73.906240 0.818182 0.590909 0.045455 0.045455 0.727273 0.227273 0.000000 0.000000 81.000000 86.045455 38.636364 332.772727 25.363636 152.181818 8.522727 7.950000 7.772727 8.095455 6.918182 6.350000 6.918182 7.354545 6.627273 6.145455 6.686364 7.481818 7.350000 6.818182 7.113636 7.663636 89.27 34518.0 0.000000 0.000000 0.000000 0.045455 0.000000 0.000000 0.045455
9 10 98.142857 395.785714 409.928571 391.035714 1196.750000 0.071429 91.714286 158.892857 107.392857 129.446808 5.199036 22.885003 18.959498 26.175094 20112012.0 73.083254 779.496508 213.864008 197.549395 157.379197 139.488209 130.803175 19.797778 100.950635 13.898635 30.255438 39.464042 103.816825 6.212619 174.756349 25.113492 432.660476 62.241556 63.636349 5.842349 393.443810 49.011492 386.052540 50.988444 166.921736 64.184826 47.296412 70.087588 13.485337 16.743680 33.809280 53.347286 16.905776 29.931955 21.222046 11.555591 8.348093 12.0 10519.492451 760.897503 810.268293 331.130952 1.568815 6.953289 14.372227 40.870345 -73.898360 0.714286 0.714286 0.071429 0.071429 0.642857 0.214286 0.035714 0.000000 78.166109 78.600684 36.776520 582.544911 41.053419 255.395675 8.236506 7.666235 7.508765 7.855521 7.052629 6.277333 6.776307 7.372926 6.654491 6.098820 6.686404 7.415351 7.291755 6.660395 6.978518 7.528913 88.92 56757.0 0.000000 0.000000 0.000000 0.035714 0.000000 0.071429 0.000000
10 11 66.894737 388.894737 394.052632 380.263158 1163.210526 0.105263 20.421053 25.894737 5.473684 97.396423 3.930534 24.014070 20.156705 27.446385 20112012.0 67.206901 571.310643 183.746959 161.809635 110.979869 100.140519 66.709942 12.680936 104.874620 17.408515 32.619378 40.841866 30.519532 4.550175 233.640936 40.435673 274.289123 50.013871 28.253567 4.225567 319.443509 55.232725 251.866901 44.767181 131.077772 70.193311 47.324192 65.194708 8.123932 10.432367 39.198686 54.764698 22.866668 34.807036 19.118901 8.205386 8.885845 12.0 10522.255741 607.530880 806.842533 328.233918 1.593924 10.480424 14.365696 40.873138 -73.856120 0.736842 0.684211 0.052632 0.105263 0.736842 0.105263 0.000000 0.000000 79.263158 77.421053 41.631579 376.157895 27.842105 182.894737 8.205263 7.878947 7.747368 7.984211 7.152632 6.947368 7.105263 7.636842 6.400000 5.889474 6.468421 7.363158 7.247368 6.905263 7.110526 7.657895 89.84 38230.0 0.000000 0.052632 0.000000 0.000000 0.000000 0.000000 0.000000
11 12 33.166667 368.555556 377.222222 361.944444 1107.722222 0.166667 5.111111 5.777778 0.000000 79.656523 3.532684 22.253377 19.155263 25.211129 20112012.0 78.450864 370.469506 108.177346 99.399530 85.406042 76.055821 82.235802 21.645494 63.989383 17.082272 20.465783 21.499874 10.996420 1.945926 112.532716 31.602160 238.041481 64.948988 7.383827 1.074605 181.539630 47.522827 188.929753 52.477123 102.839557 56.131267 37.736463 63.898083 4.674040 6.442377 33.063154 57.456664 18.410072 36.148708 27.292909 12.981459 8.485922 12.0 10550.766034 530.007227 818.750678 326.981481 1.329268 5.649561 18.745686 40.831412 -73.886946 0.611111 0.611111 0.111111 0.000000 0.611111 0.000000 0.000000 0.000000 73.166667 83.944444 40.833333 240.277778 21.888889 131.944444 8.555556 7.944444 7.772222 8.100000 7.222222 6.538889 7.033333 7.600000 6.950000 6.350000 6.938889 7.600000 7.566667 6.938889 7.250000 7.766667 87.33 23118.0 0.055556 0.000000 0.000000 0.055556 0.000000 0.000000 0.000000
12 13 122.000000 410.666667 414.333333 401.666667 1226.666667 0.111111 143.666667 237.555556 154.944444 151.051350 5.684546 23.043802 19.782463 25.839729 20112012.0 64.611111 706.500000 191.005339 179.388889 164.388889 160.796385 26.555556 6.744444 53.722222 11.661111 23.653788 19.555303 197.388889 7.833333 345.444444 72.200000 87.722222 15.888889 70.000000 2.877778 380.277778 50.605556 326.222222 49.394444 167.039078 63.127644 47.086836 66.735827 13.897516 16.233728 33.191855 50.495944 16.071524 33.299473 27.448472 7.280075 8.516046 12.0 11073.490214 860.950467 825.736676 334.586420 2.005872 3.772782 30.621405 40.692865 -73.977016 0.722222 0.611111 0.000000 0.111111 0.666667 0.277778 0.055556 0.000000 81.500000 81.888889 34.166667 585.000000 32.722222 244.666667 8.455556 7.833333 7.738889 8.005556 7.127778 6.605556 6.994444 7.533333 6.622222 6.188889 6.711111 7.450000 7.372222 6.850000 7.138889 7.644444 89.56 22785.0 0.000000 0.111111 0.055556 0.000000 0.055556 0.055556 0.000000
13 14 68.687500 395.500000 396.125000 385.250000 1176.875000 0.125000 19.687500 26.625000 2.375000 93.507076 3.775618 23.910637 20.247981 27.255405 20112012.0 74.675972 556.090694 164.818507 138.699471 111.581797 100.208433 56.515278 9.276181 94.488056 17.948806 39.274006 23.124858 24.433472 4.020417 224.349306 43.333681 280.796667 48.186361 22.869306 3.733931 312.607083 55.400681 243.483472 44.586764 98.013783 69.010221 48.789223 67.340068 8.521446 10.286709 40.270792 57.051107 20.234869 32.689824 19.910930 8.258135 8.557221 12.0 11150.183096 573.690210 815.156504 306.576389 1.977642 1.847752 32.217132 40.711599 -73.948360 0.875000 0.687500 0.187500 0.187500 0.812500 0.062500 0.000000 0.000000 77.687500 85.500000 32.250000 372.250000 30.687500 143.750000 8.343750 7.693750 7.612500 7.975000 7.362500 6.743750 7.306250 7.818750 6.775000 6.243750 6.787500 7.543750 7.493750 6.912500 7.256250 7.787500 89.41 20181.0 0.062500 0.062500 0.000000 0.000000 0.000000 0.062500 0.000000
14 15 49.846154 393.000000 397.692308 384.153846 1174.846154 0.153846 7.000000 8.153846 0.000000 89.690023 3.794926 22.160676 18.672783 25.303108 20112012.0 69.892308 439.384615 131.399402 120.230769 88.000000 83.564225 34.615385 6.538462 76.538462 15.346154 26.615385 24.384615 19.846154 4.215385 196.153846 47.423077 193.076923 41.800000 28.461538 6.023077 214.000000 49.892308 225.384615 50.107692 93.976156 49.516953 29.118686 52.765034 2.810722 4.263442 26.307767 48.505979 20.415313 47.250526 38.471486 8.726262 7.910082 12.0 11063.989160 634.314363 822.692933 308.034188 1.714196 6.009851 31.919095 40.675972 -73.989255 0.692308 0.538462 0.153846 0.230769 0.692308 0.153846 0.153846 0.000000 74.923077 80.538462 34.769231 375.865962 27.769231 140.615385 8.269231 7.715385 7.492308 7.992308 6.892308 6.246154 6.761538 7.376923 6.555827 6.143612 6.578409 7.394602 7.276923 6.707692 6.969231 7.615385 91.27 26786.0 0.153846 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
15 16 47.000000 370.600000 376.400000 364.600000 1111.600000 0.200000 14.200000 22.400000 0.000000 111.695505 4.624700 23.011849 18.777358 26.536763 20112012.0 64.640000 585.600000 118.800000 128.800000 152.600000 180.266986 16.400000 3.460000 115.600000 19.280000 27.000000 49.400000 4.400000 1.060000 495.800000 79.840000 76.000000 17.400000 4.000000 0.880000 327.600000 51.360000 258.000000 48.640000 228.226009 62.606518 45.912894 69.151550 7.882676 10.553897 38.030106 58.613781 16.699439 30.858389 24.561880 9.363888 9.000000 12.0 11221.800000 400.600000 815.000000 292.400000 1.600000 3.000000 38.000000 40.686497 -73.928188 1.000000 0.800000 0.200000 0.400000 1.000000 0.400000 0.000000 0.000000 70.000000 77.400000 16.400000 408.000000 29.000000 99.600000 7.620000 7.320000 7.140000 7.460000 5.580000 5.200000 5.940000 6.280000 5.860000 5.800000 6.300000 7.180000 6.260000 6.020000 6.400000 6.920000 85.55 10196.0 0.200000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
16 17 62.850000 386.550000 390.850000 375.600000 1153.000000 0.100000 37.700000 57.600000 8.900000 103.903095 4.190471 24.342899 20.161051 27.845455 20112012.0 69.515778 487.322556 133.109611 131.609577 101.580876 102.483493 43.962222 9.700944 53.490444 12.024044 24.219205 16.549886 18.346778 3.376333 398.529444 81.731944 55.237333 11.999089 11.695444 2.117144 216.885667 46.760544 270.436778 53.239411 117.482574 63.808920 45.293792 65.928508 7.040367 9.043677 38.246978 56.878821 18.516025 34.075862 26.395063 7.760450 8.114441 12.0 11096.591192 591.655420 807.063008 306.027778 1.655285 9.645504 33.859264 40.661319 -73.954519 0.700000 0.500000 0.050000 0.150000 0.650000 0.400000 0.050000 0.050000 76.182553 80.140957 38.987128 375.462876 26.524787 176.553945 8.001109 7.677729 7.617271 7.852729 6.873681 6.603266 7.086830 7.522096 6.266288 5.968348 6.560966 7.331491 7.053457 6.749553 7.084926 7.570479 89.67 26551.0 0.000000 0.100000 0.000000 0.050000 0.000000 0.000000 0.000000
17 18 39.214286 378.714286 374.214286 370.357143 1123.285714 0.000000 0.000000 0.000000 0.000000 54.285745 2.530983 21.534449 19.259438 23.966516 20112012.0 64.121429 305.571429 87.785714 93.142857 65.714286 58.928571 23.928571 7.221429 46.285714 14.957143 30.714286 6.642857 3.714286 1.178571 267.071429 87.392857 29.857143 9.771429 3.285714 1.085714 153.500000 51.742857 152.071429 48.257143 66.751571 51.795015 36.428790 61.320142 8.572779 11.083052 27.855865 50.237009 15.373840 38.692873 35.981968 9.785081 8.883807 12.0 11117.278165 435.718931 819.768293 325.523810 1.247387 15.310432 40.336512 40.641863 -73.914726 0.785714 0.571429 0.214286 0.357143 0.642857 0.428571 0.000000 0.000000 72.857143 82.714286 29.214286 194.428571 17.357143 79.000000 8.264286 7.742857 7.571429 7.978571 6.878571 6.285714 7.000000 7.457143 6.500000 6.200000 6.671429 7.400000 7.207143 6.742857 7.085714 7.628571 89.83 18641.0 0.000000 0.071429 0.000000 0.000000 0.000000 0.000000 0.000000
18 19 54.428571 371.928571 382.642857 364.142857 1118.714286 0.142857 20.928571 23.214286 4.785714 79.683736 3.332686 23.189496 19.913765 26.069705 20112012.0 70.915397 480.889365 140.149722 122.299395 101.522054 105.166781 63.017460 16.487063 72.486349 15.005778 29.313149 24.285552 22.066825 3.730476 261.042063 50.788492 182.910476 43.184413 11.707778 1.660206 273.622381 53.286492 207.266825 46.713444 142.420003 59.301244 38.512522 61.047436 6.657942 9.646424 31.855279 51.393577 20.791673 38.976847 26.307751 11.462065 8.708252 12.0 11138.709253 487.788811 804.607433 290.944444 1.903020 5.254574 37.033865 40.676547 -73.882158 0.857143 0.714286 0.357143 0.285714 0.857143 0.071429 0.000000 0.000000 69.428571 86.785714 31.357143 291.571429 27.785714 127.000000 7.657143 7.600000 7.400000 7.692857 6.685714 6.557143 6.900000 7.428571 6.350000 6.021429 6.607143 7.250000 6.900000 6.728571 6.964286 7.442857 87.81 25632.0 0.000000 0.214286 0.000000 0.000000 0.000000 0.000000 0.000000
19 20 298.833333 394.333333 466.500000 386.500000 1247.333333 0.166667 168.333333 266.666667 122.500000 218.885864 8.114801 22.854126 17.922480 26.025602 20112012.0 65.952593 2170.575185 645.016018 681.698589 436.769585 419.278309 515.374074 19.519815 299.801481 14.963481 101.397348 119.999621 722.989259 29.004444 192.598148 16.956481 684.457778 34.163630 563.818148 19.473815 1180.785556 45.685148 989.789259 54.314704 533.491913 60.690923 47.773830 76.665940 19.405984 30.430472 28.374927 46.233053 12.925085 23.356248 23.212949 13.921737 8.409628 12.0 11129.494128 2229.836947 848.542005 336.685185 3.470190 10.297003 39.872843 40.626751 -74.006191 0.833333 0.833333 0.000000 0.000000 0.833333 0.166667 0.500000 0.166667 68.500000 81.833333 21.666667 1170.500000 96.666667 503.833333 7.966667 7.200000 7.250000 7.616667 7.816667 7.566667 7.816667 8.200000 7.200000 6.366667 7.100000 7.650000 7.650000 7.050000 7.416667 7.816667 92.77 44214.0 0.000000 0.000000 0.166667 0.000000 0.000000 0.000000 0.000000
20 21 161.923077 396.692308 416.538462 387.923077 1201.153846 0.153846 65.000000 96.692308 37.846154 129.896696 5.116267 23.332248 19.847699 26.019050 20112012.0 64.101197 1050.880855 291.392008 318.476272 202.562212 211.974604 142.941880 15.655299 141.216068 13.798530 59.029545 47.461364 230.995043 15.540513 332.276068 34.626068 221.826667 26.052444 260.992991 23.357145 555.054872 56.254684 495.825812 43.745248 244.760187 59.177335 44.526942 71.661967 12.271824 18.247908 32.248101 53.414759 14.670591 28.360712 27.549957 10.885266 8.455041 12.0 11144.840734 1048.080258 806.038774 303.478632 3.049406 11.966464 43.036470 40.593596 -73.978465 0.846154 0.615385 0.000000 0.076923 0.846154 0.307692 0.230769 0.153846 78.665466 88.832242 35.210966 836.404424 58.115057 305.929145 7.940167 7.450353 7.372724 7.658045 7.082586 6.358871 6.948969 7.480147 6.517365 6.105150 6.578409 7.233064 7.182242 6.637774 6.969116 7.454583 90.50 34342.0 0.000000 0.076923 0.000000 0.076923 0.000000 0.000000 0.000000
21 22 393.600000 455.400000 480.600000 453.400000 1389.400000 0.000000 294.400000 464.000000 271.000000 263.231012 8.940673 26.438064 20.923430 29.482129 20112012.0 46.000000 2099.600000 554.200000 604.200000 453.000000 476.200000 194.400000 7.860000 183.400000 7.940000 76.600000 61.600000 452.400000 16.780000 736.200000 40.440000 277.800000 13.040000 624.600000 29.200000 1017.600000 46.080000 1082.000000 53.920000 580.000000 75.442857 62.197143 80.345714 27.771429 35.040000 34.417143 45.305714 13.294286 19.688571 15.142857 8.262857 8.891553 12.0 11123.592954 1873.604336 817.250407 284.622222 2.164228 11.556403 40.447411 40.618285 -73.952288 0.800000 0.600000 0.000000 0.000000 0.800000 0.400000 0.200000 0.200000 89.800000 84.200000 41.200000 1861.600000 91.000000 925.800000 8.120000 7.240000 7.440000 7.660000 7.860000 7.200000 7.480000 8.000000 6.760000 5.940000 6.700000 7.380000 7.600000 6.780000 7.200000 7.700000 92.57 36352.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
22 23 36.500000 358.666667 375.166667 351.333333 1085.166667 0.000000 14.500000 15.500000 0.000000 85.382105 3.628471 21.879838 18.452419 24.762065 20112012.0 69.683333 362.000000 90.833333 86.500000 55.833333 70.500000 6.166667 1.750000 45.833333 12.600000 15.000000 14.833333 4.166667 1.066667 310.000000 85.216667 44.000000 12.883333 2.000000 0.433333 162.833333 45.983333 199.166667 54.016667 77.222222 44.595833 28.761111 62.800000 4.841667 6.633333 23.920833 56.156944 15.823611 37.211111 43.084722 9.606944 8.228883 12.0 10972.482385 581.510840 834.292683 292.888889 1.577236 11.391008 31.618529 40.668586 -73.912298 0.500000 0.500000 0.000000 0.000000 0.500000 0.166667 0.000000 0.000000 74.166667 71.833333 32.166667 272.666667 18.666667 110.666667 8.350000 7.850000 7.783333 8.183333 6.833333 6.183333 6.916667 7.516667 6.883333 6.416667 7.116667 7.800000 7.366667 6.833333 7.266667 7.833333 86.98 11833.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
23 24 124.533333 405.200000 431.200000 401.066667 1237.466667 0.133333 58.000000 76.400000 28.200000 137.802570 5.799242 23.277269 19.544007 26.511753 20112012.0 62.614370 922.763407 257.339741 251.158872 216.774501 211.822329 162.082963 19.374593 91.320593 9.665393 37.543485 30.132879 167.329037 16.055111 83.372593 10.869259 563.049778 61.972119 105.593926 10.749526 528.647556 51.420726 394.115704 48.579215 219.586448 62.068522 48.250647 73.146707 11.331368 16.096378 36.920512 57.047768 13.825845 26.865177 26.842849 9.085423 8.794369 12.0 11142.128636 937.069557 820.833604 328.681481 2.176152 3.370936 25.231608 40.740621 -73.911518 0.866667 0.666667 0.133333 0.000000 0.800000 0.200000 0.200000 0.000000 81.666667 84.333333 38.133333 683.000000 44.333333 231.133333 8.546667 7.746667 7.626667 7.973333 7.706667 6.686667 7.333333 7.800000 7.313333 6.400000 6.986667 7.640000 7.860000 6.940000 7.313333 7.813333 92.21 52936.0 0.066667 0.133333 0.000000 0.066667 0.066667 0.000000 0.000000
24 25 140.000000 423.000000 458.545455 418.090909 1299.636364 0.000000 114.090909 149.818182 85.272727 170.122766 6.560779 25.214998 20.607206 29.085464 20112012.0 55.265051 1106.041010 347.926565 269.653776 234.391705 198.939539 197.658586 16.265354 113.255354 10.398263 38.125826 30.908884 356.448687 32.738788 218.144444 20.712626 403.613333 31.507434 121.173535 14.494808 549.519394 49.310081 556.521414 50.689838 227.016151 60.392776 49.142544 73.835318 16.609758 20.026361 32.543142 53.814782 11.247449 26.167694 28.465829 9.353090 8.033936 12.0 11187.808574 1147.915004 823.250554 304.212121 1.860310 7.667823 21.973743 40.745414 -73.815558 0.636364 0.636364 0.181818 0.000000 0.545455 0.272727 0.363636 0.000000 85.181818 82.000000 37.181818 800.636364 47.363636 290.454545 8.309091 7.681818 7.500000 7.800000 7.463636 6.945455 7.300000 7.772727 7.118182 6.327273 6.936364 7.581818 7.627273 6.990909 7.263636 7.736364 91.90 34371.0 0.000000 0.000000 0.000000 0.090909 0.000000 0.000000 0.000000
25 26 607.800000 445.200000 487.600000 444.800000 1377.600000 0.000000 384.800000 593.000000 361.400000 316.743254 10.672006 26.353720 20.303250 29.495985 20112012.0 43.800000 2991.600000 750.600000 778.000000 751.800000 711.200000 248.600000 7.480000 350.000000 11.960000 112.400000 132.400000 1259.200000 38.140000 676.400000 28.420000 604.800000 19.380000 432.400000 13.420000 1475.000000 49.220000 1516.600000 50.780000 799.476190 76.325238 64.186667 84.060952 28.884762 37.772857 35.311429 46.290952 12.154762 15.976190 15.244762 6.277143 9.000000 12.0 11388.600000 2837.400000 821.000000 337.800000 4.600000 11.800000 21.600000 40.748507 -73.759176 1.000000 1.000000 0.200000 0.200000 1.000000 0.600000 0.600000 0.000000 67.000000 83.400000 23.800000 1930.800000 129.800000 684.200000 7.720000 6.980000 7.220000 7.260000 7.000000 6.580000 6.840000 7.260000 6.760000 6.060000 6.660000 7.380000 7.140000 6.540000 6.900000 7.300000 93.34 31988.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
26 27 145.727273 405.636364 419.818182 390.636364 1216.090909 0.000000 44.545455 60.454545 16.727273 152.182301 5.751941 24.876535 20.841643 27.889288 20112012.0 61.718182 1106.000000 291.090909 276.727273 248.545455 193.181818 114.090909 7.227273 135.909091 12.845455 41.000000 49.454545 240.454545 15.536364 361.545455 40.400000 385.818182 32.554545 98.363636 10.345455 590.909091 52.863636 515.090909 47.136364 294.251825 63.514773 45.475672 66.606675 8.742242 11.789086 36.732063 54.823071 18.053664 33.404543 23.319646 10.610401 8.132524 12.0 11480.814979 1044.729244 791.022912 296.191919 2.438285 11.525638 29.748823 40.638828 -73.807823 0.909091 0.818182 0.363636 0.545455 0.909091 0.090909 0.000000 0.000000 74.090909 81.636364 37.454545 742.272727 55.454545 253.545455 7.645455 7.336364 7.327273 7.590909 6.827273 6.754545 7.063636 7.636364 6.281818 5.909091 6.436364 7.218182 6.881818 6.636364 6.909091 7.463636 89.88 46007.0 0.000000 0.090909 0.000000 0.000000 0.000000 0.000000 0.000000
27 28 157.357143 433.642857 452.428571 423.071429 1309.142857 0.214286 110.714286 164.071429 72.571429 153.625343 5.551129 25.415336 21.096636 28.554577 20112012.0 55.857143 1025.428571 271.714286 260.285714 242.115537 238.500342 81.571429 6.464286 104.857143 9.657143 36.785714 29.714286 337.357143 30.235714 303.714286 39.450000 238.714286 19.864286 132.785714 8.371429 503.142857 46.800000 522.285714 53.200000 314.067801 68.936758 53.568398 73.558366 21.436765 25.814280 32.132798 47.746450 15.370383 26.452127 20.267281 8.807537 8.065395 12.0 11323.280681 1163.360240 806.821719 297.373016 2.331591 8.397431 25.176722 40.709697 -73.805547 0.857143 0.785714 0.071429 0.071429 0.857143 0.285714 0.142857 0.000000 81.903647 81.844225 36.910182 785.589822 48.106839 316.719921 8.137298 7.525327 7.410387 7.753899 7.533830 6.876094 7.074043 7.717280 6.623268 6.026211 6.572808 7.387845 7.433511 6.806505 7.014179 7.629255 91.70 37009.0 0.000000 0.071429 0.071429 0.000000 0.000000 0.071429 0.000000
28 29 49.200000 392.700000 398.000000 382.400000 1173.100000 0.100000 12.000000 18.100000 0.800000 73.998430 2.789649 26.645084 23.522580 29.262070 20112012.0 50.750000 423.300000 134.800000 97.900000 89.630876 86.533493 13.700000 3.270000 57.900000 13.720000 21.200000 16.400000 20.800000 4.920000 355.800000 84.050000 37.900000 8.920000 4.000000 1.010000 221.700000 53.070000 201.600000 46.930000 85.577574 68.779062 51.182506 73.657639 6.652390 9.466046 44.531645 64.199202 17.595383 26.344017 20.313647 9.095807 8.291553 12.0 11276.092954 533.704336 815.250407 299.122222 1.364228 11.156403 27.247411 40.685276 -73.752740 0.800000 0.800000 0.200000 0.200000 0.800000 0.100000 0.000000 0.000000 78.300000 92.500000 43.000000 295.600000 21.200000 142.800000 7.660000 7.310000 7.260000 7.520000 7.280000 7.220000 7.450000 7.830000 5.960000 5.860000 6.340000 7.130000 6.970000 6.790000 7.020000 7.500000 92.14 27232.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
29 30 172.000000 430.333333 465.222222 429.111111 1324.666667 0.111111 114.222222 186.444444 64.000000 167.157553 6.383145 24.426389 19.581769 27.712779 20112012.0 54.211111 1221.555556 327.333333 347.444444 267.333333 247.037214 220.555556 16.133333 125.111111 7.988889 44.931566 49.777525 281.777778 24.866667 124.222222 10.444444 607.222222 43.522222 204.444444 20.788889 604.111111 44.777778 617.444444 55.222222 306.409952 69.621250 59.824352 83.596553 21.146265 27.987934 38.677147 55.606480 9.817782 16.427763 19.245670 9.184438 8.444444 12.0 11103.000000 1123.333333 820.444444 318.333333 2.555556 1.222222 25.111111 40.755398 -73.932306 1.000000 0.777778 0.000000 0.000000 1.000000 0.444444 0.333333 0.000000 89.000000 85.444444 42.333333 963.444444 63.111111 318.777778 8.222222 7.444444 7.433333 7.700000 7.500000 6.655556 6.911111 7.533333 7.033333 6.166667 6.844444 7.511111 7.577778 6.744444 7.077778 7.588889 92.79 39742.0 0.000000 0.111111 0.111111 0.111111 0.000000 0.000000 0.000000
30 31 252.083333 453.833333 462.916667 442.416667 1359.166667 0.000000 158.500000 246.666667 123.666667 198.536039 7.321520 25.297286 20.209668 28.780992 20112012.0 42.209630 1616.287593 473.174676 419.682628 346.442396 321.139155 60.937037 3.659907 261.567407 16.540074 63.365341 91.833144 157.911296 9.510556 280.215741 21.611574 377.895556 25.406815 792.825741 42.945241 827.142778 51.350907 789.144630 48.649019 372.476115 70.470064 55.883344 74.467891 24.279579 29.918014 31.600162 44.543907 14.591908 25.545584 18.834252 7.785472 8.864441 12.0 10376.910795 1668.253613 802.375339 281.685185 4.470190 2.630336 45.372843 40.595680 -74.125726 0.833333 0.833333 0.166667 0.083333 0.666667 0.166667 0.083333 0.083333 88.250000 92.750000 44.083333 1246.500000 76.916667 503.750000 7.958333 7.475000 7.491667 7.733333 7.450000 7.375000 7.616667 8.041667 6.733333 6.150000 6.866667 7.375000 7.391667 7.008333 7.316667 7.716667 90.98 59373.0 0.000000 0.083333 0.000000 0.000000 0.000000 0.083333 0.000000
31 32 52.285714 369.714286 376.000000 361.571429 1107.285714 0.000000 23.428571 29.571429 5.857143 85.597174 3.791643 21.971905 18.397676 25.292561 20112012.0 82.714286 415.285714 113.156587 138.142857 103.329822 114.762133 71.000000 16.628571 63.000000 14.757143 20.285714 27.142857 5.571429 1.542857 89.857143 22.128571 313.142857 74.657143 4.285714 1.085714 210.857143 50.642857 204.428571 49.357143 88.947279 59.155510 30.680000 46.764082 3.501224 5.047347 27.182857 41.711633 28.476531 53.235918 27.725306 11.400408 8.493967 12.0 11159.423539 437.288811 800.893148 325.730159 1.117305 4.397431 34.033865 40.696295 -73.917124 0.857143 0.857143 0.000000 0.000000 0.857143 0.142857 0.000000 0.000000 73.571429 76.285714 35.000000 298.857143 23.000000 135.428571 8.471429 8.057143 7.900000 8.214286 7.214286 6.228571 6.985714 7.557143 6.985714 6.100000 6.900000 7.571429 7.557143 6.785714 7.271429 7.785714 89.28 15297.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
32 75 6.285714 405.000000 417.428571 399.571429 1222.000000 0.857143 0.000000 0.000000 0.000000 117.140546 4.647296 23.595245 19.904134 26.687738 20112012.0 66.515556 710.451111 199.096110 189.191537 155.308756 147.334928 85.244444 13.118889 92.808889 14.080889 34.384091 29.997727 114.935556 9.426667 221.588889 38.638889 276.746667 43.481778 91.908889 7.642889 358.713333 49.510889 351.735556 50.488222 175.547166 62.915625 45.121491 66.876392 10.910809 14.105702 34.210494 52.770587 17.803826 33.140172 24.645990 9.779497 8.457766 12.0 10725.964770 772.021680 816.252033 315.111111 1.821138 6.782016 22.237057 40.735418 -73.928741 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 53.000000 77.857143 27.857143 149.000000 61.285714 102.000000 8.114286 7.685714 7.642857 7.757143 6.771429 6.942857 6.714286 7.271429 6.657143 6.371429 7.328571 7.400000 7.142857 6.985714 7.171429 7.457143 83.21 21435.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
33 79 6.333333 421.333333 394.333333 393.333333 1209.000000 0.333333 0.000000 0.000000 0.000000 117.140546 4.647296 23.595245 19.904134 26.687738 20112012.0 66.515556 710.451111 199.096110 189.191537 155.308756 147.334928 85.244444 13.118889 92.808889 14.080889 34.384091 29.997727 114.935556 9.426667 221.588889 38.638889 276.746667 43.481778 91.908889 7.642889 358.713333 49.510889 351.735556 50.488222 175.547166 62.915625 45.121491 66.876392 10.910809 14.105702 34.210494 52.770587 17.803826 33.140172 24.645990 9.779497 8.457766 12.0 10725.964770 772.021680 816.252033 315.111111 1.821138 6.782016 22.237057 40.775291 -73.900635 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 77.651064 82.819149 36.742553 516.257511 36.495745 213.078891 8.222175 7.654584 7.545416 7.854584 7.173617 6.565319 7.036596 7.541915 6.725751 6.166953 6.719313 7.429828 7.369149 6.791064 7.098511 7.609574 63.81 7288.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

For our first school district choropleth map we will compare SAT scores among districts. Because we may later want to compare a different feature (and creating a map from scratch each time is somewhat time-consuming) we will write a function that can be reused for any column of district_data.

In [5]:
# Takes any column of district_data and returns a district choropleth map with that column 
def show_district_map(col):
    geo_path = 'districts.geojson'
    districts = folium.Map(location=[full['lat'].mean(), full['lon'].mean()], zoom_start=10)
    choropleth = folium.features.Choropleth(
        geo_data=geo_path,
        data=district_data,
        columns=['school_dist', col],
        key_on='feature.properties.school_dist', # found in GeoJSON file
        fill_color='YlGn',
        fill_opacity=0.7,
        line_opacity=0.2,
        legend_name='Average {}'.format(col),
        highlight=True
    ).add_to(districts)
    dist_markers = district_data.iloc[:32] # to avoid physical markers for Districts 75 and 79
    for name, row in dist_markers.iterrows(): # adds circle marker with hover displaying district
        folium.CircleMarker(
            location=[row['lat'], row['lon']], 
            tooltip='{0}: {1}'.format('District', row['school_dist']),
            radius=2).add_to(districts)
    return districts

show_district_map('sat_score')
Out[5]:

Already we see a noticeable difference in SAT scores from district to district. Districts 9, 12, 16, 18, 19, 23 and 32 have an average SAT score between 1085 and 1136, while Districts 22, 26 and 31 average roughly 250 to 300 points higher (about 1359 to 1389). We will take a closer look at this below, but for now it is worth noting the strength of this map as a communication device. At a glance we can determine the lowest- and highest-scoring districts, where they lie geographically and a rough estimate of the spread between district scores. Few other visualizations convey this much easily digestible information, so it was well worth the time it took to fine-tune the map to our liking.
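
As a quick cross-check of what the map suggests, the districts can also be ranked numerically. The sketch below is illustrative only: `district_sample` is a hypothetical stand-in for `district_data`, holding a handful of district means copied from the table output above.

```python
import pandas as pd

# Mean SAT scores for a few districts, copied from the district_data output above.
# `district_sample` is a hypothetical name; the notebook's real frame is district_data.
district_sample = pd.DataFrame({
    'school_dist': [18, 19, 22, 23, 26, 31, 32],
    'sat_score': [1123.285714, 1118.714286, 1389.400000, 1085.166667,
                  1377.600000, 1359.166667, 1107.285714],
})

# Sort ascending so the lowest-scoring districts come first
ranked = district_sample.sort_values('sat_score')
lowest = ranked.head(3)
highest = ranked.tail(3)

# Difference between the average of the top three and the bottom three districts
gap = highest['sat_score'].mean() - lowest['sat_score'].mean()

print(ranked.to_string(index=False))
print('Gap between top-3 and bottom-3 means: {:.0f} points'.format(gap))
```

On the real data, the same pattern would be `district_data.sort_values('sat_score')`.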

Finding Relationships in Our Data

With our maps as reference points, we now want to dive a little deeper into the features of our data. We'll begin by taking a look at the shape of our SAT data. Our map shows district averages ranging from 1085 to 1389, but we know the range of individual school scores is much greater. When this data was compiled by the City of New York, the minimum score possible on the SAT was 600, while the maximum possible was 2400. As with all standardized tests, beyond a certain threshold higher scores become increasingly hard to achieve, so we are unlikely to see many schools with averages on the higher end. Thus, we can expect our distribution to be skewed right to some degree. Let's take a look with a simple histogram.


Outside of maps we'll use a combination of three plotting techniques for our visualizations moving forward, depending on our needs and which method makes the quickest or cleanest plot. We will use the Pandas built-in plotting method for "quick and dirty" plots, Seaborn for clean-looking plots requiring only a little customization, and Matplotlib for anything that needs a significant amount of tweaking. The Pandas method and Seaborn are both based on Matplotlib, but each sacrifices customizability for ease of creation to some degree.



To start, we import Matplotlib and Seaborn and enable our Jupyter notebook to show the plots as output. We can also set the style of our plots so that they all look similar regardless of the plotting technique we use. For our histogram, the Pandas plotting method will suffice.

In [6]:
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

plt.style.use('seaborn') # All plots we make will use this style (named 'seaborn-v0_8' in Matplotlib 3.6+)

full['sat_score'].hist(bins=20)
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x1eef372b5f8>

Just as expected, the majority of the schools had average SAT scores between 1000 and 1400, with a tail of higher-scoring schools giving us a right-skewed distribution. Having confirmed the distribution of SAT scores in our data, we will dig deeper into potential correlations.
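
A histogram shows skew visually; it can also be checked numerically. The following is a minimal sketch using made-up scores (not the notebook's data). A positive value from the Pandas `.skew()` method and a mean above the median both point to a right-skewed distribution.

```python
import pandas as pd

# Hypothetical school averages, chosen only to illustrate a right-skewed shape.
# In the notebook, the equivalent call would be full['sat_score'].skew().
scores = pd.Series(
    [1050, 1100, 1120, 1150, 1180, 1200, 1220, 1250, 1300, 1450, 1700, 1950],
    name='sat_score',
)

skewness = scores.skew()                            # > 0 suggests a longer right tail
mean_minus_median = scores.mean() - scores.median() # > 0 is typical of right skew

print('skewness: {:.2f}'.format(skewness))
print('mean - median: {:.1f}'.format(mean_minus_median))
```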

Location of School

We have already noticed a stark difference between certain school districts' average SAT scores from our maps. To get a more detailed view of the disparity among school districts we can use boxplots. Seaborn does an excellent job at making quick and polished boxplots, so we'll use it here.

In [7]:
plt.style.use('seaborn')
f, ax = plt.subplots(figsize=(16, 12))
ax = sns.boxplot(data=full, x='sat_score', y='District', showmeans=True, meanline=True) # Red dotted line indicates the mean
plt.show()

As with our map, we notice right away that the median SAT score (the solid black line inside each box) is higher for districts 22, 26 and 31 than for any others. We also see that the lowest scores for these districts are higher than the lowest scores for any of the other districts and that of the three, only District 31 has a high-scoring outlier school. We can say with certainty that a noticeable difference in district SAT performance exists. However, we still don't know if that difference is significant.

Before we attempt to answer that question, we will continue exploring correlations in location. Upon a closer look, we see that our three top-scoring districts are all from different boroughs: District 22 is in Brooklyn, District 26 is in Queens and District 31 is the district for all of Staten Island.

Does this mean that we won't see much of a difference in performance by borough? A boxplot by borough can help clear that up.

In [8]:
f, ax = plt.subplots(figsize=(16, 12))
ax = sns.boxplot(data=full, x='sat_score', y='borough', palette='muted', showmeans=True, meanline=True)
plt.show()

Interestingly, even with the three highest-scoring districts being spread across different boroughs, it appears the borough a school is located in may still be related in some way to its student body's performance on the SAT. This is likely influenced by a number of factors, such as outlier schools or the fact that Staten Island consists of only one district (District 31) and 13 schools. For the moment we can say that schools in Staten Island, Queens and Manhattan tend, on average, to have higher-scoring SAT test takers, but to gain a better understanding of the underlying factors we will have to explore the other features of our dataset.
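
Because a borough with few schools can be swayed by one or two of them, it helps to view the school count next to each borough's median and mean. This is a rough sketch on invented rows. The `toy` frame and its values are hypothetical, while the column names `borough` and `sat_score` match those used in the boxplot above.

```python
import pandas as pd

# Hypothetical toy rows; in the notebook this would be the `full` DataFrame.
toy = pd.DataFrame({
    'borough': ['Bronx', 'Bronx', 'Bronx', 'Queens', 'Queens',
                'Staten Island', 'Staten Island'],
    'sat_score': [1100, 1150, 1120, 1300, 1250, 1380, 1340],
})

# Count, median and mean per borough, highest median first
summary = (toy.groupby('borough')['sat_score']
              .agg(['count', 'median', 'mean'])
              .sort_values('median', ascending=False))

print(summary)
```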

Narrowing Down Numerical Features

To investigate what factors may be driving differences among boroughs and districts we will look for correlations between our sat_score column and the other numerical columns from the full dataset. The default correlation test in Pandas' .corr() method is the Pearson correlation coefficient (or Pearson's r). Pearson's r is a measure of the linear correlation between two data features.

Values of 1 or -1 indicate a perfect positive or negative linear relationship, respectively, while a value of 0 indicates no linear relationship. Generally speaking, values of $|r|$ at or above 0.6 are considered strong correlations and values below 0.3 weak correlations.
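
To make the definition concrete, Pearson's r is the covariance of the two features divided by the product of their standard deviations. The sketch below computes it by hand and compares the result with Pandas. The helper `pearson_r` and the sample values are hypothetical and serve only as illustration.

```python
import math
import pandas as pd

def pearson_r(x, y):
    """Pearson's r computed directly from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values: a lunch-eligibility percentage and a mean SAT score
frl = [20, 35, 50, 65, 80, 90]
sat = [1500, 1380, 1300, 1210, 1150, 1120]

r_manual = pearson_r(frl, sat)
r_pandas = pd.Series(frl).corr(pd.Series(sat))  # Pearson is the default method

print('manual: {:.4f}, pandas: {:.4f}'.format(r_manual, r_pandas))
```

Both approaches should agree up to floating-point error.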

Note: As with all statistical tests, Pearson's correlation coefficient rests on assumptions about the data. In particular, its significance tests assume approximately normally distributed data as well as homoscedasticity (homogeneity of variance), and because it is computed from means and variances, Pearson's r can be quite sensitive to outliers. Keeping this in mind, we will still run this test (it provides a good starting point for uncovering which features to investigate further) but we will not put too much stock in the accuracy of the r-values themselves.
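
The sensitivity to outliers is easy to demonstrate on a toy example. In the sketch below (invented numbers), a single extreme point pulls Pearson's r well below 1 even though the relationship stays perfectly monotonic. A rank-based correlation, computed here as Pearson's r on the ranks (which is how Spearman's rho is defined), is unaffected.

```python
import pandas as pd

# Hypothetical data: y equals x except for one extreme final value
x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y_clean = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y_outlier = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

pearson_clean = x.corr(y_clean)        # perfectly linear relationship
pearson_outlier = x.corr(y_outlier)    # same order, one extreme value

# Spearman's rho is Pearson's r applied to the ranks of the data
spearman_outlier = x.rank().corr(y_outlier.rank())

print('Pearson without outlier: {:.3f}'.format(pearson_clean))
print('Pearson with outlier:    {:.3f}'.format(pearson_outlier))
print('Spearman with outlier:   {:.3f}'.format(spearman_outlier))
```
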
In [9]:
# On pandas 2.0+, use full.corr(numeric_only=True) to skip non-numeric columns
full.corr()['sat_score'].sort_values()
Out[9]:
frl_percent                            -0.690782
Local - % of grads                     -0.487021
Dropped Out - % of cohort              -0.459963
Local - % of cohort                    -0.420838
sped_percent                           -0.401149
Still Enrolled - % of cohort           -0.373215
ell_percent                            -0.353428
hispanic_per                           -0.347408
black_per                              -0.298421
Regents w/o Advanced - % of grads      -0.216672
is_intl                                -0.175639
ell_num                                -0.129127
lon                                    -0.128337
lat                                    -0.121645
end_time                               -0.104474
male_per                               -0.096487
com_p_11                               -0.084269
Council District                       -0.081392
postcode                               -0.066905
Community Board                        -0.060518
is_consort_intl                        -0.057083
has_online_ap                          -0.050644
start_time                             -0.036989
grade_span_min                         -0.032146
is_allboys                             -0.020760
is_suppressed                          -0.000405
is_consort                              0.005243
is_allgirls                             0.011363
rr_t                                    0.011555
is_CTE                                  0.020591
ctt_num                                 0.026258
selfcontained_num                       0.026747
eng_p_11                                0.027884
aca_p_11                                0.029398
school_dist                             0.029547
black_num                               0.036775
eng_t_11                                0.041858
has_online_lang                         0.043416
hispanic_num                            0.048459
sped_num                                0.055018
Regents w/o Advanced - % of cohort      0.068012
com_tot_11                              0.077030
com_t_11                                0.082347
eng_tot_11                              0.084201
female_per                              0.096534
rr_p                                    0.102142
saf_p_11                                0.103706
number_programs                         0.112059
aca_t_11                                0.121250
YTD % Attendance (Avg)                  0.142879
has_russian                             0.145859
com_s_11                                0.147875
eng_s_11                                0.150950
aca_tot_11                              0.158354
has_french                              0.185955
has_lang                                0.190621
has_spanish                             0.201407
has_ap                                  0.205296
SIZE OF SMALLEST CLASS                  0.205947
YTD Enrollment(Avg)                     0.222860
saf_s_11                                0.245603
saf_tot_11                              0.252684
aca_s_11                                0.261800
rr_s                                    0.263214
saf_t_11                                0.272186
Total Cohort                            0.272337
N_t                                     0.292926
grade9                                  0.303409
grade10                                 0.316743
male_num                                0.333068
has_chinese                             0.346651
SIZE OF LARGEST CLASS                   0.349110
AVERAGE CLASS SIZE                      0.360053
total_enrollment                        0.371481
grade11                                 0.376710
grade12                                 0.388337
female_num                              0.388636
total_students                          0.389229
N_s                                     0.420773
N_p                                     0.426887
NUMBER OF SECTIONS                      0.450922
white_num                               0.454454
Num of SAT Test Takers                  0.469323
asian_num                               0.471302
Total Grads - % of cohort               0.473012
Total Regents - % of grads              0.486904
NUMBER OF STUDENTS / SEATS FILLED       0.491368
asian_per                               0.527458
Number of Exams with scores 3 4 or 5    0.548118
Total Exams Taken                       0.548238
AP Test Takers                          0.558501
is_specialized                          0.570065
Total Regents - % of cohort             0.609374
white_per                               0.630166
Advanced Regents - % of grads           0.722432
Advanced Regents - % of cohort          0.754615
SAT Math Avg. Score                     0.953010
SAT Critical Reading Avg. Score         0.974757
SAT Writing Avg. Score                  0.981016
sat_score                               1.000000
schoolyear                                   NaN
grade_span_max                               NaN
Name: sat_score, dtype: float64

Looking at the top positive and negative results, we see that our AP columns all have an r-value greater than 0.5. However, they only represent the raw number of tests/test takers and not a percentage. Columns showing 1) the percentage of AP test takers per school and 2) the percentage of AP exams at each school with scores of 3 or above will help account for size differences among schools and may tell a different story. We can add these new columns using the code below.

In [10]:
full['pct_AP'] = full['AP Test Takers '] / full['total_enrollment']
full['pct_3_plus'] = (full['Number of Exams with scores 3 4 or 5'] / full['Total Exams Taken']).fillna(0)
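
The `.fillna(0)` in the second line guards against schools with no recorded exams: dividing 0 by 0 in Pandas yields NaN rather than raising an error. A small sketch with a hypothetical three-school frame (column names as above, values invented) shows the effect.

```python
import pandas as pd

# Hypothetical schools; the middle one took no AP exams at all
toy = pd.DataFrame({
    'Total Exams Taken': [200, 0, 50],
    'Number of Exams with scores 3 4 or 5': [150, 0, 10],
})

# 0 / 0 produces NaN for the school without exams
ratio = toy['Number of Exams with scores 3 4 or 5'] / toy['Total Exams Taken']

# Treat "no exams taken" as a pass rate of zero, as the cell above does
toy['pct_3_plus'] = ratio.fillna(0)

print(toy)
```

Whether 0 is the right fill value is a judgment call: it treats a school with no exams the same as one where every exam scored below 3.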
In [11]:
full.corr()['sat_score'].sort_values().tail(20)
Out[11]:
Total Grads - % of cohort               0.473012
Total Regents - % of grads              0.486904
NUMBER OF STUDENTS / SEATS FILLED       0.491368
pct_3_plus                              0.519764
asian_per                               0.527458
Number of Exams with scores 3 4 or 5    0.548118
Total Exams Taken                       0.548238
AP Test Takers                          0.558501
is_specialized                          0.570065
pct_AP                                  0.585918
Total Regents - % of cohort             0.609374
white_per                               0.630166
Advanced Regents - % of grads           0.722432
Advanced Regents - % of cohort          0.754615
SAT Math Avg. Score                     0.953010
SAT Critical Reading Avg. Score         0.974757
SAT Writing Avg. Score                  0.981016
sat_score                               1.000000
schoolyear                                   NaN
grade_span_max                               NaN
Name: sat_score, dtype: float64

What if we want to know more than the r-value? A particularly useful Python library for more extensive statistical analysis is Pingouin. Pingouin has many of the statistical tests found in SciPy and Statsmodels, but also seeks to bring to Python statistical tests that were previously available only in R or Matlab. It is also built to work directly with Pandas DataFrames, which makes it generally more straightforward to use than SciPy. We will import it using the code below.

In [12]:
import pingouin as pg

Running a correlation test in Pingouin returns several other useful figures besides the r value. In particular, Pingouin returns the p-value (p-unc) and Bayes Factor (BF10) for each pair of features tested. Both help us judge how much evidence there is for a correlation. Generally, we reject the null hypothesis (that the two features are uncorrelated) when the p-value is less than 0.05. For the Bayes Factor, the larger the value, the stronger the evidence for a correlation over the null. The 95% confidence interval (CI95%) and Fisher's z (z) are also returned by default. Pingouin also makes it effortless to switch between correlation tests (Pearson's r, Spearman's rho, etc.), which we will explore later. For now, we will run the same basic correlation in Pingouin (Pearson method) and explore the DataFrame it outputs.
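
To see how some of these columns relate to r, the sketch below rebuilds r2, Fisher's z and an approximate 95% confidence interval for the frl_percent correlation using only the standard library. It assumes the usual Fisher transformation with a normal critical value of 1.96. Pingouin's exact internals may differ, so small rounding differences from its output are to be expected.

```python
import math

r = -0.690782   # sat_score vs. frl_percent, from the .corr() output above
n = 478         # number of schools, as reported in Pingouin's n column

r2 = r ** 2                    # share of variance explained
z = math.atanh(r)              # Fisher's z transformation of r
se = 1 / math.sqrt(n - 3)      # approximate standard error of z
z_crit = 1.96                  # assumed normal critical value for a 95% interval

# Build the interval on the z scale, then transform back to the r scale
ci_low = math.tanh(z - z_crit * se)
ci_high = math.tanh(z + z_crit * se)

print('r2 = {:.3f}, z = {:.3f}, CI95% = [{:.2f}, {:.2f}]'.format(r2, z, ci_low, ci_high))
```
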

In [13]:
corr = pg.pairwise_corr(full, columns=['sat_score'])
corr.sort_values(by=['r']).reset_index()
Out[13]:
index X Y method tail n r CI95% r2 adj_r2 z p-unc BF10 power
0 13 sat_score frl_percent pearson two-sided 478 -0.691 [-0.73, -0.64] 0.477 0.475 -0.850 4.902523e-69 3.24e+65 1.000
1 46 sat_score Local - % of grads pearson two-sided 478 -0.487 [-0.55, -0.42] 0.237 0.234 -0.532 7.721751e-30 4.229e+26 1.000
2 48 sat_score Dropped Out - % of cohort pearson two-sided 478 -0.460 [-0.53, -0.39] 0.212 0.208 -0.497 2.123609e-26 1.681e+23 1.000
3 45 sat_score Local - % of cohort pearson two-sided 478 -0.421 [-0.49, -0.34] 0.177 0.174 -0.449 6.118468e-22 6.641e+18 1.000
4 22 sat_score sped_percent pearson two-sided 478 -0.401 [-0.47, -0.32] 0.161 0.157 -0.425 6.608265e-20 6.569e+16 1.000
5 47 sat_score Still Enrolled - % of cohort pearson two-sided 478 -0.373 [-0.45, -0.29] 0.139 0.136 -0.392 3.030167e-17 1.576e+14 1.000
6 20 sat_score ell_percent pearson two-sided 478 -0.353 [-0.43, -0.27] 0.125 0.121 -0.369 1.647484e-15 3.107e+12 1.000
7 30 sat_score hispanic_per pearson two-sided 478 -0.347 [-0.42, -0.27] 0.121 0.117 -0.362 5.262691e-15 9.936e+11 1.000
8 28 sat_score black_per pearson two-sided 478 -0.298 [-0.38, -0.21] 0.089 0.085 -0.307 2.744013e-11 2.285e+08 1.000
9 44 sat_score Regents w/o Advanced - % of grads pearson two-sided 478 -0.217 [-0.3, -0.13] 0.047 0.043 -0.221 1.738548e-06 5099.587 0.998
10 95 sat_score is_intl pearson two-sided 478 -0.176 [-0.26, -0.09] 0.031 0.027 -0.178 1.133784e-04 96.264 0.972
11 19 sat_score ell_num pearson two-sided 478 -0.129 [-0.22, -0.04] 0.017 0.013 -0.130 4.690113e-03 3.082 0.809
12 58 sat_score lon pearson two-sided 478 -0.128 [-0.22, -0.04] 0.016 0.012 -0.129 4.952031e-03 2.935 0.804
13 57 sat_score lat pearson two-sided 478 -0.122 [-0.21, -0.03] 0.015 0.011 -0.123 7.756702e-03 1.962 0.760
14 53 sat_score end_time pearson two-sided 478 -0.104 [-0.19, -0.01] 0.011 0.007 -0.104 2.234739e-02 0.772 0.628
15 34 sat_score male_per pearson two-sided 478 -0.096 [-0.18, -0.01] 0.009 0.005 -0.096 3.495175e-02 0.526 0.560
16 74 sat_score com_p_11 pearson two-sided 478 -0.084 [-0.17, 0.01] 0.007 0.003 -0.084 6.564555e-02 0.31 0.453
17 56 sat_score Council District pearson two-sided 478 -0.081 [-0.17, 0.01] 0.007 0.002 -0.081 7.543887e-02 0.277 0.428
18 50 sat_score postcode pearson two-sided 478 -0.067 [-0.16, 0.02] 0.004 0.000 -0.067 1.441342e-01 0.166 0.309
19 55 sat_score Community Board pearson two-sided 478 -0.061 [-0.15, 0.03] 0.004 -0.001 -0.061 1.865452e-01 0.137 0.262
20 96 sat_score is_consort_intl pearson two-sided 478 -0.057 [-0.15, 0.03] 0.003 -0.001 -0.057 2.128526e-01 0.124 0.238
21 62 sat_score has_online_ap pearson two-sided 478 -0.051 [-0.14, 0.04] 0.003 -0.002 -0.051 2.691357e-01 0.105 0.198
22 52 sat_score start_time pearson two-sided 478 -0.037 [-0.13, 0.05] 0.001 -0.003 -0.037 4.197432e-01 0.079 0.127
23 49 sat_score grade_span_min pearson two-sided 478 -0.032 [-0.12, 0.06] 0.001 -0.003 -0.032 4.832028e-01 0.073 0.108
24 98 sat_score is_allboys pearson two-sided 478 -0.021 [-0.11, 0.07] 0.000 -0.004 -0.021 6.507327e-01 0.063 0.074
25 4 sat_score is_suppressed pearson two-sided 478 -0.000 [-0.09, 0.09] 0.000 -0.004 -0.000 9.929599e-01 0.057 0.050
26 92 sat_score is_consort pearson two-sided 478 0.005 [-0.08, 0.09] 0.000 -0.004 0.005 9.089708e-01 0.058 0.051
27 94 sat_score is_allgirls pearson two-sided 478 0.011 [-0.08, 0.1] 0.000 -0.004 0.011 8.043059e-01 0.059 0.057
28 68 sat_score rr_t pearson two-sided 478 0.012 [-0.08, 0.1] 0.000 -0.004 0.012 8.010521e-01 0.059 0.057
29 93 sat_score is_CTE pearson two-sided 478 0.021 [-0.07, 0.11] 0.000 -0.004 0.021 6.533932e-01 0.063 0.073
30 23 sat_score ctt_num pearson two-sided 478 0.026 [-0.06, 0.12] 0.001 -0.004 0.026 5.668687e-01 0.067 0.088
31 24 sat_score selfcontained_num pearson two-sided 478 0.027 [-0.06, 0.12] 0.001 -0.003 0.027 5.596542e-01 0.068 0.090
32 75 sat_score eng_p_11 pearson two-sided 478 0.028 [-0.06, 0.12] 0.001 -0.003 0.028 5.430815e-01 0.069 0.093
33 76 sat_score aca_p_11 pearson two-sided 478 0.029 [-0.06, 0.12] 0.001 -0.003 0.029 5.214043e-01 0.07 0.098
34 89 sat_score school_dist pearson two-sided 478 0.030 [-0.06, 0.12] 0.001 -0.003 0.030 5.192934e-01 0.07 0.099
35 27 sat_score black_num pearson two-sided 478 0.037 [-0.05, 0.13] 0.001 -0.003 0.037 4.224462e-01 0.079 0.126
36 79 sat_score eng_t_11 pearson two-sided 478 0.042 [-0.05, 0.13] 0.002 -0.002 0.042 3.611648e-01 0.087 0.150
37 61 sat_score has_online_lang pearson two-sided 478 0.043 [-0.05, 0.13] 0.002 -0.002 0.043 3.435536e-01 0.09 0.157
38 29 sat_score hispanic_num pearson two-sided 478 0.048 [-0.04, 0.14] 0.002 -0.002 0.048 2.903709e-01 0.1 0.185
39 21 sat_score sped_num pearson two-sided 478 0.055 [-0.03, 0.14] 0.003 -0.001 0.055 2.298976e-01 0.118 0.225
40 43 sat_score Regents w/o Advanced - % of cohort pearson two-sided 478 0.068 [-0.02, 0.16] 0.005 0.000 0.068 1.375980e-01 0.172 0.318
41 86 sat_score com_tot_11 pearson two-sided 478 0.077 [-0.01, 0.17] 0.006 0.002 0.077 9.252535e-02 0.235 0.391
42 78 sat_score com_t_11 pearson two-sided 478 0.082 [-0.01, 0.17] 0.007 0.003 0.082 7.206481e-02 0.287 0.437
43 87 sat_score eng_tot_11 pearson two-sided 478 0.084 [-0.01, 0.17] 0.007 0.003 0.084 6.586354e-02 0.309 0.453
44 36 sat_score female_per pearson two-sided 478 0.097 [0.01, 0.18] 0.009 0.005 0.097 3.486299e-02 0.527 0.561
45 69 sat_score rr_p pearson two-sided 478 0.102 [0.01, 0.19] 0.010 0.006 0.102 2.553892e-02 0.688 0.609
46 73 sat_score saf_p_11 pearson two-sided 478 0.104 [0.01, 0.19] 0.011 0.007 0.104 2.335779e-02 0.743 0.622
47 54 sat_score number_programs pearson two-sided 478 0.112 [0.02, 0.2] 0.013 0.008 0.112 1.423512e-02 1.145 0.690
48 80 sat_score aca_t_11 pearson two-sided 478 0.121 [0.03, 0.21] 0.015 0.011 0.122 7.959570e-03 1.917 0.757
49 90 sat_score YTD % Attendance (Avg) pearson two-sided 478 0.143 [0.05, 0.23] 0.020 0.016 0.144 1.737939e-03 7.606 0.881
50 66 sat_score has_russian pearson two-sided 478 0.146 [0.06, 0.23] 0.021 0.017 0.147 1.385193e-03 9.367 0.893
51 82 sat_score com_s_11 pearson two-sided 478 0.148 [0.06, 0.23] 0.022 0.018 0.149 1.185296e-03 10.812 0.901
52 83 sat_score eng_s_11 pearson two-sided 478 0.151 [0.06, 0.24] 0.023 0.019 0.152 9.309770e-04 13.51 0.913
53 88 sat_score aca_tot_11 pearson two-sided 478 0.158 [0.07, 0.24] 0.025 0.021 0.159 5.108240e-04 23.559 0.936
54 64 sat_score has_french pearson two-sided 478 0.186 [0.1, 0.27] 0.035 0.031 0.188 4.301269e-05 240.125 0.984
55 59 sat_score has_lang pearson two-sided 478 0.191 [0.1, 0.28] 0.036 0.032 0.193 2.726261e-05 369.769 0.988
56 63 sat_score has_spanish pearson two-sided 478 0.201 [0.11, 0.29] 0.041 0.037 0.204 9.103959e-06 1048.519 0.994
57 60 sat_score has_ap pearson two-sided 478 0.205 [0.12, 0.29] 0.042 0.038 0.208 6.040392e-06 1550.291 0.995
58 11 sat_score SIZE OF SMALLEST CLASS pearson two-sided 478 0.206 [0.12, 0.29] 0.042 0.038 0.209 5.635480e-06 1656.399 0.995
59 91 sat_score YTD Enrollment(Avg) pearson two-sided 478 0.223 [0.14, 0.31] 0.050 0.046 0.227 8.579268e-07 1.004e+04 0.999
60 81 sat_score saf_s_11 pearson two-sided 478 0.246 [0.16, 0.33] 0.060 0.056 0.251 5.353924e-08 1.453e+05 1.000
61 85 sat_score saf_tot_11 pearson two-sided 478 0.253 [0.17, 0.33] 0.064 0.060 0.259 2.129055e-08 3.544e+05 1.000
62 84 sat_score aca_s_11 pearson two-sided 478 0.262 [0.18, 0.34] 0.069 0.065 0.268 6.230939e-09 1.165e+06 1.000
63 67 sat_score rr_s pearson two-sided 478 0.263 [0.18, 0.34] 0.069 0.065 0.269 5.127383e-09 1.408e+06 1.000
64 77 sat_score saf_t_11 pearson two-sided 478 0.272 [0.19, 0.35] 0.074 0.070 0.279 1.449571e-09 4.798e+06 1.000
65 37 sat_score Total Cohort pearson two-sided 478 0.272 [0.19, 0.35] 0.074 0.070 0.279 1.418563e-09 4.9e+06 1.000
66 71 sat_score N_t pearson two-sided 478 0.293 [0.21, 0.37] 0.086 0.082 0.302 6.518324e-11 9.825e+07 1.000
67 15 sat_score grade9 pearson two-sided 478 0.303 [0.22, 0.38] 0.092 0.088 0.313 1.230973e-11 4.996e+08 1.000
68 16 sat_score grade10 pearson two-sided 478 0.317 [0.23, 0.4] 0.100 0.097 0.328 1.338570e-12 4.369e+09 1.000
69 33 sat_score male_num pearson two-sided 478 0.333 [0.25, 0.41] 0.111 0.107 0.346 7.579638e-14 7.265e+10 1.000
70 65 sat_score has_chinese pearson two-sided 478 0.347 [0.27, 0.42] 0.120 0.116 0.362 6.079998e-15 8.624e+11 1.000
71 12 sat_score SIZE OF LARGEST CLASS pearson two-sided 478 0.349 [0.27, 0.43] 0.122 0.118 0.364 3.799283e-15 1.368e+12 1.000
72 10 sat_score AVERAGE CLASS SIZE pearson two-sided 478 0.360 [0.28, 0.44] 0.130 0.126 0.377 4.458527e-16 1.121e+13 1.000
73 14 sat_score total_enrollment pearson two-sided 478 0.371 [0.29, 0.45] 0.138 0.134 0.390 4.349482e-17 1.105e+14 1.000
74 17 sat_score grade11 pearson two-sided 478 0.377 [0.3, 0.45] 0.142 0.138 0.397 1.453185e-17 3.247e+14 1.000
75 18 sat_score grade12 pearson two-sided 478 0.388 [0.31, 0.46] 0.151 0.147 0.409 1.181484e-18 3.837e+15 1.000
76 35 sat_score female_num pearson two-sided 478 0.389 [0.31, 0.46] 0.151 0.147 0.411 1.106221e-18 4.094e+15 1.000
77 51 sat_score total_students pearson two-sided 478 0.389 [0.31, 0.46] 0.151 0.148 0.411 9.705319e-19 4.657e+15 1.000
78 70 sat_score N_s pearson two-sided 478 0.421 [0.34, 0.49] 0.177 0.174 0.449 6.218289e-22 6.536e+18 1.000
79 72 sat_score N_p pearson two-sided 478 0.427 [0.35, 0.5] 0.182 0.179 0.456 1.363089e-22 2.921e+19 1.000
80 9 sat_score NUMBER OF SECTIONS pearson two-sided 478 0.451 [0.38, 0.52] 0.203 0.200 0.486 2.567645e-25 1.432e+22 1.000
81 31 sat_score white_num pearson two-sided 478 0.454 [0.38, 0.52] 0.207 0.203 0.490 9.782708e-26 3.716e+22 1.000
82 0 sat_score Num of SAT Test Takers pearson two-sided 478 0.469 [0.4, 0.54] 0.220 0.217 0.509 1.485238e-27 2.33e+24 1.000
83 25 sat_score asian_num pearson two-sided 478 0.471 [0.4, 0.54] 0.222 0.219 0.511 8.373964e-28 4.107e+24 1.000
84 38 sat_score Total Grads - % of cohort pearson two-sided 478 0.473 [0.4, 0.54] 0.224 0.220 0.514 5.089191e-28 6.719e+24 1.000
85 40 sat_score Total Regents - % of grads pearson two-sided 478 0.487 [0.42, 0.55] 0.237 0.234 0.532 8.002970e-30 4.082e+26 1.000
86 8 sat_score NUMBER OF STUDENTS / SEATS FILLED pearson two-sided 478 0.491 [0.42, 0.56] 0.241 0.238 0.537 2.022669e-30 1.592e+27 1.000
87 100 sat_score pct_3_plus pearson two-sided 478 0.520 [0.45, 0.58] 0.270 0.267 0.576 1.965914e-34 1.491e+31 1.000
88 26 sat_score asian_per pearson two-sided 478 0.527 [0.46, 0.59] 0.278 0.275 0.586 1.379931e-35 2.071e+32 1.000
89 7 sat_score Number of Exams with scores 3 4 or 5 pearson two-sided 478 0.548 [0.48, 0.61] 0.300 0.297 0.616 7.784329e-39 3.427e+35 1.000
90 6 sat_score Total Exams Taken pearson two-sided 478 0.548 [0.48, 0.61] 0.301 0.298 0.616 7.442607e-39 3.583e+35 1.000
91 5 sat_score AP Test Takers pearson two-sided 478 0.559 [0.49, 0.62] 0.312 0.309 0.631 1.483977e-40 1.736e+37 1.000
92 97 sat_score is_specialized pearson two-sided 478 0.570 [0.51, 0.63] 0.325 0.322 0.648 1.525280e-42 1.624e+39 1.000
93 99 sat_score pct_AP pearson two-sided 478 0.586 [0.52, 0.64] 0.343 0.341 0.672 2.122086e-45 1.105e+42 1.000
94 39 sat_score Total Regents - % of cohort pearson two-sided 478 0.609 [0.55, 0.66] 0.371 0.369 0.707 6.308734e-50 3.425e+46 1.000
95 32 sat_score white_per pearson two-sided 478 0.630 [0.57, 0.68] 0.397 0.395 0.741 2.876059e-54 6.971e+50 1.000
96 42 sat_score Advanced Regents - % of grads pearson two-sided 478 0.722 [0.68, 0.76] 0.522 0.520 0.912 2.673511e-78 5.2e+74 1.000
97 41 sat_score Advanced Regents - % of cohort pearson two-sided 478 0.755 [0.71, 0.79] 0.569 0.568 0.984 3.835073e-89 3.128e+85 1.000
98 2 sat_score SAT Math Avg. Score pearson two-sided 478 0.953 [0.94, 0.96] 0.908 0.908 1.863 5.125499e-249 nan 1.000
99 1 sat_score SAT Critical Reading Avg. Score pearson two-sided 478 0.975 [0.97, 0.98] 0.950 0.950 2.185 4.111075e-312 nan 1.000
100 3 sat_score SAT Writing Avg. Score pearson two-sided 478 0.981 [0.98, 0.98] 0.962 0.962 2.323 0.000000e+00 nan 1.000

With so many different features, it can be difficult to select which ones to analyze further. However, our new table shows that among the negative correlations the Bayes Factor drops from around 96 (is_intl) to about 3 at the twelfth feature (ell_num), which makes this as good a cut-off point as any. Below we will separate the features with the 12 strongest positive and negative r scores into their own DataFrames and combine them into a DataFrame named top_corrs.
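As an alternative to a fixed count, the cut-off could be expressed as a threshold on the Bayes Factor itself. The sketch below is only illustrative: it rebuilds five rows of the table above by hand, assumes BF10 may be stored as text (hence the cast to float), and uses 10 as the threshold, which is a common convention for "strong" evidence rather than a value chosen anywhere in this notebook.

```python
import pandas as pd

# Five rows copied by hand from the correlation table above (not recomputed).
sample = pd.DataFrame({
    'Y': ['black_per', 'Regents w/o Advanced - % of grads', 'is_intl', 'ell_num', 'lon'],
    'r': [-0.298, -0.217, -0.176, -0.129, -0.128],
    'BF10': ['2.285e+08', '5099.587', '96.264', '3.082', '2.935'],
})

# Assumption: BF10 may be stored as text, so cast before comparing.
sample['BF10'] = sample['BF10'].astype(float)

BF_THRESHOLD = 10  # assumed cut-off, not a value used elsewhere in this notebook
strong = sample[sample['BF10'] > BF_THRESHOLD]
print(strong['Y'].tolist())
```

With this threshold is_intl is kept and ell_num is dropped, so the selection would stop one feature earlier than a fixed count of 12.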

In [14]:
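# Rows 86-97 of the sorted table: the 12 strongest positive correlations,
# skipping the three SAT section scores that occupy the last three rows.
pos_corr = corr.sort_values(by=['r']).iloc[86:98].reset_index()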
pos_corr
Out[14]:
index X Y method tail n r CI95% r2 adj_r2 z p-unc BF10 power
0 8 sat_score NUMBER OF STUDENTS / SEATS FILLED pearson two-sided 478 0.491 [0.42, 0.56] 0.241 0.238 0.537 2.022669e-30 1.592e+27 1.0
1 100 sat_score pct_3_plus pearson two-sided 478 0.520 [0.45, 0.58] 0.270 0.267 0.576 1.965914e-34 1.491e+31 1.0
2 26 sat_score asian_per pearson two-sided 478 0.527 [0.46, 0.59] 0.278 0.275 0.586 1.379931e-35 2.071e+32 1.0
3 7 sat_score Number of Exams with scores 3 4 or 5 pearson two-sided 478 0.548 [0.48, 0.61] 0.300 0.297 0.616 7.784329e-39 3.427e+35 1.0
4 6 sat_score Total Exams Taken pearson two-sided 478 0.548 [0.48, 0.61] 0.301 0.298 0.616 7.442607e-39 3.583e+35 1.0
5 5 sat_score AP Test Takers pearson two-sided 478 0.559 [0.49, 0.62] 0.312 0.309 0.631 1.483977e-40 1.736e+37 1.0
6 97 sat_score is_specialized pearson two-sided 478 0.570 [0.51, 0.63] 0.325 0.322 0.648 1.525280e-42 1.624e+39 1.0
7 99 sat_score pct_AP pearson two-sided 478 0.586 [0.52, 0.64] 0.343 0.341 0.672 2.122086e-45 1.105e+42 1.0
8 39 sat_score Total Regents - % of cohort pearson two-sided 478 0.609 [0.55, 0.66] 0.371 0.369 0.707 6.308734e-50 3.425e+46 1.0
9 32 sat_score white_per pearson two-sided 478 0.630 [0.57, 0.68] 0.397 0.395 0.741 2.876059e-54 6.971e+50 1.0
10 42 sat_score Advanced Regents - % of grads pearson two-sided 478 0.722 [0.68, 0.76] 0.522 0.520 0.912 2.673511e-78 5.2e+74 1.0
11 41 sat_score Advanced Regents - % of cohort pearson two-sided 478 0.755 [0.71, 0.79] 0.569 0.568 0.984 3.835073e-89 3.128e+85 1.0
In [15]:
neg_corr = corr.sort_values(by=['r']).head(12).reset_index()
neg_corr
Out[15]:
index X Y method tail n r CI95% r2 adj_r2 z p-unc BF10 power
0 13 sat_score frl_percent pearson two-sided 478 -0.691 [-0.73, -0.64] 0.477 0.475 -0.850 4.902523e-69 3.24e+65 1.000
1 46 sat_score Local - % of grads pearson two-sided 478 -0.487 [-0.55, -0.42] 0.237 0.234 -0.532 7.721751e-30 4.229e+26 1.000
2 48 sat_score Dropped Out - % of cohort pearson two-sided 478 -0.460 [-0.53, -0.39] 0.212 0.208 -0.497 2.123609e-26 1.681e+23 1.000
3 45 sat_score Local - % of cohort pearson two-sided 478 -0.421 [-0.49, -0.34] 0.177 0.174 -0.449 6.118468e-22 6.641e+18 1.000
4 22 sat_score sped_percent pearson two-sided 478 -0.401 [-0.47, -0.32] 0.161 0.157 -0.425 6.608265e-20 6.569e+16 1.000
5 47 sat_score Still Enrolled - % of cohort pearson two-sided 478 -0.373 [-0.45, -0.29] 0.139 0.136 -0.392 3.030167e-17 1.576e+14 1.000
6 20 sat_score ell_percent pearson two-sided 478 -0.353 [-0.43, -0.27] 0.125 0.121 -0.369 1.647484e-15 3.107e+12 1.000
7 30 sat_score hispanic_per pearson two-sided 478 -0.347 [-0.42, -0.27] 0.121 0.117 -0.362 5.262691e-15 9.936e+11 1.000
8 28 sat_score black_per pearson two-sided 478 -0.298 [-0.38, -0.21] 0.089 0.085 -0.307 2.744013e-11 2.285e+08 1.000
9 44 sat_score Regents w/o Advanced - % of grads pearson two-sided 478 -0.217 [-0.3, -0.13] 0.047 0.043 -0.221 1.738548e-06 5099.587 0.998
10 95 sat_score is_intl pearson two-sided 478 -0.176 [-0.26, -0.09] 0.031 0.027 -0.178 1.133784e-04 96.264 0.972
11 19 sat_score ell_num pearson two-sided 478 -0.129 [-0.22, -0.04] 0.017 0.013 -0.130 4.690113e-03 3.082 0.809
In [16]:
top_corrs = pd.concat([neg_corr, pos_corr]).reset_index().drop(columns=['level_0', 'index'])
top_corrs
Out[16]:
X Y method tail n r CI95% r2 adj_r2 z p-unc BF10 power
0 sat_score frl_percent pearson two-sided 478 -0.691 [-0.73, -0.64] 0.477 0.475 -0.850 4.902523e-69 3.24e+65 1.000
1 sat_score Local - % of grads pearson two-sided 478 -0.487 [-0.55, -0.42] 0.237 0.234 -0.532 7.721751e-30 4.229e+26 1.000
2 sat_score Dropped Out - % of cohort pearson two-sided 478 -0.460 [-0.53, -0.39] 0.212 0.208 -0.497 2.123609e-26 1.681e+23 1.000
3 sat_score Local - % of cohort pearson two-sided 478 -0.421 [-0.49, -0.34] 0.177 0.174 -0.449 6.118468e-22 6.641e+18 1.000
4 sat_score sped_percent pearson two-sided 478 -0.401 [-0.47, -0.32] 0.161 0.157 -0.425 6.608265e-20 6.569e+16 1.000
5 sat_score Still Enrolled - % of cohort pearson two-sided 478 -0.373 [-0.45, -0.29] 0.139 0.136 -0.392 3.030167e-17 1.576e+14 1.000
6 sat_score ell_percent pearson two-sided 478 -0.353 [-0.43, -0.27] 0.125 0.121 -0.369 1.647484e-15 3.107e+12 1.000
7 sat_score hispanic_per pearson two-sided 478 -0.347 [-0.42, -0.27] 0.121 0.117 -0.362 5.262691e-15 9.936e+11 1.000
8 sat_score black_per pearson two-sided 478 -0.298 [-0.38, -0.21] 0.089 0.085 -0.307 2.744013e-11 2.285e+08 1.000
9 sat_score Regents w/o Advanced - % of grads pearson two-sided 478 -0.217 [-0.3, -0.13] 0.047 0.043 -0.221 1.738548e-06 5099.587 0.998
10 sat_score is_intl pearson two-sided 478 -0.176 [-0.26, -0.09] 0.031 0.027 -0.178 1.133784e-04 96.264 0.972
11 sat_score ell_num pearson two-sided 478 -0.129 [-0.22, -0.04] 0.017 0.013 -0.130 4.690113e-03 3.082 0.809
12 sat_score NUMBER OF STUDENTS / SEATS FILLED pearson two-sided 478 0.491 [0.42, 0.56] 0.241 0.238 0.537 2.022669e-30 1.592e+27 1.000
13 sat_score pct_3_plus pearson two-sided 478 0.520 [0.45, 0.58] 0.270 0.267 0.576 1.965914e-34 1.491e+31 1.000
14 sat_score asian_per pearson two-sided 478 0.527 [0.46, 0.59] 0.278 0.275 0.586 1.379931e-35 2.071e+32 1.000
15 sat_score Number of Exams with scores 3 4 or 5 pearson two-sided 478 0.548 [0.48, 0.61] 0.300 0.297 0.616 7.784329e-39 3.427e+35 1.000
16 sat_score Total Exams Taken pearson two-sided 478 0.548 [0.48, 0.61] 0.301 0.298 0.616 7.442607e-39 3.583e+35 1.000
17 sat_score AP Test Takers pearson two-sided 478 0.559 [0.49, 0.62] 0.312 0.309 0.631 1.483977e-40 1.736e+37 1.000
18 sat_score is_specialized pearson two-sided 478 0.570 [0.51, 0.63] 0.325 0.322 0.648 1.525280e-42 1.624e+39 1.000
19 sat_score pct_AP pearson two-sided 478 0.586 [0.52, 0.64] 0.343 0.341 0.672 2.122086e-45 1.105e+42 1.000
20 sat_score Total Regents - % of cohort pearson two-sided 478 0.609 [0.55, 0.66] 0.371 0.369 0.707 6.308734e-50 3.425e+46 1.000
21 sat_score white_per pearson two-sided 478 0.630 [0.57, 0.68] 0.397 0.395 0.741 2.876059e-54 6.971e+50 1.000
22 sat_score Advanced Regents - % of grads pearson two-sided 478 0.722 [0.68, 0.76] 0.522 0.520 0.912 2.673511e-78 5.2e+74 1.000
23 sat_score Advanced Regents - % of cohort pearson two-sided 478 0.755 [0.71, 0.79] 0.569 0.568 0.984 3.835073e-89 3.128e+85 1.000
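The `reset_index().drop(columns=['level_0', 'index'])` chain in In [16] is needed because both frames already carry an `index` column and repeat the row labels 0 to 11. A possible shortcut, sketched here on two small stand-in frames (the names `neg_demo` and `pos_demo` are hypothetical), is to skip the earlier `reset_index()` calls and let `pd.concat` renumber the rows:

```python
import pandas as pd

# Hypothetical stand-ins for neg_corr and pos_corr before any reset_index call.
neg_demo = pd.DataFrame({'Y': ['frl_percent', 'ell_num'], 'r': [-0.691, -0.129]}, index=[13, 19])
pos_demo = pd.DataFrame({'Y': ['white_per', 'pct_AP'], 'r': [0.630, 0.586]}, index=[32, 99])

# ignore_index=True discards the original labels and numbers the rows 0..n-1.
combined = pd.concat([neg_demo, pos_demo], ignore_index=True)
print(combined)
```

The result has a clean 0-based index and no leftover helper columns to drop.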

To showcase the relative r-value "strengths" of these correlations, we can use a bar chart with a diverging color scheme. This can be done in Seaborn, but to harness the maximum effect of the coolwarm colormap, we will manually tweak the colors in Matplotlib.

In [17]:
plt.style.use('seaborn')  # on Matplotlib 3.6 and later this style is named 'seaborn-v0_8'

xvals = range(len(top_corrs))
my_cmap = plt.get_cmap('coolwarm')  # plt.cm.get_cmap is no longer available in recent Matplotlib releases
colors = my_cmap([0 + (x * 0.045) for x in range(24)]) # fully saturates colors at opposing ends of our data
f, ax = plt.subplots(figsize=(12, 10))
tops = plt.bar(x=xvals, height=top_corrs['r'], color=colors)
ax.set_xticks(xvals)
ax.set_xticklabels(top_corrs['Y'], rotation=90)
ax.set_xlabel('Feature')
ax.set_ylabel('Pearson Correlation Coefficient')
plt.show()
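The colors above are assigned by bar position. If the color should instead track the value of r, so that r = 0 always falls on the neutral midpoint of coolwarm, one option is to normalize r onto [0, 1] first. This is a sketch using a handful of r-values copied from top_corrs; the symmetric limits of -1 and 1 are an assumption.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

r_values = [-0.691, -0.298, -0.129, 0.491, 0.755]  # copied from top_corrs

norm = Normalize(vmin=-1, vmax=1)  # assumed symmetric limits: r = 0 maps to 0.5
cmap = plt.get_cmap('coolwarm')
colors = [cmap(norm(r)) for r in r_values]  # one RGBA tuple per bar

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(range(len(r_values)), r_values, color=colors)
ax.set_ylabel('Pearson Correlation Coefficient')
```

The trade-off is that weak correlations appear paler, whereas the position-based approach uses the full color range regardless of the values.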

Our bar chart excels at showing the r-values, but what do they really mean? Scatter plots are often used to visualize the relationship indicated by Pearson's r. The stronger the (linear) correlation, the more we should see an upward or downward linear trend in the data points. This will also help us identify the extent to which outliers might be affecting our r-values.
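To make the point about outliers concrete, here is a small sketch on made-up numbers (not the school data) that computes Pearson's r from its definition and shows how a single extreme point can change it. The helper name `pearson_r` is hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r: co-deviation of x and y scaled by the size of their deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Made-up data with no clear trend.
x = [1, 2, 3, 4, 5, 6]
y = [3, 1, 4, 2, 3, 2]
r_without = pearson_r(x, y)

# The same data plus one extreme point far from the rest.
r_with = pearson_r(x + [20], y + [15])

print(round(r_without, 3), round(r_with, 3))
```

On these numbers r moves from close to zero to above 0.9 once the extreme point is added, even though the other six points are unchanged.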

Seaborn has a handy plot (regplot) that combines a scatter plot with a regression line of best fit, which will be perfect for this job. To avoid individually making 24 graphs from scratch, we will write a function to display a grid of 12 plots at once. So that we can reuse it later with other data and correlation tests, we will include several keyword arguments and a docstring.

In [18]:
def graph_corrs(data, corrs, xcols='Y', ycol='sat_score', vals='r', coef='r', orientation=None, outliers=None, nrows=4, ncols=3):
    '''
    Plots a grid of Seaborn regplots with correlation labels.
    
    Parameters
    ----------
    data : Pandas DataFrame
        DataFrame containing all data
    corrs : Pandas DataFrame
        DataFrame containing correlations we want to graph.
    xcols : str
        `corrs` column with `data` feature names to be plotted
    ycol : str
        `data` column name for y axis feature
    vals : str
        `corrs` column with correlation values
    coef : str
        Symbol for correlation, e.g. 'r', 'rho', "r'", 'pi'
    orientation : {None, 'negative', 'positive'}
        Orientation of correlation (for color of regression line)
    outliers : list
        List of `data` column names that indicate outliers.  Must be in same order as `xcols`.
    nrows : int
        Number of rows for subplots
    ncols : int
        Number of columns for subplots
        
    '''
    # Line Color
    if orientation == 'negative':
        color = 'tab:red'
    elif orientation == 'positive':
        color = 'tab:green'
    else:
        color = 'tab:cyan'

    
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(16, 16), sharey=True)
    
    for x in range(len(corrs)):
        # Scatter/outlier colors
        if coef == 'pi':
            # Use the `data` argument rather than the global `full` DataFrame
            outlier_colors = ['r' if point == True else 'b' for point in data[outliers[x]]]
            scatter = dict(color=outlier_colors)
        else:
            scatter=None
            
        g = sns.regplot(data=data,
                        y=ycol,
                        x=(corrs[xcols][x]),
                        color='b',
                        marker='+',
                        ax=fig.get_axes()[x],
                        label='{} = {}'.format(coef, corrs[vals][x].round(3)),
                        line_kws=dict(color=color, label='Regression Line'),
                        scatter_kws=scatter
                       )
        g.legend(loc='best')
    plt.tight_layout()
    plt.ylim(800, 2200)

Because of our docstring, we can now remind ourselves which keywords we need by typing the following in our Jupyter Notebook:

In [19]:
?graph_corrs
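The `?name` syntax only works in IPython and Jupyter. In a plain Python session the same docstring can be reached with the built-in `help()` or with `inspect.getdoc()`. A small sketch, using a hypothetical stand-in function rather than graph_corrs itself:

```python
import inspect

def demo_plot(data, corrs, nrows=4, ncols=3):
    '''
    Hypothetical stand-in for graph_corrs, used only to show docstring access.

    Parameters
    ----------
    data : Pandas DataFrame
        DataFrame containing all data
    corrs : Pandas DataFrame
        DataFrame containing correlations we want to graph.
    '''

doc = inspect.getdoc(demo_plot)       # docstring with common indentation removed
print(doc)
print(inspect.signature(demo_plot))   # (data, corrs, nrows=4, ncols=3)
```

Calling `help(demo_plot)` prints the signature and the docstring together.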

Exploring Negative Correlations

If we first take a look at our neg_corr plots, we see that some of our features have clear downward trends, while others are blatantly influenced by outliers. One feature with a relatively clear downward linear trend is frl_percent, the percentage of students receiving free or reduced-price lunch. We will take a closer look at this later, as it is one of our only socioeconomic indicators and appears to have a strong correlation with SAT score.

In [20]:
graph_corrs(full, neg_corr, orientation='negative')
In [21]:
full[['SCHOOL NAME', 'sat_score', 'borough', 'frl_percent']][full['frl_percent'] > 85]
Out[21]:
SCHOOL NAME sat_score borough frl_percent
0 HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES 1122.0 Manhattan 88.6
24 EMMA LAZARUS HIGH SCHOOL 1188.0 Manhattan 97.0
38 MANHATTAN ACADEMY FOR ARTS & LANGUAGE 1208.0 Manhattan 97.5
39 LEGACY SCHOOL FOR INTEGRATED STUDIES 1062.0 Manhattan 93.9
64 MANHATTAN COMPREHENSIVE NIGHT AND DAY HIGH SCHOOL 1205.0 Manhattan 85.6
110 WASHINGTON HEIGHTS EXPEDITIONARY LEARNING SCHOOL 1174.0 Manhattan 87.5
111 HIGH SCHOOL FOR EXCELLENCE AND INNOVATION 1208.0 Manhattan 99.2
113 HIGH SCHOOL FOR INTERNATIONAL BUSINESS AND FIN... 1127.0 Manhattan 89.3
116 HIGH SCHOOL FOR HEALTH CAREERS AND SCIENCES 1224.0 Manhattan 90.7
118 GREGORIO LUPERON HIGH SCHOOL FOR SCIENCE AND M... 1014.0 Manhattan 92.8
126 UNIVERSITY HEIGHTS SECONDARY SCHOOL 1201.0 Bronx 90.1
128 FOREIGN LANGUAGE ACADEMY OF GLOBAL STUDIES 1186.0 Bronx 89.9
130 NEW EXPLORERS HIGH SCHOOL 1084.0 Bronx 90.4
136 BRONX STUDIO SCHOOL FOR WRITERS AND ARTISTS 1208.0 Bronx 85.6
143 ARCHIMEDES ACADEMY FOR MATH, SCIENCE AND TECHN... 1208.0 Bronx 87.3
148 BRONX BRIDGES HIGH SCHOOL 1208.0 Bronx 87.2
156 JANE ADDAMS HIGH SCHOOL FOR ACADEMIC CAREERS 1112.0 Bronx 85.7
165 LEADERSHIP INSTITUTE 1081.0 Bronx 94.8
169 ACADEMY FOR LANGUAGE AND TECHNOLOGY 951.0 Bronx 85.7
183 WEST BRONX ACADEMY FOR THE FUTURE 1158.0 Bronx 86.3
184 KINGSBRIDGE INTERNATIONAL HIGH SCHOOL 962.0 Bronx 95.1
190 ENGLISH LANGUAGE LEARNERS AND INTERNATIONAL SU... 1029.0 Bronx 92.8
191 HIGH SCHOOL FOR TEACHING AND THE PROFESSIONS 1106.0 Bronx 88.1
226 NEW DAY ACADEMY 1046.0 Bronx 87.0
227 METROPOLITAN HIGH SCHOOL, THE 1055.0 Bronx 86.6
231 EAST BRONX ACADEMY FOR THE FUTURE 1102.0 Bronx 86.2
233 PAN AMERICAN INTERNATIONAL HIGH SCHOOL AT MONROE 970.0 Bronx 100.0
239 HIGH SCHOOL OF WORLD CULTURES 939.0 Bronx 90.6
243 MONROE ACADEMY FOR VISUAL ARTS & DESIGN 1038.0 Bronx 87.3
275 FRANCES PERKINS ACADEMY 1122.0 Brooklyn 86.9
315 ACADEMY FOR HEALTH CAREERS 1208.0 Brooklyn 87.7
461 ACADEMY FOR ENVIRONMENTAL LEADERSHIP 1098.0 Brooklyn 88.7
462 EBC HIGH SCHOOL FOR PUBLIC SERVICE–BUSHWICK 1154.0 Brooklyn 86.9
466 BUSHWICK LEADERS HIGH SCHOOL FOR ACADEMIC EXCE... 1055.0 Brooklyn 88.0

Even those without a clear linear trend can reveal interesting information, however. Both ell_percent and is_intl suggest that native English speakers have a distinct advantage on the SAT: all "International" schools and schools with over 40 percent "English language learners" had mean SAT scores under 1250. Some were even among the lowest in the city, with mean scores below 1000. Of these 28 schools, there are 10 in Manhattan, 10 in the Bronx, 4 in Queens and 4 in Brooklyn. None are in Staten Island.

In [22]:
full[['SCHOOL NAME', 'sat_score', 'borough']][full['ell_percent'] > 40]
Out[22]:
SCHOOL NAME sat_score borough
5 LOWER EAST SIDE PREPARATORY HIGH SCHOOL 1205.0 Manhattan
24 EMMA LAZARUS HIGH SCHOOL 1188.0 Manhattan
38 MANHATTAN ACADEMY FOR ARTS & LANGUAGE 1208.0 Manhattan
41 INTERNATIONAL HIGH SCHOOL AT UNION SQUARE 1208.0 Manhattan
45 MANHATTAN INTERNATIONAL HIGH SCHOOL 1227.0 Manhattan
55 MANHATTAN BRIDGES HIGH SCHOOL 1058.0 Manhattan
59 LIBERTY HIGH SCHOOL ACADEMY FOR NEWCOMERS 1156.0 Manhattan
64 MANHATTAN COMPREHENSIVE NIGHT AND DAY HIGH SCHOOL 1205.0 Manhattan
113 HIGH SCHOOL FOR INTERNATIONAL BUSINESS AND FIN... 1127.0 Manhattan
118 GREGORIO LUPERON HIGH SCHOOL FOR SCIENCE AND M... 1014.0 Manhattan
121 INTERNATIONAL COMMUNITY HIGH SCHOOL 945.0 Bronx
148 BRONX BRIDGES HIGH SCHOOL 1208.0 Bronx
169 ACADEMY FOR LANGUAGE AND TECHNOLOGY 951.0 Bronx
170 BRONX INTERNATIONAL HIGH SCHOOL 965.0 Bronx
184 KINGSBRIDGE INTERNATIONAL HIGH SCHOOL 962.0 Bronx
187 INTERNATIONAL SCHOOL FOR LIBERAL ARTS 934.0 Bronx
190 ENGLISH LANGUAGE LEARNERS AND INTERNATIONAL SU... 1029.0 Bronx
220 NEW WORLD HIGH SCHOOL 1048.0 Bronx
233 PAN AMERICAN INTERNATIONAL HIGH SCHOOL AT MONROE 970.0 Bronx
239 HIGH SCHOOL OF WORLD CULTURES 939.0 Bronx
250 BROOKLYN INTERNATIONAL HIGH SCHOOL 981.0 Brooklyn
300 INTERNATIONAL HIGH SCHOOL AT PROSPECT HEIGHTS 913.0 Brooklyn
337 MULTICULTURAL HIGH SCHOOL 887.0 Brooklyn
350 INTERNATIONAL HIGH SCHOOL AT LAFAYETTE 1026.0 Brooklyn
377 PAN AMERICAN INTERNATIONAL HIGH SCHOOL 951.0 Queens
383 INTERNATIONAL HIGH SCHOOL AT LAGUARDIA COMMUNI... 1064.0 Queens
390 FLUSHING INTERNATIONAL HIGH SCHOOL 1049.0 Queens
446 NEWCOMERS HIGH SCHOOL 1127.0 Queens
In [23]:
full[['SCHOOL NAME', 'sat_score', 'borough']][full['is_intl'] == 1]
Out[23]:
SCHOOL NAME sat_score borough
41 INTERNATIONAL HIGH SCHOOL AT UNION SQUARE 1208.0 Manhattan
121 INTERNATIONAL COMMUNITY HIGH SCHOOL 945.0 Bronx
170 BRONX INTERNATIONAL HIGH SCHOOL 965.0 Bronx
184 KINGSBRIDGE INTERNATIONAL HIGH SCHOOL 962.0 Bronx
233 PAN AMERICAN INTERNATIONAL HIGH SCHOOL AT MONROE 970.0 Bronx
300 INTERNATIONAL HIGH SCHOOL AT PROSPECT HEIGHTS 913.0 Brooklyn
350 INTERNATIONAL HIGH SCHOOL AT LAFAYETTE 1026.0 Brooklyn
377 PAN AMERICAN INTERNATIONAL HIGH SCHOOL 951.0 Queens
390 FLUSHING INTERNATIONAL HIGH SCHOOL 1049.0 Queens
446 NEWCOMERS HIGH SCHOOL 1127.0 Queens
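The borough tally quoted earlier comes from combining the two filters shown in In [22] and In [23]. A sketch of how such a count could be produced, on a tiny made-up frame (`demo` and its values are hypothetical; only the column names follow the notebook):

```python
import pandas as pd

demo = pd.DataFrame({
    'borough': ['Manhattan', 'Bronx', 'Bronx', 'Queens', 'Brooklyn', 'Staten Island'],
    'ell_percent': [55.0, 80.0, 12.0, 35.0, 90.0, 3.0],
    'is_intl': [0, 1, 0, 1, 1, 0],
    'sat_score': [1205.0, 962.0, 1300.0, 1127.0, 913.0, 1450.0],
})

# A school qualifies if it meets either condition; | is the element-wise OR.
mask = (demo['ell_percent'] > 40) | (demo['is_intl'] == 1)
subset = demo[mask]

counts = subset['borough'].value_counts()  # number of qualifying schools per borough
print(counts)
print(subset['sat_score'].max())           # highest mean score among them
```

Using a single combined mask avoids double-counting schools that satisfy both conditions.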

For hispanic_per and black_per, we see a slight downward linear trend, but there are a number of outlier schools with less than 20 percent Black or Hispanic students that may make the negative correlation appear stronger than it is. However, it is worth noting that if we look at the schools that have over 80 percent Black or Hispanic students we find some interesting insights:

  • None of these schools are in Staten Island.
  • Schools with a high percentage of Hispanic students are primarily in Manhattan or the Bronx. Half of these schools (12 of 24) also appear in the list of schools with over 40 percent English language learners.
  • Schools with a high percentage of Black students are overwhelmingly in Brooklyn (49 of 57), with the remainder located in Queens.
  • Primarily Black schools have slightly higher SAT scores on average (1147) than primarily Hispanic schools (1085).
In [24]:
full[['SCHOOL NAME', 'sat_score', 'borough']][full['hispanic_per'] > 80]
Out[24]:
SCHOOL NAME sat_score borough
38 MANHATTAN ACADEMY FOR ARTS & LANGUAGE 1208.0 Manhattan
55 MANHATTAN BRIDGES HIGH SCHOOL 1058.0 Manhattan
108 CITY COLLEGE ACADEMY OF THE ARTS 1270.0 Manhattan
109 COMMUNITY HEALTH ACADEMY OF THE HEIGHTS 1105.0 Manhattan
110 WASHINGTON HEIGHTS EXPEDITIONARY LEARNING SCHOOL 1174.0 Manhattan
113 HIGH SCHOOL FOR INTERNATIONAL BUSINESS AND FIN... 1127.0 Manhattan
114 HIGH SCHOOL FOR MEDIA AND COMMUNICATIONS 1098.0 Manhattan
115 HIGH SCHOOL FOR LAW AND PUBLIC SERVICE 1102.0 Manhattan
116 HIGH SCHOOL FOR HEALTH CAREERS AND SCIENCES 1224.0 Manhattan
118 GREGORIO LUPERON HIGH SCHOOL FOR SCIENCE AND M... 1014.0 Manhattan
148 BRONX BRIDGES HIGH SCHOOL 1208.0 Bronx
169 ACADEMY FOR LANGUAGE AND TECHNOLOGY 951.0 Bronx
184 KINGSBRIDGE INTERNATIONAL HIGH SCHOOL 962.0 Bronx
187 INTERNATIONAL SCHOOL FOR LIBERAL ARTS 934.0 Bronx
188 INsTECH ACADEMY (M.S. / HIGH SCHOOL 368) 1181.0 Bronx
233 PAN AMERICAN INTERNATIONAL HIGH SCHOOL AT MONROE 970.0 Bronx
239 HIGH SCHOOL OF WORLD CULTURES 939.0 Bronx
262 JUAN MOREL CAMPOS SECONDARY SCHOOL 1085.0 Brooklyn
276 EL PUENTE ACADEMY FOR PEACE AND JUSTICE 1035.0 Brooklyn
289 SUNSET PARK HIGH SCHOOL 1208.0 Brooklyn
331 FRANKLIN K. LANE HIGH SCHOOL 1208.0 Brooklyn
337 MULTICULTURAL HIGH SCHOOL 887.0 Brooklyn
377 PAN AMERICAN INTERNATIONAL HIGH SCHOOL 951.0 Queens
462 EBC HIGH SCHOOL FOR PUBLIC SERVICE–BUSHWICK 1154.0 Brooklyn
In [25]:
full[['SCHOOL NAME', 'sat_score', 'borough']][full['black_per'] > 80]
Out[25]:
SCHOOL NAME sat_score borough
245 ACADEMY OF BUSINESS AND COMMUNITY DEVELOPMENT 1231.0 Brooklyn
252 ACORN COMMUNITY HIGH SCHOOL 1116.0 Brooklyn
253 FREEDOM ACADEMY HIGH SCHOOL 1193.0 Brooklyn
255 BROOKLYN ACADEMY HIGH SCHOOL 1197.0 Brooklyn
256 BEDFORD STUYVESANT PREPARATORY HIGH SCHOOL 1183.0 Brooklyn
257 BEDFORD ACADEMY HIGH SCHOOL 1312.0 Brooklyn
260 BENJAMIN BANNEKER ACADEMY 1391.0 Brooklyn
285 PACIFIC HIGH SCHOOL 993.0 Brooklyn
287 METROPOLITAN CORPORATE ACADEMY HIGH SCHOOL 1101.0 Brooklyn
291 FREDERICK DOUGLASS ACADEMY IV SECONDARY SCHOOL 1068.0 Brooklyn
292 BOYS AND GIRLS HIGH SCHOOL 1097.0 Brooklyn
296 ACADEMY FOR COLLEGE PREPARATION AND CAREER EXP... 1139.0 Brooklyn
297 ACADEMY OF HOSPITALITY AND TOURISM 1045.0 Brooklyn
299 W.E.B. DUBOIS ACADEMIC HIGH SCHOOL 1092.0 Brooklyn
301 THE HIGH SCHOOL FOR GLOBAL CITIZENSHIP 1176.0 Brooklyn
302 SCHOOL FOR HUMAN RIGHTS, THE 1088.0 Brooklyn
303 SCHOOL FOR DEMOCRACY AND LEADERSHIP 1153.0 Brooklyn
304 HIGH SCHOOL FOR YOUTH AND COMMUNITY DEVELOPMEN... 1027.0 Brooklyn
305 HIGH SCHOOL FOR SERVICE & LEARNING AT ERASMUS 1105.0 Brooklyn
306 SCIENCE, TECHNOLOGY AND RESEARCH EARLY COLLEGE... 1360.0 Brooklyn
307 INTERNATIONAL ARTS BUSINESS SCHOOL 1146.0 Brooklyn
308 HIGH SCHOOL FOR PUBLIC SERVICE: HEROES OF TOMO... 1273.0 Brooklyn
310 BROOKLYN SCHOOL FOR MUSIC & THEATRE 1151.0 Brooklyn
311 BROWNSVILLE ACADEMY HIGH SCHOOL 1063.0 Brooklyn
312 MEDGAR EVERS COLLEGE PREPARATORY SCHOOL 1436.0 Brooklyn
313 CLARA BARTON HIGH SCHOOL 1251.0 Brooklyn
314 PAUL ROBESON HIGH SCHOOL 1083.0 Brooklyn
315 ACADEMY FOR HEALTH CAREERS 1208.0 Brooklyn
316 IT TAKES A VILLAGE ACADEMY 963.0 Brooklyn
317 BROOKLYN GENERATION SCHOOL 1145.0 Brooklyn
318 BROOKLYN THEATRE ARTS HIGH SCHOOL 1118.0 Brooklyn
319 KURT HAHN EXPEDITIONARY LEARNING SCHOOL 1092.0 Brooklyn
320 VICTORY COLLEGIATE HIGH SCHOOL 1143.0 Brooklyn
321 BROOKLYN BRIDGE ACADEMY 1097.0 Brooklyn
322 ARTS & MEDIA PREPARATORY ACADEMY 1080.0 Brooklyn
323 HIGH SCHOOL FOR INNOVATION IN ADVERTISING AND ... 1183.0 Brooklyn
324 CULTURAL ACADEMY FOR THE ARTS AND SCIENCES 1169.0 Brooklyn
325 HIGH SCHOOL FOR MEDICAL PROFESSIONS 1159.0 Brooklyn
326 OLYMPUS ACADEMY 1140.0 Brooklyn
327 ACADEMY FOR CONSERVATION AND THE ENVIRONMENT 1111.0 Brooklyn
328 URBAN ACTION ACADEMY 1135.0 Brooklyn
329 EAST BROOKLYN COMMUNITY HIGH SCHOOL 1191.0 Brooklyn
335 PERFORMING ARTS AND TECHNOLOGY HIGH SCHOOL 1149.0 Brooklyn
336 WORLD ACADEMY FOR TOTAL COMMUNITY HEALTH HIGH ... 1106.0 Brooklyn
368 BROOKLYN COLLEGIATE: A COLLEGE BOARD SCHOOL 1185.0 Brooklyn
369 FREDERICK DOUGLASS ACADEMY VII HIGH SCHOOL 1091.0 Brooklyn
370 BROOKLYN DEMOCRACY ACADEMY 1018.0 Brooklyn
372 METROPOLITAN DIPLOMA PLUS HIGH SCHOOL 1028.0 Brooklyn
373 TEACHERS PREPARATORY HIGH SCHOOL 1196.0 Brooklyn
430 QUEENS PREPARATORY ACADEMY 1099.0 Queens
431 PATHWAYS COLLEGE PREPARATORY SCHOOL: A COLLEGE... 1173.0 Queens
432 EXCELSIOR PREPARATORY HIGH SCHOOL 1202.0 Queens
434 PREPARATORY ACADEMY FOR WRITERS: A COLLEGE BOA... 1100.0 Queens
435 CAMBRIA HEIGHTS ACADEMY 1208.0 Queens
437 LAW, GOVERNMENT AND COMMUNITY SERVICE HIGH SCHOOL 1139.0 Queens
438 BUSINESS, COMPUTER APPLICATIONS & ENTREPRENEUR... 1152.0 Queens
439 HUMANITIES & ARTS MAGNET HIGH SCHOOL 1151.0 Queens
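Group averages like the ones in the bullets above can be computed by selecting sat_score under a boolean mask with `.loc`. The frame below is made up for illustration (`demo` and its values are hypothetical; only the column names follow the notebook).

```python
import pandas as pd

demo = pd.DataFrame({
    'black_per': [85.0, 90.0, 10.0, 5.0],
    'hispanic_per': [10.0, 5.0, 88.0, 92.0],
    'sat_score': [1100.0, 1200.0, 1000.0, 1100.0],
})

# .loc[row_mask, column] filters the rows and picks the column in one step.
black_mean = demo.loc[demo['black_per'] > 80, 'sat_score'].mean()
hispanic_mean = demo.loc[demo['hispanic_per'] > 80, 'sat_score'].mean()

print(black_mean, hispanic_mean)
```

The same pattern works with `.median()` or `.describe()` if the mean alone seems too sensitive to a few unusual schools.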

One final aspect that stands out from our neg_corr grid involves students with special needs. We see that the percentage of Special Ed students appears to have some negative correlation with SAT scores, but it is easy to miss the two other features that align with this finding: Local - % of grads and Local - % of cohort. These features don't refer to the locale of students, but rather to the type of diploma they receive.

In NYC, high school students can receive one of three different diplomas: Advanced Regents, Regents, or Local. Local diplomas are only available to qualifying students: those with Individualized Education Plans or disabilities.

The takeaway from these features, however, is not that special needs students do worse on the SAT. Rather, when we look at the schools that have more than 70 percent of students graduating with Local diplomas or more than 25 percent of students in a Special Ed program, we see two main things:

  1. The Bronx and Brooklyn dominate these lists again, accounting for 32 of the 45 entries across the two lists (one Brooklyn school, W. H. Maxwell, appears in both lists, and Staten Island has just one school that meets either criterion); and
  2. Although their scores sit on the lower end, none of these schools scored below 1000, despite their higher-than-average proportion of special needs students.
In [26]:
full[['SCHOOL NAME', 'sat_score', 'borough', 'District']][full['Local - % of grads'] > 70]
Out[26]:
SCHOOL NAME sat_score borough District
27 INSTITUTE FOR COLLABORATIVE EDUCATION 1424.0 Manhattan DISTRICT 02
35 LANDMARK HIGH SCHOOL 1170.0 Manhattan DISTRICT 02
39 LEGACY SCHOOL FOR INTEGRATED STUDIES 1062.0 Manhattan DISTRICT 02
57 INDEPENDENCE HIGH SCHOOL 1095.0 Manhattan DISTRICT 02
63 SATELLITE ACADEMY HIGH SCHOOL 1032.0 Manhattan DISTRICT 02
87 EDWARD A. REYNOLDS WEST SIDE HIGH SCHOOL 1121.0 Manhattan DISTRICT 03
98 HARLEM RENAISSANCE HIGH SCHOOL 1008.0 Manhattan DISTRICT 05
122 JILL CHAIFETZ TRANSFER HIGH SCHOOL 1208.0 Bronx DISTRICT 07
123 BRONX HAVEN HIGH SCHOOL 1208.0 Bronx DISTRICT 07
145 BRONX COMMUNITY HIGH SCHOOL 1112.0 Bronx DISTRICT 08
149 BRONX GUILD 1105.0 Bronx DISTRICT 08
155 HIGH SCHOOL X560 s BRONX ACADEMY HIGH SCHOOL 1171.0 Bronx DISTRICT 08
255 BROOKLYN ACADEMY HIGH SCHOOL 1197.0 Brooklyn DISTRICT 13
259 BROOKLYN HIGH SCHOOL FOR LEADERSHIP AND COMMUN... 1208.0 Brooklyn DISTRICT 13
276 EL PUENTE ACADEMY FOR PEACE AND JUSTICE 1035.0 Brooklyn DISTRICT 14
290 SOUTH BROOKLYN COMMUNITY HIGH SCHOOL 1271.0 Brooklyn DISTRICT 15
299 W.E.B. DUBOIS ACADEMIC HIGH SCHOOL 1092.0 Brooklyn DISTRICT 17
329 EAST BROOKLYN COMMUNITY HIGH SCHOOL 1191.0 Brooklyn DISTRICT 18
342 W. H. MAXWELL CAREER AND TECHNICAL EDUCATION H... 1102.0 Brooklyn DISTRICT 19
467 BUSHWICK COMMUNITY HIGH SCHOOL 1034.0 Brooklyn DISTRICT 32
In [27]:
full[['SCHOOL NAME', 'sat_score', 'borough', 'District']][full['sped_percent'] > 25]
Out[27]:
SCHOOL NAME sat_score borough District
2 EAST SIDE COMMUNITY SCHOOL 1149.0 Manhattan DISTRICT 01
4 MARTA VALLE HIGH SCHOOL 1207.0 Manhattan DISTRICT 01
9 47 THE AMERICAN SIGN LANGUAGE AND ENGLISH SECO... 1182.0 Manhattan DISTRICT 02
50 UNITY CENTER FOR URBAN TECHNOLOGIES 1070.0 Manhattan DISTRICT 02
111 HIGH SCHOOL FOR EXCELLENCE AND INNOVATION 1208.0 Manhattan DISTRICT 06
130 NEW EXPLORERS HIGH SCHOOL 1084.0 Bronx DISTRICT 07
134 SAMUEL GOMPERS CAREER AND TECHNICAL EDUCATION ... 1184.0 Bronx DISTRICT 07
140 PABLO NERUDA ACADEMY FOR ARCHITECTURE AND WORL... 1038.0 Bronx DISTRICT 08
152 BANANA KELLY HIGH SCHOOL 1131.0 Bronx DISTRICT 08
154 SCHOOL FOR COMMUNITY RESEARCH AND LEARNING 1134.0 Bronx DISTRICT 08
159 URBAN ASSEMBLY ACADEMY FOR HISTORY AND CITIZEN... 1084.0 Bronx DISTRICT 09
171 SCHOOL FOR EXCELLENCE 1074.0 Bronx DISTRICT 09
218 HARRY S TRUMAN HIGH SCHOOL 1151.0 Bronx DISTRICT 11
225 BRONX AEROSPACE HIGH SCHOOL 1163.0 Bronx DISTRICT 11
229 PERFORMANCE CONSERVATORY HIGH SCHOOL 1074.0 Bronx DISTRICT 12
237 BRONX CAREER AND COLLEGE PREPARATORY HIGH SCHOOL 1208.0 Bronx DISTRICT 12
240 FANNIE LOU HAMER FREEDOM HIGH SCHOOL 1029.0 Bronx DISTRICT 12
262 JUAN MOREL CAMPOS SECONDARY SCHOOL 1085.0 Brooklyn DISTRICT 14
263 FOUNDATIONS ACADEMY 1208.0 Brooklyn DISTRICT 14
274 AUTOMOTIVE HIGH SCHOOL 1093.0 Brooklyn DISTRICT 14
278 BROOKLYN SCHOOL FOR GLOBAL STUDIES 1111.0 Brooklyn DISTRICT 15
279 BROOKLYN SECONDARY SCHOOL FOR COLLABORATIVE ST... 1179.0 Brooklyn DISTRICT 15
314 PAUL ROBESON HIGH SCHOOL 1083.0 Brooklyn DISTRICT 17
342 W. H. MAXWELL CAREER AND TECHNICAL EDUCATION H... 1102.0 Brooklyn DISTRICT 19
458 RALPH R. MCKEE CAREER AND TECHNICAL EDUCATION ... 1235.0 Staten Island DISTRICT 31
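
The borough tallies quoted above can be reproduced in one step by combining the two filters. The sketch below is only an illustration: the `toy` frame and all of its values are made up and merely stand in for `full`. It shows that joining the conditions with `|` counts a school that meets both criteria once, whereas adding up the two separate lists counts it twice.

```python
import pandas as pd

# Hypothetical stand-in for `full`; every value here is invented for illustration.
toy = pd.DataFrame({
    'SCHOOL NAME': ['A', 'B', 'C', 'D', 'E'],
    'borough': ['Bronx', 'Bronx', 'Brooklyn', 'Queens', 'Manhattan'],
    'Local - % of grads': [75.0, 10.0, 80.0, 5.0, 72.0],
    'sped_percent': [30.0, 26.0, 12.0, 8.0, 10.0],
})

local_mask = toy['Local - % of grads'] > 70
sped_mask = toy['sped_percent'] > 25

# Total rows across the two separate lists (school A appears in both)
list_rows = int(local_mask.sum() + sped_mask.sum())

# Distinct schools that meet either criterion
either = toy[local_mask | sped_mask]
distinct_schools = len(either)

# Borough breakdown of the distinct schools
borough_counts = either['borough'].value_counts()

print(list_rows, distinct_schools)
print(borough_counts)
```

Swapping `toy` for `full` should give the per-borough counts directly, without tallying the two printed tables by hand.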

With our initial exploration of the negative correlations done, we can begin outlining our discoveries:

  • Locations (District and Borough) appear to be related to average score, but likely due to various other factors.
  • Gender variables did not receive top r-values, but should be looked into.
  • Cultural aspects seem to play a part. Percentages of both Black and Hispanic students may have negative correlations to SAT score, but English proficiency seems to have the more noticeable connection at the extreme ends.
  • The percentage of poorer students (those who receive free or reduced-price lunch) has the most consistent correlation with low scores.
  • Schools with a greater percentage of special needs students have lower, but not the lowest, scores.

Exploring Positive Correlations

Moving on to our positive correlations, we see that our original AP-related features show very little linear trend. On the other hand, the AP features we created in this notebook (pct_3_plus and pct_AP) show clearer upward trends, but their r-values may be somewhat inflated by outliers.

In [28]:
graph_corrs(full, pos_corr, orientation='positive')

If we take a closer look at the schools with either a) more than 20 percent AP test takers; or b) more than 70 percent 3+ AP scores, we notice the following:

  • A handful of schools had extremely high proportions of test takers scoring 3 and above, but had very low average SAT scores.
    • These are schools with high percentages of English Language Learners, suggesting that either AP tests don't require the same command of the English language, or that students at these schools took AP tests in math/science rather than reading/writing.

  • These schools are more evenly distributed among the boroughs (except Staten Island) than the schools we examined through our neg_corr exploration.
    • Again, there is only one school from Staten Island, but this time it is the borough's only — and significant — outlier: Staten Island Technical High School.
In [29]:
full[['SCHOOL NAME', 'sat_score', 'borough', 'District', 'pct_3_plus']][full['pct_3_plus'] > 0.7].sort_values(by='sat_score')
Out[29]:
SCHOOL NAME sat_score borough District pct_3_plus
337 MULTICULTURAL HIGH SCHOOL 887.0 Brooklyn DISTRICT 19 0.886364
169 ACADEMY FOR LANGUAGE AND TECHNOLOGY 951.0 Bronx DISTRICT 09 1.000000
262 JUAN MOREL CAMPOS SECONDARY SCHOOL 1085.0 Brooklyn DISTRICT 14 0.750000
446 NEWCOMERS HIGH SCHOOL 1127.0 Queens DISTRICT 30 0.858896
128 FOREIGN LANGUAGE ACADEMY OF GLOBAL STUDIES 1186.0 Bronx DISTRICT 07 0.923077
5 LOWER EAST SIDE PREPARATORY HIGH SCHOOL 1205.0 Manhattan DISTRICT 01 0.923077
386 QUEENS VOCATIONAL AND TECHNICAL HIGH SCHOOL 1270.0 Queens DISTRICT 24 0.740741
104 FREDERICK DOUGLASS ACADEMY 1374.0 Manhattan DISTRICT 05 0.703540
58 HIGH SCHOOL FOR DUAL LANGUAGE AND ASIAN STUDIES 1424.0 Manhattan DISTRICT 02 0.927083
28 PROFESSIONAL PERFORMING ARTS HIGH SCHOOL 1522.0 Manhattan DISTRICT 02 0.750000
29 BARUCH COLLEGE CAMPUS HIGH SCHOOL 1577.0 Manhattan DISTRICT 02 0.765217
34 MILLENNIUM HIGH SCHOOL 1614.0 Manhattan DISTRICT 02 0.705263
83 BEACON HIGH SCHOOL 1744.0 Manhattan DISTRICT 03 0.710660
33 ELEANOR ROOSEVELT HIGH SCHOOL 1758.0 Manhattan DISTRICT 02 0.719149
249 BROOKLYN TECHNICAL HIGH SCHOOL 1833.0 Brooklyn DISTRICT 13 0.727790
427 QUEENS HIGH SCHOOL FOR THE SCIENCES AT YORK CO... 1868.0 Queens DISTRICT 28 0.813609
396 TOWNSEND HARRIS HIGH SCHOOL 1910.0 Queens DISTRICT 25 0.785176
206 HIGH SCHOOL OF AMERICAN STUDIES AT LEHMAN COLLEGE 1920.0 Bronx DISTRICT 10 0.804636
459 STATEN ISLAND TECHNICAL HIGH SCHOOL 1953.0 Staten Island DISTRICT 31 0.893923
198 BRONX HIGH SCHOOL OF SCIENCE 1969.0 Bronx DISTRICT 10 0.898973
48 STUYVESANT HIGH SCHOOL 2096.0 Manhattan DISTRICT 02 0.939340
In [30]:
full[['SCHOOL NAME', 'sat_score', 'borough', 'District', 'pct_AP']][full['pct_AP'] > 0.2].sort_values(by='sat_score')
Out[30]:
SCHOOL NAME sat_score borough District pct_AP
115 HIGH SCHOOL FOR LAW AND PUBLIC SERVICE 1102.0 Manhattan DISTRICT 06 0.203170
180 BRONX ENGINEERING AND TECHNOLOGY ACADEMY 1150.0 Bronx DISTRICT 10 0.240991
161 EXIMIUS COLLEGE PREPARATORY ACADEMY: A COLLEGE... 1169.0 Bronx DISTRICT 09 0.252809
384 HIGH SCHOOL FOR ARTS AND BUSINESS 1174.0 Queens DISTRICT 24 0.239130
253 FREEDOM ACADEMY HIGH SCHOOL 1193.0 Brooklyn DISTRICT 13 0.339450
263 FOUNDATIONS ACADEMY 1208.0 Brooklyn DISTRICT 14 0.297872
14 URBAN ASSEMBLY SCHOOL OF DESIGN AND CONSTRUCTI... 1269.0 Manhattan DISTRICT 02 0.229698
308 HIGH SCHOOL FOR PUBLIC SERVICE: HEROES OF TOMO... 1273.0 Brooklyn DISTRICT 17 0.254197
374 ACADEMY OF FINANCE AND ENTERPRISE 1280.0 Queens DISTRICT 24 0.226190
312 MEDGAR EVERS COLLEGE PREPARATORY SCHOOL 1436.0 Brooklyn DISTRICT 17 0.246377
366 LEON M. GOLDSTEIN HIGH SCHOOL FOR THE SCIENCES 1627.0 Brooklyn DISTRICT 22 0.359100
84 FIORELLO H. LAGUARDIA HIGH SCHOOL OF MUSIC & A... 1707.0 Manhattan DISTRICT 03 0.265259
33 ELEANOR ROOSEVELT HIGH SCHOOL 1758.0 Manhattan DISTRICT 02 0.305720
249 BROOKLYN TECHNICAL HIGH SCHOOL 1833.0 Brooklyn DISTRICT 13 0.397037
107 HIGH SCHOOL FOR MATHEMATICS, SCIENCE AND ENGIN... 1847.0 Manhattan DISTRICT 05 0.280788
427 QUEENS HIGH SCHOOL FOR THE SCIENCES AT YORK CO... 1868.0 Queens DISTRICT 28 0.514354
396 TOWNSEND HARRIS HIGH SCHOOL 1910.0 Queens DISTRICT 25 0.537719
206 HIGH SCHOOL OF AMERICAN STUDIES AT LEHMAN COLLEGE 1920.0 Bronx DISTRICT 10 0.514589
459 STATEN ISLAND TECHNICAL HIGH SCHOOL 1953.0 Staten Island DISTRICT 31 0.478261
198 BRONX HIGH SCHOOL OF SCIENCE 1969.0 Bronx DISTRICT 10 0.394955
48 STUYVESANT HIGH SCHOOL 2096.0 Manhattan DISTRICT 02 0.457992

Although our AP features weren't as promising as we'd hoped, we did see some correlation. However, another feature related to high-achieving students may shed some light on the matter. With a quick glance at our regplot grid, we see that all the schools flagged as "Specialized" have average SAT scores upwards of 1700. All except The Brooklyn Latin School also hold top spots in our AP features.


What is special about these schools? A Google search reveals that eight of the nine "Specialized" schools in our dataset are among the top 10 high schools in New York State (top 100 in the U.S.) and require eighth graders to achieve a certain rank on the Specialized High Schools Admissions Test (SHSAT) prior to admission.

The ninth school on this list, Fiorello H. LaGuardia High School of Music & Art and Performing Arts, is also elite, though more emphasis is placed on its auditions than on academics. Incoming students to this school only have to show "evidence of satisfactory achievement," which is defined as 75 or higher in core subjects and a 90 percent attendance rate. Emphasis on the arts over academics may explain why this school has the lowest mean SAT score of the nine, but 1707 is still one of the highest in our dataset.

It is quite possible that this small group of elite schools is inflating our r-values for pct_AP and pct_3_plus.

In [31]:
full[['SCHOOL NAME', 'sat_score', 'borough', 'District']][full['is_specialized'] == 1].sort_values(by='sat_score')
Out[31]:
SCHOOL NAME sat_score borough District
84 FIORELLO H. LAGUARDIA HIGH SCHOOL OF MUSIC & A... 1707.0 Manhattan DISTRICT 03
265 BROOKLYN LATIN SCHOOL, THE 1740.0 Brooklyn DISTRICT 14
249 BROOKLYN TECHNICAL HIGH SCHOOL 1833.0 Brooklyn DISTRICT 13
107 HIGH SCHOOL FOR MATHEMATICS, SCIENCE AND ENGIN... 1847.0 Manhattan DISTRICT 05
427 QUEENS HIGH SCHOOL FOR THE SCIENCES AT YORK CO... 1868.0 Queens DISTRICT 28
206 HIGH SCHOOL OF AMERICAN STUDIES AT LEHMAN COLLEGE 1920.0 Bronx DISTRICT 10
459 STATEN ISLAND TECHNICAL HIGH SCHOOL 1953.0 Staten Island DISTRICT 31
198 BRONX HIGH SCHOOL OF SCIENCE 1969.0 Bronx DISTRICT 10
48 STUYVESANT HIGH SCHOOL 2096.0 Manhattan DISTRICT 02
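
One way to probe the suspicion that a handful of elite schools inflates the r-values is to compute Pearson's r twice, once on all schools and once with the flagged schools removed. The sketch below uses randomly generated data, not the notebook's `full` frame: the `toy` frame, the group sizes and the score ranges are all assumptions chosen to mimic the situation described above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: 200 ordinary schools with no built-in relationship
# between AP participation and SAT score...
ordinary = pd.DataFrame({
    'pct_AP': rng.uniform(0.0, 0.25, 200),
    'sat_score': rng.normal(1200, 100, 200),
    'is_specialized': 0,
})
# ...plus 9 "specialized" schools that are high on both variables.
elite = pd.DataFrame({
    'pct_AP': rng.uniform(0.3, 0.55, 9),
    'sat_score': rng.normal(1900, 80, 9),
    'is_specialized': 1,
})
toy = pd.concat([ordinary, elite], ignore_index=True)

# Pearson's r on every school
r_all = toy['pct_AP'].corr(toy['sat_score'])

# Pearson's r after dropping the flagged schools
subset = toy[toy['is_specialized'] == 0]
r_without = subset['pct_AP'].corr(subset['sat_score'])

print(round(r_all, 3), round(r_without, 3))
```

In this made-up example the nine extreme points alone create a sizeable positive r. Running the same comparison on `full` with `is_specialized` would show how much of the pct_AP and pct_3_plus correlation survives without the Specialized schools.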

Moving on to the cultural demographics that appear in pos_corr, we see that SAT scores trend upward in relation to larger percentages of White and Asian students in a school. However, neither plot is particularly linear, and asian_per shows a number of highly Asian schools in the mid-to-lower SAT score range. Upon closer inspection, we find the lower-scoring schools are once again International schools. Schools with over 30 percent White students, on the other hand, had average SAT scores that bottomed out at 1195.

Interestingly, when comparing these ethnic demographic variables to our highly Hispanic or Black schools, we find some remarkable contrast in both location and saturation:

  1. Our cutoff for examining "highly" Hispanic and Black schools was 80 percent. For White and Asian student proportions it is 30 percent. There are only 3 schools in our data that are above 80 percent White or Asian.
  2. The majority of highly Black or Hispanic schools were located in Brooklyn, the Bronx or Manhattan. No schools in Staten Island had over 80 percent Black or Hispanic students.
  3. There are more highly Asian schools in Queens than in any other borough, though Brooklyn and Manhattan are well represented. Staten Island and the Bronx only have one school each in this category.
  4. Highly White schools are more evenly spread among the boroughs — except the Bronx, where there is only one school with more than 30 percent White students. However, seven of Staten Island's 13 high schools fall into this category — more than any feature we've examined so far.
  5. 50+ percent Black or Hispanic student body and 30+ percent White or Asian student body are mutually exclusive.
In [32]:
# School info where 30+ percent Asian
full[['SCHOOL NAME', 'sat_score', 'borough', 'District', 'asian_per']][full['asian_per'] > 30].sort_values(by='borough')
Out[32]:
SCHOOL NAME sat_score borough District asian_per
198 BRONX HIGH SCHOOL OF SCIENCE 1969.0 Bronx DISTRICT 10 63.5
363 MIDWOOD HIGH SCHOOL 1473.0 Brooklyn DISTRICT 22 31.7
357 JOHN DEWEY HIGH SCHOOL 1262.0 Brooklyn DISTRICT 21 39.1
350 INTERNATIONAL HIGH SCHOOL AT LAFAYETTE 1026.0 Brooklyn DISTRICT 21 42.0
348 THE URBAN ASSEMBLY SCHOOL FOR CRIMINAL JUSTICE 1208.0 Brooklyn DISTRICT 20 35.0
347 FRANKLIN DELANO ROOSEVELT HIGH SCHOOL 1244.0 Brooklyn DISTRICT 20 44.2
346 FORT HAMILTON HIGH SCHOOL 1306.0 Brooklyn DISTRICT 20 30.3
344 NEW UTRECHT HIGH SCHOOL 1272.0 Brooklyn DISTRICT 20 35.9
265 BROOKLYN LATIN SCHOOL, THE 1740.0 Brooklyn DISTRICT 14 36.8
250 BROOKLYN INTERNATIONAL HIGH SCHOOL 981.0 Brooklyn DISTRICT 13 42.5
249 BROOKLYN TECHNICAL HIGH SCHOOL 1833.0 Brooklyn DISTRICT 13 60.3
5 LOWER EAST SIDE PREPARATORY HIGH SCHOOL 1205.0 Manhattan DISTRICT 01 84.7
79 INNOVATION DIPLOMA PLUS 1200.0 Manhattan DISTRICT 03 48.7
64 MANHATTAN COMPREHENSIVE NIGHT AND DAY HIGH SCHOOL 1205.0 Manhattan DISTRICT 02 35.3
58 HIGH SCHOOL FOR DUAL LANGUAGE AND ASIAN STUDIES 1424.0 Manhattan DISTRICT 02 89.5
48 STUYVESANT HIGH SCHOOL 2096.0 Manhattan DISTRICT 02 72.1
45 MANHATTAN INTERNATIONAL HIGH SCHOOL 1227.0 Manhattan DISTRICT 02 33.5
41 INTERNATIONAL HIGH SCHOOL AT UNION SQUARE 1208.0 Manhattan DISTRICT 02 40.3
32 N.Y.C. MUSEUM SCHOOL 1419.0 Manhattan DISTRICT 02 33.0
29 BARUCH COLLEGE CAMPUS HIGH SCHOOL 1577.0 Manhattan DISTRICT 02 60.6
25 THE HIGH SCHOOL FOR LANGUAGE AND DIPLOMACY 1208.0 Manhattan DISTRICT 02 44.3
24 EMMA LAZARUS HIGH SCHOOL 1188.0 Manhattan DISTRICT 02 62.4
107 HIGH SCHOOL FOR MATHEMATICS, SCIENCE AND ENGIN... 1847.0 Manhattan DISTRICT 05 36.2
420 JAMAICA GATEWAY TO THE SCIENCES 1307.0 Queens DISTRICT 28 45.1
422 JAMAICA HIGH SCHOOL 1063.0 Queens DISTRICT 28 31.0
423 HILLCREST HIGH SCHOOL 1194.0 Queens DISTRICT 28 36.5
446 NEWCOMERS HIGH SCHOOL 1127.0 Queens DISTRICT 30 48.3
425 QUEENS GATEWAY TO HEALTH SCIENCES SECONDARY SC... 1538.0 Queens DISTRICT 28 36.5
427 QUEENS HIGH SCHOOL FOR THE SCIENCES AT YORK CO... 1868.0 Queens DISTRICT 28 74.4
440 YOUNG WOMEN'S LEADERSHIP SCHOOL, ASTORIA 1208.0 Queens DISTRICT 30 39.2
415 HIGH SCHOOL FOR CONSTRUCTION TRADES, ENGINEERI... 1345.0 Queens DISTRICT 27 30.7
424 THOMAS A. EDISON CAREER AND TECHNICAL EDUCATIO... 1372.0 Queens DISTRICT 28 50.9
413 RICHMOND HILL HIGH SCHOOL 1154.0 Queens DISTRICT 27 32.9
393 JOHN BOWNE HIGH SCHOOL 1243.0 Queens DISTRICT 25 34.3
401 FRANCIS LEWIS HIGH SCHOOL 1474.0 Queens DISTRICT 26 51.9
400 BENJAMIN N. CARDOZO HIGH SCHOOL 1514.0 Queens DISTRICT 26 45.7
396 TOWNSEND HARRIS HIGH SCHOOL 1910.0 Queens DISTRICT 25 55.5
391 EASTsWEST SCHOOL OF INTERNATIONAL STUDIES 1271.0 Queens DISTRICT 25 62.7
390 FLUSHING INTERNATIONAL HIGH SCHOOL 1049.0 Queens DISTRICT 25 64.5
389 QUEENS SCHOOL OF INQUIRY, THE 1396.0 Queens DISTRICT 25 40.2
383 INTERNATIONAL HIGH SCHOOL AT LAGUARDIA COMMUNI... 1064.0 Queens DISTRICT 24 43.0
378 BARD HIGH SCHOOL EARLY COLLEGE II 1663.0 Queens DISTRICT 24 32.0
448 BACCALAUREATE SCHOOL FOR GLOBAL EDUCATION 1636.0 Queens DISTRICT 30 34.8
403 BAYSIDE HIGH SCHOOL 1449.0 Queens DISTRICT 26 45.2
459 STATEN ISLAND TECHNICAL HIGH SCHOOL 1953.0 Staten Island DISTRICT 31 31.8
In [33]:
# School info where 30+ percent White
full[['SCHOOL NAME', 'sat_score', 'borough', 'District', 'white_per']][full['white_per'] > 30].sort_values(by='borough')
Out[33]:
SCHOOL NAME sat_score borough District white_per
206 HIGH SCHOOL OF AMERICAN STUDIES AT LEHMAN COLLEGE 1920.0 Bronx DISTRICT 10 53.8
356 EDWARD R. MURROW HIGH SCHOOL 1431.0 Brooklyn DISTRICT 21 30.8
366 LEON M. GOLDSTEIN HIGH SCHOOL FOR THE SCIENCES 1627.0 Brooklyn DISTRICT 22 65.6
364 JAMES MADISON HIGH SCHOOL 1350.0 Brooklyn DISTRICT 22 46.4
361 BROOKLYN STUDIO SECONDARY SCHOOL 1313.0 Brooklyn DISTRICT 21 56.5
355 KINGSBOROUGH EARLY COLLEGE SCHOOL 1208.0 Brooklyn DISTRICT 21 44.7
351 RACHEL CARSON HIGH SCHOOL FOR COASTAL STUDIES 1237.0 Brooklyn DISTRICT 21 37.6
346 FORT HAMILTON HIGH SCHOOL 1306.0 Brooklyn DISTRICT 20 34.2
344 NEW UTRECHT HIGH SCHOOL 1272.0 Brooklyn DISTRICT 20 30.5
84 FIORELLO H. LAGUARDIA HIGH SCHOOL OF MUSIC & A... 1707.0 Manhattan DISTRICT 03 49.2
6 NEW EXPLORATIONS INTO SCIENCE, TECHNOLOGY AND ... 1621.0 Manhattan DISTRICT 01 44.9
81 FRANK MCCOURT HIGH SCHOOL 1208.0 Manhattan DISTRICT 03 37.0
34 MILLENNIUM HIGH SCHOOL 1614.0 Manhattan DISTRICT 02 35.9
33 ELEANOR ROOSEVELT HIGH SCHOOL 1758.0 Manhattan DISTRICT 02 63.7
31 SCHOOL OF THE FUTURE HIGH SCHOOL 1565.0 Manhattan DISTRICT 02 38.0
30 N.Y.C. LAB SCHOOL FOR COLLABORATIVE STUDIES 1677.0 Manhattan DISTRICT 02 45.9
28 PROFESSIONAL PERFORMING ARTS HIGH SCHOOL 1522.0 Manhattan DISTRICT 02 41.8
27 INSTITUTE FOR COLLABORATIVE EDUCATION 1424.0 Manhattan DISTRICT 02 54.6
8 BARD HIGH SCHOOL EARLY COLLEGE 1856.0 Manhattan DISTRICT 01 49.8
83 BEACON HIGH SCHOOL 1744.0 Manhattan DISTRICT 03 49.8
448 BACCALAUREATE SCHOOL FOR GLOBAL EDUCATION 1636.0 Queens DISTRICT 30 37.3
444 FRANK SINATRA SCHOOL OF THE ARTS HIGH SCHOOL 1494.0 Queens DISTRICT 30 46.3
426 QUEENS METROPOLITAN HIGH SCHOOL 1208.0 Queens DISTRICT 28 40.4
410 SCHOLARS' ACADEMY 1532.0 Queens DISTRICT 27 41.6
392 WORLD JOURNALISM PREPARATORY: A COLLEGE BOARD ... 1441.0 Queens DISTRICT 25 51.6
378 BARD HIGH SCHOOL EARLY COLLEGE II 1663.0 Queens DISTRICT 24 30.3
421 FOREST HILLS HIGH SCHOOL 1407.0 Queens DISTRICT 28 33.7
456 SUSAN E. WAGNER HIGH SCHOOL 1388.0 Staten Island DISTRICT 31 49.4
449 CSI HIGH SCHOOL FOR INTERNATIONAL STUDIES 1353.0 Staten Island DISTRICT 31 58.5
450 GAYNOR MCCOWN EXPEDITIONARY LEARNING SCHOOL 1195.0 Staten Island DISTRICT 31 60.0
451 THE MICHAEL J. PETRIDES SCHOOL 1426.0 Staten Island DISTRICT 31 55.9
452 NEW DORP HIGH SCHOOL 1277.0 Staten Island DISTRICT 31 53.3
455 TOTTENVILLE HIGH SCHOOL 1418.0 Staten Island DISTRICT 31 82.1
459 STATEN ISLAND TECHNICAL HIGH SCHOOL 1953.0 Staten Island DISTRICT 31 61.3
In [34]:
# Number of schools with 50+ percent Black or Hisp., 30+/50+ percent White or Asian
# Conditions are joined with | so a school meeting both is counted once (adding the two separate 30+ counts gives 79, since five schools exceed 30 percent on both)
black_hisp_50 = (full['black_per'] > 50) | (full['hispanic_per'] > 50)
white_asian_30 = (full['white_per'] > 30) | (full['asian_per'] > 30)
white_asian_50 = (full['white_per'] > 50) | (full['asian_per'] > 50)
print('Schools with 50+ percent Black or Hispanic students: {}'.format(black_hisp_50.sum()))
print('Schools with 30+ percent White or Asian students: {}'.format(white_asian_30.sum()))
print('Schools with 50+ percent White or Asian students: {}'.format(white_asian_50.sum()))
Schools with 50+ percent Black or Hispanic students: 324
Schools with 30+ percent White or Asian students: 74
Schools with 50+ percent White or Asian students: 25
In [35]:
# Max percent of other races by type
race = ['White', 'Asian', 'Black', 'Hispanic']
feat = ['white_per', 'asian_per', 'black_per', 'hispanic_per']
race_per = [30, 30, 50, 50]

print('Max. racial percentages for schools with... \n')

for x in range(len(race)):
    print('{}+ percent {} students:'.format(race_per[x], race[x]))
    print(full[[f for f in feat if f != feat[x]]][full[feat[x]] >= race_per[x]].max())
    print('\n')
Max. racial percentages for schools with... 

30+ percent White students:
asian_per       35.9
black_per       25.0
hispanic_per    38.6
dtype: float64


30+ percent Asian students:
white_per       61.3
black_per       44.8
hispanic_per    46.4
dtype: float64


50+ percent Black students:
white_per       11.2
asian_per       24.1
hispanic_per    45.4
dtype: float64


50+ percent Hispanic students:
white_per    24.9
asian_per    27.1
black_per    45.5
dtype: float64
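
The maxima above imply that the two groups never overlap, and the same claim can be checked directly with boolean masks. The following is a minimal sketch on an invented `toy` frame (the percentages are assumptions, not values from `full`). It also counts schools that exceed 30 percent on both the White and Asian measures, which are the schools that would be counted twice if the two lists were simply added together.

```python
import pandas as pd

# Hypothetical stand-in for `full`; percentages are invented for illustration.
toy = pd.DataFrame({
    'white_per':    [61.3,  5.0,  2.0, 34.2, 10.0],
    'asian_per':    [31.8,  3.0,  1.5, 30.3, 20.0],
    'black_per':    [ 1.0, 60.0, 30.0, 10.0, 35.0],
    'hispanic_per': [ 5.0, 30.0, 65.0, 24.0, 33.0],
})

high_white_asian = (toy['white_per'] > 30) | (toy['asian_per'] > 30)
high_black_hisp = (toy['black_per'] > 50) | (toy['hispanic_per'] > 50)

# Mutually exclusive groups means no school satisfies both masks
overlap = int((high_white_asian & high_black_hisp).sum())

# Schools above 30 percent on BOTH White and Asian would be double-counted
# if the two separate counts were added together
both_white_asian = int(((toy['white_per'] > 30) & (toy['asian_per'] > 30)).sum())

print(overlap, both_white_asian)
```

An `overlap` of zero on the real data would confirm the mutual exclusivity stated in the list above.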


The last standout from our pos_corr regplots is the apparent positive correlation between the percentage of "Advanced Regents" graduates and SAT scores. Of all the plots, the Advanced Regents features had the clearest linear relationship to SAT performance. As briefly discussed earlier, students in New York high schools can earn Local, Regents or Advanced Regents diplomas. It is somewhat intuitive that Advanced Regents graduates would perform better on the SAT, as it has the strictest requirements of the three diploma types.

Schools with a high proportion of Advanced Regents graduates (40-plus percent) included all nine of our Specialized schools and were found in all five boroughs. However, only two schools each from the Bronx and Staten Island fit this classification — three of which were Specialized schools: High School of American Studies at Lehman College, Bronx High School of Science and Staten Island Technical High School.

In [36]:
full[['SCHOOL NAME', 'sat_score', 'borough', 'District', 'Advanced Regents - % of cohort']][full['Advanced Regents - % of cohort'] > 40].sort_values(by='borough')
Out[36]:
SCHOOL NAME sat_score borough District Advanced Regents - % of cohort
206 HIGH SCHOOL OF AMERICAN STUDIES AT LEHMAN COLLEGE 1920.0 Bronx DISTRICT 10 80.666667
198 BRONX HIGH SCHOOL OF SCIENCE 1969.0 Bronx DISTRICT 10 97.271429
257 BEDFORD ACADEMY HIGH SCHOOL 1312.0 Brooklyn DISTRICT 13 60.780000
363 MIDWOOD HIGH SCHOOL 1473.0 Brooklyn DISTRICT 22 51.500000
345 HIGH SCHOOL OF TELECOMMUNICATION ARTS AND TECH... 1323.0 Brooklyn DISTRICT 20 40.442857
265 BROOKLYN LATIN SCHOOL, THE 1740.0 Brooklyn DISTRICT 14 51.000000
249 BROOKLYN TECHNICAL HIGH SCHOOL 1833.0 Brooklyn DISTRICT 13 82.514286
107 HIGH SCHOOL FOR MATHEMATICS, SCIENCE AND ENGIN... 1847.0 Manhattan DISTRICT 05 73.483333
6 NEW EXPLORATIONS INTO SCIENCE, TECHNOLOGY AND ... 1621.0 Manhattan DISTRICT 01 77.380000
84 FIORELLO H. LAGUARDIA HIGH SCHOOL OF MUSIC & A... 1707.0 Manhattan DISTRICT 03 55.700000
58 HIGH SCHOOL FOR DUAL LANGUAGE AND ASIAN STUDIES 1424.0 Manhattan DISTRICT 02 72.920000
48 STUYVESANT HIGH SCHOOL 2096.0 Manhattan DISTRICT 02 96.771429
33 ELEANOR ROOSEVELT HIGH SCHOOL 1758.0 Manhattan DISTRICT 02 61.800000
30 N.Y.C. LAB SCHOOL FOR COLLABORATIVE STUDIES 1677.0 Manhattan DISTRICT 02 52.557143
29 BARUCH COLLEGE CAMPUS HIGH SCHOOL 1577.0 Manhattan DISTRICT 02 58.228571
92 MANHATTAN CENTER FOR SCIENCE AND MATHEMATICS 1430.0 Manhattan DISTRICT 04 52.642857
374 ACADEMY OF FINANCE AND ENTERPRISE 1280.0 Queens DISTRICT 24 44.033333
396 TOWNSEND HARRIS HIGH SCHOOL 1910.0 Queens DISTRICT 25 98.485714
401 FRANCIS LEWIS HIGH SCHOOL 1474.0 Queens DISTRICT 26 40.985714
403 BAYSIDE HIGH SCHOOL 1449.0 Queens DISTRICT 26 46.371429
425 QUEENS GATEWAY TO HEALTH SCIENCES SECONDARY SC... 1538.0 Queens DISTRICT 28 65.714286
427 QUEENS HIGH SCHOOL FOR THE SCIENCES AT YORK CO... 1868.0 Queens DISTRICT 28 86.533333
447 ACADEMY OF AMERICAN STUDIES 1470.0 Queens DISTRICT 30 41.557143
451 THE MICHAEL J. PETRIDES SCHOOL 1426.0 Staten Island DISTRICT 31 43.914286
459 STATEN ISLAND TECHNICAL HIGH SCHOOL 1953.0 Staten Island DISTRICT 31 95.000000
In [37]:
full[['SCHOOL NAME', 'sat_score', 'borough', 'District', 'Advanced Regents - % of grads']][full['Advanced Regents - % of grads'] > 40].sort_values(by='borough')
Out[37]:
SCHOOL NAME sat_score borough District Advanced Regents - % of grads
206 HIGH SCHOOL OF AMERICAN STUDIES AT LEHMAN COLLEGE 1920.0 Bronx DISTRICT 10 84.233333
198 BRONX HIGH SCHOOL OF SCIENCE 1969.0 Bronx DISTRICT 10 99.185714
260 BENJAMIN BANNEKER ACADEMY 1391.0 Brooklyn DISTRICT 13 40.685714
363 MIDWOOD HIGH SCHOOL 1473.0 Brooklyn DISTRICT 22 62.171429
356 EDWARD R. MURROW HIGH SCHOOL 1431.0 Brooklyn DISTRICT 21 47.814286
345 HIGH SCHOOL OF TELECOMMUNICATION ARTS AND TECH... 1323.0 Brooklyn DISTRICT 20 54.700000
265 BROOKLYN LATIN SCHOOL, THE 1740.0 Brooklyn DISTRICT 14 53.150000
257 BEDFORD ACADEMY HIGH SCHOOL 1312.0 Brooklyn DISTRICT 13 64.740000
249 BROOKLYN TECHNICAL HIGH SCHOOL 1833.0 Brooklyn DISTRICT 13 89.671429
107 HIGH SCHOOL FOR MATHEMATICS, SCIENCE AND ENGIN... 1847.0 Manhattan DISTRICT 05 91.216667
5 LOWER EAST SIDE PREPARATORY HIGH SCHOOL 1205.0 Manhattan DISTRICT 01 44.771429
88 MANHATTAN / HUNTER SCIENCE HIGH SCHOOL 1446.0 Manhattan DISTRICT 03 41.000000
84 FIORELLO H. LAGUARDIA HIGH SCHOOL OF MUSIC & A... 1707.0 Manhattan DISTRICT 03 58.385714
58 HIGH SCHOOL FOR DUAL LANGUAGE AND ASIAN STUDIES 1424.0 Manhattan DISTRICT 02 84.000000
48 STUYVESANT HIGH SCHOOL 2096.0 Manhattan DISTRICT 02 99.242857
33 ELEANOR ROOSEVELT HIGH SCHOOL 1758.0 Manhattan DISTRICT 02 62.000000
30 N.Y.C. LAB SCHOOL FOR COLLABORATIVE STUDIES 1677.0 Manhattan DISTRICT 02 54.571429
29 BARUCH COLLEGE CAMPUS HIGH SCHOOL 1577.0 Manhattan DISTRICT 02 59.214286
6 NEW EXPLORATIONS INTO SCIENCE, TECHNOLOGY AND ... 1621.0 Manhattan DISTRICT 01 78.660000
92 MANHATTAN CENTER FOR SCIENCE AND MATHEMATICS 1430.0 Manhattan DISTRICT 04 65.714286
447 ACADEMY OF AMERICAN STUDIES 1470.0 Queens DISTRICT 30 49.471429
427 QUEENS HIGH SCHOOL FOR THE SCIENCES AT YORK CO... 1868.0 Queens DISTRICT 28 92.200000
425 QUEENS GATEWAY TO HEALTH SCIENCES SECONDARY SC... 1538.0 Queens DISTRICT 28 71.514286
403 BAYSIDE HIGH SCHOOL 1449.0 Queens DISTRICT 26 62.128571
374 ACADEMY OF FINANCE AND ENTERPRISE 1280.0 Queens DISTRICT 24 49.400000
400 BENJAMIN N. CARDOZO HIGH SCHOOL 1514.0 Queens DISTRICT 26 47.357143
396 TOWNSEND HARRIS HIGH SCHOOL 1910.0 Queens DISTRICT 25 98.857143
387 AVIATION CAREER & TECHNICAL EDUCATION HIGH SCHOOL 1364.0 Queens DISTRICT 24 43.185714
401 FRANCIS LEWIS HIGH SCHOOL 1474.0 Queens DISTRICT 26 53.557143
451 THE MICHAEL J. PETRIDES SCHOOL 1426.0 Staten Island DISTRICT 31 46.957143
459 STATEN ISLAND TECHNICAL HIGH SCHOOL 1953.0 Staten Island DISTRICT 31 97.814286

Exploring the Male-Female Split

To this point we haven't seen any information relating to a school's male-to-female ratio. It is possible that there is no correlation between SAT scores and gender. After all, male_per and female_per received r-values of -0.096 and 0.097, respectively. As we mentioned earlier, though, Pearson's r is only accurate if certain assumptions are met, and it only accounts for linear relationships. To make sure we don't rule out a gender correlation prematurely, we should at least take a look at the scatter plots for each.

In [38]:
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5), sharey=True)
sns.scatterplot(data=full, x='male_per', y='sat_score', color='tab:blue', ax=ax1)
sns.scatterplot(data=full, x='female_per', y='sat_score', color='tab:pink', ax=ax2)
plt.tight_layout()

The scatter plots indicate virtually no linear correlation, which lines up with the r-values they received from our Pearson analysis. However, we do see that all the higher-scoring schools have a male-to-female split between roughly 30/70 and 70/30. This is almost certainly because the vast majority of schools fall into that range. Still, we can check for a curvilinear relationship (in this case an upside-down U) by plotting these features with a regplot and adding the keyword argument order=2. Setting the order argument to a number greater than 1 prompts regplot to call np.polyfit under the hood and estimate a polynomial regression on our data. If these features did have a curvilinear relationship, the regression line would show it here.

In [39]:
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5), sharey=True)
sns.regplot(data=full, x='male_per', y='sat_score', order=2, ax=ax1, color='tab:blue', line_kws=dict(color='k'))
sns.regplot(data=full, x='female_per', y='sat_score', order=2, ax=ax2, color='tab:pink', line_kws=dict(color='k'))
plt.tight_layout()
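
To make the order=2 idea concrete, the quadratic fit can also be run directly with np.polyfit and its leading coefficient inspected. This is a rough sketch on simulated data (the variable names mirror the notebook, but the numbers are assumptions): one score series has no dependence on the male percentage, and the other is deliberately built as an upside-down U for contrast.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: percentage of male students and an SAT-like score
# generated with NO dependence on that percentage.
male_per = rng.uniform(0, 100, 400)
sat_score = rng.normal(1200, 150, 400)

# Degree-2 fit; coefficients come back highest power first (a, b, c)
a, b, c = np.polyfit(male_per, sat_score, deg=2)

# Fitted values at the edges and the middle of the range
fitted = np.polyval([a, b, c], [0, 50, 100])

# For contrast, a deliberately curved (upside-down U) relationship
curved = 1500 - 0.1 * (male_per - 50) ** 2 + rng.normal(0, 50, 400)
a_curved = np.polyfit(male_per, curved, deg=2)[0]

print(a, a_curved)
print(fitted)
```

A quadratic coefficient near zero, together with fitted values that barely change across the range, corresponds to the nearly flat regression line one would expect when there is no curvilinear relationship. A clearly negative coefficient corresponds to the upside-down U.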

Having completed our exploratory analysis, we can answer the first part of our initial inquiry:

  • Both District and Borough appear to be related to average SAT score, influenced by factors such as cultural and socioeconomic makeup.
  • There is likely no correlation between gender and SAT performance.
  • Cultural demographics are correlated to SAT performance, though in differing degrees:
    • Higher percentages of both Black and Hispanic students tend to correlate with lower SAT scores, while larger percentages of White and Asian students correlate with higher SAT scores. However, the threshold for classifying a school as highly Black or Hispanic differs from that for White or Asian: many schools have more than 50 percent Black or Hispanic students (324 of 478), while very few have more than 50 percent White or Asian students (25 of 478). Even at a 30 percent threshold, the White and Asian lists contain 79 entries, or 74 distinct schools once the five that exceed 30 percent on both measures are counted once.
    • Schools with 30+ percent White or Asian students and schools with 50+ percent Black or Hispanic students are mutually exclusive.
    • Poor English proficiency seems to outweigh ethnic correlations — schools with high proportions of English Language Learners accounted for the lowest SAT scores regardless of racial split.
  • Poorer students and neighborhoods have the most consistent correlation to low scores. Percent of students receiving free or reduced-price lunches had a clear negative correlation with SAT scores, and the Bronx — New York's poorest borough — had the most low-scoring schools in the city.
  • The percentage of high-performing/advanced students is the best indicator of top SAT scores, and while schools with a greater percentage of special needs students tended to have lower scores, they were not as low as scores from International schools.

Strength and Significance of Correlations

Correlation to Numerical Features

To wrap up this project, we want to know the strength of the correlations we found and whether they are statistically significant. Earlier in this notebook we touched briefly on Pearson's Product-Moment Correlation and mentioned that it is unlikely to be accurate for our data. That is because two of the Pearson test's main assumptions are 1) bivariate normality; and 2) homoscedasticity of the data. Put simply, 1) the two variables being compared should jointly follow a normal (bell curve) distribution; and 2) the spread of one variable should stay roughly constant across the range of the other.

We already know that our main variable, sat_score, has a right-skewed distribution, so the normality assumption is broken from the start. What about our other variables, though? Is sat_score close enough to a normal distribution to use Pearson's r if our other variable is normally distributed? The following examples help illustrate the issue.

In [40]:
sns.jointplot(data=full, y='sat_score', x='sped_percent', kind='reg', color='r',  marginal_kws=dict(color='k'))
In [41]:
sns.jointplot(data=full, y='sat_score', x='white_per', kind='reg', color='b',  marginal_kws=dict(color='k'))

Above, we have joint plots representing two of our variables, sped_percent and white_per, plotted against sat_score. The histograms on the side and top represent the shape of each variable. As we can see, sped_percent is much closer to a normal distribution than white_per, which is heavily skewed right. However, in both plots we see evidence of heteroscedasticity. The telltale sign of heteroscedasticity is data that fans out in a cone shape rather than a tube shape. The outliers we see in both plots due to unequal variances make it highly unlikely that Pearson's r will accurately tell us the strength of the correlation of either one.
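
The cone-versus-tube distinction can also be put into rough numbers by comparing the spread of the residuals at low and high values of the x variable. The sketch below is an informal diagnostic on simulated data, not a formal test and not what Pingouin computes. The data, the noise model and the split at the median are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data whose noise grows with x (a "cone" shape)
x = rng.uniform(0, 100, 1000)
y = 1200 + 2 * x + rng.normal(0, 1, 1000) * (10 + 2 * x)

# Residuals from a straight-line fit
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Compare residual spread in the lower and upper halves of x
median_x = np.median(x)
spread_low = residuals[x <= median_x].std()
spread_high = residuals[x > median_x].std()

print(spread_low, spread_high)
```

With tube-shaped (homoscedastic) data the two spreads would be roughly equal. A much larger spread in one half is the numerical counterpart of the fan shape visible in the joint plots.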


Although these violations probably hold for each of the features we are interested in, Pingouin makes it relatively painless to double-check. Using pg.homoscedasticity and pg.multivariate_normality we can simply loop through our variables and print out the results:

In [42]:
neg_feats = ['frl_percent', 'Local - % of grads', 'sped_percent', 'ell_percent', 'hispanic_per', 'black_per']
pos_feats = ['pct_3_plus', 'asian_per', 'pct_AP', 'white_per', 'Advanced Regents - % of grads', 'Advanced Regents - % of cohort']

for x in neg_feats:
    print(x)
    print(pg.homoscedasticity(full[['sat_score', x]]), pg.multivariate_normality(full[['sat_score', x]]))
    print('\n')

for x in pos_feats:
    print(x)
    print(pg.homoscedasticity(full[['sat_score', x]]), pg.multivariate_normality(full[['sat_score', x]]))
    print('\n')
frl_percent
                   W  pval  equal_var
levene  3.093425e+30   0.0      False (False, 6.120765230391291e-25)


Local - % of grads
                   W  pval  equal_var
levene  3.222483e+30   0.0      False (False, 1.25375145341369e-29)


sped_percent
                   W  pval  equal_var
levene  3.311444e+30   0.0      False (False, 2.198084091158776e-27)


ell_percent
                   W  pval  equal_var
levene  3.108404e+30   0.0      False (False, 2.8912797050762057e-51)


hispanic_per
                   W  pval  equal_var
levene  3.306292e+30   0.0      False (False, 7.985685861620852e-25)


black_per
                   W  pval  equal_var
levene  3.563503e+30   0.0      False (False, 1.1467838914476226e-30)


pct_3_plus
                   W  pval  equal_var
levene  7.619586e+30   0.0      False (False, 6.135013853048197e-52)


asian_per
                   W  pval  equal_var
levene  2.701160e+30   0.0      False (False, 2.0947573843702347e-53)


pct_AP
                   W  pval  equal_var
levene  4.348325e+30   0.0      False (False, 7.291229071964563e-41)


white_per
                   W  pval  equal_var
levene  2.532577e+30   0.0      False (False, 8.316863964555129e-53)


Advanced Regents - % of grads
                   W  pval  equal_var
levene  2.986968e+30   0.0      False (False, 3.7261450769584264e-38)


Advanced Regents - % of cohort
                   W  pval  equal_var
levene  1.815248e+30   0.0      False (False, 4.156416769203743e-43)


Unsurprisingly, all of our features fail the tests for bivariate normality and homoscedasticity. This is not uncommon with real-world data, which is rarely a perfect fit for theoretical statistical models out of the box. Fortunately, there are still ways to analyze correlation strength and significance within our dataset. Two of the most common are to use non-parametric statistical tests (tests that make fewer assumptions about the structure/distribution of the data) or to transform the data using a power transform, such as a Box-Cox transformation, to stabilize variance and bring the distribution closer to normal.
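
As a rough illustration of what a power transform does, the sketch below applies the Box-Cox formula with a fixed lambda to simulated right-skewed data (not our school data) and compares the skewness before and after. In practice lambda is usually estimated from the data; fixing it at 0 (a log transform) and the helper names box_cox and skewness are assumptions made to keep the example short.

```python
import math
import random


def box_cox(values, lam):
    """Box-Cox power transform for strictly positive values."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def skewness(values):
    """Sample skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(0)  # fixed seed so the sketch is reproducible
# Simulated right-skewed sample
skewed = [random.lognormvariate(0, 1) for _ in range(2000)]
transformed = box_cox(skewed, lam=0)

skew_before = skewness(skewed)
skew_after = skewness(transformed)
print(f"skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
```

The transformed sample should have a skewness close to zero, which is the sense in which the transform brings the distribution closer to normal.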

For our purposes, using a non-parametric test for analyzing our numeric features will be more straightforward than using power transforms. Since we know that several (if not all) of our features have outliers, we want to use a test that — unlike Pearson's r — is robust to such outliers. A relatively new test called Shepherd's pi has been shown to be more robust to bivariate outliers than Pearson's r, Spearman's rho and skipped correlation (r'). Once again, we can easily conduct this test on our data using Pingouin's pairwise_corr function.

In [43]:
shep_corr_neg = pg.pairwise_corr(full, columns=[['sat_score'], neg_feats], method='shepherd')
shep_corr_pos = pg.pairwise_corr(full, columns=[['sat_score'], pos_feats], method='shepherd')
all_shep_corrs = pd.concat([shep_corr_neg, shep_corr_pos]).sort_values(by='r').reset_index().drop(columns=['index'])

# Double the p-values to account for removing outliers, capping the result at 1
all_shep_corrs['p_doubled'] = (all_shep_corrs['p-unc'] * 2).clip(upper=1)

all_shep_corrs[['X', 'Y', 'outliers', 'r', 'CI95%', 'p-unc', 'p_doubled']]
Out[43]:
X Y outliers r CI95% p-unc p_doubled
0 sat_score sped_percent 42.0 -0.463 [-0.53, -0.39] 1.666789e-24 3.333578e-24
1 sat_score Local - % of grads 37.0 -0.454 [-0.52, -0.38] 8.124567e-24 1.624913e-23
2 sat_score frl_percent 34.0 -0.446 [-0.52, -0.37] 4.409605e-23 8.819210e-23
3 sat_score ell_percent 46.0 -0.309 [-0.39, -0.23] 4.827200e-11 9.654400e-11
4 sat_score black_per 26.0 -0.270 [-0.35, -0.19] 5.289608e-09 1.057922e-08
5 sat_score hispanic_per 25.0 -0.162 [-0.25, -0.07] 5.334939e-04 1.066988e-03
6 sat_score pct_AP 36.0 0.239 [0.15, 0.32] 3.562404e-07 7.124807e-07
7 sat_score pct_3_plus 39.0 0.393 [0.31, 0.47] 1.192349e-17 2.384697e-17
8 sat_score Advanced Regents - % of grads 37.0 0.503 [0.43, 0.57] 1.172161e-29 2.344321e-29
9 sat_score white_per 40.0 0.508 [0.44, 0.57] 3.566473e-30 7.132945e-30
10 sat_score Advanced Regents - % of cohort 36.0 0.536 [0.47, 0.6] 3.176937e-34 6.353874e-34
11 sat_score asian_per 42.0 0.566 [0.5, 0.62] 2.855790e-38 5.711580e-38

As we can see, the Shepherd's pi correlation identified a number of outliers in each of the features we analyzed. However, even after removing those outliers from the analysis, all 12 of our features have a statistically significant correlation (p_doubled < 0.05), including those with the weakest correlation (black_per, hispanic_per and pct_AP).
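
For intuition about how this kind of test treats outliers, here is a deliberately simplified sketch on simulated data. It flags points whose squared Mahalanobis distance from the bivariate mean reaches a cutoff and then computes Spearman's rho on the remaining points. The real method bootstraps the distance and also produces a p-value, both of which are skipped here; the cutoff of 6, the function names and the planted outliers are assumptions for illustration, not Pingouin's implementation.

```python
import math
import random
import statistics


def ranks(values):
    """Average ranks (1-based); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def mahalanobis_sq(x, y):
    """Squared Mahalanobis distance of each (x, y) point from the bivariate mean."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    det = sxx * syy - sxy ** 2
    return [((a - mx) ** 2 * syy - 2 * (a - mx) * (b - my) * sxy
             + (b - my) ** 2 * sxx) / det for a, b in zip(x, y)]


def simplified_shepherd(x, y, cutoff=6.0):
    """Flag bivariate outliers (assumed cutoff), then rank-correlate the rest."""
    flags = [d >= cutoff for d in mahalanobis_sq(x, y)]
    kept_x = [a for a, f in zip(x, flags) if not f]
    kept_y = [b for b, f in zip(y, flags) if not f]
    return spearman(kept_x, kept_y), flags


random.seed(0)  # fixed seed so the sketch is reproducible
x = [random.gauss(0, 1) for _ in range(200)]
y = [0.8 * a + random.gauss(0, 0.6) for a in x]
# Three planted outliers that run against the overall positive trend
x += [5.0, -5.0, 6.0]
y += [-5.0, 5.0, -6.0]

rho_all = spearman(x, y)
rho_clean, flags = simplified_shepherd(x, y)
print(f"rho with outliers: {rho_all:.3f}, without: {rho_clean:.3f}, flagged: {sum(flags)}")
```

Because the outliers are chosen by looking at the data, the remaining points are no longer an unbiased sample, which is the motivation for doubling the p-value as we did in p_doubled.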

To visualize how the Shepherd's pi correlation flags outliers, we can take a look at the Pingouin source code and use the shepherd function to create columns in our full dataset that correspond to the outliers. The code below does the following:

  • Imports underlying shepherd function from Pingouin
  • Loops through each of our top correlating features, passing each through shepherd() and saving the list of outliers as a new column in the full dataset
  • Creates a list of the outlier column names
  • Plots each regplot using our graph_corrs function, where the list of outlier column names is used to create a custom color map for the scatter plots: red indicates outliers, blue for everything else.
In [44]:
from pingouin.correlation import shepherd

outlier_columns = []

for feat in all_shep_corrs['Y']:
    shep_r, shep_pval, outliers = shepherd(full[feat], full['sat_score'])
    full['{}_outliers'.format(feat)] = outliers
    outlier_columns.append('{}_outliers'.format(feat))
In [45]:
# First 10 rows of outlier info
full[outlier_columns].head(10)
Out[45]:
sped_percent_outliers Local - % of grads_outliers frl_percent_outliers ell_percent_outliers black_per_outliers hispanic_per_outliers pct_AP_outliers pct_3_plus_outliers Advanced Regents - % of grads_outliers white_per_outliers Advanced Regents - % of cohort_outliers asian_per_outliers
0 False False False False False False False False False False False False
1 False False False False False False False False False False False False
2 False False False False False False False False False False False False
3 False False False False False False False False False False False False
4 False False False False False False False False False False False False
5 False False False True False False False True True False False True
6 True True True True True True True True True True True True
7 False False False False False False False False False False False False
8 True True True True True True True True True True True True
9 True False False False False False False False False False False False
In [46]:
graph_corrs(full, all_shep_corrs, coef='pi', outliers=outlier_columns)

Correlation to District and Borough

We began this notebook by using maps and boxplots to view correlations between sat_score and both District and borough, so it is only fitting to end by looking at the significance of those correlations. However, boxplots and maps alone can't indicate statistical significance. Because our District and borough columns are categorical rather than numerical, we cannot use the same Shepherd's pi method we used for our other features. Instead we will need an analysis of variance (ANOVA) test, which determines whether the differences between the means of three or more groups are statistically significant.

Although the traditional ANOVA (or F-test) is fairly robust to violations of the normality assumption, much like Pearson's correlation it is unreliable when the groups of data have unequal variances. Fortunately, another version, Welch's ANOVA, is not sensitive to unequal variances and actually tends to be more accurate than the traditional ANOVA in most circumstances.

For our analysis, we will use Welch's ANOVA to test whether there is a significant difference between SAT scores in any two districts or boroughs, followed by the Games-Howell post-hoc test to determine which pair(s) are significantly different from one another.

Our null hypothesis is that there are no significant differences between the groups' means: $H_0 : \mu_1 = \mu_2 = \cdots = \mu_k$, where $\mu$ = group mean and $k$ = number of groups.

Our alternative hypothesis is that there is a significant difference between at least two group means: $H_a : \mu_i \neq \mu_j$ for at least one pair $i \neq j$, where $\mu_i$ and $\mu_j$ can be the mean of any group.

If there is at least one group with a significant difference from another group (p < 0.05), we will reject our null hypothesis.
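
To show what Welch's ANOVA actually computes, here is a minimal sketch of its test statistic and degrees of freedom on three tiny made-up groups (not our school data). Each group is weighted by its size divided by its variance, so small or noisy groups contribute less. The function name welch_anova_f is an assumption, and the final step of turning F into a p-value with the F distribution is left out.

```python
import statistics


def welch_anova_f(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [statistics.fmean(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]

    # Weight each group by n / s^2 so noisy or small groups count for less
    weights = [n / v for n, v in zip(sizes, variances)]
    total_w = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_w

    # Weighted between-group variability
    between = sum(w * (m - weighted_mean) ** 2
                  for w, m in zip(weights, means)) / (k - 1)
    # Correction term shared by the F statistic and the second df
    lam = sum((1 - w / total_w) ** 2 / (n - 1)
              for w, n in zip(weights, sizes))

    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2


# Made-up groups: two similar means and one clearly different mean
f_stat, df1, df2 = welch_anova_f([[1, 2, 3], [2, 3, 4], [10, 11, 12]])
print(f"F = {f_stat:.3f}, df1 = {df1}, df2 = {df2:.1f}")  # F = 62.571, df1 = 2, df2 = 4.0
```

A large F relative to its degrees of freedom is what produces a small p-value and leads us to reject the null hypothesis.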

Using Pingouin's welch_anova() function to analyze both districts and boroughs, we see that our p-values are less than 0.05 and we can thus reject the null hypothesis for both.

In [47]:
pg.welch_anova(dv='sat_score', between='District', data=full)
Out[47]:
Source ddof1 ddof2 F p-unc
0 District 33 99.517 3.664 3.122246e-07
In [48]:
pg.welch_anova(dv='sat_score', between='borough', data=full)
Out[48]:
Source ddof1 ddof2 F p-unc
0 borough 4 77.497 12.508 6.722958e-08

When we run our post-hoc Games-Howell test on pairs of districts, we see that our p-values indicate significance mostly between pairs containing District 02. We also see that the effect size (hedges) is very large, even for those district pairs that were not found to be significantly different. For example, District 16 vs. District 31 received a p-value of 0.118, even though it has an effect size of 1.9 (indicating 79.4% non-overlap between the two).

So how can we explain this pair of districts not meeting statistical significance? We simply do not have enough schools in each district to be sure for most district pairs. If we take a look at the number of schools in each district, we see that District 02 has the most (65), with District 10 coming in second (28), and 29 of our 34 districts having fewer than 20 schools. With group sizes so low, it is important to remember that a p-value greater than 0.05 doesn't necessarily mean there is no real difference between the groups; it means that we cannot confidently reject our null hypothesis with the data we have.
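
The interplay between effect size and group size can be sketched with a few lines of arithmetic. The example below uses hypothetical summary statistics (not taken from our districts) and keeps the means and standard deviations fixed while changing only the number of schools per group. It computes Hedges' g, plus the unequal-variance t statistic and Welch-Satterthwaite degrees of freedom, which are the kind of quantities a Games-Howell comparison builds on. The function names are assumptions, and no p-value is computed here.

```python
import math


def hedges_g(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Cohen's d with a pooled SD, shrunk by a small-sample correction."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    d = (mean_a - mean_b) / math.sqrt(pooled_var)
    return d * (1 - 3 / (4 * (n_a + n_b) - 9))


def welch_t_and_df(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = sd_a ** 2 / n_a, sd_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df


# Hypothetical summary statistics: same means and SDs, different group sizes
for n in (5, 50):
    g = hedges_g(1100, 1350, 100, 100, n, n)
    t, df = welch_t_and_df(1100, 1350, 100, 100, n, n)
    print(f"n = {n:2d}: g = {g:.2f}, t = {t:.2f}, df = {df:.0f}")
```

The effect size barely moves between the two cases, but the t statistic and degrees of freedom grow considerably with group size, which is why a large effect can still fall short of significance when a district has only a handful of schools.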

In [49]:
district_gh = pg.pairwise_gameshowell(data=full, dv='sat_score', between='District').sort_values(by='pval')
district_gh[district_gh['pval'] < 0.15] # Top 20 of over 500 different pairs
Out[49]:
A B mean(A) mean(B) diff se tail T df pval hedges
74 DISTRICT 02 DISTRICT 12 1276.585 1107.722 168.862 21.858 two-sided 5.463 62.674 0.001000 1.441
80 DISTRICT 02 DISTRICT 18 1276.585 1123.286 153.299 19.704 two-sided 5.501 68.744 0.001000 1.605
71 DISTRICT 02 DISTRICT 09 1276.585 1128.636 147.948 22.383 two-sided 4.674 68.783 0.003125 1.143
81 DISTRICT 02 DISTRICT 19 1276.585 1118.714 157.870 23.811 two-sided 4.688 41.013 0.004563 1.368
69 DISTRICT 02 DISTRICT 07 1276.585 1141.176 135.408 20.901 two-sided 4.581 66.943 0.004629 1.236
70 DISTRICT 02 DISTRICT 08 1276.585 1159.810 116.775 18.429 two-sided 4.481 83.271 0.006096 1.115
455 DISTRICT 18 SPECIAL ED DISTRICT 75 1123.286 1222.000 -98.714 14.627 two-sided -4.772 17.373 0.010279 -2.121
350 DISTRICT 12 SPECIAL ED DISTRICT 75 1107.722 1222.000 -114.278 17.421 two-sided -4.638 22.550 0.010710 -1.998
78 DISTRICT 02 DISTRICT 16 1276.585 1111.600 164.985 23.994 two-sided 4.862 13.711 0.011632 2.231
73 DISTRICT 02 DISTRICT 11 1276.585 1163.211 113.374 20.317 two-sided 3.946 75.562 0.043497 1.020
85 DISTRICT 02 DISTRICT 23 1276.585 1085.167 191.418 30.248 two-sided 4.475 9.995 0.044086 1.888
348 DISTRICT 12 DISTRICT 31 1107.722 1359.167 -251.444 44.336 two-sided -4.010 13.598 0.087499 -1.454
513 DISTRICT 23 DISTRICT 31 1085.167 1359.167 -274.000 49.019 two-sided -3.952 15.850 0.089291 -1.882
337 DISTRICT 12 DISTRICT 20 1107.722 1247.333 -139.611 24.899 two-sided -3.965 10.433 0.113031 -1.805
422 DISTRICT 16 DISTRICT 31 1111.600 1359.167 -247.567 45.427 two-sided -3.854 13.977 0.118128 -1.947
453 DISTRICT 18 DISTRICT 31 1123.286 1359.167 -235.881 43.315 two-sided -3.851 12.451 0.125475 -1.467
284 DISTRICT 09 SPECIAL ED DISTRICT 75 1128.636 1222.000 -93.364 18.075 two-sided -3.652 26.088 0.137700 -1.541
467 DISTRICT 19 DISTRICT 31 1118.714 1359.167 -240.452 45.331 two-sided -3.751 14.644 0.140433 -1.429
469 DISTRICT 19 SPECIAL ED DISTRICT 75 1118.714 1222.000 -103.286 19.816 two-sided -3.686 18.622 0.144867 -1.638
94 DISTRICT 02 DISTRICT 32 1276.585 1107.286 169.299 31.595 two-sided 3.789 11.198 0.148178 1.491
In [50]:
full['DBN'].groupby(full['District']).count()
Out[50]:
District
ALTERNATIVE DISTRICT 79     3
DISTRICT 01                 9
DISTRICT 02                65
DISTRICT 03                17
DISTRICT 04                 7
DISTRICT 05                10
DISTRICT 06                11
DISTRICT 07                17
DISTRICT 08                21
DISTRICT 09                22
DISTRICT 10                28
DISTRICT 11                19
DISTRICT 12                18
DISTRICT 13                18
DISTRICT 14                16
DISTRICT 15                13
DISTRICT 16                 5
DISTRICT 17                20
DISTRICT 18                14
DISTRICT 19                14
DISTRICT 20                 6
DISTRICT 21                13
DISTRICT 22                 5
DISTRICT 23                 6
DISTRICT 24                15
DISTRICT 25                11
DISTRICT 26                 5
DISTRICT 27                11
DISTRICT 28                14
DISTRICT 29                10
DISTRICT 30                 9
DISTRICT 31                12
DISTRICT 32                 7
SPECIAL ED DISTRICT 75      7
Name: DBN, dtype: int64

We are, however, in luck when it comes to our boroughs. The results of the Games-Howell on borough match up to what we see in our boxplots (re-illustrated below):

  • Bronx and Brooklyn are both significantly different from Manhattan, Queens and Staten Island.
  • Bronx and Brooklyn are not significantly different from each other.
  • There is no significant difference between any combination of Manhattan, Queens and Staten Island.
In [51]:
boro_gh = pg.pairwise_gameshowell(data=full, dv='sat_score', between='borough').sort_values(by='pval')
boro_gh
Out[51]:
A B mean(A) mean(B) diff se tail T df pval hedges
1 Bronx Manhattan 1153.500 1253.711 -100.211 14.665 two-sided -4.832 210.263 0.001000 -0.611
2 Bronx Queens 1153.500 1268.462 -114.962 15.695 two-sided -5.179 132.512 0.001000 -0.741
4 Brooklyn Manhattan 1171.775 1253.711 -81.935 14.511 two-sided -3.993 207.857 0.001000 -0.496
5 Brooklyn Queens 1171.775 1268.462 -96.686 15.551 two-sided -4.396 129.613 0.001000 -0.621
3 Bronx Staten Island 1153.500 1347.538 -194.038 40.282 two-sided -3.406 13.031 0.011764 -0.986
6 Brooklyn Staten Island 1171.775 1347.538 -175.763 40.227 two-sided -3.090 12.960 0.027973 -0.892
8 Manhattan Staten Island 1253.711 1347.538 -93.828 41.309 two-sided -1.606 14.400 0.492869 -0.466
9 Queens Staten Island 1268.462 1347.538 -79.077 41.686 two-sided -1.341 14.915 0.638245 -0.398
0 Bronx Brooklyn 1153.500 1171.775 -18.275 11.261 two-sided -1.148 262.592 0.834129 -0.140
7 Manhattan Queens 1253.711 1268.462 -14.751 18.168 two-sided -0.574 178.623 0.900000 -0.083
In [52]:
f, ax = plt.subplots(figsize=(16, 12))
ax = sns.boxplot(data=full, x='sat_score', y='borough', palette='muted', showmeans=True, meanline=True)
plt.show()
In [53]:
all_shep_corrs[['X', 'Y', 'outliers', 'r', 'CI95%', 'p-unc', 'p_doubled']]
Out[53]:
X Y outliers r CI95% p-unc p_doubled
0 sat_score sped_percent 42.0 -0.463 [-0.53, -0.39] 1.666789e-24 3.333578e-24
1 sat_score Local - % of grads 37.0 -0.454 [-0.52, -0.38] 8.124567e-24 1.624913e-23
2 sat_score frl_percent 34.0 -0.446 [-0.52, -0.37] 4.409605e-23 8.819210e-23
3 sat_score ell_percent 46.0 -0.309 [-0.39, -0.23] 4.827200e-11 9.654400e-11
4 sat_score black_per 26.0 -0.270 [-0.35, -0.19] 5.289608e-09 1.057922e-08
5 sat_score hispanic_per 25.0 -0.162 [-0.25, -0.07] 5.334939e-04 1.066988e-03
6 sat_score pct_AP 36.0 0.239 [0.15, 0.32] 3.562404e-07 7.124807e-07
7 sat_score pct_3_plus 39.0 0.393 [0.31, 0.47] 1.192349e-17 2.384697e-17
8 sat_score Advanced Regents - % of grads 37.0 0.503 [0.43, 0.57] 1.172161e-29 2.344321e-29
9 sat_score white_per 40.0 0.508 [0.44, 0.57] 3.566473e-30 7.132945e-30
10 sat_score Advanced Regents - % of cohort 36.0 0.536 [0.47, 0.6] 3.176937e-34 6.353874e-34
11 sat_score asian_per 42.0 0.566 [0.5, 0.62] 2.855790e-38 5.711580e-38

Results

Despite the fact that this data was inherently difficult to fit into existing statistical models, our analysis uncovered some excellent insights into possible contributing factors for SAT performance across New York City. Circling back to the inquiries that motivated this project, we find that significant correlations to SAT score were found in nearly every demographic we set out to explore.

  1. Cultural and racial differences between schools mattered: SAT scores showed a strong positive relationship with the share of White and Asian students, and a weaker negative relationship with the share of Black or Hispanic students. The percentage of English Language Learners also had a moderate negative correlation with SAT score.

  2. The proportion of high-achieving students also had a strong positive correlation to SAT score, though more so when looking at the percentage of students who achieved an Advanced Regents diploma than at those who took or excelled on AP tests. Conversely, we also saw a moderately strong negative correlation to the percentage of both Special Ed students and Local diploma earners.

  3. For socioeconomic demographics, we saw a strong negative correlation to the percentage of students receiving free or reduced-price lunch. The city's poorest borough, the Bronx, also had the lowest mean and median SAT scores in our dataset.

  4. The Bronx and Brooklyn had significantly different (lower) means from the other three boroughs, with medium to large effect sizes for each pair. Districts, on the other hand, showed very large effect sizes, but most had too few schools for us to confidently say their difference in means was significant.

Only a school's gender split proved to be virtually unrelated to its SAT score, which makes sense as it is the least intertwined of all the features we examined.

Limitations

Although this project paints a compelling narrative for what lies behind the scenes of a school's average SAT score, there are numerous reasons why it should not be relied on for prediction purposes and why it should be reassessed in the future by other analysts:

  1. First, this data is nearly a decade old now. NYC Open Data hasn't released newer data on SAT results, so there is no way to know whether these schools are still performing at the level they did in 2012.

  2. Second, the underlying data was compiled from several different sources, which did not all contain the same schools. Rather than drop all our missing values, we imputed values that were missing for schools that were in our sat_results data. In doing so, we preserved our sample size at the cost of some amount of accuracy.

  3. We also had no ideal way to measure statistical significance due to the highly varied sample sizes, data shapes and standard deviations. Although Shepherd's pi analysis was more accurate than Pearson's r, a careful look at the outliers it excluded from analysis shows that virtually all of the International schools were among them, a highly informative yet small portion of our data.

  4. Finally, it would be nice to include more socioeconomic demographics. Of all the features in our data, only frl_percent directly addressed student body affluence (or lack thereof). It is entirely possible that wealth plays a larger part than this analysis reveals, as students with means often have an advantage when it comes to test preparation.