Exploring SAT Scores of New York City: Visualization & Analysis¶
Part One of this project focused entirely on preparing our data for analysis. While not the most exciting task, it is necessary to some degree for nearly every real-world data set. For the second half of this project, our goal is to understand how differences in school, borough and district relate to a school's mean SAT score and whether these differences actually have any meaningful correlation to the test scores. We will utilize several visualizations, some basic correlation analysis and hypothesis testing to help us discover trends and examine the significance of those trends.
Preliminary Analysis¶
Up to this point, we haven't set any research questions for our project other than the broad inquiry of whether we can identify any outside factors influencing SAT scores in New York City. Despite having only explored datasets in isolation, we should define some narrower research questions at this point in order to keep our analysis focused. As with all research, we may discover interesting findings outside of our intended scope along the way; however, having concrete questions that we aim to answer will ensure that we do not get lost amidst the numerous possibilities. In this notebook we will explore five potential influencing demographics: sex/gender, cultural, socioeconomic, concentration of advanced/high-achieving students, and physical location of the school.
The research questions we will attempt to answer are as follows:
- Is a school's average SAT score correlated to... :
- ...the location of a school (borough or district)?
- ...the proportion of males to females attending that school?
- ...any cultural aspects of its student population (e.g., ethnicity, English proficiency)?
- ...the socioeconomic spread of its students?
- ...the proportion of high-achieving or advanced placement students?
- Are any of these correlations strong and significant enough to warrant looking into further?
Setting the Stage with Maps¶
Just as in our first notebook, we know we are likely to use NumPy and Pandas, so we will start by importing those and setting up Pandas to display up to 500 rows or columns of a table. This time, rather than having to read in nine separate datasets, we only need to take a look at our clean data.
import numpy as np
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
full = pd.read_csv('sat_data_clean.csv')
full.head()
To this point, we have only worked with numbers and words. However, data is often best understood and communicated through images. Since we have geodata at our disposal, we can draw up some maps to gain a 20,000-foot overview of the schools and school districts that is difficult to picture by simply reading the numbers.
For this task we will use Folium, a Python library that allows us to create interactive maps with the power of the leaflet.js JavaScript library. Our first map will be a map of New York City with a marker identifying each school. If the map is zoomed in, we can see individual schools, while zooming out displays clusters that indicate the number of schools in the area.
import folium
from folium import plugins # Needed for adding a marker cluster & heatmap
schools_map = folium.Map(location=[full['lat'].mean(), full['lon'].mean()], zoom_start=10)
marker_cluster = plugins.MarkerCluster().add_to(schools_map)
for name, row in full.iterrows():
    # add markers that display DBN and school name when clicked
    folium.Marker([row['lat'], row['lon']], popup='{0}: {1}'.format(row['DBN'], row['SCHOOL NAME'])).add_to(marker_cluster)
schools_map.save('schools.html')
schools_map
We can also create a heatmap with Folium for a different view of the concentration of schools across the city. Once again, zooming in provides a finer level of detail.
schools_heatmap = folium.Map(location=[full['lat'].mean(), full['lon'].mean()], zoom_start=10)
schools_heatmap.add_child(plugins.HeatMap([[row["lat"], row["lon"]] for name, row in full.iterrows()]))
schools_heatmap.save("heatmap.html")
schools_heatmap
Now that we have a general idea of where schools are located throughout the city, we can take a closer look at where each district lies. To do this, we will create a map using the GeoJSON file we downloaded at the beginning of our first notebook.
One advantage that Folium provides is the ability to apply a fully-integrated choropleth layer over our map. Using our GeoJSON file to define the district boundaries within the choropleth, we can view values of any column from our data in a color spectrum on our map. However, in order to do this, we need to re-format our data to match the format of our JSON data. Opening up the GeoJSON file we see that each district is indicated by a string containing the (non-zero-padded) district number.
{"type":"Feature","properties":{"school_dist":"6"...
The following code groups our full dataset by school district, converts the school_dist column to strings, and strips the leading zeros from districts 1 through 9 so the values match the GeoJSON keys.
district_data = full.groupby('school_dist').mean().reset_index()
district_data['school_dist'] = district_data['school_dist'].apply(lambda x: str(int(x)))
district_data.to_csv('district_data.csv', index=False) # Saving to CSV for later use in Tableau
district_data
For our first school district choropleth map we will compare SAT scores among districts. Because we may later want to compare a different feature (and creating a map from scratch each time is somewhat time-consuming) we will write a function that can be reused for any column of district_data
.
# Takes any column of district_data and returns a district choropleth map with that column
def show_district_map(col):
    geo_path = 'districts.geojson'
    districts = folium.Map(location=[full['lat'].mean(), full['lon'].mean()], zoom_start=10)
    choropleth = folium.features.Choropleth(
        geo_data=geo_path,
        data=district_data,
        columns=['school_dist', col],
        key_on='feature.properties.school_dist',  # found in GeoJSON file
        fill_color='YlGn',
        fill_opacity=0.7,
        line_opacity=0.2,
        legend_name='Average {}'.format(col),
        highlight=True
    ).add_to(districts)
    dist_markers = district_data.iloc[:32]  # to avoid physical markers for Districts 75 and 79
    for name, row in dist_markers.iterrows():  # adds a circle marker with a hover tooltip showing the district
        folium.CircleMarker(
            location=[row['lat'], row['lon']],
            tooltip='{0}: {1}'.format('District', row['school_dist']),
            radius=2).add_to(districts)
    return districts
show_district_map('sat_score')
Already we see a noticeable difference in SAT scores from district to district. Districts 9, 12, 16, 18, 19, 23 and 32 have an average SAT score between 1085 and 1136, while districts 22, 26 and 31 average roughly 300 points higher. We will take a closer look into this below, but for now it is worth noting the strength of this map as a communication device. From one glance we can determine the lowest- and highest-scoring districts, where they lie geographically and a rough estimate of the spread between district scores. Few other visualizations provide this much easily digestible information at once, so it was well worth the time it took to fine-tune the map to our liking.
Finding Relationships in Our Data¶
With our maps as reference points, we now want to dive a little deeper into the features of our data. We'll begin by taking a look at the shape of our SAT data. Our map shows a range of 1085 to 1389, but we know the range of scores is much greater. When this data was compiled by the City of New York, the minimum score possible on the SAT was 600, while the maximum possible was 2400. As with all standardized tests, it gets increasingly harder to achieve scores toward the top of the scale, so we are unlikely to see many schools with averages on the higher end. Thus, we can expect our distribution to be skewed right to some degree. Let's take a look with a simple histogram.
Outside of maps we'll use a combination of three plotting techniques for our visualizations moving forward, depending on our needs and which method makes the quickest or cleanest plot. We will use the Pandas built-in plotting method for "quick and dirty" plots, Seaborn for clean-looking plots requiring only a little customization, and Matplotlib for anything that needs a significant amount of tweaking. The Pandas method and Seaborn are both based on Matplotlib, but each sacrifices customizability for ease of creation to some degree.
To start, we import Matplotlib and Seaborn and enable our Jupyter notebook to show the plots as output. We can also set the style of our plots so that they all look similar regardless of the plotting technique we use. For our histogram, the Pandas plotting method will suffice.
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.style.use('seaborn') # All plots we make will use this style
full['sat_score'].hist(bins=20)
Just as expected, the majority of schools have average SAT scores between 1000 and 1400, giving us a right-skewed distribution. Having confirmed the distribution of SAT scores in our data, we will dig deeper into potential correlations.
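Before we do, we can put a number on that skew with Pandas' built-in skewness calculation; a value well above zero confirms the long right tail (a quick sketch using the same full DataFrame):
# Quantify the right skew we see in the histogram
print('Skewness of sat_score: {:.2f}'.format(full['sat_score'].skew()))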
Location of School¶
We have already noticed a stark difference between certain school districts' average SAT scores from our maps. To get a more detailed view of the disparity among school districts we can use boxplots. Seaborn does an excellent job at making quick and polished boxplots, so we'll use it here.
plt.style.use('seaborn')
f, ax = plt.subplots(figsize=(16, 12))
ax = sns.boxplot(data=full, x='sat_score', y='District', showmeans=True, meanline=True) # Red dotted line indicates the mean
plt.show()
As with our map, we notice right away that the median SAT score (the solid black line inside each box) is higher for districts 22, 26 and 31 than any others. We also see that the lowest scores for these districts are higher than the lowest scores for any of the other districts and that of the three, only District 31 has a high-scoring outlier school. We can say with certainty that a noticeable difference in district SAT performance exists. However, we still don't know if that difference is significant.
Before we attempt to answer that question, we will continue exploring correlations in location. Upon a closer look, we see that our three top-scoring districts are all from different boroughs: District 22 is in Brooklyn, District 26 is in Queens and District 31 is the district for all of Staten Island.
Does this mean that we won't see much of a difference in performance by borough? A boxplot by borough can help clear that up.
f, ax = plt.subplots(figsize=(16, 12))
ax = sns.boxplot(data=full, x='sat_score', y='borough', palette='muted', showmeans=True, meanline=True)
plt.show()
Interestingly, even with the three highest-scoring districts being spread across different boroughs, it appears the borough a school is located in may still be related in some way to its student body's performance on the SAT. This is likely influenced by a number of factors, such as outlier schools or the fact that Staten Island comprises only one district (District 31) with 13 schools. For the moment we can say that schools in Staten Island, Queens and Manhattan tend, on average, to have higher-scoring SAT test takers, but to gain a better understanding of the underlying factors we will have to explore the other features of our dataset.
Narrowing Down Numerical Features¶
To investigate what factors may be driving differences among boroughs and districts we will look for correlations between our sat_score
column and the other numerical columns from the full
dataset. The default correlation test in Pandas' .corr()
method is the Pearson correlation coefficient (or Pearson's r). Pearson's r is a measure of the linear correlation between two data features.
Scores of 1 or -1 indicate perfect positive or negative linear relationships respectively, while a score of 0 indicates no linear relationship. Generally speaking, r values of $\pm$ 0.6 or beyond are considered strong correlations and anything below $\pm$ 0.3 weak correlations.
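For reference, Pearson's r between two features $x$ and $y$ is their covariance divided by the product of their standard deviations:
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$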
full.corr()['sat_score'].sort_values()
Looking at the top positive and negative results, we see that our AP columns all have an r-value greater than 0.5. However, they only represent the raw number of tests/test takers and not a percentage. Columns showing 1) the percentage of AP test takers per school and 2) the percentage of AP exams at each school with scores of 3 or above will help account for size differences among schools and may tell a different story. We can add these new columns using the code below.
full['pct_AP'] = full['AP Test Takers '] / full['total_enrollment']
full['pct_3_plus'] = (full['Number of Exams with scores 3 4 or 5'] / full['Total Exams Taken']).fillna(0)
full.corr()['sat_score'].sort_values().tail(20)
What if we want to know more than the r-value? A particularly useful Python library for more extensive statistical analysis is called Pingouin. Pingouin has many of the statistical tests found in SciPy and Statsmodels, but also seeks to bring to Python statistical tests that were previously available only in R or MATLAB. It is also built to work directly with Pandas DataFrames, which makes it generally more straightforward to use than SciPy. We will import it using the code below.
import pingouin as pg
Running a correlation test in Pingouin returns several other useful figures besides the r value. In particular, Pingouin returns the p-value (p-unc
) and Bayes Factor (BF10
) for each pair of features tested. Both of these values are measures of statistical significance. Generally, we reject the null hypothesis (that the two features have no correlation) when the p-value is less than 0.05. For the Bayes Factor, the larger the value, the stronger the evidence of correlation. If you need them, the confidence interval is also returned by default (CI95%
), as is Fisher's z (z
). Pingouin also makes it effortless to switch between correlation tests (Pearson's r, Spearman's rho, etc.), which we will explore later. For now, we will run the same basic correlation in Pingouin (Pearson method) and explore the DataFrame it outputs.
corr = pg.pairwise_corr(full, columns=['sat_score'])
corr.sort_values(by=['r']).reset_index()
With so many different features, it can be difficult to select which ones to analyze further. However, with our new table we see in the negative correlations that the Bayes Factor drops from around 96 to 3 after the twelfth feature (ell_num
), and this is as good a starting point as any. Below we will separate the features with the 12 strongest positive and negative r scores into their own DataFrames and combine them into a DataFrame named top_corrs
.
pos_corr = corr.sort_values(by=['r']).iloc[86:98].reset_index()
pos_corr
neg_corr = corr.sort_values(by=['r']).head(12).reset_index()
neg_corr
top_corrs = pd.concat([neg_corr, pos_corr]).reset_index().drop(columns=['level_0', 'index'])
top_corrs
To showcase the relative r-value "strengths" of these correlations, we can use a bar chart with a diverging color scheme. This can be done in Seaborn, but to harness the maximum effect of the coolwarm
colormap, we will manually tweak the colors in Matplotlib.
plt.style.use('seaborn')
xvals = range(len(top_corrs))
my_cmap = plt.cm.get_cmap('coolwarm')
colors = my_cmap([0 + (x * 0.045) for x in range(24)]) # fully saturates colors at opposing ends of our data
f, ax = plt.subplots(figsize=(12, 10))
tops = plt.bar(x=xvals, height=top_corrs['r'], color=colors)
ax.set_xticks(xvals)
ax.set_xticklabels(top_corrs['Y'], rotation=90)
ax.set_xlabel('Feature')
ax.set_ylabel('Pearson Correlation Coefficient')
plt.show()
Our bar chart excels at showing the r-values, but what do they really mean? Scatter plots are often used to visualize the relationship indicated by Pearson's r. The stronger the (linear) correlation, the more we should see an upward or downward linear trend in the data points. This will also help us identify the extent to which outliers might be affecting our r-values.
Seaborn has a handy plot (regplot
) that combines a scatter plot with a regression line of best fit, which will be perfect for this job. To avoid individually making 24 graphs from scratch, we will write a function to display a grid of 12 plots at once. So that we can reuse it later with other data and correlation tests, we will include several keyword arguments and a docstring.
def graph_corrs(data, corrs, xcols='Y', ycol='sat_score', vals='r', coef='r', orientation=None, outliers=None, nrows=4, ncols=3):
    '''
    Plots a grid of Seaborn regplots with correlation labels.

    Parameters
    ----------
    data : Pandas DataFrame
        DataFrame containing all data
    corrs : Pandas DataFrame
        DataFrame containing the correlations we want to graph
    xcols : str
        `corrs` column with `data` feature names to be plotted
    ycol : str
        `data` column name for the y-axis feature
    vals : str
        `corrs` column with correlation values
    coef : str
        Symbol for the correlation, e.g. 'r', 'rho', "r'", 'pi'
    orientation : {None, 'negative', 'positive'}
        Orientation of the correlation (sets the color of the regression line)
    outliers : list
        List of `data` column names that flag outliers. Must be in the same order as the rows of `corrs`.
    nrows : int
        Number of rows for subplots
    ncols : int
        Number of columns for subplots
    '''
    # Regression line color
    if orientation == 'negative':
        color = 'tab:red'
    elif orientation == 'positive':
        color = 'tab:green'
    else:
        color = 'tab:cyan'
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(16, 16), sharey=True)
    for x in range(len(corrs)):
        # Scatter/outlier colors: red for flagged outliers, blue for everything else
        if coef == 'pi':
            outlier_colors = ['r' if point else 'b' for point in data[outliers[x]]]
            scatter = dict(color=outlier_colors)
        else:
            scatter = None
        g = sns.regplot(data=data,
                        y=ycol,
                        x=corrs[xcols][x],
                        color='b',
                        marker='+',
                        ax=fig.get_axes()[x],
                        label='{} = {}'.format(coef, corrs[vals][x].round(3)),
                        line_kws=dict(color=color, label='Regression Line'),
                        scatter_kws=scatter
                        )
        g.legend(loc='best')
    plt.tight_layout()
    plt.ylim(800, 2200)
Because of our docstring, we can now remind ourselves which keywords we need by typing the following in our Jupyter notebook:
?graph_corrs
Exploring Negative Correlations¶
If we take a look first at our neg_corr
plots we see that some of our features have clear downward trends, while others are blatantly influenced by outliers. One feature with a relatively clear downward linear trend is frl_percent
, the percentage of students receiving free or reduced-price lunch. We will take a closer look at this later, as it is one of our only socioeconomic indicators and appears to have a strong correlation to SAT score.
graph_corrs(full, neg_corr, orientation='negative')
full[['SCHOOL NAME', 'sat_score', 'borough', 'frl_percent']][full['frl_percent'] > 85]
Even those without a clear linear trend can reveal interesting information, however. Both ell_percent
and is_intl
indicate that native English speakers have a distinct advantage on the SAT: all "International" schools and schools with over 40 percent "English language learners" had mean SAT scores under 1250. Some were even among the lowest in the city, with mean scores in the 900s. Of these 28 schools, there are 10 in Manhattan, 10 in the Bronx, 4 in Queens and 4 in Brooklyn. None are in Staten Island.
full[['SCHOOL NAME', 'sat_score', 'borough']][full['ell_percent'] > 40]
full[['SCHOOL NAME', 'sat_score', 'borough']][full['is_intl'] == 1]
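One way to cross-check the borough counts quoted above is to combine both filters and tally schools by borough (a small sketch; each school is counted once even if it meets both criteria):
# Boroughs of schools that are International or have over 40 percent English Language Learners
full[(full['ell_percent'] > 40) | (full['is_intl'] == 1)]['borough'].value_counts()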
For hispanic_per
and black_per
, we see a slight downward linear trend, but there are a number of outlier schools with less than 20 percent Black or Hispanic students that may make the negative correlation appear stronger than it is. However, it is worth noting that if we look at the schools that have over 80 percent Black or Hispanic students we find some interesting insights:
- None of these schools are in Staten Island.
- Schools with a high percentage of Hispanic students are primarily in Manhattan or the Bronx. Half of these schools are International.
- Schools with a high percentage of Black students are overwhelmingly in Brooklyn (50 of 57), with the remainder located in Queens.
- Primarily Black schools have slightly higher SAT scores on average (1147) than primarily Hispanic schools (1085).
full[['SCHOOL NAME', 'sat_score', 'borough']][full['hispanic_per'] > 80]
full[['SCHOOL NAME', 'sat_score', 'borough']][full['black_per'] > 80]
One final aspect that stands out from our neg_corr
grid involves students with special needs. We see that the percentage of Special Ed students appears to have some negative correlation with SAT scores, but it is easy to miss the two other features that align with this finding: Local - % of grads
and Local - % of cohort
. These features don't refer to the locale of students, but rather to the type of diploma they receive.
In NYC, high school students can receive one of three different diplomas: Advanced Regents, Regents, or Local. Local diplomas are only available to qualifying students: those with Individualized Education Plans or disabilities.
The takeaway from these features, however, is not that special needs students do worse on the SAT. Rather, when we look at the schools that have more than 70 percent of students graduating with Local diplomas or that have 25 percent of students in a Special Ed program, we see two main things:
- The Bronx and Brooklyn dominate these lists again, comprising 32 of the 45 schools. (Staten Island had one school that met these criteria); and
- Even though their scores are on the lower end, none of these 45 schools had averages in the sub-1000 range — despite their higher-than-average proportion of special needs students.
full[['SCHOOL NAME', 'sat_score', 'borough', 'District']][full['Local - % of grads'] > 70]
full[['SCHOOL NAME', 'sat_score', 'borough', 'District']][full['sped_percent'] > 25]
With our initial exploration of the negative correlations done, we can begin outlining our discoveries:
- Locations (District and Borough) appear to be related to average score, but likely due to various other factors.
- Gender variables did not receive top r-values, but should be looked into.
- Cultural aspects seem to play a part. Percentages of both Black and Hispanic students may have negative correlations to SAT score, but English proficiency seems to have the more noticeable connection at the extreme ends.
- Poorer students (those who receive free or reduced-price lunch) have the most consistent correlation to low scores.
- Schools with a greater percentage of special needs students have lower, but not the lowest, scores.
Exploring Positive Correlations¶
Moving on to our positive correlations, we see that our original AP-related features show very little linear trend. On the other hand, the AP features we created in this notebook — pct_3_plus
and pct_AP
— show clearer upward trends, but the strength of their r-values may be somewhat overinflated by outliers.
graph_corrs(full, pos_corr, orientation='positive')
If we take a closer look at the schools with either a) more than 20 percent AP test takers; or b) more than 70 percent 3+ AP scores, we notice the following:
- A handful of schools had extremely high proportions of test takers scoring 3 and above, but had very low average SAT scores.
- These are schools with high percentages of English Language Learners, suggesting that either AP tests don't require the same command of the English language, or that students at these schools took AP tests in math/science rather than reading/writing.
- These schools are more evenly distributed among the boroughs (except Staten Island) than the schools we examined through our neg_corr exploration.
- Again, there is only one school from Staten Island, but this time it is the borough's only — and significant — outlier: Staten Island Technical High School.
full[['SCHOOL NAME', 'sat_score', 'borough', 'District', 'pct_3_plus']][full['pct_3_plus'] > 0.7].sort_values(by='sat_score')
full[['SCHOOL NAME', 'sat_score', 'borough', 'District', 'pct_AP']][full['pct_AP'] > 0.2].sort_values(by='sat_score')
Although our AP features weren't as promising as we'd hoped, we did see some correlation. However, another feature related to high-achieving students may shed some light on the matter. With a quick glance at our regplot grid, we see that all the schools flagged as "Specialized" have average SAT scores upwards of 1700. All except The Brooklyn Latin School also hold top spots in our AP features.
What is special about these schools? A Google search reveals that eight of the nine "Specialized" schools in our dataset are among the top 10 high schools in New York State (top 100 in the U.S.) and require eighth graders to achieve a certain rank on the Specialized High School Admissions Test prior to admission.
The ninth school on this list, Fiorello H. Laguardia High School of Music & Arts and Performing Arts, is also elite, though more emphasis is placed on their auditions than academics. Incoming students to this school only have to show "evidence of satisfactory achievement" which is defined as 75 or higher in core subjects and a 90 percent attendance rate. Emphasis on the Arts over academics may explain why this school has the lowest mean SAT score of the nine, but 1707 is still one of the highest in our dataset.
It is quite possible that this small group of elite schools is inflating our r-values for pct_AP
and pct_3_plus
.
full[['SCHOOL NAME', 'sat_score', 'borough', 'District']][full['is_specialized'] == 1].sort_values(by='sat_score')
Moving on to the cultural demographics that appear in pos_corr
, we see that SAT scores trend upward in relation to larger percentages of White and Asian students in a school. However, neither plot is particularly linear and asian_per
shows a number of highly-Asian schools in the mid-to-lower SAT score range. Upon closer inspection, we find the lower-scoring schools are once again International schools.
Schools with over 30 percent White students, on the other hand, had average SAT scores that bottomed out at 1195.
Interestingly, when comparing these ethnic demographic variables to our highly Hispanic or Black schools, we find some remarkable contrast in both location and saturation:
- Our cutoff for examining "highly" Hispanic and Black schools was 80 percent. For White and Asian student proportions it is 30 percent. There are only 3 schools in our data that are above 80 percent White or Asian.
- The majority of highly Black or Hispanic schools were located in Brooklyn, the Bronx or Manhattan. No schools in Staten Island had over 80 percent Black or Hispanic students.
- There are more highly Asian schools in Queens than the other boroughs, though Brooklyn and Manhattan are well-represented. Staten Island and the Bronx only have one school each in this category.
- Highly White schools are more evenly spread among the boroughs — except the Bronx, where there is only one school with more than 30 percent White students. However, seven of Staten Island's 13 high schools fall into this category — more than any feature we've examined so far.
- A 50+ percent Black or Hispanic student body and a 30+ percent White or Asian student body are mutually exclusive.
# School info where 30+ percent Asian
full[['SCHOOL NAME', 'sat_score', 'borough', 'District', 'asian_per']][full['asian_per'] > 30].sort_values(by='borough')
# School info where 30+ percent White
full[['SCHOOL NAME', 'sat_score', 'borough', 'District', 'white_per']][full['white_per'] > 30].sort_values(by='borough')
# Number of schools with 50+ percent Black or Hisp., 30+/50+ percent White or Asian
print('Schools with 50+ percent Black or Hispanic students: {}'.format(len(full[full['black_per'] > 50]) + len(full[full['hispanic_per'] > 50])))
print('Schools with 30+ percent White or Asian students: {}'.format(len(full[full['white_per'] > 30]) + len(full[full['asian_per'] > 30])))
print('Schools with 50+ percent White or Asian students: {}'.format(len(full[full['white_per'] > 50]) + len(full[full['asian_per'] > 50])))
# Max percent of other races by type
race = ['White', 'Asian', 'Black', 'Hispanic']
feat = ['white_per', 'asian_per', 'black_per', 'hispanic_per']
race_per = [30, 30, 50, 50]
print('Max. racial percentages for schools with... \n')
for x in range(len(race)):
    print('{}+ percent {} students:'.format(race_per[x], race[x]))
    print(full[[f for f in feat if f != feat[x]]][full[feat[x]] >= race_per[x]].max())
    print('\n')
The last standout from our pos_corr
regplots is the apparent positive correlation between the percentage of "Advanced Regents" graduates and SAT scores. Of all the plots, the Advanced Regents features had the clearest linear relationship to SAT performance. As briefly discussed earlier, students in New York high schools can earn Local, Regents or Advanced Regents diplomas. It is somewhat intuitive that Advanced Regents graduates would perform better on the SAT, as it has the strictest requirements of the three diploma types.
Schools with a high proportion of Advanced Regents graduates (40-plus percent) included all nine of our Specialized schools and were found in all five boroughs. However, only two schools each from the Bronx and Staten Island fit this classification — three of which were Specialized schools: High School of American Studies at Lehman College, Bronx High School of Science and Staten Island Technical High School.
full[['SCHOOL NAME', 'sat_score', 'borough', 'District', 'Advanced Regents - % of cohort']][full['Advanced Regents - % of cohort'] > 40].sort_values(by='borough')
full[['SCHOOL NAME', 'sat_score', 'borough', 'District', 'Advanced Regents - % of grads']][full['Advanced Regents - % of grads'] > 40].sort_values(by='borough')
Exploring the Male-Female Split¶
To this point we haven't seen any information relating to a school's male to female ratio. It is possible that there is no correlation between SAT scores and gender. After all, male_per
and female_per
received r-values of -0.096 and 0.097 respectively. As we will discuss in more detail below, though, Pearson's r is only accurate when certain assumptions are met, and it only accounts for linear relationships. To make sure we don't rule out a gender correlation too early, we should at least take a look at the scatter plots for each.
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5), sharey=True)
sns.scatterplot(data=full, x='male_per', y='sat_score', color='tab:blue', ax=ax1)
sns.scatterplot(data=full, x='female_per', y='sat_score', color='tab:pink', ax=ax2)
plt.tight_layout()
The scatter plots indicate virtually no linear correlation, which lines up with the r-values they received from our Pearson analysis. However, we do see that all the higher-scoring schools have a male-to-female split between roughly 30/70 and 70/30. This is almost certainly because the vast majority of schools fall into that range. Still, we can rule out a curvilinear relationship (in this case an upside-down U) by plotting these features with a regplot and adding the keyword argument order=2
. Changing the order
argument to a number greater than 1 prompts regplot
to call np.polyfit
under the hood and estimate a polynomial regression on our data. If these features did have a curvilinear relationship, the regression line would show it here.
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5), sharey=True)
sns.regplot(data=full, x='male_per', y='sat_score', order=2, ax=ax1, color='tab:blue', line_kws=dict(color='k'))
sns.regplot(data=full, x='female_per', y='sat_score', order=2, ax=ax2, color='tab:pink', line_kws=dict(color='k'))
plt.tight_layout()
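For the curious, the sketch below approximates what regplot does under the hood when order=2: fit a quadratic with np.polyfit and evaluate it across the observed range with np.polyval (this assumes male_per has no missing values in our cleaned data).
# Fit and plot a second-degree polynomial for male_per vs. sat_score
coeffs = np.polyfit(full['male_per'], full['sat_score'], deg=2)  # quadratic coefficients
xs = np.linspace(full['male_per'].min(), full['male_per'].max(), 100)
plt.scatter(full['male_per'], full['sat_score'], marker='+')
plt.plot(xs, np.polyval(coeffs, xs), color='k')  # should closely match the curve regplot draws
plt.xlabel('male_per')
plt.ylabel('sat_score')
plt.show()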
Having completed our exploratory analysis, we can answer the first part of our initial inquiry:
- Both District and Borough appear to be related to average SAT score, influenced by factors such as cultural and socioeconomic makeup.
- There is likely no correlation between gender and SAT performance.
- Cultural demographics are correlated to SAT performance, though in differing degrees:
- Higher percentages of both Black and Hispanic students tend to correlate with lower SAT scores, while larger percentages of White and Asian students correlate with higher SAT scores. However, the threshold for classifying schools as heavily Black/Hispanic is different from that for White/Asian students: many schools have well over 50 percent Black or Hispanic students (324 of 478), while the number of schools with 50+ percent White or Asian students is minimal (25 of 478; 79 schools are 30+ percent White or Asian).
- Schools with 30+ percent White or Asian students and schools with 50+ percent Black or Hispanic students are mutually exclusive.
- Poor English proficiency seems to outweigh ethnic correlations — schools with high proportions of English Language Learners accounted for the lowest SAT scores regardless of racial split.
- Poorer students and neighborhoods have the most consistent correlation to low scores. Percent of students receiving free or reduced-price lunches had a clear negative correlation with SAT scores, and the Bronx — New York's poorest borough — had the most low-scoring schools in the city.
- The percentage of high-performing/advanced students is the best indicator of top SAT scores, and while schools with a greater percentage of special needs students tended to have lower scores, they were not as low as scores from International schools.
Strength and Significance of Correlations¶
Correlation to Numerical Features¶
To wrap up this project, we want to know the strength of the correlations we found and whether they are statistically significant. Earlier in this notebook we touched briefly on Pearson's Product-Moment Correlation and mentioned that it is unlikely to be accurate for our data. That is because two of the Pearson test's main assumptions are 1) bivariate normality; and 2) homoscedasticity of the data. This is a fancy way of saying that the two variables being compared need to 1) follow a normal, bell-curve distribution; and 2) have roughly constant variance across their range of values.
We already know that our main variable, sat_score
, has a skew-right distribution, so we automatically break that assumption. What about our other variables though? Is sat_score
close enough to a normal distribution to use Pearson's r if our other variable is normally distributed? The following examples help illustrate the issue.
sns.jointplot(data=full, y='sat_score', x='sped_percent', kind='reg', color='r', marginal_kws=dict(color='k'))
sns.jointplot(data=full, y='sat_score', x='white_per', kind='reg', color='b', marginal_kws=dict(color='k'))
Above, we have joint plots representing two of our variables, sped_percent
and white_per
, plotted against sat_score
. The histograms on the side and top represent the shape of each variable. As we can see, sped_percent
is much closer to a normal distribution than white_per
, which is heavily skewed right. However, in both plots we see evidence of heteroscedasticity, whose telltale sign is data that fans out in a cone shape rather than a tube shape. That fanning, together with the outliers visible in both plots, makes it highly unlikely that Pearson's r will accurately capture the strength of either correlation.
Although this probably holds true for each of the features we are interested in, Pingouin makes it relatively painless to double check. Using pg.homoscedasticity
and pg.multivariate_normality
we can simply loop through our variables and print out the results:
neg_feats = ['frl_percent', 'Local - % of grads', 'sped_percent', 'ell_percent', 'hispanic_per', 'black_per']
pos_feats = ['pct_3_plus', 'asian_per', 'pct_AP', 'white_per', 'Advanced Regents - % of grads', 'Advanced Regents - % of cohort']
for x in neg_feats:
    print(x)
    print(pg.homoscedasticity(full[['sat_score', x]]), pg.multivariate_normality(full[['sat_score', x]]))
    print('\n')
for x in pos_feats:
    print(x)
    print(pg.homoscedasticity(full[['sat_score', x]]), pg.multivariate_normality(full[['sat_score', x]]))
    print('\n')
Unsurprisingly, all of our features fail the tests for bivariate normality and homoscedasticity. This is not uncommon with real-world data, which is rarely a perfect fit for theoretical statistical models out of the box. Fortunately, there are still ways to analyze correlation strength and significance within our dataset. Two of the most common are to use non-parametric statistical tests (tests that make fewer assumptions about the structure/distribution of the data) or to transform the data using a power transform, such as a Box-Cox transformation, to stabilize variance and bring the distribution closer to normal.
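As a quick illustration of the transform route (which we will not pursue here), a Box-Cox transform of sat_score might look like the sketch below, using scipy.stats.boxcox; Box-Cox requires strictly positive values, which SAT scores satisfy.
from scipy import stats

# Fit the Box-Cox transform (lambda is estimated by maximum likelihood)
transformed, fitted_lambda = stats.boxcox(full['sat_score'].dropna())
print('Fitted lambda: {:.3f}'.format(fitted_lambda))

# Compare skewness before and after the transform
print('Skew before: {:.2f}'.format(full['sat_score'].skew()))
print('Skew after: {:.2f}'.format(pd.Series(transformed).skew()))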
For our purposes, using a non-parametric test to analyze our numeric features will be more straightforward than using power transforms. Since we know that several (if not all) of our features have outliers, we want a test that — unlike Pearson's r — is robust to such outliers. A relatively new test called Shepherd's pi, which is essentially a Spearman correlation computed after bivariate outliers (flagged via a bootstrapped Mahalanobis distance) have been removed, has been shown to be more robust to those outliers than Pearson's r, Spearman's rho and the skipped correlation (r'). Once again, we can easily conduct this test on our data using Pingouin's pairwise_corr
function.
shep_corr_neg = pg.pairwise_corr(full, columns=[['sat_score'], neg_feats], method='shepherd')
shep_corr_pos = pg.pairwise_corr(full, columns=[['sat_score'], pos_feats], method='shepherd')
all_shep_corrs = pd.concat([shep_corr_neg, shep_corr_pos]).sort_values(by='r').reset_index().drop(columns=['index'])
all_shep_corrs['p_doubled'] = all_shep_corrs['p-unc'].apply(lambda x: 1 if x*2 > 1 else x*2) # p-val x 2 to account for removing outliers
all_shep_corrs[['X', 'Y', 'outliers', 'r', 'CI95%', 'p-unc', 'p_doubled']]
As we can see, the Shepherd's pi correlation identified a number of outliers in each of the features we analyzed. However, even after removing those outliers from the analysis, all 12 of our features have a statistically significant correlation (p_doubled < 0.05), including those with the weakest correlation (black_per
, hispanic_per
and pct_AP
).
To visualize how the Shepherd's pi correlation flags outliers, we can take a look at the Pingouin source code and use the shepherd
function to create columns in our full
dataset that correspond to the outliers. The code below does the following:
- Imports the underlying shepherd function from Pingouin
- Loops through each of our top correlating features, passing each through shepherd() and saving the list of outliers as a new column in the full dataset
- Creates a list of the outlier column names
- Plots each regplot using our graph_corrs function, where the list of outlier column names is used to create a custom color map for the scatter plots: red indicates outliers, blue for everything else.
from pingouin.correlation import shepherd
outlier_columns = []
for feat in all_shep_corrs['Y']:
    shep_r, shep_pval, outliers = shepherd(full[feat], full['sat_score'])
    full['{}_outliers'.format(feat)] = outliers
    outlier_columns.append('{}_outliers'.format(feat))
# First 10 rows of outlier info
full[outlier_columns].head(10)
graph_corrs(full, all_shep_corrs, coef='pi', outliers=outlier_columns)
Correlation to District and Borough¶
We began this notebook by using maps and boxplots to view correlations between sat_score
and both District
and borough
, so it is only fitting to end by looking at the significance of those correlations. However, boxplots and maps alone can't indicate statistical significance. Because our District
and borough
columns are categorical rather than numerical, we cannot use the same Shepherd's pi method we used for our other features. Instead we will need to do an analysis of variance (ANOVA) test, which determines whether the differences between the means of three or more groups are statistically significant.
Although the traditional ANOVA (or F-test) is robust to violations of the normality assumption, much like Pearson's correlation it is unreliable when the groups of data have unequal variances. Fortunately, another ANOVA — Welch's ANOVA — is not sensitive to unequal variances and actually tends to be more accurate than the traditional ANOVA in most circumstances.
For our analysis, we will use Welch's ANOVA to test whether any districts or boroughs differ significantly in mean SAT score, followed by the Games-Howell post-hoc test to determine which pair(s) are significantly different from one another.
Our null hypothesis is that there are no significant differences between the groups' means: $H_0 : \mu_1 = \mu_2 = \mu_3 \cdots \mu_k$, where $\mu$ = group mean and $k$ = number of groups.
Our alternative hypothesis is that there is a significant difference between group means: $H_a : \mu_i \neq \mu_j$, where $\mu_i$ and $\mu_j$ can be the mean of any group.
If there is at least one group with a significant difference from another group (p < 0.05), we will reject our null hypothesis.
Using Pingouin's welch_anova()
function to analyze both districts and boroughs, we see that our p-values are less than 0.05 and we can thus reject the null hypothesis for both.
pg.welch_anova(dv='sat_score', between='District', data=full)
pg.welch_anova(dv='sat_score', between='borough', data=full)
When we run our post-hoc Games-Howell test on pairs of districts, we see that our p-values indicate significance mostly between pairs containing District 02. We also see that the effect size (hedges
) is very large, even for those district pairs that were not found to be significantly different. For example, District 16 vs. District 31 received a p-value of 0.118, even though it has an effect size of 1.9 (indicating roughly 79.4% non-overlap between the two distributions).
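For context, Hedges' g is essentially Cohen's d (the difference between two group means divided by their pooled standard deviation) multiplied by a small-sample correction factor, so a value of 1.9 means the two district means sit nearly two pooled standard deviations apart:
$$g = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}} \left(1 - \frac{3}{4(n_1 + n_2) - 9}\right)$$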
So how can we explain this pair of districts not meeting statistical significance? For most district pairs, we simply do not have enough schools per district to be sure. If we take a look at the number of schools in each district, we see that District 02 has the most (65), District 10 comes in second (28), and 29 of our 34 districts have fewer than 20 schools. With group sizes this small, it is important to remember that a p-value greater than 0.05 doesn't necessarily mean there is no real difference — it means that we cannot confidently reject our null hypothesis with the data we have.
district_gh = pg.pairwise_gameshowell(data=full, dv='sat_score', between='District').sort_values(by='pval')
district_gh[district_gh['pval'] < 0.15] # Top 20 of over 500 different pairs
full['DBN'].groupby(full['District']).count()
We are, however, in luck when it comes to our boroughs. The results of the Games-Howell on borough
match up to what we see in our boxplots (re-illustrated below):
- Bronx and Brooklyn are both significantly different from Manhattan, Queens and Staten Island.
- Bronx and Brooklyn are not significantly different from each other.
- There is no significant difference between any combination of Manhattan, Queens and Staten Island.
boro_gh = pg.pairwise_gameshowell(data=full, dv='sat_score', between='borough').sort_values(by='pval')
boro_gh
f, ax = plt.subplots(figsize=(16, 12))
ax = sns.boxplot(data=full, x='sat_score', y='borough', palette='muted', showmeans=True, meanline=True)
plt.show()
Results¶
Despite the fact that this data was inherently difficult to fit into existing statistical models, our analysis uncovered some excellent insights into possible contributing factors for SAT performance across New York City. Circling back to the inquiries that motivated this project, we find that significant correlations to SAT score were found in nearly every demographic we set out to explore.
- Cultural and racial makeup mattered: the percentages of White and Asian students showed a strong positive relationship to SAT scores, while the percentages of Black and Hispanic students showed a weaker negative relationship. The percentage of English Language Learners also had a moderate negative correlation with SAT scores.
- The proportion of high-achieving students also had a strong positive correlation to SAT score, though more so when looking at the percentage of students who achieved an Advanced Regents diploma than at those who took or excelled on AP tests. Conversely, we also saw a moderately strong negative correlation to the percentage of both Special Ed students and Local diploma earners.
- For socioeconomic demographics, we saw a strong negative correlation to the percentage of students receiving free or reduced-price lunch. The city's poorest borough, the Bronx, also had the lowest mean and median SAT scores in our dataset.
- The Bronx and Brooklyn had significantly different (lower) means from the other three boroughs, with medium to large effect sizes for each pair. Districts, on the other hand, showed very large effect sizes, but their group sizes were mostly too small to confidently call the differences in means significant.
Only a school's gender split proved to be virtually unrelated to its SAT score, which makes sense as it is the least intertwined of all the features we examined.
Limitations¶
Although this project paints a compelling narrative for what lies behind the scenes of a school's average SAT score, there are numerous reasons why it should not be relied on for prediction purposes and why it should be reassessed in the future by other analysts:
- First, this data is nearly a decade old now. NYC Open Data hasn't released newer data on SAT results, so there is no way to know whether these schools are still performing at the level they did in 2012.
- Second, the underlying data was compiled from several different sources, which did not all contain the same schools. Rather than drop all our missing values, we imputed values that were missing for schools that were in our
sat_results
data. In doing so, we preserved our sample size at the cost of some accuracy.
- We also had no ideal way to measure statistical significance due to the highly varied sample sizes, data shapes and standard deviations. Although the Shepherd's pi analysis was more accurate than Pearson's r, a careful look shows that the outliers it excluded were virtually all of the International schools — a highly informative, yet small, portion of our data.
- Finally, it would be nice to include more socioeconomic demographics. Of all the features in our data, only
frl_percent
directly addressed student body affluence (or lack thereof). It is entirely possible that wealth plays a larger part than this analysis reveals, as students with means often have an advantage when it comes to test preparation.