Projects

Network range: An R function for network analysis

34 minute read, more or less

Created: May 02, 2020

I wrote this up a few years back and updated it to include {ggraph} and {tidygraph}, my go-tos now for network manipulation and visualization.

Regression tables in R: An only slightly harmful approach

23 minute read, more or less

Created: April 22, 2020

Creating tables in R inevitably entails harm–harm to your self-confidence, your sense of wellbeing, your very sanity. Stack Overflow overfloweth with folks desperately trying to figure out how to get their regression tables exported to HTML, PDF–or, the horror, Word–formats. Tables are pretty complicated objects with lots of bells, whistles, and various points of customization. Packages abound for creating nicely formatted tables, and they have strengths and drawbacks. On SO, you see lots of people using {stargazer}. Now, I’m not going to harsh on someone’s hard work, and {stargazer} is a serviceable package that pretty easily creates nice-looking regression tables. But the API is very unclear and it is not customizable or extensible. I have adopted a workflow using {huxtable} and {flextable} to export tables to Word format. Yes, Word documents are still the standard format in the academic world. I conduct my analyses and write up my research in R, but typically I need to use Word to share with colleagues or to submit to journals, conferences, etc.

Using R and Python to Predict Housing Prices

50 minute read, more or less

Created: April 17, 2020

Some folks work in R. Some work in Python. Some work in both. I’m more on the R side, which has served my needs as a PhD student, but I also use Python on occasion. I thought it would be fun, as an exercise, to do a side-by-side, nose-to-tail analysis in both R and Python, taking advantage of the wonderful {reticulate} package in R. {reticulate} allows one to access Python through the R interface. I find this especially cool in R Markdown, since you can knit R and Python chunks in the same document! You can, to some extent, pass objects back and forth between the R and Python environments. Wow.

An observation regarding robust standard errors in R and Stata

6 minute read, more or less

Created: April 02, 2020

A common question when users of Stata switch to R is how to replicate the vce(robust) option when running linear models to correct for heteroskedasticity. In Stata, this is trivially easy: reg y x, vce(robust). Getting heteroskedasticity-robust standard errors in R–and replicating the standard errors as they appear in Stata–is a bit more work. First, we estimate the model, and then we use vcovHC() from the {sandwich} package, along with coeftest() from {lmtest}, to calculate and display the robust standard errors.
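Here is a minimal sketch of that two-step workflow. It is not taken from the post itself: the mtcars data and the mpg ~ wt model are stand-ins chosen for illustration, and type = "HC1" is my assumption for matching Stata, since HC1 applies the same n / (n - k) small-sample correction that vce(robust) uses. Note that vcovHC() defaults to HC3, so leaving type unset would likely give slightly different standard errors than Stata reports.

```r
# Assumes the {sandwich} and {lmtest} packages are installed.
library(sandwich)
library(lmtest)

# Step 1: estimate the model as usual.
# mtcars ships with R; mpg and wt are illustrative stand-ins for y and x.
fit <- lm(mpg ~ wt, data = mtcars)

# Step 2: compute a heteroskedasticity-consistent covariance matrix.
# "HC1" is assumed here to mirror Stata's vce(robust); the default is "HC3".
robust_vcov <- vcovHC(fit, type = "HC1")

# Step 3: display the coefficient table using the robust covariance matrix.
robust_table <- coeftest(fit, vcov. = robust_vcov)
print(robust_table)
```

The coefficient estimates are the same as in summary(fit); only the standard errors, t values, and p-values change, because the correction touches the covariance matrix and not the fit.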

The Bike Lanes of Brooklyn

2 minute read, more or less

Created: February 06, 2020

I was inspired by this bit of code to make a map of Brooklyn bike lanes–the lanes upon which I once biked many a mile.

Participation in New York State Accountability Testing

4 minute read, more or less

Created: September 22, 2019

School-level accountability data for public schools in New York are available here in…Microsoft Access format. I have already cleaned and prepared these data for analysis, saved them locally, and loaded them into my environment. Most important for this analysis is that the data contain the percent of students participating in annual accountability testing in both ELA (English Language Arts) and math. I’ve subset the data to exclude secondary schools, since the landscape of testing is much different there. The data range from the 2007-2008 school year to the 2016-2017 school year. I’m going to aggregate the data at the county level. I will also do the district level in a moment.