Regression tables in R: An only slightly harmful approach

23 minute read, more or less

Created: April 22, 2020

Creating tables in R inevitably entails harm–harm to your self-confidence, your sense of wellbeing, your very sanity. Stack Overflow overfloweth with folks desparately trying to figure out how to get their regression tables exported to html, pdf–or, the horror, word–formats. Tables are pretty complicated objects with lots of bells, whistles, and various points of customization. Packages abound for creating nicely formatted tables, and they have strengths and drawbacks. On SO, you see lots of people using {stargazer}. Now, I’m not going to harsh on someone’s hardwork and {stargazer} is a servicable packages that pretty easily creates nice looking regression tables. But, the API is very unclear and it is not customizable or extensible. I have adopted a workflow using {huxtable} and {flextable} to export tables to word format. Yes, word documents are still the standard format in the academic world. I conduct my analyses and write up my research in R, but typically I need to use word to share with colleagues or to submit to journals, conferences, etc.

This is the workflow I developed while writing my dissertation. I prefer {huxtable} because it is {tidyverse} friendly, pipeable %>%, has an excellent API, and is well-documented. I use {flextable} because it exports beautifully to word. This workflow is pretty good, but not perfect. So trial and error and futzing around in word is necessary.

tl;dr

If you are knitting a word document in R markdown, build the table in {huxtable} using huxreg() and customize with other {huxtable} functions and then convert to a {flextable} object and adjust the width as needed. This produces a nicely formatted regression table for my different types of models in

# Not run! 
fit <- lm(Y ~ X, data = dat)

tab <- huxreg(fit,
              # Rename coefficients
              coefs = c("Cool variable name" = "X")
              # Add the desired summary statistics
              statistics = c(N = "nobs", R2 = "r.squared"),
              # Add a note
              note = "Note. Important note about this table") %>% 
  # Convert the huxtable table to flextable
  as_flextable()

# Set a custom width to ensure it fits on the page
tab %>% flextable::width(width = 1)

A more detailed regression table workflow

I like to use real-world data when I’m presenting a tutorial or doing a training (well, I used to teach an R bootcamp workshop for incoming Master of Analytics students at Northwestern University). I have felt the need to upgrade my SQL chops, so I set up an SQL database on an external drive I have and upload series of datasets that are (a) large and (b) I work with lot. These datasets consist administrative and demographic information on nearly all public school in the United States, made available by the National Center for Education Statistics. So first things first, I will access the data.

library(tidyverse)

library(dbplyr)

## 
## Attaching package: 'dbplyr'

## The following objects are masked from 'package:dplyr':
## 
##     ident, sql

path <- "/Volumes/the safe box/CCD_2001_2016/ccd_2001-2016.sqlite"

ccd <- DBI::dbConnect(RSQLite::SQLite(), path)

DBI::dbListTables(ccd)

## [1] "ccd_district_dir"     "ccd_district_enroll"  "ccd_district_finance"
## [4] "ccd_school_dir"       "ccd_school_enroll"    "sqlite_stat1"        
## [7] "sqlite_stat4"

query <- 
  "SELECT * 
    FROM (SELECT fips, year AS e_year, leaid AS e_leaid, race, enrollment 
            FROM ccd_district_enroll 
            WHERE (race IN ('White', 'Black', 'Hispanic', 'Asian', 'American Indian or Alaska Native', 'Total') AND year == 2015)) AS e 
    INNER JOIN 
      (SELECT leaid, year, rev_total, rev_fed_total, rev_local_total, rev_state_total 
        FROM ccd_district_finance) AS f 
    ON (e.e_leaid = f.leaid AND e.e_year = f.year)"

dat <- tbl(ccd, sql(query)) %>% 
  collect() %>% 
  select(-starts_with("e_")) %>% 
  pivot_wider(names_from = race, values_from = enrollment) %>% 
  janitor::clean_names() %>% 
  mutate_at(vars(white, black, hispanic, asian), list(per = ~ round(100 * (./total), 2))) %>% 
  na_if(-2) %>% 
  na_if(-1) %>% 
  filter_at(vars(contains("rev")), all_vars(. != 0))

Now, you may want to get these data your self. Luckily, it is easy to do so thanks to the {educationdata} package from the Urban Institute. The API is great, but take some time.

enroll <- get_education_data(level = "school-districts",
                        source = "ccd",
                        topic = "enrollment",
                        filters = list(year = 2015, grade = 99),
                        by = list("race"),
                        add_labels = TRUE)

fin <- get_education_data(level = "school-districts",
                     source = "ccd",
                     topic = "finance",
                     filters = list(year = 2015),
                     add_labels = TRUE)

dat <- enroll %>% 
  select(leaid, year, race, enrollment)
  inner_join(fin %>% 
               select(leaid, year, rev_total, rev_fed_total, rev_local_total),
             by = c("leaid", "year")) %>% 
    pivot_wider(names_from = race, values_from = enrollment) %>% 
    janitor::clean_names()

Ok, so let’s specify a simple log-linear model regression the total local revenue of a school district (logged) on the share of white students in that district. We might anticipate, given predominant patterns of segregation and income inequality, that districts with more white students will have more local revenue, since local revenue is often generate through local taxes and levies.

To display the regression, {huxtable} has a beautiful and simple function: huxreg(). This is produce, in your console, a well formatted and attactive regression table. The function will accept any object that has an associated tidy() and/or glance() in the {broom} package (for a list of these objects, see here–it’s quite extensive). Straight outta the box, this will make an acceptable table.

library(huxtable)
fit1 <- lm(log(rev_local_total) ~ white_per, data = dat)

huxreg(fit1)

	(1)
(Intercept)	14.732 ***
	(0.040)
white_per	0.006 ***
	(0.001)
N	15692
R2	0.008
logLik	-33809.860
AIC	67625.721
* p < 0.001; p < 0.01; * p < 0.05.

According to this simple model, for each percentage point increase in the share of white students in a district, there is a 0.6% increase in local funding. Ok, so if we have a 10 percentage-point increase in the share of white students, there is a 6% increase in local funding. That seems meaningful–with the caveat that this is a very simple model with a very low $R^2$. We have a large enough N to capture realtively small associations.

So far so good. However, there is quite a bit of customizing we can do to tweak the table. This is where table packages become a bit unwieldy. I like {huxtable} because I feel like the API is quite sensible and easy to use.

For example, I don’t need to report the log-likelihood or AIC here in the summary stats–just the N and $R^2$. This is controlled by the statistics argument. What’s more, I can give custom names to the stats.

huxreg(fit1,
       statistics = c(N = "nobs", R2 = "r.squared"))

	(1)
(Intercept)	14.732 ***
	(0.040)
white_per	0.006 ***
	(0.001)
N	15692
R2	0.008
* p < 0.001; p < 0.01; * p < 0.05.

huxreg(fit1,
       statistics = c(`Number of observations` = "nobs", `R-squared` = "r.squared"))

	(1)
(Intercept)	14.732 ***
	(0.040)
white_per	0.006 ***
	(0.001)
Number of observations	15692
R-squared	0.008
* p < 0.001; p < 0.01; * p < 0.05.

Also, I don’t like that for the independent variable of interest we have the variable name. We can rename it!

huxreg(fit1,
       coefs = c("%White Students" = "white_per"),
       statistics = c(`Number of observations` = "nobs", `R-squared` = "r.squared"))

	(1)
%White Students	0.006 ***
	(0.001)
Number of observations	15692
R-squared	0.008
* p < 0.001; p < 0.01; * p < 0.05.

Often, we want to display multiple models with various types of controls. Easy! Fit the models and send them to huxreg{} as a list! So let’s update the original model and control for the (log) of state and federal revenue, separately and jointly. Also, this time I will add a note to the bottom of the table–some thing that we often do to provide necessary information to interpret the results. This is does with the note argument.

fit2 <- update(fit1, . ~ . + log(rev_state_total))
fit3 <- update(fit1, . ~ . + log(rev_fed_total))
fit4 <- update(fit2, . ~ . + log(rev_fed_total))

fits <- list(fit1, fit2, fit3, fit4)

huxreg(fits,
       coefs = c("%White Students" = "white_per",
                 "State revenue (logged)" = "log(rev_state_total)",
                 "Federal revenue (logged)" = "log(rev_fed_total)"),
       statistics = c(N = "nobs", R2 = "r.squared"))

	(1)	(2)	(3)	(4)
%White Students	0.006 ***	0.011 ***	0.023 ***	0.021 ***
	(0.001)	(0.000)	(0.000)	(0.000)
State revenue (logged)		0.823 ***		0.149 ***
		(0.008)		(0.011)
Federal revenue (logged)			0.987 ***	0.862 ***
			(0.007)	(0.012)
N	15692	15692	15692	15692
R2	0.008	0.423	0.566	0.571
* p < 0.001; p < 0.01; * p < 0.05.

fits <- list("Model 1" = fit1, "Model 2" = fit2, 
             "Model 3" = fit3, "Model 4" = fit4)

huxreg(fits,
       statistics = c(N = "nobs", R2 = "r.squared"),
       note = "Note. Data from the Common Core of Data, National Center for Education Statistics")

	Model 1	Model 2	Model 3	Model 4
(Intercept)	14.732 ***	1.619 ***	0.289 **	-0.262 *
	(0.040)	(0.127)	(0.105)	(0.113)
white_per	0.006 ***	0.011 ***	0.023 ***	0.021 ***
	(0.001)	(0.000)	(0.000)	(0.000)
log(rev_state_total)		0.823 ***		0.149 ***
		(0.008)		(0.011)
log(rev_fed_total)			0.987 ***	0.862 ***
			(0.007)	(0.012)
N	15692	15692	15692	15692
R2	0.008	0.423	0.566	0.571
Note. Data from the Common Core of Data, National Center for Education Statistics

Adding these controls increases the relationship between the share of white students in a district and the amount of local funding it receives. So net of federal and state revenue, for each percentage point increase in the share of white students, a school district an a 2.1% increase in local funding.

Well, we might also want to add a term for state, since states vary quite a bit in how they administer and fund school districts. States have constitutional authority to administer and fund schools. The federal government add supplemental funding to school serving low-income students. Let’s update the model to include a categorical term for state.

Doing this will BLOW UP the table with FOFTY-NINE STATES!! See??

state_fits <- map(fits, ~ update(.x, . ~ . + fips))

huxreg(state_fits, 
       statistics = c(N = "nobs", R2 = "r.squared"),
       note = "Note. Standard errors clustered at the state level. Data from the Common Core of Data, National Center for Education Statistics")

	Model 1	Model 2	Model 3	Model 4
(Intercept)	15.749 ***	1.865 ***	0.306 *	-0.307 *
	(0.156)	(0.168)	(0.149)	(0.151)
white_per	0.007 ***	0.011 ***	0.023 ***	0.021 ***
	(0.001)	(0.000)	(0.000)	(0.000)
fipsAlaska	-1.971 ***	-1.269 ***	-0.913 ***	-0.931 ***
	(0.289)	(0.214)	(0.190)	(0.188)
fipsArizona	-3.164 ***	-1.356 ***	-0.672 ***	-0.678 ***
	(0.171)	(0.127)	(0.113)	(0.112)
fipsArkansas	-1.239 ***	-0.367 **	-0.489 ***	-0.411 ***
	(0.190)	(0.141)	(0.125)	(0.124)
fipsCalifornia	-0.310	0.237	0.899 ***	0.816 ***
	(0.164)	(0.121)	(0.108)	(0.107)
fipsColorado	-0.792 ***	0.383 *	0.940 ***	0.917 ***
	(0.203)	(0.150)	(0.134)	(0.133)
fipsConnecticut	0.155	0.786 ***	1.573 ***	1.474 ***
	(0.200)	(0.148)	(0.132)	(0.130)
fipsDelaware	-0.564	0.001	0.944 ***	0.812 ***
	(0.309)	(0.228)	(0.203)	(0.201)
fipsFlorida	1.682 ***	0.723 ***	0.159	0.198
	(0.268)	(0.198)	(0.176)	(0.174)
fipsGeorgia	0.142	0.180	0.216	0.212
	(0.200)	(0.148)	(0.131)	(0.130)
fipsHawaii	2.051	-1.819	-1.379	-1.705
	(1.803)	(1.334)	(1.184)	(1.173)
fipsIdaho	-2.814 ***	-1.610 ***	-1.316 ***	-1.291 ***
	(0.212)	(0.157)	(0.140)	(0.138)
fipsIllinois	-0.486 **	0.534 ***	0.932 ***	0.927 ***
	(0.166)	(0.123)	(0.109)	(0.108)
fipsIndiana	-1.257 ***	-0.806 ***	-0.304 *	-0.364 **
	(0.180)	(0.133)	(0.118)	(0.117)
fipsIowa	-0.912 ***	0.037	0.518 ***	0.493 ***
	(0.183)	(0.136)	(0.121)	(0.119)
fipsKansas	-1.728 ***	-0.858 ***	-0.026	-0.118
	(0.187)	(0.139)	(0.124)	(0.122)
fipsKentucky	-0.786 ***	-0.688 ***	-1.070 ***	-0.996 ***
	(0.206)	(0.152)	(0.135)	(0.134)
fipsLouisiana	-0.116	0.376 *	0.401 **	0.428 **
	(0.229)	(0.169)	(0.150)	(0.149)
fipsMaine	-1.753 ***	0.493 ***	0.401 **	0.562 ***
	(0.196)	(0.146)	(0.129)	(0.128)
fipsMaryland	2.458 ***	0.947 **	0.959 ***	0.860 ***
	(0.398)	(0.294)	(0.261)	(0.259)
fipsMassachusetts	0.301	1.641 ***	1.451 ***	1.570 ***
	(0.179)	(0.133)	(0.118)	(0.117)
fipsMichigan	-2.046 ***	-1.066 ***	-0.523 ***	-0.557 ***
	(0.165)	(0.122)	(0.109)	(0.108)
fipsMinnesota	-2.312 ***	-1.439 ***	-0.488 ***	-0.601 ***
	(0.173)	(0.128)	(0.114)	(0.113)
fipsMississippi	-0.295	0.157	-0.045	0.020
	(0.215)	(0.159)	(0.141)	(0.140)
fipsMissouri	-1.346 ***	0.135	-0.105	0.032
	(0.172)	(0.128)	(0.113)	(0.112)
fipsMontana	-3.041 ***	-0.442 ***	-0.077	0.025
	(0.178)	(0.134)	(0.119)	(0.118)
fipsNebraska	-0.849 ***	1.146 ***	0.966 ***	1.126 ***
	(0.192)	(0.143)	(0.127)	(0.126)
fipsNevada	0.439	0.668 *	0.594 *	0.622 *
	(0.450)	(0.333)	(0.296)	(0.293)
fipsNew Hampshire	-0.707 ***	0.642 ***	0.874 ***	0.920 ***
	(0.210)	(0.156)	(0.138)	(0.137)
fipsNew Jersey	0.060	0.974 ***	1.625 ***	1.568 ***
	(0.169)	(0.125)	(0.111)	(0.110)
fipsNew Mexico	-2.724 ***	-1.540 ***	-0.328 *	-0.468 ***
	(0.214)	(0.158)	(0.141)	(0.140)
fipsNew York	0.258	0.506 ***	1.077 ***	0.992 ***
	(0.169)	(0.125)	(0.111)	(0.110)
fipsNorth Carolina	-1.223 ***	-0.445 **	0.406 **	0.305 *
	(0.189)	(0.140)	(0.124)	(0.123)
fipsNorth Dakota	-2.185 ***	-0.536 ***	0.166	0.147
	(0.206)	(0.153)	(0.137)	(0.135)
fipsOhio	-1.895 ***	-0.883 ***	-0.568 ***	-0.559 ***
	(0.165)	(0.122)	(0.109)	(0.107)
fipsOklahoma	-1.923 ***	-0.201	-0.093	-0.002
	(0.172)	(0.128)	(0.113)	(0.112)
fipsOregon	-1.392 ***	-0.439 **	-0.149	-0.139
	(0.200)	(0.148)	(0.132)	(0.131)
fipsPennsylvania	0.191	1.437 ***	1.152 ***	1.283 ***
	(0.168)	(0.125)	(0.111)	(0.110)
fipsRhode Island	-0.266	0.634 **	0.768 ***	0.802 ***
	(0.280)	(0.207)	(0.184)	(0.182)
fipsSouth Carolina	0.772 **	0.443 *	0.349 *	0.344 *
	(0.250)	(0.185)	(0.164)	(0.163)
fipsSouth Dakota	-1.439 ***	0.754 ***	0.125	0.378 **
	(0.213)	(0.158)	(0.140)	(0.139)
fipsTennessee	-0.202	-0.183	-0.530 ***	-0.467 ***
	(0.216)	(0.159)	(0.142)	(0.140)
fipsTexas	-0.937 ***	0.139	0.496 ***	0.502 ***
	(0.162)	(0.120)	(0.107)	(0.106)
fipsUtah	-2.912 ***	-1.957 ***	-1.116 ***	-1.204 ***
	(0.215)	(0.159)	(0.142)	(0.140)
fipsVermont	-4.350 ***	-3.297 ***	0.071	-0.460 **
	(0.217)	(0.160)	(0.145)	(0.147)
fipsVirginia	0.674 **	0.549 ***	0.569 ***	0.557 ***
	(0.219)	(0.162)	(0.144)	(0.142)
fipsWashington	-1.276 ***	-0.714 ***	-0.162	-0.224
	(0.185)	(0.137)	(0.122)	(0.121)
fipsWest Virginia	-0.224	-0.496 *	-0.953 ***	-0.889 ***
	(0.284)	(0.210)	(0.187)	(0.185)
fipsWisconsin	-0.540 **	0.476 ***	0.538 ***	0.592 ***
	(0.177)	(0.131)	(0.117)	(0.116)
fipsWyoming	-0.500	-0.132	0.227	0.187
	(0.302)	(0.223)	(0.198)	(0.196)
log(rev_state_total)		0.815 ***		0.197 ***
		(0.007)		(0.011)
log(rev_fed_total)			0.963 ***	0.792 ***
			(0.007)	(0.012)
N	15692	15692	15692	15692
R2	0.267	0.599	0.684	0.690
Note. Standard errors clustered at the state level. Data from the Common Core of Data, National Center for Education Statistics

Fortunately, this is easy to deal with. I don’t need to see all those coefficients for states. I can omit them in two ways: (1) using the omit_coefs argument and supplying the names of the coefficients.

fips_coefs <- names(coef(state_fits[[1]]))[str_detect(names(coef(state_fits[[1]])), "fips")]

huxreg(state_fits, 
       omit_coefs = fips_coefs,
       statistics = c(N = "nobs", R2 = "r.squared"),
       note = "Note. Standard errors clustered at the state level. Data from the Common Core of Data, National Center for Education Statistics")

	Model 1	Model 2	Model 3	Model 4
(Intercept)	15.749 ***	1.865 ***	0.306 *	-0.307 *
	(0.156)	(0.168)	(0.149)	(0.151)
white_per	0.007 ***	0.011 ***	0.023 ***	0.021 ***
	(0.001)	(0.000)	(0.000)	(0.000)
log(rev_state_total)		0.815 ***		0.197 ***
		(0.007)		(0.011)
log(rev_fed_total)			0.963 ***	0.792 ***
			(0.007)	(0.012)
N	15692	15692	15692	15692
R2	0.267	0.599	0.684	0.690
Note. Standard errors clustered at the state level. Data from the Common Core of Data, National Center for Education Statistics

Or, even easier in this case, I can just use the same coefs argument as before. huxreg will omit any coefficients not specifically named! I might also append my note to let people know I included the state controls.

huxreg(state_fits, 
       coefs = c("%White Students" = "white_per",
                 "State revenue (logged)" = "log(rev_state_total)",
                 "Federal revenue (logged)" = "log(rev_fed_total)"),
       statistics = c(N = "nobs", R2 = "r.squared"),
       note = "Note. Standard errors clustered at the state level. All models include a control for state. Data from the Common Core of Data, National Center for Education Statistics")

	Model 1	Model 2	Model 3	Model 4
%White Students	0.007 ***	0.011 ***	0.023 ***	0.021 ***
	(0.001)	(0.000)	(0.000)	(0.000)
State revenue (logged)		0.815 ***		0.197 ***
		(0.007)		(0.011)
Federal revenue (logged)			0.963 ***	0.792 ***
			(0.007)	(0.012)
N	15692	15692	15692	15692
R2	0.267	0.599	0.684	0.690
Note. Standard errors clustered at the state level. All models include a control for state. Data from the Common Core of Data, National Center for Education Statistics

Getting a bit more advanced

Ok, great so far. But huxreg is really just a shortcut function. We can use the entire power of the {huxtable} API to do all sorts of wacky things. Hey, let’s throw all these models together! But I want to make a clear indication for which model includes state as a control. To do this, I will use the function insert_row() and make a pipeline. To facilitate this, I need to know the dimensions of the table, so I can figure out where to insert the note.

tab <- huxreg(c(fits, state_fits),
         coefs = c("%White Students" = "white_per",
                 "State revenue (logged)" = "log(rev_state_total)",
                 "Federal revenue (logged)" = "log(rev_fed_total)"),
       statistics = c(N = "nobs", R2 = "r.squared"),
       note = "Note. Standard errors clustered at the state level. Data from the Common Core of Data, National Center for Education Statistics")

dim(tab)

## [1] 10  9

There are 10 rows. Here’s where things get a bit fiddly. I want to insert the note above the N row. It looks like it is in row 8. So I will insert the new row before line 8 and after line 7. The syntax for insert_row() is insert_row(c(cell_contents), after = row_number), where cell_contents is a vector of values to go in each column for that row. Here, I want the first column to have the value “State Control” and then an “Y” for the models with and “N” for the models without. So it will look like this: c("State Control", rep("N", 4), rep("Y", 4)) since only the last 4 models have the state controls.

tab %>% insert_row(c("State Control", rep("N", 4), rep("Y", 4)), after = 7)

	Model 1	Model 2	Model 3	Model 4	Model 1	Model 2	Model 3	Model 4
%White Students	0.006 ***	0.011 ***	0.023 ***	0.021 ***	0.007 ***	0.011 ***	0.023 ***	0.021 ***
	(0.001)	(0.000)	(0.000)	(0.000)	(0.001)	(0.000)	(0.000)	(0.000)
State revenue (logged)		0.823 ***		0.149 ***		0.815 ***		0.197 ***
		(0.008)		(0.011)		(0.007)		(0.011)
Federal revenue (logged)			0.987 ***	0.862 ***			0.963 ***	0.792 ***
			(0.007)	(0.012)			(0.007)	(0.012)
State Control	N	N	N	N	Y	Y	Y	Y
N	15692	15692	15692	15692	15692	15692	15692	15692
R2	0.008	0.423	0.566	0.571	0.267	0.599	0.684	0.690
Note. Standard errors clustered at the state level. Data from the Common Core of Data, National Center for Education Statistics

There’s an undesired border–I can fix this with set_bottom_border()! We can start to see the building blocks of the API.

tab %>% 
  insert_row(c("State control", rep("N", 4), rep("Y", 4)), after = 7) %>% 
  set_bottom_border(row = 8, col = 2:9, value = 0)

	Model 1	Model 2	Model 3	Model 4	Model 1	Model 2	Model 3	Model 4
%White Students	0.006 ***	0.011 ***	0.023 ***	0.021 ***	0.007 ***	0.011 ***	0.023 ***	0.021 ***
	(0.001)	(0.000)	(0.000)	(0.000)	(0.001)	(0.000)	(0.000)	(0.000)
State revenue (logged)		0.823 ***		0.149 ***		0.815 ***		0.197 ***
		(0.008)		(0.011)		(0.007)		(0.011)
Federal revenue (logged)			0.987 ***	0.862 ***			0.963 ***	0.792 ***
			(0.007)	(0.012)			(0.007)	(0.012)
State control	N	N	N	N	Y	Y	Y	Y
N	15692	15692	15692	15692	15692	15692	15692	15692
R2	0.008	0.423	0.566	0.571	0.267	0.599	0.684	0.690
Note. Standard errors clustered at the state level. Data from the Common Core of Data, National Center for Education Statistics

For set_bottom_border(), you specify the row number, the relevant column numbers, and then 0 to turn the border off. Looks great!

Now, you can also use the main hux() function and create a regression table from stratch if you need more customization, using the power of {broom}. I did this for my dissertation. For certain tables, I wanted more control and built the tables from the ground up using {huxtable}’s API. Here’s an example:

models <- list(m1, m2, m3, m4, m5, m6)

did_tidy <- map(models, tidy)

did_tidy <-
  did_tidy %>%
  map(~ mutate(.,
               estimate =
                 case_when(
                 p.value <= 0.001 ~ paste0(round(estimate, 3), " ***"),
                 dplyr::between(p.value, 0.001, 0.01) ~ paste0(round(estimate, 3), " **"),
                 dplyr::between(p.value, 0.01, 0.05) ~ paste0(round(estimate, 3), " *"),
                 TRUE ~ as.character(round(estimate, 3))
               ),
               std.error = paste0("(", round(std.error, 3), ")")
               )) %>%
  map(~ filter(.,
               str_detect(term, ":"))
      ) %>%
  map(~ select(.,
               term, estimate, std.error))

did_tidy <-
  did_tidy %>%
  map(., ~
        reshape2::melt(.x,
                       id.var = "term")
      )

did_tidy <-
  reduce(did_tidy,
         full_join,
         by = c("term",
                "variable")
         ) %>%
  select(-variable)

did_tidy[["term"]] <-
  c("Increase %Black or Latinx, 2013-2015 X post-2013",
    "")

tab <-
  did_tidy %>%
  hux() %>%
  insert_row("",
             "Suspension Rate", "", "",
             "Attendance Rate", "", "",
             after = 0) %>%
  set_colspan(1, c(2, 5), 3) %>%
  insert_row("",
             "All grades", "Grades 3-5", "Grades 6-8",
             "All grades", "Grades 3-5", "Grades 6-8",
             after = 1) %>%
  # 4 total rows, 5 total columns here
  insert_row("School Fixed Effects",
             "Y", "Y",
             "Y", "Y",
             "Y", "Y",
             after = 4) %>%
  insert_row("Year Fixed Effects",
             "Y", "Y",
             "Y", "Y",
             "Y", "Y",
             after = 5) %>%
  insert_row("Observations",
             map(models, `[`, "N") %>%
               unlist(),
             after = 6) %>%
  insert_row("R2",
             map(models, glance) %>%
               map(`[`, "r.squared") %>%
               unlist() %>%
               round(3),
             after = 7) %>%
  insert_row("Adj. R2",
             map(models, glance) %>%
               map(`[`, "adj.r.squared") %>%
               unlist() %>%
               round(3),
             after = 8) %>%
  insert_row("*** p < 0.001;  ** p < 0.01;  * p < 0.05. \nNote. Includes controls for the lagged non-proficiency rate in mathematics for all students and for white students, percent of teachers with less than 3 years of experience, percent of teachers with MAs or above, the percent of English Language Learners, the percent of students qualifying for free or reduced price lunch, and the natural log of total school enrollment. Heteroscedastic-robust standard errors clustered at the school level are in parentheses.",
             rep("", 6),
             after = 9) %>%
  insert_column(rep("", 10), after = 4) %>%
  set_colspan(10, 1, 8) %>%
  # 9 rows, 5 columns
  set_top_border(c(1, 3, 5), 1:8, 1) %>%
  set_top_border(2, c(2:4, 6:8), 1) %>%
  set_bottom_border(9, 1:8, 1) %>%
  set_all_padding(1:10, 1:8, 0) %>%
  set_align(1:10, 2:8, "center")

Putting it all together: `{huxtable}` + `{flextable}`

Now that we have a nicely formatted regression table, exporting to word is easy: Just convert it to a {flextable} object, which plays very nicely with word!

I wrote my dissertation in R markdown, knitting to word, and so I just converted the table using the as_flextable() function in {huxtable} and made a few minor tweaks. There are also the quick_rft() and quick_docx() functions in {huxtable}, which outputs a nicely formatted table to a word doc. So if you cutting and pasting, that’s one way to go.

tab %>% quick_docx()

I knit the whole dissertation in word, so converted to {flextable} and adjusted the width of each column as I wanted.

tab %>%
  as_flextable() %>%
  flextable::width(width = c(2.5, rep(1, 8)))

Share on

Twitter Facebook Google+ LinkedIn

Regression tables in R: An only slightly harmful approach

tl;dr

A more detailed regression table workflow

Getting a bit more advanced

Putting it all together: `{huxtable}` + `{flextable}`

Share on

Other things

Network range: An R function for network analysis

Using R and Python to Predict Housing Prices

An observation regarding robust standard errors in R and Stata

The Bike Lanes of Brooklyn

tl;dr

A more detailed regression table workflow

Getting a bit more advanced

Putting it all together: {huxtable} + {flextable}

Share on

Other things

Network range: An R function for network analysis

Using R and Python to Predict Housing Prices

An observation regarding robust standard errors in R and Stata

The Bike Lanes of Brooklyn

Putting it all together: `{huxtable}` + `{flextable}`