Codebook generator

The codebook generated here will be stored for 24 hours. Unless you share the link, others cannot easily discover it. The data you upload is not stored, but if you do not want to upload the data, you can also install the codebook R package on your computer using install.packages("codebook"). This will also make it easier to document multiple data files in the same document, should you want to.

The following file formats are supported, among others: .sav (SPSS), .dta (Stata), .rds (R), .rdata (R), .por, .xpt, .csv, .tsv, .csv2. All are read using rio, which means you can also upload zipped files; see the rio documentation for more information.
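If you work locally, the import step can be sketched roughly as follows. This is a minimal illustration that assumes the rio package is installed; the example data frame and the temporary file are made up for demonstration purposes.

```r
library(rio)

# Hypothetical example data; any data frame works here.
survey <- data.frame(id = 1:3, age = c(23, 35, 41))

# Write the data to a temporary .csv file, then read it back.
# rio chooses the reader based on the file extension.
path <- tempfile(fileext = ".csv")
export(survey, path)
codebook_data <- import(path)

str(codebook_data)
```

The same `import()` call should work for the other listed formats as long as the file extension matches the format.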

The codebook package uses variable and value labels, as well as labelled missing values to make sense of the data. You can upload files without such metadata (e.g., .csv), but the resulting codebook will be less useful. You'll get the most mileage out of this package by using data collected with formr.org and imported using the formr R package.
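To illustrate what such metadata looks like, here is a small sketch that assumes the haven package, whose labelled classes are what you typically get when reading SPSS or Stata files. The item text, the value labels, and the code 99 for "No answer" are invented for this example.

```r
library(haven)

# Hypothetical Likert item; 99 is a made-up code for "no answer".
responses <- c(1, 3, 2, 99, 3)

item <- labelled_spss(
  responses,
  labels = c(Disagree = 1, Neutral = 2, Agree = 3, "No answer" = 99),
  na_values = 99,
  label = "I enjoy documenting my data"
)

attr(item, "label")   # the variable label
attr(item, "labels")  # the value labels
is.na(item)           # the labelled missing value is flagged as missing
```

A plain .csv file carries none of these attributes, which is why a codebook built from it has less to show.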

If you prefer a PDF over HTML (but remember, PDFs are much less readable for machines and hard to read on mobile devices), just remove the html_document block below.

The webapp sets reasonable defaults, and you can edit the text and the R code to improve the resulting codebook. However, the webapp does not store edits, is not as interactive as working in R, and requires you to upload the dataset to a server, which is not permissible for certain restricted-use datasets. Moreover, very large datasets may trigger an error message, because the server limits the resources you can use. If you want to document large, private, or many datasets, or if you first need to add the metadata, it is easier to install the codebook package locally.
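A local session might start roughly like this. The sketch assumes the codebook package is already installed and uses the `bfi` example dataset that ships with it; the dataset name assigned below is hypothetical.

```r
# install.packages("codebook")  # one-time setup
library(codebook)

# Example data shipped with the package; replace with your own import.
codebook_data <- codebook::bfi

# Optional: give the dataset a descriptive name (hypothetical here).
metadata(codebook_data)$name <- "Example personality survey"

# Inside an R Markdown chunk, the codebook itself is then produced with:
# codebook(codebook_data)
```

Because everything runs on your own machine, no data leaves your computer and your edits persist in the .Rmd file.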

More documentation on the R package.

Report bugs on GitHub.

---
title: "Codebook"
output:
  html_document:
    toc: true
    toc_depth: 4
    toc_float: true
    code_folding: 'hide'
    self_contained: true
  pdf_document:
    toc: yes
    toc_depth: 4
    latex_engine: xelatex
---

```{r setup}
knitr::opts_chunk$set(
  warning = TRUE, # show warnings during codebook generation
  message = TRUE, # show messages during codebook generation
  error = TRUE,   # do not interrupt codebook generation in case of errors,
                  # usually better for debugging
  echo = TRUE     # show R code
)
ggplot2::theme_set(ggplot2::theme_bw())
pander::panderOptions("table.split.table", Inf)
```

We collected the following data.

```{r codebook}
# omit the following lines, if your missing values are already properly labelled
codebook_data <- detect_missing(codebook_data,
    only_labelled = TRUE, # only labelled values are autodetected as
                          # missing
    negative_values_are_missing = FALSE, # if TRUE, negative values are
                                         # treated as missing values
    ninety_nine_problems = TRUE # 99/999 are missing values, if they
                                # are more than 5 MAD from the median
    )

# If you are not using formr, the codebook package needs to guess which items
# form a scale. The following line finds item aggregates with names like this:
# scale = scale_1 + scale_2R + scale_3R
# identifying these aggregates allows the codebook function to
# automatically compute reliabilities.
# However, it will not reverse items automatically.
codebook_data <- detect_scales(codebook_data)

# Does your dataset have a name that is not reflected in the file name?
# Uncomment the line below and change the name
# metadata(codebook_data)$name <- "My Awesome Dataset"

codebook(codebook_data)
```