Tools#

This section highlights tools that support reproducible analysis and research. These include general software development tools and bespoke packages developed for government analysis. Tools developed or contributed to within government are marked (gov).

If you have developed a package for use in analysis, or recommend one that is not included here, please add it to the list. You can request a new tool by creating an issue on GitHub or contacting us by email. Alternatively, you can add it directly to the project by creating a pull request, using the “Suggest edit” link under the GitHub logo at the top of this page. Please include a link and a brief description with any request.

The tools on this page generally follow the good quality assurance practices described in this guidance. However, as with any software, they may still contain bugs or have limitations. Please apply your own judgement when using them. If you feel a tool should no longer be included in this list, please suggest an edit or get in touch.

Data manipulation and analysis#

Tools for manipulating and analysing data.

Python#

  • pandas - common data analysis and manipulation

  • Polars - high performance data manipulation

  • PySpark - data manipulation for distributed (large) data

  • Splink (gov) - probabilistic data linkage

R#

  • dplyr - common data analysis and manipulation

  • sparklyr - data manipulation for distributed (large) data

Publishing#

Testing#

Tools for implementing automated code testing.
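As a rough illustration of what automated code testing looks like, the sketch below uses Python's built-in unittest module. It is only one of several possible frameworks. The function `percentage_change` and its test cases are hypothetical, written for this example rather than taken from any listed tool.

```python
import unittest


def percentage_change(old: float, new: float) -> float:
    """Return the percentage change from old to new (hypothetical example function)."""
    if old == 0:
        raise ValueError("old must be non-zero")
    return (new - old) / old * 100


class TestPercentageChange(unittest.TestCase):
    def test_known_values(self):
        # Each case is (old, new, expected); subTest reports failures per case.
        cases = [(100, 110, 10.0), (200, 100, -50.0), (50, 50, 0.0)]
        for old, new, expected in cases:
            with self.subTest(old=old, new=new):
                self.assertAlmostEqual(percentage_change(old, new), expected)

    def test_zero_baseline_raises(self):
        # Invalid input should fail loudly rather than return a misleading value.
        with self.assertRaises(ValueError):
            percentage_change(0, 10)


def run_tests() -> unittest.TestResult:
    """Load and run the test case, returning the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPercentageChange)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` runs both tests and returns a result whose `wasSuccessful()` method reports whether they all passed. In a real project, tests like these would usually live in their own files and run automatically.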

Python#

R#

  • testthat - common testing framework

  • assertr - assertions for verifying assumptions about data in analysis pipelines

  • patrick - parameterised testing extension for testthat

  • covr - measuring test coverage

Dependency management#

  • venv (Python) - manage packages using virtual environments

  • pyenv (Python) - manage independent Python versions for different projects

  • renv (R) - virtual environments for managing packages

  • conda - manage environments, language versions and packages for Python, R and other languages
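To make the idea of a virtual environment more concrete, the sketch below creates one programmatically using the venv module from the Python standard library. The helper name `create_project_env` and the `.venv` directory name are assumptions made for this example. In practice, environments are more often created from the command line.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir: Path) -> Path:
    """Create a virtual environment in a hypothetical '.venv' folder inside project_dir."""
    env_dir = project_dir / ".venv"
    # with_pip=False keeps this sketch quick and avoids bootstrapping pip.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_project_env(Path(tmp))
    # Every environment created by venv contains a pyvenv.cfg file.
    config_exists = (env_dir / "pyvenv.cfg").exists()
```

The environment keeps its own set of installed packages, so different projects can depend on different package versions without interfering with each other.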

Version control#

  • Git - common open source version control system

  • pre-commit - trigger checks (e.g. linters and formatters) before Git commits are created

Project templates#

Code linters#

Tools for analysing code for stylistic errors and, in some cases, bugs.

Python#

  • pylint - check coding style and identify some logical errors

  • flake8 - check code style

  • Bandit - check for common security issues

  • mypy - check static types

  • Radon - check code complexity
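As a small illustration of the kind of issue a static type checker can catch, the sketch below annotates a hypothetical function, `mean_age`, with type hints. The function is an assumption made for this example. The commented-out call is the sort of mistake a checker such as mypy would be expected to report before the code is ever run.

```python
from statistics import mean


def mean_age(ages: list[int]) -> float:
    """Return the mean of a list of ages (hypothetical example function)."""
    return mean(ages)


# A correct call: the argument matches the annotated type.
average = mean_age([30, 40, 50])

# An incorrect call: a string is not a list of integers. A static type
# checker should flag this line without needing to execute it.
# mean_age("42")
```

Type hints are not enforced when the code runs, so a separate checking step is what surfaces this kind of error.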

R#

  • lintr - check code style

Code formatters#

Automated code formatters. These check code style, like linters, but also actively make changes to your code to meet a particular style.
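The toy sketch below shows the principle of rewriting code into a consistent layout without changing what it does. It uses the standard library ast module as a stand-in and is not a real formatter: among other limitations, it discards comments. The variable names are chosen for this example.

```python
import ast

# Inconsistently spaced source code, with two statements on one line.
messy = "x=1;y = { 'a':1 ,'b' : 2 }"

# Parse the code into a syntax tree, then write it back out in a
# normalised layout (requires Python 3.9 or later).
tidy = ast.unparse(ast.parse(messy))

# The layout differs, but both versions parse to the same syntax tree,
# so the behaviour of the code is unchanged.
same_meaning = ast.dump(ast.parse(messy)) == ast.dump(ast.parse(tidy))
```

A dedicated formatter applies the same idea with a full style guide, while preserving comments and handling line lengths.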

Python#

R#

Packaging code#

Tools for creating and releasing code as a package.

Python#

R#

  • goodpractice - gives advice on the quality of your R packages

  • fusen - builds R packages from R Markdown file specifications

Pipeline orchestration#

Continuous integration platforms#