Ways of working
Motivation
These ways of working make RAP easier to implement and more productive. While coding standards are important, it’s just as important to manage your analysis code in the right way.
6. Code modules should run end to end without manual intervention
You should not have to make manual changes to the code or modify parameters during execution. Pipeline scripts should run from end to end. This doesn’t mean automating everything! Rather, think about how best to break the workflow down into sensible steps. If you have manual QA checks to perform, build the supporting checks into the run and save run logs and QA outputs, so that you can review them after the pipeline has finished.
6.1 Minimise manual steps
Automation isn’t just about efficiency. Each manual step introduces a risk of human error, however easy the step is to carry out, so automating it reduces that risk. Automate analysis steps that computers can do more reliably than us, such as performing basic checks, generating standard quality reports and saving outputs.
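As a rough illustration of automating basic checks and saving their results for later review, the sketch below summarises a small data set and writes the summary to a file. It is a minimal example under stated assumptions, not a prescribed approach: the function names `run_basic_checks` and `save_report`, the choice of checks and the JSON report format are all hypothetical.

```python
import json
from pathlib import Path


def run_basic_checks(rows, required_fields):
    """Return a dictionary summarising simple quality checks on the rows.

    `rows` is assumed to be a list of dictionaries, one per record.
    """
    # Count required fields that are absent or empty across all rows.
    missing = sum(
        1
        for row in rows
        for field in required_fields
        if row.get(field) in (None, "")
    )
    # Treat two rows as duplicates when every required field matches.
    keys = [tuple(row.get(field) for field in required_fields) for row in rows]
    return {
        "row_count": len(rows),
        "missing_values": missing,
        "duplicate_rows": len(keys) - len(set(keys)),
    }


def save_report(report, path):
    """Write the report as JSON so it can be reviewed after the run."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2))
    return path
```

Calling `save_report(run_basic_checks(rows, ["area", "value"]), "qa/report.json")` at the end of a run would leave a QA output on disk without anyone having to carry out the checks by hand. The field names and file path here are placeholders.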
6.2 Use configuration files
Analysts often write pipelines that need frequent adjustment, sometimes even while running the pipeline.
For example, we often see scripts where variables need updating or code must be commented in and out (effectively adding or removing code chunks during execution). Sometimes comments are even used to record when changes were made.
Working in this way is bad for reproducibility because:
- It is very hard to see what the code looked like for each pipeline run.
- It makes the process hard to document, as the analyst needs to know which parts of it to change and when.
- It makes it hard to version control, because it is not clear who made which changes, when and why.
Use configuration to handle these changes outside of the pipeline itself.
Configuration files store settings for your pipeline. For example, if you know you might need to manually change the input file name, it makes sense for it to be loaded in from the configuration file, so you don’t have to change it every time it appears in your script.
The file also acts as documentation, because it allows others to see which settings were used to run the code, without needing to read the entire code base. Other configuration settings might be things like dates, data and output folder locations, geographical areas, and model parameters.
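A minimal sketch of this idea, assuming the settings are kept in a JSON file, is shown below. The function name `load_config` and the setting names `input_file`, `output_dir` and `reference_date` are hypothetical, chosen only for illustration. Other formats such as YAML or TOML would work in a similar way.

```python
import json
from pathlib import Path

# Hypothetical contents of a config.json file stored alongside the pipeline:
# {
#     "input_file": "data/survey_2024.csv",
#     "output_dir": "outputs",
#     "reference_date": "2024-01-01"
# }

# Settings this illustrative pipeline expects to find (assumed names).
REQUIRED_SETTINGS = ["input_file", "output_dir", "reference_date"]


def load_config(path):
    """Read pipeline settings from a JSON file and check the required keys exist."""
    config = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_SETTINGS if key not in config]
    if missing:
        # Fail early with a clear message rather than part way through a run.
        raise KeyError(f"Missing configuration settings: {missing}")
    return config
```

The pipeline would then read `config["input_file"]` wherever the file name is needed, so changing the input means editing one line of the configuration file rather than the code.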
Configuration files are especially useful for re-usable pipelines, like regular publications. However, they are also useful for one-off pieces of work.
Using a configuration file from the start makes code development easier if your code needs a lot of set up. For example, you might need to frequently change parameters when developing a model, even if the resulting model will be a single-use piece of work.
7. Use git for version control
Version control is about managing changes to your analysis over time. Using git for version control means you can see what previous versions of the code looked like and even revert to previous iterations of your pipeline, which is very difficult to do manually.
If used correctly, git will help you keep an audit trail of who made which contributions, when and why. This also improves efficiency by helping you to collaborate and track changes more effectively.
Avoid archiving code in your repository or commenting out blocks of code that you no longer use. Using git means you can delete code that you don’t need anymore because it is preserved in the version history. Your repository should not need to contain archived folders or commented out code.
8. Make sure your code meets quality standards
Ultimately, the aim of RAP is to produce high quality statistical outputs. When thinking about the quality of your code you should consider how the ways you work help you to achieve this. This means making sure code quality is proportionate to how complex and risky your pipeline is. Below are the essential quality assurance practices for analysis code. Remember that complex, high risk projects may require extra forms of quality assurance.
8.1 Make sure your analysis meets quality standards
Analysis carried out using code should meet the same quality standards as any other analysis. That means using the principles and practices set out in the [AQuA Book](https://www.gov.uk/government/publications/the-aqua-book-guidance-on-producing-quality-analysis-for-government) and the ONS quality standard for analysis.
8.2 Review all code as you go
Aim to review your entire code base at least once. Do this as you develop the code rather than at the end. If possible, code should be reviewed during development by someone who didn’t write it. Use code reviews to check that your code is readable, reproducible and well documented, as well as whether it does what it’s supposed to. The code review ONS Wiki template provides a set of criteria for code review.
8.3 Test your code
Test your code to make sure it does what you expected it to. Just because code runs without errors and produces outputs that look plausible does not guarantee that it works as intended!
There are many ways to test code. Usually, testing is a lot easier when your code is modular or packaged. Make sure your tests are proportionate to risks and complexity. Ideally, tests should be formalised, whether they are manual or automated.
Automating tests makes it much easier to document exactly how you tested your code and makes your tests easier to re-run.
Ways to test code include:
- Unit tests: automated testing of each function against expected outcomes
- Integration tests: tests of how different functions work together
- Parallel runs: running a new pipeline and comparing results with those of the existing pipeline
- Running the code on different systems to ensure it produces a reliable outcome
Tests are not a tick box exercise! Each test should serve a clear purpose. Good tests will help you spot issues with the code before errors happen and speed up code production.
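To make the first two kinds of test listed above more concrete, here is a small sketch. The functions `percentage_change` and `summarise_changes` are hypothetical stand-ins for real analysis code. The test functions use plain `assert` statements and the `test_` naming convention, which a test runner such as pytest can discover, although using pytest is an assumption rather than a requirement.

```python
def percentage_change(old, new):
    """Return the percentage change from old to new."""
    if old == 0:
        raise ValueError("Cannot calculate percentage change from zero")
    return (new - old) / old * 100


def summarise_changes(pairs):
    """Apply percentage_change to each (old, new) pair, rounded to one decimal place."""
    return [round(percentage_change(old, new), 1) for old, new in pairs]


def test_percentage_change_increase():
    # Unit test: one function checked against one known answer.
    assert percentage_change(50, 75) == 50.0


def test_percentage_change_rejects_zero_baseline():
    # Unit test: the failure case is part of the expected behaviour.
    try:
        percentage_change(0, 10)
    except ValueError:
        return
    raise AssertionError("Expected a ValueError for a zero baseline")


def test_summarise_changes():
    # Integration-style test: checks that the two functions work together.
    assert summarise_changes([(100, 110), (200, 150)]) == [10.0, -25.0]
```

Each test has a clear purpose: a typical case, a known failure case, and the interaction between two functions.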
9. Maintain and improve re-usable code continuously
Code that is intended for re-use is never truly “finished”. Expect to maintain the code for as long as it needs to be used. Things like changes to your input data or the systems you use might require unexpected code changes.
9.1 Make code improvements before errors occur
Code is much harder to fix once an error has already occurred, because you have to find the source of the error and fix the issue under pressure.
Instead, focus on reducing risks in the code before they cause errors. This might mean making the code easier to change in future, doing more quality assurance and testing, or coding solutions for issues that are likely to happen, even if they haven’t yet. Aim to improve your code while it is functioning correctly to avoid future mistakes.
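One example of coding a solution for an issue that is likely to happen is to check the structure of the input data before the analysis starts. The sketch below assumes the data arrives as a list of dictionaries. The function name `validate_columns` and the column names in the usage note are hypothetical.

```python
def validate_columns(rows, expected_columns):
    """Raise a clear error if any row lacks an expected column.

    Returns the rows unchanged so the check can sit in the middle of a pipeline.
    """
    expected = set(expected_columns)
    for index, row in enumerate(rows):
        missing = expected - set(row)
        if missing:
            # Name the row and the columns so the problem is quick to trace.
            raise ValueError(
                f"Row {index} is missing expected columns: {sorted(missing)}"
            )
    return rows
```

If a data supplier renamed or dropped a column, a call such as `validate_columns(rows, ["area", "value"])` would stop the run straight away with a message naming the problem, rather than letting it surface later as a confusing failure or a wrong result.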
9.2 Develop your code iteratively
Don’t expect to develop everything to the highest standard right away. If your team are still learning to implement RAP you might aim for a basic set of code at first and improve it over time. Early code might not fully meet user needs. You may want to prioritise the most important functionality first and add lower priority functionality later.
Lower quality code usually needs more work in future compared with code that is well written, modular, documented and tested from the start.
10. Don’t reinvent the wheel
Open source code is a great tool when it comes to building pipelines. Solutions already exist for most common analysis problems, so you don’t need to write everything from scratch.
When tackling a new problem, check whether a package already exists that can help you. We recommend using packages that are well established and actively maintained, which you can check by visiting their GitHub repositories or equivalents.
There are many resources online that can help you if such a package isn’t available. Forums like Stack Overflow carry solutions to common errors or bugs and users often offer code examples.