This is a living document. We want our users to get as much value as possible from our analysis. Because of that, we will be updating these pages with additional outputs and insights when possible. To make sure you always access the latest insights, we recommend linking to these pages rather than sharing this analysis any other way at this stage.

Coding frequency and tools

Most respondents use code regularly or all the time in their current role

Over half of respondents reported using code to achieve their work objectives regularly or all the time. This survey is likely to have attracted many analysts with an interest in programming, so these figures may overstate how often analysts across government use code. Nevertheless, they show that many analysts regularly use code to carry out their work.

Coding frequency Count
Never 109
Rarely 119
Sometimes 168
Regularly 282
All the time 234
Sample size = 912
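The “over half” figure can be checked directly from the table above. This is a minimal sketch in Python: the dictionary simply restates the published counts, and `share_of` is a hypothetical helper name rather than part of any survey tooling.

```python
# Counts restated from the coding frequency table (sample size = 912)
coding_frequency = {
    "Never": 109,
    "Rarely": 119,
    "Sometimes": 168,
    "Regularly": 282,
    "All the time": 234,
}


def share_of(counts: dict[str, int], categories: list[str]) -> float:
    """Return the proportion of all responses that fall in the given categories."""
    total = sum(counts.values())
    return sum(counts[c] for c in categories) / total


# 282 + 234 = 516 of 912 respondents
share = share_of(coding_frequency, ["Regularly", "All the time"])
print(f"{share:.1%}")  # prints 56.6%
```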


Respondents use code to perform a variety of data operations

For each of the data operations included in the survey, most respondents who carry out that operation use code for at least some of it. Many respondents also reported other data operations in the free text box provided. Government analysts are capable of carrying out a wide variety of tasks using code, including every part of the analytical pipeline.

Data operation I do some or all of this by coding I do this without coding
Analysis 701 164
Data cleaning 653 102
Data linking 553 68
Data transfer 310 128
Data visualisation 565 234
Machine learning 169 15
Modelling 357 106
Sample size = 912


Access to and knowledge of programming languages

Given a list of programming tools, we asked respondents to answer “Yes”, “No” or “Don’t know” to each of the following statements:

  • This tool is available to use for my work.
  • I know how to program with this tool to a level suitable for my work.

Please note we did not set precise definitions for availability and knowledge here, as these are dependent on individual circumstances.


Open source tools are the most widely available for analysts’ work

R, SQL and Python were the three programming languages most widely available to government analysts. All three are open source languages, and the use of open source tools is a key component of the RAP methodology. Because these tools are free to use, cost is not a barrier to learning them, and they help analysts make their work more transparent and accessible to users.

Programming language Yes Don't Know No
C++ / C# 62 488 362
Java / Scala 86 487 339
Javascript / Typescript 136 462 314
Matlab 36 457 419
Python 603 164 145
R 808 52 52
SAS 412 289 211
SPSS 283 350 279
SQL 618 206 88
Stata 200 431 281
VBA 529 272 111
Sample size = 912


Open source tools are leading the way in capability

More respondents said they were able to carry out their work using Python, SQL and R than any other programming language. R in particular is consistently popular, with the highest or second highest level of capability in every analytical profession except Digital and data (DDAT), where it ranks third behind SQL and Python. In contrast, capability in proprietary tools is much more profession-specific. Popular open source languages therefore present the best opportunity for collaboration, both across professions and between departments.

Programming language Yes Don't Know No
C++ / C# 79 40 793
Java / Scala 51 37 824
Javascript / Typescript 90 37 785
Matlab 178 46 688
Python 379 34 499
R 623 25 264
SAS 305 28 579
SPSS 305 37 570
SQL 560 20 332
Stata 142 40 730
VBA 281 38 593
Sample size = 912
Programming language Profession Know how to use (percent)
R Actuaries 33.3
R Digital and data (DDAT) 55.6
R Data scientists 82.1
R Economists (GES) 37.9
R Operational researchers (GORS) 86.4
R Social researchers (GSR) 46.5
R Statisticians (GSG) 74.0
SQL Actuaries 25.0
SQL Digital and data (DDAT) 88.9
SQL Data scientists 85.1
SQL Economists (GES) 31.0
SQL Operational researchers (GORS) 79.5
SQL Social researchers (GSR) 18.6
SQL Statisticians (GSG) 64.4
Python Actuaries 16.7
Python Digital and data (DDAT) 61.1
Python Data scientists 74.6
Python Economists (GES) 13.8
Python Operational researchers (GORS) 65.9
Python Social researchers (GSR) 23.3
Python Statisticians (GSG) 46.2
SAS Actuaries 25.0
SAS Digital and data (DDAT) 38.9
SAS Data scientists 29.9
SAS Economists (GES) 10.3
SAS Operational researchers (GORS) 38.6
SAS Social researchers (GSR) 14.0
SAS Statisticians (GSG) 34.6
SPSS Actuaries 8.3
SPSS Digital and data (DDAT) 11.1
SPSS Data scientists 16.4
SPSS Economists (GES) 10.3
SPSS Operational researchers (GORS) 9.1
SPSS Social researchers (GSR) 72.1
SPSS Statisticians (GSG) 42.3
VBA Actuaries 41.7
VBA Digital and data (DDAT) 33.3
VBA Data scientists 34.3
VBA Economists (GES) 17.2
VBA Operational researchers (GORS) 40.9
VBA Social researchers (GSR) 4.7
VBA Statisticians (GSG) 26.9
Matlab Actuaries 8.3
Matlab Digital and data (DDAT) 16.7
Matlab Data scientists 25.4
Matlab Economists (GES) 17.2
Matlab Operational researchers (GORS) 54.5
Matlab Social researchers (GSR) 11.6
Matlab Statisticians (GSG) 23.1
Stata Actuaries 0.0
Stata Digital and data (DDAT) 11.1
Stata Data scientists 11.9
Stata Economists (GES) 51.7
Stata Operational researchers (GORS) 2.3
Stata Social researchers (GSR) 23.3
Stata Statisticians (GSG) 18.3


Many respondents have capability in tools they cannot access in work

Using responses to the two previous questions, we calculated, for each programming language, the number of respondents who have access but no knowledge (access only), both access and knowledge, and knowledge but no access (knowledge only).
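The derivation described above can be sketched as a simple classification of each respondent’s pair of answers. This is a minimal sketch in Python, assuming per-respondent answers are available as pairs; we assume only a “Yes” counts as access or knowledge (so “Don’t know” is treated like “No”). That assumption is consistent with the published tables: for R, for example, 211 + 597 = 808 respondents report access and 597 + 26 = 623 report knowledge. The example answers and the function name are hypothetical.

```python
from collections import Counter
from typing import Optional

# Hypothetical answers for one language: each pair is
# (answer to "available for my work", answer to "I know how to program with it").
answers = [
    ("Yes", "Yes"),
    ("Yes", "No"),
    ("No", "Yes"),
    ("Don't know", "Yes"),
    ("Yes", "Don't know"),
    ("No", "No"),
]


def access_knowledge_category(access: str, knowledge: str) -> Optional[str]:
    """Classify one respondent; only an explicit "Yes" counts (assumption)."""
    has_access = access == "Yes"
    has_knowledge = knowledge == "Yes"
    if has_access and has_knowledge:
        return "Access and knowledge"
    if has_access:
        return "Access only"
    if has_knowledge:
        return "Knowledge only"
    return None  # neither access nor knowledge: not counted in the table


counts = Counter(
    category
    for access, knowledge in answers
    if (category := access_knowledge_category(access, knowledge)) is not None
)
print(counts)
```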

A large minority of respondents have capability in programming languages they do not have access to, particularly proprietary software. A further advantage of open source software is that coding skills learned in an open source language are more transferable, as analysts are more likely to have access to the same software if they move to another department.

Programming language Access only Access and knowledge Knowledge only
C++ / C# 45 17 62
Java / Scala 73 13 38
Javascript / Typescript 75 61 29
Matlab 25 11 167
Python 295 308 71
R 211 597 26
SAS 175 237 68
SPSS 136 147 158
SQL 144 474 86
Stata 129 71 71
VBA 273 256 25
Sample size = 912


Coding capability


Most respondents learned to code outside of the public sector

Analysts generally first learned to code before joining the public sector, most commonly in education. However, many reported first learning to code in their current role or in previous public sector employment. This shows that not only are analysts with previous coding experience improving, but those without coding skills are also being upskilled.

First coding experience Count
In current role 138
In education 411
In private sector employment 44
In public sector employment 116
Self-taught 139
Other 14
Sample size = 865


Most analysts said their coding ability has improved during their current role

Responses show that many analysts in government are either maintaining or actively improving their coding capability. However, this is not true across the board: around one in five (150 of 749) reported that their coding ability had decreased in their current role.

This question was only asked of respondents with coding experience outside of their current role. This means the data does not cover those who first learned to code in their current role.

Ability Change Count
Significantly worse 51
Slightly worse 99
No change 136
Slightly better 226
Significantly better 237
Sample size = 749


How often people code in their work is correlated with changes to their coding abilities

Across government, coding capability appears to be increasing, with many analysts picking up and improving these skills while working in government. How often people write code at work is positively correlated with the change in their coding ability (r(16) = .59, p < .01). Conversely, those who do not code in their work are much more likely to report that their coding abilities are getting worse. Although this finding is not surprising, it highlights that analysts’ coding abilities are not static and need to be maintained.
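For readers less familiar with the statistic, the sketch below shows how a Pearson correlation coefficient can be computed between two ordinal survey answers coded as integers. The report does not describe exactly how the variables behind r(16) = .59 were coded, so this is an illustration under that assumption, not a reproduction of the result; the data and names below are hypothetical. A rank-based measure such as Spearman’s correlation is a common alternative for ordinal scales.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ordinal codes for eight respondents:
# coding frequency: 0 = Never ... 4 = All the time
# ability change: 0 = Significantly worse ... 4 = Significantly better
frequency = [0, 1, 2, 3, 4, 4, 2, 1]
ability_change = [1, 1, 2, 4, 4, 3, 3, 0]

r = pearson_r(frequency, ability_change)
df = len(frequency) - 2  # degrees of freedom, as reported in r(df)
print(f"r({df}) = {r:.2f}")
```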

Proportion of respondents in each coding frequency group
Coding frequency Significantly worse Slightly worse No change Slightly better Significantly better
Never 0.3 0.2 0.4 0.1 0.0
Rarely 0.2 0.4 0.3 0.1 0.0
Sometimes 0.0 0.2 0.3 0.3 0.2
Regularly 0.0 0.1 0.1 0.3 0.4
All the time 0.0 0.0 0.1 0.4 0.5
Sample size = 912


Coding practices

We asked respondents who said they use code in their work how often they carry out various practices considered to be good coding practice. For more information on the practices presented below, please read our guidance on quality assurance of code for analysis and research.


Many good coding practices are not regularly applied by analysts

The practices listed below are intended to improve the reproducibility, assurance and auditability of code. Most of these practices are used at least some of the time by a majority of respondents, but often not regularly. One possible explanation is that many analysts have the skills to use these practices but either do not have time or do not feel the need to do so.

Percent
Question I don't understand this question Never Rarely Sometimes Regularly All the time
I use open source software when programming 7.1 14.4 9.0 13.8 22.2 33.5
My team open sources its code 12.2 42.8 16.3 15.1 8.3 5.2
I use a source code version control system e.g. Git 5.9 30.6 13.9 13.1 14.4 22.0
Code my team writes is reviewed by a colleague 1.1 4.5 9.1 23.0 33.1 29.1
I write repetitive elements in my code as functions 3.6 8.7 9.3 23.5 30.6 24.2
I unit test my code 26.8 15.4 14.4 20.4 13.8 9.1
I collect my code and supporting material into packages 11.0 43.5 15.4 17.3 8.6 4.2
I follow a standard directory structure when programming 20.9 12.8 10.6 21.0 22.4 12.2
I follow coding guidelines or style guides when programming 5.1 9.6 9.0 26.2 34.2 15.9
I write code to automatically quality assure data 4.2 21.4 17.8 31.8 16.6 8.2
My team applies the principles set out in the Aqua book when carrying out analysis as code 37.1 13.6 8.1 16.6 18.3 6.4
Sample size = 803


Most types of documentation are not regularly used

Similarly, most respondents do not write most forms of code documentation regularly, with the exception of code comments.

Embedded documentation is one of the components that make up the RAP minimum viable product. Documentation helps others understand what the code is doing.

Percent
Question I don't understand this question Never Rarely Sometimes Regularly All the time
Code comments 3.4 2.1 1.7 8.2 28.1 56.4
Documentation for each function or class 11.2 17.6 12.6 23.5 23.8 11.3
README files 7.2 20.2 15.7 23.8 20.5 12.6
Desk notes 22.4 18.8 10.3 21.5 18.7 8.2
Analytical Quality Assurance (AQA) logs 23.2 24.9 14.9 17.9 14.1 5.0
Data or assumptions registers 19.3 34.5 10.6 12.3 15.9 7.3
Flow charts 7.1 30.3 22.0 26.2 11.1 3.4
Sample size = 803


A minority are using dependency management or reproducible workflow packages

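More advanced coding practices are less common still, with only a minority carrying them out at all: around 21% of respondents use dependency management software and around 4% use reproducible workflow packages.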

Use dependency management software Count
Yes 171
No 328
I don't know what dependency management is 304
Sample size = 803


Use reproducible workflow packages Count
Yes 31
No 480
I don't know what reproducible workflows are 292
Sample size = 803


RAP knowledge and opinions

We asked respondents about their knowledge of and opinions on reproducible analytical pipelines (RAP). RAP refers to the use of practices from software engineering to make analysis more reproducible. These practices build on the advantages of writing analysis as code by improving quality, trust, efficiency, business continuity and knowledge management. The RAP champions are a network of analysts across government who promote and support RAP development in their departments.


Most respondents have heard of RAP, but many do not know who their RAP champions are

65% of respondents have heard of RAP, which means a large minority have not heard of RAP and its principles. Of those who have heard of RAP, most do not know who their department’s RAP champion is. There is clearly more to be done to make sure people are aware of RAP and know where they can find support.

RAP champion knowledge Count
Have not heard of RAP 261
Heard of RAP, have not heard of RAP champions 133
Heard of RAP, does not know department champion 253
Heard of RAP champions, no champion in department 20
Knows department RAP champion 189
Sample size = 912


Respondents are positive about RAP knowledge and implementation

We asked respondents who had heard of RAP whether they agreed with a series of statements.

Percent
Question Strongly disagree Disagree Neutral Agree Strongly agree
I feel confident implementing RAP in my work 10.0 17.8 22.6 32.7 16.9
I feel supported to implement RAP in my work 7.2 17.8 25.0 31.2 18.7
I know where to find resources to help me implement RAP 7.8 18.7 18.4 38.1 16.9
I understand what the key components of the RAP methodology are 7.2 15.5 20.1 39.8 17.4
I think it is important to implement RAP in my work 3.2 4.5 19.0 36.6 36.7
I and/or my team are currently implementing RAP 12.3 18.9 20.3 27.3 21.2
I or my team are planning on implementing RAP in the next 12 months 9.8 14.7 25.7 28.0 21.8
Sample size = 651


RAP scores

RAP scores are the sum of a respondent’s scores across the RAP components. Respondents score one for a RAP component where they answered “regularly” or “all the time” to the relevant questions. For documentation, respondents must have given one of these answers for both code comments and README files. For the continuous integration and dependency management components, we only collected “yes”, “no” or “I don’t understand the question” responses, so “yes” responses score one. The sum of each respondent’s component scores is presented here as their “RAP score”. “Basic components” are the components that make up the RAP minimum viable product (MVP). “Advanced components” are RAP components that go beyond what the RAP champions saw as the minimum standard.
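The scoring rules above can be expressed as a short function. This is a minimal sketch, assuming each respondent’s answers are held in a dictionary; the question keys (`code_comments`, `readme_files` and so on) and the function names are hypothetical, and the mapping of components to questions is our reading of the tables in this section rather than the survey team’s own code.

```python
# Frequency answers that earn a point for a component.
REGULAR = {"Regularly", "All the time"}

# Hypothetical question keys. Each component maps to the question(s) that must
# ALL be answered "Regularly" or "All the time" for the component to score 1.
BASIC_COMPONENTS = {
    "Aqua book guidance": ["aqua_book"],
    "Documentation": ["code_comments", "readme_files"],  # both are required
    "Peer review": ["peer_review"],
    "Team open source code": ["team_open_sources_code"],
    "Use open source software": ["use_open_source_software"],
    "Version control": ["version_control"],
}
ADVANCED_FREQUENCY_COMPONENTS = {
    "Code packages": ["code_packages"],
    "Follow code style guidelines": ["style_guides"],
    "Function documentation": ["function_documentation"],
    "Functions": ["functions"],
    "Unit testing": ["unit_testing"],
}
# These two were asked as yes/no questions, so only "Yes" scores 1.
ADVANCED_YES_NO_COMPONENTS = {
    "Continuous integration": "continuous_integration",
    "Dependency management": "dependency_management",
}


def component_score(responses: dict[str, str], questions: list[str]) -> int:
    """1 if every listed question was answered regularly or all the time, else 0."""
    return int(all(responses.get(q) in REGULAR for q in questions))


def rap_scores(responses: dict[str, str]) -> tuple[int, int]:
    """Return (basic score out of 6, advanced score out of 7) for one respondent."""
    basic = sum(component_score(responses, qs) for qs in BASIC_COMPONENTS.values())
    advanced = sum(
        component_score(responses, qs) for qs in ADVANCED_FREQUENCY_COMPONENTS.values()
    )
    advanced += sum(
        int(responses.get(q) == "Yes") for q in ADVANCED_YES_NO_COMPONENTS.values()
    )
    return basic, advanced


# Example respondent; unanswered questions simply score 0.
respondent = {
    "code_comments": "All the time",
    "readme_files": "Sometimes",  # so Documentation scores 0
    "peer_review": "Regularly",
    "version_control": "All the time",
    "use_open_source_software": "Regularly",
    "functions": "Regularly",
    "dependency_management": "Yes",
}
print(rap_scores(respondent))  # prints (3, 2)
```

Treating documentation as an “all of” rule keeps a single regularly-written artefact, such as code comments alone, from earning the point.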


None of the RAP components are applied consistently

While some RAP components are applied regularly by most respondents, most are not, including four of the six basic (MVP) components. Open sourcing code in particular is done regularly by only a minority of respondents. This means that, most of the time, analysts are not meeting the minimum RAP guidance when writing code.

Component Type Count
Aqua book guidance Basic 198
Documentation Basic 245
Peer review Basic 500
Team open source code Basic 109
Use open source software Basic 447
Version control Basic 293
Code packages Advanced 103
Continuous integration Advanced 143
Dependency management Advanced 171
Follow code style guidelines Advanced 403
Function documentation Advanced 282
Functions Advanced 440
Unit testing Advanced 184
Sample size = 803


Few respondents regularly use half or more of the advanced RAP components

Fewer than a quarter of respondents (188 of 803) regularly apply four or more of the seven advanced RAP components included here. However, not all of these practices would be applicable or proportionate for every analytical pipeline, and we do not necessarily recommend that they all be applied every time.

Advanced RAP score Count
0 169
1 189
2 158
3 99
4 96
5 37
6 30
7 25
Sample size = 803
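The distribution in the table above can be summarised as the share of respondents at or above a given score. This is a minimal sketch in Python restating the table counts; since there are seven advanced components, we take “half or more” to mean a score of four or above, and the helper name `share_at_least` is hypothetical.

```python
import math

# Distribution of advanced RAP scores restated from the table (sample size = 803)
advanced_scores = {0: 169, 1: 189, 2: 158, 3: 99, 4: 96, 5: 37, 6: 30, 7: 25}

N_ADVANCED_COMPONENTS = 7  # advanced components listed in the RAP components table


def share_at_least(distribution: dict[int, int], threshold: int) -> float:
    """Proportion of respondents whose score is at least `threshold`."""
    total = sum(distribution.values())
    return sum(n for score, n in distribution.items() if score >= threshold) / total


# "Half or more" of seven components means a score of 4 or above.
threshold = math.ceil(N_ADVANCED_COMPONENTS / 2)
share = share_at_least(advanced_scores, threshold)
print(f"{share:.1%}")  # prints 23.4%
```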


Respondents who tell us they are implementing RAP use more of the RAP MVP practices, more often

Although very few respondents regularly meet the RAP MVP, there is a positive correlation (r(24) = .37, p < .01) between respondents self-reporting that they currently implement RAP and their basic RAP score. However, the correlation is relatively weak, and many respondents who strongly agree that they are implementing RAP do not regularly apply all six MVP practices. This suggests that more can be done to improve awareness of the RAP MVP guidance across government.

Percent of respondents, by basic RAP score (0 to 6)
I am currently implementing RAP 0 1 2 3 4 5 6
Strongly Disagree 25.0 29.4 14.7 16.2 14.7 0.0 0.0
Disagree 16.4 25.9 35.3 11.2 7.8 2.6 0.9
Neutral 7.4 25.6 25.6 19.8 10.7 8.3 2.5
Agree 8.7 17.9 28.9 20.8 14.5 8.1 1.2
Strongly Agree 2.3 8.4 14.5 22.9 24.4 17.6 9.9
Sample size = 609


Respondents with a better understanding of RAP use minimum RAP practices more often

There is a positive correlation between self-reported understanding of RAP and basic RAP scores (r(24) = .39, p < .01). This suggests that increasing awareness of RAP standards and guidelines could help increase the use of RAP practices. However, this is only a correlation and should be treated with caution, as it does not establish the direction of causation. Access to tools also varies across departments and may affect analysts’ ability to implement RAP practices.

Percent of respondents, by basic RAP score (0 to 6)
I understand the key concepts of RAP 0 1 2 3 4 5 6
Strongly Disagree 37.5 20.0 20.0 15.0 5.0 2.5 0.0
Disagree 17.0 34.1 21.6 17.0 9.1 1.1 0.0
Neutral 8.9 22.6 37.1 14.5 10.5 5.6 0.8
Agree 8.0 19.3 25.3 22.1 14.9 8.4 2.0
Strongly Agree 1.9 8.3 13.9 18.5 26.9 18.5 12.0
Sample size = 609