Profession summary: government data scientists

How to use this research

Responding to the Coding in Analysis and Research Survey (CARS) is voluntary. The results presented here are from a self-selecting sample of government analysts, so they reflect only the views of the analysts who took part.

For more detail, see the data collection page.

Coding frequency and tools

How often analysts are using code at work

We asked respondents “In your current role, how often do you write code to complete your work objectives?”

Coding frequency Percent
Never 0.7%
Rarely 2.1%
Sometimes 12.5%
Regularly 29.9%
All the time 54.9%
Sample size = 144

Access to and knowledge of programming languages

Given a list of programming tools, we asked all respondents if the tool was available to use for their work.

Access to tools does not necessarily refer to official policy. Some analysts may have access to tools others cannot access within the same organisation.

Access to coding tools

Programming tool | Yes | No | Don't Know
Python | 87.5% | 6.2% | 6.2%
R | 97.9% | 2.1% | 0%
SQL | 85.4% | 4.2% | 10.4%
Matlab | 4.9% | 55.6% | 39.6%
SAS | 29.9% | 36.1% | 34%
SPSS | 14.6% | 37.5% | 47.9%
Stata | 13.9% | 41% | 45.1%
VBA | 40.3% | 24.3% | 35.4%
Sample size = 144

Given the same list of programming tools, all respondents were asked if they knew how to program with the tool to a level suitable for their work, answering “Yes”, “No” or “Not required for my work”.

Please note that capability in programming languages is self-reported here and was not objectively defined or tested. The statement “not required for my work” was similarly not defined.

Knowledge of coding tools

Programming tool | Yes | No | Not required for my work
Python | 77.8% | 14.6% | 7.6%
R | 84.7% | 9.7% | 5.6%
SQL | 79.2% | 14.6% | 6.2%
Matlab | 9.7% | 34.7% | 55.6%
SAS | 16.7% | 36.1% | 47.2%
SPSS | 11.1% | 36.1% | 52.8%
Stata | 4.2% | 35.4% | 60.4%
VBA | 14.6% | 32.6% | 52.8%
Sample size = 144

Access to and knowledge of git

We asked respondents to answer “Yes”, “No” or “Don’t know” for the following questions:

  • Is git available to use in your work?
  • Do you know how to use git to version-control your work?

Please note these outputs include people who do not code at work.

Access to git

Response Percent
Yes 93.1%
No 4.9%
I don't know 2.1%
Sample size = 144

Knowledge of git

Response Percent
Yes 93.1%
No 6.2%
I don't know 0.7%
Sample size = 144

Coding capability and change

Where respondents first learned to code

Respondents with coding experience outside their current role were asked where they first learned to code. Analysts who code in their current role but reported no other coding experience are included as having learned ‘In current role’. Those who reported first learning to code outside a work or educational environment were categorised as ‘self-taught’ based on free-text responses.

These data show only where people first learned to code. They do not show all the settings in which they have learned to code, to what extent, or how long ago.

Where learned Percent
Current employment 11.9%
Education 62.2%
Previous private sector employment 4.9%
Previous public sector employment 9.1%
Self-taught 9.8%
Other 2.1%
Sample size = 143

Change in coding ability during current role

We asked “Has your coding ability changed during your current role?”

This question was only asked of respondents with coding experience outside of their current role. This means analysts who first learned to code in their current role are not included in the data.

Ability change Percent
Significantly worse 1.6%
Slightly worse 4.8%
Stayed the same 13.5%
Slightly better 28.6%
Significantly better 51.6%
Sample size = 126

Coding practices

We asked respondents who said they currently use code in their work how often they carry out various coding practices. For more information on the practices presented below, please read our guidance on Quality Assurance of Code for Analysis and Research.

Open sourcing was defined as ‘making code freely available to be modified and redistributed’.

Consistency of good coding practices

Statement | I don't understand this question (%) | Never (%) | Rarely (%) | Sometimes (%) | Regularly (%) | All the time (%)
Automated data quality assurance | 4.9% | 10.5% | 11.9% | 24.5% | 23.8% | 24.5%
Code review | 0.7% | 2.8% | 8.4% | 23.8% | 25.9% | 38.5%
Coding guidelines / Style guides | 2.1% | 5.6% | 3.5% | 18.2% | 35.7% | 35%
Functions | 0.7% | 2.8% | 2.8% | 13.3% | 25.2% | 55.2%
Open source own code | 6.3% | 33.6% | 20.3% | 16.1% | 13.3% | 10.5%
Packaging code | 1.4% | 28.7% | 18.9% | 23.8% | 10.5% | 16.8%
Proportionate quality assurance | 8.4% | 1.4% | 2.1% | 9.1% | 38.5% | 40.6%
Quality assurance throughout development | 4.9% | 3.5% | 4.9% | 16.1% | 33.6% | 37.1%
Standard directory structure | 9.8% | 10.5% | 5.6% | 16.1% | 26.6% | 31.5%
Unit testing | 7% | 10.5% | 16.8% | 26.6% | 19.6% | 19.6%
Use open source software | 0% | 0% | 1.4% | 3.5% | 22.4% | 72.7%
Version control | 1.4% | 5.6% | 3.5% | 9.8% | 20.3% | 59.4%
Sample size = 143

Documentation

We asked respondents who reported writing code at work how frequently they write different forms of documentation when programming in their current role.

Embedded documentation is one of the components which make up a RAP minimum viable product. Documentation is important to help others be clear on how to use the product and what the code is intended to do.
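As a hedged illustration of what embedded documentation can look like, the sketch below shows a small Python function with a docstring and an inline comment. The function name `percentage` and its behaviour are hypothetical; they are not taken from the survey or from RAP guidance.

```python
def percentage(count: int, total: int) -> float:
    """Return `count` as a percentage of `total`, rounded to one decimal place.

    Args:
        count: Number of respondents who gave a particular answer.
        total: Number of respondents who answered the question.

    Raises:
        ValueError: If `total` is not a positive number.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of respondents")
    # Round to one decimal place so the value matches a one-decimal table layout.
    return round(100 * count / total, 1)


if __name__ == "__main__":
    # One respondent out of eight is 12.5%.
    print(percentage(1, 8))
```

The docstring tells a user how to call the function and what to expect back, while the comment explains why a particular line is written the way it is.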

Statement | I don't understand this question (%) | Never (%) | Rarely (%) | Sometimes (%) | Regularly (%) | All the time (%)
Analytical Quality Assurance (AQA) logs | 12.6% | 28.7% | 14% | 26.6% | 11.2% | 7%
Code comments | 0% | 1.4% | 0.7% | 4.2% | 30.1% | 63.6%
Data or assumptions registers | 17.5% | 36.4% | 19.6% | 11.2% | 9.8% | 5.6%
Desk notes | 21% | 11.9% | 13.3% | 22.4% | 18.9% | 12.6%
Documentation for each function or class | 1.4% | 11.9% | 7.7% | 20.3% | 27.3% | 31.5%
Flow charts | 4.2% | 19.6% | 18.2% | 31.5% | 16.1% | 10.5%
README files | 2.1% | 13.3% | 9.1% | 14.7% | 27.3% | 33.6%
Sample size = 143

Dependency management

We asked respondents who reported writing code at work if they manage dependencies for their projects.

We provided examples of tools that may be used for dependency management:

  • Requirements files, e.g. Python requirements.txt or R DESCRIPTION files
  • Virtual environments (e.g. venv or renv) or virtual machines
  • Containers, e.g. Docker
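
To make the first example concrete, here is a minimal sketch of how pinned requirements lines could be produced from the packages installed in the current Python environment. It uses only the standard library `importlib.metadata` module. The function name `freeze_requirements` is hypothetical, and in practice a tool such as `pip freeze` is more commonly used for this.

```python
from importlib.metadata import distributions


def freeze_requirements() -> list[str]:
    """Return sorted 'name==version' pins for every installed distribution."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        # Skip distributions whose metadata is incomplete.
        if name and version:
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Print the pins in the same line-per-package layout as requirements.txt.
    for line in freeze_requirements():
        print(line)
```

Saving this output to a requirements.txt file inside a fresh virtual environment records which package versions an analysis was run with.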
Use dependency management software Percent
Yes 67.1%
No 22.4%
I don't know what dependency management is 10.5%
Sample size = 143

Continuous integration

We asked respondents who reported writing code at work if they use continuous integration.

We provided some examples of continuous integration technologies:

  • GitHub Actions
  • Jenkins
  • Travis
Use continuous integration Percent
Yes 32.9%
No 46.2%
I don't know what continuous integration is 21%
Sample size = 143

Reproducible workflow packages

We asked respondents who reported writing code at work whether they use reproducible workflow packages.

We provided some examples of packages:

  • drake
  • make
  • pymake
  • targets
Use reproducible workflow packages Percent
Yes 11.9%
No 65.7%
I don't know what reproducible workflows are 22.4%
Sample size = 143

Reproducible analytical pipelines (RAP)

We asked respondents about their knowledge of and opinions on reproducible analytical pipelines (RAP). RAP refers to the use of practices from software engineering to make analysis more reproducible. These practices build on the advantages of writing analysis as code by ensuring increased quality, trust, efficiency, business continuity and knowledge management.

The RAP champions are a network of analysts across government who promote and support RAP development in their departments. Please contact the analysis standards and pipelines team for any enquiries about RAP or the champions network.

The Analysis Function RAP strategy was released in June 2022 and sets out plans for adopting RAP across government.

Knowledge of RAP

We asked respondents who reported writing code at work if they had heard of RAP.

Knowledge Percent
Yes 90.2%
No 9.8%
Sample size = 143

RAP champions

We asked respondents who had heard of RAP whether their department has a RAP champion and whether they know who it is.

Knowledge Percent
Yes, and I am a RAP Champion 10.9%
Yes, and I know who the RAP Champion is 36.4%
Yes, but I don't know who the RAP Champion is 11.6%
No 7%
I don't know 34.1%
Sample size = 129

Awareness of RAP strategy

We asked respondents who had heard of RAP if they had heard of the RAP strategy.

RAP strategy knowledge Percent
Yes 42.6%
Yes, but I haven't read it 35.7%
No 21.7%
Sample size = 129

Opinions on RAP

We asked respondents who had heard of RAP whether they agreed with a series of statements.

Statement | Strongly Disagree (%) | Disagree (%) | Neutral (%) | Agree (%) | Strongly Agree (%)
I and/or my team are currently implementing RAP | 9.3% | 14% | 16.3% | 31.8% | 28.7%
I feel confident implementing RAP in my work | 6.2% | 8.5% | 10.1% | 41.1% | 34.1%
I feel supported to implement RAP in my work | 6.2% | 10.1% | 18.6% | 36.4% | 28.7%
I know where to find resources to help me implement RAP | 5.4% | 14% | 19.4% | 27.9% | 33.3%
I or my team are planning on implementing RAP in the next 12 months | 8.5% | 6.2% | 18.6% | 31% | 35.7%
I think it is important to implement RAP in my work | 3.9% | 3.1% | 7% | 31.8% | 54.3%
I understand what the key components of the RAP methodology are | 3.1% | 13.2% | 9.3% | 36.4% | 38%
Sample size = 129

RAP scores

In this section we present RAP components and RAP scores.

For each RAP component a percent positive was calculated. Positive responses were recorded where an answer of “regularly” or “all the time” was given. For documentation, a positive response was recorded if both code comments and README files questions received positive responses. For the continuous integration and dependency management components, responses of “yes” were recorded as positive.
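
The scoring rules above can be sketched in Python as follows. This is an illustration only, not the code used to produce the published figures: the four respondent records, the field names and the helper functions are all hypothetical.

```python
# Frequency answers that count as a positive response.
POSITIVE_FREQUENCIES = {"Regularly", "All the time"}


def frequency_is_positive(answer: str) -> bool:
    """True if a frequency answer is 'Regularly' or 'All the time'."""
    return answer in POSITIVE_FREQUENCIES


def percent_positive(flags: list[bool]) -> float:
    """Percentage of True values, rounded to one decimal place."""
    return round(100 * sum(flags) / len(flags), 1)


# Hypothetical respondent-level answers (not real survey data).
responses = [
    {"version_control": "All the time", "code_comments": "All the time",
     "readme": "Regularly", "dependency_management": "Yes"},
    {"version_control": "Regularly", "code_comments": "Regularly",
     "readme": "Sometimes", "dependency_management": "No"},
    {"version_control": "Regularly", "code_comments": "All the time",
     "readme": "All the time", "dependency_management": "Yes"},
    {"version_control": "Never", "code_comments": "Rarely",
     "readme": "Never",
     "dependency_management": "I don't know what dependency management is"},
]

scores = {
    # Single frequency question: positive if regularly or all the time.
    "Version control": percent_positive(
        [frequency_is_positive(r["version_control"]) for r in responses]
    ),
    # Documentation: both code comments and README answers must be positive.
    "Documentation": percent_positive(
        [
            frequency_is_positive(r["code_comments"])
            and frequency_is_positive(r["readme"])
            for r in responses
        ]
    ),
    # Yes/no question: only "Yes" counts as positive.
    "Dependency management": percent_positive(
        [r["dependency_management"] == "Yes" for r in responses]
    ),
}

if __name__ == "__main__":
    for component, score in scores.items():
        print(f"{component}: {score}%")
```

The same rule can be checked against the published tables. For version control, 20.3% of respondents answered "Regularly" and 59.4% answered "All the time", which sum to the 79.7% reported for that component.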

“Basic” components are the components which make up the RAP MVP. “Advanced” components are components which help improve reproducibility, but are not considered part of the minimum standard.

RAP components

RAP component | Type | Percentage of analysts who code in their work
Use open source software | Basic | 95.1%
Version control | Basic | 79.7%
Proportionate QA | Basic | 79%
Peer review | Basic | 64.3%
Documentation | Basic | 58.7%
Team open source code | Basic | 23.8%
Functions | Advanced | 80.4%
Follow code style guidelines | Advanced | 70.6%
Dependency management | Advanced | 67.1%
Function documentation | Advanced | 58.7%
Unit testing | Advanced | 39.2%
Continuous integration | Advanced | 32.9%
Code packages | Advanced | 27.3%
Sample size = 143