Sample assumptions log

Note

This guidance is an ALPHA draft. It is in development and we are still working to ensure that it meets user needs.

Please get in touch with feedback to support the guidance by creating a GitHub Issue or emailing us.

This log contains a list of assumptions used in an example analysis of data from UK universities. It also gives each assumption a quality rating, a sensitivity score and a risk score.

Definitions

Assumptions are scored as Red, Amber or Green (a RAG rating) depending on quality. In RAG, green denotes a favourable value, red unfavourable and amber neutral. The quality of an assumption measures both how certain and robust an assumption is and how appropriate it is for its intended use.

For example, we would usually consider a well-documented assumption drawn from published evidence to be very robust. However, if it needs to be transformed or adapted significantly to fit the analysis, the quality rating might need downgrading.

You would normally lower the quality rating of an assumption if you cannot get technical sign-off (for example, because of a lack of technical knowledge) or if the information on which it is based is incomplete or of poor quality. You would also normally lower the rating if the confidence interval or uncertainty range is wide (for example, if you would not be surprised to find that the true value differs from your measurement by 50%).
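As a rough illustration of what a "wide" range could mean in practice, the sketch below expresses the half-width of an uncertainty range as a fraction of the estimate. The function names and the 0.5 cut-off are assumptions made for this example (the cut-off loosely echoes the 50% figure above); the guidance itself does not prescribe a threshold.

```python
def relative_half_width(estimate: float, lower: float, upper: float) -> float:
    """Half the width of the uncertainty range, as a fraction of the estimate."""
    if estimate == 0:
        raise ValueError("relative width is undefined for an estimate of zero")
    if not lower <= estimate <= upper:
        raise ValueError("estimate must lie within [lower, upper]")
    return (upper - lower) / 2 / abs(estimate)


# Hypothetical cut-off, loosely based on the "50%" example in the text.
WIDE_RANGE_THRESHOLD = 0.5


def has_wide_range(estimate: float, lower: float, upper: float,
                   threshold: float = WIDE_RANGE_THRESHOLD) -> bool:
    """True if the relative half-width reaches the (assumed) threshold."""
    return relative_half_width(estimate, lower, upper) >= threshold


# An estimate of 0.7 with a range of 0.6 to 0.8 has a relative
# half-width of about 14%, well below the assumed cut-off.
print(round(relative_half_width(0.7, 0.6, 0.8), 3))
print(has_wide_range(0.7, 0.6, 0.8))
```

A check like this can only prompt a review of the rating. The final rating remains a judgement that also weighs data quality, sign-off and fitness for purpose.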

| RAG Rating | Assumption quality |
|---|---|
| GREEN | Based on validated data; Methodology is robust; No or few transformations, or transformation methodology is fully verified and robust; Data is current and signed off by experts; Confidence intervals are narrow. |
| AMBER | The methodology is robust but based on limited data; Data required significant transformation to fit the model; Confidence interval is quite wide; Data has not been reviewed recently. |
| RED | Unclear/unreliable data source or no data source provided; Based on limited data and methodology not robust; Data is not current; Confidence interval is wide or quality is unknown. |

| Assumption ID | Depends on Assumptions | Location in code, documentation or publication | Plain English description of assumption | Basis for assumption | Numerical value of the assumption | Range around the estimated value | Estimated distribution | Links to supporting analysis | Documentation dependencies | Date of last review/update | Externally reviewed by | Date of external review | Next review/update due on | Quality rating | Sensitivity score | Risk score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2,3,4,5,6,7,8,9,10 | Assumption log | We assume that the dataset is representative of the population. | Team opinion | N/A | N/A | N/A | Descriptive statistics (link), comparison to existing data source/publication (link) | Final report: methods, caveats | 14/02/2024 | John Doe | 14/02/2024 | 14/05/2024 | GREEN | High | High |
| 2 | 3,4,5,6,8,9,10 | Assumption log | We assume that the data does not exclude any population groups based on their demographic and socio-economic characteristics. | Team opinion | N/A | N/A | N/A | Descriptive statistics (link), comparison to existing data source/publication (link) | Final report: methods, descriptive statistics, caveats | 14/02/2024 | Jane Roe | 14/02/2024 | 14/05/2024 | GREEN | High | High |
| 3 | | Correspondence with data provider | We assume that all UK universities report data to the Higher Education Statistics Agency (HESA). | Validated from data provider, coverage check against university list | N/A | N/A | N/A | N/A | N/A | 14/02/2024 | Jane Roe | 14/02/2024 | 14/05/2024 | GREEN | High | Medium |
| 4 | 3 | Correspondence with data provider | We assume that our list of UK universities is correct, current and comprehensive. | Team opinion, reliable source (HESA list) | N/A | N/A | N/A | N/A | N/A | 14/02/2024 | Jane Roe | 14/02/2024 | 14/05/2024 | GREEN | Low | Medium |
| 5 | 2,3,4 | Assumption log | We assume that all universities accurately report the number of students enrolled during the academic year. | Expert opinion | N/A | N/A | N/A | N/A | Final report: Caveats | 14/02/2024 | Jane Roe | 14/02/2024 | 14/05/2024 | AMBER | High | High |
| 6 | 3,4 | Assumption log | We assume that all universities accurately report the number of students who dropped out during the academic year. | Expert opinion | N/A | N/A | N/A | Publication on drop-out rates | Final report: Caveats | 14/02/2024 | Jane Roe | 14/02/2024 | 12/05/2024 | AMBER | High | High |
| 7 | 3 | Correspondence with data providers. | We assume that the academic year is consistently measured across UK universities. | Validated from data provider | N/A | N/A | N/A | Quality assurance based on sampling universities from their websites | Quality assurance log, final report: caveats | 14/02/2024 | Jane Roe | 14/02/2024 | 14/05/2024 | RED | High | High |
| 8 | 9 | Correspondence with data providers. | We assume that students who receive special education services are excluded from the calculation of dropout rates. | Validated from data provider | N/A | N/A | N/A | Sensitivity analysis comparing dropout rates with and without this population included. | Quality assurance log, final report: methods, caveats | 14/02/2024 | Jane Roe | 14/02/2024 | 14/05/2024 | RED | High | High |
| 9 | 2,3,4,10 | Exploratory data analysis notebook (link) | We assume that there is complete information for all the variables in the analysis. | Robustness testing | N/A | N/A | N/A | Descriptive statistics notebook link | Desk instructions, final report: methods, summarising the sample | 14/02/2024 | Jane Roe | 14/02/2024 | 14/05/2024 | AMBER | High | High |
| 10 | 2,3,4,9 | Correspondence with data provider | We assume that the data collection process has not changed at all over time. | Validated from data provider (link) | N/A | N/A | N/A | Methods documents from prior runs of this work, supplier data specifications and quality reports | Data supply specification, data quality report, methods document | Not yet assigned | No external review | 14/02/2024 | 14/05/2024 | RED | High | High |
| 11 | 2,3,4,5,6,7,8 | Assumption log | We assume that the correlation coefficient between the dropout rate and social grade of local authority of origin is 0.7. | Statistical analysis of past data | 0.7 | ±0.1 | Normal | Correlation analysis report (link), comparison to domain knowledge (link) | Data analysis documentation | 05/12/2023 | John Doe | 05/12/2023 | 01/05/2024 | GREEN | Medium | Low |
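
If the log is kept in a machine-readable form, some routine checks on it can be automated. The following is a minimal sketch, assuming a cut-down record with only a few of the columns above. The class name, field names and the "RED and high risk" rule are hypothetical choices for this example and are not part of the guidance.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Assumption:
    """One row of the log, reduced to the fields the checks below need."""
    id: int
    depends_on: List[int] = field(default_factory=list)
    next_review: Optional[date] = None
    quality: str = "RED"   # "RED", "AMBER" or "GREEN"
    risk: str = "High"     # "Low", "Medium" or "High"


def missing_dependencies(log: List[Assumption]) -> Dict[int, List[int]]:
    """Map each assumption ID to any dependency IDs that are not in the log."""
    known = {a.id for a in log}
    result = {}
    for a in log:
        missing = [d for d in a.depends_on if d not in known]
        if missing:
            result[a.id] = missing
    return result


def needs_attention(log: List[Assumption], today: date) -> List[int]:
    """IDs that are overdue for review, or rated RED with a High risk score.

    The RED-and-High rule is an illustrative assumption, not official policy.
    """
    return [
        a.id
        for a in log
        if (a.next_review is not None and a.next_review < today)
        or (a.quality == "RED" and a.risk == "High")
    ]


# Three rows transcribed from the sample log (dates there are DD/MM/YYYY).
log = [
    Assumption(3, [], date(2024, 5, 14), "GREEN", "Medium"),
    Assumption(7, [3], date(2024, 5, 14), "RED", "High"),
    Assumption(11, [2, 3, 4, 5, 6, 7, 8], date(2024, 5, 1), "GREEN", "Low"),
]

print(missing_dependencies(log))
print(needs_attention(log, today=date(2024, 5, 10)))
```

Because only three rows are loaded here, the dependency check reports the IDs that assumption 11 relies on but that are absent from this partial log. Run against the full log, the same check would surface any dependency that points at an ID with no row.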