Using ORCHID for Research
The Oxford-RCGP RSC has a current membership of over 1900 general practices in England, covering almost 18 million patients who are broadly representative of the English general population. The emergence of the COVID-19 pandemic has seen a rapid increase in the number of new practices joining the network to support the national surveillance program. The Oxford-RCGP RSC team transforms routine clinical data from individual patient records at practice level into an accessible repository of data for health research. Data is transformed into a series of themed datasets that are readily available for research. These datasets include the standard RSC population configuration (based on age band, ethnicity, index of multiple deprivation, and rurality), clinical case definitions (generally ontological), covariates, and outcomes. They are benchmarked to ONS standard populations to enable rapid comparisons between the study and national populations.
How to access RCGP RSC data
RCGP RSC data is available to researchers interested in conducting primary care or linked data studies. To work with us, please complete the data request form (click the button below). If you have any questions, please email orchid-reg@phc.ox.ac.uk.
We also need:
- a protocol, clearly defining the purpose of the request and use of data
- a list of variables (no abbreviations for the medical terms)
- ethics approval, or evidence that it is not needed (for the evidence you can present, please refer to the guidance notes on the form).
Once processed, the request is assigned a reference number and added to the log. It is then scheduled for consideration at the weekly technical meeting and the monthly RSC operational meeting. Official approval takes place on the fourth Wednesday of each month, when the RCGP RSC operational meeting is held.
Oxford-RCGP RSC Theme Datasets
The Oxford-RCGP RSC team is developing a series of themed datasets that will give researchers rapid access to theme-specific data for research studies. We offer standard protocols that can be modified to cater to the needs of specific research studies. These datasets can be linked to other data sources and prepared according to a data structure that fits the analytical requirements of the proposed study.
Our currently planned themes (at different stages of development) are given below. Explore our theme datasets which are ready for research studies by clicking on the buttons on the right.
Explore our theme datasets
Our theme datasets are extracted using theme variables curated using SNOMED CT. We currently have 1000+ theme variables that are featured in our theme datasets. They are also available for bespoke data extractions. The theme variables are refreshed following major SNOMED CT releases. Click on the button below to view the theme variables.
Theme Variable Browser - Insert link
Learn more about stage 1 process - insert link
Learn more about stage 2 process - insert link
How we develop phenotypes and ontologies to support theme datasets
We start theme development by forming a group of theme experts led by a theme lead. The theme lead designs the theme dataset based on the common requirements for the research theme. The theme lead populates a theme template based on expert knowledge (or on a study requirement, if the theme is initially focused on producing a dataset for a study). The theme lead then works with a team of code curators to convert the data requirements of a theme into a clinical code specification. This is carried out as a two-stage approach (click on the links on the right to learn more). The outputs of the code curation process are stored in a code repository in the secure environment. The codes committed to the code repository are immediately available for data extraction. The code repository is capable of updating automatically to reflect most code changes in future SNOMED CT releases.
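As a rough illustration of how a curated code specification might drive data extraction, the sketch below models a code repository as a mapping from a theme variable to a set of SNOMED CT concept identifiers, and selects the patients who have at least one matching event. It is not the Oxford-RCGP RSC implementation: the names ClinicalEvent, CODE_REPOSITORY and extract_variable are hypothetical, and the concept identifiers are placeholders rather than real SNOMED CT codes.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class ClinicalEvent:
    """One coded entry in a patient record (hypothetical structure)."""
    patient_id: str
    concept_id: str  # identifier of the recorded clinical concept


# Hypothetical code repository: theme variable -> curated set of concept
# identifiers. The identifiers are placeholders, NOT real SNOMED CT codes.
CODE_REPOSITORY: Dict[str, FrozenSet[str]] = {
    "example_condition": frozenset({"PLACEHOLDER-1", "PLACEHOLDER-2"}),
}


def extract_variable(
    events: Iterable[ClinicalEvent],
    variable: str,
    repository: Dict[str, FrozenSet[str]] = CODE_REPOSITORY,
) -> List[str]:
    """Return the sorted IDs of patients with any event coded for `variable`."""
    codes = repository[variable]
    return sorted({e.patient_id for e in events if e.concept_id in codes})


if __name__ == "__main__":
    sample_events = [
        ClinicalEvent("p1", "PLACEHOLDER-1"),
        ClinicalEvent("p2", "UNRELATED-9"),
        ClinicalEvent("p3", "PLACEHOLDER-2"),
    ]
    print(extract_variable(sample_events, "example_condition"))
```

Keeping the curated codes separate from the extraction logic means that a refreshed code set changes which events match without any change to the extraction function.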
Systems involved in the theme development process
SNOMED CT Code Tool
Theme leads and clinicians (with expertise in the theme) use the SNOMED CT code tool developed by the Oxford-RCGP RSC team to curate variables/codes within each theme.
Training for Curators - link on current website is blank
General inclusions/exclusions and population covariates for theme datasets
General inclusions/exclusions and population covariates:
- People who have at least one year’s historic data in their records will be included in cohorts.
- Age will be represented as year of birth
- Ethnicity will be listed in five categories (we can conduct multiple imputation for missing data)
- IMD is provided at LSOA level for individual records, derived from postcode. Where it is missing, we substitute the value for the practice cohort. IMD is also provided in quintiles (quintile 1 is the most deprived)
- Household size (N.B. this relies on people sharing an identical address and being registered with the same practice)
- Urban, town and city, conurbation (ONS population density)
- NHS Region – North, Midlands and East, South and London.
- Registration will be full registration and not include temporary residents
- We will include a practice indicator
- We will exclude records with no valid age or gender, or with no agreement to share data (approx. 0.2%)
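The inclusion and exclusion rules above can be sketched as a simple eligibility check. This is a minimal illustration under stated assumptions, not the RSC extraction logic: the PatientRecord fields, the "full" registration label, the use of an index date, and the reading of "valid age" as a year of birth that is present and not in the future are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PatientRecord:
    """Hypothetical record layout; field names are assumptions."""
    year_of_birth: Optional[int]
    gender: Optional[str]
    registration_type: str   # assumed values: "full" or "temporary"
    first_record_date: date  # earliest date with data in the record
    consent_to_share: bool
    practice_id: str         # practice indicator


def has_one_year_history(record: PatientRecord, index_date: date) -> bool:
    """True if the record holds at least one year of data before index_date."""
    try:
        one_year_before = index_date.replace(year=index_date.year - 1)
    except ValueError:
        # index_date is 29 February and the previous year is not a leap year
        one_year_before = index_date.replace(year=index_date.year - 1, day=28)
    return record.first_record_date <= one_year_before


def is_eligible(record: PatientRecord, index_date: date) -> bool:
    """Apply the general inclusion/exclusion rules to one record."""
    # Exclude: no valid age (assumed: missing or future year of birth)
    if record.year_of_birth is None or record.year_of_birth > index_date.year:
        return False
    # Exclude: no valid gender
    if not record.gender:
        return False
    # Exclude: no agreement to share data
    if not record.consent_to_share:
        return False
    # Include full registrations only, not temporary residents
    if record.registration_type != "full":
        return False
    # Include only people with at least one year's historic data
    return has_one_year_history(record, index_date)
```

A cohort would then be the subset of records for which is_eligible returns True at the chosen index date, with practice_id retained as the practice indicator.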