The latest available IMD data for England is from 2019.
The next update is expected to be released in late 2025 by Oxford Consultants for Social Inclusion (OCSI).
The consultation outcome was released in 2022 by the Department for Levelling Up, Housing and Communities.
Thanks to the NHS-R Community for finding and sharing these links on the NHS-R Slack.
IMD is very useful for categorising the area a person lives in by deprivation. Each UK nation produces its own deprivation indices, so the English IMD does not include the other nations. Deciles are the most commonly used way of referring to IMD. They are derived from the scores, which are ordered and then cut into 10 equal groups. Decile 1 is the most deprived and decile 10 is the least deprived.
To get IMD scores or deciles for local data, a join is needed to the postcode table (a directory of postcodes), found at:
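As a minimal sketch of how deciles are cut from ordered scores, the base R below ranks a set of made-up scores and splits the ranks into 10 equal groups. The scores are invented for illustration and are not real IMD data. The sketch assumes a higher score means more deprivation, and ties are broken by order of appearance, which is a simplification.

```r
# Hypothetical scores for 20 small areas (made-up values, not real IMD data)
set.seed(1)
scores <- round(runif(20, min = 5, max = 60), 1)

# Rank so that the highest score (assumed most deprived) gets rank 1
rank_most_deprived <- rank(-scores, ties.method = "first")

# Cut the ordered ranks into 10 equal-sized groups: decile 1 = most deprived
decile <- ceiling(rank_most_deprived * 10 / length(scores))

areas <- data.frame(
  score = scores,
  rank = rank_most_deprived,
  decile = decile
)

# View the areas from most to least deprived
areas[order(areas$rank), ]
```

With 20 areas, each decile contains exactly two areas.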
https://digital.nhs.uk/services/organisation-data-service/data-downloads/ods-postcode-files
and then to the IMD data:
https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019
Other data is available from this dataset, including the Income Deprivation Affecting Older People Index (IDAOPI), which relates only to older people.
Note that the column headers change between releases: in 2015 the column was LADistrictCode2013 and in 2019 it is LADistrictCode2019. Similarly, LADistrictName2013 has become LADistrictName2019.
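One way to handle the changing headers when working with both releases is to strip the year suffix before combining them. The sketch below uses two hypothetical one-row data frames with placeholder codes and names; only the four column names come from the text above. The district boundaries behind the two suffixes may differ, so harmonised names do not by themselves make the releases comparable.

```r
# Hypothetical one-row extracts standing in for the 2015 and 2019 files;
# the codes and names are placeholders, not real districts
imd_2015 <- data.frame(
  LADistrictCode2013 = "X001",
  LADistrictName2013 = "Example District",
  stringsAsFactors = FALSE
)
imd_2019 <- data.frame(
  LADistrictCode2019 = "X001",
  LADistrictName2019 = "Example District",
  stringsAsFactors = FALSE
)

# Drop the four-digit year suffix from the local authority district columns
strip_year <- function(df) {
  names(df) <- sub("^(LADistrict(Code|Name))20[0-9]{2}$", "\\1", names(df))
  df
}

imd_2015 <- strip_year(imd_2015)
imd_2019 <- strip_year(imd_2019)

# Record the release before stacking the two extracts
imd_2015$release <- 2015
imd_2019$release <- 2019
imd_all <- rbind(imd_2015, imd_2019)
imd_all
```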
Postcode lengths vary in the original data depending on whether one or two spaces are used between the parts. Consequently, it is always better when joining by postcodes to remove the spaces altogether.
In SQL this can be done with:
REPLACE(postcode, ' ', '')
In R it can be:
postcode <- "NG16 1AA"

# base R
gsub(" ", "", postcode)
[1] "NG161AA"
# {stringr} also available in {tidyverse}
# str_remove_all() removes every space; str_remove() only removes the first
stringr::str_remove_all(postcode, " ")
[1] "NG161AA"
Partial postcodes are sometimes provided to protect the data and may be the first part (before the space) plus 1 or 2 characters from the second part. A partial postcode cannot be matched to a single full postcode, so it will not give a sufficiently reliable IMD score.
To get the IMD score, the LSOA (Lower Super Output Area) code is required, which is taken from the full postcode.
This code will not run as written because it depends on the naming conventions of the SQL server. The column names LSOA11 and LSOAcode2011 will have come from the data sources. The column PostCode_space has been added to the original data.
SELECT i.*
FROM DIM_AI.PatientData AS p
LEFT JOIN DIM_AI.PostCodes AS pc ON p.PostCode = pc.PostCode_space
LEFT JOIN DIM_AI.IMD AS i ON pc.LSOA11 = i.LSOAcode2011
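The same two-step join can be sketched in base R with merge(). Everything below is toy data: the LSOA codes, deciles, and all postcodes other than NG16 1AA are hypothetical, and only the column names LSOA11 and LSOAcode2011 come from the SQL above. Upper-casing the postcode as well as removing spaces is an extra assumption about how messy the local data might be.

```r
# Remove all spaces and upper-case so that variants of a postcode match
normalise_postcode <- function(x) gsub(" ", "", toupper(x))

# Toy local data: one space, two spaces, and lower case with no space
patients <- data.frame(
  PatientID = 1:3,
  PostCode = c("NG16 1AA", "NG16  1AB", "ng161ac"),
  stringsAsFactors = FALSE
)

# Toy postcode directory and IMD table (hypothetical LSOA codes and deciles)
postcodes <- data.frame(
  PostCode = c("NG16 1AA", "NG16 1AB"),
  LSOA11 = c("LSOA_A", "LSOA_B"),
  stringsAsFactors = FALSE
)
imd <- data.frame(
  LSOAcode2011 = c("LSOA_A", "LSOA_B"),
  IMDDecile = c(3L, 7L),
  stringsAsFactors = FALSE
)

# Build a space-free join key on both sides
patients$pc_key <- normalise_postcode(patients$PostCode)
postcodes$pc_key <- normalise_postcode(postcodes$PostCode)

# Left join patients to the postcode directory, then to the IMD table
step1 <- merge(patients, postcodes[, c("pc_key", "LSOA11")],
  by = "pc_key", all.x = TRUE
)
result <- merge(step1, imd,
  by.x = "LSOA11", by.y = "LSOAcode2011", all.x = TRUE
)

result[order(result$PatientID), c("PatientID", "PostCode", "LSOA11", "IMDDecile")]
```

The third patient's postcode is not in the toy directory, so the left joins keep the row with missing LSOA and decile values rather than dropping it.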
To be able to join to the data, scores will need to be put into quintiles (five groups) rather than deciles (ten groups). Where the number of areas divides exactly by the number of quintiles, an equal number of areas can be assigned to each quintile. When it does not divide exactly, however, a choice must be made as to which quintiles should have a larger number of areas. The Office for Health Improvement and Disparities recommends the following approach in their Technical Guide to Assigning Deprivation Categories:
Divide the number of small areas within the higher geography by the number of deprivation categories required (up to a maximum of 10), giving an integer and fractional part.
The integer-part of this number represents the minimum number of small areas that will be assigned to each deprivation category within each higher geography.
The tables below then show which deprivation categories should be assigned additional small areas, based on the fractional part of this number and the number of deprivation categories being used.

Deciles

Number after decimal point | Deciles receiving an extra area |
---|---|
0.0 | None |
0.1 | 1 |
0.2 | 1, 6 |
0.3 | 1, 4, 7 |
0.4 | 1, 3, 6, 8 |
0.5 | 1, 3, 5, 7, 9 |
0.6 | 1, 2, 4, 6, 7, 9 |
0.7 | 1, 2, 3, 5, 6, 8, 9 |
0.8 | 1, 2, 3, 4, 6, 7, 8, 9 |

Quintiles

Number after decimal point | Quintiles receiving an extra area |
---|---|
0.0 | None |
0.2 | 1 |
0.4 | 1, 3 |
0.6 | 1, 2, 4 |
0.8 | 1, 2, 3, 4 |

Quartiles

Number after decimal point | Quartiles receiving an extra area |
---|---|
0.00 | None |
0.25 | 1 |
0.50 | 1, 3 |
0.75 | 1, 2, 3 |
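
The steps and the quintile table above can be sketched in base R as follows. The function names quintile_sizes() and assign_quintiles() are hypothetical helpers written for this illustration and are not part of any package. The lookup is copied from the quintile table. The sketch assumes that a higher score means more deprivation, and ties are broken by order of appearance.

```r
# Quintiles receiving an extra area, indexed by the remainder (0 to 4) after
# dividing the number of areas by 5; remainder / 5 is the fractional part
extra_quintiles <- list(
  integer(0),        # 0.0: none
  1L,                # 0.2
  c(1L, 3L),         # 0.4
  c(1L, 2L, 4L),     # 0.6
  c(1L, 2L, 3L, 4L)  # 0.8
)

# Number of small areas to place in each of the five quintiles
quintile_sizes <- function(n_areas) {
  base <- n_areas %/% 5        # integer part: minimum areas per quintile
  remainder <- n_areas %% 5    # selects the row of the lookup table
  sizes <- rep(base, 5)
  extra <- extra_quintiles[[remainder + 1]]
  sizes[extra] <- sizes[extra] + 1
  sizes
}

# Assign each area a quintile, with quintile 1 as the most deprived
assign_quintiles <- function(scores) {
  sizes <- quintile_sizes(length(scores))
  ordered_quintiles <- rep(1:5, times = sizes)
  # Position of each area when ordered from most to least deprived
  position <- rank(-scores, ties.method = "first")
  ordered_quintiles[position]
}

# 23 areas: 23 / 5 = 4.6, so quintiles 1, 2 and 4 receive an extra area
quintile_sizes(23)
```

For 23 areas this gives group sizes of 5, 5, 4, 5 and 4, which sum to 23.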
SELECT DISTINCT IMDDecile,
FLOOR((IMDDecile-1)/2) + 1 AS IMDQuintile
FROM DIM_AI.IMD
ORDER BY IMDDecile
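The same decile-to-quintile mapping can be sketched in R with integer division. This pairs up the published deciles (1 and 2 become quintile 1, and so on), mirroring the SQL above; the column names are reused from that query.

```r
# Map each decile to a quintile by pairing adjacent deciles
deciles <- 1:10
quintiles <- (deciles - 1) %/% 2 + 1

data.frame(IMDDecile = deciles, IMDQuintile = quintiles)
```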
The PHEindicatormethods package provides a convenient function, phe_quantile(), that can be used in R to assign areas to quantiles such as deciles or quintiles.
df <- data.frame(
  region = as.character(rep(c("Region1", "Region2", "Region3", "Region4"),
    each = 250
  )),
  smallarea = as.character(paste0("Area", seq_along(1:1000))),
  vals = as.numeric(sample(200, 1000, replace = TRUE)),
  stringsAsFactors = FALSE
)

# assign small areas to deciles across whole data frame
# print the top 15
PHEindicatormethods::phe_quantile(df, vals, type = "standard") |>
  dplyr::slice_head(n = 15)
region smallarea vals quantile
1 Region1 Area1 4 10
2 Region1 Area2 6 10
3 Region1 Area3 194 1
4 Region1 Area4 160 3
5 Region1 Area5 53 8
6 Region1 Area6 181 2
7 Region1 Area7 166 2
8 Region1 Area8 19 9
9 Region1 Area9 67 7
10 Region1 Area10 8 10
11 Region1 Area11 172 2
12 Region1 Area12 146 3
13 Region1 Area13 43 8
14 Region1 Area14 198 1
15 Region1 Area15 61 8
In areas like Nottingham/Nottinghamshire the differences between the LSOAs are diminished when ranked against England as a whole, but when ranked locally the variation is much more pronounced. It is possible to take the original scores and apply deciles or quintiles to those scores in order to create a local IMD.
To apply a rank in SQL, restrict the data to the local area and use the window function ROW_NUMBER() OVER (ORDER BY IMDRank) to create a new ranking score and NTILE(10) OVER (ORDER BY IMDRank) to create new deciles.
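A rough base R equivalent of the two window functions is sketched below. The national ranks are made-up numbers for 25 hypothetical local LSOAs, and ntile_sql() is a helper written for this illustration. It follows the usual SQL behaviour for NTILE, where group sizes differ by at most one and the earlier groups take any extra rows.

```r
# Made-up national ranks for 25 hypothetical local LSOAs (1 = most deprived)
set.seed(42)
local <- data.frame(
  LSOA = paste0("LSOA_", 1:25),
  IMDRank = sample(30000, 25),
  stringsAsFactors = FALSE
)

# Equivalent of ROW_NUMBER() OVER (ORDER BY IMDRank)
local$LocalRank <- rank(local$IMDRank, ties.method = "first")

# Equivalent of NTILE(n) OVER (ORDER BY x): earlier groups take the extra rows
ntile_sql <- function(x, n) {
  row_number <- rank(x, ties.method = "first")
  base <- length(x) %/% n
  extra <- length(x) %% n
  sizes <- rep(base, n) + (seq_len(n) <= extra)
  rep(seq_len(n), times = sizes)[row_number]
}

local$LocalDecile <- ntile_sql(local$IMDRank, 10)

# View the areas from most to least deprived locally
local[order(local$LocalRank), ]
```

With 25 areas, local deciles 1 to 5 contain three areas each and deciles 6 to 10 contain two each.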
library(tidyverse)
library(PostcodesioR)
library(NHSRpostcodetools) # installed from GitHub not CRAN
library(NHSRpopulation) # installed from GitHub not CRAN
library(janitor)
Attaching package: 'janitor'
The following objects are masked from 'package:stats':
chisq.test, fisher.test
# Generate random example postcodes
# Restricted to NG postcodes from Nottinghamshire because postcodes are drawn
# from all nations and don't validate within the {NHSRpopulation} package
# currently
postcodes <- purrr::map_chr(
  1:10,
  .f = ~ PostcodesioR::random_postcode("NG16") |>
    purrr::pluck(1)
)

# Create a tibble
tibble_postcodes <- dplyr::tibble(
  random_postcodes = postcodes,
)
## Debugging
# NHSRpopulation::get_data(tibble_postcodes,
# column = "random_postcodes",
# url_type = "imd"
# ) |>
# dplyr::select(
# random_postcodes,
# new_postcode,
# imd_decile,
# imd_rank,
# imd_score
# ) |>
# mutate(imd_decile_local = ntile(-imd_score, n = 10)) # creating new deciles from the data provided
https://fingertips.phe.org.uk/search/imd
http://dclgapps.communities.gov.uk/imd/idmap.html
Technical report for 2019: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/833951/IoD2019_Technical_Report.pdf
Example Shiny app from Trafford Data Lab with published code on GitHub.
English indices of deprivation 2019: Postcode Lookup for csv uploads. Do check for Information Governance when using sensitive data as, even though these should be loaded separate to other information, this may not be authorised. Also this won’t be considered part of a Reproducible Analytical Pipeline as this is a manual step.
Ministry of Housing, Communities and Local Government. English Indices of Deprivation 2019. https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019 (Accessed 22 March 2024)