The latest available IMD data for England is from 2019.
An update is expected to be released in late 2025 by Oxford Consultants for Social Inclusion (OCSI).
The consultation outcome was published in 2022 by the Department for Levelling Up, Housing & Communities.
Thanks to the NHS-R Community for finding and sharing these links on the NHS-R Slack.
IMD is very useful for categorising the deprivation of the area a person lives in. The indices are calculated separately for each UK nation, so the English IMD does not include the other nations. Deciles are the most common way of referring to IMD: areas are ranked by their scores and the ranking is then cut into 10 equal-sized groups. Decile 1 is the most deprived 10% of areas and decile 10 is the least deprived.
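The ranking-and-cutting step can be sketched in base R. The scores below are simulated for 20 hypothetical areas and are not real IMD data. As with IMD scores, a higher score is assumed to mean more deprived. Published IMD deciles are cut from the national ranking of every LSOA, not from a small sample like this.

```r
# Simulated deprivation scores for 20 hypothetical small areas
# (higher score = more deprived, as with IMD scores)
set.seed(42)
scores <- round(runif(20, min = 5, max = 60), 1)

# Rank 1 = highest score = most deprived; ties broken by position
imd_rank <- rank(-scores, ties.method = "first")

# Cut the ranked areas into 10 equal groups: decile 1 = most deprived 10%
n_areas <- length(scores)
imd_decile <- ceiling(imd_rank * 10 / n_areas)

head(data.frame(scores, imd_rank, imd_decile))
```

With 20 areas, each decile receives exactly two areas.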
Further reading: Wikipedia; English IMD 2015 - Guidance.
To get IMD scores or deciles for local data, a join is needed first to the postcode table (a directory of postcodes), available from:
https://digital.nhs.uk/services/organisation-data-service/data-downloads/ods-postcode-files
and then to the IMD data:
https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019
This dataset also contains other indices, including IDAOPI (Income Deprivation Affecting Older People Index), which relates only to older people.
Note that the column headers change between releases: in 2015 the column was LADistrictCode2013 and in 2019 it is LADistrictCode2019. Similarly, LADistrictName2013 has become LADistrictName2019.
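One way to keep code working across releases is to drop the year suffix from these column names after loading. The sketch below does this in base R with a regular expression. The data frames, codes and names are placeholders, and only the two local authority columns mentioned above are handled.

```r
# Hypothetical one-row extracts using each release's column names
# (codes and names are placeholders, not real IMD values)
imd_2015 <- data.frame(LADistrictCode2013 = "E00000001",
                       LADistrictName2013 = "Area A")
imd_2019 <- data.frame(LADistrictCode2019 = "E00000002",
                       LADistrictName2019 = "Area B")

# Drop the trailing four-digit year from the local authority columns
standardise_la_columns <- function(df) {
  names(df) <- sub("^(LADistrict(Code|Name))[0-9]{4}$", "\\1", names(df))
  df
}

# With matching names, the two releases can be stacked and compared
imd_both <- rbind(
  cbind(standardise_la_columns(imd_2015), release = 2015),
  cbind(standardise_la_columns(imd_2019), release = 2019)
)
imd_both
```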
Postcode lengths vary in the original data depending on whether one or two spaces separate the two parts. It is therefore always better to remove the spaces altogether before joining on postcode.
In SQL this can be done with:
REPLACE(postcode, ' ', '')
In R it can be done with:
postcode <- "NG16 1AA"
# base R
gsub(" ", "", postcode)
[1] "NG161AA"
# {stringr}, also available in {tidyverse}
# str_remove_all() removes every space; str_remove() would only remove the first
stringr::str_remove_all(postcode, " ")
[1] "NG161AA"
Partial postcodes are sometimes provided to protect the data. These may be the first part (before the space) plus one or two characters from the second part. A partial postcode can cover several LSOAs, so it will not give a sufficiently reliable IMD score.
To get the IMD score, the LSOA (Lower Super Output Area) code is required, and this is looked up from the full postcode.
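Before joining, it can help to flag records whose postcode looks incomplete. The check below is a simplified format test written for this example, not an official validation rule. After removing spaces, it expects an outward code (one or two letters, a digit, and an optional letter or digit) followed by an inward code (a digit and two letters). It does not confirm that a postcode actually exists.

```r
# Simplified full-postcode format (an assumption for illustration):
# outward code = 1-2 letters, a digit, optional letter/digit
# inward code  = a digit followed by two letters
full_postcode_pattern <- "^[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}$"

# TRUE where the postcode has the shape of a full postcode,
# FALSE for partial postcodes such as "NG16 1"
is_full_postcode <- function(x) {
  grepl(full_postcode_pattern, toupper(gsub("[[:space:]]+", "", x)))
}

is_full_postcode(c("NG16 1AA", "NG16 1", "NG16 1A", "NG16"))
```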
The following code will not run as-is because it depends on the naming conventions of the SQL server. The column names LSOA11 and LSOAcode2011 come from the data sources, and the column PostCode_space has been added to the original data.
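-- Attach IMD data to patient records via the postcode-to-LSOA lookup
SELECT p.*, i.*
FROM DIM_AI.PatientData AS p
LEFT JOIN DIM_AI.PostCodes AS pc ON p.PostCode = pc.PostCode_space -- postcode to LSOA
LEFT JOIN DIM_AI.IMD AS i ON pc.LSOA11 = i.LSOAcode2011 -- LSOA to IMD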
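For comparison, the same two-step left join can be sketched in base R with `merge()`. The tables below are small hypothetical stand-ins that reuse the column names from the SQL. The postcodes are assumed to have had their spaces removed already, and the LSOA codes and deciles are made up.

```r
# Hypothetical stand-ins for the three SQL tables (values are made up)
patient_data <- data.frame(PatientID = 1:3,
                           PostCode = c("NG161AA", "NG161AB", "NG99ZZ"))
post_codes <- data.frame(PostCode_space = c("NG161AA", "NG161AB"),
                         LSOA11 = c("E01000001", "E01000002"))
imd <- data.frame(LSOAcode2011 = c("E01000001", "E01000002"),
                  IMDDecile = c(2L, 7L))

# Step 1: postcode -> LSOA (left join keeps unmatched patients)
with_lsoa <- merge(patient_data, post_codes,
                   by.x = "PostCode", by.y = "PostCode_space", all.x = TRUE)

# Step 2: LSOA -> IMD
with_imd <- merge(with_lsoa, imd,
                  by.x = "LSOA11", by.y = "LSOAcode2011", all.x = TRUE)

with_imd[order(with_imd$PatientID), ]
```

The third patient's postcode is not in the lookup, so it is kept with a missing LSOA and IMD decile, as a SQL LEFT JOIN would do.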
To join to data that is grouped by quintile, scores will need to be put into quintiles (groups of 5) rather than deciles (groups of 10). Where the number of areas divides exactly by the number of quintiles, an equal number of areas can be assigned to each quintile. When it does not, a choice must be made as to which quintiles receive an extra area. The Office for Health Improvement and Disparities recommends the following approach in its Technical Guide to Assigning Deprivation Categories:
Divide the number of small areas within the higher geography by the number of deprivation categories required (up to a maximum of 10), giving an integer part and a fractional part.
The integer part of this number is the minimum number of small areas that will be assigned to each deprivation category within each higher geography.
The tables below then show which deprivation categories should be assigned an additional small area, based on the fractional part of this number and the number of categories being used.
Deciles

Number after decimal point | Deciles receiving an extra area |
---|---|
0.0 | None |
0.1 | 1 |
0.2 | 1, 6 |
0.3 | 1, 4, 7 |
0.4 | 1, 3, 6, 8 |
0.5 | 1, 3, 5, 7, 9 |
0.6 | 1, 2, 4, 6, 7, 9 |
0.7 | 1, 2, 3, 5, 6, 8, 9 |
0.8 | 1, 2, 3, 4, 6, 7, 8, 9 |
Quintiles

Number after decimal point | Quintiles receiving an extra area |
---|---|
0.0 | None |
0.2 | 1 |
0.4 | 1, 3 |
0.6 | 1, 2, 4 |
0.8 | 1, 2, 3, 4 |
Quartiles

Number after decimal point | Quartiles receiving an extra area |
---|---|
0.00 | None |
0.25 | 1 |
0.50 | 1, 3 |
0.75 | 1, 2, 3 |
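The allocation can be sketched in base R for quintiles by transcribing the quintile table above into a lookup keyed on the remainder. A fractional part of 0.2 corresponds to a remainder of 1 area, 0.4 to 2, and so on. This is an illustration, not OHID's own code. It assumes higher scores mean more deprived and breaks ties by row order, and the function and example data are hypothetical.

```r
# Quintiles receiving one extra area, keyed by the remainder (areas mod 5).
# Transcribed from the quintile table: 0.2 -> 1, 0.4 -> 2, 0.6 -> 3, 0.8 -> 4
quintile_extra <- list(
  "0" = integer(0),
  "1" = 1L,
  "2" = c(1L, 3L),
  "3" = c(1L, 2L, 4L),
  "4" = c(1L, 2L, 3L, 4L)
)

# Assign quintiles within one higher geography; quintile 1 = most deprived
assign_quintiles <- function(scores) {
  n <- length(scores)
  sizes <- rep(n %/% 5, 5)                        # integer part: minimum per quintile
  extra <- quintile_extra[[as.character(n %% 5)]]
  sizes[extra] <- sizes[extra] + 1                # quintiles that get an extra area
  ord <- order(scores, decreasing = TRUE)         # most deprived first
  quintile <- integer(n)
  quintile[ord] <- rep(1:5, times = sizes)
  quintile
}

# Hypothetical areas in two higher geographies (12 and 7 areas)
areas <- data.frame(
  region = rep(c("Region1", "Region2"), times = c(12, 7)),
  score = c(10, 50, 30, 20, 40, 60, 5, 15, 25, 35, 45, 55,
            12, 48, 33, 27, 41, 9, 18)
)
areas$quintile <- ave(areas$score, areas$region, FUN = assign_quintiles)
table(areas$region, areas$quintile)
```

Using `ave()` assigns quintiles separately within each higher geography, as the guide describes. With 12 areas (12 / 5 = 2.4), quintiles 1 and 3 each receive a third area.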
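Where national deciles are already available, national quintiles can be derived by pairing them (deciles 1 and 2 become quintile 1, and so on):
SELECT DISTINCT IMDDecile,
       FLOOR((IMDDecile - 1) / 2) + 1 AS IMDQuintile
FROM DIM_AI.IMD
ORDER BY IMDDecile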
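The same decile-to-quintile mapping can be written in R using integer division. This pairing only gives (near) equal-sized quintiles because national deciles are themselves (near) equal-sized. For a local subset of areas, the allocation approach described above is needed instead.

```r
# National deciles 1-10; each pair of deciles forms one quintile
imd_decile <- 1:10
imd_quintile <- (imd_decile - 1) %/% 2 + 1
data.frame(imd_decile, imd_quintile)
```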
The {PHEindicatormethods} package provides a convenient function, phe_quantile(), that can be used in R to generate quintiles and other quantiles.
# 1,000 example small areas in 4 regions with random values
df <- data.frame(
  region = as.character(rep(c("Region1", "Region2", "Region3", "Region4"),
    each = 250
  )),
  smallarea = as.character(paste0("Area", seq_len(1000))),
  vals = as.numeric(sample(200, 1000, replace = TRUE)),
  stringsAsFactors = FALSE
)
# assign small areas to deciles across the whole data frame
# and print the first 15 rows
PHEindicatormethods::phe_quantile(df, vals, type = "standard") |>
  dplyr::slice_head(n = 15)
region smallarea vals quantile
1 Region1 Area1 41 8
2 Region1 Area2 157 3
3 Region1 Area3 64 7
4 Region1 Area4 96 6
5 Region1 Area5 5 10
6 Region1 Area6 48 8
7 Region1 Area7 125 4
8 Region1 Area8 190 1
9 Region1 Area9 105 5
10 Region1 Area10 188 1
11 Region1 Area11 139 4
12 Region1 Area12 91 6
13 Region1 Area13 20 9
14 Region1 Area14 178 2
15 Region1 Area15 58 8
In areas like Nottingham/Nottinghamshire, the differences between LSOAs are diminished when they are ranked against England as a whole, but when they are ranked locally the variation is much more pronounced. It is possible to take the original scores and apply deciles or quintiles to them to create a local IMD.
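To rank locally in SQL, filter to the local LSOAs and then use the window function ROW_NUMBER() OVER (ORDER BY IMDRank) to create a new local ranking and NTILE(10) OVER (ORDER BY IMDRank) to create new local deciles. Note that NTILE gives any leftover rows to the first groups, which differs from the allocation tables above.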
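To see how NTILE's grouping compares with the allocation tables above, here is a base R sketch that mimics ROW_NUMBER() and NTILE(k). It assumes SQL Server's documented NTILE behaviour, where leftover rows go to the first groups. The helper names and the ranks are made up.

```r
# Base R analogues of SQL ROW_NUMBER() and NTILE(k), ordering ascending.
# Helper names are hypothetical.
row_number_sql <- function(x) rank(x, ties.method = "first")

ntile_sql <- function(x, k) {
  n <- length(x)
  sizes <- rep(n %/% k, k)                   # minimum rows per group
  leftover <- seq_len(n %% k)                # groups that get one extra row
  sizes[leftover] <- sizes[leftover] + 1     # NTILE: larger groups come first
  rep(seq_len(k), times = sizes)[row_number_sql(x)]
}

# Made-up national IMD ranks for 12 local LSOAs (1 = most deprived in England)
imd_rank <- c(4200, 15300, 28100, 3900, 17000, 2800,
              3500, 26000, 19100, 21800, 9400, 12600)

local_imd <- data.frame(
  imd_rank,
  local_rank = row_number_sql(imd_rank),     # ROW_NUMBER() OVER (ORDER BY IMDRank)
  local_decile = ntile_sql(imd_rank, 10)     # NTILE(10) OVER (ORDER BY IMDRank)
)
local_imd[order(local_imd$local_rank), ]
```

With 12 areas and 5 groups, NTILE gives group sizes of 3, 3, 2, 2, 2, whereas the quintile table gives 3, 2, 3, 2, 2. Local quintiles from the two approaches can therefore differ at the boundaries.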
library(tidyverse)
library(PostcodesioR)
library(NHSRpostcodetools) # installed from GitHub not CRAN
library(NHSRpopulation) # installed from GitHub not CRAN
library(janitor)
Attaching package: 'janitor'
The following objects are masked from 'package:stats':
chisq.test, fisher.test
# Generate random example postcodes
# Restricted to NG16 postcodes in Nottinghamshire because unrestricted random
# postcodes are drawn from all UK nations, and these don't currently validate
# within the {NHSRpopulation} package
postcodes <- purrr::map_chr(
  1:10,
  .f = ~ PostcodesioR::random_postcode("NG16") |>
    purrr::pluck(1)
)
# Create a tibble
tibble_postcodes <- dplyr::tibble(
  random_postcodes = postcodes
)
NHSRpopulation::get_data(tibble_postcodes,
  column = "random_postcodes",
  url_type = "imd"
) |>
  dplyr::select(
    random_postcodes,
    new_postcode,
    imd_decile,
    imd_rank,
    imd_score
  ) |>
  mutate(imd_decile_local = ntile(-imd_score, n = 10)) # new local deciles: 1 = highest score in this sample
Joining with `by = join_by(random_postcodes)`
# A tibble: 10 × 6
random_postcodes new_postcode imd_decile imd_rank imd_score imd_decile_local
<chr> <chr> <int> <int> <dbl> <int>
1 NG16 3LS NG16 3LS 2 4475 39.3 4
2 NG16 3ER NG16 3ER 4 12227 22.7 5
3 NG16 1HL NG16 1HL 9 29293 6.14 10
4 NG16 6ND NG16 6ND 2 4116 40.5 3
5 NG16 2RR NG16 2RR 6 16941 17.1 6
6 NG16 4DP NG16 4DP 1 3110 44.7 1
7 NG16 3JB NG16 3JB 2 3595 42.5 2
8 NG16 3RW NG16 3RW 9 26797 8.22 9
9 NG16 2AU NG16 2AU 6 18704 15.2 7
10 NG16 3DR NG16 3DR 7 22108 12.1 8
https://fingertips.phe.org.uk/search/imd
http://dclgapps.communities.gov.uk/imd/idmap.html
Technical report for 2019: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/833951/IoD2019_Technical_Report.pdf
Example Shiny app from Trafford Data Lab with published code on GitHub.
English indices of deprivation 2019: Postcode Lookup, for CSV uploads. Check Information Governance requirements before using sensitive data: even though postcodes should be uploaded separately from other information, this may not be authorised. This step also won't count as part of a Reproducible Analytical Pipeline, because it is a manual step.
Ministry of Housing, Communities and Local Government. English Indices of Deprivation 2019. https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019 (Accessed 22 March 2024)