
Being BOLD with social care data
Owen Davies, our Data and Intelligence Manager, reflects on an event where our data team learned more about the BOLD initiative - Better Outcomes through Linked Data.
A couple of us from the data team went to an event recently about the BOLD initiative, and there was some interesting learning for social care.
BOLD in this case is an acronym for Better Outcomes through Linked Data. It’s a UK Government initiative that’s working with the Ministry of Justice (MoJ) and its partner organisations to link datasets together to get more insights into how the justice system works in the UK.
Identifying patterns and trends
Some of you will be familiar with the concept of linking data and will have heard of the SAIL databank (another acronym – Secure Anonymised Information Linkage) that’s run by the Population Science department in Swansea University.
If you aren’t that familiar with linked data research, the concept is quite straightforward. Think of all of the services you use. Your GP, a social service provider, schools, HMRC, DVLA, local authorities and so on. All of these are likely to collect data on you and your family each time you use one of their services.
What if we were able to see patterns and trends that could only be found by joining datasets together? What if we were able to, for example, see the number of times care workers have visited primary care for their mental well-being? And what if we could see whether there were any differences depending on where they lived or the job they did?
This is what data linking does. It essentially strengthens the data that’s available to us to reveal insights that wouldn’t be visible otherwise.
It’s important to remember that it does this using data that’s been anonymised. Your name is removed from the data, and any other information that might be able to identify you is encrypted. This includes things like addresses and identification numbers. Encrypting them makes it virtually impossible to work out who you are, but these pieces of information can still be used to link your data together and understand the areas where people live.
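To make the idea concrete, here is a minimal sketch in Python of how two datasets could be linked without keeping the original identifier. It is an illustration only and not a description of how SAIL or BOLD actually process data. It uses a keyed hash (HMAC-SHA256) as a stand-in for the encryption described above, and the key, field names and records are all made up for the example.

```python
import hashlib
import hmac

# Hypothetical secret key, for illustration only. A real service would
# keep its key securely and separately from the research data.
SECRET_KEY = b"example-key-not-for-real-use"


def pseudonymise(identifier: str) -> str:
    """Turn an identifier into a keyed hash that cannot easily be reversed."""
    normalised = identifier.strip().upper().encode("utf-8")
    return hmac.new(SECRET_KEY, normalised, hashlib.sha256).hexdigest()


def anonymise(records):
    """Drop the raw identifier and keep only the hashed linking key."""
    cleaned = []
    for record in records:
        safe = {k: v for k, v in record.items() if k != "id_number"}
        safe["link_key"] = pseudonymise(record["id_number"])
        cleaned.append(safe)
    return cleaned


def link(left, right):
    """Join two anonymised datasets on their shared linking key."""
    by_key = {record["link_key"]: record for record in right}
    return [
        {**record, **by_key[record["link_key"]]}
        for record in left
        if record["link_key"] in by_key
    ]


# Made-up records from two imaginary services.
gp_visits = [
    {"id_number": "AB123", "visits": 3},
    {"id_number": "CD456", "visits": 1},
]
workforce = [
    {"id_number": "ab123 ", "job": "care worker"},
    {"id_number": "EF789", "job": "social worker"},
]

linked = link(anonymise(gp_visits), anonymise(workforce))
print(linked)
```

Because the same identifier always produces the same hash, the two records for the same person can be matched even though neither dataset holds the original ID number any more.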
Making the most of SAIL
The BOLD showcase had some impressive projects, but having SAIL on our doorstep means most projects were done with methods we’d seen (and been involved in) before.
SAIL is the most advanced linked data repository in the UK and is world-renowned. It’s something we should be using more in social research in Wales and is the reason Social Care Wales is the social care theme lead for Administrative Data Research (ADR) Wales.
We want to encourage more social care organisations to provide their social care data to the SAIL databank so we can get more linked data research done in Wales on adult social care.
Learning from the Ministry of Justice
One piece of work did grab our attention, though. This was work that the Ministry of Justice had done with casefile data from many different organisations, such as prisons and probation services.
The work looked at unstructured data (free text and narrative) in data systems to determine if someone in the prison population was a parent.
We know that having a parent in prison is an adverse childhood experience (ACE). And we now know that ACEs can have a negative impact on future physical and mental health, which in turn can lead to poorer outcomes in adulthood.
The number of parents in the prison system isn’t well known. So, if we aren’t already working with the families of parents in prison, then there are a lot of potentially vulnerable children out there that we could be helping.
But working with unstructured data can be extremely difficult. In the past, data with lots of free text had to be read by a real person.
The person would look for particular things in the data. In this case, they’d be looking for indications that a prisoner had any children. But a quick Google suggests there were 87,726 people in prison in England and Wales as of June 2024. That’s a lot of casefiles to go through! What if we could teach a computer to do it?
This is what Natural Language Processing (NLP) is. The team at the MoJ used this technology, a form of artificial intelligence (AI), to “teach” a computer to “read” a casefile. The quotation marks are mine because teach and read are human concepts. That isn’t really what NLP is, but they’re terms that are recognisable to us.
What the computer is doing is looking for patterns, phrases and words that might indicate that a person may have a child. The AI bit here is that the more data the computer has to examine, the better it becomes at identifying the context in which a word is used.
For example:
“When I was 16 I had a kid…”
and
“I was just a kid when I first got banged up”
Both of these use the phrase “a kid”, so the computer needs to recognise there’s a difference between the two. The computer also has to know all the different ways that someone can refer to a child. It’s no small undertaking to do this.
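A very simplified sketch in Python shows why the surrounding words matter. This is not the MoJ’s model, which learns from data rather than following hand-written rules. It’s a toy rule, with a made-up function name and word list, that only counts a child word when it follows a word suggesting the child belongs to the speaker.

```python
import re

# Hypothetical, far-from-complete list of ways someone might refer to a child.
CHILD_WORDS = r"(?:kids?|child(?:ren)?|sons?|daughters?|bab(?:y|ies))"

# Only count a child word when it follows a word that suggests the
# speaker is talking about their own child ("had a kid", "my daughter").
PARENT_PATTERN = re.compile(
    rf"\b(?:had|have|has|my|our)\s+(?:(?:a|an|two|three)\s+)?{CHILD_WORDS}\b",
    re.IGNORECASE,
)


def mentions_own_child(text: str) -> bool:
    """Return True if the text appears to say the speaker has a child."""
    return PARENT_PATTERN.search(text) is not None


examples = [
    "When I was 16 I had a kid…",
    "I was just a kid when I first got banged up",
    "My daughter visits every month",
]

for sentence in examples:
    print(mentions_own_child(sentence), "-", sentence)
```

The first and third sentences are flagged and the second isn’t, because “a kid” there follows “was just” and not a word like “had” or “my”. Rules like this break down quickly on real casefiles, which is why a model that picks up context from lots of examples is needed.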
The NLP model that was used to analyse the data eventually yielded a success rate of around 90 per cent. The team think that by adding more information about differences in dialect and language, this could be even higher.