Innovating with generative AI to better understand community needs
In El Salvador, UNHCR is exploring the use of new tools to securely and effectively analyze unstructured information provided by communities, and to provide more effective assistance.
El Salvador is the smallest country in Central America, with a population of around 6.3 million people. Of those, more than 71,000 (some 1.1% of all families) are internally displaced. Many more have previously experienced violence and displacement; some have returned to the country after attempts to migrate north; and an increasing number of people in mixed movements are now transiting the territory. This all adds up to a complex human mobility landscape.
UNHCR, the UN Refugee Agency, needs an up-to-date and nuanced understanding of the experience of people around the country in order to fulfill its protection mandate, address evolving needs, and ensure communities' priorities shape the decisions affecting their lives. One way the organization seeks to do this is, quite simply, by talking to people.
In 2023 alone, UNHCR conducted some 122 focus group discussions (FGDs) in El Salvador, with more than 1,300 people. This generated more than 400 hours of audio, which became more than 8,000 pages of transcription. But how do you find time to read 8,000 pages, let alone analyze them?
Finding a solution, after a sharp pivot
To address this challenge, a team of Information Management and Community-Based Protection colleagues in El Salvador explored a potential solution with the support of the Data Innovation Fund. Initially, they planned to use conventional data analysis approaches: organizing the unstructured information into data visualizations that quantified terms and themes, in order to drive evidence-informed programming.
But rapid technological advances soon after the project started made clear that this was no longer the most effective approach. "Last year, there was an amazing evolution in artificial intelligence technologies," says Sebastian Salazar Tapia, Associate Information Management Officer. "It's remarkable how you can chat with these tools: You can ask questions and the tool gives you answers like a human. But the problem was that these tools answer your questions using all the information on the internet."
A multistage innovation process
What the El Salvador team needed was a genAI instance that only had access to their core dataset (the FGD transcriptions) and that guaranteed data protection and accuracy. So, they designed a process to gather, clean, and "mine" the data in four main steps:
- Gathering high-quality recordings: The first crucial step was to ensure the audio recordings of FGDs were of a consistently high quality. The team acquired six professional recorders and developed a set of recording guidelines to support the Protection colleagues who run these meetings. For the pilot, the Information Management team managed the recordings themselves.
- Converting speech to text: Once the team had completed their recordings, they reviewed and edited the hundreds of hours of audio, then transcribed the speech using AI-supported transcription tools, including the commercial service Sonix and the open-source Whisper model. "We got very good results," Sebastian says, estimating the accuracy at roughly 80%.
- Cleaning and anonymizing the transcriptions: In the FGDs, people often mention the names of community members, locations, and institutions. Using a database of common names in conjunction with named entity recognition (NER) technologies, the team searched the transcriptions for personal information and anonymized it, thereby reducing protection risks.
- Mining the data for specific insights: The team's final dataset was more than 8,000 pages of clean, anonymized transcription. It was ready for analysis.
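The name-list half of the anonymization step can be illustrated in a few lines. The Python below is a minimal, hypothetical sketch, not the team's code: it replaces any term found in a category list (names, places, and so on) with a bracketed tag. The function names, tag labels, and sample data are invented for illustration. In practice an NER model (spaCy's entity recognizer is one common option) would run alongside it to catch names missing from the lists; that part is not shown.

```python
import re


def build_pattern(terms):
    """Compile one case-insensitive regex that matches any term as a whole word."""
    # Sort longest first so "María José" wins over "María" in the alternation.
    ordered = sorted(terms, key=len, reverse=True)
    escaped = [re.escape(t) for t in ordered]
    # \b is Unicode-aware for str patterns, so accented names are handled.
    return re.compile(r"\b(" + "|".join(escaped) + r")\b", re.IGNORECASE)


def anonymize(text, gazetteers):
    """Replace every listed term with a tag such as [NAME] or [PLACE].

    `gazetteers` maps a tag label (hypothetical) to a list of terms to redact.
    """
    for label, terms in gazetteers.items():
        if not terms:
            # An empty alternation would match empty strings everywhere.
            continue
        text = build_pattern(terms).sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    lists = {"NAME": ["María", "María José", "Carlos"], "PLACE": ["Soyapango"]}
    print(anonymize("Carlos said María José moved to Soyapango.", lists))
```

Matching on whole words keeps a listed name like "Carlos" from mangling a longer word such as "Carlosito", at the cost of missing misspellings, which is one reason a name list is usually paired with NER.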
By this time, ChatGPT had been released, but the Azure OpenAI initiative by the Innovation Service and UNHCR's Division of Information Systems and Telecommunications (DIST) to pilot secure genAI chatbot instances across a variety of use cases was still in its very early stages. The team in El Salvador was often innovating ahead of official recommendations, experimenting with new tools and approaches. "We really wanted to personalize our tool, make changes, and go farther with the possibilities of artificial intelligence," recalls Sebastian.
So, they experimented with LangChain and open-source large language models (LLMs) to build their own ChatGPT-style chatbot: one that drew exclusively on their dataset and ran on a personal computer, without any connection to the internet. Sebastian notes:
"It was a lot of work learning how to use this. It was a really new, new, new technology."
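Tools built with LangChain and a local LLM commonly follow a retrieval-augmented pattern: split the transcripts into chunks, find the chunks most relevant to a question, and pass only those to the model. The Python below sketches just the retrieval step under that assumption. It uses a simple TF-IDF-style score in place of the embedding search a LangChain pipeline would usually use, and the chunk size, source-ID format, and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; \\w is Unicode-aware, so accented words survive."""
    return re.findall(r"\w+", text.lower())


def chunk(doc_id, text, size=50):
    """Split one transcript into chunks of about `size` words, tagged by source."""
    words = text.split()
    return [
        {"source": f"{doc_id}#chunk{i // size}", "text": " ".join(words[i:i + size])}
        for i in range(0, len(words), size)
    ]


def retrieve(question, chunks, k=3):
    """Return up to k chunks ranked by TF-IDF overlap with the question."""
    n = len(chunks)
    counts_per_chunk = [Counter(tokenize(c["text"])) for c in chunks]
    # Document frequency: how many chunks contain each term.
    doc_freq = Counter(term for counts in counts_per_chunk for term in counts)
    query_terms = set(tokenize(question))
    scored = []
    for c, counts in zip(chunks, counts_per_chunk):
        score = sum(
            counts[t] * math.log((n + 1) / (doc_freq[t] + 1))
            for t in query_terms
            if t in counts
        )
        if score > 0:
            scored.append((score, c))
    # Sort on the score only; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


if __name__ == "__main__":
    corpus = chunk("fgd01", "We need training for employment and jobs.") + chunk(
        "fgd02", "Our children need safe schools."
    )
    for hit in retrieve("What support for employment?", corpus):
        print(hit["source"], hit["text"])
```

Because every answer starts from chunks of the team's own transcripts, the model never needs internet access; in a setup like the one described, the top chunks would then go to a locally run open-source LLM, a step not shown here.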
Chatting to the data
The result of this process was a secure, accurate genAI tool run on a device used by the Information Management team. Called "SIVAR+", the tool was fed the many thousands of pages of transcript and, when prompted, could deliver a concise summary of what communities said on a given topic.
In response to a question like "What types of support is the community interested in receiving from UNHCR regarding access to employment?" the tool provided broad information on expressed needs. Asked a more specific question, it provided more specific detail. "The great thing is that you can do this with natural language, as if you were talking with a person, and the answer will be given in the same way," says Sebastian. Since SIVAR+ could only use data from the transcripts, the risk of "hallucinations" was mitigated.
Given that the FGDs are informal, and the conversation jumps around thematically, it's usually very difficult and time-consuming to gather information on a specific topic. SIVAR+ made this simple. It also enabled users to search specific FGDs, request direct quotations, and identify where in the transcript the information is drawn from. The FGDs are "a great conversation with the community that we have every year," says Sebastian. SIVAR+ offered "a better way to analyze that information."
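Restricting a model to the transcripts usually comes down to how the prompt is built: supply only the retrieved excerpts, tell the model to answer from them alone, and give it an explicit way to say the answer isn't there. The sketch below assumes that approach; the instruction wording, the bracketed source IDs, and the function name are hypothetical rather than SIVAR+'s actual prompt. Grounding of this kind reduces hallucinations but does not rule them out.

```python
def build_grounded_prompt(question, excerpts):
    """Assemble a prompt that limits the model to the supplied excerpts.

    `excerpts` is a list of (source_id, text) pairs, e.g. from a retrieval step.
    """
    if excerpts:
        context = "\n".join(f"[{source_id}] {text}" for source_id, text in excerpts)
    else:
        # Make the absence of evidence explicit instead of sending an empty section.
        context = "(none)"
    return (
        "Answer the question using ONLY the focus group excerpts below.\n"
        "Cite the source ID in brackets after each point you make.\n"
        "If the excerpts do not contain the answer, reply: "
        '"The transcripts do not address this."\n\n'
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_grounded_prompt(
        "What support is needed for employment?",
        [("fgd01#chunk0", "We need job training.")],
    ))
```

Asking for bracketed source IDs also gives reviewers a way to check each point in the answer against the transcript it came from.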
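Checking that a quotation actually appears in the transcripts, and pinpointing where, can be done deterministically outside the language model. The function below is a hypothetical sketch of that kind of lookup, not a description of how SIVAR+ does it; it assumes each FGD is stored as plain text with one speaker turn per line, and it lets the search be limited to specific focus groups.

```python
import re


def normalize(text):
    """Collapse whitespace and case so minor formatting differences don't block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def locate_quote(quote, transcripts, fgd_ids=None):
    """Return (fgd_id, line_number) pairs where `quote` appears verbatim.

    `transcripts` maps an FGD identifier to its full text, one speaker turn per line.
    Pass a set of `fgd_ids` to search only those focus groups.
    """
    target = normalize(quote)
    if not target:
        return []  # An empty quote would otherwise match every line.
    hits = []
    for fgd_id, text in transcripts.items():
        if fgd_ids is not None and fgd_id not in fgd_ids:
            continue
        for line_number, line in enumerate(text.splitlines(), start=1):
            if target in normalize(line):
                hits.append((fgd_id, line_number))
    return hits


if __name__ == "__main__":
    data = {
        "fgd01": "Facilitator: What do you need?\nParticipant: We need  job training.",
        "fgd02": "Participant: We need job training too.",
    }
    print(locate_quote("we need job training", data))
```

Matching line by line keeps the reported location precise, but a quotation that spans two speaker turns would not be found; a fuller version could search a sliding window of adjacent lines.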
Impact and outlook
The El Salvador operation uses information gleaned from FGDs to develop planning and results reports. Usually, such reports take two or three months to compile, using less detailed summaries of what communities have said. With SIVAR+, this work was completed in less than half the time. The process was also easier and more accurate, because the tool made it possible to directly access what people said, at various levels of granularity. UNHCR's Representative in El Salvador, Laura Almirall, says:
"SIVAR+ evolved into an invaluable tool for seamlessly accessing information shared by communities, ensuring that their needs and priorities consistently inform every decision we make."
The tool not only met a very specific need, but has also informed the ongoing Azure OpenAI collaboration by the Innovation Service and DIST. We're continuing to explore different ways of responsibly and creatively leveraging generative AI and other AI technologies to further the mission of UNHCR, equipped with learnings from projects like this one.
Building coding capacity in the community
When the project was just getting started, before the great genAI disruption, the team hadn't planned to use Python at all; they expected to work in the R programming language. Since they needed to brush up on their own R skills, they saw an opportunity to build coding capacity in the community at the same time. So, the team completed an online course in R alongside 23 community members identified as being at risk of displacement, meeting regularly for practical sessions on how to apply these skills to data analysis.
UNHCR identified participants in partnership with a local university, and the graduating group, who all received a Google-backed diploma, included people with lived experience of forced displacement. "This was a great experience because we had different people with different knowledge and skills," says Sebastian. Participants completed a hands-on final project in collaboration with UNHCR's Livelihoods unit, analyzing the results of a national household survey, with prizes for the five best assignments.
Even though the SIVAR+ project ultimately pivoted away from R, this community-based initiative was a highlight for the team, and another example of how they broke down disciplinary silos. "It was a great opportunity for us," says Sebastian. "You know, Information Management, we usually work in the office. So it was great to work with people in communities."
Read more about UNHCR's Data Innovation Fund and discover other ways we're creatively and responsibly exploring AI's potential to further UNHCR's mandate.