As the deadline for the Millennium Development Goals approaches, the UN is carrying out the post-2015 planning process to arrive at a new set of targets for coming years. Global Pulse is supporting this with sentiment analysis, to help understand needs based on different types of public discussion. This page visualises topics being discussed by young Ugandans, using messages from U-report, a citizen reporting service run by UNICEF.
U-report is a free SMS-based system run by UNICEF that allows young Ugandans to speak out on what's happening in communities across the country. We analysed 3.1 million U-report messages sent over the three years up to April 2014. There are currently over 282,000 U-reporters, from both urban and rural locations; the map in the next section shows the overall number of messages received per district.
The age range of U-reporters is shown in the chart below, as compared to the estimated distribution of ages for the overall population of Uganda (source), showing that opinions are sampled mainly from a young adult demographic.
Most U-reporters are male (63%), and male U-reporters send more messages on average than females (accounting for 72% of the total messages received).
We first look at which topics are discussed most, and the context in which they are mentioned. Healthcare, education and jobs dominate the discussion, partly reflecting the youthful demographic of U-reporters. These are consistently the most discussed issues across the country, though there are regional differences: for example, healthcare tends to be discussed slightly more often in rural areas such as Karamoja, whereas jobs are discussed slightly more in urban areas such as Kampala and Mbarara.
We can take each of these categories and investigate the differences between genders. We find that men are more likely to report on governance/corruption and disease outbreaks. Women are more likely to report about medical services related to children and childbirth, as well as particular educational issues. The following chart shows the categories with the most significant female and male biases respectively.
Categories analysis: Data was preprocessed by stripping punctuation and URLs and expanding common abbreviations. Spelling mistakes in keywords of interest were corrected by looking for terms with a small Levenshtein distance to a set of reference terms (e.g. medicine, affordable, malnutrition). Common names were then removed using a list of around 8500 Ugandan names compiled from the Makerere University graduation lists from recent years.
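The preprocessing steps above can be sketched roughly as follows. This is an illustrative sketch only, not the code used for the analysis: the abbreviation table, the distance threshold of 2, and the minimum token length are all assumptions, and the reference list contains only the three example terms quoted above. Name removal is omitted.

```python
import re

# Hypothetical reference terms: only the three examples given in the text.
REFERENCE_TERMS = ["medicine", "affordable", "malnutrition"]
# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"govt": "government", "hosp": "hospital"}


def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def correct_token(token, max_distance=2, min_length=5):
    """Map a token to the nearest reference term if it is close enough.

    Both thresholds are assumed values; short tokens are left alone
    to avoid spurious corrections.
    """
    if len(token) < min_length:
        return token
    distance, best = min((levenshtein(token, term), term)
                         for term in REFERENCE_TERMS)
    return best if distance <= max_distance else token


def preprocess(message):
    """Lowercase, strip URLs and punctuation, expand abbreviations, fix spelling."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", message.lower())
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return [correct_token(token) for token in tokens]
```

For instance, `preprocess("No medecine at http://example.org, malnutrision!")` maps the two misspelled keywords back to their reference forms and drops the URL.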
We constructed a taxonomy of categories, specifying the logic by which each could be matched against a message. This was done by carrying out iterations of (1) adding all terms thought to be relevant, and (2) assessing samples of the matching messages to ensure that precision was acceptably high, and adding further filtering logic where necessary (for example, messages mentioning diseases should not match the "healthcare" category if they also mention crops or livestock).
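One simple way to express this kind of matching logic is a set of include terms plus a set of exclude terms per category. The sketch below assumes that representation; the `Category` class, the term lists and the `categorise` function are hypothetical, and the actual taxonomy may well have used richer rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    """A taxonomy entry: match if any include term and no exclude term is present."""
    name: str
    include: frozenset
    exclude: frozenset = frozenset()

    def matches(self, tokens):
        words = set(tokens)
        return bool(words & self.include) and not (words & self.exclude)


# Hypothetical term lists, for illustration only.
TAXONOMY = [
    Category(
        name="healthcare",
        include=frozenset({"disease", "clinic", "medicine", "measles"}),
        # Filtering step: disease mentions alongside crops or livestock
        # should not count as healthcare.
        exclude=frozenset({"crops", "livestock"}),
    ),
    Category(
        name="education",
        include=frozenset({"school", "teacher", "fees"}),
    ),
]


def categorise(message):
    """Return the names of all categories whose logic matches the message."""
    tokens = message.lower().split()
    return [category.name for category in TAXONOMY if category.matches(tokens)]
```

Under these assumed term lists, a message about a disease killing livestock matches no category, while a message mentioning measles and a school matches both.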
In order to find informative example messages for each category and subcategory, a Naive Bayes classifier was trained to distinguish between three priority levels: urgent/actionable (e.g. "A 15 years old boy has died measles in my village", "In my area there is no safe water. we use stream as our drinking water"), normal priority (e.g. "Unemployment is a significant rise in crimes among youths in uganda"), and irrelevant (e.g. "Happy christmas and new year"). We then applied the classifier to all messages in order to derive a priority score; example messages were then selected from those matching a category with the highest priority scores.
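To illustrate the idea, here is a minimal multinomial Naive Bayes sketch in plain Python, trained on the four example messages quoted above. It is not the classifier used for the analysis. In particular, the text does not say how the priority score was derived from the classifier output; this sketch assumes it is the predicted probability of the urgent class, and the names `NaiveBayes` and `priority_score` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(message):
    return re.findall(r"[a-z0-9]+", message.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()

    def fit(self, messages, labels):
        for message, label in zip(messages, labels):
            tokens = tokenize(message)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict_proba(self, message):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(message) if t in self.vocabulary]
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, count in self.class_counts.items():
            denominator = (sum(self.word_counts[label].values())
                           + self.alpha * len(self.vocabulary))
            score = math.log(count / total)
            for token in tokens:
                score += math.log(
                    (self.word_counts[label][token] + self.alpha) / denominator)
            log_scores[label] = score
        # Normalise in log space to avoid underflow.
        peak = max(log_scores.values())
        weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return {label: w / norm for label, w in weights.items()}


# Toy training set: the example messages quoted in the text.
TRAINING = [
    ("A 15 years old boy has died measles in my village", "urgent"),
    ("In my area there is no safe water. we use stream as our drinking water", "urgent"),
    ("Unemployment is a significant rise in crimes among youths in uganda", "normal"),
    ("Happy christmas and new year", "irrelevant"),
]
model = NaiveBayes().fit(*zip(*TRAINING))


def priority_score(message):
    """Assumed definition: probability that the message is urgent."""
    return model.predict_proba(message)["urgent"]
```

Messages matching a category could then be sorted by `priority_score` in descending order, with the top few shown as examples. A realistic model would need far more labelled messages than this toy set.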
Gender differences: We calculated Wilson score confidence intervals for each category (using the number of messages in that category sent by males and by females). We then subtracted the mean to obtain the gender difference score as shown in the plot.
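The calculation can be sketched as below. The Wilson score interval itself is standard, but the text does not specify which mean is subtracted. This sketch assumes it is the overall share of messages sent by females, taken as 0.28 because males account for 72% of messages. The 95% level (z = 1.96) and the function names are likewise assumptions.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denominator = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denominator
    half_width = (z * math.sqrt(p * (1 - p) / total
                                + z * z / (4 * total * total))
                  / denominator)
    return centre - half_width, centre + half_width


def gender_difference(female_count, male_count, baseline_female_share=0.28):
    """Interval for a category's female share, relative to an assumed baseline.

    A wholly positive interval suggests a female bias for the category,
    a wholly negative one a male bias.
    """
    low, high = wilson_interval(female_count, female_count + male_count)
    return low - baseline_female_share, high - baseline_female_share
```

The Wilson interval is a reasonable choice here because it stays within [0, 1] and behaves well for categories with few messages, where the simpler normal approximation can be misleading.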