Jin Pu

Data For Good: How to leverage data analysis in the social sector

Updated: Aug 29, 2020

By tracking online footprints on Google Trends and Twitter, we reflected on the 2020 momentum of the Black Lives Matter movement.


This summer, I was selected for the Data Strategy Program hosted by ParsonsTKO and TechSoup, where I met a group of amazing peers who share my interest in the social sector. It didn’t take us long to decide to start a data-driven project analyzing the 2020 momentum of the Black Lives Matter movement. To better showcase our work, we built a website (blmalways.org), which we hope functions as an open space for research and data around this social issue, as well as a resource hub to support efforts to address racial inequality.


I’ll introduce the two main methods we used, Google Trends and Twitter analysis, and provide examples of how to apply them, in order to spread this knowledge, especially in the social sector.



Google Trends: Identify Movement Lifecycle

“Google Trends is a website by Google that analyzes the popularity of top search queries in Google Search across various regions and languages. Google Trends provides access to a largely unfiltered sample of actual search requests made to Google.” (Wikipedia)

Google Trends is built on Google’s large volume of actual search requests. It collects people’s true voices and aggregates them into a search index. Its volume and truthfulness make it one of the best tools for social media analysis. Google Trends is also user friendly: users with no technical background can use it as an analytical tool and get rich insights, while users with technical backgrounds can also query it programmatically, which can be more efficient.


The search index in Google Trends is a normalized number reflecting the popularity of search requests. The index is scaled from 0 to 100, with 100 marking the highest search frequency within the selected time range.
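To make the 0 to 100 scale concrete, here is a minimal sketch of the rescaling step only. It assumes we already have some raw popularity value per day; the function name `to_trends_index` and the sample numbers are hypothetical, and Google's own pipeline (which also works on sampled data and search shares) is not reproduced here.

```python
def to_trends_index(values):
    """Rescale raw popularity values so the peak in the range becomes 100."""
    peak = max(values)
    if peak == 0:
        # No searches at all in the range: every point stays at 0.
        return [0 for _ in values]
    return [round(100 * v / peak) for v in values]


# Hypothetical raw daily values, not real Google data.
raw_daily_values = [20, 100, 400, 240, 80]
print(to_trends_index(raw_daily_values))
```

Because the peak is always mapped to 100, the same day can get a different index if you change the time range, so indices from two separate queries are not directly comparable.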



We extracted a time slice from May 25th to June 25th, 2020. Within this single month, the life cycle of the Black Lives Matter movement went through the “growing” stage (May 25th ~ June 1st), the “peak” stage (June 2nd ~ June 7th), and the “aging” stage (June 7th ~ now).
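If you want to tag each day of a time series with its stage, a small helper like the one below can do it. This is only a sketch: the stage names and dates come from the text above, but the function name `stage_of` is hypothetical, and since June 7th appears in both the peak and the aging range, assigning it to "peak" is an assumption.

```python
from datetime import date

# Stage boundaries from the text. June 7th is listed in both the peak and
# aging ranges; treating it as "peak" is an assumption of this sketch.
GROWING_START = date(2020, 5, 25)
PEAK_START = date(2020, 6, 2)
AGING_START = date(2020, 6, 8)


def stage_of(day):
    """Return the life-cycle stage label for a given calendar day."""
    if day < GROWING_START:
        return "before"
    if day < PEAK_START:
        return "growing"
    if day < AGING_START:
        return "peak"
    return "aging"


print(stage_of(date(2020, 6, 3)))
```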


Unsurprisingly, popularity dropped after several weeks. However, the index did not fall to 0; instead, it stabilized at a steady level. Some people are staying engaged even in the aging stage of the momentum!


The life-cycle study led us to more questions: Which states were involved in the early stage and which came later? Has the sentiment changed from stage to stage? Has the topic changed from stage to stage? Who is staying in the aging stage?



Google Trends: Geographic dynamics




Attention: DC is not shown on the map, but DC was among the most active. Another limitation is that a higher value means a higher proportion of all queries in that state, so states with smaller populations can reach high index scores more easily.
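The population effect is easier to see with numbers. The sketch below assumes the regional index is based on each state's share of its own searches, rescaled so the top state is 100. The function name `regional_index` and all figures are hypothetical and only illustrate the limitation described above.

```python
def regional_index(topic_searches, total_searches):
    """Index each region by the topic's share of that region's searches."""
    shares = {
        region: topic_searches[region] / total_searches[region]
        for region in topic_searches
    }
    top_share = max(shares.values())
    if top_share == 0:
        return {region: 0 for region in shares}
    return {region: round(100 * s / top_share) for region, s in shares.items()}


# Hypothetical numbers, not real data: the large state has ten times more
# topic searches in absolute terms, yet the small state scores higher.
topic = {"Large state": 50_000, "Small state": 5_000}
total = {"Large state": 10_000_000, "Small state": 500_000}
print(regional_index(topic, total))
```

In this made-up case the small state gets the top score because the topic makes up a larger share of its searches, even though far fewer people searched for it.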



Twitter Analysis: Topics Migration

Twitter is one of the largest social media platforms and provides access to its data through an API. We applied Natural Language Processing (NLP) to capture users’ true voices on Twitter. NLP spans many domains, from sentiment analysis to topic analysis to language modeling. For our project, we only touched on simple applications of NLP: sentiment analysis, word cloud visualizations, and topic analysis.


Topic modeling is an established domain within NLP. However, our method uses hashtags as an indicator to classify topics. This has two advantages. First, it is simpler and user friendly, since no complicated models are involved. Second, hashtags are the essence of a tweet (few words but rich content), which makes them an excellent indicator for identifying topics.
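Here is a minimal sketch of the hashtag idea, using only the Python standard library. The function names and sample tweets are hypothetical, and the choices to lowercase hashtags and count each one once per tweet are assumptions rather than a description of our exact pipeline.

```python
import re
from collections import Counter

# A hashtag is "#" followed by letters, digits or underscores.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the lowercased hashtags found in one tweet, in order."""
    return [tag.lower() for tag in HASHTAG_PATTERN.findall(text)]


def count_hashtags(tweets):
    """Count in how many tweets each hashtag appears."""
    counts = Counter()
    for text in tweets:
        # set() so a hashtag repeated inside one tweet is counted once.
        counts.update(set(extract_hashtags(text)))
    return counts


# Hypothetical sample tweets for illustration.
sample_tweets = [
    "Marching downtown today #BLM #protest",
    "Change the budget #DefundThePolice #BLM",
]
print(count_hashtags(sample_tweets).most_common(3))
```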



Rich insights can be drawn. In the beginning, Twitter users talked most about the murder of George Floyd. Around May 30, protests were trending up and took the momentum to another level. A week later, Breonna Taylor’s murder gained online traction. Around the same time, #defundthepolice entered our chart for the first time and stayed.

Later on, many unrelated hashtags were trending, and people started to get distracted by other topics. Nevertheless, #defundthepolice stayed within the top 10 trending hashtags throughout the rest of June. At the end of June, #georgefloyd was no longer the top hashtag related to BLM, but #defundthepolice was.


This bar chart race gave us a sense of how the topics evolved from a single murder case into a complicated dialogic system and finally into long-term actions. Back to the question: who is staying in the aging stage? Our answer might be: those who are going to devote themselves to long-term actions.
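The data behind a bar chart race is essentially a ranking of hashtags per day. The sketch below shows one way to build such a ranking, assuming each tweet has already been reduced to a day and a list of hashtags. The function name `daily_top_hashtags` and the sample rows are hypothetical, and a real chart might use cumulative or rolling counts instead of plain daily counts.

```python
from collections import Counter, defaultdict


def daily_top_hashtags(tagged_tweets, top_n=10):
    """Return the top_n most frequent hashtags for each day, in date order."""
    per_day = defaultdict(Counter)
    for day, hashtags in tagged_tweets:
        per_day[day].update(hashtags)
    return {
        day: [tag for tag, _ in counts.most_common(top_n)]
        for day, counts in sorted(per_day.items())
    }


# Hypothetical rows of (ISO date, hashtags in the tweet), not our real data.
rows = [
    ("2020-05-26", ["georgefloyd"]),
    ("2020-05-26", ["georgefloyd", "blm"]),
    ("2020-06-05", ["defundthepolice"]),
    ("2020-06-05", ["defundthepolice", "breonnataylor"]),
]
print(daily_top_hashtags(rows, top_n=2))
```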



Twitter Analysis: Sentiment trend

The sentiment analyzer was built with Flair, using GloVe word embeddings and a pre-trained BERT model, and was retrained on data sets from SemEval. The retrained model’s accuracy is about 70% on the test set.
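Accuracy here simply means the share of test tweets whose predicted label matches the human label. The sketch below shows that calculation in plain Python; it does not call Flair, and the function name and the ten example labels are hypothetical.

```python
def accuracy(predicted, actual):
    """Share of items where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for ten test tweets (7 of 10 predictions match).
true_labels = ["negative", "negative", "positive", "neutral", "negative",
               "positive", "negative", "neutral", "negative", "positive"]
model_labels = ["negative", "negative", "positive", "negative", "negative",
                "positive", "neutral", "neutral", "negative", "negative"]
print(accuracy(model_labels, true_labels))
```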



Most of the tweets were negative, especially in the first couple of days following George Floyd’s murder. Later on, as the movement transitioned into its peak stage, sentiments became more stable. Still, 60% of tweets were classified into the “negative” category. (The only major exception to this trend was seen on Juneteenth.)
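A sentiment trend like this comes down to the share of negative tweets per day. The sketch below assumes each tweet has already been labeled by a classifier; the function name `negative_share_by_day` and the sample rows are hypothetical.

```python
from collections import defaultdict


def negative_share_by_day(labeled_tweets):
    """Return the fraction of tweets labeled "negative" for each day."""
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for day, label in labeled_tweets:
        totals[day] += 1
        if label == "negative":
            negatives[day] += 1
    return {day: negatives[day] / totals[day] for day in sorted(totals)}


# Hypothetical (ISO date, label) pairs, not our real data.
labeled = (
    [("2020-05-26", "negative")] * 4 + [("2020-05-26", "positive")]
    + [("2020-06-05", "negative")] * 3 + [("2020-06-05", "positive")] * 2
)
print(negative_share_by_day(labeled))
```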

More analyses on specific topics (donate, protest, and police) are showcased on blmalways.org.


Into the future:

Most of our analyses are merely descriptive. However, simple data-driven methods can also provide rich insights and make an impact. Though data analysis has long been prevalent in business, it is just getting started in the social sector. Luckily, some organizations and data scientists are spreading their data knowledge and skill sets. My team and I are trying to be among them, and we hope to see more data analysis applied in the social sector in the future. Data for good, for sure.


We also welcome all sorts of comments, feedback, and communication. If you are interested, please check out our website (blmalways.org) or contact us at blmalways.team@gmail.com.


This article was originally posted on August 12, 2020 on Medium. View it here.
