Trump v Biden 2020 election: Web scraping for media sentiment

Figure 1. Media sentiment towards Trump & Biden since September 2020

[UPDATE: Charts and dataset are being updated automatically and are available for download, links below.]

Web scraping news stories reveals interesting trends in how the media cover the 2020 election campaigns of Donald Trump and Joe Biden. I used Import.io to access and analyze more than 100,000 news stories from the websites of 1,500 US news organizations in order to compare the media coverage of Trump and Biden in the run-up to the 2020 US presidential election.

The web data story

During the September 29th, 2020 presidential debate held in Cleveland, OH, Donald Trump said to Joe Biden:

“They give you good press, they give me bad press because that’s the way it is, unfortunately”

Is this a true statement? Kind of. I found that Trump does get consistently more negative press than Biden, but he also gets far more of it: more than double the coverage overall, and 5x the volume when comparing stories about him alone with stories about Biden alone. The Trump media strategy seems to be best expressed by the old saying “there is no such thing as bad publicity”.

Events, dear boy, events


Both candidates’ media sentiment sits in “negative neutral” territory and has moved up and down in response to events. The presidential debate, for example, negatively affected both candidates’ media sentiment scores. Trump’s COVID-19 diagnosis caused media sentiment towards him to turn more positive, raising his daily media sentiment score above Biden’s for the longest continuous period since I started measuring at the beginning of September. But his earlier-than-expected return to the White House reversed those gains.
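The post does not spell out how the daily media sentiment score is computed. The sketch below assumes, hypothetically, that a candidate's daily score is the mean of that day's entity sentiment scores, and shows how the "longest continuous period" with Trump ahead of Biden could then be found. The function names and sample rows are illustrative, not the author's code or data.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def daily_scores(rows):
    """Average entity sentiment per candidate per day (assumed definition).

    rows: iterable of (day, candidate, score) tuples, score in [-1, 1].
    Returns {candidate: {day: mean_score}}.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for day, candidate, score in rows:
        buckets[candidate][day].append(score)
    return {
        cand: {day: mean(scores) for day, scores in by_day.items()}
        for cand, by_day in buckets.items()
    }


def longest_lead(leader, other):
    """Longest run of consecutive days on which `leader` scores above `other`.

    leader, other: {day: score}. A day missing from either series breaks the run.
    Returns (run_length, first_day_of_run), or (0, None) if `leader` never leads.
    """
    best_len, best_start = 0, None
    run_len, run_start = 0, None
    prev_day = None
    for day in sorted(set(leader) & set(other)):
        consecutive = prev_day is not None and day - prev_day == timedelta(days=1)
        if leader[day] > other[day]:
            if run_len and consecutive:
                run_len += 1  # extend the current run
            else:
                run_len, run_start = 1, day  # start a new run
            if run_len > best_len:
                best_len, best_start = run_len, run_start
        else:
            run_len = 0
        prev_day = day
    return best_len, best_start


# Made-up rows for illustration only.
rows = [
    (date(2020, 10, 1), "Trump", -0.3), (date(2020, 10, 1), "Biden", -0.1),
    (date(2020, 10, 2), "Trump", 0.1), (date(2020, 10, 2), "Trump", -0.1),
    (date(2020, 10, 2), "Biden", -0.2),
    (date(2020, 10, 3), "Trump", 0.05), (date(2020, 10, 3), "Biden", 0.0),
    (date(2020, 10, 4), "Trump", -0.4), (date(2020, 10, 4), "Biden", -0.1),
]
scores = daily_scores(rows)
print(longest_lead(scores["Trump"], scores["Biden"]))  # (2, datetime.date(2020, 10, 2))
```

Requiring the days to be consecutive calendar dates means a day with no coverage of one candidate ends the streak rather than being skipped over.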

Trump has more bad days than good days

Figure 2. Trump v Biden media sentiment range since September 2020

Overall, media sentiment towards Trump is lower than media sentiment towards Biden.  In addition, media sentiment towards Trump swings more dramatically across a wider range than media sentiment towards Biden. 
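One way to put a number on "swings more dramatically across a wider range" is to summarize each candidate's daily score series by its range and standard deviation. This is a minimal sketch: which statistics Figure 2 actually plots is not stated, so the choice of range and population standard deviation is an assumption, and the series below are made up.

```python
from datetime import date
from statistics import mean, pstdev


def summarize(daily):
    """Location and spread of one candidate's {day: daily_score} series."""
    values = list(daily.values())
    return {
        "mean": mean(values),
        "range": max(values) - min(values),
        "stdev": pstdev(values),  # population standard deviation
    }


# Made-up daily scores for illustration only.
trump = {date(2020, 9, 1): -0.4, date(2020, 9, 2): 0.1, date(2020, 9, 3): -0.2}
biden = {date(2020, 9, 1): -0.1, date(2020, 9, 2): -0.05, date(2020, 9, 3): 0.0}
for name, series in (("Trump", trump), ("Biden", biden)):
    print(name, summarize(series))
```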

Trump gets much more coverage than Biden

Figure 3. News stories about Trump, Biden or both since September 2020

Since September, Trump has received more than twice as much media coverage as Biden (5x as much if you compare stories just about Trump with stories just about Biden).
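Both ratios come from the same counts: "more than double" counts every story that mentions a candidate (including stories that mention both), while "5x" compares only stories that mention one candidate and not the other. A minimal sketch, assuming each story has already been reduced to the set of candidates identified in it; the function names and counts are hypothetical.

```python
from collections import Counter


def coverage_counts(stories):
    """Count stories by which candidates were identified in them.

    stories: iterable of sets of candidate names, e.g. {"Trump"} or {"Trump", "Biden"}.
    """
    counts = Counter()
    for found in stories:
        if {"Trump", "Biden"} <= found:
            counts["both"] += 1
        elif "Trump" in found:
            counts["trump_only"] += 1
        elif "Biden" in found:
            counts["biden_only"] += 1
    return counts


def coverage_ratios(counts):
    """Trump-to-Biden coverage, counted two different ways."""
    return {
        # Every story mentioning the candidate, including stories about both.
        "any_mention": (counts["trump_only"] + counts["both"])
        / (counts["biden_only"] + counts["both"]),
        # Only stories that mention one candidate and not the other.
        "exclusive": counts["trump_only"] / counts["biden_only"],
    }


# Made-up counts for illustration only: 10 Trump-only, 2 Biden-only, 3 both.
stories = [{"Trump"}] * 10 + [{"Biden"}] * 2 + [{"Trump", "Biden"}] * 3
print(coverage_ratios(coverage_counts(stories)))  # {'any_mention': 2.6, 'exclusive': 5.0}
```

With these made-up counts Trump gets 2.6x the coverage overall but 5x on exclusive stories, because the shared stories add the same amount to both totals and pull the overall ratio towards 1.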

The longest week

Figure 4. News coverage of the 2020 election surged 1.7x in the first week of October

Did you feel like you experienced a year’s worth of news in the last week of September and first week of October? It turns out that, in terms of the volume of election news stories, it was only 1.7x the average of the previous 4 weeks. Still, it felt like a lot.
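The 1.7x figure compares one week's story count with the mean weekly count of the four preceding weeks. A sketch of that calculation, assuming Monday-to-Sunday weeks (the post does not say how weeks were bounded) and hypothetical function names:

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean


def week_start(day):
    """Monday of the week containing `day` (assumed Monday-to-Sunday weeks)."""
    return day - timedelta(days=day.weekday())


def surge_ratio(story_dates, week, lookback=4):
    """Stories in `week` divided by the mean weekly count of the preceding weeks.

    story_dates: iterable of publication dates (datetime.date).
    week: the Monday that starts the week of interest.
    lookback: number of preceding weeks to average (4 in the post).
    Raises ZeroDivisionError if the preceding weeks contain no stories.
    """
    per_week = Counter(week_start(d) for d in story_dates)
    previous = [per_week[week - timedelta(weeks=n)] for n in range(1, lookback + 1)]
    return per_week[week] / mean(previous)


# Monday 28 September 2020 starts the week spanning late September and early October.
# Usage: surge_ratio(all_story_dates, date(2020, 9, 28))
```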

Methodology

Since the beginning of September I have used Import.io to collect 71,252 news stories mentioning Trump, Biden, or both from the websites of 2,135 English-language news sources. Each news website was classified according to the primary country of its audience. Articles from non-US news websites and from US link aggregators (e.g. Reddit) were excluded, leaving 49,682 news stories from 1,571 US news sources for sentiment analysis. Duplicate articles, identified by both URL and headline+snippet, were removed.
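The filtering and de-duplication steps can be sketched as a single pass over the collected stories. This assumes a hypothetical `Story` record, a precomputed source-to-country lookup, and a set of aggregator sources. It reads "identified by both URL and headline+snippet" as dropping a story if either its URL or its exact headline+snippet text has already been seen, which is an interpretation rather than a documented rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    """Hypothetical record for one scraped news story."""
    url: str
    source: str
    headline: str
    snippet: str


def clean(stories, source_country, aggregators):
    """Keep stories from US, non-aggregator sources and drop duplicates.

    source_country: {source: country code}, the audience-country classification.
    aggregators: set of sources to exclude even if US-based (e.g. link aggregators).
    """
    seen_urls, seen_text = set(), set()
    kept = []
    for s in stories:
        # Exclude non-US sources and US link aggregators.
        if source_country.get(s.source) != "US" or s.source in aggregators:
            continue
        # Duplicate if either the URL or the headline+snippet was seen before.
        text_key = (s.headline, s.snippet)
        if s.url in seen_urls or text_key in seen_text:
            continue
        seen_urls.add(s.url)
        seen_text.add(text_key)
        kept.append(s)
    return kept
```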

Sentiment analysis

Entity-level sentiment analysis was performed on every one of the 49,682 headline+snippet combinations using Google’s Natural Language API. Entity-level sentiment analysis first identifies entities in the text and then calculates a sentiment score for each entity. The score is a decimal number from -1 (negative sentiment) to +1 (positive sentiment) that rates, based on an analysis of the language, how positively or negatively the entity is talked about in the news story. For example, here is a headline and snippet combination that was positive for Joe Biden and negative for Donald Trump:

September 3rd, 2020: “Former Michigan governor Rick Snyder: I am a Republican vote for Biden. Donald Trump is a bully who lacks a moral compass. Joe Biden would bring back civility. Forty-four years ago, I celebrated my 18th birthday at the 1976 Republican National Convention as part of Gera…”

https://www.usatoday.com/story/opinion/2020/09/03/rick-snyder-why-im-voting-joe-biden-even-republican-column/5696508002/

Illustration 1. Entity-level sentiment analysis

You can see that both Donald Trump and Joe Biden were identified in this news story. Google’s Natural Language service judged that sentiment towards Trump was very negative (“is a bully who lacks a moral compass”), while sentiment towards Biden was closer to neutral-positive (“would bring back civility”).
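To reproduce the scoring step, each headline+snippet can be sent to the Natural Language API's v1 `documents:analyzeEntitySentiment` method. The sketch below only builds the JSON request body and flattens the parsed JSON response; authentication and the HTTP call are omitted. Joining headline and snippet with a space is an assumption, as are the helper names, and the code treats a missing score as 0.0 because zero-valued fields may be omitted from the JSON output.

```python
def build_request(headline, snippet):
    """JSON body for POST https://language.googleapis.com/v1/documents:analyzeEntitySentiment.

    Joining headline and snippet with a single space is an assumption.
    """
    return {
        "document": {"type": "PLAIN_TEXT", "content": f"{headline} {snippet}"},
        "encodingType": "UTF8",
    }


def entity_scores(response):
    """Flatten a parsed analyzeEntitySentiment JSON response.

    Returns a list of (name, wikipedia_url, score) tuples. wikipedia_url is None
    when the API attached no Wikipedia link; a missing score is treated as 0.0
    because zero-valued fields may be omitted from the JSON.
    """
    rows = []
    for entity in response.get("entities", []):
        metadata = entity.get("metadata", {})
        sentiment = entity.get("sentiment", {})
        rows.append((entity["name"], metadata.get("wikipedia_url"), sentiment.get("score", 0.0)))
    return rows
```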

The Google Natural Language service returns a Wikipedia URL as entity metadata for each entity that it positively identifies with a high level of confidence. Entities and their associated sentiment scores were only included in the analysis where the entity’s Wikipedia URL was either https://en.wikipedia.org/wiki/Donald_Trump or https://en.wikipedia.org/wiki/Joe_Biden. This excluded sentiment scores for other entities with names similar to the two candidates; for example, I did not want to include sentiment scores for the Trump Organization (https://en.wikipedia.org/wiki/The_Trump_Organization).
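The Wikipedia-URL filter amounts to a lookup table. A minimal sketch, taking (name, wikipedia_url, score) tuples for one story; keeping a list of scores per candidate, in case more than one entity resolves to the same URL, is an assumption, since the post does not say how such cases were handled.

```python
# Only entities linked to these Wikipedia URLs count as the candidates.
CANDIDATE_URLS = {
    "https://en.wikipedia.org/wiki/Donald_Trump": "Trump",
    "https://en.wikipedia.org/wiki/Joe_Biden": "Biden",
}


def candidate_scores(entities):
    """Map entity sentiment rows to candidate scores, dropping look-alike entities.

    entities: iterable of (name, wikipedia_url, score) tuples for one story.
    Returns {candidate: [scores]}. An entity whose URL is, for example,
    .../The_Trump_Organization is ignored even though its name contains "Trump".
    """
    scores = {}
    for _name, url, score in entities:
        candidate = CANDIDATE_URLS.get(url)
        if candidate is not None:
            scores.setdefault(candidate, []).append(score)
    return scores
```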

Selection of sources

The selection of news sources was blind: I did not have a human sit down and choose which news sources to use. Such a deliberate selection would inevitably have introduced bias that would have been difficult to control for. Instead, I monitored social media and news aggregators for the stories that people share, and as news stories appeared I added their news source to our catalogue of sources to be searched for stories about Trump and Biden. These are the news stories that the electorate actually sees: on social media, as alerts on their phones, and as talking points on television news. Taken in aggregate, I believe that this dataset, and the media sentiment scores calculated from it, represent a good estimate of the media coverage of, and media sentiment towards, the two candidates going into the 2020 election.

Individual news sources appear to have their own partisan biases, reflected in the candidate sentiment scores that I calculated. If you didn’t know anything about The New York Times, Fox News or Bloomberg, you might be able to guess their preferred candidate just by inspecting the candidate sentiment scores:
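A simple way to summarize a source's lean is the gap between its mean Biden score and its mean Trump score. Figure 5 shows full distributions, so this single-number summary is a simplification rather than what the figure plots; the function name is hypothetical and the sources and scores in the test are made up.

```python
from collections import defaultdict
from statistics import mean


def source_lean(rows):
    """Mean candidate sentiment per news source.

    rows: iterable of (source, candidate, score) tuples.
    Returns {source: (mean_trump, mean_biden, mean_biden - mean_trump)}; a positive
    gap means the source's coverage is warmer towards Biden than towards Trump.
    Sources without scores for both candidates are skipped.
    """
    by_source = defaultdict(lambda: defaultdict(list))
    for source, candidate, score in rows:
        by_source[source][candidate].append(score)
    result = {}
    for source, cands in by_source.items():
        # .get() does not trigger the defaultdict factory, so missing -> None.
        if cands.get("Trump") and cands.get("Biden"):
            t, b = mean(cands["Trump"]), mean(cands["Biden"])
            result[source] = (t, b, b - t)
    return result
```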

Figure 5. Sentiment distribution of news stories from New York Times, Fox News, Bloomberg

Dataset

A sample of the final analyzed dataset is shown in the screenshot below, and the dataset can be downloaded from the following links in Excel and Parquet formats.

Illustration 2. Sample of news stories from dataset
