CogMedia.

Note: CogMedia is no longer accumulating news data, in part because of changes to Twitter's API requirements. However, we continue to share our initial documentation, data, and code below. The CogMedia dataset includes over 1.5 million headlines, many of them tagged with a "rate of sharing" metric obtained by sampling Twitter. All of these resources remain available to interested researchers. If you use any of them, I welcome a citation to the Dale (2020) preprint listed below.

Investigating the mind-media connection

Welcome to the Cognition and Media (CogMedia) project, which aggregates and analyzes headlines from the newsfeeds of major media outlets. Our research goal is to link cognition in news consumers to large-scale trends in media. The project is hosted by the Communicative Mind Laboratory in the Department of Communication at UCLA. CogMedia includes a large open database, freely available through our code library, which lets you import the data straight into R.

Research approach.

We hypothesize that subtle but measurable cognitive factors help explain what consumers read and share. These cognitive factors include linguistic measurements such as accessibility, comprehensibility, and bias. Subtle aspects of human mental processing could help us understand media data at several levels, from individual consumers to collective phenomena such as the distribution of news themes and the behavior of major news media.

Summary of approach.

A summary of the research approach, and a review connecting communication and cognitive science, can be found here:

Dale, R. (2020, August 6). The CogMedia project: Open data and tools for linking cognitive science and mass media. OSF Preprints. doi: 10.31219/osf.io/z69ta.

Code + data.

Our database contains 1,570,071 news stories, collected since mid-2019.

By source

Source              Stories
ABC News             44,013
BBC                  41,628
Boston Globe          5,931
Boston Herald        84,009
Business Insider     66,063
Chicago Tribune      16,621
CNN                 130,214
Daily Beast          42,309
Fortune              28,116
Fox News             31,509
LA Times             39,509
NBC News             61,057
New York Times      110,560
NPR                  35,101
NY Post              80,783
Reason.com            8,216
Reuters             232,825
ScienceDaily          5,722
The Atlantic         16,240
The Economist         1,687
The Federalist       19,652
The Independent UK  255,954
The Nation           10,065
USA Today            92,445
Wall Street Journal  16,795
Washington Post      14,675
Washington Times     78,372
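The per-source counts above add up exactly to the database total of 1,570,071 stories. A minimal Python sketch that checks this and ranks sources by their share of the corpus, using only the numbers listed here (the names `COUNTS_BY_SOURCE`, `TOTAL`, and `shares` are our own and are not part of the CogMedia function library):

```python
# Per-source story counts, copied from the table above.
COUNTS_BY_SOURCE = {
    "ABC News": 44_013,
    "BBC": 41_628,
    "Boston Globe": 5_931,
    "Boston Herald": 84_009,
    "Business Insider": 66_063,
    "Chicago Tribune": 16_621,
    "CNN": 130_214,
    "Daily Beast": 42_309,
    "Fortune": 28_116,
    "Fox News": 31_509,
    "LA Times": 39_509,
    "NBC News": 61_057,
    "New York Times": 110_560,
    "NPR": 35_101,
    "NY Post": 80_783,
    "Reason.com": 8_216,
    "Reuters": 232_825,
    "ScienceDaily": 5_722,
    "The Atlantic": 16_240,
    "The Economist": 1_687,
    "The Federalist": 19_652,
    "The Independent UK": 255_954,
    "The Nation": 10_065,
    "USA Today": 92_445,
    "Wall Street Journal": 16_795,
    "Washington Post": 14_675,
    "Washington Times": 78_372,
}

TOTAL = 1_570_071  # database total reported above


def shares(counts):
    """Return (source, fraction of all stories) pairs, largest share first."""
    total = sum(counts.values())
    return sorted(
        ((source, n / total) for source, n in counts.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


if __name__ == "__main__":
    # The per-source counts should reproduce the reported total exactly.
    assert sum(COUNTS_BY_SOURCE.values()) == TOTAL
    for source, share in shares(COUNTS_BY_SOURCE)[:3]:
        print(f"{source:<20} {share:6.1%}")
```

Run as a script, it prints the three largest sources (The Independent UK, Reuters, and CNN), which together account for roughly 39% of the stories. The corpus is therefore far from balanced across outlets, which is worth keeping in mind before pooling stories across sources.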

CogMedia's core dataset is available in its entirety, and a new release of the dataset is issued a few times a year. To get you started processing these data, documentation for the function library is on GitHub:

https://github.com/racdale/cogmedia

With the function library, you can quickly process CogMedia batches inside R. The result is a data frame to which you can apply your favorite tools (tidyverse, etc.). Examples are in the GitHub repository above.

Each story record is based only on the news item's RSS feed, and we comply with the copyright rules of each news source. We also tag stories with a variety of additional information, including social media metrics. Each story record includes:

  • source: News organization (e.g., New York Times).
  • title: Title of the story.
  • description: Text associated with the story description in RSS.
  • alexa_rank: The Alexa rank of the source.
  • partisanship: A partisanship score, based on AllSides.com.
  • social_score: An approximate rate of Twitter sharing shortly after story release.
  • url: Full URL to the story's source.
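The record fields above map naturally onto a typed structure. The supported route is the R function library on GitHub, and we do not reproduce its file format here. As a minimal sketch, assume instead that a release has been exported to CSV with one column per field. A Python reader might then look like the following. The names `StoryRecord`, `read_stories`, and `mean_social_score_by_source`, the field types, and the treatment of empty or `NA` cells as missing are all our assumptions. The sample rows are hypothetical placeholders, not CogMedia data.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List, Optional


@dataclass
class StoryRecord:
    """One story; field names follow the list above, field types are assumed."""
    source: str
    title: str
    description: str
    alexa_rank: Optional[int]
    partisanship: Optional[float]
    social_score: Optional[float]  # not every story carries a sharing rate
    url: str


def _maybe(cast, value):
    """Convert a CSV cell, treating empty or 'NA' cells as missing (assumption)."""
    return None if value in ("", "NA") else cast(value)


def read_stories(lines: Iterable[str]) -> List[StoryRecord]:
    """Parse CSV text whose header uses the field names above (hypothetical layout)."""
    return [
        StoryRecord(
            source=row["source"],
            title=row["title"],
            description=row["description"],
            alexa_rank=_maybe(int, row["alexa_rank"]),
            partisanship=_maybe(float, row["partisanship"]),
            social_score=_maybe(float, row["social_score"]),
            url=row["url"],
        )
        for row in csv.DictReader(lines)
    ]


def mean_social_score_by_source(stories: Iterable[StoryRecord]) -> Dict[str, float]:
    """Average social_score per source, skipping stories without a score."""
    scores = defaultdict(list)
    for story in stories:
        if story.social_score is not None:
            scores[story.source].append(story.social_score)
    return {source: mean(values) for source, values in scores.items()}


# Hypothetical placeholder rows, used only to exercise the functions above.
SAMPLE_CSV = (
    "source,title,description,alexa_rank,partisanship,social_score,url\n"
    "Outlet A,Headline 1,Placeholder text,100,0.5,2.0,https://example.com/a1\n"
    "Outlet A,Headline 2,Placeholder text,100,0.5,4.0,https://example.com/a2\n"
    "Outlet B,Headline 3,Placeholder text,250,,,https://example.com/b1\n"
)

if __name__ == "__main__":
    stories = read_stories(io.StringIO(SAMPLE_CSV))
    print(mean_social_score_by_source(stories))  # {'Outlet A': 3.0}
```

Skipping missing scores, rather than treating them as zero, matters here because only some headlines carry a rate-of-sharing metric. In this sample, Outlet B drops out of the average entirely instead of being reported with a misleading score of 0.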