Data
Last modified: 2023-05-23
Data sources
This project utilizes two data sources:
- The Observatory on Social Media’s (OSoMe) Decahose infrastructure (soon to be killed, thanks to that big jerk Elon Musk)
- CrowdTangle (CT) via their API
Data details
Both platforms
- Each month, we calculate FIB indices based on the previous three months of data
- These indices are utilized to identify and display the top 50 superspreaders and their posting activity
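The monthly calculation above can be sketched as follows. This is only an illustration, not the project's actual code: it assumes "the previous three months" means the three calendar months before the run date, and that the FIB index is an h-index-style score over per-post reshare counts. The function names and the input layout (`reshares_by_user`) are hypothetical.

```python
from __future__ import annotations

from datetime import date


def previous_three_months(run_date: date) -> list[tuple[int, int]]:
    """Return (year, month) pairs for the three calendar months before run_date's month."""
    months = []
    year, month = run_date.year, run_date.month
    for _ in range(3):
        month -= 1
        if month == 0:  # roll back over a year boundary
            year, month = year - 1, 12
        months.append((year, month))
    return list(reversed(months))


def fib_index(reshare_counts: list[int]) -> int:
    """Assumed definition: the largest f such that f posts each have at least f reshares."""
    ranked = sorted(reshare_counts, reverse=True)
    f = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            f = rank
        else:
            break
    return f


def top_superspreaders(
    reshares_by_user: dict[str, list[int]], k: int = 50
) -> list[tuple[str, int]]:
    """Rank users by FIB index (ties broken by user name) and keep the top k."""
    scores = {user: fib_index(counts) for user, counts in reshares_by_user.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]
```

For example, `previous_three_months(date(2023, 5, 1))` yields February, March, and April 2023, and `top_superspreaders(...)` defaults to `k=50` to match the top 50 accounts described above.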
Twitter
- Tweets are in V1 format
- Tweets contain at least one low-credibility domain, as defined by the latest iffy.news list.
- We only consider sources that have a “low” or “very-low” MBFC Factual score
- See Media Bias/Fact Check Methodology for details
- During the monthly pipeline, Decahose data is generated by moe and then saved here: lisa.luddy.indiana.edu:/home/data/osome_swap/moe/jobs/top_fibers_data
- It is then immediately copied to the lenny project directory here: /home/data/apps/topfibers/moe_twitter_data
- Next, it is processed by the rest of the pipeline
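The low-credibility filter on tweets described above might look roughly like this. It is a sketch under stated assumptions: links are read from the `entities.urls[].expanded_url` field of a V1 tweet object (and from `extended_tweet` when present), subdomains are treated as matches, and the domain set is a made-up placeholder rather than the real iffy.news list. The pipeline's actual matching rules are not shown in this document.

```python
from __future__ import annotations

from urllib.parse import urlparse


def extract_domains(tweet: dict) -> set[str]:
    """Collect the host names of all links found in a V1 tweet object."""
    domains = set()
    entity_blocks = [
        tweet.get("entities", {}),
        tweet.get("extended_tweet", {}).get("entities", {}),
    ]
    for entities in entity_blocks:
        for url in entities.get("urls", []):
            expanded = url.get("expanded_url") or url.get("url") or ""
            host = urlparse(expanded).hostname or ""  # lowercased, no port
            if host.startswith("www."):
                host = host[4:]
            if host:
                domains.add(host)
    return domains


def has_low_credibility_link(tweet: dict, low_cred_domains: set[str]) -> bool:
    """True if any link in the tweet points at a listed domain or one of its subdomains."""
    return any(
        domain == bad or domain.endswith("." + bad)
        for domain in extract_domains(tweet)
        for bad in low_cred_domains
    )
```

A quick check with a hypothetical domain:

```python
tweet = {"entities": {"urls": [{"expanded_url": "https://www.example-lowcred.com/story"}]}}
has_low_credibility_link(tweet, {"example-lowcred.com"})
```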
Facebook
- We utilize the posts/search endpoint with elevated access (up to 10k posts per request)
- The documentation page for this endpoint also contains information on the format of posts returned by CT
- The script that downloads data is: scripts/data_collection/crowdtangle_dl_fb_links.py
- The API key can be found on the lenny machine saved here: /u/truthy/.top_fib_CT_setup
- The API key can also be accessed by logging into https://crowdtangle.com/ > selecting the “Top-FIBers” dashboard > clicking the “gear” icon in the top right > selecting the “API access” dropdown. This will display the current API key and also allow you to generate a new one.
- Details on the format of Facebook posts data can be found here.
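As a rough illustration of a posts/search request, the sketch below only builds the request URL and does not send it. It is not taken from crowdtangle_dl_fb_links.py. The base URL and the parameter names (`token`, `searchTerm`, `startDate`, `endDate`, `count`, `platforms`) are assumptions based on CrowdTangle's public API documentation and should be checked against it. The function name, its defaults, and the search term are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL for the CT posts/search endpoint.
CT_SEARCH_URL = "https://api.crowdtangle.com/posts/search"

# Elevated access allows up to 10k posts per request (see above).
MAX_POSTS_PER_REQUEST = 10_000


def build_search_url(
    token: str,
    search_term: str,
    start_date: str,
    end_date: str,
    count: int = MAX_POSTS_PER_REQUEST,
) -> str:
    """Build a posts/search URL for Facebook posts matching search_term."""
    if not 1 <= count <= MAX_POSTS_PER_REQUEST:
        raise ValueError(f"count must be between 1 and {MAX_POSTS_PER_REQUEST}")
    params = {
        "token": token,            # API key, e.g. read from the setup file on lenny
        "searchTerm": search_term,
        "startDate": start_date,   # e.g. "2023-02-01"
        "endDate": end_date,
        "count": count,
        "platforms": "facebook",
    }
    return f"{CT_SEARCH_URL}?{urlencode(params)}"
```

Keeping URL construction separate from the network call makes it easy to inspect a request before spending API quota on it. Avoid logging the resulting URL, since it contains the API key.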