Data | top-FIBers

Last modified: 2023-05-23

This project utilizes two data sources:

- The Observatory on Social Media’s (OSoMe) Decahose infrastructure (soon to be killed, thanks to that big jerk Elon Musk)
- CrowdTangle (CT) via their API

Each month, we calculate FIB indices from the previous three months of data.
These indices are used to identify and display the top 50 superspreaders and their posting activity.
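
The ranking step can be sketched roughly as below. This is only an illustration: it assumes the FIB index is computed like an h-index over reshare counts (the largest f such that f of an account's posts were each reshared at least f times), which this page does not itself state. The function names and the `(account_id, reshare_count)` input shape are hypothetical and are not taken from the pipeline code.

```python
from collections import defaultdict


def fib_index(reshare_counts):
    """h-index-style score (assumed definition): the largest f such that
    f posts each have at least f reshares."""
    ranked = sorted(reshare_counts, reverse=True)
    f = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            f = rank
        else:
            break
    return f


def top_superspreaders(posts, n=50):
    """Rank accounts by FIB index, highest first.

    `posts` is an iterable of (account_id, reshare_count) pairs covering
    the three-month window; this input shape is hypothetical.
    """
    per_account = defaultdict(list)
    for account_id, reshares in posts:
        per_account[account_id].append(reshares)
    scores = {acct: fib_index(counts) for acct, counts in per_account.items()}
    # Sort by score descending, then by account id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Under this assumed definition, an account needs many well-reshared posts to score highly; a single viral post yields an index of 1.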

Tweets are in the Twitter API V1 format.
Tweets contain at least one low-credibility domain, as defined by the latest iffy.news list.
- We only consider sources that have a “low” or “very-low” MBFC Factual score
  - See Media Bias/Fact Check Methodology for details
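
The domain filter described above might look something like the following minimal sketch. The row keys `domain` and `factual`, the score strings, and the function names are assumptions for illustration; the actual iffy.news export and the pipeline's matching logic may differ.

```python
from urllib.parse import urlparse

# Assumed spellings of the MBFC Factual scores we keep.
ALLOWED_SCORES = {"low", "very-low"}


def select_low_credibility_domains(rows):
    """Keep domains whose factual score is "low" or "very-low".

    `rows` is an iterable of dicts with hypothetical keys
    "domain" and "factual".
    """
    return {
        row["domain"].lower()
        for row in rows
        if row["factual"].lower() in ALLOWED_SCORES
    }


def normalize_domain(url):
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def has_low_credibility_link(urls, low_cred_domains):
    """True if any URL points at a listed domain or one of its subdomains."""
    for url in urls:
        host = normalize_domain(url)
        if any(host == d or host.endswith("." + d) for d in low_cred_domains):
            return True
    return False
```

Note that URLs without a scheme (for example `example.test/page`) yield an empty host here and are treated as non-matching; shortened links would need to be expanded before this check.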
During the monthly pipeline run, Decahose data is generated by moe and then saved here:
```
  lisa.luddy.indiana.edu:/home/data/osome_swap/moe/jobs/top_fibers_data
```
It is then immediately copied to the lenny project directory here:
```
  /home/data/apps/topfibers/moe_twitter_data
```
Next, it is processed by the rest of the pipeline.

For CrowdTangle data, we use the posts/search endpoint with elevated access (up to 10k posts per request).
- This page also contains information on the format of posts returned by CT
- The script that downloads data is: scripts/data_collection/crowdtangle_dl_fb_links.py
The API key is saved on the lenny machine here:
```
  /u/truthy/.top_fib_CT_setup
```
- The API key can also be accessed by logging into https://crowdtangle.com/ > selecting the “Top-FIBers” dashboard > clicking the “gear” icon in the top right > selecting the “API access” dropdown. This will display the current API key and also allow you to generate a new one.
Details on the format of Facebook posts data can be found here.
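
As a rough illustration of a posts/search request, the sketch below only builds the request URL and makes no network call. The base URL and the parameter names (`token`, `searchTerm`, `startDate`, `endDate`, `count`, `platforms`) are assumptions based on the public CT API documentation and should be checked against it. `build_search_url` is a hypothetical helper, not a function from scripts/data_collection/crowdtangle_dl_fb_links.py.

```python
from urllib.parse import urlencode

# Assumed base URL for the CT posts/search endpoint.
CT_SEARCH_URL = "https://api.crowdtangle.com/posts/search"


def build_search_url(token, search_term, start_date, end_date, count=10000):
    """Build a posts/search URL for one low-credibility domain.

    `count` defaults to 10000, the per-request ceiling mentioned above
    for elevated access.
    """
    params = {
        "token": token,
        "searchTerm": search_term,
        "startDate": start_date,
        "endDate": end_date,
        "count": count,
        "platforms": "facebook",
    }
    return f"{CT_SEARCH_URL}?{urlencode(params)}"
```

For example, `build_search_url(api_key, "example.test", "2023-02-01", "2023-05-01")` would produce a URL that could be fetched with any HTTP client; here `api_key` stands for the key described above and `example.test` is a placeholder domain.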