Last modified: 2023-05-23

Data sources

This project uses two data sources:

  • The Observatory on Social Media’s (OSoMe) Decahose infrastructure (expected to be shut down soon because of changes to Twitter data access under Elon Musk)
  • CrowdTangle (CT) via their API

Data details

Both platforms

  • Each month, we calculate FIB indices based on the previous three months of data
  • These indices are used to identify and display the top 50 superspreaders and their posting activity
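
As an illustration of the monthly ranking step, here is a minimal sketch. The FIB index is not defined on this page; the sketch assumes an h-index-style definition (a user has index f if f of their low-credibility posts were each reshared at least f times). The function names and the (user_id, reshare_count) input shape are hypothetical and are not taken from the pipeline code.

```python
from collections import defaultdict


def fib_index(reshare_counts):
    """Assumed h-index-style score: the largest f such that
    f posts were each reshared at least f times."""
    ranked = sorted(reshare_counts, reverse=True)
    f = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            f = rank
        else:
            break
    return f


def top_superspreaders(posts, k=50):
    """posts: iterable of (user_id, reshare_count) pairs covering the
    previous three months (hypothetical input shape).
    Returns up to k (user_id, fib_index) pairs, highest index first."""
    per_user = defaultdict(list)
    for user_id, reshares in posts:
        per_user[user_id].append(reshares)
    scores = {user: fib_index(counts) for user, counts in per_user.items()}
    # Sort by descending index; break ties by user id for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], str(kv[0])))[:k]
```

Filtering the input to the previous three months is assumed to happen before these functions are called.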

Twitter

  • Tweets are in V1 format
  • Only tweets that link to at least one low-credibility domain, as defined by the latest iffy.news list, are included
  • During the monthly pipeline run, Decahose data is generated by moe and then saved here:
      lisa.luddy.indiana.edu:/home/data/osome_swap/moe/jobs/top_fibers_data
    
  • It is then immediately copied to the lenny project directory here:
      /home/data/apps/topfibers/moe_twitter_data
    
  • Next, it is processed by the rest of the pipeline
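
The domain criterion above can be sketched as follows. This is an illustration only, not the pipeline's actual filter: it assumes the V1 tweet layout in which links appear under entities.urls with an expanded_url field, it ignores extended_tweet and retweeted_status payloads, and the function names and the low_cred_domains set (a stand-in for the iffy.news list) are hypothetical.

```python
from urllib.parse import urlparse


def tweet_domains(tweet):
    """Collect the lowercased host names linked from a V1 tweet dict."""
    urls = tweet.get("entities", {}).get("urls", [])
    domains = set()
    for entry in urls:
        link = entry.get("expanded_url") or entry.get("url") or ""
        host = (urlparse(link).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.add(host)
    return domains


def has_low_credibility_domain(tweet, low_cred_domains):
    """True if any linked host equals, or is a subdomain of,
    a domain in low_cred_domains."""
    return any(
        host == bad or host.endswith("." + bad)
        for host in tweet_domains(tweet)
        for bad in low_cred_domains
    )
```

Matching subdomains as well as exact hosts is a design choice of this sketch; the real filter may match differently.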

Facebook

  • We use the posts/search endpoint with elevated access (up to 10k posts per request).
  • The API key is saved on the lenny machine here:
      /u/truthy/.top_fib_CT_setup
    
    • The API key can also be retrieved by logging into https://crowdtangle.com/, selecting the “Top-FIBers” dashboard, clicking the “gear” icon in the top right, and opening the “API access” dropdown. That view displays the current API key and lets you generate a new one.
  • Details on the format of Facebook posts data can be found here.
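
For orientation, here is a minimal sketch of how a posts/search request URL might be assembled. It only builds the URL and makes no network call. The base URL and the parameter names (token, searchTerm, startDate, endDate, count) follow the public CrowdTangle API documentation as best we know and should be verified before use. The function name is hypothetical, and the token is passed in as an argument because the format of the setup file is not described here.

```python
from urllib.parse import urlencode

# Assumed base URL of the CrowdTangle posts/search endpoint.
CT_SEARCH_URL = "https://api.crowdtangle.com/posts/search"


def build_search_url(token, search_term, start_date, end_date, count=10000):
    """Return a posts/search URL for one low-credibility domain.

    count defaults to 10000, the per-request ceiling with elevated access.
    Dates are assumed to be ISO strings such as "2023-02-01".
    """
    params = {
        "token": token,
        "searchTerm": search_term,
        "startDate": start_date,
        "endDate": end_date,
        "count": count,
    }
    return f"{CT_SEARCH_URL}?{urlencode(params)}"
```

A call such as build_search_url(api_key, "example.com", "2023-02-01", "2023-05-01") would cover a three-month window for a single domain; example.com is a placeholder, not a domain from the iffy.news list.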