Facebook Explains How It Fights Scraping and Gives Numbers in Q1 2021 Report
- Facebook says they cannot wholly eradicate scraping, but they are doing their best.
- The platform finds that phone number enumeration is the most troublesome kind of scraping.
- The transparency report shows that government requests for user data have increased by 10%.
A full month has passed since Facebook’s internal email that suggested framing scraping as a “sector problem” leaked to a Danish media outlet, and the social media giant hopes that everyone has forgotten about it. Hence, it’s time for a post that describes what they’re doing to fight scraping. At the same time, the Q1 2021 report is out, giving us some interesting figures on policy enforcement, tackling piracy, and stopping the circulation of counterfeit items.
Starting with the scraping protection mechanisms, Facebook presents scraping as a problem that “can never be eliminated entirely.” Still, they have the following measures in place to mitigate the risk:
- Maintain a 100-member ‘External Data Misuse’ team dedicated to detecting, investigating, and blocking scraping.
- Impose rate and data limits to restrict massive scraping operations.
- Collaborate with researchers to find and secure publicly accessible datasets that contain scraped Facebook data.
- Take enforcement action against those who have abused Facebook and violated the relevant policies.
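To make the "rate limits" item more concrete, here is a minimal sketch of one common approach, a sliding-window limiter that caps how many requests a single client can make in a given period. Facebook has not published how its limits work, so the class name, parameters, and thresholds below are purely illustrative assumptions.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Hypothetical limiter: at most `max_requests` per client per window."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # client_id -> timestamps of recent requests

    def allow(self, client_id):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        hits = self._hits[client_id]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    # Illustrative numbers only: 3 lookups per client per 60 seconds.
    limiter = SlidingWindowRateLimiter(max_requests=3, window_seconds=60)
    for attempt in range(5):
        print(attempt, limiter.allow("client-123"))
```

A limiter like this slows down a single account or IP address, but it does little against scrapers who spread their requests across many clients, which is one reason scraping is so hard to stop completely.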
Facebook makes special mention of phone number enumeration because it happens at a different scale and is particularly hard to stop. As the blog piece mentions, the work of scrapers became a lot more difficult after September 2019, thanks to improvements implemented at that time. Facebook says scrapers still adjust and change their approach to bypass the protection measures, but the platform is constantly updating its defenses to stay ahead of them.
Here are some interesting numbers from the Facebook Q1 2021 transparency report:
- 8.8 million pieces of bullying and harassment content were removed.
- 9.8 million pieces of organized hate content were removed.
- 25.2 million pieces of hate speech content were removed.
- 335,765,018 pieces of suspected counterfeit content were proactively removed (without being reported).
- 9,822,070 pieces of content suspected of copyright violations were proactively removed (without being reported).
- Another 1,964,414 pieces of content that were reported for copyright violations were removed.
And finally, there are stats about government requests for user data, which increased by 10% in H2 2020, reaching 191,013. The leader in this category is the United States (61,262 requests), followed by India, Germany, France, Brazil, and the UK. The share of non-disclosure orders (which prohibit Facebook from notifying users about the request) increased to 69%, so in most cases, users weren’t informed about their data being shared with law enforcement authorities.





