Content Moderation
Kainan Jarrette and Nina Kotova
Learning Objectives
- Define content moderation.
- Define the types of content moderation.
- Discuss the types of actions moderators can take.
- Identify the drawbacks of content moderation.
Introduction
Content moderation is the process of monitoring, evaluating, and possibly taking action against content that violates the stated goals or policies of the organization or platform hosting that content (Lo, 2020). Although in the most technical sense this could apply to any organization that deals with information, content moderation is largely a process born of the internet age, applied to things like social media platforms (such as X, Meta, and TikTok) and forums (such as Reddit).
An organization typically moderates the content on their own platform, on their own authority, based on their own criteria. Although there may be external influences (such as the government, other organizations, or a platform’s user base), in most countries those influences have no actual authority to moderate other organizations’ content. There are some countries, however, where the government does exercise direct control over content, usually by censoring or restricting access (Bischoff, 2024).
Moderation can be conducted in a variety of ways, especially when it comes to what “taking action” means for a particular platform. When used consistently, equitably, and efficiently, content moderation can be invaluable in helping to fight the spread of misinformation.
Moderation Procedures
Section 1.1: Types of Content Moderation
As detailed by Zeuthen (2024), there are five main forms of content moderation: manual pre-moderation, manual post-moderation, reactive moderation, distributed moderation, and automated moderation.
Manual pre-moderation occurs before content has been published on a platform. This involves every single piece of content being reviewed by a moderator before being published (or, if in violation of the rules, rejected). The biggest benefit to this type of moderation is that it can prevent misinformation from ever being spread in the first place. Unfortunately, the primary drawback to this method stems from the publishing delay. If a user submits content (like a post), it’s not immediately available to other users. How long it’s unavailable is a product of what moderation tools are being used and the volume of content to be moderated. Manual pre-moderation requires a good deal of resources, making it one of the costliest types of moderation. Further, there’s a competitive drawback to having any length of publishing delay. Social media has built itself, at least in part, around the idea of instantaneous communication. In that type of environment, nobody wants to be the slowest game in town.
Manual post-moderation still involves moderators reviewing every piece of content, but only after it has been published. This eliminates the publishing delay, but still allows the content time to potentially be absorbed and spread.
Reactive moderation also occurs after content has been published, but relies on users to “flag” content as potentially violating the rules. All content is published, and only if and when a piece of content is flagged will it be reviewed by a moderator. Not only does this method eliminate the publishing delay, but it also requires significantly fewer resources, as flagged content is only a very small fraction of all content. It also engenders the sense of community most platforms strive for, while allowing the platform not to bear sole responsibility for moderation. Like manual post-moderation, though, it allows misinformation time to be viewed and shared.
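To make the flag-and-review workflow concrete, here is a minimal Python sketch of reactive moderation. The flag threshold and function names are illustrative assumptions rather than any platform's actual system: every post publishes immediately, and only content that accumulates enough user flags ever reaches a moderator's queue.

```python
from collections import defaultdict

flag_counts: dict[str, int] = defaultdict(int)
review_queue: list[str] = []

FLAG_THRESHOLD = 3  # assumed number of user flags before a moderator looks

def flag(post_id: str) -> None:
    """Record a user flag; queue the post for review once enough flags accumulate."""
    flag_counts[post_id] += 1
    if flag_counts[post_id] == FLAG_THRESHOLD:
        review_queue.append(post_id)

# Everything publishes instantly; moderators only ever see flagged posts.
for _ in range(3):
    flag("post-123")
print(review_queue)  # ['post-123']
```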
Distributed moderation leaves moderation entirely in the hands of the users. Here, mechanisms like voting systems allow users to vote content up or down, with more votes making content more visible and fewer votes causing content to be hidden or even deleted. As seemingly democratic as this process may be, it comes with a few key drawbacks. It relies solely on the average critical thinking skills of a group, which may not always be high. Additionally, it can quickly create the very echo chambers that exacerbate misinformation. Ultimately, a voting system ends up being about preference, as opposed to truthfulness. It works very well for content openly presented as opinion, but not very well for content presented as fact.
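The preference-versus-truthfulness problem is easy to see in a sketch. The following Python example assumes a hypothetical voting platform where visibility is computed purely from the vote balance; nothing in the scoring ever asks whether the content is accurate.

```python
from dataclasses import dataclass

@dataclass
class Post:
    text: str
    upvotes: int = 0
    downvotes: int = 0

    @property
    def score(self) -> int:
        return self.upvotes - self.downvotes

def visibility(post: Post, hide_below: int = -5, promote_above: int = 10) -> str:
    """Rank content purely by popularity; accuracy never enters the calculation."""
    if post.score <= hide_below:
        return "hidden"
    if post.score >= promote_above:
        return "promoted"
    return "normal"

post = Post("Opinion: pineapple belongs on pizza", upvotes=3, downvotes=12)
print(visibility(post))  # "hidden" -- preference, not truthfulness, decided this
```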
Lastly, automated moderation involves using automated tools to review content, as opposed to human moderators. Automated moderation can be anything from keyword filters to AI-driven algorithms. As AI becomes more powerful, this is an increasingly appealing option for social media platforms. It can be done before or after publishing, often without any noticeable publishing delay, and requires far fewer resources than human moderation. This type of moderation suffers from all the same problems AI does, including mistakes and biases. However, while it may not be able to do the entire job of moderation, it can still be massively helpful in catching large chunks of misinformation, as well as determining what content needs review by a human moderator.
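As a rough illustration of the simplest end of this spectrum, the Python sketch below uses a hypothetical keyword filter to route content either straight to publication or to a human review queue. The keyword list and action names are assumptions made for the example; production systems typically layer AI classifiers on top of filters like this.

```python
# Hypothetical list of phrases that warrant a closer look.
FLAGGED_KEYWORDS = {"miracle cure", "guaranteed win", "they don't want you to know"}

def auto_moderate(text: str) -> str:
    """Return "publish" or "send_to_human_review" based on a simple keyword match."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in FLAGGED_KEYWORDS):
        return "send_to_human_review"
    return "publish"

print(auto_moderate("This miracle cure works overnight!"))  # send_to_human_review
print(auto_moderate("Here is today's weather forecast."))   # publish
```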
Section 1.2: Moderation Procedures
While there are currently no official rules or regulations governing moderation, there are some commonly adopted procedural aspects. Theoretically, content is reviewed based on a set of rules that all users of the platform can access. Although a certain level of human judgment is usually present, the general agreement between most platforms and their users is that content isn’t moderated arbitrarily or inconsistently. Most platforms also inform the user who submitted the content about any moderation decisions, as well as offer some form of an appeals process.
Outside of these basic general principles, though, moderation can differ across platforms. Each platform has its own community standards, or set of rules and guidelines about what constitutes a violation, as well as what the potential consequences of those violations might be. Platforms may also use different types of moderation methods (like the ones discussed in the previous section).
Platform Moderation: A Study in Differences
Facebook has community standards prohibiting content that misrepresents medical information or makes unproven medical claims. It also restricts coordinated manipulation campaigns that involve large-scale information operations. Facebook tries to reduce the spread of disinformation using algorithms and is also actively working on moderating manipulated media. It partners with third-party organizations whose knowledge and expertise are used to assess the validity of content.
X, formerly known as Twitter, has changed drastically since its purchase by Elon Musk in 2022, and may continue to change. At the time of this writing, X has several policies for handling inappropriate content and misinformation. X continues to recommend authoritative sources intended to encourage critical thinking, with the aim of informing and adding context by sharing credible content from third-party sources. X also has an extended policy on the spread of manipulated media. X labels such posts, and labeled posts are subject to reduced visibility. When a user tries to share a post that has been labeled for violating one of X’s policies, they see a prompt offering additional context.
Section 1.3: Moderation Actions
Although moderation actions are always adapting along with the platforms themselves, there are some basic, traditionally used actions taken when content is found to be in violation of a platform’s rules:
- Content labeling – The content isn’t removed, but a label is added to the content to either recommend caution to users or to add relevant contextual information (examples: adding “This content may contain inaccurate information about COVID-19,” hiding explicit content behind a warning for users).
- Content modification – The content isn’t removed, but is modified to obscure or remove those parts of the content that violate the rules (examples: censoring a word with asterisks, blurring part of a photo).
- Content removal – The content is completely removed from the platform.
- Account suspension/ban – The content is completely removed from the platform, and the user who posted the content is either suspended from the platform for a length of time or banned.
Typically, the general rules and guidelines of the platform dictate in what circumstances of content violation these actions are taken. Of all these actions, labeling has been particularly popular in recent years.
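A short Python sketch can tie these actions to the platform guidelines just mentioned. The violation categories, rules table, and field names below are hypothetical, chosen only to show how a platform might map a category of violation to one of the four actions.

```python
# Hypothetical rules table: each violation category maps to one of the
# four actions described above.
RULES = {
    "possible_misinformation": "label",
    "profanity": "modify",
    "spam": "remove",
    "repeat_offense": "suspend",
}

def apply_action(content: dict, violation: str) -> dict:
    """Apply whatever action the rules table assigns to this violation category."""
    action = RULES.get(violation)
    if action == "label":
        content["label"] = "This content may contain inaccurate information."
    elif action == "modify":
        content["text"] = content["text"].replace("badword", "b******")
    elif action == "remove":
        content["removed"] = True
    elif action == "suspend":
        content["removed"] = True
        content["author_suspended"] = True
    return content

post = {"text": "badword in an otherwise fine post", "author": "user42"}
print(apply_action(post, "profanity"))
# {'text': 'b****** in an otherwise fine post', 'author': 'user42'}
```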
Content Labels
Section 2.1: Types of Labeling
When a moderator decides a piece of content should be labeled, there are two basic types of labeling that can happen:
Recommendation labels assert claims to the user about whether and how the labeled content should be absorbed. This can include questioning its context or validity. For example, a post might get a label that says “This post may contain information taken out of context.”
Information labels provide clear and specific information to fill in gaps in the labeled content, or provide context. For example, a post that states “Slavery wasn’t even so bad” might get a label that says “Over the lifetime of the international slave trade, over 1.8 million people did not even survive the journey through the Middle Passage.”
Hybrid labels are a mix of both recommendation and information labels. They will usually involve giving a recommendation that includes a link to external sources of information.
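A small data-structure sketch in Python can make the distinction concrete. The field names and URL below are hypothetical placeholders; the point is only that an information label carries the corrective facts in its message, while a hybrid label pairs a recommendation with a link to an external source.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Label:
    kind: str                          # "recommendation", "information", or "hybrid"
    message: str                       # the text shown to the user
    source_url: Optional[str] = None   # external context link, used by hybrid labels

recommendation = Label("recommendation",
                       "This post may contain information taken out of context.")
information = Label("information",
                    "Over 1.8 million people did not survive the Middle Passage.")
hybrid = Label("hybrid",
               "This claim is disputed. See external sources for more context.",
               source_url="https://example.org/placeholder-source")
```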
Section 2.2: Labeling Principles
As in life, not all labels are created equal. There are a few key principles that help make a label as effective as possible:
- A label should draw more attention to itself than the content it’s labeling.
- The label should be the first thing the user absorbs, before the content is viewed. This can be achieved in ways like animating or highlighting the label, or dimming the content relative to the label.
- A label should disrupt the picture superiority of the content it’s labeling.
Picture superiority is a phenomenon in which visual information is remembered more easily than information that has to be read or heard. Thus, a label should always be placed on or over any images that are a part of the labeled content.
- A label should refrain from value judgments.
- Labeling a post already contains an inherent element of stigmatization, but that shouldn’t be exacerbated by language that makes value judgments about the poster or subject(s) of the content.
- A label should encourage critical thinking and skepticism.
- This is helped by having some form of user interaction with the label, as well as access to additional (and accurate) information about the subject. Hybrid labels are particularly well suited for this.
Moderation Limitations and Challenges
Section 3.1: Freedom and Control
One of the major issues with content moderation is that it directly intersects with free speech philosophies. Many feel that even lies should be protected speech, and in many ways that belief has been shared by the courts (Congressional Research Service, 2022). Removing content, then, becomes much more difficult and controversial than actions like labeling. At the same time, labeling may not be as effective a deterrent (Wasike, 2023) as more permanent and isolating solutions like account bans.
Moderation is also inherently information control, which understandably makes some people uneasy (Aslett et al., 2022). Human moderation will always contain some degree of subjectivity that allows for personal bias (which also seems to be baked into AI-driven automated moderation). The fear is that bias could make information sharing incredibly asymmetrical, especially for groups already prone to being treated unfairly.
Ultimately, faith in the concept of moderation will only ever be as strong as the faith in the institutions doing the moderating. When it comes to major social media organizations, that faith isn’t particularly high. But steps like transparency and consistency in moderation can help build that trust back up.
Conclusion
The balance between moderation and free speech ideals is likely to be an ever-evolving dialogue, and one of increasing importance moving forward. But just because moderation can’t be used absolutely doesn’t mean it’s not worth using at all. When practiced properly, moderation is an effective tool for slowing the spread of misinformation, and sometimes even stopping it from being published in the first place. Moderation actions like labeling can help enhance the critical thinking skills of users, as well as make them more skilled at identifying reliable sources. Additionally, enforcing account bans can be an effective deterrent, as they cut off the user’s ability to actively participate. What’s important is that an online community or organization takes some amount of responsibility for what happens within and as a result of that community. That responsibility takes the form of moderation.
Key Terms
automated moderation
moderation using automated tools to review content, as opposed to human moderators
community standards
rules that guide content moderation and govern what is acceptable to post on a specific social media platform
content labeling
a moderation action where content isn’t removed, but a label is added to either recommend caution to users or to add relevant contextual information
content moderation
the process of monitoring, evaluating, and possibly taking action against content that violates the stated goals or policies of an organization or platform
distributed moderation
moderation that happens entirely by the user base, typically through a voting system (where more votes increase visibility, and fewer votes decrease visibility, potentially threatening deletion)
hybrid label
a type of content label that combines aspects of recommendation labels and information labels (usually a recommendation is given that includes a link to related external information sources)
information label
a type of content label that provides clear and specific information to fill in gaps in the labeled content, or provide context
manual pre-moderation
content moderation that happens before content is published
manual post-moderation
content moderation where moderators actively search through published content looking for violations of their rules
picture superiority
the psychological phenomenon where visual information is more easily remembered than read or heard information
reactive moderation
content moderation where moderators only review content after it is published and users have “flagged” it as potentially violating the rules
recommendation label
a type of content label that asserts claims to the user about whether and how the labeled content should be absorbed
value judgment
a decision on whether something is good or bad based on personal feelings
References
Arnaudo, D., Bradshaw, S., Ooi, H. H., Schwalbe, K., Studdart, A., Zakem, V., & Zink, A. (2021). Combating Information Manipulation: A Playbook for Elections and Beyond. https://www.iri.org/resources/combating-information-manipulation-a-playbook-for-elections-and-beyond/
Aslett, K., Guess, A. M., Bonneau, R., Nagler, J., & Tucker, J. A. (2022). News credibility labels have limited average effects on news diet quality and fail to reduce misperceptions. Science Advances, 8(18). https://doi.org/10.1126/sciadv.abl3844
Bateman, J., Jackson, D. (2024). Countering Disinformation Effectively: An Evidence-Based Policy Guide. https://carnegieendowment.org/research/2024/01/countering-disinformation-effectively-an-evidence-based-policy-guide?lang=en
Bischoff, P. (2024) Internet Censorship 2024: A Map of Internet Censorship and Restrictions. Comparitech. https://www.comparitech.com/blog/vpn-privacy/internet-censorship-map/
Born, K. (2020). Understanding Disinformation Solutions Landscape. https://hewlett.org/library/understanding-disinformation-solutions-landscape/
Congressional Research Service. (2022). False Speech and the First Amendment: Constitutional Limits on Regulating Misinformation. https://crsreports.congress.gov/product/pdf/IF/IF12180
Lo, K. (2020). Toolkit for Civil Society and Moderation Inventory. https://meedan.com/post/toolkit-for-civil-society-and-moderation-inventory
Ofcom. (2019). Use of AI in online content moderation (2019 Report Produced on Behalf of Ofcom). https://www.ofcom.org.uk/research-and-data/online-research/online-content-moderation
USAID, & CEPPS. (2023). Countering disinformation guide. https://counteringdisinformation.org/topics/platforms/1-interventions-and-responses-limit-or-curtail-disinformation-and-misinformation
Wasike, B. (2023). You’ve been fact-checked! Examining the effectiveness of social media fact-checking against the spread of misinformation. Elsevier Telematics and Informatics Reports, 11. https://doi.org/10.1016/j.teler.2023.100090
Zeuthen, S. (2024, May 8). 5 Content Moderation Methods You Should Understand. Besedo. https://besedo.com/blog/5-moderation-methods-you-should-understand/