What if the population that can never say “no”
were that of AI agents? The rogue AI of science fiction may lead us to believe that this would always be
desirable, but consider what it would actually mean
in practice. Though we expect AI agents to follow our
commands, what if we give them commands that are
in conflict with our own long-term goals or with
accurate knowledge they possess, or that have unethical implications not necessarily known to us? What
if they receive contradictory commands from several
humans? Furthermore, what if an AI agent is expected to be socially intelligent in a more general sense?
Given that the tension between compliance and
noncompliance is perhaps fundamental to human
social behavior (Wenar 1982), can an AI agent be
socially intelligent without the ability to be noncompliant and to reason about noncompliance?
We define rebel agents as AI agents that can reject,
protest against, or develop attitudes of reluctance or
opposition to goals or courses of action assigned to
them by other agents, or to the general behavior or
attitudes of other agents. We use “rebellion” as an
umbrella term covering reluctance, protest, refusal,
rejection of tasks, and similar attitudes or behaviors.
The term was first introduced in a more limited interactive storytelling context (Coman, Gillespie, and
Muñoz-Avila 2015), and later generalized (Coman
and Aha 2017; Coman et al. 2017). In a rebellion
episode, an alter is an agent or a group of agents
against which one rebels, and which is in a position
of power over the rebel agent. The alter could, for
example, be a human operator, a human or synthetic teammate, or a mixed group of human and synthetic agents. The rebel agent is not intended to be permanently adversarial toward the alter(s) or in a
rebelling state by default. Such an agent has potential
for rebellion that may or may not manifest, depending on external and internal conditions.
In the tradition of biologically inspired design and
cognitive plausibility, our exploration of AI rebellion
is inspired by the mechanisms of human rebellion.
First, we ask: for humans, if noncompliance is the
solution, what might be the problem? In other
words: Why do we say “no”?
Our possible motivations include protecting the
health, safety, integrity, and dignity of ourselves and
others, and reacting to perceived injustice. Further
questions come to mind:
How do we decide whether, when, and how to say
“no”? Even though we may have compelling reasons
to oppose others, we do not necessarily do so. Before
venturing an act of rebellion, we may consider
whether we are sufficiently influential or trusted to
afford doing so, what consequences we may incur,
and whether our rebellion can actually succeed in
bringing about the consequences we desire. We may
observe the behavior of potential alters to try to
assess these considerations.
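The whether-and-when considerations above can be sketched as a simple deliberation rule. This is a minimal illustration only: the factor names, weighting, and threshold are hypothetical assumptions, not part of the framework.

```python
from dataclasses import dataclass


@dataclass
class DeliberationFactors:
    """Hypothetical quantities a rebel agent might estimate by observing alters."""
    influence: float         # how influential or trusted the agent is (0..1)
    consequence_cost: float  # expected cost of sanctions for rebelling (0..1)
    success_chance: float    # estimated chance rebellion achieves its aim (0..1)
    motivation: float        # strength of the reason to refuse (0..1)


def should_rebel(f: DeliberationFactors, threshold: float = 0.0) -> bool:
    """Weigh compelling reasons against social standing, cost, and odds of success.

    A compelling motive alone is not enough: low influence or a poor chance
    of success can make the expected benefit smaller than the expected cost.
    """
    expected_benefit = f.motivation * f.success_chance * f.influence
    return expected_benefit - f.consequence_cost > threshold
```

A trusted agent with a strong motive and good odds would rebel under this rule, while an agent with little social capital facing heavy sanctions would comply despite the same motive.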
How do we say “no”? We may do so explicitly (for
example, verbally) or implicitly (for example,
through behavior that goes against social norms).
Refusal is not necessarily complete and definite. It
can involve explanation, discussion, elicitation of
further information, and negotiation. We may construct and express narratives that counter those of the alters and reflect our own perspective.
What are the further social implications of saying
“no”? Such an act can affect our social standing and
reputation in both positive and negative ways. Often,
we are aware of this and act accordingly. We might
attempt, for example, to “fix” social relationships in
the aftermath of rebellion.
Thus, several characteristics of human rebellion
emerge. There are multiple types of rebellion and
multiple possible motivations for rebellion (some primary, others secondary). Rebellion has several possible stages, including a preliminary stage, a stage of
deliberation, the actual manifestation of rebellion,
and its aftermath. Sociocognitive mechanisms play
essential roles at all stages.
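The stages identified above can be summarized as a simple state progression. The stage names follow the text; the transition table (for example, that deliberation may end without any manifest rebellion) is an illustrative assumption.

```python
from enum import Enum, auto


class RebellionStage(Enum):
    """Stages of a rebellion episode, as identified in the text."""
    PRELIMINARY = auto()    # potential for rebellion exists but is latent
    DELIBERATION = auto()   # weighing whether, when, and how to say "no"
    MANIFESTATION = auto()  # rebellion is expressed, explicitly or implicitly
    AFTERMATH = auto()      # social consequences, e.g., repairing relationships


# Illustrative transitions: deliberation may conclude without rebellion,
# returning the agent to its latent, non-rebelling state.
TRANSITIONS = {
    RebellionStage.PRELIMINARY: {RebellionStage.DELIBERATION},
    RebellionStage.DELIBERATION: {RebellionStage.MANIFESTATION,
                                  RebellionStage.PRELIMINARY},
    RebellionStage.MANIFESTATION: {RebellionStage.AFTERMATH},
    RebellionStage.AFTERMATH: {RebellionStage.PRELIMINARY},
}


def can_advance(current: RebellionStage, nxt: RebellionStage) -> bool:
    """Check whether an episode may move from one stage to another."""
    return nxt in TRANSITIONS[current]
```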
Our AI rebellion framework is inspired by social
psychology and designed to accommodate the variations we mentioned, and many more. This framework is general: it does not assume any particular
agent architecture. We also introduced the term
counternarrative intelligence (Coman and Aha 2017) to refer
to a mechanism that enables rebels to produce,
express, and reason about counternarratives that
support and justify rebellion.
Through our proposed AI rebellion framework and
the accompanying discussion, we aim to provide the
core of a common language to be used by researchers
in pursuing the following four goals:
(1) Developing and implementing AI agents embodying various facets of rebellion. To this end, the framework can help identify nonobvious, human-inspired types and functions of rebellion. Potential research directions we propose are (a) the development of AI cognitive prostheses that empower humans with low social capital to adopt positively motivated noncompliant behavior, and (b) goal alignment in mixed human and AI teams through cycles of noncompliance, negotiation, or agreement.
(2) Studying the rebellion potential and ethical
ramifications of existing and prospective agents, thus
identifying ethically prohibited, ethically acceptable,
and perhaps even ethically obligatory rebellious
behavior. Certain types of rebellion in the framework
may be found to be completely unethical (for example, purely egoistic rebellion is a likely candidate). An
example of an ethics question that the framework
can lead us to ask is whether an AI agent should
always signal to humans that it is considering rebellion, even if it does not end up rebelling. Further ethical issues pertaining to AI rebellion are discussed by
Coman et al. (2017).