Frameworks For Decision Making
Last Update: Jan 7, 2003
Book Stuff
-
PEAS (Performance measure, Environment, Actuators, Sensors)
-
Environment Characteristics (fully observable vs. partially observable,
etc.)
Decision Theory Framework
-
The set S contains the possible states of nature. These states
are the conditions in the world that determine what happens
when an agent acts.
-
The set A contains the actions available to the agent. An agent
can only choose an action if it believes it can perform that action.
-
The set C contains the consequences that can result when the
world is in state s and the agent chooses action a.
I will try to use the notation C to denote the set of things that
can occur when an agent acts, but I will also use a lower-case c(a,s)
to denote the function that maps state-action pairs to consequences.
Note that a consequence often includes a future state of the
world.
-
The set G contains the possible agent goals. Although
agents can adopt multiple goals, we often ignore this component because the
goal is implicitly included in the problem description.
-
The set U contains the possible preferences held by the agent.
We use the letter "U" because we almost always encode preferences as utility
functions. A utility function maps a consequence and a goal to a real
number that reflects how much the agent prefers that consequence given
that goal.
These decision components fit together as shown in the following figure.
Note how consequences are produced when actions and states are combined.
In essence, when an agent chooses action a from the set A
and nature chooses state s from the set S, a consequence
c = c(a,s) from the set C is produced. After this consequence is produced,
the agent can evaluate how much it liked the consequence.
Note further that when we omit goal dependence, the utility function is
a mapping from consequences to the real line, u: C --> R.
However, we often use a shorthand obtained by composing the c function
with the u function:
u'(a,s) = (u ∘ c)(a,s) = u(c(a,s)).
We can thus treat utility as a mapping from the state-action pair to the
real line without loss of generality. This is especially convenient
in sequential decision problems. A few examples will help illustrate
these ideas.
Example 1: Movement in a dynamic world
Consider the problem of steering a spacecraft from some point
in space to a docking station. The relevant states in the world are
the [x,y,z] positions of the spacecraft in 3 dimensions. An
action is a triple of forces in the [x,y,z] directions, which translate
into accelerations in each of these directions. The consequences
that are produced are new [x,y,z] positions. The goal is to
make [x,y,z] approach zero smoothly (where we have assumed
that the docking station is located at the origin). We prefer
consequences that are close to the docking station and that avoid
collisions with objects in space. We can encode the utility of each
decision via a distance metric under which smaller distances
are preferred.
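A minimal Python sketch of Example 1 follows. It shows the three pieces c, u, and u' = u ∘ c. For illustration only, it assumes a unit mass, a one-second time step, and a craft that starts each step at rest, so velocity is ignored. The function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

def consequence(action: Vec3, state: Vec3, mass: float = 1.0, dt: float = 1.0) -> Vec3:
    """c(a, s): new [x, y, z] position after applying force `action` for dt seconds.

    Assumption (illustration only): the craft starts the step at rest, so the
    displacement along each axis is 0.5 * (F / m) * dt**2.
    """
    return tuple(s + 0.5 * (f / mass) * dt ** 2 for f, s in zip(action, state))

def utility(position: Vec3) -> float:
    """u: C --> R. Negative Euclidean distance to the docking station at the
    origin, so smaller distances receive higher utility."""
    return -math.sqrt(sum(x * x for x in position))

def u_prime(action: Vec3, state: Vec3) -> float:
    """u'(a, s) = u(c(a, s)): utility of a state-action pair."""
    return utility(consequence(action, state))

# Score a few candidate forces from the state [1, 0, 0] and keep the best one.
state = (1.0, 0.0, 0.0)
candidates = [(-2.0, 0.0, 0.0), (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
best = max(candidates, key=lambda a: u_prime(a, state))
print(best)  # (-2.0, 0.0, 0.0)
```

Because u_prime depends only on (a, s), it can be passed to any routine that scores state-action pairs. This is the convenience the composition buys.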
Example 2: Estimating the true state of the world
Another problem of interest is figuring out the true state
of the world. For this problem, the action is nothing more than
our best guess (or estimate) of the world's state. We can denote
this action as a = s_g (where
the subscript g indicates that our action is a guess about the state
of the world). For each possible state of the world, the guess is either
right or wrong, so the consequences are "error" and "correct."
Our goal is to be correct, and a suitable utility function is 1 if we
guess correctly and 0 if we miss.
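Example 2 can be written down the same way. In this sketch the states are hypothetical weather labels. The consequence set is the two outcomes named in the text.

```python
def consequence(guess: str, true_state: str) -> str:
    """c(a, s) with a = s_g: the guess is either correct or an error."""
    return "correct" if guess == true_state else "error"

def utility(outcome: str) -> float:
    """u: C --> R. 1 for a correct guess, 0 for a miss."""
    return 1.0 if outcome == "correct" else 0.0

def u_prime(guess: str, true_state: str) -> float:
    """u'(s_g, s) = u(c(s_g, s))."""
    return utility(consequence(guess, true_state))

# Hypothetical states of the world: "rain" and "sun".
print(u_prime("rain", "rain"))  # 1.0
print(u_prime("rain", "sun"))   # 0.0
```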
Decisions in a Context
This formalism is worthless unless we put it in a context, and that context
is the real world. When an agent exists in the real world and must
solve problems in it, we refer to
the agent as situated in the world. Our objective in artificial
intelligence is to translate what an agent senses into a choice: agents map
the set X of percepts into the set of actions A. We can do this in many
ways, but the standard approach is to take a goal, consider the possible
actions and the consequences they might produce, and then choose the
action most likely to produce a consequence that brings about the desired goal.
For situated agents, it helps to divide the decision elements discussed
above into three subcomponents:
-
Sensory Perception: this module processes what is sensed into a belief about
likely states. Thus, we introduce the set B of beliefs about the
world. A belief is usually encoded as a probability distribution (or a set of
probabilities) over states, and we often assume that beliefs are updated using
Bayes' rule.
-
World Model: this consists of what we know and believe about the world.
Note that it includes the set of beliefs produced by the sensory-perception
module.
-
Decision Maker: this module processes consequences into utilities, and utilities
plus beliefs into decisions. The de facto model of decision
making is the principle of maximizing expected utility. The module
labeled D is a function that takes utilities and beliefs and spits
out a choice (from the set of actions A).
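The sketch below makes the decision module D concrete with expected-utility maximization. It assumes finite sets of states and actions and a belief stored as a dictionary of state probabilities. The umbrella problem and its utility numbers are invented for illustration.

```python
from typing import Callable, Dict, Iterable

def expected_utility(action: str, belief: Dict[str, float],
                     u_prime: Callable[[str, str], float]) -> float:
    """Sum over states s of P(s) * u'(a, s)."""
    return sum(p * u_prime(action, s) for s, p in belief.items())

def decide(actions: Iterable[str], belief: Dict[str, float],
           u_prime: Callable[[str, str], float]) -> str:
    """D: return the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, belief, u_prime))

# Hypothetical umbrella problem: utility of each (action, state) pair.
UTILITY = {
    ("take", "rain"): 0.8, ("take", "sun"): 0.6,
    ("leave", "rain"): 0.0, ("leave", "sun"): 1.0,
}

def umbrella_u(action: str, state: str) -> float:
    return UTILITY[(action, state)]

actions = ["take", "leave"]
print(decide(actions, {"rain": 0.3, "sun": 0.7}, umbrella_u))  # leave (0.70 vs 0.66)
print(decide(actions, {"rain": 0.5, "sun": 0.5}, umbrella_u))  # take (0.70 vs 0.50)
```

The same utilities yield different choices as the belief shifts. D depends on both of its inputs.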
We should note that purely deductive systems (systems based on predicate,
that is, first-order, logic) can be squished into this framework. One way
to squish these things is to let X be percepts in the world, let S
be internal states, restrict beliefs in B to be either 0 or 1 (corresponding
to predicates that return false and true, respectively), and let U
be a set of deduced actions (encoded into the rule base by a
system designer who has goals in mind). A rule base is then invoked
to say how actions produce consequences. This means that the decision maker
must either deduce which consequences will be produced by which actions or
deduce a choice by observing the world and then deducing which action
is compatible with it (an implicit model of the world is often found
in the rule base). Deduction, in this context, is the process of
taking the inputs, making internal inferences efficiently, and applying
the correct action to the world, so D, the decision function,
is equivalent to the deduction operation.
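The squishing described above can be sketched as follows. This is a hypothetical toy rule base, not a real logic engine. Beliefs are restricted to 0 or 1, and each rule pairs a set of predicates that must be true with an action. D simply fires the first rule whose conditions all have belief 1.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# A rule pairs the predicates that must be true with the action to take.
Rule = Tuple[FrozenSet[str], str]

def perceive(percepts: Dict[str, bool]) -> Dict[str, int]:
    """Map percepts in X to beliefs restricted to {0, 1} (0 = false, 1 = true)."""
    return {name: 1 if value else 0 for name, value in percepts.items()}

def deduce(beliefs: Dict[str, int], rules: List[Rule]) -> Optional[str]:
    """D as deduction: fire the first rule whose predicates all have belief 1."""
    for conditions, action in rules:
        if all(beliefs.get(c, 0) == 1 for c in conditions):
            return action
    return None  # no rule applies

# Hypothetical rule base for a mobile robot; the designer's goals live in the rules.
RULES: List[Rule] = [
    (frozenset({"obstacle_ahead"}), "turn_left"),
    (frozenset(), "move_forward"),  # empty condition set: default rule
]

print(deduce(perceive({"obstacle_ahead": True}), RULES))   # turn_left
print(deduce(perceive({"obstacle_ahead": False}), RULES))  # move_forward
```

No utilities are computed explicitly here. The designer's preferences are baked into the order and content of the rules.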
Example 1 revisited: Movement in a dynamic world
In our spacecraft, we have sensors that return inertial navigation
information. We translate this information into beliefs about the
state of the world. We then choose a path that minimizes our cost
to reach the goal, subject to the constraints imposed by spacecraft and
space station dynamics.
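One way to sketch the choice step here is a one-step greedy controller. This is a simplification of the path optimization described above, with four stated assumptions:

- The belief has already been reduced to point estimates of position and velocity.
- Each force component is discretized.
- A hypothetical maximum-thrust constraint is enforced.
- The cost is invented: distance to the origin plus a small speed penalty to encourage a smooth approach.

```python
import itertools
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def norm(v: Vec3) -> float:
    return math.sqrt(sum(x * x for x in v))

def step(p: Vec3, v: Vec3, force: Vec3, mass: float = 1.0,
         dt: float = 1.0) -> Tuple[Vec3, Vec3]:
    """Predicted consequence: new position and velocity under a constant force."""
    a = [f / mass for f in force]
    new_p = tuple(pi + vi * dt + 0.5 * ai * dt ** 2 for pi, vi, ai in zip(p, v, a))
    new_v = tuple(vi + ai * dt for vi, ai in zip(v, a))
    return new_p, new_v

def cost(p: Vec3, v: Vec3, speed_weight: float = 0.1) -> float:
    """Hypothetical cost: distance to the dock at the origin plus a speed penalty."""
    return norm(p) + speed_weight * norm(v)

def choose_force(p: Vec3, v: Vec3, f_max: float = 2.0) -> Vec3:
    """Pick the admissible force whose predicted consequence has the lowest cost."""
    levels = (-f_max, 0.0, f_max)  # discretize each force component
    admissible: List[Vec3] = [f for f in itertools.product(levels, repeat=3)
                              if norm(f) <= f_max]  # thrust-limit constraint
    return min(admissible, key=lambda f: cost(*step(p, v, f)))

print(choose_force((1.0, 0.0, 0.0), (0.0, 0.0, 0.0)))  # (-2.0, 0.0, 0.0)
```

Repeating this choice at every time step gives a path, but a greedy path is not guaranteed to minimize the total cost of reaching the goal.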
Example 2 revisited: Estimating conditions of the world
In our estimation problem, we make an observation x about the
world, translate this observation into a belief about the possible states,
and then choose (for example) the estimate s_g to which the resulting
belief assigns the highest probability.
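The revisited estimation example can be sketched with Bayes' rule followed by a most-probable-state choice. The prior and the likelihood table P(x | s) are invented for illustration. Under the 0/1 utility of Example 2, the expected utility of guessing s_g equals the belief assigned to s_g. Picking the most probable state is therefore exactly the expected-utility choice.

```python
from typing import Dict

def bayes_update(prior: Dict[str, float],
                 likelihood: Dict[str, Dict[str, float]],
                 x: str) -> Dict[str, float]:
    """Posterior P(s | x), proportional to P(x | s) * P(s), normalized over states."""
    unnormalized = {s: likelihood[s][x] * p for s, p in prior.items()}
    total = sum(unnormalized.values())
    return {s: w / total for s, w in unnormalized.items()}

def most_probable_state(belief: Dict[str, float]) -> str:
    """Choose s_g with the highest belief (the expected-utility choice under 0/1 utility)."""
    return max(belief, key=belief.get)

# Hypothetical prior over states and likelihood table P(x | s).
prior = {"rain": 0.3, "sun": 0.7}
likelihood = {
    "rain": {"clouds": 0.9, "clear": 0.1},
    "sun": {"clouds": 0.2, "clear": 0.8},
}

belief = bayes_update(prior, likelihood, "clouds")
print(belief)                       # rain ≈ 0.659, sun ≈ 0.341
print(most_probable_state(belief))  # rain
```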
Uncertainty
Uncertainty can enter a decision problem in several places.
-
We can be uncertain about the true state of the world. Such worlds
are called inaccessible, and our uncertainty is usually encoded
in our beliefs about the state of nature.
-
We can be uncertain about how our actions will produce consequences.
Such worlds are called nondeterministic, and we usually treat them
as if the uncertainty came from inaccessibility rather than from nondeterminism.
Sometimes, as in sequential decision problems, uncertainty about future
consequences arises because we can't see into the distant future.
-
We can be uncertain about our values. This is rarely addressed in
AI, but is prevalent in studies of human choice.
-
We can be uncertain about what we will choose. I'm thinking about
sequential decision problems where we are uncertain about how we will make
future decisions, but it also applies to other situations (like when an
agent uses a mixed strategy in a game theoretic problem).