Multi-Agent Learning Lab

Part 2

Purpose:

The purposes of this lab are:

Description:


Consider the world shown in the figure below.  In the figure, black squares represent the outer boundaries of the world and dark blue squares represent a wall (with two openings).  The red square with the A indicates the starting position of the red agent, and the red square with the letter G represents the goal for the red agent.  A similar encoding applies for the green agent.  Use Q-learning to find a path from the starting positions to the goal for each agent.  The following information will be helpful:

Experiments:

You will conduct several learning experiments using this world.  We will separate these experiments into two broad categories, and you will spend most of your time on the first category.

  1. Cooperative Agents:
    1. Centralized Learning: When agents are disposed toward cooperation and perfect cooperation can be enforced, a centralized learning scheme can be used.  I want you to implement a centralized Q-learner.  This scheme is equivalent to having only one learning entity, which has 16 possible actions (4 actions for agent 1 times 4 actions for agent 2).  This means that there is a single Q-function, Q(s,{a_1,a_2}), and the actions of both players are dictated by the arg max of this function; i.e., player 1 does the a_1 portion of the arg max over {a_1,a_2} of Q(s,{a_1,a_2}), and player 2 does the a_2 portion.  Here are a couple of hints for you.
      • The state space is the tuple (x1,y1,x2,y2).
      • The action space is the pair (a1,a2).
      • Since the world is 8 by 8 and there are four actions, the size of your Q function is 8*8*8*8*4*4 = 65,536.  This is a big array, so consider how to keep memory usage small.
      • You will need to decide how to handle rewards.  One way is to not give a reward until both agents are in their goal states, but sum the penalties when there are penalties.
      • You will probably want to make the goal states absorbing states.  This means that once an agent reaches its goal state, it cannot leave.
      • Make sure you document decisions you make about how to handle multiple agents.
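
The hints above can be sketched in C++ roughly as follows. This is only an illustrative sketch, not the provided lab code: the names (stateIndex, jointAction, splitAction, greedyJointAction, jointReward) and the value of kGoalReward are assumptions made for the example. It flattens the state (x1,y1,x2,y2) and the action pair (a1,a2) into array indices, stores the table as float to keep it at 256 KiB, and shows one possible reward rule (sum the penalties, pay the goal reward only when both agents are in their goals).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sizes taken from the hints: 8x8 world, four actions per agent.
constexpr int kSize = 8;
constexpr int kActions = 4;
constexpr int kJointActions = kActions * kActions;                // 16
constexpr std::size_t kStates = kSize * kSize * kSize * kSize;    // 4096

// Flatten the joint state (x1,y1,x2,y2) into one index in [0, 4096).
inline std::size_t stateIndex(int x1, int y1, int x2, int y2) {
    assert(x1 >= 0 && x1 < kSize && y1 >= 0 && y1 < kSize);
    assert(x2 >= 0 && x2 < kSize && y2 >= 0 && y2 < kSize);
    return ((static_cast<std::size_t>(x1) * kSize + y1) * kSize + x2) * kSize + y2;
}

// Flatten the action pair (a1,a2) into one joint action in [0, 16).
inline int jointAction(int a1, int a2) { return a1 * kActions + a2; }

// Split a joint action back into (a1, a2), one move per agent.
inline std::pair<int, int> splitAction(int joint) {
    return {joint / kActions, joint % kActions};
}

// Greedy choice: arg max over all 16 joint actions for state s.
inline int greedyJointAction(const std::vector<float>& q, std::size_t s) {
    int best = 0;
    for (int a = 1; a < kJointActions; ++a) {
        if (q[s * kJointActions + a] > q[s * kJointActions + best]) best = a;
    }
    return best;
}

// One possible reward rule (an assumption, not the required one): sum both
// agents' penalties, and add the goal reward only when both are at their goals.
constexpr float kGoalReward = 100.0f;  // hypothetical value
inline float jointReward(float penalty1, float penalty2, bool atGoal1, bool atGoal2) {
    return penalty1 + penalty2 + ((atGoal1 && atGoal2) ? kGoalReward : 0.0f);
}
```

The table itself would then be a single std::vector<float> of kStates * kJointActions = 65,536 entries, indexed as q[stateIndex(x1,y1,x2,y2) * kJointActions + jointAction(a1,a2)].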
    2. Decentralized Learning: When communication is limited or when no centralized control exists, a decentralized learning scheme can be used.  I want you to implement a decentralized Q-learner, one for each agent.  Each agent's action space contains only its own action ({a1} for agent 1, {a2} for agent 2).  You will have four experiments to report for this section: two learning variants, each with two state representations.

      I want you to try two learning variants:

        • Concurrent learning: agents move at the same time, and agents update their Q-values at the same time.
        • Staggered learning: agents move at the same time, but they take turns updating.  Agent 1 updates its Q-values for five trials while agent 2 holds its values constant; then agent 2 updates its Q-values for five trials while agent 1 holds its values constant; and so on.  A "trial" is defined as one trip from the starting point to the goal.
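
The difference between the two variants comes down to who is allowed to update on a given trial. A minimal sketch, assuming trials are counted from 0 and agents are numbered 0 (agent 1) and 1 (agent 2); the function names learningAgent and shouldUpdate are hypothetical and not part of the provided code:

```cpp
#include <cassert>

// Agents swap roles every kBlock trials (five, per the description above).
constexpr int kBlock = 5;

// Returns 0 if agent 1 is the learner during this trial, 1 if agent 2 is.
// `trial` is the number of completed start-to-goal trips so far.
inline int learningAgent(int trial) {
    assert(trial >= 0);
    return (trial / kBlock) % 2;
}

// Concurrent learning: every agent updates on every trial.
// Staggered learning: only the scheduled agent updates; the other holds
// its Q-values constant (but still moves).
inline bool shouldUpdate(int agent, int trial, bool staggered) {
    return !staggered || learningAgent(trial) == agent;
}
```

With this schedule, agent 1 learns on trials 0 through 4, agent 2 on trials 5 through 9, agent 1 again on trials 10 through 14, and so on. Both agents still choose and execute moves on every step; only the Q-value update is gated.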

      Try the following state representations for each agent:

        • (x1,y1): consider only my state (current position), and not the state of the other agent.
        • (x1,y1,x2,y2): consider both my state (current position) and the other agent's state.
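
The two state representations differ only in how a state is turned into a table index, so one learner class can serve both. The sketch below is an illustration under assumptions: the struct AgentQ, the helpers ownState and bothState, and the parameter names are hypothetical, and the update is the standard one-step Q-learning rule Q(s,a) += learningRate * (r + discount * max_a' Q(s',a') - Q(s,a)), which may differ in detail from the provided code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kGrid = 8;   // 8x8 world
constexpr int kMoves = 4;  // four moves per agent

// (x1,y1): my position only, 64 states.
inline std::size_t ownState(int x, int y) {
    return static_cast<std::size_t>(x) * kGrid + y;
}

// (x1,y1,x2,y2): my position plus the other agent's, 4096 states.
inline std::size_t bothState(int x, int y, int otherX, int otherY) {
    return (ownState(x, y) * kGrid + otherX) * kGrid + otherY;
}

// One independent Q-learner; each agent owns its own instance.
struct AgentQ {
    std::vector<double> q;  // numStates * kMoves entries, initialized to 0
    double learningRate;
    double discount;

    AgentQ(std::size_t numStates, double learningRate_, double discount_)
        : q(numStates * kMoves, 0.0), learningRate(learningRate_), discount(discount_) {}

    // Largest Q-value available from state s.
    double maxQ(std::size_t s) const {
        double best = q[s * kMoves];
        for (int a = 1; a < kMoves; ++a) best = std::max(best, q[s * kMoves + a]);
        return best;
    }

    // One-step Q-learning update; no bootstrapping from a terminal state.
    void update(std::size_t s, int a, double r, std::size_t sNext, bool terminal) {
        assert(a >= 0 && a < kMoves);
        double target = r + (terminal ? 0.0 : discount * maxQ(sNext));
        q[s * kMoves + a] += learningRate * (target - q[s * kMoves + a]);
    }
};
```

For the (x1,y1) experiments each agent would be built with kGrid * kGrid states and indexed with ownState; for the (x1,y1,x2,y2) experiments it would be built with kGrid * kGrid * kGrid * kGrid states and indexed with bothState. The larger table is one reason the second case needs more trials to converge.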
  2. Something Else:  In addition to the experiments outlined above, I want you to think of something interesting that you want to know about this world or its agents.  Run some experiments to try to learn something, and report the problem and results to me.  Consider such things as changing goals, starting points, reward structures, state spaces, etc.

How to get there:

  1. Single agent Q-learning code is available from the last lab.  You will need to add in a second agent. 
    1. Types of worlds: There are three worlds built into the program; you will find them in the constructor of the world class (world.cpp).  One is labeled single-agent world and the others are labeled multi-agent worlds.  You used the single-agent world while experimenting with the Q-learning variables in the last lab.
    2. You should probably use the 8x8 multi-agent worlds to decrease the time it takes to run experiments.  You will need to run many more iterations than were needed in the single-agent experiments, because the Q-table is much larger.
  2. Recommended Roadmap
    a. Experiment with the single-agent Q-learning code that we’ve provided.
    b. Develop concurrent learning with state (x1,y1).  This is basically running two individual Q-learning agents that have no knowledge of the other agent.
    c. Develop concurrent learning with state (x1,y1,x2,y2).  This is basically running two individual Q-learning agents where each agent keeps track of its own position as well as the position of the other agent.
    d. Develop centralized learning.  Use a single agent with a state space that represents the position of both agents and the movement possibilities of both agents.  Basically, combine all the data into one big Q-learner.
    e. Repeat steps b and c with staggered learning.
  3. Important notes
    1. You will need to spend time with the code and make changes to get it to work with two agents.  As distributed, it performs Q-learning with one agent and displays the start and end positions for that agent.  It has some information about the second agent, but it will not move the second agent.
    2. To have the code show the second agent, you will need to fill in the required methods and uncomment certain lines of code in RenderScene.

 

What you'll turn in:

This lab is like the previous labs.  Do the experiments outlined above, report the results, and discuss your results.  I'm much more concerned about your analysis than anything else, but you should also pay attention to your writing.
 
 

Here are some hints.