Sample One Step Look Ahead Controller (2 Player)

The Sample One Step Look-Ahead controller implements a simple agent that evaluates the states reachable within one move from the current state. The controller tries every action available in the current state (via a call to advance) and evaluates the state reached after applying each one. The action that leads to the state with the best reward is the one that gets executed.

In order to advance the forward model, an array of actions is needed, containing one action per player (the index in the array corresponds to the player's ID). In this case, the action chosen for the opponent is the one returned by the getOppNotLosingAction method, which picks, at random, an opponent action that would not make the opponent lose the game, assuming the current player does nothing (ACTION_NIL).

Note that the controller captures the player ID and the number of players in the game in the constructor, then derives the opponent's ID (which works here because the game has exactly two players).

From Agent.java:

public class Agent extends AbstractMultiPlayer {
    int oppID; //player ID of the opponent
    int id; //ID of this player
    int no_players; //number of players in the game
    public static double epsilon = 1e-6;
    public static Random m_rnd;

    /**
     * initialize all variables for the agent
     * @param stateObs Observation of the current state.
     * @param elapsedTimer Timer when the action returned is due.
     * @param playerID ID of this agent
     */
    public Agent(StateObservationMulti stateObs, ElapsedCpuTimer elapsedTimer, int playerID) {
        m_rnd = new Random();

        //get game information
        no_players = stateObs.getNoPlayers();
        id = playerID; //player ID of this agent
        oppID = (playerID + 1) % stateObs.getNoPlayers();
    }

    /**
     *
     * Very simple one step lookahead agent.
     * Pass player ID to all state observation methods to query the right player.
     * Omitting the player ID will result in it being set to the default 0 (first player, whichever that is).
     *
     * @param stateObs Observation of the current state.
     * @param elapsedTimer Timer when the action returned is due.
     * @return An action for the current state
     */
    public Types.ACTIONS act(StateObservationMulti stateObs, ElapsedCpuTimer elapsedTimer) {

        Types.ACTIONS bestAction = null;
        double maxQ = Double.NEGATIVE_INFINITY;

        //A random non-suicidal action by the opponent.
        Types.ACTIONS oppAction = getOppNotLosingAction(stateObs, id, oppID);
        SimpleStateHeuristic heuristic = new SimpleStateHeuristic(stateObs);

        for (Types.ACTIONS action : stateObs.getAvailableActions(id)) {

            StateObservationMulti stCopy = stateObs.copy();

            //need to provide actions for all players to advance the forward model
            Types.ACTIONS[] acts = new Types.ACTIONS[no_players];

            //set this agent's action
            acts[id] = action;
            acts[oppID] = oppAction;

            stCopy.advance(acts);

            double Q = heuristic.evaluateState(stCopy, id);
            Q = Utils.noise(Q, this.epsilon, this.m_rnd.nextDouble());

            //System.out.println("Action:" + action + " score:" + Q);
            if (Q > maxQ) {
                maxQ = Q;
                bestAction = action;
            }
        }

        //System.out.println("======== " + getPlayerID() + " " + maxQ + " " + bestAction + "============");
        //System.out.println(elapsedTimer.remainingTimeMillis());
        return bestAction;
    }

    //Returns an action, at random, that the opponent would make, assuming I do NIL, which wouldn't make it lose the game.
    private Types.ACTIONS getOppNotLosingAction(StateObservationMulti stm, int thisID, int oppID)
    {
        int no_players = stm.getNoPlayers();
        ArrayList<Types.ACTIONS> oppActions = stm.getAvailableActions(oppID);

        ArrayList<Types.ACTIONS> nonDeathActions = new ArrayList<>();

        //Look for the opp actions that would not kill the opponent.
        for (Types.ACTIONS action : oppActions) {
            Types.ACTIONS[] acts = new Types.ACTIONS[no_players];
            acts[thisID] = Types.ACTIONS.ACTION_NIL;
            acts[oppID] = action;

            StateObservationMulti stCopy = stm.copy();
            stCopy.advance(acts);

            if(stCopy.getMultiGameWinner()[oppID] != Types.WINNER.PLAYER_LOSES)
                nonDeathActions.add(action);
        }

        if(nonDeathActions.isEmpty())
            //No safe action exists: pick among all available actions at random.
            return oppActions.get(m_rnd.nextInt(oppActions.size()));
        else
            //Random, but among those that would not make the opponent lose.
            return (Types.ACTIONS) Utils.choice(nonDeathActions.toArray(), m_rnd);
    }
}
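
The call to Utils.noise in act adds a tiny random perturbation to each evaluation, so that ties between equally-valued actions are broken at random rather than always in favour of the first action tried. Below is a minimal sketch of such a tie-breaking function; the framework's own version lives in tools.Utils, and the body shown here is illustrative rather than a verbatim copy:

public static double noise(double input, double epsilon, double random) {
    //epsilon is tiny (1e-6 in the agent above), so the result is nearly
    //identical to the input, yet two equal inputs will compare in a
    //random order across calls.
    return (input + epsilon) * (1.0 + epsilon * (random - 0.5));
}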

The state evaluation is performed by the multi-player version of the class SimpleStateHeuristic, when its method evaluateState is called. The following code shows how this method (from SimpleStateHeuristic.java) evaluates a given state. Note that it uses some of the methods described in the Forward Model page, querying for the positions of other sprites in the game.

public class SimpleStateHeuristic extends StateHeuristicMulti {

    double initialNpcCounter = 0;

    public SimpleStateHeuristic(StateObservationMulti stateObs) {

    }

    public double evaluateState(StateObservationMulti stateObs, int playerID) {
        Vector2d avatarPosition = stateObs.getAvatarPosition(playerID);
        ArrayList<Observation>[] npcPositions = stateObs.getNPCPositions(avatarPosition);
        ArrayList<Observation>[] portalPositions = stateObs.getPortalsPositions(avatarPosition);
        HashMap<Integer, Integer> resources = stateObs.getAvatarResources(playerID);

        ArrayList<Observation>[] npcPositionsNotSorted = stateObs.getNPCPositions();

        double won = 0;
        int oppID = (playerID + 1) % stateObs.getNoPlayers();
        Types.WINNER[] winners = stateObs.getMultiGameWinner();

        boolean bothWin = (winners[playerID] == Types.WINNER.PLAYER_WINS) && (winners[oppID] == Types.WINNER.PLAYER_WINS);
        boolean meWins  = (winners[playerID] == Types.WINNER.PLAYER_WINS) && (winners[oppID] == Types.WINNER.PLAYER_LOSES);
        boolean meLoses = (winners[playerID] == Types.WINNER.PLAYER_LOSES) && (winners[oppID] == Types.WINNER.PLAYER_WINS);
        boolean bothLose = (winners[playerID] == Types.WINNER.PLAYER_LOSES) && (winners[oppID] == Types.WINNER.PLAYER_LOSES);

        if(meWins || bothWin)
            won = 1000000000;
        else if (meLoses)
            return -999999999;

        double minDistance = Double.POSITIVE_INFINITY;
        Vector2d minObject = null;
        int minNPC_ID = -1;
        int minNPCType = -1;

        int npcCounter = 0;
        if (npcPositions != null) {
            for (ArrayList<Observation> npcs : npcPositions) {
                if(npcs.size() > 0)
                {
                    //Each list is sorted by distance, so its first element is
                    //the closest NPC of that type; keep the overall closest.
                    if(npcs.get(0).sqDist < minDistance)
                    {
                        minObject   = npcs.get(0).position; //Position of the closest NPC.
                        minDistance = npcs.get(0).sqDist;   //Square distance to the closest NPC.
                        minNPC_ID   = npcs.get(0).obsID;    //ID of the closest NPC.
                        minNPCType  = npcs.get(0).itype;    //Type of the closest NPC.
                    }
                    npcCounter += npcs.size();
                }
            }
        }

        if (portalPositions == null) {

            double score = 0;
            if (npcCounter == 0) {
                score = stateObs.getGameScore(playerID) + won*100000000;
            } else {
                score = -minDistance / 100.0 + (-npcCounter) * 100.0 + stateObs.getGameScore(playerID) + won*100000000;
            }

            return score;
        }

        double minDistancePortal = Double.POSITIVE_INFINITY;
        Vector2d minObjectPortal = null;
        for (ArrayList<Observation> portals : portalPositions) {
            if(portals.size() > 0 && portals.get(0).sqDist < minDistancePortal)
            {
                minObjectPortal   = portals.get(0).position; //Position of the closest portal.
                minDistancePortal = portals.get(0).sqDist;   //Square distance to the closest portal.
            }
        }

        double score = 0;
        if (minObjectPortal == null) {
            score = stateObs.getGameScore(playerID) + won*100000000;
        }
        else {
            score = stateObs.getGameScore(playerID) + won*1000000 - minDistancePortal * 10.0;
        }

        return score;
    }
}
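
In the evaluation, the huge win/loss constants guarantee that terminal outcomes dominate every other term; the remaining terms trade off the number of NPCs still alive, the square distance to the closest NPC, and the raw game score. The worked arithmetic below, using hypothetical values in the no-portal branch (game undecided, 3 NPCs alive, closest NPC at square distance 400, game score 5), shows how the terms combine:

double q = -400 / 100.0      //distance term:     -4.0
         + (-3) * 100.0      //NPC count term:  -300.0
         + 5                 //game score:         5.0
         + 0 * 100000000;    //won == 0:           0.0
//q == -299.0: at these values the NPC count term dominates.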
