Ronny Restrepo

Problem specification:

One problem I encountered with the formulation of the agents in the previous blog post arises when the agents need to do something that requires randomness, for instance when we want each agent to modify its weights by a different random amount. With the formulation from the previous post, every parallel agent ends up using exactly the same random values.

Why:

I think that when the agents are copied into the parallel processes, the state of the random number generator (and therefore the seed) is copied along with them. So the random numbers being generated are exactly the same across all agents.
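The effect can be imitated in a single process. The following is only a minimal sketch: it uses Python's standard `random.Random` as a stand-in for the torch generator, and the names `parent_rng` and `worker_rngs` are made up for illustration. It shows that exact copies of one generator state all produce the same "random" steps, which is what I suspect is happening to the worker processes.

```python
import copy
import random

# Stand-in for the generator state that exists before the workers are created.
parent_rng = random.Random(1234)

# Each hypothetical "worker" receives an exact copy of that state.
worker_rngs = [copy.deepcopy(parent_rng) for _ in range(3)]

# Every copy draws three steps from {-1, 0, 1}; all copies agree.
steps = [[rng.randint(-1, 1) for _ in range(3)] for rng in worker_rngs]
for step in steps:
    print(step)
```

All three printed lines are the same, mirroring the identical steps seen within a group of parallel agents.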

Solution:

One solution I can think of is to pass a different random seed to each function call that runs in parallel, and have each call set its own random seed.
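As a rough sketch of the idea, independent of torch: each call receives its own seed and builds its generator from it, so the result depends only on that seed and not on whatever state the worker inherited. The function `jitter` and the seed values are hypothetical, and the sketch uses a private `random.Random` per call rather than reseeding a global generator.

```python
import random

def jitter(seed):
    """Return a random step in {-1, 0, 1}^3 that depends only on `seed`."""
    rng = random.Random(seed)  # generator private to this call
    return [rng.randint(-1, 1) for _ in range(3)]

seeds = [11, 22, 33]  # one (hypothetical) seed per agent

# Sequential here for simplicity; the same function could be given to pool.map.
steps = [jitter(s) for s in seeds]

# Re-running with the same seeds reproduces the same steps.
repeat = [jitter(s) for s in seeds]
print(steps)
print(steps == repeat)
```

Because each result is a function of the seed alone, it does not matter which worker process ends up running which call.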

The rest of this post contains code showing how to set it up this way.

Setup

# Libraries needed for parallel processing
import multiprocessing
from multiprocessing import Pool
from functools import partial

# Libraries for creating agent
import torch
from torch import nn
from torch import Tensor
from torch.autograd import Variable
import numpy as np

# #############################################
#                  SETTINGS
# #############################################
num_cores = multiprocessing.cpu_count() # use all available CPU cores

Agent

class Agent(nn.Module):
    def __init__(self, w=(9,9,9)):
        super().__init__()
        self.fc1 = nn.Linear(3,1)
        self.fc1.weight.requires_grad=False
        self.set_weights({"fc1.weight": torch.tensor([w], dtype=torch.float32)})
    def score(self):
        # dummy operation to make it use up lots of CPU for some portion of time
        for i in range(5000):
            i**i
        # return a dummy score: the sum of the weights
        return self.fc1.weight.sum().item()
    def get_weights(self, copy=False):
        # Return a copy when copy=True, otherwise the underlying parameter
        if copy:
            return {"fc1.weight": self.fc1.weight.clone()}
        else:
            return {"fc1.weight": self.fc1.weight}
    def set_weights(self, w):
        self.load_state_dict(w, strict=False)
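    def jitter_weights(self, scale=None):
        # NOTE: `scale` is currently unused; every step is -1, 0 or +1
        w = {"fc1.weight": self.fc1.weight}
        # random integer step in {-1, 0, 1} for each of the three weights
        step = torch.randint(3,(3,), dtype=torch.float32)-1.0
        # modify the weights in place
        w["fc1.weight"].add_(step)
        return step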
    def __str__(self):
        return str(self.fc1.weight.clone().data.numpy()[0].tolist())
    def __repr__(self):
        return self.__str__()

Parallel Processing Without Seeding Separately

The following shows what happens if we do not pass a separate seed to each process, and how this leads to the same random numbers being generated. We run three agents in parallel at a time, which lets us see that the random numbers (the step values) are identical within each group of three agents.

# from copy import deepcopy
n_agents = 6
num_cores = 3
# batch_seeds = np.random.randint(1,1000,n_agents)
batch_agents = [Agent() for _ in range(n_agents)]
batch_weights = [{'fc1.weight': torch.tensor([[1., 1., 1.]])},
             {'fc1.weight': torch.tensor([[2., 2., 2.]])},
             {'fc1.weight': torch.tensor([[3., 3., 3.]])},
             {'fc1.weight': torch.tensor([[4., 4., 4.]])},
             {'fc1.weight': torch.tensor([[5., 5., 5.]])},
             {'fc1.weight': torch.tensor([[6., 6., 6.]])},
             ]

def process_input(wagent):
    weights, agent = wagent
    agent.set_weights(weights)
    w1 = agent.fc1.weight.clone().data.numpy().tolist()
    step = agent.jitter_weights().data.numpy().tolist()
    w2 = agent.fc1.weight.clone().data.numpy().tolist()
    print("Orig: {}  Step: {}  New: {}".format(w1, step, w2))
    return agent.score()

# RUN IN PARALLEL
with Pool(num_cores) as pool:
    scores = pool.map(partial(process_input), zip(batch_weights, batch_agents))

[OUTPUT]

Orig: [[1.0, 1.0, 1.0]]  Step: [1.0, 0.0, 0.0]  New: [[2.0, 1.0, 1.0]]
Orig: [[2.0, 2.0, 2.0]]  Step: [1.0, 0.0, 0.0]  New: [[3.0, 2.0, 2.0]]
Orig: [[3.0, 3.0, 3.0]]  Step: [1.0, 0.0, 0.0]  New: [[4.0, 3.0, 3.0]]
Orig: [[4.0, 4.0, 4.0]]  Step: [0.0, 1.0, 0.0]  New: [[4.0, 5.0, 4.0]]
Orig: [[5.0, 5.0, 5.0]]  Step: [0.0, 1.0, 0.0]  New: [[5.0, 6.0, 5.0]]
Orig: [[6.0, 6.0, 6.0]]  Step: [0.0, 1.0, 0.0]  New: [[6.0, 7.0, 6.0]]

We see from the output that although the new weight values differ, this is only because we passed in different initial weights to begin with. The random values being generated are exactly the same within each group of three. This is problematic if we want the agents to start from the same values and randomly explore different values in parallel.

Parallel Processing With Separate Seeds

Now we set it up so that each separate process generates different random values.

# from copy import deepcopy
n_agents = 6
num_cores = 3
batch_seeds = np.random.randint(1,1000,n_agents)
batch_agents = [Agent() for _ in range(n_agents)]

def process_input(sagent):
    seed, agent = sagent

    # Set the random seed for the process
    # # env.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

    w1 = agent.fc1.weight.clone().data.numpy().tolist()
    step = agent.jitter_weights().data.numpy().tolist()
    w2 = agent.fc1.weight.clone().data.numpy().tolist()
    print("Orig: {}  Step: {}  New: {}".format(w1, step, w2))
    return agent.score()

# RUN IN PARALLEL
with Pool(num_cores) as pool:
    scores = pool.map(partial(process_input), zip(batch_seeds, batch_agents))

print("\n")
batch_agents

[OUTPUT]

Orig: [[9.0, 9.0, 9.0]]  Step: [0.0, 0.0, 1.0]  New: [[9.0, 9.0, 10.0]]
Orig: [[9.0, 9.0, 9.0]]  Step: [0.0, -1.0, -1.0]  New: [[9.0, 8.0, 8.0]]
Orig: [[9.0, 9.0, 9.0]]  Step: [0.0, 1.0, -1.0]  New: [[9.0, 10.0, 8.0]]
Orig: [[9.0, 9.0, 9.0]]  Step: [0.0, 0.0, 1.0]  New: [[9.0, 9.0, 10.0]]
Orig: [[9.0, 9.0, 9.0]]  Step: [-1.0, 0.0, 1.0]  New: [[8.0, 9.0, 10.0]]
Orig: [[9.0, 9.0, 9.0]]  Step: [-1.0, -1.0, -1.0]  New: [[8.0, 8.0, 8.0]]

[[9.0, 10.0, 8.0],
 [9.0, 8.0, 8.0],
 [9.0, 9.0, 10.0],
 [9.0, 9.0, 10.0],
 [8.0, 9.0, 10.0],
 [8.0, 8.0, 8.0]]

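We see that even though all agents started with exactly the same values, the random steps now vary from agent to agent, so we end up with different new values for the weights. (Two agents can still happen to draw the same step, as the first and fourth lines of the output show, since there are only 27 possible steps.)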
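One thing to watch for: seeds drawn independently, as `np.random.randint(1,1000,n_agents)` does, can repeat, and two agents given the same seed would take identical steps. A minimal sketch of one way to avoid this is to draw the seeds without replacement. It uses the standard library's `random.sample`, which is a substitution for illustration rather than what the code above does.

```python
import random

n_agents = 6

# Sample without replacement so that no two agents can share a seed.
batch_seeds = random.sample(range(1, 1000), n_agents)

print(batch_seeds)
print(len(set(batch_seeds)) == n_agents)
```

The resulting list can be zipped with `batch_agents` in the same way as before.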
