Problem specification:
One problem I ran into with the formulation of random agents in the previous blog post arises when the agents need to do something that requires randomness, for instance when each agent should modify its weights by a different random amount. With the setup from the previous post, every parallel agent ends up using exactly the same random values.
Why:
I think that when the agents are copied into the parallel processes, the state of the random number generators is copied along with them. Each process then starts from the same state, so the random numbers being generated are exactly the same across all agents.
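The effect can be imitated in a single process, without any multiprocessing. The sketch below assumes that a worker starts with an exact copy of the parent's generator state, and mimics that by cloning a NumPy RandomState. The names clone_rng, parent_rng and worker_rngs are made up for illustration and are not part of the agent code.

```python
import numpy as np


def clone_rng(rng):
    """Return a new generator that starts from the same state as rng."""
    clone = np.random.RandomState()
    clone.set_state(rng.get_state())
    return clone


# Stand-in for the parent process's generator (hypothetical name).
parent_rng = np.random.RandomState(123)

# Mimic three workers that each begin with a copy of the parent's state.
worker_rngs = [clone_rng(parent_rng) for _ in range(3)]

# Each "worker" draws a step with entries in {-1, 0, 1}.
steps = [rng.randint(-1, 2, size=3).tolist() for rng in worker_rngs]

# Every copy produces the same draw because the starting state is shared.
all_identical = all(step == steps[0] for step in steps)
print(steps, all_identical)
```

This is only an analogy for what seems to happen across processes, but it shows why copied state means copied random numbers.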
Solution:
One solution I can think of is to pass a different random seed to each function call that runs in parallel, and have each call set its own random seed.
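Stripped of the agent and the process pool, the idea might look like the sketch below. It uses NumPy only to stay small, and the function name random_step and the seed values are hypothetical. The point is just that a call which seeds itself is reproducible for a given seed and independent of whatever state it inherited.

```python
import numpy as np


def random_step(seed):
    """Seed NumPy's global generator, then draw a step in {-1, 0, 1}."""
    np.random.seed(int(seed))
    return np.random.randint(-1, 2, size=3).tolist()


# Hypothetical seeds, one per parallel task.
seeds = [11, 22, 33]
steps = [random_step(s) for s in seeds]

# Calling again with the same seed reproduces the same step.
repeat = random_step(seeds[0])
print(steps, repeat == steps[0])
```

Because the function takes everything it needs as an argument, the same function could be handed to a pool's map call together with the list of seeds.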
The rest of this post contains code showing how to set it up this way.
# Libraries needed for parallel processing
import multiprocessing
from multiprocessing import Pool
from functools import partial

# Libraries for creating agent
import torch
from torch import nn
from torch import Tensor
from torch.autograd import Variable
import numpy as np

# #############################################
# SETTINGS
# #############################################
num_cores = multiprocessing.cpu_count()  # use all available CPU cores
class Agent(nn.Module):
    def __init__(self, w=(9,9,9)):
        super().__init__()
        self.fc1 = nn.Linear(3,1)
        self.fc1.weight.requires_grad=False
        self.set_weights({"fc1.weight": torch.tensor([w], dtype=torch.float32)})

    def score(self):
        # dummy operation to make it use up lots of CPU for some portion of time
        for i in range(5000):
            i**i
        # return a score (here simply the sum of the weights)
        return self.fc1.weight.sum().item()

    def get_weights(self, copy=False):
        # copy=True returns an independent clone; otherwise the live tensor
        if copy:
            return {"fc1.weight": self.fc1.weight.clone()}
        else:
            return {"fc1.weight": self.fc1.weight}

    def set_weights(self, w):
        self.load_state_dict(w, strict=False)

    def jitter_weights(self, scale=None):
        # scale is currently unused; each weight moves by -1, 0 or +1
        w = {"fc1.weight": self.fc1.weight}
        step = torch.randint(3,(3,), dtype=torch.float32)-1.0
        w["fc1.weight"].add_(step)  # in-place update of the agent's weights
        return step

    def __str__(self):
        return str(self.fc1.weight.clone().data.numpy()[0].tolist())

    def __repr__(self):
        return self.__str__()
The following shows what happens if we do not pass a separate seed to each process, and how this leads to the same random numbers being generated. We run three agents in parallel at a time, which lets us see that the random numbers (the step value) are identical within each group of three agents.
# from copy import deepcopy
n_agents = 6
num_cores = 3
# batch_seeds = np.random.randint(1,1000,n_agents)
batch_agents = [Agent() for _ in range(n_agents)]
batch_weights = [{'fc1.weight': torch.tensor([[1., 1., 1.]])},
                 {'fc1.weight': torch.tensor([[2., 2., 2.]])},
                 {'fc1.weight': torch.tensor([[3., 3., 3.]])},
                 {'fc1.weight': torch.tensor([[4., 4., 4.]])},
                 {'fc1.weight': torch.tensor([[5., 5., 5.]])},
                 {'fc1.weight': torch.tensor([[6., 6., 6.]])},
                 ]

def process_input(wagent):
    weights, agent = wagent
    agent.set_weights(weights)
    w1 = agent.fc1.weight.clone().data.numpy().tolist()
    step = agent.jitter_weights().data.numpy().tolist()
    w2 = agent.fc1.weight.clone().data.numpy().tolist()
    print("Orig: {} Step: {} New: {}".format(w1, step, w2))
    return agent.score()

# RUN IN PARALLEL
with Pool(num_cores) as pool:
    scores = pool.map(partial(process_input), zip(batch_weights, batch_agents))
[OUTPUT]
Orig: [[1.0, 1.0, 1.0]] Step: [1.0, 0.0, 0.0] New: [[2.0, 1.0, 1.0]]
Orig: [[2.0, 2.0, 2.0]] Step: [1.0, 0.0, 0.0] New: [[3.0, 2.0, 2.0]]
Orig: [[3.0, 3.0, 3.0]] Step: [1.0, 0.0, 0.0] New: [[4.0, 3.0, 3.0]]
Orig: [[4.0, 4.0, 4.0]] Step: [0.0, 1.0, 0.0] New: [[4.0, 5.0, 4.0]]
Orig: [[5.0, 5.0, 5.0]] Step: [0.0, 1.0, 0.0] New: [[5.0, 6.0, 5.0]]
Orig: [[6.0, 6.0, 6.0]] Step: [0.0, 1.0, 0.0] New: [[6.0, 7.0, 6.0]]
We see from the output that although the new weights differ, this is only because we passed in different initial weights to begin with. The random steps being generated are exactly the same within each group of three. This is problematic if we want the agents to start from the same values and randomly explore different values in parallel.
Now we set it up so that each separate process generates different random values.
# from copy import deepcopy
n_agents = 6
num_cores = 3
batch_seeds = np.random.randint(1,1000,n_agents)
batch_agents = [Agent() for _ in range(n_agents)]

def process_input(sagent):
    seed, agent = sagent
    # Set the random seed for the process
    # env.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    w1 = agent.fc1.weight.clone().data.numpy().tolist()
    step = agent.jitter_weights().data.numpy().tolist()
    w2 = agent.fc1.weight.clone().data.numpy().tolist()
    print("Orig: {} Step: {} New: {}".format(w1, step, w2))
    return agent.score()

# RUN IN PARALLEL
with Pool(num_cores) as pool:
    scores = pool.map(partial(process_input), zip(batch_seeds, batch_agents))

print("\n")
batch_agents
[OUTPUT]
Orig: [[9.0, 9.0, 9.0]] Step: [0.0, 0.0, 1.0] New: [[9.0, 9.0, 10.0]]
Orig: [[9.0, 9.0, 9.0]] Step: [0.0, -1.0, -1.0] New: [[9.0, 8.0, 8.0]]
Orig: [[9.0, 9.0, 9.0]] Step: [0.0, 1.0, -1.0] New: [[9.0, 10.0, 8.0]]
Orig: [[9.0, 9.0, 9.0]] Step: [0.0, 0.0, 1.0] New: [[9.0, 9.0, 10.0]]
Orig: [[9.0, 9.0, 9.0]] Step: [-1.0, 0.0, 1.0] New: [[8.0, 9.0, 10.0]]
Orig: [[9.0, 9.0, 9.0]] Step: [-1.0, -1.0, -1.0] New: [[8.0, 8.0, 8.0]]

[[9.0, 10.0, 8.0], [9.0, 8.0, 8.0], [9.0, 9.0, 10.0], [9.0, 9.0, 10.0], [8.0, 9.0, 10.0], [8.0, 8.0, 8.0]]
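We see that even though all agents started with exactly the same values, the steps are no longer identical within each group of three, so the agents end up with different new weights. Two agents can still land on the same step by chance, as happens once in the output above, since there are only 27 possible steps.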
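One caveat with drawing the seeds via np.random.randint(1,1000,n_agents) is that it samples with replacement, so two agents can occasionally receive the same seed and would then take the same step. A possible variation, sketched below under the assumption that any distinct integers are acceptable as seeds, is to sample without replacement. The use of default_rng here is my own choice and not something the original setup requires.

```python
import numpy as np

n_agents = 6

# Sample seeds without replacement so that no two agents share one.
rng = np.random.default_rng()
batch_seeds = rng.choice(np.arange(1, 1000), size=n_agents, replace=False)

# Convert to plain Python ints before handing them to seeding functions.
batch_seeds = [int(s) for s in batch_seeds]
print(batch_seeds)
```

The resulting list can be zipped with the agents exactly as before.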