Gaussian Policy
Gaussian Policy Class
- class modularl.policies.GaussianPolicy(observation_shape: int, action_shape: int, high_action: float, low_action: float, network: Module | None = None, use_xavier: bool = True, **kwargs: Any)
Bases: AbstractPolicy
Gaussian Policy for continuous action spaces.
- Parameters:
observation_shape (int) – Dimension of the observation space.
action_shape (int) – Dimension of the action space.
high_action (float) – Upper bound of the action space.
low_action (float) – Lower bound of the action space.
network (nn.Module, optional) – Custom neural network to represent the policy. If None, a default network is used. Defaults to None.
use_xavier (bool, optional) – Whether to use Xavier initialization for weights. Defaults to True.
Note
If a custom network is provided, it should be headless: it should end at a hidden feature layer rather than at an action output. This class appends the head itself, which consists of two nn.Linear layers, one for mean and one for log_std, each with input size equal to the output features of the last layer in the provided network.
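The note above can be illustrated with a minimal sketch of a headless body plus two linear heads. This is not the library's code: HeadlessBody and last_linear_out_features are hypothetical names, and reading the feature size from the last registered nn.Linear is an assumption about how the input size of the heads might be determined.

```python
import torch
import torch.nn as nn


class HeadlessBody(nn.Module):
    """A headless feature extractor: it ends at a hidden layer, not at actions."""

    def __init__(self, observation_shape: int, hidden: int = 128):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(observation_shape, hidden),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.network(x)


def last_linear_out_features(module: nn.Module) -> int:
    """Hypothetical helper: output size of the last registered nn.Linear."""
    linears = [m for m in module.modules() if isinstance(m, nn.Linear)]
    if not linears:
        raise ValueError("network contains no nn.Linear layer")
    return linears[-1].out_features


observation_shape, action_shape = 10, 2
body = HeadlessBody(observation_shape)
features = last_linear_out_features(body)

# Two separate heads on top of the shared body, as the note describes.
mean_head = nn.Linear(features, action_shape)
log_std_head = nn.Linear(features, action_shape)

hidden = body(torch.randn(4, observation_shape))  # batch of 4 observations
mean, log_std = mean_head(hidden), log_std_head(hidden)
print(mean.shape, log_std.shape)
```

A network that already ends in an action-sized layer would not fit this pattern, because the heads would then be stacked on top of action outputs instead of hidden features.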
- forward(batch_observation: Tensor) → Tuple[Tensor, Tensor]
Forward pass of the policy network.
Note
This method should not be used to get actions. For obtaining the batch actions, please use the get_action method.
- Parameters:
batch_observation (torch.Tensor) – Batch observation from the environment
- get_action(batch_observation: Tensor)
Get an action from the policy.
- Parameters:
batch_observation (torch.Tensor) – Batch observation from the environment
- Returns:
action (torch.Tensor) – Sampled action from the policy distribution (only if deterministic is False)
log_prob (torch.Tensor) – Log probability of the action (only if deterministic is False)
mean (torch.Tensor) – Mean of the action distribution
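The reference does not state how low_action and high_action are applied when sampling. One common approach for bounded continuous actions, used in SAC-style implementations, is to sample from a Gaussian, squash with tanh, and rescale to the bounds. The sketch below shows that approach under this assumption. sample_squashed_action is a hypothetical helper, not part of modularl, and it takes mean and log_std as inputs, which is what the forward pass is assumed to return.

```python
import torch
from torch.distributions import Normal


def sample_squashed_action(mean, log_std, low_action, high_action):
    """Hypothetical helper: tanh-squashed Gaussian sample rescaled to the bounds."""
    scale = (high_action - low_action) / 2.0
    bias = (high_action + low_action) / 2.0

    dist = Normal(mean, log_std.exp())
    raw = dist.rsample()              # reparameterized sample, keeps gradients
    squashed = torch.tanh(raw)        # values in [-1, 1]
    action = squashed * scale + bias  # values in [low_action, high_action]

    # Change of variables: subtract log|d action / d raw| for each dimension.
    # The small constant avoids log(0) when tanh saturates.
    log_prob = dist.log_prob(raw) - torch.log(scale * (1.0 - squashed.pow(2)) + 1e-6)
    log_prob = log_prob.sum(dim=-1, keepdim=True)

    # Mean of the distribution mapped into the same bounded range.
    mean_action = torch.tanh(mean) * scale + bias
    return action, log_prob, mean_action


torch.manual_seed(0)
mean = torch.zeros(4, 2)            # batch of 4, action dimension 2
log_std = torch.full((4, 2), -0.5)
action, log_prob, mean_action = sample_squashed_action(mean, log_std, -1.0, 1.0)
print(action.shape, log_prob.shape, mean_action.shape)
```

Whatever the library does internally, the shapes follow the same pattern: one action vector and one mean vector per observation in the batch.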
Example Usage
Here’s an example of how to use the GaussianPolicy:
import torch
import torch.nn as nn
from modularl.policies.gaussian_policy import GaussianPolicy
# Define custom network
class CustomNetwork(nn.Module):
    def __init__(self, observation_shape):
        super(CustomNetwork, self).__init__()
        self.network = nn.Sequential(
            nn.Linear(observation_shape, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU()
        )

    def forward(self, x):
        return self.network(x)
# Observation and action space dimensions
observation_shape = 10
action_shape = 2
high_action = 1.0
low_action = -1.0
# Custom network instance
custom_network = CustomNetwork(observation_shape)
# Create Gaussian Policy with the custom network
policy = GaussianPolicy(
    observation_shape=observation_shape,
    action_shape=action_shape,
    high_action=high_action,
    low_action=low_action,
    network=custom_network
)
# Or without a custom network
default_policy = GaussianPolicy(
    observation_shape=observation_shape,
    action_shape=action_shape,
    high_action=high_action,
    low_action=low_action
)
# Example observation
observation = torch.randn((1, observation_shape))
# Get action from the policy
action, log_prob, mean = policy.get_action(observation)
print("Action:", action)
print("Log Probability:", log_prob)
print("Mean:", mean)