culebra.trainer.abc.DistributedTrainer class

class DistributedTrainer(fitness_function: FitnessFunction, subtrainer_cls: type[SingleSpeciesTrainer], max_num_iters: int | None = None, custom_termination_func: Callable[[SingleSpeciesTrainer], bool] | None = None, num_subtrainers: int | None = None, representation_size: int | None = None, representation_freq: int | None = None, representation_topology_func: Callable[[int, int, Any], list[int]] | None = None, representation_topology_func_params: dict[str, Any] | None = None, representation_selection_func: Callable[[list[Solution], Any], Solution] | None = None, representation_selection_func_params: dict[str, Any] | None = None, checkpoint_activation: bool | None = None, checkpoint_freq: int | None = None, checkpoint_filename: str | None = None, verbosity: bool | None = None, random_seed: int | None = None, **subtrainer_params: Any)

Bases: Trainer

Create a new trainer.

Parameters:
Raises:
  • TypeError – If any argument is not of the appropriate type

  • ValueError – If any argument has an incorrect value

Class attributes

DistributedTrainer.objective_stats = {'Avg': <function mean>, 'Max': <function max>, 'Min': <function min>, 'Std': <function std>}

Statistics calculated for each objective.

DistributedTrainer.stats_names = ('Iter', 'NEvals')

Statistics calculated each iteration.
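These aggregations can be reproduced with the standard library alone. A minimal sketch of applying such a mapping to a set of per-objective fitness values (the sample values are made up, and statistics.pstdev stands in for NumPy's population std):

```python
from statistics import mean, pstdev

# Hypothetical fitness values of one objective across a population
objective_values = [1.0, 2.0, 3.0, 4.0]

# Mapping mirroring DistributedTrainer.objective_stats
# (statistics.pstdev matches numpy.std's population standard deviation)
objective_stats = {
    "Avg": mean,
    "Max": max,
    "Min": min,
    "Std": pstdev,
}

stats = {name: func(objective_values) for name, func in objective_stats.items()}
print(stats)  # Avg 2.5, Max 4.0, Min 1.0, Std ≈ 1.118
```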

Class methods

classmethod DistributedTrainer.load(filename: str) Base

Load a serialized object from a file.

Parameters:

filename (str) – The file name.

Returns:

The loaded object

Raises:

Properties

property DistributedTrainer.checkpoint_activation: bool

Checkpointing activation.

Returns:

True if checkpointing is active, or False otherwise

Return type:

bool

Setter:

Modify the checkpointing activation

Parameters:

value (bool) – New value for the checkpoint activation. If set to None, _default_checkpoint_activation is chosen

Raises:

TypeError – If value is not a boolean value

property DistributedTrainer.checkpoint_filename: str

Checkpoint file path.

Return type:

str

Setter:

Modify the checkpoint file path

Parameters:

value (str) – New value for the checkpoint file path. If set to None, _default_checkpoint_filename is chosen

Raises:

property DistributedTrainer.checkpoint_freq: int

Checkpoint frequency.

Return type:

int

Setter:

Modify the checkpoint frequency

Parameters:

value (int) – New value for the checkpoint frequency. If set to None, _default_checkpoint_freq is chosen

Raises:

property DistributedTrainer.container: Trainer | None

Container of this trainer.

The trainer container is only used by distributed trainers. For the rest of trainers it defaults to None.

Return type:

Trainer

Setter:

Set a new value for container of this trainer

Parameters:

value (Trainer) – New value for the container or None

Raises:

TypeError – If value is not a valid trainer

property DistributedTrainer.current_iter: int | None

Current iteration.

Returns:

The current iteration or None if the search has not been done yet

Return type:

int

property DistributedTrainer.custom_termination_func: Callable[[Trainer], bool] | None

Custom termination criterion.

Although the trainer always stops when the maximum number of iterations (max_num_iters) is reached, a custom termination criterion can be set to detect convergence and stop the trainer earlier. This custom termination criterion must be a function that receives the trainer as its only argument and returns a boolean value: True if the search should terminate, or False otherwise.

If more than one argument is needed to define the termination condition, functools.partial() can be used:

from functools import partial

def my_crit(trainer, max_iters):
    return trainer.current_iter >= max_iters

trainer.custom_termination_func = partial(my_crit, max_iters=10)

Setter:

Set a new custom termination criterion

Parameters:

func (Callable) – The new custom termination criterion. If set to None, the default termination criterion is used

Raises:

TypeError – If func is not callable

property DistributedTrainer.fitness_function: FitnessFunction

Training fitness function.

Return type:

FitnessFunction

Setter:

Set a new fitness function

Parameters:

func (FitnessFunction) – The new training fitness function

Raises:

TypeError – If func is not a valid fitness function

property DistributedTrainer.index: int

Trainer index.

The trainer index is only used by distributed trainers. For the rest of trainers _default_index is used.

Return type:

int

Setter:

Set a new value for trainer index.

Parameters:

value (int) – New value for the trainer index. If set to None, _default_index is chosen

Raises:

property DistributedTrainer.logbook: Logbook | None

Trainer logbook.

Returns:

A logbook with the statistics of the search or None if the search has not been done yet

Return type:

Logbook

property DistributedTrainer.max_num_iters: int

Maximum number of iterations.

Return type:

int

Setter:

Set a new value for the maximum number of iterations

Parameters:

value (int) – The new maximum number of iterations. If set to None, the default maximum number of iterations, _default_max_num_iters, is chosen

Raises:

property DistributedTrainer.num_evals: int | None

Number of evaluations performed while training.

Returns:

The number of evaluations or None if the search has not been done yet

Return type:

int

property DistributedTrainer.num_subtrainers: int

Number of subtrainers.

Return type:

int

Setter:

Set a new value for the number of subtrainers

Parameters:

value (int) – The new number of subtrainers. If set to None, _default_num_subtrainers is chosen

Raises:

property DistributedTrainer.random_seed: int

Random seed used by this trainer.

Return type:

int

Setter:

Set a new value for the random seed

Parameters:

value (int) – New value

property DistributedTrainer.representation_freq: int

Number of iterations between each sending of representatives.

Return type:

int

Setter:

Set a new value for the frequency

Parameters:

value (int) – The new frequency. If set to None, _default_representation_freq is chosen

Raises:

property DistributedTrainer.representation_selection_func: Callable[[list[Solution], Any], Solution]

Representation selection policy function.

Returns:

A function that chooses which solutions are selected as representatives of each subtrainer

Return type:

Callable

Setter:

Set new representation selection policy function.

Parameters:

func (Callable) – The new function. If set to None, _default_representation_selection_func is chosen

Raises:

TypeError – If func is not callable
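A selection policy function receives a subtrainer's population (plus any extra parameters) and returns one solution to act as a representative. A hedged sketch of a best-fitness policy, assuming solutions expose a scalar fitness attribute (the Sol class is a stand-in for illustration, not culebra's actual Solution API):

```python
from dataclasses import dataclass

@dataclass
class Sol:
    """Stand-in for culebra's Solution, for illustration only."""
    fitness: float

def best_selection(pop, *_args):
    """Pick the highest-fitness solution as the representative."""
    return max(pop, key=lambda sol: sol.fitness)

pop = [Sol(0.2), Sol(0.9), Sol(0.5)]
rep = best_selection(pop)
print(rep.fitness)  # 0.9
```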

property DistributedTrainer.representation_selection_func_params: dict[str, Any]

Parameters of the representation selection function.

Return type:

dict

Setter:

Set new parameters

Parameters:

params (dict) – The new parameters. If set to None, _default_representation_selection_func_params is chosen

Raises:

TypeError – If params is not a dict

property DistributedTrainer.representation_size: int

Representation size.

Returns:

The number of representatives sent to the other subtrainers

Return type:

int

Setter:

Set a new representation size

Parameters:

size (int) – The new size. If set to None, _default_representation_size is chosen

Raises:

property DistributedTrainer.representation_topology_func: Callable[[int, int, Any], list[int]]

Representation topology function.

Return type:

Callable

Setter:

Set new representation topology function

Parameters:

func (Callable) – The new function. If set to None, _default_representation_topology_func is chosen

Raises:

TypeError – If func is not callable
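A topology function decides which subtrainers receive a given subtrainer's representatives: it takes the sender's index, the number of subtrainers and any extra parameters, and returns the destination indices. A hedged sketch of a simple ring topology (illustrative only; culebra ships its own topology functions):

```python
def ring_topology(index, num_subtrainers, offset=1):
    """Each subtrainer sends to its offset-th neighbour on a ring."""
    return [(index + offset) % num_subtrainers]

# With 4 subtrainers, subtrainer 3 wraps around and sends to subtrainer 0
print(ring_topology(3, 4))  # [0]
```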

property DistributedTrainer.representation_topology_func_params: dict[str, Any]

Parameters of the representation topology function.

Return type:

dict

Setter:

Set new parameters

Parameters:

params (dict) – The new parameters. If set to None, _default_representation_topology_func_params is chosen

Raises:

TypeError – If params is not a dict

property DistributedTrainer.representatives: list[list[Solution | None]] | None

Representatives of the other species.

Only used by cooperative trainers. If the trainer does not use representatives, None is returned.

Return type:

list[list[Solution]]

property DistributedTrainer.runtime: float | None

Training runtime.

Returns:

The training runtime or None if the search has not been done yet.

Return type:

float

property DistributedTrainer.subtrainer_checkpoint_filenames: Generator[str, None, None]

Checkpoint file names for all the subtrainers.

Returns:

A generator of the filenames

Return type:

Generator[str, None, None]

property DistributedTrainer.subtrainer_cls: type[SingleSpeciesTrainer]

Trainer class to handle the subtrainers.

Each subtrainer will be handled by a single-species trainer.

Return type:

type[SingleSpeciesTrainer]

Setter:

Set a new trainer class to handle the subtrainers

Parameters:

cls (type[SingleSpeciesTrainer]) – The new class

Raises:

TypeError – If cls is not a valid trainer class

property DistributedTrainer.subtrainer_params: dict[str, Any]

Custom parameters for the subtrainers.

Return type:

dict

Setter:

Set new parameters

Parameters:

params (dict) – The new parameters

Raises:

TypeError – If params is not a dict

property DistributedTrainer.subtrainers: list[SingleSpeciesTrainer] | None

Subtrainers.

One single-species trainer for each subtrainer.

Return type:

list[SingleSpeciesTrainer]

property DistributedTrainer.verbosity: bool

Verbosity of this trainer.

Return type:

bool

Setter:

Set a new value for the verbosity

Parameters:

value (bool) – The verbosity. If set to None, _default_verbosity is chosen

Raises:

TypeError – If value is not boolean

Private properties

property DistributedTrainer._default_checkpoint_activation: bool

Default checkpointing activation.

Returns:

DEFAULT_CHECKPOINT_ACTIVATION

Return type:

bool

property DistributedTrainer._default_checkpoint_filename: str

Default checkpointing file name.

Returns:

DEFAULT_CHECKPOINT_FILENAME

Return type:

str

property DistributedTrainer._default_checkpoint_freq: int

Default checkpointing frequency.

Returns:

DEFAULT_CHECKPOINT_FREQ

Return type:

int

property DistributedTrainer._default_index: int

Default index.

Returns:

DEFAULT_INDEX

Return type:

int

property DistributedTrainer._default_max_num_iters: int

Default maximum number of iterations.

Returns:

DEFAULT_MAX_NUM_ITERS

Return type:

int

property DistributedTrainer._default_num_subtrainers: int

Default number of subtrainers.

Returns:

DEFAULT_NUM_SUBTRAINERS

Return type:

int

property DistributedTrainer._default_representation_freq: int

Default number of iterations between each sending of representatives.

Returns:

DEFAULT_REPRESENTATION_FREQ

Return type:

int

property DistributedTrainer._default_representation_selection_func: Callable[[list[Solution], Any], Solution]

Default selection policy function to choose the representatives.

Returns:

DEFAULT_REPRESENTATION_SELECTION_FUNC

Return type:

Callable

property DistributedTrainer._default_representation_selection_func_params: dict[str, Any]

Default parameters for the representatives selection policy function.

Returns:

DEFAULT_REPRESENTATION_SELECTION_FUNC_PARAMS

Return type:

dict

property DistributedTrainer._default_representation_size: int

Default number of representatives sent to the other subtrainers.

Returns:

DEFAULT_REPRESENTATION_SIZE

Return type:

int

property DistributedTrainer._default_representation_topology_func: Callable[[int, int, Any], list[int]]

Default topology function.

This property must be overridden by subclasses to return a correct value.

Return type:

Callable

Raises:

NotImplementedError – If it has not been overridden

property DistributedTrainer._default_representation_topology_func_params: dict[str, Any]

Default parameters for the default topology function.

This property must be overridden by subclasses to return a correct value.

Return type:

dict

Raises:

NotImplementedError – If it has not been overridden

property DistributedTrainer._default_verbosity: bool

Default verbosity.

Returns:

DEFAULT_VERBOSITY

Return type:

bool

property DistributedTrainer._subtrainer_suffixes: Generator[str, None, None]

Suffixes for the different subtrainers.

Can be used to generate the subtrainers’ names, checkpoint files, etc.

Returns:

A generator of the suffixes

Return type:

Generator[str, None, None]
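These suffixes typically encode each subtrainer's index. A hedged sketch of how such a generator could be combined with the trainer's checkpoint file name to derive per-subtrainer checkpoint paths (the "_N" suffix scheme is an assumption for illustration, not necessarily culebra's exact format):

```python
from os.path import splitext

def subtrainer_suffixes(num_subtrainers):
    """Yield one suffix per subtrainer: '_0', '_1', ..."""
    for index in range(num_subtrainers):
        yield f"_{index}"

def subtrainer_checkpoint_filenames(checkpoint_filename, num_subtrainers):
    """Insert each suffix before the checkpoint file extension."""
    base, ext = splitext(checkpoint_filename)
    for suffix in subtrainer_suffixes(num_subtrainers):
        yield f"{base}{suffix}{ext}"

print(list(subtrainer_checkpoint_filenames("checkpoint.gz", 3)))
# ['checkpoint_0.gz', 'checkpoint_1.gz', 'checkpoint_2.gz']
```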

Static methods

abstract static DistributedTrainer.receive_representatives(subtrainer: SingleSpeciesTrainer) None

Receive representative solutions.

This method must be overridden by subclasses.

Parameters:

subtrainer (SingleSpeciesTrainer) – The subtrainer receiving representatives

Raises:

NotImplementedError – If it has not been overridden

abstract static DistributedTrainer.send_representatives(subtrainer: SingleSpeciesTrainer) None

Send representatives.

This method must be overridden by subclasses.

Parameters:

subtrainer (SingleSpeciesTrainer) – The sender subtrainer

Raises:

NotImplementedError – If it has not been overridden

Methods

DistributedTrainer.best_representatives() list[list[Solution]] | None

Return a list of representatives from each species.

Only used for cooperative trainers.

Returns:

A list of representative lists if the trainer is cooperative, or None otherwise

Return type:

list[list[Solution]]

abstract DistributedTrainer.best_solutions() tuple[HallOfFame]

Get the best solutions found for each species.

This method must be overridden by subclasses to return a correct value.

Returns:

One Hall of Fame for each species

Return type:

tuple[HallOfFame]

Raises:

NotImplementedError – If it has not been overridden

DistributedTrainer.dump(filename: str) None

Serialize this object and save it to a file.

Parameters:

filename (str) – The file name.

Raises:

DistributedTrainer.evaluate(sol: Solution, fitness_func: FitnessFunction | None = None, index: int | None = None, representatives: Sequence[Sequence[Solution | None]] | None = None) None

Evaluate one solution.

Its fitness will be modified according to the fitness function results. Besides, if called during training, the number of evaluations will also be updated.

Parameters:
  • sol (Solution) – The solution

  • fitness_func (FitnessFunction) – The fitness function. If omitted, the default training fitness function (fitness_function) is used

  • index (int) – Index where sol should be inserted in the representatives sequence to form a complete solution for the problem. If omitted, the trainer's index property is used

  • representatives (Sequence[Sequence[Solution]]) – Sequence of representatives of other species or None (if no representatives are needed to evaluate sol). If omitted, the current value of representatives is used

DistributedTrainer.reset() None

Reset the trainer.

Delete the state of the trainer (with _reset_state()) and also all the internal data structures needed to perform the search (with _reset_internals()).

This method should be invoked each time a hyperparameter is modified.

DistributedTrainer.test(best_found: Sequence[HallOfFame], fitness_func: FitnessFunction | None = None, representatives: Sequence[Sequence[Solution]] | None = None) None

Apply the test fitness function to the solutions found.

Update the solutions in best_found with their test fitness.

Parameters:
Raises:
  • TypeError – If any parameter has a wrong type

  • ValueError – If any parameter has an invalid value.

DistributedTrainer.train(state_proxy: DictProxy | None = None) None

Perform the training process.

Parameters:

state_proxy (DictProxy) – Dictionary proxy to copy the output state of the training procedure. Only used if train() is executed within a multiprocessing.Process. Defaults to None

Private methods

DistributedTrainer._default_termination_func() bool

Default termination criterion.

Returns:

True if max_num_iters iterations have been run

Return type:

bool

abstract DistributedTrainer._do_iteration() None

Implement an iteration of the search process.

This abstract method should be implemented by subclasses to provide the desired behavior.

DistributedTrainer._do_iteration_stats() None

Perform the iteration stats.

This method should be implemented by subclasses in order to record the appropriate statistics.

DistributedTrainer._finish_iteration() None

Finish an iteration.

Finish the iteration metrics (number of evaluations, execution time) after each iteration is run.

DistributedTrainer._finish_search() None

Finish the search process.

This method is called after the search has finished. It can be overridden to perform any treatment of the solutions found.

abstract DistributedTrainer._generate_subtrainers() None

Generate the subtrainers.

Also assign an index and a container to each SingleSpeciesTrainer subtrainer, and change the subtrainers’ checkpoint_filename according to the container checkpointing file name and each subtrainer index.

Finally, the _preprocess_iteration() and _postprocess_iteration() methods of the subtrainer_cls class are dynamically overridden, in order to allow solutions exchange between subtrainers, if necessary.

This method must be overridden by subclasses.

Raises:

NotImplementedError – If it has not been overridden

DistributedTrainer._get_state() dict[str, Any]

Return the state of this trainer.

Default state is a dictionary composed of the values of the logbook, num_evals, runtime, current_iter and representatives trainer properties, along with a private boolean attribute that indicates whether the search has finished, and the states of the random and numpy.random modules.

If subclasses use any more properties to keep their state, the _get_state() and _set_state() methods must be overridden to take into account such properties.

Return type:

dict
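The state-dictionary pattern described above can be sketched with a minimal stand-in class (illustration only; culebra's real state also includes the logbook, runtime and numpy.random state):

```python
import random

class TinyTrainer:
    """Minimal stand-in illustrating the _get_state/_set_state pattern."""

    def __init__(self):
        self.current_iter = 0
        self.num_evals = 0

    def _get_state(self):
        return {
            "current_iter": self.current_iter,
            "num_evals": self.num_evals,
            "rnd_state": random.getstate(),
        }

    def _set_state(self, state):
        self.current_iter = state["current_iter"]
        self.num_evals = state["num_evals"]
        random.setstate(state["rnd_state"])

t = TinyTrainer()
t.current_iter, t.num_evals = 5, 120
saved = t._get_state()
value_then = random.random()          # consume the generator once
t._set_state(saved)                   # rewind it
assert random.random() == value_then  # the same stream resumes
```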

DistributedTrainer._init_internals() None

Set up the trainer internal data structures to start searching.

Overridden to create the subtrainers and communication queues.

DistributedTrainer._init_representatives() None

Init the representatives of the other species.

Only used for cooperative approaches, which need representatives of all the species to form a complete solution for the problem. Cooperative subclasses of the Trainer class should override this method to get the representatives of the other species initialized.

DistributedTrainer._init_search() None

Init the search process.

Initialize the state of the trainer (with _init_state()) and all the internal data structures needed (with _init_internals()) to perform the search.

DistributedTrainer._init_state() None

Init the trainer state.

If there is any checkpoint file, the state is initialized from it with the _load_state() method. Otherwise a new initial state is generated with the _new_state() method.
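The load-or-create logic can be sketched as follows (illustrative only; the file format and helper names are assumptions, not culebra's actual implementation):

```python
import os
import pickle

def init_state(checkpoint_filename, new_state):
    """Resume from a checkpoint if one exists, else start fresh."""
    if os.path.isfile(checkpoint_filename):
        # Corresponds to _load_state()
        with open(checkpoint_filename, "rb") as f:
            return pickle.load(f)
    # Corresponds to _new_state()
    return new_state()

# No checkpoint file exists, so a new initial state is generated
state = init_state("missing.chk", lambda: {"current_iter": 0})
print(state)  # {'current_iter': 0}
```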

DistributedTrainer._load_state() None

Load the state of the last checkpoint.

Raises:

Exception – If the checkpoint file can’t be loaded

DistributedTrainer._new_state() None

Generate a new trainer state.

Overridden to set the logbook to None, since the final logbook will be generated from the subtrainers’ logbook, once the trainer has finished.

DistributedTrainer._postprocess_iteration() None

Postprocess after doing the iteration.

Subclasses should override this method to perform any postprocessing after an iteration.

DistributedTrainer._preprocess_iteration() None

Preprocess before doing the iteration.

Subclasses should override this method to perform any preprocessing before an iteration.

DistributedTrainer._reset_internals() None

Reset the internal structures of the trainer.

Overridden to reset the subtrainers and communication queues.

DistributedTrainer._reset_state() None

Reset the trainer state.

If subclasses override the _new_state() method to add any new property to their state, this method should also be overridden to reset the full state of the trainer.

DistributedTrainer._save_state() None

Save the state at a new checkpoint.

Raises:

Exception – If the checkpoint file can’t be written

DistributedTrainer._search() None

Apply the search algorithm.

Execute the trainer until the termination condition is met. Each iteration comprises the _start_iteration(), _preprocess_iteration(), _do_iteration(), _postprocess_iteration(), _finish_iteration() and _do_iteration_stats() steps.

DistributedTrainer._set_cooperative_fitness(sol: Solution, fitness_trials_values: Sequence[tuple[float]]) None

Estimate a solution fitness from multiple evaluation trials.

The fitness is estimated as the average of the fitness trial values. Trainers requiring a different estimation should override this method.

Parameters:
  • sol (Solution) – The solution

  • fitness_trials_values (Sequence[tuple[float]]) – Sequence of fitness trial values. Each trial should be obtained within a different context in a cooperative trainer approach
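The per-objective averaging can be sketched as follows (fitness values are tuples with one component per objective; the helper name is illustrative):

```python
from statistics import mean

def average_fitness(fitness_trials_values):
    """Average each objective component across all trials."""
    return tuple(mean(obj) for obj in zip(*fitness_trials_values))

# Three trials of a two-objective fitness
trials = [(1.0, 4.0), (2.0, 6.0), (3.0, 8.0)]
print(average_fitness(trials))  # (2.0, 6.0)
```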

DistributedTrainer._set_state(state: dict[str, Any]) None

Set the state of this trainer.

If subclasses use any more properties to keep their state, the _get_state() and _set_state() methods must be overridden to take into account such properties.

Parameters:

state (dict) – The last loaded state

DistributedTrainer._start_iteration() None

Start an iteration.

Prepare the iteration metrics (number of evaluations, execution time) before each iteration is run.

DistributedTrainer._termination_criterion() bool

Control the search termination.

Returns:

True if either the default termination criterion or the custom termination criterion is met. The default termination criterion is implemented by the _default_termination_func() method. A custom termination criterion can be set through the custom_termination_func property.

Return type:

bool