culebra.fitness_function.feature_selection.KappaIndex class

class KappaIndex(training_data: Dataset, test_data: Dataset | None = None, test_prop: float | None = None, cv_folds: int | None = None, classifier: ClassifierMixin | None = None)

Construct a fitness function.

If test_data are provided, the whole training_data are used to train. Otherwise, if test_prop is provided, training_data are split (stratified) into training and test data each time evaluate() is called and a Monte Carlo cross validation is applied. Finally, if both test_data and test_prop are omitted, a k-fold cross-validation is applied.
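
The precedence described above can be sketched in plain Python. The helper name select_validation_mode and the returned labels are hypothetical and are not part of culebra; only the order of the checks is taken from the description.

```python
def select_validation_mode(test_data=None, test_prop=None):
    """Return the validation scheme implied by the constructor arguments.

    Hypothetical helper for illustration only; not part of culebra.
    """
    if test_data is not None:
        # Explicit test data: train on all the training data, test on test_data
        return "train_test"
    if test_prop is not None:
        # No test data, but a proportion: Monte Carlo cross-validation
        return "mccv"
    # Neither provided: k-fold cross-validation
    return "kfcv"
```

Under this reading, test_data takes priority over test_prop when both are given.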

Parameters:

Class attributes

class culebra.fitness_function.feature_selection.KappaIndex.Fitness

Handles the values returned by the evaluate() method within a Solution.

Fitness.weights = (1.0,)

Maximizes the validation Kappa index.

Fitness.names = ('Kappa',)

Name of the objective.

Fitness.thresholds = [0]

Similarity threshold for fitness comparisons.

Class methods

classmethod KappaIndex.load_pickle(filename: str) → Base

Load a pickled object from a file.

Parameters:

filename (str) – The file name.

Raises:

classmethod KappaIndex.set_fitness_thresholds(thresholds: float | Sequence[float]) → None

Set new fitness thresholds.

Modifies the thresholds of the Fitness objects generated by this fitness function.

Parameters:

thresholds (float or Sequence of float) – The new thresholds. If a single value is provided, the same threshold is used for all the objectives. A different threshold for each objective can be provided in a Sequence.

Raises:

classmethod KappaIndex.get_fitness_objective_threshold(obj_name: str) → float

Get the similarity threshold for the given objective.

Parameters:

obj_name (str) – Objective name whose threshold is returned

Raises:
  • TypeError – If obj_name isn’t a string

  • ValueError – If obj_name isn’t a valid objective name

classmethod KappaIndex.set_fitness_objective_threshold(obj_name: str, value: float) → None

Set a similarity threshold for the given fitness objective.

Parameters:
  • obj_name (str) – Objective name whose threshold is modified

  • value (float) – New value for the similarity threshold.

Raises:
  • TypeError – If obj_name isn’t a string or value isn’t a real number

  • ValueError – If obj_name isn’t a valid objective name or value is lower than 0
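
The checks documented for the two per-objective threshold methods can be sketched as follows. ThresholdTable is a hypothetical stand-in written for this example, not culebra code; it only mirrors the documented TypeError and ValueError conditions.

```python
from numbers import Real


class ThresholdTable:
    """Hypothetical per-objective threshold store; not part of culebra."""

    def __init__(self, names, thresholds):
        # Map each objective name to its similarity threshold
        self._thresholds = dict(zip(names, thresholds))

    def _check_name(self, obj_name):
        if not isinstance(obj_name, str):
            raise TypeError("obj_name must be a string")
        if obj_name not in self._thresholds:
            raise ValueError(f"{obj_name!r} is not a valid objective name")

    def get(self, obj_name):
        self._check_name(obj_name)
        return self._thresholds[obj_name]

    def set(self, obj_name, value):
        self._check_name(obj_name)
        if not isinstance(value, Real):
            raise TypeError("value must be a real number")
        if value < 0:
            raise ValueError("value must not be lower than 0")
        self._thresholds[obj_name] = value


# Usage with the single objective documented for this class
table = ThresholdTable(("Kappa",), [0])
table.set("Kappa", 0.01)
```

The name check runs before the value check, so an invalid obj_name is reported first. That ordering is a choice made for this sketch.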

Properties

property KappaIndex.is_noisy: bool

Return True if the fitness function is noisy.

property KappaIndex.num_obj: int

Get the number of objectives.

Type:

int

property KappaIndex.num_nodes: int

Return the problem graph’s number of nodes for ACO-based trainers.

Returns:

The problem graph’s number of nodes

Return type:

int

property KappaIndex.training_data: Dataset

Get and set the training dataset.

Getter:

Return the training dataset

Setter:

Set a new training dataset

Type:

Dataset

Raises:

TypeError – If set to an invalid dataset

property KappaIndex.test_data: Dataset | None

Get and set the test dataset.

Getter:

Return the test dataset

Setter:

Set a new test dataset

Type:

Dataset

Raises:

TypeError – If set to an invalid dataset

property KappaIndex.test_prop: float | None

Get and set the proportion of data used to test.

Getter:

Return the test data proportion

Setter:

Set a new value for the test data proportion. A real value in (0, 1) or None is expected

Type:

float

Raises:
  • TypeError – If set to a value which is not a real number

  • ValueError – If set to a value which is not in (0, 1)

property KappaIndex.classifier: ClassifierMixin

Get and set the classifier applied within this fitness function.

Getter:

Return the classifier

Setter:

Set a new classifier

Type:

ClassifierMixin

Raises:

TypeError – If set to a value which is not a classifier

Private properties

property KappaIndex._worst_score: float

Worst achievable score.

Type:

float

Methods

KappaIndex.save_pickle(filename: str) → None

Pickle this object and save it to a file.

Parameters:

filename (str) – The file name.

Raises:

KappaIndex.heuristic(species: Species) → Sequence[ndarray, ...]

Get the heuristic matrices for ACO-based trainers.

Parameters:

species (Species) – Species constraining the problem solutions

Raises:

TypeError – If species is not an instance of Species

Returns:

A tuple with only one heuristic matrix. Arcs between selectable features have a heuristic value of 1, while arcs involving any non-selectable feature or arcs from a feature to itself have a heuristic value of 0.

Return type:

Sequence of ndarray
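
The structure of the heuristic matrix described above can be illustrated with nested lists. The function heuristic_matrix and its arguments are hypothetical: the documented method takes a Species and returns ndarray objects, whereas this sketch takes the selectable feature indices directly.

```python
def heuristic_matrix(num_feats, selectable):
    """Build a num_feats x num_feats matrix of 0.0/1.0 values as nested lists.

    Hypothetical helper for illustration only; not part of culebra.
    """
    allowed = set(selectable)
    return [
        [
            # 1 only for arcs joining two different selectable features
            1.0 if i != j and i in allowed and j in allowed else 0.0
            for j in range(num_feats)
        ]
        for i in range(num_feats)
    ]


# Four features, of which feature 2 is assumed to be non-selectable
matrix = heuristic_matrix(4, [0, 1, 3])
```

In this example every entry in row 2 and column 2 is 0, as is the main diagonal.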

KappaIndex.evaluate(sol: Solution, index: int | None = None, representatives: Sequence[Solution] | None = None) → Tuple[float, ...]

Evaluate a solution.

Parameters:
  • sol (Solution) – Solution to be evaluated.

  • index (int, ignored) – Index where sol should be inserted in the representatives sequence to form a complete solution for the problem. Only used by cooperative problems

  • representatives (Sequence of Solution, ignored) – Representative solutions of each species being optimized. Only used by cooperative problems

Returns:

The fitness of sol

Return type:

tuple of float

Private methods

KappaIndex._score(y2, *, labels=None, weights=None, sample_weight=None)

Score predictions with scikit-learn's cohen_kappa_score().
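
For reference, unweighted Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two labelings and p_e is the agreement expected by chance. The function below is a minimal pure-Python sketch of that formula, not the scikit-learn implementation; it does not cover the labels, weights or sample_weight options.

```python
from collections import Counter


def cohen_kappa(y1, y2):
    """Unweighted Cohen's kappa between two equally long label sequences."""
    n = len(y1)
    if n == 0 or n != len(y2):
        raise ValueError("y1 and y2 must be non-empty and of equal length")
    # Observed agreement: fraction of positions where both labelings match
    p_o = sum(a == b for a, b in zip(y1, y2)) / n
    # Chance agreement computed from the marginal label frequencies
    c1, c2 = Counter(y1), Counter(y2)
    p_e = sum(c1[label] * c2[label] for label in c1) / (n * n)
    if p_e == 1.0:
        # Both labelings use one identical label: kappa is undefined
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean agreement worse than chance.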

KappaIndex._final_training_test_data(sol: Solution) → Tuple[Dataset, Dataset]

Get the final training and test data.

Parameters:

sol (Solution) – Solution to be evaluated. It is used to select the features from the datasets

Returns:

The final training and test datasets

Return type:

tuple of Dataset

KappaIndex._evaluate_train_test(training_data: Dataset, test_data: Dataset) → Tuple[float, ...]

Evaluate a solution.

This method must be overridden by subclasses to return a correct value.

Parameters:
  • training_data (Dataset) – The training dataset

  • test_data (Dataset) – The test dataset

Returns:

The fitness values for sol

Return type:

tuple of float

KappaIndex._evaluate_mccv(training_data: Dataset) → Tuple[float, ...]

Evaluate a solution.

The training_data are split (stratified) into training and test data according to test_prop each time the solution is evaluated and a Monte Carlo cross-validation is applied.

Parameters:

training_data (Dataset) – The training dataset

Returns:

The fitness values for sol

Return type:

tuple of float
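
A stratified random split of the kind described for this method can be sketched with the standard library alone. The function stratified_split is hypothetical and is not how culebra performs the split; in particular, the per-class rounding rule is an assumption of this example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_prop, rng):
    """Return (train_idx, test_idx), keeping class proportions roughly equal."""
    if not 0 < test_prop < 1:
        raise ValueError("test_prop must be in (0, 1)")
    # Group the sample indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Round the test size per class (an assumption of this sketch)
        n_test = round(len(shuffled) * test_prop)
        test_idx.extend(shuffled[:n_test])
        train_idx.extend(shuffled[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Eight samples of class 0 and four of class 1, with 25% held out for testing
labels = [0] * 8 + [1] * 4
train, test = stratified_split(labels, 0.25, random.Random(0))
```

Because a new random split is drawn on each call, repeated evaluations of the same solution may give different scores.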

KappaIndex._evaluate_kfcv(training_data: Dataset) → Tuple[float, ...]

Evaluate a solution.

A k-fold cross-validation is applied using the training_data with cv_folds folds.

Parameters:

training_data (Dataset) – The training dataset

Returns:

The fitness values for sol

Return type:

tuple of float
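
The basic partition behind k-fold cross-validation can be sketched as follows. The generator k_fold_indices is hypothetical. This page does not state whether the library shuffles or stratifies the folds, so the sketch shows only a plain, unshuffled partition in which each sample is used for testing exactly once.

```python
def k_fold_indices(n_samples, cv_folds):
    """Yield (train_idx, test_idx) pairs for plain, unshuffled k-fold CV."""
    if not 2 <= cv_folds <= n_samples:
        raise ValueError("cv_folds must be in [2, n_samples]")
    # Spread the remainder over the first folds so sizes differ by at most 1
    base, extra = divmod(n_samples, cv_folds)
    start = 0
    for fold in range(cv_folds):
        stop = start + base + (1 if fold < extra else 0)
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


# Ten samples split into three folds
folds = list(k_fold_indices(10, 3))
```

A fitness value would then typically be obtained by scoring the classifier on each test fold and combining the per-fold scores.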