culebra.tools module

Tools to automate the execution of experiments.

Since many interesting problems are based on data processing, this module provides the Dataset class to hold and manage the data samples.

Automated experimentation is also valuable when a Trainer method has to be run many times, so culebra provides this feature by means of the following classes:

  • The Batch class, which runs a batch of experiments with the same configuration

  • The EffectSize class, to keep the outcome of an effect size estimation over the results of several batches

  • The Evaluation class, a base class for the evaluation of trainers

  • The Experiment class, designed to run a single experiment with a Trainer

  • The Results class, to manage the results provided by the evaluation of any Trainer

  • The ResultsAnalyzer class, to perform statistical analysis over the results of several experiment batches

  • The ResultsComparison class, to keep the outcome of a comparison of the results of several batches

  • The TestOutcome class, to keep the outcome of a statistical test

Attributes

DEFAULT_ALPHA = 0.05

Default significance level for statistical tests.

DEFAULT_BATCH_STATS_FUNCTIONS = {'Avg': <function NDFrame._add_numeric_operations.<locals>.mean>, 'Max': <function NDFrame._add_numeric_operations.<locals>.max>, 'Min': <function NDFrame._add_numeric_operations.<locals>.min>, 'Std': <function NDFrame._add_numeric_operations.<locals>.std>}

Default statistics calculated for the results gathered from all the experiments.
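As a rough illustration of how such a name-to-function mapping might be applied, the sketch below aggregates a small table of per-experiment results with pandas. The `batch_stats` dictionary mirrors the default shown above; the `results` table, its column names and the `summary` layout are hypothetical, and nothing here is taken from culebra's own implementation.

```python
import pandas as pd

# Mirrors the names in DEFAULT_BATCH_STATS_FUNCTIONS (assumption: the
# functions in the repr correspond to these pandas DataFrame methods).
batch_stats = {
    "Avg": pd.DataFrame.mean,
    "Max": pd.DataFrame.max,
    "Min": pd.DataFrame.min,
    "Std": pd.DataFrame.std,
}

# Hypothetical results: one row per experiment, one column per measure.
results = pd.DataFrame(
    {"fitness": [0.8, 0.9, 1.0], "runtime": [10.0, 12.0, 14.0]}
)

# Each function reduces the table column-wise to a Series; collecting
# them yields one row per measure and one column per statistic.
summary = pd.DataFrame(
    {name: func(results) for name, func in batch_stats.items()}
)
print(summary)
```

Note that the pandas `std` method computes the sample standard deviation (`ddof=1`) by default.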

DEFAULT_CONFIG_SCRIPT_FILENAME = 'config.py'

Default file name for configuration files.

DEFAULT_FEATURE_METRIC_FUNCTIONS = {'Rank': <function Metrics.rank>, 'Relevance': <function Metrics.relevance>}

Default metrics calculated for the features in the set of solutions.

DEFAULT_HOMOSCEDASTICITY_TEST = <function bartlett>

Default homoscedasticity test.

DEFAULT_OUTLIER_PROPORTION = 0.05

Expected outlier proportion per class.

DEFAULT_NORMALITY_TEST = <function shapiro>

Default normality test.
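A minimal sketch of how the default significance level and the two default tests could be combined, assuming the functions shown in the reprs are `scipy.stats.shapiro` and `scipy.stats.bartlett`. The helper names `looks_normal` and `looks_homoscedastic` and the sample data are hypothetical; how ResultsAnalyzer actually applies these defaults is not shown here.

```python
from scipy.stats import bartlett, shapiro

ALPHA = 0.05  # mirrors DEFAULT_ALPHA


def looks_normal(sample, alpha=ALPHA):
    """Return True if the Shapiro-Wilk test does not reject normality."""
    _, p_value = shapiro(sample)
    return bool(p_value > alpha)


def looks_homoscedastic(*samples, alpha=ALPHA):
    """Return True if Bartlett's test does not reject equal variances."""
    _, p_value = bartlett(*samples)
    return bool(p_value > alpha)


# Hypothetical results from two batches; the second is the first one
# shifted by a constant, so both have the same variance.
batch_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
batch_b = [11.0, 12.0, 13.0, 14.0, 15.0, 16.0]

print(looks_normal(batch_a))
print(looks_homoscedastic(batch_a, batch_b))
```

In both helpers the null hypothesis (normality, or equal variances) is kept whenever the p-value exceeds the significance level.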

DEFAULT_NUM_EXPERIMENTS = 1

Default number of experiments in the batch.

DEFAULT_P_ADJUST = 'fdr_tsbky'

Default method for adjusting the p-values in Dunn’s test.

DEFAULT_RESULTS_BASE_FILENAME = 'results'

Default base name for results files.

DEFAULT_RUN_SCRIPT_FILENAME = 'run.py'

Default file name for the script to run an evaluation.

DEFAULT_SEP = '\\s+'

Default column separator used within dataset files. The value is a regular expression that matches one or more whitespace characters.
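To make the separator concrete, the sketch below splits whitespace-delimited rows with the standard `re` module. The value displayed above is the repr of the three-character string written `r"\s+"` in raw form. The `split_row` helper and the sample text are hypothetical, and this is not necessarily how the Dataset class parses files.

```python
import re

SEP = r"\s+"  # same string as DEFAULT_SEP, written as a raw literal


def split_row(line, sep=SEP):
    """Split one text row into columns on runs of whitespace."""
    return re.split(sep, line.strip())


# Hypothetical dataset rows mixing single spaces, repeated spaces and tabs.
text = "5.1  3.5\t1.4 0\n4.9 3.0   1.4\t0"
rows = [split_row(line) for line in text.splitlines()]
print(rows)
```

Because the pattern matches any run of whitespace, columns separated by single spaces, several spaces or tabs are all handled the same way.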

DEFAULT_SMOTE_NUM_NEIGHBORS = 5

Default number of neighbors for SMOTE.

DEFAULT_STATS_FUNCTIONS = {'Avg': <function mean>, 'Max': <function amax>, 'Min': <function amin>, 'Std': <function std>}

Default statistics calculated for the results.
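The following sketch applies a mapping of this shape to a list of values with NumPy. It assumes the functions in the repr are NumPy's; `np.max` and `np.min` stand in for the `amax` and `amin` names shown above, and the `values` list is hypothetical.

```python
import numpy as np

# Mirrors the names in DEFAULT_STATS_FUNCTIONS (assumption: NumPy functions).
stats_functions = {
    "Avg": np.mean,
    "Max": np.max,
    "Min": np.min,
    "Std": np.std,
}

# Hypothetical values of one measure across several solutions.
values = [1.0, 2.0, 3.0, 4.0]

# Apply every statistic to the same values and keep plain Python floats.
summary = {name: float(func(values)) for name, func in stats_functions.items()}
print(summary)
```

`np.std` computes the population standard deviation (`ddof=0`) by default.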

EXCEL_FILE_EXTENSION = '.xlsx'

File extension for Excel datasheets.