MC_metric

timecave.validation_strategy_metrics.MC_metric(estimated_error_list, test_error_list, metric)

Compute validation strategy metrics for N different experiments (MC stands for Monte Carlo).

This function processes the results of a Monte Carlo experiment and outputs a statistical summary of the results. This can be useful if one needs to analyse the performance of a given validation method on several different time series or using different models.
Users may provide a custom metric if they so desire, but it must have the same function signature as the metrics provided by this package (see the Examples section for an illustration of the expected signature).

Parameters:

    estimated_error_list : list[float | int]
        List of estimated (i.e. validation) errors, one for each experiment / trial. Required.

    test_error_list : list[float | int]
        List of test errors, one for each experiment / trial. Required.

    metric : callable
        Validation strategy metric. Required.

Returns:

    dict
        A statistical summary of the results.

Raises:

    ValueError
        If the estimated error list and the test error list differ in length.

See also

under_over_estimation: Computes separate statistics for overestimation and underestimation cases.

Examples:

>>> from timecave.validation_strategy_metrics import PAE, MC_metric
>>> true_errors = [10, 30, 10, 50];
>>> validation_errors = [20, 20, 50, 30];
>>> MC_metric(validation_errors, true_errors, PAE)
{'Mean': 5.0, 'Median': 0.0, '1st_Quartile': -12.5, '3rd_Quartile': 17.5, 'Minimum': -20.0, 'Maximum': 40.0, 'Standard_deviation': 22.9128784747792}
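
Any callable that takes an estimated (validation) error and a test error and returns a single number can be passed as the metric. As a minimal sketch of the expected signature, the squared_error function below is a hypothetical custom metric and is not part of this package:

>>> def squared_error(estimated_error, test_error):
...     return (estimated_error - test_error) ** 2
>>> results = MC_metric(validation_errors, true_errors, squared_error)

The returned dictionary has the same keys as above (Mean, Median, quartiles, Minimum, Maximum and Standard_deviation), here computed on the squared errors.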

If the lengths of the estimated error and test error lists do not match, an exception is thrown:

>>> MC_metric(validation_errors, [10], PAE)
Traceback (most recent call last):
...
ValueError: The estimated error and test error lists must have the same length.
Source code in timecave/validation_strategy_metrics.py
def MC_metric(estimated_error_list: list[float | int], test_error_list: list[float | int], metric: callable) -> dict:

    """
    Compute validation strategy metrics for N different experiments (MC stands for Monte Carlo).

    This function processes the results of a Monte Carlo experiment and outputs a statistical summary of the results. \
    This can be useful if one needs to analyse the performance of a given validation method on several different time series or using different models.  
    Users may provide a custom metric if they so desire, but it must have the same function signature as the metrics provided by this package.

    Parameters
    ----------
    estimated_error_list : list[float  |  int]
        List of estimated (i.e. validation) errors, one for each experiment / trial.

    test_error_list : list[float  |  int]
        List of test errors, one for each experiment / trial.

    metric : callable
        Validation strategy metric.

    Returns
    -------
    dict
        A statistical summary of the results.

    Raises
    ------
    ValueError
        If the estimated error list and the test error list differ in length.

    See also
    --------
    [under_over_estimation](under_over.md):
        Computes separate statistics for overestimation and underestimation cases.

    Examples
    --------
    >>> from timecave.validation_strategy_metrics import PAE, MC_metric
    >>> true_errors = [10, 30, 10, 50];
    >>> validation_errors = [20, 20, 50, 30];
    >>> MC_metric(validation_errors, true_errors, PAE)
    {'Mean': 5.0, 'Median': 0.0, '1st_Quartile': -12.5, '3rd_Quartile': 17.5, 'Minimum': -20.0, 'Maximum': 40.0, 'Standard_deviation': 22.9128784747792}

    If the lengths of the estimated error and test error lists do not match, an exception is thrown:

    >>> MC_metric(validation_errors, [10], PAE)
    Traceback (most recent call last):
    ...
    ValueError: The estimated error and test error lists must have the same length.
    """

    if(len(estimated_error_list) != len(test_error_list)):

        raise ValueError("The estimated error and test error lists must have the same length.");

    metric_array = np.zeros(len(estimated_error_list));

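    # Evaluate the validation strategy metric for each (validation error, test error) pair.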
    for ind, (val_error, test_error) in enumerate(zip(estimated_error_list, test_error_list)):

        metric_array[ind] = metric(val_error, test_error);

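    # Summarise the metric values across all experiments.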
    mean = metric_array.mean();
    minimum = metric_array.min();
    maximum = metric_array.max();
    median = np.median(metric_array);
    Q1 = np.quantile(metric_array, 0.25);
    Q3 = np.quantile(metric_array, 0.75);
    std = metric_array.std();

    results = {"Mean": mean,
               "Median": median,
               "1st_Quartile": Q1,
               "3rd_Quartile": Q3,
               "Minimum": minimum,
               "Maximum": maximum,
               "Standard_deviation": std};

    return results;