
under_over_estimation

timecave.validation_strategy_metrics.under_over_estimation(estimated_error_list, test_error_list, metric)

Compute separate validation strategy metrics for underestimation and overestimation instances (for N different experiments).

This function processes the results of a Monte Carlo experiment and outputs two separate sets of summary statistics: one for cases where the true error is underestimated, and another one for cases where the validation method overestimates the error. This can be useful if one needs to analyse the performance of a given validation method on several different time series or using different models.
Users may provide a custom metric if they so desire, but it must have the same function signature as the metrics provided by this package.
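
A custom metric might look like the following sketch. The name relative_error and its formula are purely illustrative; the assumption here is that a metric receives a single estimated error and the corresponding test error and returns a number, which is the signature the built-in PAE appears to use.

    from timecave.validation_strategy_metrics import under_over_estimation

    def relative_error(estimated_error: float, test_error: float) -> float:
        # Illustrative custom metric: signed error relative to the true (test) error.
        return (estimated_error - test_error) / test_error

    # Used exactly like the built-in metrics: two error lists plus the metric callable.
    under_stats, over_stats = under_over_estimation([20, 20], [10, 30], relative_error)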

Parameters:

    estimated_error_list : list[float | int], required
        List of estimated (i.e. validation) errors, one for each experiment / trial.

    test_error_list : list[float | int], required
        List of test errors, one for each experiment / trial.

    metric : callable, required
        Validation strategy metric.

Returns:

    tuple[dict]
        Separate statistical summaries for the underestimation and overestimation cases. The first dictionary is for the underestimation cases.

Raises:

    ValueError
        If the estimated error list and the test error list differ in length.

See also

MC_metric: Computes relevant statistics for the whole Monte Carlo experiment (i.e. does not differentiate between overestimation and underestimation).
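
For comparison, a rough sketch of the non-split variant, based on how MC_metric is called in the source code below (two error lists plus a metric, returning a single dictionary of statistics):

    from timecave.validation_strategy_metrics import MC_metric, PAE

    # One statistical summary over all trials, with no under/over split.
    overall_stats = MC_metric([20, 20, 50, 30], [10, 30, 10, 50], PAE)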

Examples:

>>> from timecave.validation_strategy_metrics import under_over_estimation, PAE
>>> true_errors = [10, 30, 10, 50];
>>> validation_errors = [20, 20, 50, 30];
>>> under_over_estimation(validation_errors, true_errors, PAE)
({'Mean': -15.0, 'Median': -15.0, '1st_Quartile': -17.5, '3rd_Quartile': -12.5, 'Minimum': -20.0, 'Maximum': -10.0, 'Standard_deviation': 5.0, 'N': 2, '%': 50.0}, {'Mean': 25.0, 'Median': 25.0, '1st_Quartile': 17.5, '3rd_Quartile': 32.5, 'Minimum': 10.0, 'Maximum': 40.0, 'Standard_deviation': 15.0, 'N': 2, '%': 50.0})

If there are no overestimation or underestimation cases, the respective dictionary will be empty:

>>> under_over_estimation([10, 20, 30], [5, 10, 15], PAE)
No errors were underestimated. Underestimation data dictionary empty.
({}, {'Mean': 10.0, 'Median': 10.0, '1st_Quartile': 7.5, '3rd_Quartile': 12.5, 'Minimum': 5.0, 'Maximum': 15.0, 'Standard_deviation': 4.08248290463863, 'N': 3, '%': 100.0})

If the lengths of the estimated error and test error lists do not match, an exception is thrown:

>>> under_over_estimation(validation_errors, [10], PAE)
Traceback (most recent call last):
...
ValueError: The estimated error and test error lists must have the same length.
Source code in timecave/validation_strategy_metrics.py
def under_over_estimation(estimated_error_list: list[float | int], test_error_list: list[float | int], metric: callable) -> tuple[dict]:

    """
    Compute separate validation strategy metrics for underestimation and overestimation instances (for N different experiments).

    This function processes the results of a Monte Carlo experiment and outputs two separate
    sets of summary statistics: one for cases where the true error is underestimated, and another one for cases 
    where the validation method overestimates the error.
    This can be useful if one needs to analyse the performance of a given validation method on several different time series or using different models.  
    Users may provide a custom metric if they so desire, but it must have the same function signature as the metrics provided by this package.

    Parameters
    ----------
    estimated_error_list : list[float  |  int]
        List of estimated (i.e. validation) errors, one for each experiment / trial.

    test_error_list : list[float  |  int]
        List of test errors, one for each experiment / trial.

    metric : callable
        Validation strategy metric.

    Returns
    -------
    tuple[dict]
        Separate statistical summaries for the underestimation and overestimation cases. \
        The first dictionary is for the underestimation cases.

    Raises
    ------
    ValueError
        If the estimated error list and the test error list differ in length.

    See also
    --------
    [MC_metric](MC_metric.md):
        Computes relevant statistics for the whole Monte Carlo experiment (i.e. does not differentiate between overestimation and underestimation).

    Examples
    --------
    >>> from timecave.validation_strategy_metrics import under_over_estimation, PAE
    >>> true_errors = [10, 30, 10, 50];
    >>> validation_errors = [20, 20, 50, 30];
    >>> under_over_estimation(validation_errors, true_errors, PAE)
    ({'Mean': -15.0, 'Median': -15.0, '1st_Quartile': -17.5, '3rd_Quartile': -12.5, 'Minimum': -20.0, 'Maximum': -10.0, 'Standard_deviation': 5.0, 'N': 2, '%': 50.0}, {'Mean': 25.0, 'Median': 25.0, '1st_Quartile': 17.5, '3rd_Quartile': 32.5, 'Minimum': 10.0, 'Maximum': 40.0, 'Standard_deviation': 15.0, 'N': 2, '%': 50.0})

    If there are no overestimation or underestimation cases, the respective dictionary will be empty:

    >>> under_over_estimation([10, 20, 30], [5, 10, 15], PAE)
    No errors were underestimated. Underestimation data dictionary empty.
    ({}, {'Mean': 10.0, 'Median': 10.0, '1st_Quartile': 7.5, '3rd_Quartile': 12.5, 'Minimum': 5.0, 'Maximum': 15.0, 'Standard_deviation': 4.08248290463863, 'N': 3, '%': 100.0})

    If the lengths of the estimated error and test error lists do not match, an exception is thrown:

    >>> under_over_estimation(validation_errors, [10], PAE)
    Traceback (most recent call last):
    ...
    ValueError: The estimated error and test error lists must have the same length.
    """

    if(len(estimated_error_list) != len(test_error_list)):

        raise ValueError("The estimated error and test error lists must have the same length.");

    estimated_errors = np.array(estimated_error_list);
    test_errors = np.array(test_error_list);

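    # Boolean masks split the trials into underestimation (estimated < test) and
    # overestimation (estimated > test) cases; trials where the two errors match
    # exactly are left out of both groups.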
    under_est = estimated_errors[estimated_errors < test_errors].tolist();
    under_test = test_errors[estimated_errors < test_errors].tolist();
    over_est = estimated_errors[estimated_errors > test_errors].tolist();
    over_test = test_errors[estimated_errors > test_errors].tolist();

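    # Summary statistics for the underestimated trials, plus their count and share
    # of all trials; the overestimation branch below mirrors this.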
    if(len(under_est) > 0):

        under_estimation_stats = MC_metric(under_est, under_test, metric);
        under_estimation_stats["N"] = len(under_est);
        under_estimation_stats["%"] = np.round(len(under_est) / len(estimated_error_list) * 100, 2);

    else:

        under_estimation_stats = {};
        print("No errors were underestimated. Underestimation data dictionary empty.");

    if(len(over_est) > 0):

        over_estimation_stats = MC_metric(over_est, over_test, metric);
        over_estimation_stats["N"] = len(over_est);
        over_estimation_stats["%"] = np.round(len(over_est) / len(estimated_error_list) * 100, 2);

    else:

        over_estimation_stats = {};
        print("No errors were overestimated. Overestimation data dictionary empty.");

    return (under_estimation_stats, over_estimation_stats);