get_features

`timecave.data_characteristics.get_features(ts, fs)`

Compute time series features.

This function extracts features from a time series. Most features are computed with the tsfel package, which should be used directly if only those features are required. The exceptions are the 'Strength of Trend', 'Mean-crossing rate', and 'Median-crossing rate' features, which are computed by custom functions that are also available to the user.
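To illustrate what a crossing-rate feature measures, here is a minimal pure-Python sketch. It is not the package's implementation. It assumes the rate is the number of consecutive sample pairs that lie on opposite sides of the mean, divided by the total number of consecutive pairs, which appears consistent with the example output on this page (20 crossings over 999 pairs gives 0.02002). The name `mean_crossing_rate_sketch` is hypothetical.

```python
from statistics import fmean


def mean_crossing_rate_sketch(values):
    """Fraction of consecutive sample pairs that straddle the mean.

    Hypothetical helper for illustration only; timecave's own
    mean_crossing_rate may handle edge cases differently.
    """
    if len(values) < 2:
        raise ValueError("At least two samples are required.")

    mean = fmean(values)
    centred = [v - mean for v in values]

    # A crossing occurs when two neighbouring samples lie on opposite
    # sides of the mean, i.e. their centred values have opposite signs.
    crossings = sum(1 for a, b in zip(centred, centred[1:]) if a * b < 0)

    return crossings / (len(values) - 1)


alternating = mean_crossing_rate_sketch([1.0, -1.0, 1.0, -1.0])
monotonic = mean_crossing_rate_sketch([1.0, 2.0, 3.0, 4.0])
print(alternating, monotonic)
```

A series that flips sign at every step crosses its mean between every pair of samples, while a monotonic series crosses it only once. Replacing `fmean` with `statistics.median` would give the analogous median-crossing sketch under the same assumption.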

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `ts` | `ndarray \| Series` | Univariate time series. | required |
| `fs` | `float \| int` | Sampling frequency (Hz). | required |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | Data frame containing all time series features supported by this package. |

Raises:

| Type | Description |
| --- | --- |
| `TypeError` | If `ts` is neither a NumPy array nor a Pandas series. |
| `TypeError` | If `fs` is neither a float nor an integer. |
| `ValueError` | If `fs` is negative. |

Examples:

>>> import numpy as np
>>> from timecave.data_characteristics import get_features
>>> t = np.arange(0, 10, 0.01)
>>> time_series = np.sin(2 * np.pi * t)
>>> sampling_frequency = 1 / 0.01
>>> get_features(time_series, sampling_frequency)
           Mean        Median  Min  Max  Variance  P2P_amplitude  Trend_slope  Spectral_centroid  Spectral_rolloff  Spectral_entropy  Strength_of_trend  Mean_crossing_rate  Median_crossing_rate
0  3.552714e-18 -3.673940e-16 -1.0  1.0       0.5            2.0    -0.000191                1.0               1.0      6.485530e-29          15.926086             0.02002              0.019019

If the time series is neither an array nor a series, an exception is raised:

>>> get_features([0, 1, 2], sampling_frequency)
Traceback (most recent call last):
...
TypeError: Time series must be either a Numpy array or a Pandas series.

The same happens if the sampling frequency is neither a float nor an integer:

>>> get_features(time_series, "Hello")
Traceback (most recent call last):
...
TypeError: The sampling frequency should be either a float or an integer.

A different exception is raised if the sampling frequency is negative:

>>> get_features(time_series, -1)
Traceback (most recent call last):
...
ValueError: The sampling frequency should be larger than zero.
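
The sampling-frequency checks illustrated above can be sketched in plain Python. This is an assumption about how such validation might look, not the package's `_check_sampling_rate`. The helper names `check_sampling_rate_sketch` and `error_name` are hypothetical, the error messages are copied from the examples above, and rejecting zero is inferred from the wording "larger than zero".

```python
def check_sampling_rate_sketch(fs):
    """Validate a sampling frequency (hypothetical helper)."""
    # Type check first, so that non-numeric input fails with TypeError.
    if not isinstance(fs, (float, int)):
        raise TypeError("The sampling frequency should be either a float or an integer.")

    # Assumption: zero is rejected along with negative values.
    if fs <= 0:
        raise ValueError("The sampling frequency should be larger than zero.")


def error_name(fs):
    """Return the name of the exception raised for fs, or None if fs is valid."""
    try:
        check_sampling_rate_sketch(fs)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


results = [error_name(100.0), error_name("Hello"), error_name(-1)]
print(results)
```

Running the type check before the value check means a string such as `"Hello"` fails with `TypeError` and is never compared against zero.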

Source code in timecave/data_characteristics.py

def get_features(ts: np.ndarray | pd.Series, fs: float | int) -> pd.DataFrame:

    """
    Compute time series features.

    This function extracts features from a time series. Most features are computed with the tsfel
    package, which should be used directly if only those features are required. The exceptions are
    the 'Strength of Trend', 'Mean-crossing rate', and 'Median-crossing rate' features, which are
    computed by custom functions that are also available to the user.

    Parameters
    ----------
    ts : np.ndarray | pd.Series
        Univariate time series.

    fs : float | int
        Sampling frequency (Hz).

    Returns
    -------
    pd.DataFrame
        Data frame containing all time series features supported by this package.

    Raises
    ------
    TypeError
        If `ts` is neither a NumPy array nor a Pandas series.

    TypeError
        If `fs` is neither a float nor an integer.

    ValueError
        If `fs` is negative.

    Examples
    --------
    >>> import numpy as np
    >>> from timecave.data_characteristics import get_features
    >>> t = np.arange(0, 10, 0.01)
    >>> time_series = np.sin(2 * np.pi * t)
    >>> sampling_frequency = 1 / 0.01
    >>> get_features(time_series, sampling_frequency)
               Mean        Median  Min  Max  Variance  P2P_amplitude  Trend_slope  Spectral_centroid  Spectral_rolloff  Spectral_entropy  Strength_of_trend  Mean_crossing_rate  Median_crossing_rate
    0  3.552714e-18 -3.673940e-16 -1.0  1.0       0.5            2.0    -0.000191                1.0               1.0      6.485530e-29          15.926086             0.02002              0.019019

    If the time series is neither an array nor a series, an exception is raised:

    >>> get_features([0, 1, 2], sampling_frequency)
    Traceback (most recent call last):
    ...
    TypeError: Time series must be either a Numpy array or a Pandas series.

    The same happens if the sampling frequency is neither a float nor an integer:

    >>> get_features(time_series, "Hello")
    Traceback (most recent call last):
    ...
    TypeError: The sampling frequency should be either a float or an integer.

    A different exception is raised if the sampling frequency is negative:

    >>> get_features(time_series, -1)
    Traceback (most recent call last):
    ...
    ValueError: The sampling frequency should be larger than zero.
    """

    # Validate the inputs before computing any feature.
    _check_type(ts)
    _check_sampling_rate(fs)

    #feature_list = ["0_Mean", "0_Median", "0_Min", "0_Max", "0_Variance", "0_Peak to peak distance"];

    #cfg = tsfel.get_features_by_domain("statistical");
    #stat_feat_df = tsfel.time_series_features_extractor(cfg, ts, fs);

    #relevant_feat_df = stat_feat_df[feature_list].copy();
    #new_names = [feat[2:] for feat in feature_list];
    #cols = {name: new_name for (name, new_name) in zip(feature_list, new_names)};
    #relevant_feat_df = relevant_feat_df.rename(columns=cols);
    #relevant_feat_df = relevant_feat_df.rename(columns={"Peak to peak distance": "P2P_amplitude"});

    # Basic statistical features, computed with tsfel.
    mean = tsfel.calc_mean(ts)
    median = tsfel.calc_median(ts)
    minimum = tsfel.calc_min(ts)
    maximum = tsfel.calc_max(ts)
    variance = tsfel.calc_var(ts)
    p2p = tsfel.pk_pk_distance(ts)
    feature_list = [mean, median, minimum, maximum, variance, p2p]
    feature_names = ["Mean", "Median", "Min", "Max", "Variance", "P2P_amplitude"]

    # Single-row data frame with one column per feature.
    relevant_feat_df = pd.DataFrame(data={name: [feat] for name, feat in zip(feature_names, feature_list)})

    # Trend and spectral features, also computed with tsfel.
    relevant_feat_df["Trend_slope"] = tsfel.slope(ts)
    relevant_feat_df["Spectral_centroid"] = tsfel.spectral_centroid(ts, fs)
    relevant_feat_df["Spectral_rolloff"] = tsfel.spectral_roll_off(ts, fs)
    relevant_feat_df["Spectral_entropy"] = tsfel.spectral_entropy(ts, fs)

    # Custom features that are not taken from tsfel.
    relevant_feat_df["Strength_of_trend"] = strength_of_trend(ts)
    relevant_feat_df["Mean_crossing_rate"] = mean_crossing_rate(ts)
    relevant_feat_df["Median_crossing_rate"] = median_crossing_rate(ts)

    return relevant_feat_df