Rolling Window method
timecave.validation_methods.prequential.RollingWindow(splits, ts, fs=1, gap=0, weight_function=constant_weights, params=None)
Bases: BaseSplitter
Implements every variant of the Rolling Window method.
This class implements the Rolling Window method. It also supports every variant of this method, including Gap Rolling Window and
Weighted Rolling Window. The gap parameter can be used to implement the former, while the weight_function argument allows the user
to implement the latter in a convenient way.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `splits` | `int` | The number of folds used to partition the data. | *required* |
| `ts` | `ndarray \| Series` | Univariate time series. | *required* |
| `fs` | `float \| int` | Sampling frequency (Hz). | `1` |
| `gap` | `int` | Number of folds separating the validation set from the training set. If this value is set to zero, the validation set will be adjacent to the training set. | `0` |
| `weight_function` | `callable` | Fold weighting function. Check the weights module for more details. | `constant_weights` |
| `params` | `dict` | Parameters to be passed to the weighting functions. | `None` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `n_splits` | `int` | The number of splits. |
| `sampling_freq` | `int \| float` | The series' sampling frequency (Hz). |
Methods:
| Name | Description |
|---|---|
| `split` | Split the time series into training and validation sets. |
| `info` | Provide additional information on the validation method. |
| `statistics` | Compute relevant statistics for both training and validation sets. |
| `plot` | Plot the partitioned time series. |
Raises:
| Type | Description |
|---|---|
| `TypeError` | If ... |
| `ValueError` | If ... |
| `ValueError` | If ... |
See also
Growing Window: Similar to Rolling Window, but the training set size gradually increases.
Notes
The Rolling Window method splits the data into \(N\) different folds. Then, in every iteration \(i\), the model is trained on data from the \(i^{th}\) fold and validated on the \((i+1)^{th}\) fold (assuming no gap is specified). The average error on the validation sets is then taken as the estimate of the model's true error. This method preserves the temporal order of observations, as the training set always precedes the validation set. If a gap is specified, the procedure runs for \(N-1-N_{gap}\) iterations, where \(N_{gap}\) is the number of folds separating the training and validation sets.

Note that, even though the size of the training set is kept constant throughout the validation procedure, the models from the last iterations are trained on more recent data. It is therefore reasonable to assume that these models will have an advantage over the ones trained on older data, yielding a less biased estimate of the model's true error. To address this issue, one may use a weighted average to compute the final estimate of the error, with larger weights being assigned to the estimates obtained using models trained on more recent data. For more details on this method, the reader should refer to [1].
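The weighted-average estimate described above can be sketched as follows. The per-fold errors and weights here are made-up numbers, purely for illustration; in practice, the weights would come from the splitter's `weight_function`:

```python
import numpy as np

# Hypothetical per-fold validation errors (most recent iteration last).
fold_errors = np.array([0.42, 0.35, 0.30, 0.28])

# Illustrative weights: larger weights are assigned to estimates obtained
# from models trained on more recent data; they must sum to one.
weights = np.array([0.1, 0.2, 0.3, 0.4])

# Weighted average taken as the final estimate of the model's true error.
estimate = np.sum(weights * fold_errors)
```
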
References
[1] Vitor Cerqueira, Luis Torgo, and Igor Mozetič. Evaluating time series forecasting models: An empirical study on performance estimation methods. Machine Learning, 109(11):1997–2028, 2020.
Source code in timecave/validation_methods/prequential.py
info()
Provide some basic information on the training and validation sets.
This method displays the number of splits, the fold size, the gap, and the weights that will be used to compute the error estimate.
Examples:
>>> import numpy as np
>>> from timecave.validation_methods.prequential import RollingWindow
>>> ts = np.ones(10);
>>> splitter = RollingWindow(5, ts);
>>> splitter.info();
Rolling Window method
---------------------
Time series size: 10 samples
Number of splits: 5
Fold size: 2 to 2 samples (20.0 to 20.0 %)
Gap: 0
Weights: [1. 1. 1. 1.]
Source code in timecave/validation_methods/prequential.py
plot(height, width)
Plot the partitioned time series.
This method allows the user to plot the partitioned time series. The training and validation sets are plotted using different colours.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `height` | `int` | The figure's height. | *required* |
| `width` | `int` | The figure's width. | *required* |
Examples:
>>> import numpy as np
>>> from timecave.validation_methods.prequential import RollingWindow
>>> ts = np.ones(100);
>>> splitter = RollingWindow(5, ts);
>>> splitter.plot(10, 10);

Source code in timecave/validation_methods/prequential.py
split()
Split the time series into training and validation sets.
This method splits the series' indices into disjoint sets containing the training and validation indices.
At every iteration, an array of training indices and another one containing the validation indices are generated.
Note that this method is a generator. To access the indices, use the next() method or a for loop.
Yields:
| Type | Description |
|---|---|
| `ndarray` | Array of training indices. |
| `ndarray` | Array of validation indices. |
| `float` | Weight assigned to the error estimate. |
Examples:
>>> import numpy as np
>>> from timecave.validation_methods.prequential import RollingWindow
>>> ts = np.ones(10);
>>> splitter = RollingWindow(5, ts); # Split the data into 5 different folds
>>> for ind, (train, val, _) in enumerate(splitter.split()):
...
... print(f"Iteration {ind+1}");
... print(f"Training set indices: {train}");
... print(f"Validation set indices: {val}");
Iteration 1
Training set indices: [0 1]
Validation set indices: [2 3]
Iteration 2
Training set indices: [2 3]
Validation set indices: [4 5]
Iteration 3
Training set indices: [4 5]
Validation set indices: [6 7]
Iteration 4
Training set indices: [6 7]
Validation set indices: [8 9]
If the number of samples is not divisible by the number of folds, the first folds will contain more samples:
>>> ts2 = np.ones(17);
>>> splitter = RollingWindow(5, ts2);
>>> for ind, (train, val, _) in enumerate(splitter.split()):
...
... print(f"Iteration {ind+1}");
... print(f"Training set indices: {train}");
... print(f"Validation set indices: {val}");
Iteration 1
Training set indices: [0 1 2 3]
Validation set indices: [4 5 6 7]
Iteration 2
Training set indices: [4 5 6 7]
Validation set indices: [ 8 9 10]
Iteration 3
Training set indices: [ 8 9 10]
Validation set indices: [11 12 13]
Iteration 4
Training set indices: [11 12 13]
Validation set indices: [14 15 16]
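The fold sizes shown above follow the same convention as `numpy.array_split`, which places the larger chunks first. The index pattern can be mimicked like so (a sketch of the partitioning behaviour, not the library's actual implementation):

```python
import numpy as np

# Partition 17 sample indices into 5 folds; np.array_split places the
# larger folds first, matching the output shown above.
folds = np.array_split(np.arange(17), 5)

# With no gap, adjacent folds form the training/validation pairs.
pairs = [(folds[i], folds[i + 1]) for i in range(len(folds) - 1)]
```
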
If a gap is specified (Gap Rolling Window), the validation set will no longer be adjacent to the training set. Keep in mind that the larger the gap between these two sets, the fewer iterations are run:
>>> splitter = RollingWindow(5, ts, gap=1);
>>> for ind, (train, val, _) in enumerate(splitter.split()):
...
... print(f"Iteration {ind+1}");
... print(f"Training set indices: {train}");
... print(f"Validation set indices: {val}");
Iteration 1
Training set indices: [0 1]
Validation set indices: [4 5]
Iteration 2
Training set indices: [2 3]
Validation set indices: [6 7]
Iteration 3
Training set indices: [4 5]
Validation set indices: [8 9]
Weights can be assigned to the error estimates (Weighted Rolling Window method). The parameters for the weighting functions must be passed to the class constructor:
>>> from timecave.validation_methods.weights import exponential_weights
>>> splitter = RollingWindow(5, ts, weight_function=exponential_weights, params={"base": 2});
>>> for ind, (train, val, weight) in enumerate(splitter.split()):
...
... print(f"Iteration {ind+1}");
... print(f"Training set indices: {train}");
... print(f"Validation set indices: {val}");
... print(f"Weight: {np.round(weight, 3)}");
Iteration 1
Training set indices: [0 1]
Validation set indices: [2 3]
Weight: 0.067
Iteration 2
Training set indices: [2 3]
Validation set indices: [4 5]
Weight: 0.133
Iteration 3
Training set indices: [4 5]
Validation set indices: [6 7]
Weight: 0.267
Iteration 4
Training set indices: [6 7]
Validation set indices: [8 9]
Weight: 0.533
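The weights printed above are consistent with exponentially growing weights normalised to sum to one. Assuming `exponential_weights` assigns a weight proportional to `base ** i` to iteration `i`, they can be reproduced as:

```python
import numpy as np

# Reproduce the weights from the example above, assuming they grow as
# base**i for iterations i = 1..4 and are normalised to sum to 1.
base = 2
n_iterations = 4
weights = base ** np.arange(1, n_iterations + 1)
weights = weights / weights.sum()
print(np.round(weights, 3))  # prints [0.067 0.133 0.267 0.533]
```
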
Source code in timecave/validation_methods/prequential.py
statistics()
Compute relevant statistics for both training and validation sets.
This method computes relevant time series features, such as the mean, strength-of-trend, etc., for the whole time series as well as for the training and validation sets. It can and should be used to ensure that the characteristics of both the training and validation sets are, statistically speaking, similar to those of the time series one wishes to forecast. If this is not the case, using the validation method will most likely lead to a poor assessment of the model's performance.
Returns:
| Type | Description |
|---|---|
| `DataFrame` | Relevant features for the entire time series. |
| `DataFrame` | Relevant features for the training set. |
| `DataFrame` | Relevant features for the validation set. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the time series is composed of fewer than three samples. |
| `ValueError` | If the folds comprise fewer than two samples. |
Examples:
>>> import numpy as np
>>> from timecave.validation_methods.prequential import RollingWindow
>>> ts = np.hstack((np.ones(5), np.zeros(5)));
>>> splitter = RollingWindow(5, ts);
>>> ts_stats, training_stats, validation_stats = splitter.statistics();
Frequency features are only meaningful if the correct sampling frequency is passed to the class.
>>> ts_stats
Mean Median Min Max Variance P2P_amplitude Trend_slope Spectral_centroid Spectral_rolloff Spectral_entropy Strength_of_trend Mean_crossing_rate Median_crossing_rate
0 0.5 0.5 0.0 1.0 0.25 1.0 -0.151515 0.114058 0.5 0.38717 1.59099 0.111111 0.111111
>>> training_stats
Mean Median Min Max Variance P2P_amplitude Trend_slope Spectral_centroid Spectral_rolloff Spectral_entropy Strength_of_trend Mean_crossing_rate Median_crossing_rate
0 1.0 1.0 1.0 1.0 0.00 0.0 -7.850462e-17 0.00 0.0 0.0 inf 0.0 0.0
0 1.0 1.0 1.0 1.0 0.00 0.0 -7.850462e-17 0.00 0.0 0.0 inf 0.0 0.0
0 0.5 0.5 0.0 1.0 0.25 1.0 -1.000000e+00 0.25 0.5 0.0 inf 1.0 1.0
0 0.0 0.0 0.0 0.0 0.00 0.0 0.000000e+00 0.00 0.0 0.0 inf 0.0 0.0
>>> validation_stats
Mean Median Min Max Variance P2P_amplitude Trend_slope Spectral_centroid Spectral_rolloff Spectral_entropy Strength_of_trend Mean_crossing_rate Median_crossing_rate
0 1.0 1.0 1.0 1.0 0.00 0.0 -7.850462e-17 0.00 0.0 0.0 inf 0.0 0.0
0 0.5 0.5 0.0 1.0 0.25 1.0 -1.000000e+00 0.25 0.5 0.0 inf 1.0 1.0
0 0.0 0.0 0.0 0.0 0.00 0.0 0.000000e+00 0.00 0.0 0.0 inf 0.0 0.0
0 0.0 0.0 0.0 0.0 0.00 0.0 0.000000e+00 0.00 0.0 0.0 inf 0.0 0.0
Source code in timecave/validation_methods/prequential.py