Markov Cross-validation method
timecave.validation_methods.markov
This module contains the Markov cross-validation method.
Classes:
| Name | Description |
|---|---|
| MarkovCV | Implements the Markov cross-validation method. |
MarkovCV(ts, p, seed=1)
Bases: BaseSplitter
Implements the Markov cross-validation method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ts | ndarray \| Series | Univariate time series. | required |
| p | int | p-order autocorrelation. | required |
| seed | int | Random seed. | 1 |
Attributes:
| Name | Type | Description |
|---|---|---|
| n_splits | int | The number of splits. |
| sampling_freq | int \| float | The series' sampling frequency (Hz). |
Methods:
| Name | Description |
|---|---|
| split | Split the time series into training and validation sets. |
| info | Provide additional information on the validation method. |
| statistics | Compute relevant statistics for both training and validation sets. |
| plot | Plot the partitioned time series. |
Raises:
| Type | Description |
|---|---|
| TypeError | If |
| TypeError | If |
| ValueError | If |
Notes
The Markov cross-validation method partitions the data so that every partition can be regarded as a Markov process. It uses the linear autocorrelation measure to ensure that the samples within the training set, and within the validation set, are neither too close together nor too far apart.

For a thorough discussion of the method, see [1].
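As a rough illustration of the linear autocorrelation measure mentioned above, the sketch below computes the standard sample autocorrelation of a series at a given lag. It is a generic textbook estimator written for this page, not the code used inside `MarkovCV`; the function name `autocorrelation` and the toy series are assumptions made for the example.

```python
def autocorrelation(series, lag):
    """Sample linear autocorrelation of `series` at the given lag.

    Generic estimator for illustration only; not taken from timecave.
    """
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    # Denominator: sum of squared deviations from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Numerator: products of deviations that are `lag` steps apart.
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


ts = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy series (assumption)
r0 = autocorrelation(ts, 0)  # lag 0 is always 1
r1 = autocorrelation(ts, 1)  # neighbouring samples are strongly correlated
```

A slowly decaying autocorrelation indicates that samples several steps apart still carry shared information, which is the kind of dependence the method takes into account when assigning samples to partitions.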
References
[1] Gaoxia Jiang and Wenjian Wang. Markov cross-validation for time series model evaluations. Information Sciences, 375:219–233, 2017.
Source code in timecave/validation_methods/markov.py
sampling_freq: int | float
property
Get the time series' sampling frequency.
This method can be used to access the time series' sampling frequency, in Hertz (this is set on initialisation). Since the method is implemented as a property, this information can simply be accessed as an attribute using dot notation.
Returns:
| Type | Description |
|---|---|
| int \| float | The time series' sampling frequency (Hz). |
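To show what property access looks like in practice, here is a minimal sketch using a hypothetical stand-in class. `SplitterSketch` and its `fs` argument are assumptions made for the example and are not part of timecave; only the attribute name `sampling_freq` comes from this page.

```python
class SplitterSketch:
    """Hypothetical stand-in used only to illustrate property access."""

    def __init__(self, fs=1):
        # The value is stored once, on initialisation.
        self._fs = fs

    @property
    def sampling_freq(self):
        """Sampling frequency in Hz, exposed as a read-only attribute."""
        return self._fs


splitter = SplitterSketch(fs=2.5)
freq = splitter.sampling_freq  # dot notation, no parentheses
```

Because no setter is defined in this sketch, assigning to `splitter.sampling_freq` raises an `AttributeError`.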
info()
Provide some basic information on the training and validation sets.
This method displays the number of splits and the number of observations per set.
Examples:
>>> import numpy as np
>>> from timecave.validation_methods.markov import MarkovCV
>>> ts = np.ones(10)
>>> splitter = MarkovCV(ts, p=2)
>>> splitter.info()
Markov CV method
---------------
Time series size: 10 samples
Number of splits: 6
Number of observations per set: 1 to 3
plot(height, width)
Plot the partitioned time series.
This method allows the user to plot the partitioned time series. The training and validation sets are plotted using different colours.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| height | int | The figure's height. | required |
| width | int | The figure's width. | required |
Examples:
>>> import numpy as np
>>> from timecave.validation_methods.markov import MarkovCV
>>> ts = np.ones(100)
>>> splitter = MarkovCV(ts, p=1)
>>> splitter.plot(10, 10)

split()
Split the time series into training and validation sets.
This method splits the series' indices into disjoint sets containing the training and validation indices.
At every iteration, an array of training indices and another one containing the validation indices are generated.
Note that this method is a generator. To access the indices, use the built-in next() function or a for loop.
Yields:
| Type | Description |
|---|---|
| ndarray | Array of training indices. |
| ndarray | Array of validation indices. |
| float | Used for compatibility reasons. Irrelevant for this method. |
Examples:
>>> import numpy as np
>>> from timecave.validation_methods.markov import MarkovCV
>>> ts = np.ones(10)
>>> splitter = MarkovCV(ts, p=2)
>>> for ind, (train, val, _) in enumerate(splitter.split()):
...     print(f"Iteration {ind+1}")
...     print(f"Training set indices: {train}")
...     print(f"Validation set indices: {val}")
Iteration 1
Training set indices: [8]
Validation set indices: [0 6]
Iteration 2
Training set indices: [0 6]
Validation set indices: [8]
Iteration 3
Training set indices: [3]
Validation set indices: [5]
Iteration 4
Training set indices: [5]
Validation set indices: [3]
Iteration 5
Training set indices: [1 2 7]
Validation set indices: [4 9]
Iteration 6
Training set indices: [4 9]
Validation set indices: [1 2 7]
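The example above consumes the generator with a for loop. The sketch below shows the other option, the built-in `next()`, and checks that the two index sets are disjoint. To stay self-contained it does not call timecave: `documented_splits` is a hypothetical stand-in that replays the first two splits printed above, and the third yielded value (`1.0`) is a placeholder.

```python
def documented_splits():
    """Stand-in generator replaying the first two splits printed above.

    Hypothetical helper for illustration; it is not part of timecave.
    """
    splits = [([8], [0, 6]), ([0, 6], [8])]
    for train, val in splits:
        # Third element mimics the compatibility value; 1.0 is a placeholder.
        yield train, val, 1.0


gen = documented_splits()

# Pull a single split on demand instead of looping.
train, val, _ = next(gen)

# Training and validation indices never overlap.
assert not set(train) & set(val)

# The generator remembers its position: only the remaining splits are left.
remaining = [(t, v) for t, v, _ in gen]
```

Once the generator is exhausted, a further `next(gen)` raises `StopIteration`; calling the generator function again produces a fresh iterator.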
statistics()
Compute relevant statistics for both training and validation sets.
This method computes relevant time series features, such as mean, strength-of-trend, etc., for the whole time series, the training set and the validation set. It can and should be used to ensure that the characteristics of both the training and validation sets are, statistically speaking, similar to those of the time series one wishes to forecast. If this is not the case, using the validation method will most likely lead to a poor assessment of the model's performance.
Returns:
| Type | Description |
|---|---|
| DataFrame | Relevant features for the entire time series. |
| DataFrame | Relevant features for the training set. |
| DataFrame | Relevant features for the validation set. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the time series is composed of fewer than three samples. |
| ValueError | If the folds comprise fewer than two samples. |
Examples:
Frequency-domain features are not computed for the Markov CV method:
>>> import numpy as np
>>> from timecave.validation_methods.markov import MarkovCV
>>> ts = np.hstack((np.ones(5), np.zeros(5)))
>>> splitter = MarkovCV(ts, p=1)
>>> ts_stats, training_stats, validation_stats = splitter.statistics()
Frequency features are only meaningful if the correct sampling frequency is passed to the class.
>>> ts_stats
Mean Median Min Max Variance P2P_amplitude Trend_slope Strength_of_trend Mean_crossing_rate Median_crossing_rate
0 0.5 0.5 0.0 1.0 0.25 1.0 -0.151515 1.59099 0.111111 0.111111
>>> training_stats
Mean Median Min Max Variance P2P_amplitude Trend_slope Strength_of_trend Mean_crossing_rate Median_crossing_rate
0 0.5 0.5 0.0 1.0 0.25 1.0 -1.0 inf 1.000000 1.000000
0 0.5 0.5 0.0 1.0 0.25 1.0 -1.0 inf 1.000000 1.000000
0 0.5 0.5 0.0 1.0 0.25 1.0 -1.0 inf 1.000000 1.000000
0 0.5 0.5 0.0 1.0 0.25 1.0 -0.4 1.06066 0.333333 0.333333
>>> validation_stats
Mean Median Min Max Variance P2P_amplitude Trend_slope Strength_of_trend Mean_crossing_rate Median_crossing_rate
0 0.5 0.5 0.0 1.0 0.25 1.0 -1.0 inf 1.000000 1.000000
0 0.5 0.5 0.0 1.0 0.25 1.0 -1.0 inf 1.000000 1.000000
0 0.5 0.5 0.0 1.0 0.25 1.0 -0.4 1.06066 0.333333 0.333333
0 0.5 0.5 0.0 1.0 0.25 1.0 -1.0 inf 1.000000 1.000000
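One possible way to act on these tables is to compare a subset's features with those of the whole series and flag large deviations. The sketch below does this for the mean and the population variance only, using the standard library. The helper names (`features`, `similar`), the tolerance values and the two index subsets are assumptions chosen for illustration; they are not produced by `MarkovCV`.

```python
import math
from statistics import mean, pvariance


def features(values):
    """Return a small feature dictionary (mean and population variance)."""
    return {"Mean": mean(values), "Variance": pvariance(values)}


def similar(reference, candidate, rel_tol=0.1, abs_tol=1e-9):
    """True if every feature of `candidate` is within tolerance of `reference`."""
    return all(
        math.isclose(reference[k], candidate[k], rel_tol=rel_tol, abs_tol=abs_tol)
        for k in reference
    )


# Same series as in the example above: five ones followed by five zeros.
ts = [1.0] * 5 + [0.0] * 5

# Hypothetical subsets, chosen by hand for illustration.
balanced = [ts[i] for i in (3, 4, 5, 6)]  # two ones, two zeros
skewed = [ts[i] for i in (0, 1, 2, 5)]    # three ones, one zero

full = features(ts)
balanced_ok = similar(full, features(balanced))  # matches the full series
skewed_ok = similar(full, features(skewed))      # mean and variance differ
```

A subset that fails such a check is a hint that a model validated on it may be assessed on data unlike the series it is meant to forecast.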