
Cross-validation iterators in scikit-learn are simply generator objects, that is, Python objects that implement the __iter__ method and that, on each iteration, yield (rather than return) the indices or a boolean mask for the train and test set. With this in mind, implementing new cross-validation iterators that behave like the ones in scikit-learn is easy. Here is a small code snippet that implements a holdout cross-validator generator following the scikit-learn API.

import numpy as np
from sklearn.utils import check_random_state


class HoldOut:
    '''
    Hold-out cross-validator generator. In the hold-out, the
    data is split only once into a train set and a test set.
    Unlike in other cross-validation schemes, the hold-out
    consists of only one iteration.

    Parameters
    ----------
    n : int
        Total number of samples.
    test_size : 0 < float < 1
        Fraction of samples to use as test set. Must be a
        number between 0 and 1.
    random_state : int
        Seed for the random number generator.
    '''
    def __init__(self, n, test_size=0.2, random_state=0):
        self.n = n
        self.test_size = test_size
        self.random_state = random_state

    def __iter__(self):
        n_test = int(np.ceil(self.test_size * self.n))
        n_train = self.n - n_test
        rng = check_random_state(self.random_state)
        permutation = rng.permutation(self.n)
        ind_test = permutation[:n_test]
        ind_train = permutation[n_test:n_test + n_train]
        yield ind_train, ind_test
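As a quick sanity check, the sketch below restates the iterator with NumPy's RandomState standing in for scikit-learn's check_random_state (so it runs without scikit-learn installed) and verifies that the yielded train and test indices are disjoint and together cover all samples; the sample count and seed here are arbitrary:

```python
import numpy as np


class HoldOut:
    """Single-split hold-out cross-validator (scikit-learn-free sketch)."""

    def __init__(self, n, test_size=0.2, random_state=0):
        self.n = n
        self.test_size = test_size
        self.random_state = random_state

    def __iter__(self):
        # Round the test-set size up, as the original does with np.ceil.
        n_test = int(np.ceil(self.test_size * self.n))
        # np.random.RandomState stands in for check_random_state here.
        rng = np.random.RandomState(self.random_state)
        permutation = rng.permutation(self.n)
        yield permutation[n_test:], permutation[:n_test]


# Usage: the loop runs exactly once, since holdout is a single split.
cv = HoldOut(n=10, test_size=0.3, random_state=42)
for train_idx, test_idx in cv:
    # The two index arrays are disjoint and together cover all samples.
    assert set(train_idx).isdisjoint(test_idx)
    assert set(train_idx) | set(test_idx) == set(range(10))
    print(len(train_idx), len(test_idx))  # prints: 7 3
```

Because the object is iterable and yields (train, test) index arrays, it can be passed anywhere scikit-learn accepts an iterable of splits, such as the cv parameter of cross_val_score.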

Contrary to other cross-validation schemes, holdout relies on a single split of the data. It is well known that in practice holdout performs much worse than KFold or LeaveOneOut schemes. However, holdout has the advantage that its theoretical properties are easier to derive. For examples of this, see e.g. Section 8.7 of Theory of classification: a survey of some recent advances and the more recent The reusable holdout.



