Python scikit-learn pipelines (no transformation on features)

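Yes. A pipeline does not need any transformer steps, so you can simply do:

from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
pipe = Pipeline(steps=[('clf', RandomForestClassifier())])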
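As a quick check of the estimator-only pipeline, here is a minimal sketch on made-up data. The dataset, the step name `prep`, and the `n_estimators`/`random_state` settings are assumptions for illustration only. It also shows scikit-learn's `'passthrough'` placeholder, which keeps a named slot in the pipeline without touching the features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# Tiny synthetic dataset, invented purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = (X[:, 0] > 0).astype(int)

# Estimator-only pipeline: the features reach the classifier unchanged.
pipe = Pipeline(steps=[
    ("clf", RandomForestClassifier(n_estimators=10, random_state=0)),
])
pipe.fit(X, y)

# Same model, but with an explicit no-op slot that can be swapped later.
placeholder_pipe = Pipeline(steps=[
    ("prep", "passthrough"),
    ("clf", RandomForestClassifier(n_estimators=10, random_state=0)),
])
placeholder_pipe.fit(X, y)

# Both pipelines see identical features and use the same seed.
same_predictions = np.array_equal(pipe.predict(X), placeholder_pipe.predict(X))
```

The placeholder form is handy if you expect to drop a real transformer into that slot later, e.g. with `placeholder_pipe.set_params(prep=some_transformer)`.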

Also, if you have a custom base transformation that you almost always want, and it has hyperparameters or extra functionality of its own, you could write a small transformer class (a somewhat contrived example, just for ideas):

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class Transform(TransformerMixin, BaseEstimator):
    def __init__(self, square=False):
        # An explicit keyword argument (rather than **kwargs) lets
        # get_params/set_params see the hyperparameter.
        self.square = square

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        if self.square:
            X = np.asarray(X) ** 2  # element-wise square
        # elif self.other_transform is not None:  # hypothetical second option
            # X = other_transform(X, self.other_transform)
        return X  # default to no transform if no hyperparameter is set

pass_pipe = Pipeline(steps=[('do nothing', Transform()),
                            ('clf', RandomForestClassifier())])
square_pipe = Pipeline(steps=[('square', Transform(square=True)),
                              ('clf', RandomForestClassifier())])
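Rather than building two separate pipelines, the switch can also be tuned like any other hyperparameter. Below is a minimal sketch of that idea, not a definitive recipe. `OptionalSquare` is a hypothetical, self-contained transformer in the spirit of the one above, and the data, the step name `prep`, and the choice of `GridSearchCV` with `cv=3` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class OptionalSquare(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: squares the features only if square=True."""

    def __init__(self, square=False):
        self.square = square

    def fit(self, X, y=None):
        return self  # stateless, nothing to learn

    def transform(self, X):
        X = np.asarray(X)
        return X ** 2 if self.square else X


# Made-up data: the label depends on squared features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)

pipe = Pipeline(steps=[
    ("prep", OptionalSquare()),
    ("clf", RandomForestClassifier(n_estimators=10, random_state=0)),
])

# "<step name>__<parameter>" addresses a parameter inside a pipeline step.
search = GridSearchCV(pipe, param_grid={"prep__square": [False, True]}, cv=3)
search.fit(X, y)
```

After fitting, `search.best_params_` reports which of the two settings scored better under cross-validation.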

The pipelines above are mutually exclusive: each one applies either one transform or the other. If you instead have several transforms that must run in a particular order, implementing callbacks is probably the right approach. It is worth looking at how popular libraries such as sklearn, PyTorch, or fastai implement that kind of thing.
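For simple cases the ordering can be had without callbacks, because a `Pipeline` applies its steps in the order they are listed. This is a minimal sketch using plain pipeline steps rather than a callback system. The helper `add_one` and the step names are hypothetical.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def add_one(X):
    """Hypothetical second transform, used only to show that order matters."""
    return X + 1


# Steps run top to bottom: square first, then add one.
ordered = Pipeline(steps=[
    ("square", FunctionTransformer(np.square)),
    ("add_one", FunctionTransformer(add_one)),
])

# Same transforms in the opposite order give a different result.
reversed_order = Pipeline(steps=[
    ("add_one", FunctionTransformer(add_one)),
    ("square", FunctionTransformer(np.square)),
])

X = np.array([[1.0, 2.0], [3.0, 4.0]])
result = ordered.fit_transform(X)           # x**2 + 1
result_reversed = reversed_order.fit_transform(X)  # (x + 1)**2
```

A classifier can be appended as the final step in the usual way. A callback design would still make sense when the set of transforms has to change at run time.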
