models.model_spec#

Module: models.model_spec#

Inheritance diagram for ISLP.models.model_spec: Contrast and ModelSpec each inherit from sklearn.base.TransformerMixin and sklearn.base.BaseEstimator; Feature is shown with no parent classes.

Model specification#

This module defines ModelSpec, the basic object used to represent a regression formula.

Classes#

Contrast#

class ISLP.models.model_spec.Contrast(method='drop', drop_level=None)#

Bases: TransformerMixin, BaseEstimator

Methods

fit(X[, y])

Construct contrast of categorical variable for use in building a design matrix.

fit_transform(X[, y])

Fit to data, then transform it.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform

__init__(method='drop', drop_level=None)#

Contrast encoding for categorical variables.

Parameters:
method: [‘drop’, ‘sum’, None, callable]

If ‘drop’, one column of the one-hot encoding is dropped. If ‘sum’, the coefficients are constrained to sum to zero. If None, the full one-hot encoding is returned. If callable, it should take the number of levels of the category as its single argument and return an appropriate contrast of the full one-hot encoding.

drop_level: str (optional)

If not None, this level of the category is dropped when method == ‘drop’.
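The four method options can be illustrated with a small, library-free sketch. The helper names contrast_matrix and encode are hypothetical, and two details are assumptions rather than documented behaviour: that ‘drop’ removes the first level when drop_level is None, and that ‘sum’ uses the usual sum-to-zero coding in which the last level is coded as minus one in every column.

```python
def contrast_matrix(levels, method="drop", drop_level=None):
    """Return a contrast matrix (one row per level) as a list of lists."""
    k = len(levels)
    identity = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    if method is None:
        return identity  # full one-hot encoding
    if method == "drop":
        # Assumption: the first level is dropped when drop_level is None.
        dropped = levels[0] if drop_level is None else drop_level
        keep = [j for j, level in enumerate(levels) if level != dropped]
        return [[row[j] for j in keep] for row in identity]
    if method == "sum":
        # Assumption: sum-to-zero coding, last level is -1 in every column.
        rows = [row[: k - 1] for row in identity[: k - 1]]
        rows.append([-1.0] * (k - 1))
        return rows
    if callable(method):
        return method(k)  # user-supplied contrast, given the number of levels
    raise ValueError(f"unknown method: {method!r}")


def encode(values, levels, contrast):
    """One-hot row times contrast matrix is just the matching contrast row."""
    return [contrast[levels.index(v)] for v in values]


levels = ["a", "b", "c"]
drop_cols = encode(["a", "c", "b"], levels, contrast_matrix(levels, "drop"))
sum_cols = encode(["a", "c", "b"], levels, contrast_matrix(levels, "sum"))
```

A callable passed as method plays the same role as the branches above: it receives the number of levels and returns a matrix with one row per level.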

fit(X, y=None)#

Construct contrast of categorical variable for use in building a design matrix.

Parameters:
X: array-like

X on which model matrix will be evaluated. If a pd.DataFrame or pd.Series, variables that are of categorical dtype will be treated as categorical.

Returns:
F: array-like

Columns of design matrix implied by the categorical variable.

fit_transform(X, y=None, **fit_params)#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X: array-like of shape (n_samples, n_features)

Input samples.

y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params: dict

Additional fit parameters.

Returns:
X_new: ndarray of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()#

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:
routing: MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)#

Get parameters for this estimator.

Parameters:
deep: bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params: dict

Parameter names mapped to their values.

set_output(*, transform=None)#

Set output container.

See the scikit-learn example “Introducing the set_output API” for how to use this API.

Parameters:
transform: {“default”, “pandas”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • None: Transform configuration is unchanged

Returns:
self: estimator instance

Estimator instance.

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params: dict

Estimator parameters.

Returns:
self: estimator instance

Estimator instance.

transform(X)#

Feature#

class ISLP.models.model_spec.Feature(variables: tuple, name: str, encoder: Any, use_transform: bool = True, pure_columns: bool = False, override_encoder_colnames: bool = False)#

Bases: NamedTuple

An element in a model matrix that will build columns from an array-like X.

Attributes:
encoder

Alias for field number 2

name

Alias for field number 1

override_encoder_colnames

Alias for field number 5

pure_columns

Alias for field number 4

use_transform

Alias for field number 3

variables

Alias for field number 0

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

__init__(*args, **kwargs)#
count(value, /)#

Return number of occurrences of value.

encoder: Any#

Alias for field number 2

index(value, start=0, stop=sys.maxsize, /)#

Return first index of value.

Raises ValueError if the value is not present.

name: str#

Alias for field number 1

override_encoder_colnames: bool#

Alias for field number 5

pure_columns: bool#

Alias for field number 4

use_transform: bool#

Alias for field number 3

variables: tuple#

Alias for field number 0
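Because Feature is a NamedTuple, the “Alias for field number” entries and the count and index methods are ordinary tuple behaviour. The sketch below uses a hypothetical stand-in, FeatureSketch, that only mirrors the documented field order and defaults; it is not the library class.

```python
from typing import Any, NamedTuple


class FeatureSketch(NamedTuple):
    # Hypothetical stand-in mirroring the documented field order.
    variables: tuple                          # field number 0
    name: str                                 # field number 1
    encoder: Any                              # field number 2
    use_transform: bool = True                # field number 3
    pure_columns: bool = False                # field number 4
    override_encoder_colnames: bool = False   # field number 5


feat = FeatureSketch(variables=("age",), name="age", encoder=None)

first_field = feat[0]            # same value as feat.variables
position = feat.index(True)      # first position whose value equals True
n_false = feat.count(False)      # how many fields equal False
renamed = feat._replace(name="age_years")  # tuples are immutable: returns a copy
```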

ModelSpec#

class ISLP.models.model_spec.ModelSpec(terms=[], intercept=True, categorical_features=None, default_encoders={'categorical': Contrast(), 'ordinal': OrdinalEncoder()})#

Bases: TransformerMixin, BaseEstimator

Parameters:
terms: sequence (optional)

Sequence of sets whose elements are columns of X when fit. For pd.DataFrame these can be column names.

intercept: bool (optional)

Include a column for intercept?

categorical_features: array-like of {bool, int} of shape (n_features,) or (n_categorical_features,), default=None

Indicates the categorical features. Will be ignored if X is a pd.DataFrame or pd.Series.

  • None : no feature will be considered categorical for np.ndarray.

  • boolean array-like : boolean mask indicating categorical features.

  • integer array-like : integer indices indicating categorical features.
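The three accepted forms of categorical_features can be reduced to a single list of column indices. The helper below, categorical_indices, is a hypothetical illustration of that normalisation, not the library's own code.

```python
def categorical_indices(categorical_features, n_features):
    """Return the sorted column indices flagged as categorical."""
    if categorical_features is None:
        return []  # no feature is treated as categorical
    flags = list(categorical_features)
    if not flags:
        return []
    if all(isinstance(f, bool) for f in flags):
        # Boolean mask: one entry per feature.
        if len(flags) != n_features:
            raise ValueError("a boolean mask needs one entry per feature")
        return [i for i, flag in enumerate(flags) if flag]
    # Otherwise treat the entries as integer column indices.
    return sorted(int(i) for i in flags)


mask_result = categorical_indices([False, True, True], n_features=3)
index_result = categorical_indices([2, 1], n_features=3)
```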

default_encoders: dict

Dictionary whose keys are elements of terms and whose values are transforms to be applied to the associated columns in the model matrix. When fit is called, each transform's fit_transform method is run and the fitted transform overwrites the corresponding value in the dictionary.

Attributes:
names

Methods

build_sequence(X[, anova_type])

Build implied sequence of submodels based on successively including more terms.

build_submodel(X, terms)

Build design on X after fitting.

fit(X[, y])

Fit the model specification to X, determining the columns and encoders used to build the model matrix.

fit_transform(X[, y])

Fit to data, then transform it.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X[, y])

Build design on X after fitting.

__init__(terms=[], intercept=True, categorical_features=None, default_encoders={'categorical': Contrast(), 'ordinal': OrdinalEncoder()})#
build_sequence(X, anova_type='sequential')#

Build implied sequence of submodels based on successively including more terms.

Parameters:
X: array-like

X on which columns are evaluated.

anova_type: str

One of “sequential” or “drop”.

Returns:
models: generator

Generator for sequence of models for ANOVA.
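One plausible reading of the two anova_type options is sketched below on plain term names. The function term_sequences is hypothetical, and the exact sequence the library yields (for example, whether an intercept-only model comes first) is an assumption here: “sequential” is taken to mean growing prefixes of the terms, and “drop” to mean leaving out one term at a time.

```python
def term_sequences(terms, anova_type="sequential"):
    """Yield the lists of terms that define each submodel."""
    terms = list(terms)
    if anova_type == "sequential":
        # Assumed: include one more term at each step.
        for i in range(len(terms)):
            yield terms[: i + 1]
    elif anova_type == "drop":
        # Assumed: each submodel omits exactly one term.
        for i in range(len(terms)):
            yield terms[:i] + terms[i + 1:]
    else:
        raise ValueError("anova_type must be 'sequential' or 'drop'")


sequential = list(term_sequences(["a", "b", "c"]))
drop_one = list(term_sequences(["a", "b", "c"], anova_type="drop"))
```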

build_submodel(X, terms)#

Build design on X after fitting.

Parameters:
X: array-like

X on which columns are evaluated.

terms: [Feature]

Sequence of features.

Returns:
D: array-like

Design matrix created with terms.

fit(X, y=None)#

Fit the model specification to X, determining the columns and encoders used to build the model matrix.

Parameters:
X: array-like

X on which model matrix will be evaluated. If a pd.DataFrame or pd.Series, variables that are of categorical dtype will be treated as categorical.

fit_transform(X, y=None, **fit_params)#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X: array-like of shape (n_samples, n_features)

Input samples.

y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params: dict

Additional fit parameters.

Returns:
X_new: ndarray of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()#

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:
routing: MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)#

Get parameters for this estimator.

Parameters:
deep: bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params: dict

Parameter names mapped to their values.

property names#
set_output(*, transform=None)#

Set output container.

See the scikit-learn example “Introducing the set_output API” for how to use this API.

Parameters:
transform: {“default”, “pandas”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • None: Transform configuration is unchanged

Returns:
self: estimator instance

Estimator instance.

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params: dict

Estimator parameters.

Returns:
self: estimator instance

Estimator instance.

transform(X, y=None)#

Build design on X after fitting.

Parameters:
X: array-like
y: None

Ignored. This parameter exists only for compatibility with sklearn.pipeline.Pipeline.

Functions#

ISLP.models.model_spec.bs(col, intercept=False, name=None, **spline_args)#

Create a B-spline Feature for a given column.

Additional args and spline_args are passed to ISLP.transforms.BSpline along with intercept.

Parameters:
col: column identifier or Column
intercept: bool

Include an intercept column.

name: str (optional)

Defaults to one derived from col.

Returns:
var: Feature
ISLP.models.model_spec.build_columns(column_info, X, var, encoders={}, col_cache={}, fit=False)#

Build columns for a Feature from X.

Parameters:
column_info: dict

Dictionary with values specifying sets of columns to be concatenated into a design matrix.

X: array-like

X on which columns are evaluated.

var: Feature

Feature whose columns will be built, typically a key in column_info.

encoders: dict

Dict that stores encoder of each Feature.

col_cache: dict

Dict in which built columns are stored. If var.name is already in col_cache, the cached columns are returned.

fit: bool (optional)

If True, then try to fit encoder. Will raise an error if encoder has already been fit.
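The role of col_cache can be pictured as plain memoisation keyed on the feature name. The sketch below is a hypothetical illustration only: build_cached and expensive_columns are made-up names, and the real function does considerably more than this.

```python
def build_cached(name, compute, col_cache):
    """Return cached columns for `name`, computing them only once."""
    if name in col_cache:
        return col_cache[name]   # already built: reuse the stored columns
    columns = compute()
    col_cache[name] = columns    # remember for later terms that share it
    return columns


calls = []


def expensive_columns():
    # Stand-in for evaluating an encoder on X.
    calls.append("built")
    return [[1.0], [2.0], [3.0]]


cache = {}
first = build_cached("x", expensive_columns, cache)
second = build_cached("x", expensive_columns, cache)
```

Passing the same dictionary across several calls is what allows columns shared by several terms to be built once.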

ISLP.models.model_spec.build_model(column_info, X, terms, intercept=True, encoders={})#

Construct design matrix on a sequence of terms and X after fitting.

Parameters:
column_info: dict

Dictionary with values specifying sets of columns to be concatenated into a design matrix.

X: array-like

X on which columns are evaluated.

terms: [Feature]

Sequence of features.

encoders: dict

Dict that stores encoder of each Feature.

Returns:
df: np.ndarray or pd.DataFrame

Design matrix.

ISLP.models.model_spec.contrast(col, method='drop', drop_level=None)#

Create encoding of categorical feature.

Parameters:
col: column identifier
method: ‘drop’, ‘sum’, None or callable
drop_level: level identifier
Returns:
var: Feature
ISLP.models.model_spec.derived_feature(variables, encoder=None, name=None, use_transform=True)#

Create a Feature, optionally applying an encoder to the stacked columns.

Parameters:
variables: [column identifier, Column, Feature]

Variables to apply transform to. Could be column identifiers or variables: all columns will be stacked before encoding.

name: str (optional)

Defaults to str(encoder).

encoder: transform-like (optional)

Transform obeying sklearn fit/transform convention.

Returns:
var: Feature
ISLP.models.model_spec.fit_encoder(encoders, var, X)#

Fit an encoder if not already registered in encoders.

Parameters:
encoders: dict

Dictionary of encoders for each feature.

var: Feature

Feature whose encoder will be fit.

X: array-like

X on which encoder will be fit.

ISLP.models.model_spec.ns(col, intercept=False, name=None, **spline_args)#

Create a natural spline Feature for a given column.

Additional spline_args are passed to NaturalSpline along with intercept.

Parameters:
col: column identifier or Column
intercept: bool

Include an intercept column.

name: str (optional)

Defaults to one derived from col.

Returns:
var: Feature
ISLP.models.model_spec.ordinal(col, name=None, *args, **kwargs)#

Create ordinal encoding of categorical feature.

Parameters:
col: column identifier
Returns:
var: Feature
ISLP.models.model_spec.pca(variables, name, scale=False, **pca_args)#

Create PCA encoding of features from a sequence of variables.

Additional args and pca_args are passed to ISLP.transforms.PCA.

Parameters:
variables: [column identifier, Column or Feature]

Sequence whose columns will be encoded by PCA.

Returns:
var: Feature
ISLP.models.model_spec.poly(col, degree=1, intercept=False, raw=False, name=None)#

Create a polynomial Feature for a given column.

The degree, intercept and raw arguments are passed to Poly.

Parameters:
col: column identifier or Column

Column to transform.

degree: int, default=1

Degree of polynomial.

intercept: bool, default=False

Include a column for intercept?

raw: bool, default=False

If False, perform a QR decomposition on the resulting matrix of powers of centered and / or scaled features.

name: str (optional)

Defaults to one derived from col.

Returns:
var: Feature
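The effect of raw=False, orthogonalising the powers of a centered feature, can be sketched without any library. This is an illustration under stated assumptions, not necessarily the exact computation Poly performs: the helpers power_columns and orthonormalize are hypothetical, and modified Gram-Schmidt is used in place of a QR routine (for a full-rank matrix it produces the same Q up to column signs).

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def power_columns(x, degree):
    """Columns c**1, ..., c**degree, where c is the centered feature."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    return [[c ** d for c in centered] for d in range(1, degree + 1)]


def orthonormalize(columns):
    """Modified Gram-Schmidt against a constant column and earlier columns."""
    n = len(columns[0])
    basis = [[1.0 / math.sqrt(n)] * n]  # constant (intercept) direction
    for col in columns:
        resid = list(col)
        for q in basis:
            coef = dot(resid, q)
            resid = [r - coef * b for r, b in zip(resid, q)]
        norm = math.sqrt(dot(resid, resid))
        basis.append([r / norm for r in resid])
    return basis[1:]  # leave out the constant column, as with intercept=False


x = [1.0, 2.0, 3.0, 4.0, 5.0]
powers = power_columns(x, degree=2)   # plain powers of the centered feature
ortho = orthonormalize(powers)        # mutually orthogonal, unit-norm columns
```

The orthogonalised columns span the same space as the plain powers together with the intercept, so fitted values are unchanged while the columns themselves are uncorrelated.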