models.model_spec

Module: `models.model_spec`

Inheritance diagram for ISLP.models.model_spec: Contrast and ModelSpec each inherit from sklearn.base.TransformerMixin and sklearn.base.BaseEstimator.

Model specification

This module defines ModelSpec, the basic object for representing a regression formula.

Classes

`Contrast`

class ISLP.models.model_spec.Contrast(method='drop', drop_level=None)

Bases: TransformerMixin, BaseEstimator

Methods

`fit`(X[, y])	Construct contrast of categorical variable for use in building a design matrix.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`set_output`(*[, transform])	Set output container.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X)

__init__(method='drop', drop_level=None)

Contrast encoding for categorical variables.

Parameters:

method ('drop', 'sum', None, or callable): If 'drop', one column of the one-hot encoding is dropped. If 'sum', the coefficients are constrained to sum to 0. If None, the full one-hot encoding is returned. If callable, it should take the number of levels of the category as its single argument and return an appropriate contrast of the full one-hot encoding.
drop_level (str, optional): If not None and method == 'drop', this level of the category is dropped.
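The built-in methods can be pictured as contrast matrices with one row per level; a callable method plays the role of such a matrix builder. The sketch below is a minimal illustration, not ISLP's implementation: the helper `contrast_matrix` is hypothetical, 'drop' removes the first level unless drop_level is given (an assumed default), and 'sum' uses standard sum-to-zero (deviation) coding.

```python
# Hypothetical helper (not part of ISLP) that builds a contrast matrix for
# the 'drop', 'sum' and None methods. Row i encodes levels[i].
from typing import List, Optional


def contrast_matrix(levels: List[str], method: Optional[str] = "drop",
                    drop_level: Optional[str] = None) -> List[List[float]]:
    n = len(levels)
    # Full one-hot encoding: the n x n identity matrix.
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    if method is None:
        return identity
    if method == "drop":
        # Assumption: drop the first level unless drop_level is given.
        d = levels.index(drop_level) if drop_level is not None else 0
        return [[row[j] for j in range(n) if j != d] for row in identity]
    if method == "sum":
        # Sum-to-zero coding: keep n - 1 columns and encode the last level
        # as -1 in every column, so each column sums to zero.
        matrix = [row[:-1] for row in identity]
        matrix[-1] = [-1.0] * (n - 1)
        return matrix
    raise ValueError(f"unsupported method: {method!r}")


print(contrast_matrix(["a", "b", "c"], "sum"))
# [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
```

With 'drop', each remaining coefficient is measured relative to the dropped level; with 'sum', the implied level effects add up to zero, so each coefficient reads as a deviation from the average level effect.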

fit(X, y=None)

Construct contrast of categorical variable for use in building a design matrix.

Parameters:

X (array-like): X on which the model matrix will be evaluated. If X is a pd.DataFrame or pd.Series, variables of categorical dtype are treated as categorical.

Returns:

F (array-like): Columns of the design matrix implied by the categorical variable.
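Splitting the work into fit and transform means the levels are learned once and then reused, so training and test data are encoded into identical columns. The class below is a hypothetical sketch of that convention (`DropFirstEncoder` is not ISLP's Contrast); it sorts the levels and treats the first one as the baseline, both assumptions of the sketch.

```python
# Hypothetical sketch (not ISLP's Contrast): fit learns the levels once and
# transform reuses them, so training and test data get identical columns.
from typing import List, Sequence


class DropFirstEncoder:
    def fit(self, x: Sequence[str]) -> "DropFirstEncoder":
        # Store the sorted levels seen in the data passed to fit.
        self.levels_ = sorted(set(x))
        return self

    def transform(self, x: Sequence[str]) -> List[List[int]]:
        # One column per non-baseline level; the first level maps to zeros.
        rows = []
        for value in x:
            if value not in self.levels_:
                raise ValueError(f"level {value!r} was not seen during fit")
            rows.append([int(value == level) for level in self.levels_[1:]])
        return rows

    def fit_transform(self, x: Sequence[str]) -> List[List[int]]:
        return self.fit(x).transform(x)


enc = DropFirstEncoder().fit(["b", "a", "c", "a"])
print(enc.transform(["c", "a"]))  # [[0, 1], [0, 0]]
```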

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (array-like of shape (n_samples, n_features)): Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None): Target values (None for unsupervised transformations).
**fit_params (dict): Additional fit parameters.

Returns:

X_new (ndarray of shape (n_samples, n_features_new)): Transformed array.

get_metadata_routing()

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:

routing (MetadataRequest): A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True): If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict): Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See the scikit-learn example "Introducing the set_output API" for how to use the API.

Parameters:

transform ({“default”, “pandas”}, default=None)

Configure output of transform and fit_transform.

“default”: Default output format of a transformer
“pandas”: DataFrame output
None: Transform configuration is unchanged

Returns:

self (estimator instance): Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict): Estimator parameters.

Returns:

self (estimator instance): Estimator instance.
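The `<component>__<parameter>` convention can be illustrated by splitting each key at its first double underscore and grouping by component. `route_params` is a hypothetical sketch, not scikit-learn's implementation (which also validates names against get_params()), and the component names in the example are made up.

```python
# Hypothetical sketch: group flat "<component>__<parameter>" keys by their
# first component; keys without "__" belong to the top-level estimator ("").
from typing import Any, Dict


def route_params(params: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    routed: Dict[str, Dict[str, Any]] = {}
    for key, value in params.items():
        component, sep, rest = key.partition("__")
        if sep:
            # The remainder may itself contain "__" for deeper nesting.
            routed.setdefault(component, {})[rest] = value
        else:
            routed.setdefault("", {})[key] = value
    return routed


print(route_params({"spec__intercept": False, "verbose": 1}))
# {'spec': {'intercept': False}, '': {'verbose': 1}}
```

Splitting only at the first `__` lets each nested object apply the same rule to the remainder, which is how arbitrarily deep nesting stays addressable.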

transform(X)

`Feature`

class ISLP.models.model_spec.Feature(variables: tuple, name: str, encoder: Any, use_transform: bool = True, pure_columns: bool = False, override_encoder_colnames: bool = False)

Bases: NamedTuple

An element in a model matrix that will build columns from an array-like X.

Attributes:

encoder: Alias for field number 2
name: Alias for field number 1
override_encoder_colnames: Alias for field number 5
pure_columns: Alias for field number 4
use_transform: Alias for field number 3
variables: Alias for field number 0

Methods

`count`(value, /)	Return number of occurrences of value.
`index`(value[, start, stop])	Return first index of value.

__init__(*args, **kwargs)

count(value, /): Return number of occurrences of value.

encoder (Any): Alias for field number 2

index(value, start=0, stop=sys.maxsize, /)

Return first index of value.

Raises ValueError if the value is not present.

name (str): Alias for field number 1

override_encoder_colnames (bool): Alias for field number 5

pure_columns (bool): Alias for field number 4

use_transform (bool): Alias for field number 3

variables (tuple): Alias for field number 0
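Because Feature is a NamedTuple, each attribute is just a named alias for a tuple position, and count and index are the ordinary tuple methods. The local `FeatureSketch` below only mirrors the documented field order and defaults so the example runs without ISLP; `PlaceholderEncoder` is a hypothetical stand-in for a real transformer.

```python
# Local stand-in mirroring the documented fields of Feature (not ISLP code).
from typing import Any, NamedTuple


class FeatureSketch(NamedTuple):
    variables: tuple
    name: str
    encoder: Any
    use_transform: bool = True
    pure_columns: bool = False
    override_encoder_colnames: bool = False


class PlaceholderEncoder:
    """Hypothetical stand-in for an sklearn-style transformer."""


f = FeatureSketch(variables=("x1", "x2"), name="x1:x2",
                  encoder=PlaceholderEncoder())

# Attribute access and positional access refer to the same field.
assert f.variables is f[0] and f.encoder is f[2]
# Trailing fields take their defaults.
assert (f.use_transform, f.pure_columns, f.override_encoder_colnames) == (True, False, False)
# The inherited tuple methods operate on the field values.
assert f.index("x1:x2") == 1
# NamedTuples are immutable; _replace returns a modified copy.
g = f._replace(name="interaction")
assert g.name == "interaction" and f.name == "x1:x2"
```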

`ModelSpec`

class ISLP.models.model_spec.ModelSpec(terms=[], intercept=True, categorical_features=None, default_encoders={'categorical': Contrast(), 'ordinal': OrdinalEncoder()})

Bases: TransformerMixin, BaseEstimator

Parameters:

terms (sequence, optional)

Sequence of sets whose elements are columns of X when fit. For a pd.DataFrame, these can be column names.

intercept (bool, optional)

Whether to include an intercept column.

categorical_features (array-like of {bool, int} of shape (n_features,) or shape (n_categorical_features,), default=None)

Indicates the categorical features. Ignored if X is a pd.DataFrame or pd.Series.

None : no feature will be considered categorical for np.ndarray.
boolean array-like : boolean mask indicating categorical features.
integer array-like : integer indices indicating categorical features.
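For an np.ndarray the mask form and the index form carry the same information. The hypothetical helper `categorical_indices` (not part of ISLP) normalizes either form to a sorted list of column indices; its validation rules are assumptions of this sketch.

```python
# Hypothetical helper (not part of ISLP): normalize a categorical_features
# specification to a sorted list of column indices.
from typing import List, Optional, Sequence, Union


def categorical_indices(spec: Optional[Sequence[Union[bool, int]]],
                        n_features: int) -> List[int]:
    if spec is None:
        return []  # no feature is treated as categorical
    values = list(spec)
    # Check for booleans first: bool is a subclass of int in Python.
    if values and all(isinstance(v, bool) for v in values):
        if len(values) != n_features:
            raise ValueError("a boolean mask needs one entry per feature")
        return [i for i, flag in enumerate(values) if flag]
    indices = sorted(int(v) for v in values)
    if any(i < 0 or i >= n_features for i in indices):
        raise ValueError("categorical index out of range")
    return indices


print(categorical_indices([True, False, True], 3))  # [0, 2]
print(categorical_indices([2, 0], 3))               # [0, 2]
```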

default_encoders (dict)

Dictionary whose keys are elements of terms and whose values are transforms applied to the associated columns of the model matrix. When fit is called, the fit_transform method of each transform is run and the results overwrite these values in the dictionary.

Attributes:

names

Methods

`build_sequence`(X[, anova_type])	Build implied sequence of submodels based on successively including more terms.
`build_submodel`(X, terms)	Build design on X after fitting.
`fit`(X[, y])	Fit the encoders for the terms of the model specification on X.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`set_output`(*[, transform])	Set output container.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X[, y])	Build design on X after fitting.

__init__(terms=[], intercept=True, categorical_features=None, default_encoders={'categorical': Contrast(), 'ordinal': OrdinalEncoder()})

build_sequence(X, anova_type='sequential')

Build implied sequence of submodels based on successively including more terms.

Parameters:

X (array-like): X on which columns are evaluated.
anova_type (str, default='sequential'): One of “sequential” or “drop”.

Returns:

models (generator): Generator for sequence of models for ANOVA.
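Assuming "sequential" means adding terms one at a time and "drop" means removing one term at a time from the full model, the term lists behind the generated submodels might look like this. `term_sequence` is hypothetical and works on term names only; ISLP builds actual design matrices, and its handling of the intercept and ordering may differ.

```python
# Hypothetical sketch of the term lists behind the two ANOVA styles.
from typing import Iterator, List, Sequence


def term_sequence(terms: Sequence[str],
                  anova_type: str = "sequential") -> Iterator[List[str]]:
    if anova_type == "sequential":
        # Empty model first, then add one term at a time.
        for k in range(len(terms) + 1):
            yield list(terms[:k])
    elif anova_type == "drop":
        # Full model first, then each model with exactly one term removed.
        yield list(terms)
        for i in range(len(terms)):
            yield [t for j, t in enumerate(terms) if j != i]
    else:
        raise ValueError('anova_type must be "sequential" or "drop"')


print(list(term_sequence(["age", "income"])))
# [[], ['age'], ['age', 'income']]
print(list(term_sequence(["age", "income"], "drop")))
# [['age', 'income'], ['income'], ['age']]
```

Sequential comparisons depend on the order of terms, while drop-one comparisons treat every term symmetrically.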

build_submodel(X, terms)

Build design on X after fitting.

Parameters:

X (array-like): X on which columns are evaluated.
terms ([Feature]): Sequence of features.

Returns:

D (array-like): Design matrix created with terms.

fit(X, y=None)

Fit the encoders for the terms of the model specification on X.

Parameters:

X (array-like): X on which the model matrix will be evaluated. If X is a pd.DataFrame or pd.Series, variables of categorical dtype are treated as categorical.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (array-like of shape (n_samples, n_features)): Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None): Target values (None for unsupervised transformations).
**fit_params (dict): Additional fit parameters.

Returns:

X_new (ndarray of shape (n_samples, n_features_new)): Transformed array.

get_metadata_routing()

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:

routing (MetadataRequest): A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True): If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict): Parameter names mapped to their values.

property names

set_output(*, transform=None)

Set output container.

See the scikit-learn example "Introducing the set_output API" for how to use the API.

Parameters:

transform ({“default”, “pandas”}, default=None)

Configure output of transform and fit_transform.

“default”: Default output format of a transformer
“pandas”: DataFrame output
None: Transform configuration is unchanged

Returns:

self (estimator instance): Estimator instance.

set_params(**params)

Set the parameters of this estimator.

Parameters:

**params (dict): Estimator parameters.

Returns:

self (estimator instance): Estimator instance.

transform(X, y=None)

Build design on X after fitting.

Parameters:

X (array-like)
y (None): Ignored. This parameter exists only for compatibility with sklearn.pipeline.Pipeline.

Functions

ISLP.models.model_spec.bs(col, intercept=False, name=None, **spline_args)

Create a B-spline Feature for a given column.

Additional spline_args are passed to ISLP.transforms.BSpline along with intercept.

Parameters:

col (column identifier or Column)
intercept (bool): Include an intercept column.
name (str, optional): Defaults to one derived from col.

Returns:

var (Feature)

ISLP.models.model_spec.build_columns(column_info, X, var, encoders={}, col_cache={}, fit=False)

Build columns for a Feature from X.

Parameters:

column_info (dict): Dictionary with values specifying sets of columns to be concatenated into a design matrix.
X (array-like): X on which columns are evaluated.
var (Feature): Feature whose columns will be built, typically a key in column_info.
encoders (dict): Dict that stores the encoder of each Feature.
col_cache (dict): Dict in which built columns are stored; if var.name is already in col_cache, the cached columns are returned.
fit (bool, optional): If True, try to fit the encoder. Raises an error if the encoder has already been fit.
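The col_cache behaviour described above is memoization keyed by the feature name. `cached_columns` and its `compute` callback are hypothetical names for this sketch, which passes the cache explicitly instead of relying on a default argument.

```python
# Hypothetical sketch of name-keyed column caching (not ISLP's build_columns).
from typing import Callable, Dict, List

Columns = List[List[float]]


def cached_columns(name: str, compute: Callable[[], Columns],
                   col_cache: Dict[str, Columns]) -> Columns:
    if name in col_cache:
        # Already built for this feature name: return the stored columns.
        return col_cache[name]
    cols = compute()
    col_cache[name] = cols
    return cols


calls = []


def expensive() -> Columns:
    calls.append(1)  # record each real computation
    return [[1.0, 4.0, 9.0]]


cache: Dict[str, Columns] = {}
cached_columns("x^2", expensive, cache)
cached_columns("x^2", expensive, cache)
print(len(calls))  # 1
```

Because the key is only the name, a cache reused for a different X would return stale columns; in this sketch a fresh dict per X avoids that.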

ISLP.models.model_spec.build_model(column_info, X, terms, intercept=True, encoders={})

Construct the design matrix on a sequence of terms and X after fitting.

Parameters:

column_info (dict): Dictionary with values specifying sets of columns to be concatenated into a design matrix.
X (array-like): X on which columns are evaluated.
terms ([Feature]): Sequence of features.
encoders (dict): Dict that stores the encoder of each Feature.

Returns:

df (np.ndarray or pd.DataFrame): Design matrix.
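Assembling a design matrix amounts to concatenating, in order, an optional intercept column and the columns of each term. This sketch assumes a simplified column_info that maps a term name directly to its already-computed columns; `build_design` and its `n_samples` argument are hypothetical, not ISLP's build_model.

```python
# Hypothetical design-matrix assembly (not ISLP's build_model).
from typing import Dict, List, Sequence

Columns = List[List[float]]  # a list of columns, each a list of floats


def build_design(column_info: Dict[str, Columns], terms: Sequence[str],
                 n_samples: int, intercept: bool = True) -> List[List[float]]:
    columns: Columns = []
    if intercept:
        columns.append([1.0] * n_samples)  # intercept column of ones
    for term in terms:
        for col in column_info[term]:
            if len(col) != n_samples:
                raise ValueError(f"term {term!r} has a column of the wrong length")
            columns.append(col)
    # Transpose columns into rows: one row per sample.
    return [[col[i] for col in columns] for i in range(n_samples)]


info = {"x": [[1.0, 2.0, 3.0]], "g": [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
for row in build_design(info, ["x", "g"], n_samples=3):
    print(row)
# [1.0, 1.0, 0.0, 0.0]
# [1.0, 2.0, 1.0, 0.0]
# [1.0, 3.0, 0.0, 1.0]
```

In this picture a submodel simply passes a subset of the terms.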

ISLP.models.model_spec.contrast(col, method='drop', drop_level=None)

Create an encoding of a categorical feature.

Parameters:

col (column identifier)
method ('drop', 'sum', None, or callable)
drop_level (level identifier)

Returns:

var (Feature)

ISLP.models.model_spec.derived_feature(variables, encoder=None, name=None, use_transform=True)

Create a Feature, optionally applying an encoder to the stacked columns.

Parameters:

variables ([column identifier, Column, Feature]): Variables to apply the transform to. These can be column identifiers or variables; all columns are stacked before encoding.
name (str, optional): Defaults to str(encoder).
encoder (transform-like, optional): Transform obeying the sklearn fit/transform convention.

Returns:

var (Feature)
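Stacking places the columns of all variables side by side before the encoder sees them, so a single encoder can combine them. In this sketch, `stack_columns` and the `RowProduct` encoder (a simple product interaction) are hypothetical illustrations of the idea, not ISLP code.

```python
# Hypothetical sketch: stack columns, then let one encoder combine them.
import math
from typing import List, Sequence


def stack_columns(*columns: Sequence[float]) -> List[List[float]]:
    # Place equal-length columns side by side: one row per sample.
    if len({len(c) for c in columns}) > 1:
        raise ValueError("all columns must have the same length")
    return [list(row) for row in zip(*columns)]


class RowProduct:
    """Hypothetical encoder: the product of the stacked columns."""

    def fit(self, X: List[List[float]]) -> "RowProduct":
        return self  # nothing to learn

    def transform(self, X: List[List[float]]) -> List[List[float]]:
        return [[math.prod(row)] for row in X]


X = stack_columns([1.0, 2.0], [3.0, 4.0])
print(X)                                 # [[1.0, 3.0], [2.0, 4.0]]
print(RowProduct().fit(X).transform(X))  # [[3.0], [8.0]]
```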

ISLP.models.model_spec.fit_encoder(encoders, var, X)

Fit an encoder if one is not already registered in encoders.

Parameters:

encoders (dict): Dictionary of encoders for each feature.
var (Feature): Feature whose encoder will be fit.
X (array-like): X on which the encoder will be fit.
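Registering fitted encoders in a dictionary means each encoder is fit once and then reused. In this sketch the registry is keyed by a feature name string, whereas ISLP's fit_encoder takes the Feature itself; `MeanCenterer` and `fit_encoder_once` are hypothetical.

```python
# Hypothetical sketch of register-once fitting (not ISLP's fit_encoder).
from typing import Any, Dict, List, Sequence


class MeanCenterer:
    """Hypothetical encoder used only for this illustration."""

    def fit(self, x: Sequence[float]) -> "MeanCenterer":
        self.mean_ = sum(x) / len(x)
        return self

    def transform(self, x: Sequence[float]) -> List[float]:
        return [v - self.mean_ for v in x]


def fit_encoder_once(encoders: Dict[str, Any], name: str, encoder: Any,
                     x: Sequence[float]) -> Any:
    # Only fit when no encoder is registered under this name yet.
    if name not in encoders:
        encoders[name] = encoder.fit(x)
    return encoders[name]


registry: Dict[str, Any] = {}
first = fit_encoder_once(registry, "x", MeanCenterer(), [1.0, 2.0, 3.0])
again = fit_encoder_once(registry, "x", MeanCenterer(), [10.0, 20.0])
print(again is first, again.mean_)  # True 2.0
```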

ISLP.models.model_spec.ns(col, intercept=False, name=None, **spline_args)

Create a natural spline Feature for a given column.

Additional spline_args are passed to NaturalSpline along with intercept.

Parameters:

col (column identifier or Column)
intercept (bool): Include an intercept column.
name (str, optional): Defaults to one derived from col.

Returns:

var (Feature)

ISLP.models.model_spec.ordinal(col, name=None, *args, **kwargs)

Create an ordinal encoding of a categorical feature.

Parameters:

col (column identifier)

Returns:

var (Feature)
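An ordinal encoding replaces each level with its position in an ordering. The hypothetical helper `ordinal_codes` uses an explicit order when one is supplied and sorted order otherwise; that fallback is an assumption and need not match the OrdinalEncoder that ModelSpec lists under default_encoders.

```python
# Hypothetical ordinal-encoding helper (not ISLP's ordinal or OrdinalEncoder).
from typing import Dict, List, Optional, Sequence


def ordinal_codes(x: Sequence[str],
                  order: Optional[Sequence[str]] = None) -> List[int]:
    # Use the given order if any; otherwise fall back to sorted order.
    levels = list(order) if order is not None else sorted(set(x))
    code: Dict[str, int] = {level: i for i, level in enumerate(levels)}
    return [code[v] for v in x]  # KeyError for a value outside the order


print(ordinal_codes(["low", "high", "mid"], order=["low", "mid", "high"]))  # [0, 2, 1]
print(ordinal_codes(["b", "a", "b"]))  # [1, 0, 1]
```

Without an explicit order, sorted order would rank "high" before "low" before "mid", which is rarely the intended ranking.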

ISLP.models.model_spec.pca(variables, name, scale=False, **pca_args)

Create a PCA encoding of features from a sequence of variables.

Additional pca_args are passed to ISLP.transforms.PCA.

Parameters:

variables ([column identifier, Column or Feature]): Sequence whose columns will be encoded by PCA.

Returns:

var (Feature)

ISLP.models.model_spec.poly(col, degree=1, intercept=False, raw=False, name=None)

Create a polynomial Feature for a given column.

Additional args and kwargs are passed to Poly.

Parameters:

col (column identifier or Column): Column to transform.
degree (int, default=1): Degree of the polynomial.
intercept (bool, default=False): Whether to include an intercept column.
raw (bool, default=False): If False, perform a QR decomposition on the resulting matrix of powers of centered and/or scaled features.
name (str, optional): Defaults to one derived from col.

Returns:

var (Feature)
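With raw=False the polynomial columns are orthogonalized. The sketch below shows the idea with modified Gram-Schmidt, which spans the same columns as a QR decomposition of the centered powers (up to sign and scaling). `orthogonal_poly` is hypothetical: it returns unit-norm columns that are orthogonal to each other and to the constant column, and it is not ISLP's Poly transform, whose scaling may differ.

```python
# Hypothetical orthogonal-polynomial sketch (not ISLP's Poly transform).
import math
from typing import List, Sequence


def orthogonal_poly(x: Sequence[float], degree: int) -> List[List[float]]:
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]  # center x first
    # Start from a unit constant column so later columns have mean zero.
    basis: List[List[float]] = [[1.0 / math.sqrt(n)] * n]
    for d in range(1, degree + 1):
        col = [v ** d for v in xc]
        # Modified Gram-Schmidt: remove projections one basis vector at a time.
        for q in basis:
            dot = sum(a * b for a, b in zip(col, q))
            col = [a - dot * b for a, b in zip(col, q)]
        norm = math.sqrt(sum(a * a for a in col))
        if norm < 1e-12:
            raise ValueError("degree too high for the number of distinct x values")
        basis.append([a / norm for a in col])
    # Drop the constant column (the intercept=False case); return columns.
    return basis[1:]


cols = orthogonal_poly([1.0, 2.0, 3.0, 4.0, 5.0], 2)
print(len(cols), len(cols[0]))  # 2 5
```

Orthogonal columns keep the lower-degree coefficients stable as the degree grows and avoid the near-collinearity of raw powers.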
