models.model_spec
Module: models.model_spec
Model specification
This module defines ModelSpec, the basic object used to represent a regression formula.
Classes
Contrast
- class ISLP.models.model_spec.Contrast(method='drop', drop_level=None)
Bases: TransformerMixin, BaseEstimator
Methods
fit(X[, y]): Construct contrast of categorical variable for use in building a design matrix.
fit_transform(X[, y]): Fit to data, then transform it.
get_params([deep]): Get parameters for this estimator.
set_output(*[, transform]): Set output container.
set_params(**params): Set the parameters of this estimator.
transform
- __init__(method='drop', drop_level=None)
Contrast encoding for categorical variables.
- Parameters:
- method : {‘drop’, ‘sum’, None} or callable
If ‘drop’, one column of the one-hot encoding is dropped. If ‘sum’, the coefficients are constrained to sum to 0. If None, the full one-hot encoding is returned. If callable, it should take the number of levels of the category as its single argument and return an appropriate contrast of the full one-hot encoding.
- drop_level : str (optional)
If not None, this level of the category will be dropped if method == ‘drop’.
- fit(X, y=None)
Construct contrast of categorical variable for use in building a design matrix.
- Parameters:
- X : array-like
X on which model matrix will be evaluated. If a pd.DataFrame or pd.Series, variables that are of categorical dtype will be treated as categorical.
- Returns:
- F : array-like
Columns of design matrix implied by the categorical variable.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
None: Transform configuration is unchanged
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- transform(X)
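The encodings selected by method can be pictured as matrices with one row per level of the category. The following is a minimal pure-Python sketch, not ISLP's implementation: the helper name contrast_matrix and the example levels are hypothetical, it assumes that ‘drop’ removes the first level when drop_level is None, and it assumes that ‘sum’ means the conventional sum-to-zero coding with the last level as reference.

```python
def contrast_matrix(levels, method="drop", drop_level=None):
    """Return one encoding row per level (hypothetical helper, not ISLP code)."""
    k = len(levels)
    # Full one-hot encoding: k rows, k columns.
    one_hot = [[1 if i == j else 0 for j in range(k)] for i in range(k)]
    if method is None:
        return one_hot
    if method == "drop":
        # Assumption: the first level is dropped when drop_level is None.
        dropped = 0 if drop_level is None else levels.index(drop_level)
        return [[v for j, v in enumerate(row) if j != dropped] for row in one_hot]
    if method == "sum":
        # Assumption: sum-to-zero coding with the last level as reference,
        # so the last level is encoded as -1 in every column.
        rows = [row[:-1] for row in one_hot[:-1]]
        rows.append([-1] * (k - 1))
        return rows
    if callable(method):
        # A callable receives the number of levels and returns the contrast.
        return method(k)
    raise ValueError(f"unknown method: {method!r}")


levels = ["Bad", "Good", "Medium"]
print(contrast_matrix(levels, "drop"))                     # [[0, 0], [1, 0], [0, 1]]
print(contrast_matrix(levels, "drop", drop_level="Good"))  # [[1, 0], [0, 0], [0, 1]]
print(contrast_matrix(levels, "sum"))                      # [[1, 0], [0, 1], [-1, -1]]
```

With ‘drop’, the dropped level becomes the baseline (an all-zero row), so each remaining column measures a difference from that baseline.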
Feature
- class ISLP.models.model_spec.Feature(variables: tuple, name: str, encoder: Any, use_transform: bool = True, pure_columns: bool = False, override_encoder_colnames: bool = False)
Bases: NamedTuple
An element in a model matrix that will build columns from an array-like X.
- Attributes:
- encoder
Alias for field number 2
- name
Alias for field number 1
- override_encoder_colnames
Alias for field number 5
- pure_columns
Alias for field number 4
- use_transform
Alias for field number 3
- variables
Alias for field number 0
Methods
count(value, /): Return number of occurrences of value.
index(value[, start, stop]): Return first index of value.
- __init__(*args, **kwargs)
- count(value, /)
Return number of occurrences of value.
- index(value, start=0, stop=sys.maxsize, /)
Return first index of value.
Raises ValueError if the value is not present.
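Because Feature is a NamedTuple, its fields can be read by name or by position, and count and index are the ordinary tuple methods applied to the field values. The sketch below uses a stand-in class, FeatureSketch (a hypothetical name), that copies the field order and defaults from the signature above, so it runs without ISLP installed. The column names are illustrative only.

```python
from typing import Any, NamedTuple


class FeatureSketch(NamedTuple):
    """Stand-in with the same fields and defaults as the documented signature."""
    variables: tuple
    name: str
    encoder: Any
    use_transform: bool = True
    pure_columns: bool = False
    override_encoder_colnames: bool = False


f = FeatureSketch(variables=("lstat", "age"), name="lstat:age", encoder=None)

# Fields can be read by name or by position (the "field number").
print(f.name == f[1])        # True
print(f.variables == f[0])   # True
# index and count operate on the tuple of field values.
print(f.index("lstat:age"))  # 1
print(f.count(False))        # 2 (pure_columns and override_encoder_colnames)
```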
ModelSpec
- class ISLP.models.model_spec.ModelSpec(terms=[], intercept=True, categorical_features=None, default_encoders={'categorical': Contrast(), 'ordinal': OrdinalEncoder()})
Bases: TransformerMixin, BaseEstimator
- Parameters:
- terms : sequence (optional)
Sequence of sets whose elements are columns of X when fit. For pd.DataFrame these can be column names.
- intercept : bool (optional)
Include a column for intercept?
- categorical_features : array-like of {bool, int} of shape (n_features,) or shape (n_categorical_features,), default=None
Indicates the categorical features. Will be ignored if X is a pd.DataFrame or pd.Series.
None : no feature will be considered categorical for np.ndarray.
boolean array-like : boolean mask indicating categorical features.
integer array-like : integer indices indicating categorical features.
- default_encoders : dict
Dictionary whose keys are elements of terms and whose values are transforms to be applied to the associated columns in the model matrix. When fit is called, each transform's fit_transform method is run and the fitted transform overwrites the corresponding value in the dictionary.
- Attributes:
- names
Methods
build_sequence(X[, anova_type]): Build implied sequence of submodels based on successively including more terms.
build_submodel(X, terms): Build design on X after fitting.
fit(X[, y]): Fit the encoders for the terms of the model on X.
fit_transform(X[, y]): Fit to data, then transform it.
get_params([deep]): Get parameters for this estimator.
set_output(*[, transform]): Set output container.
set_params(**params): Set the parameters of this estimator.
transform(X[, y]): Build design on X after fitting.
- __init__(terms=[], intercept=True, categorical_features=None, default_encoders={'categorical': Contrast(), 'ordinal': OrdinalEncoder()})
- build_sequence(X, anova_type='sequential')
Build implied sequence of submodels based on successively including more terms.
- Parameters:
- X : array-like
X on which columns are evaluated.
- anova_type : str
One of “sequential” or “drop”.
- Returns:
- models : generator
Generator for sequence of models for ANOVA.
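To make the two anova_type options concrete, here is a small sketch of the term lists such a sequence could contain. It is an illustration under assumptions, not ISLP's code: term_sequences is a hypothetical helper, the “sequential” branch is assumed to start from the model with no terms, and the “drop” branch is assumed to omit exactly one term from the full model at each step. The real method yields design matrices built on X rather than lists of names.

```python
def term_sequences(terms, anova_type="sequential"):
    """Yield the list of terms in each submodel (hypothetical helper)."""
    terms = list(terms)
    if anova_type == "sequential":
        # Assumption: start from no terms and add one term at a time.
        for i in range(len(terms) + 1):
            yield terms[:i]
    elif anova_type == "drop":
        # Assumption: each submodel omits exactly one term from the full model.
        for i in range(len(terms)):
            yield terms[:i] + terms[i + 1:]
    else:
        raise ValueError("anova_type must be 'sequential' or 'drop'")


print(list(term_sequences(["lstat", "age"])))
# [[], ['lstat'], ['lstat', 'age']]
print(list(term_sequences(["lstat", "age"], "drop")))
# [['age'], ['lstat']]
```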
- build_submodel(X, terms)
Build design on X after fitting.
- Parameters:
- X : array-like
X on which columns are evaluated.
- terms : [Feature]
Sequence of features.
- Returns:
- D : array-like
Design matrix created with terms.
- fit(X, y=None)
Fit the encoders for the terms of the model on X.
- Parameters:
- X : array-like
X on which model matrix will be evaluated. If a pd.DataFrame or pd.Series, variables that are of categorical dtype will be treated as categorical.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- property names
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
None: Transform configuration is unchanged
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- transform(X, y=None)
Build design on X after fitting.
- Parameters:
- X : array-like
- y : None
Ignored. This parameter exists only for compatibility with sklearn.pipeline.Pipeline.
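The overall idea of a model specification (a list of terms plus an optional intercept, turned into the columns of a design matrix) can be sketched without any dependencies. The helper below, design_rows, is hypothetical and deliberately simplified: it handles numeric columns only, assumes the intercept column is named "intercept", and assumes that a term containing several columns is their elementwise product, named by joining the column names with ":". ModelSpec itself also handles categorical columns and fitted encoders, which this sketch leaves out.

```python
from math import prod


def design_rows(rows, terms, intercept=True):
    """Build (column_names, matrix) from a list of dict rows (hypothetical helper)."""
    # Normalize every term to a tuple of column names.
    term_cols = [(t,) if isinstance(t, str) else tuple(t) for t in terms]
    names = ["intercept"] if intercept else []
    names += [":".join(cols) for cols in term_cols]
    matrix = []
    for row in rows:
        out = [1.0] if intercept else []
        for cols in term_cols:
            # Assumption: a multi-column term is the product of its columns.
            out.append(float(prod(row[c] for c in cols)))
        matrix.append(out)
    return names, matrix


data = [{"lstat": 2.0, "age": 10.0}, {"lstat": 3.0, "age": 20.0}]
names, D = design_rows(data, ["lstat", "age", ("lstat", "age")])
print(names)  # ['intercept', 'lstat', 'age', 'lstat:age']
print(D)      # [[1.0, 2.0, 10.0, 20.0], [1.0, 3.0, 20.0, 60.0]]
```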
Functions
- ISLP.models.model_spec.bs(col, intercept=False, name=None, **spline_args)
Create a B-spline Feature for a given column.
Additional spline_args are passed to ISLP.transforms.BSpline along with intercept.
- Parameters:
- col : column identifier or Column
- intercept : bool
Include an intercept column.
- name : str (optional)
Defaults to one derived from col.
- Returns:
- var : Feature
- ISLP.models.model_spec.build_columns(column_info, X, var, encoders={}, col_cache={}, fit=False)
Build columns for a Feature from X.
- Parameters:
- column_info : dict
Dictionary with values specifying sets of columns to be concatenated into a design matrix.
- X : array-like
X on which columns are evaluated.
- var : Feature
Feature whose columns will be built, typically a key in column_info.
- encoders : dict
Dict that stores the encoder of each Feature.
- col_cache : dict
Dict where columns will be stored. If var.name is in col_cache, those columns are returned directly.
- fit : bool (optional)
If True, try to fit the encoder. Raises an error if the encoder has already been fit.
- ISLP.models.model_spec.build_model(column_info, X, terms, intercept=True, encoders={})
Construct design matrix on a sequence of terms and X after fitting.
- Parameters:
- column_info : dict
Dictionary with values specifying sets of columns to be concatenated into a design matrix.
- X : array-like
X on which columns are evaluated.
- terms : [Feature]
Sequence of features.
- encoders : dict
Dict that stores the encoder of each Feature.
- Returns:
- df : np.ndarray or pd.DataFrame
Design matrix.
- ISLP.models.model_spec.contrast(col, method='drop', drop_level=None)
Create encoding of categorical feature.
- Parameters:
- col : column identifier
- method : ‘drop’, ‘sum’, None or callable
- drop_level : level identifier
- Returns:
- var : Feature
- ISLP.models.model_spec.derived_feature(variables, encoder=None, name=None, use_transform=True)
Create a Feature, optionally applying an encoder to the stacked columns.
- Parameters:
- variables : [column identifier, Column, Feature]
Variables to apply the transform to. These may be column identifiers, Columns or Features; all columns are stacked before encoding.
- name : str (optional)
Defaults to str(encoder).
- encoder : transform-like (optional)
Transform obeying the sklearn fit/transform convention.
- Returns:
- var : Feature
- ISLP.models.model_spec.fit_encoder(encoders, var, X)
Fit an encoder if not already registered in encoders.
- Parameters:
- encoders : dict
Dictionary of encoders for each feature.
- var : Feature
Feature whose encoder will be fit.
- X : array-like
X on which encoder will be fit.
- ISLP.models.model_spec.ns(col, intercept=False, name=None, **spline_args)
Create a natural spline Feature for a given column.
Additional spline_args are passed to NaturalSpline along with intercept.
- Parameters:
- col : column identifier or Column
- intercept : bool
Include an intercept column.
- name : str (optional)
Defaults to one derived from col.
- Returns:
- var : Feature
- ISLP.models.model_spec.ordinal(col, name=None, *args, **kwargs)
Create ordinal encoding of categorical feature.
- Parameters:
- col : column identifier
- Returns:
- var : Feature
- ISLP.models.model_spec.pca(variables, name, scale=False, **pca_args)
Create PCA encoding of features from a sequence of variables.
Additional pca_args are passed to ISLP.transforms.PCA.
- Parameters:
- variables : [column identifier, Column or Feature]
Sequence whose columns will be encoded by PCA.
- Returns:
- var : Feature
- ISLP.models.model_spec.poly(col, degree=1, intercept=False, raw=False, name=None)
Create a polynomial Feature for a given column.
Additional args and kwargs are passed to Poly.
- Parameters:
- col : column identifier or Column
Column to transform.
- degree : int, default=1
Degree of polynomial.
- intercept : bool, default=False
Include a column for intercept?
- raw : bool, default=False
If False, perform a QR decomposition on the resulting matrix of powers of centered and/or scaled features.
- name : str (optional)
Defaults to one derived from col.
- Returns:
- var : Feature
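The difference between raw=True and raw=False can be illustrated with a small pure-Python sketch. It is not the Poly implementation: poly_columns is a hypothetical helper, and it uses Gram-Schmidt orthogonalization against a constant column in place of a QR decomposition (the two produce columns spanning the same space). The columns here are left unnormalized, so the scaling and sign of the columns ISLP returns may differ. The sketch also assumes x has more distinct values than degree.

```python
def poly_columns(x, degree=1, raw=False):
    """Return `degree` columns for the values in x (hypothetical helper)."""
    # Raw powers x, x**2, ..., x**degree, one list per column.
    powers = [[v ** d for v in x] for d in range(1, degree + 1)]
    if raw:
        return powers
    # Orthogonalize each power against the constant column and all
    # earlier columns (Gram-Schmidt); removing the constant centers it.
    basis = [[1.0] * len(x)]
    for col in powers:
        col = [float(v) for v in col]
        for b in basis:
            coef = sum(c * q for c, q in zip(col, b)) / sum(q * q for q in b)
            col = [c - coef * q for c, q in zip(col, b)]
        basis.append(col)
    return basis[1:]


x = [1, 2, 3, 4]
raw_cols = poly_columns(x, degree=2, raw=True)
orth_cols = poly_columns(x, degree=2)
print(raw_cols)   # [[1, 2, 3, 4], [1, 4, 9, 16]]
print(orth_cols)
```

The raw columns are strongly correlated with each other, while the orthogonalized columns have zero mean and zero inner product with one another, which is the usual motivation for the default raw=False.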