# Polynomial features

The modelling tools included in `ISLP` allow for construction of orthogonal polynomials of features.

```
import numpy as np
from ISLP import load_data
from ISLP.models import ModelSpec, poly
```

```
Carseats = load_data('Carseats')
Carseats.columns
```

```
Index(['Sales', 'CompPrice', 'Income', 'Advertising', 'Population', 'Price',
'ShelveLoc', 'Age', 'Education', 'Urban', 'US'],
dtype='object')
```

Let’s make a term representing a quartic effect for `Population`.

```
quartic = poly('Population', 4)
```

The object `quartic` does not refer to any data yet; it must be included in a `ModelSpec` object and fit using the `fit` method (here via `fit_transform`).

```
design = ModelSpec([quartic], intercept=False)
ISLP_features = design.fit_transform(Carseats)
ISLP_features.columns
```

```
Index(['poly(Population, degree=4)[0]', 'poly(Population, degree=4)[1]',
'poly(Population, degree=4)[2]', 'poly(Population, degree=4)[3]'],
dtype='object')
```

## Compare to `R`

We can compare our polynomials to those produced by a similar function in `R`.

```
%load_ext rpy2.ipython
```

We’ll recompute these features using `poly` in `R`.

```
%%R -i Carseats -o R_features
R_features = poly(Carseats$Population, 4)
```

```
In addition: Warning message:
In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE, :
libraries ‘/usr/local/lib/R/site-library’, ‘/usr/lib/R/site-library’ contain no packages
```

```
np.linalg.norm(ISLP_features - R_features)
```

```
6.611410814977955e-15
```

## Underlying model

If we look at `quartic`, we see it is a `Feature`, i.e. it can be used to produce a set of columns in a design matrix when it is a term used in creating the `ModelSpec`.

Its encoder is `Poly(degree=4)`. This is a special `sklearn` transform that expects a single column in its `fit()` method and constructs a matrix of corresponding orthogonal polynomials.

The spline helpers `ns` and `bs` as well as `pca` follow a similar structure.

```
quartic
```

```
Feature(variables=('Population',), name='poly(Population, degree=4)', encoder=Poly(degree=4), use_transform=True, pure_columns=False, override_encoder_colnames=True)
```

## Raw polynomials

One can, of course, also compute raw polynomials, although this results in a less well-conditioned design matrix.

```
quartic_raw = poly('Population', degree=4, raw=True)
```

Let’s compare the features again.

```
design = ModelSpec([quartic_raw], intercept=False)
raw_features = design.fit_transform(Carseats)
```

```
%%R -i Carseats -o R_features
R_features = poly(Carseats$Population, 4, raw=TRUE)
```

```
np.linalg.norm(raw_features - R_features)
```

```
4.279987406903757e-06
```

## Intercept

Looking at `raw_features`, we see it contains the columns `[Population**i for i in range(1, 5)]`. That is, it doesn’t contain an intercept, the order 0 term. This can be included with `intercept=True`:

```
quartic_int = poly('Population', degree=4, raw=True, intercept=True)
design = ModelSpec([quartic_int], intercept=False)
intercept_features = design.fit_transform(Carseats)
```

```
np.linalg.norm(intercept_features.iloc[:,1:] - R_features)
```

```
4.279987406903757e-06
```