What is StandardScaler?

Charan H U
2 min read · Aug 16, 2021


Standardization Formulas

Standardization assumes that your observations fit a Gaussian distribution (bell curve) with a well-behaved mean and standard deviation. You can still standardize your data if this expectation is not met, but you may not get reliable results.

Assume that you have a matrix X in which each row is a sample (observation) and each column is a feature (variable). This is the input layout that sklearn ML functions expect: X.shape should be [number_of_samples, number_of_features].

Core of the method: the main idea is to standardize each feature (column) of X individually, so that it has μ = 0 and σ = 1, before applying any machine learning model.

StandardScaler() standardizes the features, i.e. it rescales each column of X independently of the other columns, so that every feature ends up with μ = 0 and σ = 1.
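To make the column-by-column behaviour concrete, here is a minimal sketch. The toy matrix and the variable names are made up for illustration; the only library calls used are StandardScaler and its fit_transform method from sklearn.preprocessing.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 4 samples (rows) x 2 features (columns).
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0],
              [4.0, 400.0]])

scaler = StandardScaler()
# fit_transform learns the mean and standard deviation of each column,
# then rescales each column with its own statistics.
X_scaled = scaler.fit_transform(X)

# Per-column statistics after scaling.
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)  # population standard deviation (ddof=0)

print(X_scaled.shape)  # same shape as X: (n_samples, n_features)
print(col_means)       # approximately 0 for every column
print(col_stds)        # approximately 1 for every column
```

The two columns start on very different scales, yet after scaling each one has mean 0 and standard deviation 1, because each column is handled on its own.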

“Another […] technique is to calculate the statistical mean and standard deviation of the attribute values, subtract the mean from each value, and divide the result by the standard deviation. This process is called standardizing a statistical variable and results in a set of values whose mean is zero and standard deviation is one.”

Reference: python - Can anyone explain me StandardScaler? - Stack Overflow
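The procedure in the quote boils down to z = (x − μ) / σ applied to every value of one feature. The sketch below spells that out with the Python standard library only. The function name standardize and the sample ages are hypothetical, and returning zeros for a constant column is an assumption made here to avoid dividing by zero. It uses the population standard deviation, which is also what StandardScaler uses.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return the z-score (x - mean) / std for every value in one feature."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # Assumption for this sketch: a constant feature maps to all zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

ages = [20, 30, 40, 50, 60]  # hypothetical values of a single feature
z = standardize(ages)

print(z)
print(fmean(z))   # approximately 0
print(pstdev(z))  # approximately 1
```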

Example

StandardScaler() performs standardization. A dataset usually contains variables that are on different scales. For example, an Employee dataset may contain an AGE column with values in the range 20–70 and a SALARY column with values in the range 10000–80000. Because these two columns differ in scale, they are standardized to a common scale before a machine learning model is built.
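As a rough illustration of the Employee example, the sketch below scales a small AGE/SALARY table. The four records are invented for this example and do not come from a real dataset; the calls used (fit, transform, inverse_transform and the mean_ attribute) are part of sklearn's StandardScaler.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical employee records: columns are [AGE, SALARY].
employees = np.array([[25, 20000],
                      [35, 40000],
                      [45, 60000],
                      [55, 80000]], dtype=float)

# Learn the mean and standard deviation of each column.
scaler = StandardScaler().fit(employees)

# Rescale both columns to mean 0 and standard deviation 1.
scaled = scaler.transform(employees)

# Map the scaled values back to the original units.
restored = scaler.inverse_transform(scaled)

print(scaler.mean_)  # per-column means learned from the data
print(scaled)        # AGE and SALARY are now on a common scale
```

In this made-up table both columns are evenly spaced, so once the scale difference is removed the AGE and SALARY columns become numerically identical, which shows that only the relative position of each value within its own column is kept.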
