sklearn.impute.ChainedImputer

class sklearn.impute.ChainedImputer(missing_values=nan, imputation_order='ascending', n_imputations=100, n_burn_in=10, predictor=None, n_nearest_features=None, initial_strategy='mean', min_value=None, max_value=None, verbose=False, random_state=None)[source]

Chained imputer transformer to impute missing values.

A basic implementation of the chained imputer from the R package MICE (Multivariate Imputations by Chained Equations). This version assumes that all of the features are Gaussian.

Read more in the User Guide.

Parameters:
missing_values : int, np.nan, optional (default=np.nan)

The placeholder for the missing values. All occurrences of missing_values will be imputed.

imputation_order : str, optional (default=“ascending”)

The order in which the features will be imputed. Possible values:

“ascending”

From features with fewest missing values to most.

“descending”

From features with most missing values to fewest.

“roman”

Left to right.

“arabic”

Right to left.

“random”

A random order for each round.
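
The ordering options listed above can be illustrated with a small standalone sketch. This is not the library's code: the helper name `imputation_order`, its `missing_counts` argument, and the tie-breaking rule (ties keep their left-to-right position) are assumptions made for illustration only.

```python
import random


def imputation_order(missing_counts, order="ascending", rng=None):
    """Return feature indices in the order they would be visited.

    missing_counts[j] is the number of missing values in feature j.
    Hypothetical helper: ties keep their original left-to-right position.
    """
    indices = list(range(len(missing_counts)))
    if order == "roman":        # left to right
        return indices
    if order == "arabic":       # right to left
        return indices[::-1]
    if order == "ascending":    # fewest missing values first
        return sorted(indices, key=lambda j: missing_counts[j])
    if order == "descending":   # most missing values first
        return sorted(indices, key=lambda j: missing_counts[j], reverse=True)
    if order == "random":       # call once per round to reshuffle
        rng = rng or random.Random()
        rng.shuffle(indices)
        return indices
    raise ValueError("unknown imputation order: %r" % order)


counts = [3, 0, 5, 1]  # missing values per feature (made-up numbers)
print(imputation_order(counts, "ascending"))
print(imputation_order(counts, "descending"))
```

With “random”, the helper would be called again at the start of every round so that each round gets its own order.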

n_imputations : int, optional (default=100)

Number of chained imputation rounds to perform, the results of which will be used in the final average.

n_burn_in : int, optional (default=10)

Number of initial imputation rounds to perform, the results of which will not be returned.

predictor : estimator object, default=BayesianRidge()

The predictor to use at each step of the round-robin imputation. It must support return_std in its predict method.

n_nearest_features : int, optional (default=None)

Number of other features to use to estimate the missing values of each feature column. Nearness between features is measured using the absolute correlation coefficient between each feature pair (after initial imputation). Can provide a significant speed-up when the number of features is huge. If None, all features will be used.

initial_strategy : str, optional (default=“mean”)

Which strategy to use to initialize the missing values. Same as the strategy parameter in sklearn.impute.SimpleImputer. Valid values: {“mean”, “median”, “most_frequent”, “constant”}.

min_value : float, optional (default=None)

Minimum possible imputed value. Default of None will set minimum to negative infinity.

max_value : float, optional (default=None)

Maximum possible imputed value. Default of None will set maximum to positive infinity.

verbose : int, optional (default=0)

Verbosity flag, controls the debug messages that are issued as functions are evaluated. The higher, the more verbose. Can be 0, 1, or 2.

random_state : int, RandomState instance or None, optional (default=None)

The seed of the pseudo-random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
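
How n_burn_in and n_imputations interact can be sketched for a single missing entry. The sketch assumes that the total number of rounds is n_burn_in + n_imputations and that the returned value is the plain mean of the rounds kept after burn-in; this is a reading of the parameter descriptions above, not a copy of the implementation. The function name `final_imputation` and the per-round values are hypothetical.

```python
def final_imputation(values_per_round, n_burn_in, n_imputations):
    """Average the imputed values of one entry over the rounds kept after burn-in.

    values_per_round holds the value imputed for the same missing entry
    in each round, in the order the rounds were run.
    """
    if len(values_per_round) != n_burn_in + n_imputations:
        raise ValueError("expected n_burn_in + n_imputations rounds")
    kept = values_per_round[n_burn_in:]  # burn-in rounds are discarded
    return sum(kept) / n_imputations


# Two burn-in rounds (10.0 and 8.0) are dropped; the last three are averaged.
print(final_imputation([10.0, 8.0, 4.0, 6.0, 5.0], n_burn_in=2, n_imputations=3))
```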

Attributes:
initial_imputer_ : object of class sklearn.impute.SimpleImputer

The imputer used to initialize the missing values.

imputation_sequence_ : list of tuples

Each tuple has (feat_idx, neighbor_feat_idx, predictor), where feat_idx is the current feature to be imputed, neighbor_feat_idx is the array of other features used to impute the current feature, and predictor is the trained predictor used for the imputation.

Notes

The R version of MICE does not have inductive functionality, i.e. it cannot be fit on X_train and then used to transform any X_test without additional fitting. This implementation provides it by storing each feature’s predictor during the round-robin fit phase, and predicting without refitting (in the same order) during the transform phase.

Features in which every value is missing at fit are discarded upon transform.

Features with missing values in transform which did not have any missing values in fit will be imputed with the initial imputation method only.
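
The inductive behaviour described in these notes can be sketched with a deliberately simplified, two-column toy. Everything here is an assumption made for illustration: `TinyChainedImputer` and its helpers are hypothetical names, the predictor is a deterministic least-squares line instead of a model with return_std, no sampling or averaging is done, and every column is assumed to have at least one observed value at fit.

```python
import math


def _fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def _apply(filled, mask, j, other, model):
    """Overwrite the originally missing entries of column j with predictions."""
    intercept, slope = model
    for i, row in enumerate(filled):
        if mask[i][j]:
            row[j] = intercept + slope * row[other]


class TinyChainedImputer:
    """Hypothetical two-column illustration, not the scikit-learn class."""

    def __init__(self, n_rounds=3):
        self.n_rounds = n_rounds

    def _initial_fill(self, X):
        # Initial imputation: replace missing entries with the fitted column means.
        mask = [[math.isnan(v) for v in row] for row in X]
        filled = [[self.means_[j] if mask[i][j] else row[j] for j in range(2)]
                  for i, row in enumerate(X)]
        return filled, mask

    def fit(self, X):
        self.means_ = []
        for j in range(2):
            observed = [row[j] for row in X if not math.isnan(row[j])]
            self.means_.append(sum(observed) / len(observed))
        filled, mask = self._initial_fill(X)
        # Only columns that had missing values at fit get a predictor.
        to_impute = [j for j in range(2) if any(m[j] for m in mask)]
        self.sequence_ = []  # (feat_idx, neighbor_feat_idx, model), in fit order
        for _ in range(self.n_rounds):
            for j in to_impute:
                other = 1 - j
                rows = [i for i in range(len(X)) if not mask[i][j]]
                model = _fit_line([filled[i][other] for i in rows],
                                  [filled[i][j] for i in rows])
                self.sequence_.append((j, other, model))
                _apply(filled, mask, j, other, model)
        return self

    def transform(self, X):
        filled, mask = self._initial_fill(X)
        # Replay the stored predictors in fit order, without refitting.
        for j, other, model in self.sequence_:
            _apply(filled, mask, j, other, model)
        return filled


nan = float("nan")
imputer = TinyChainedImputer(n_rounds=2).fit(
    [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, nan]])
print(imputer.transform([[5.0, nan], [nan, 6.0]]))
```

In this toy, only the second column had missing values at fit, so only it gets stored predictors; a missing value in the first column at transform time falls back to the initial mean, which mirrors the last note above.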

References

[1] Stef van Buuren, Karin Groothuis-Oudshoorn (2011). “mice: Multivariate Imputation by Chained Equations in R”. Journal of Statistical Software 45: 1-67.

Methods

fit(X[, y]) Fits the imputer on X and returns self.
fit_transform(X[, y]) Fits the imputer on X and returns the transformed X.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(X) Imputes all missing values in X.
__init__(missing_values=nan, imputation_order='ascending', n_imputations=100, n_burn_in=10, predictor=None, n_nearest_features=None, initial_strategy='mean', min_value=None, max_value=None, verbose=False, random_state=None)[source]

Initialize self. See help(type(self)) for accurate signature.

fit(X, y=None)[source]

Fits the imputer on X and returns self.

Parameters:
X : array-like, shape (n_samples, n_features)

Input data, where “n_samples” is the number of samples and “n_features” is the number of features.

y : ignored
Returns:
self : object

Returns self.

fit_transform(X, y=None)[source]

Fits the imputer on X and returns the transformed X.

Parameters:
X : array-like, shape (n_samples, n_features)

Input data, where “n_samples” is the number of samples and “n_features” is the number of features.

y : ignored
Returns:
Xt : array-like, shape (n_samples, n_features)

The imputed input data.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
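
The <component>__<parameter> naming convention can be illustrated with a small sketch that separates an estimator's own parameters from those addressed to a nested component. The helper `split_nested_params` and the step name "imputer" are hypothetical; the sketch only shows how such keys can be read, and is not the library's implementation.

```python
def split_nested_params(params):
    """Split keyword parameters into own parameters and per-component ones.

    A key such as "imputer__n_burn_in" is read as parameter "n_burn_in"
    of the component named "imputer". Only the first "__" is split, so
    deeper nesting is passed on to the component unchanged.
    """
    own, nested = {}, {}
    for key, value in params.items():
        name, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params({
    "verbose": 1,
    "imputer__n_burn_in": 5,
    "imputer__predictor__alpha_1": 1e-6,
})
print(own)     # parameters of the outer object itself
print(nested)  # parameters grouped by component name
```
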
transform(X)[source]

Imputes all missing values in X.

Note that this is stochastic: if random_state is not fixed, repeated calls or permuted input will yield different results.

Parameters:
X : array-like, shape (n_samples, n_features)

The input data to complete.

Returns:
Xt : array-like, shape (n_samples, n_features)

The imputed input data.
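
Why a fixed seed and a fixed row order both matter can be seen in a toy sketch. It assumes that each imputed value is a Gaussian draw around a predicted mean with a predicted standard deviation, taken in row order from a single generator; the function `sample_imputations` and the numbers are made up, and Python's standard `random` module stands in for the NumPy RandomState the estimator uses.

```python
import random


def sample_imputations(predictions, seed=None):
    """Draw one Gaussian value per (mean, std) pair, in input order.

    seed=None lets the generator seed itself, so results change per call.
    """
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for mean, std in predictions]


# Hypothetical (predicted mean, predicted std) for two missing entries.
rows = [(1.0, 0.5), (4.0, 0.2)]

first = sample_imputations(rows, seed=0)
second = sample_imputations(rows, seed=0)           # same seed, same order
permuted = sample_imputations(rows[::-1], seed=0)   # same seed, rows swapped

print(first == second)           # identical draws for identical calls
print(first == permuted[::-1])   # each row now receives a different draw
```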
