sklearn.impute.ChainedImputer
class sklearn.impute.ChainedImputer(missing_values=nan, imputation_order='ascending', n_imputations=100, n_burn_in=10, predictor=None, n_nearest_features=None, initial_strategy='mean', min_value=None, max_value=None, verbose=False, random_state=None)
Chained imputer transformer to impute missing values.
A basic implementation of the chained imputer from the R package MICE (Multivariate Imputation by Chained Equations) [1]. This version assumes that all features are Gaussian.
Read more in the User Guide.
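To make the round-robin idea concrete, here is a minimal pure-Python sketch of chained-equations imputation. It is not ChainedImputer's implementation, and chained_impute and its helpers are hypothetical names. The sketch makes several assumptions: None marks a missing entry, and each incomplete feature is regressed on all other features using ordinary least squares with a tiny ridge term, in place of BayesianRidge. Each imputed value is then drawn from a Gaussian centered on the prediction, with the residual standard deviation as its spread, which is one way to reflect the Gaussian assumption. Following the n_burn_in and n_imputations descriptions, the first rounds are discarded and the remaining rounds are averaged.

```python
import random
import statistics


def _solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _fit_linear(rows, targets, alpha=1e-6):
    """Least-squares fit of targets ~ intercept + rows; alpha is a tiny ridge term for stability."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) + (alpha if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    coef = _solve(xtx, xty)
    residuals = [t - sum(c * v for c, v in zip(coef, d)) for d, t in zip(design, targets)]
    return coef, statistics.pstdev(residuals)


def chained_impute(X, n_burn_in=10, n_imputations=100, seed=None):
    """Hypothetical MICE-style sketch; None marks a missing entry, each column needs one observed value."""
    rng = random.Random(seed)
    n_rows, n_cols = len(X), len(X[0])
    missing = [[v is None for v in row] for row in X]
    # Initial imputation: fill each column's gaps with its observed mean.
    means = [statistics.fmean(row[j] for row in X if row[j] is not None) for j in range(n_cols)]
    current = [[means[j] if missing[i][j] else X[i][j] for j in range(n_cols)] for i in range(n_rows)]
    totals = [[0.0] * n_cols for _ in range(n_rows)]
    for round_idx in range(n_burn_in + n_imputations):
        for j in range(n_cols):  # left-to-right order, like imputation_order="roman"
            rows_to_fill = [i for i in range(n_rows) if missing[i][j]]
            if not rows_to_fill:
                continue
            others = [k for k in range(n_cols) if k != j]
            train = [i for i in range(n_rows) if not missing[i][j]]
            coef, sigma = _fit_linear([[current[i][k] for k in others] for i in train],
                                      [current[i][j] for i in train])
            for i in rows_to_fill:
                mu = coef[0] + sum(c * current[i][k] for c, k in zip(coef[1:], others))
                # Gaussian assumption: draw around the prediction with the residual spread.
                current[i][j] = rng.gauss(mu, sigma)
        if round_idx >= n_burn_in:  # burn-in rounds are not used in the average
            for i in range(n_rows):
                for j in range(n_cols):
                    totals[i][j] += current[i][j]
    return [[totals[i][j] / n_imputations if missing[i][j] else X[i][j] for j in range(n_cols)]
            for i in range(n_rows)]
```

For example, chained_impute([[1.0, 2.0], [2.0, None], [3.0, 6.0], [None, 8.0], [5.0, 10.0]], n_burn_in=5, n_imputations=20, seed=42) fills both gaps with values close to 4.0, because the two columns lie on the line y = 2x. Features are visited left to right here. In the class, the imputation_order parameter controls this order.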
Parameters:
- missing_values : int or np.nan, optional (default=np.nan)
The placeholder for the missing values. All occurrences of missing_values will be imputed.
- imputation_order : str, optional (default="ascending")
The order in which the features will be imputed. Possible values:
  - "ascending": from the features with the fewest missing values to those with the most.
  - "descending": from the features with the most missing values to those with the fewest.
  - "roman": left to right.
  - "arabic": right to left.
  - "random": a random order for each round.
- n_imputations : int, optional (default=100)
Number of chained imputation rounds to perform, the results of which will be used in the final average.
- n_burn_in : int, optional (default=10)
Number of initial imputation rounds to perform, the results of which will not be returned.
- predictor : estimator object, default=BayesianRidge()
The predictor to use at each step of the round-robin imputation. It must support return_std in its predict method.
- n_nearest_features : int, optional (default=None)
Number of other features to use to estimate the missing values of each feature column. Nearness between features is measured using the absolute correlation coefficient between each feature pair (after initial imputation). Using fewer features can provide a significant speed-up when the number of features is large. If None, all features are used.
- initial_strategy : str, optional (default="mean")
Which strategy to use to initialize the missing values. Same as the strategy parameter in sklearn.impute.SimpleImputer. Valid values: {"mean", "median", "most_frequent", "constant"}.
- min_value : float, optional (default=None)
Minimum possible imputed value. The default of None sets the minimum to negative infinity.
- max_value : float, optional (default=None)
Maximum possible imputed value. The default of None sets the maximum to positive infinity.
- verbose : int, optional (default=0)
Verbosity flag that controls the debug messages issued as functions are evaluated. Higher values produce more output. Can be 0, 1, or 2.
- random_state : int, RandomState instance or None, optional (default=None)
The seed of the pseudo-random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
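Several of the parameters above are small, self-contained rules. The sketch below restates three of them in plain Python, under stated assumptions. imputation_order computes the visiting order from per-column missing counts, and it assumes that complete columns are skipped. initial_fill mirrors the SimpleImputer strategies that initial_strategy refers to, with ties in "most_frequent" resolved to the smallest value, as SimpleImputer does. Its fill_value argument for "constant" is hypothetical, because ChainedImputer exposes no such parameter. clip applies min_value and max_value, with None meaning an unbounded side. All three function names are illustrative and are not part of scikit-learn.

```python
import math
import random
import statistics
from collections import Counter


def imputation_order(X, order="ascending", rng=None):
    """Column indices with missing values (None), arranged per an imputation_order value."""
    counts = [sum(row[j] is None for row in X) for j in range(len(X[0]))]
    # Assumption: columns without missing values are skipped entirely.
    cols = [j for j, c in enumerate(counts) if c > 0]
    if order == "roman":
        return cols
    if order == "arabic":
        return cols[::-1]
    if order == "ascending":
        return sorted(cols, key=lambda j: counts[j])  # stable: ties keep left-to-right order
    if order == "descending":
        return sorted(cols, key=lambda j: counts[j], reverse=True)
    if order == "random":
        shuffled = list(cols)
        (rng or random.Random()).shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown imputation_order: {order!r}")


def initial_fill(column, strategy="mean", fill_value=0.0):
    """Fill None entries of one column, mimicking SimpleImputer's strategy options."""
    observed = [v for v in column if v is not None]
    if strategy == "mean":
        value = statistics.fmean(observed)
    elif strategy == "median":
        value = statistics.median(observed)
    elif strategy == "most_frequent":
        counts = Counter(observed)
        top = max(counts.values())
        value = min(v for v, c in counts.items() if c == top)  # smallest value wins ties
    elif strategy == "constant":
        value = fill_value  # hypothetical parameter; ChainedImputer does not expose one
    else:
        raise ValueError(f"unknown initial_strategy: {strategy!r}")
    return [value if v is None else v for v in column]


def clip(value, min_value=None, max_value=None):
    """Clip an imputed value; a None bound stands for negative or positive infinity."""
    low = -math.inf if min_value is None else min_value
    high = math.inf if max_value is None else max_value
    return min(max(value, low), high)
```

For instance, with missing counts of 1, 2, and 1 in three columns, "ascending" yields [0, 2, 1] and "descending" yields [1, 0, 2].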
Attributes:
- initial_imputer_ : object of class sklearn.impute.SimpleImputer
The imputer used to initialize the missing values.
- imputation_sequence_ : list of tuples
Each tuple has (feat_idx, neighbor_feat_idx, predictor), where feat_idx is the current feature to be imputed, neighbor_feat_idx is the array of other features used to impute the current feature, and predictor is the trained predictor used for the imputation.
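The structure of imputation_sequence_ can be illustrated with a small sketch, which also shows how n_nearest_features narrows the set of neighbors. build_imputation_sequence, nearest_features, and abs_correlation are hypothetical helpers. In this sketch, the neighbors are simply the top-k other columns by absolute Pearson correlation after initial imputation. The class may select them differently, for example by sampling features in proportion to their correlation. fit_predictor stands in for fitting the configured predictor on the rows where the target feature was observed.

```python
import statistics


def abs_correlation(a, b):
    """Absolute Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return abs(cov) / (var_a * var_b) ** 0.5


def nearest_features(X_filled, feat_idx, n_nearest_features=None):
    """Other columns used to predict feat_idx: all of them, or the top-k by |correlation|."""
    n_cols = len(X_filled[0])
    columns = [[row[j] for row in X_filled] for j in range(n_cols)]
    others = [j for j in range(n_cols) if j != feat_idx]
    if n_nearest_features is None or n_nearest_features >= len(others):
        return others
    ranked = sorted(others, key=lambda j: abs_correlation(columns[feat_idx], columns[j]),
                    reverse=True)
    return sorted(ranked[:n_nearest_features])


def build_imputation_sequence(X_filled, missing, order, fit_predictor, n_nearest_features=None):
    """One (feat_idx, neighbor_feat_idx, predictor) tuple per imputed feature, in order.

    X_filled is the data after initial imputation, missing[i][j] is True where the
    original entry was missing, and fit_predictor(features, target) is a hypothetical
    callable that returns a trained predictor.
    """
    sequence = []
    for feat_idx in order:
        neighbor_feat_idx = nearest_features(X_filled, feat_idx, n_nearest_features)
        train = [i for i in range(len(X_filled)) if not missing[i][feat_idx]]
        features = [[X_filled[i][k] for k in neighbor_feat_idx] for i in train]
        target = [X_filled[i][feat_idx] for i in train]
        sequence.append((feat_idx, neighbor_feat_idx, fit_predictor(features, target)))
    return sequence
```

With n_nearest_features=1, a column that tracks another almost linearly is chosen over a weakly correlated one, which is where the speed-up for many features comes from: each predictor sees fewer inputs.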
Notes
The R version of MICE does not have inductive functionality, i.e. first fitting on X_train and then transforming any X_test without additional fitting. This implementation adds it by storing each feature's predictor during the round-robin fit phase and predicting without refitting (in the same order) during the transform phase.
Features that contain only missing values at fit are discarded upon transform.
Features that have missing values in transform but had none in fit are imputed with the initial imputation method only.
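The inductive behavior described in these notes amounts to replaying stored predictors on new data. In the sketch below, transform_with_sequence is a hypothetical helper. It drops columns that were entirely missing at fit time and fills gaps with initial values learned at fit time. It then walks the stored (feat_idx, neighbor_feat_idx, predict) tuples once, in order, with predict standing in for a fitted predictor. Columns that never appear in the sequence keep their initial imputation, which matches the last note. If a fitted sequence spans several rounds, those rounds would simply appear as further tuples in the same loop.

```python
def transform_with_sequence(X, initial_values, kept_columns, imputation_sequence):
    """Impute new data by replaying a fitted sequence once, in order, without refitting.

    initial_values[j]: fit-time initial value for original column j.
    kept_columns: original columns kept at fit (all-missing ones are dropped).
    imputation_sequence: (feat_idx, neighbor_feat_idx, predict) tuples whose indices
    refer to positions among kept_columns; predict is a hypothetical fitted callable.
    """
    missing = [[row[j] is None for j in kept_columns] for row in X]
    # Initial imputation with statistics learned at fit time, dropping discarded columns.
    Xt = [[initial_values[j] if row[j] is None else row[j] for j in kept_columns] for row in X]
    for feat_idx, neighbor_feat_idx, predict in imputation_sequence:
        for i, row in enumerate(Xt):
            if missing[i][feat_idx]:
                row[feat_idx] = predict([row[k] for k in neighbor_feat_idx])
    # Columns absent from the sequence keep their initial imputation.
    return Xt
```

For example, if original column 1 was all-missing at fit time and only column 2 needed a predictor, a test row [None, None, 3.0] comes back as [initial value of column 0, 3.0]. Nothing is refitted.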
References
[1] Stef van Buuren, Karin Groothuis-Oudshoorn (2011). “mice: Multivariate Imputation by Chained Equations in R”. Journal of Statistical Software 45: 1-67.
Methods
- fit(X[, y]): Fits the imputer on X and returns self.
- fit_transform(X[, y]): Fits the imputer on X and returns the transformed X.
- get_params([deep]): Get parameters for this estimator.
- set_params(**params): Set the parameters of this estimator.
- transform(X): Imputes all missing values in X.
__init__(missing_values=nan, imputation_order='ascending', n_imputations=100, n_burn_in=10, predictor=None, n_nearest_features=None, initial_strategy='mean', min_value=None, max_value=None, verbose=False, random_state=None)
Initialize self. See help(type(self)) for accurate signature.
fit(X, y=None)
Fits the imputer on X and returns self.
Parameters:
- X : array-like, shape (n_samples, n_features)
Input data, where “n_samples” is the number of samples and “n_features” is the number of features.
- y : ignored
Returns:
- self : object
Returns self.
fit_transform(X, y=None)
Fits the imputer on X and returns the transformed X.
Parameters:
- X : array-like, shape (n_samples, n_features)
Input data, where “n_samples” is the number of samples and “n_features” is the number of features.
- y : ignored
Returns:
- Xt : array-like, shape (n_samples, n_features)
The imputed input data.
get_params(deep=True)
Get parameters for this estimator.
Parameters:
- deep : boolean, optional (default=True)
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
- params : mapping of string to any
Parameter names mapped to their values.
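The deep=True behavior can be sketched with a toy class. ToyEstimator is hypothetical and stores its parameter names explicitly, whereas scikit-learn's BaseEstimator derives them from the __init__ signature. With deep=True, the parameters of any nested object that itself has get_params are reported under <component>__<parameter> keys. For ChainedImputer, this only matters when an estimator is passed as predictor, because the default predictor=None has no nested parameters.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator whose constructor arguments are its parameters."""

    def __init__(self, **params):
        self._param_names = sorted(params)
        for name, value in params.items():
            setattr(self, name, value)

    def get_params(self, deep=True):
        """Return parameters; with deep=True, also nested ones as '<component>__<parameter>'."""
        out = {}
        for name in self._param_names:
            value = getattr(self, name)
            out[name] = value
            if deep and hasattr(value, "get_params"):
                for sub_name, sub_value in value.get_params(deep=True).items():
                    out[f"{name}__{sub_name}"] = sub_value
        return out
```

For example, ToyEstimator(predictor=ToyEstimator(alpha_1=1e-6), n_imputations=100).get_params() contains the keys "n_imputations", "predictor", and "predictor__alpha_1".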
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Returns:
- self
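The <component>__<parameter> convention can be illustrated by showing how a flat keyword dictionary might be split before it is applied. route_params is a hypothetical helper, not scikit-learn's code. It splits each key at the first double underscore, so keys with deeper nesting are passed on for the component to route again. The predictor__alpha_1 and predictor__lambda_1 keys assume a BayesianRidge predictor, whose alpha_1 and lambda_1 hyperparameters they would target. pipe is a hypothetical pipeline step name.

```python
def route_params(params):
    """Split flat set_params keys into this object's own values and per-component updates."""
    own, nested = {}, {}
    for key, value in params.items():
        component, sep, sub_key = key.partition("__")  # split at the first "__" only
        if sep:
            # Deeper keys such as "a__b__c" keep "b__c" for the component to route again.
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested
```

Splitting only at the first separator keeps each level responsible for its own direct children. A pipeline, for example, forwards "imputer__n_burn_in" to its step without needing to know the imputer's parameters.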
transform(X)
Imputes all missing values in X.
Note that this is stochastic: if random_state is not fixed, repeated calls or permuted input will yield different results.
Parameters:
- X : array-like, shape (n_samples, n_features)
The input data to complete.
Returns:
- Xt : array-like, shape (n_samples, n_features)
The imputed input data.
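A small sketch shows why a fixed random_state makes transform repeatable. It uses Python's random module as a stand-in for NumPy's RandomState, and check_state and stochastic_fill are hypothetical helpers. An int seeds a fresh generator. A generator instance is used as-is, so its state advances between calls. None falls back to the shared module-level generator. Because draws are consumed in row order, reordering the rows hands different draws to different rows, which is one way permuted input can change the result.

```python
import random


def check_state(random_state=None):
    """Map None / int / generator to a generator, mirroring the random_state convention."""
    if random_state is None:
        return random  # the module-level functions share one global generator
    if isinstance(random_state, int):
        return random.Random(random_state)  # fresh generator seeded with the int
    if isinstance(random_state, random.Random):
        return random_state  # used as-is, so its state advances between calls
    raise ValueError(f"cannot use {random_state!r} as a random state")


def stochastic_fill(values, mu, sigma, random_state=None):
    """Replace None entries by N(mu, sigma) draws, consumed in row order."""
    rng = check_state(random_state)
    return [rng.gauss(mu, sigma) if v is None else v for v in values]
```

Passing the same int twice reproduces the same imputations. Passing one shared generator instance twice does not, because the second call continues from where the first call left off.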