Version 0.20 (under development)

This release packs in a mountain of bug fixes, features and enhancements for the Scikit-learn library, and improvements to the documentation and examples. Thanks to our many contributors!

Highlights

We have tried to improve our support for common data-science use-cases including missing values, categorical variables, heterogeneous data, and features/targets with unusual distributions.

Missing values in features, represented by NaNs, are now accepted in column-wise preprocessing such as scalers. Each feature is fitted disregarding NaNs, and data containing NaNs can be transformed. The new impute module provides estimators for learning despite missing data.
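
As a minimal sketch of the behaviour described above (the toy array is made up for illustration, and preprocessing.StandardScaler and impute.SimpleImputer are picked here as representative estimators), a scaler can be fitted on data containing NaNs and an imputer can then fill the gaps:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy data (illustrative only): one missing value in each column.
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan]])

# The scaler computes per-feature statistics while disregarding NaNs.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)  # NaNs are passed through unchanged

# The imputer replaces each NaN with the mean of the observed values
# in the same column.
imputer = SimpleImputer(strategy="mean")
X_imputed = imputer.fit_transform(X)

print(scaler.mean_)
print(X_imputed)
```

Scaling leaves the missing entries as NaN, so an imputation step is still needed before passing the data to an estimator that does not accept NaNs.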

ColumnTransformer handles the case where different features or columns of a pandas.DataFrame need different preprocessing. String or pandas Categorical columns can now be encoded with OneHotEncoder or OrdinalEncoder.
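
The following is a small sketch of that idea, not a recommended pipeline. It assumes a toy object array with one string column and one numeric column, selected by integer position; with a pandas.DataFrame, column names could be used in place of the indices. Setting sparse_threshold=0 is a choice made here only to get a dense result that is easy to print.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy heterogeneous data (illustrative only): a string column and a numeric one.
X = np.array([["red", 1.0],
              ["blue", 2.0],
              ["red", 3.0]], dtype=object)

ct = ColumnTransformer(
    [
        ("color", OneHotEncoder(), [0]),   # one-hot encode the string column
        ("size", StandardScaler(), [1]),   # standardize the numeric column
    ],
    sparse_threshold=0,  # always return a dense array in this sketch
)
Xt = ct.fit_transform(X)

# Two one-hot columns (categories sorted as 'blue', 'red') plus one scaled column.
print(Xt.shape)
```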

TransformedTargetRegressor helps when the regression target needs to be transformed to be modeled. PowerTransformer and KBinsDiscretizer join QuantileTransformer as non-linear transformations.
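
A minimal sketch of the target-transformation case, assuming a synthetic target that is exactly exponential in the feature so that a linear model fits its logarithm. The data and the choice of np.log / np.exp as the transform pair are illustrative assumptions.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative only): y grows exponentially with x.
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = np.exp(0.5 * X.ravel())

# The regressor is trained on log(y); predictions are mapped back with exp.
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log,
    inverse_func=np.exp,
)
model.fit(X, y)
pred = model.predict(X)

# The inner linear model recovers the slope of log(y) against x.
print(model.regressor_.coef_)
```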

Beyond this, we have added sample_weight support to several estimators (including KMeans, BayesianRidge and KernelDensity) and improved stopping criteria in others (including MLPRegressor, GradientBoostingRegressor and SGDRegressor).
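
To illustrate the sample_weight support, here is a sketch using KMeans with a single cluster, so that the fitted centre is simply the (weighted) mean of the data. The points and weights are made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy one-dimensional data (illustrative only).
X = np.array([[0.0], [1.0], [9.0], [10.0]])
weights = np.array([3.0, 3.0, 1.0, 1.0])

# Without weights, the single centre is the plain mean of the points.
unweighted = KMeans(n_clusters=1, n_init=1, random_state=0).fit(X)

# With weights, the first two points pull the centre towards them.
weighted = KMeans(n_clusters=1, n_init=1, random_state=0).fit(
    X, sample_weight=weights
)

print(unweighted.cluster_centers_, weighted.cluster_centers_)
```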

This release is also the first to be accompanied by a Glossary of Common Terms and API Elements developed by Joel Nothman. The glossary is a reference resource to help users and contributors become familiar with the terminology and conventions used in Scikit-learn.

Changed models

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we cannot guarantee that this list is complete.)

Changelog

Support for Python 3.3 has been officially dropped.

New features

Classifiers and regressors

Preprocessing

Model evaluation

Decomposition, manifold learning and clustering

Metrics

Misc

Enhancements

Classifiers and regressors

Cluster

Datasets

Preprocessing

Model evaluation and meta-estimators

Decomposition and manifold learning

Metrics

Linear, kernelized and related models

Decomposition, manifold learning and clustering

  • Deprecate precomputed parameter in function manifold.t_sne.trustworthiness. Instead, the new parameter metric should be used with any compatible metric including ‘precomputed’, in which case the input matrix X should be a matrix of pairwise distances or squared distances. #9775 by William de Vazelhes.
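
    A sketch of the metric parameter, under stated assumptions: the data are random, the "embedding" is just the first two input columns, and the import path sklearn.manifold.trustworthiness is the one used by later releases rather than the manifold.t_sne module named above.

```python
import numpy as np
from sklearn.manifold import trustworthiness
from sklearn.metrics import pairwise_distances

# Random data and a stand-in embedding (illustrative only).
rng = np.random.RandomState(0)
X = rng.rand(30, 5)
X_embedded = X[:, :2]

# Option 1: pass the raw features and name the metric.
t_raw = trustworthiness(X, X_embedded, n_neighbors=5, metric="euclidean")

# Option 2: pass a matrix of pairwise distances with metric="precomputed".
D = pairwise_distances(X, metric="euclidean")
t_pre = trustworthiness(D, X_embedded, n_neighbors=5, metric="precomputed")

print(t_raw, t_pre)
```

    Both calls describe the same neighbourhoods in the input space, so the two scores should agree.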

Utils

Miscellaneous

Bug fixes

Classifiers and regressors

Decomposition, manifold learning and clustering

Metrics

Neighbors

Feature Extraction

Utils

  • utils.check_array yields a FutureWarning indicating that arrays of bytes/strings will be interpreted as decimal numbers beginning in version 0.22. #10229 by Ryan Lee.

Preprocessing

Feature selection

Model evaluation and meta-estimators

Datasets

API changes summary

Linear, kernelized and related models

Preprocessing

  • Deprecate n_values and categorical_features parameters and active_features_, feature_indices_ and n_values_ attributes of preprocessing.OneHotEncoder. The n_values parameter can be replaced with the new categories parameter, and the attributes with the new categories_ attribute. Selecting the categorical features with the categorical_features parameter is now better supported using compose.ColumnTransformer. #10521 by Joris Van den Bossche.
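
    As a sketch of the replacement API, the categories parameter can list the expected values of each feature explicitly, and the fitted categories_ attribute reports what the encoder uses. The category names below are invented for the example.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy single-feature data (illustrative only); "medium" never occurs.
X = np.array([["low"], ["high"]], dtype=object)

# Declare all expected categories up front, in the desired column order.
enc = OneHotEncoder(categories=[["low", "medium", "high"]])
encoded = enc.fit_transform(X).toarray()

print(enc.categories_)
print(encoded)
```

    Declaring the categories keeps a column for "medium" even though it is absent from the training data, which is roughly the role n_values played for integer features.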

Decomposition, manifold learning and clustering

  • Deprecate precomputed parameter in function manifold.t_sne.trustworthiness. Instead, the new parameter metric should be used with any compatible metric including ‘precomputed’, in which case the input matrix X should be a matrix of pairwise distances or squared distances. #9775 by William de Vazelhes.
  • Added method fit_predict to mixture.GaussianMixture and mixture.BayesianGaussianMixture, which is essentially equivalent to calling fit and predict. #10336 by Shu Haoran and Andrew Peng.
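
    A short sketch of fit_predict on mixture.GaussianMixture, assuming two well-separated synthetic blobs so that the component assignment is unambiguous. The data and the random_state value are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two well-separated synthetic blobs (illustrative only).
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(-5.0, 0.5, size=(20, 2)),
    rng.normal(5.0, 0.5, size=(20, 2)),
])

# One call: fit the mixture and return the component of each sample.
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)

# Two calls: fit first, then predict on the same data.
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels_two_step = gm.predict(X)

print(labels)
```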

Metrics

Cluster

Imputer

Outlier Detection models

Covariance

Misc

Preprocessing

Changes to estimator checks

These changes mostly affect library developers.