.. note::
    :class: sphx-glr-download-link-note

    Click :ref:`here <sphx_glr_download_auto_examples_plot_3_example_varying_sample_weights.py>` to download the full example code
.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_3_example_varying_sample_weights.py:


.. _example3:

Third Example: Injecting varying ``sample_weight`` vectors into a linear regression model for GridSearchCV
------------------------------------------------------------------------------------------------------------------------
This example shows how to inject a varying vector into a linear regression model as ``sample_weight``, so that several candidate weightings can be evaluated and the one that produces the best results can be selected.
Suppose we have a ``sample_weight`` vector and want to evaluate different powers of it. Setting up such an experiment raises the following issues:

- The graph is not a linear sequence of steps, so it cannot be expressed with Pipeline.
- More than two variables (typically ``X`` and ``y``) must be split consistently to perform the cross-validation with GridSearchCV; in this case ``X``, ``y`` and ``sample_weight``.
- The data provided to the ``sample_weight`` parameter of the LinearRegression step differs across the scenarios explored by GridSearchCV. In a GridSearchCV over a Pipeline, ``sample_weight`` cannot vary because it is treated as a ``fit_param`` rather than as a variable.
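
The last point can be illustrated with plain scikit-learn. The following is a minimal sketch and not part of the original example: the toy data and the ``cv=3`` setting are assumptions made for illustration. A weight vector can be forwarded to the final step as a fit parameter, but it does not appear among the names that ``param_grid`` is allowed to vary.

.. code-block:: python

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler

    # Toy data, assumed for illustration only.
    X = np.arange(1, 12, dtype=float).reshape(-1, 1)
    y = 3.0 * X.ravel() + 1.0
    sample_weight = np.linspace(0.1, 1.0, num=11)

    pipe = Pipeline([('scaler', MinMaxScaler()),
                     ('linear_model', LinearRegression())])

    search = GridSearchCV(pipe,
                          param_grid={'linear_model__fit_intercept': [True, False]},
                          cv=3)

    # The weights travel as a fit parameter: they are split along with X and y,
    # but the same vector is used for every candidate in the grid.
    search.fit(X, y, linear_model__sample_weight=sample_weight)

    # Names that param_grid may refer to; none of them is a sample_weight.
    tunable_names = sorted(pipe.get_params().keys())
    print(tunable_names)

Because no ``sample_weight`` entry appears in ``tunable_names``, the grid search has no handle for transforming the weights from one candidate to the next.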

Steps of the **PipeGraph**:

- **selector**: Features a :class:`ColumnSelector` custom step. This is not an original sklearn object but a custom class that splits an array into columns. In this case, the augmented ``X`` is divided column-wise as specified in a mapping dictionary. Beforehand, we build an augmented ``X`` in which all data except ``y`` is concatenated, and :class:`GridSearchCV` uses it to make the cross-validation splits. The **selector** step then de-concatenates this data.
- **custom_power**: Features a :class:`CustomPower` custom class. A simple transformation that raises the input data to the power specified in ``param_grid``.
- **scaler**: Implements the :class:`MinMaxScaler` class.
- **polynomial_features**: Contains a :class:`PolynomialFeatures` object.
- **linear_model**: Contains a :class:`LinearRegression` model.
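
To make the first two steps more concrete, here is a rough sketch of what they do to the data. The functions ``select_columns`` and ``raise_to_power`` are hypothetical stand-ins written for this illustration; they are not the :class:`ColumnSelector` and :class:`CustomPower` classes shipped with PipeGraph, whose implementation may differ.

.. code-block:: python

    import numpy as np


    def select_columns(data, mapping):
        """Hypothetical stand-in for the selector step: split columns by name."""
        return {name: data[:, columns] for name, columns in mapping.items()}


    def raise_to_power(values, power):
        """Hypothetical stand-in for the custom_power step."""
        return np.asarray(values) ** power


    # Three rows of an augmented array: first column X, second column sample_weight.
    augmented = np.array([[1.0, 0.01],
                          [2.0, 0.95],
                          [3.0, 0.10]])

    parts = select_columns(augmented, {'X': slice(0, 1),
                                       'sample_weight': slice(1, 2)})
    weights = raise_to_power(parts['sample_weight'], power=2)

    print(parts['X'].shape, weights.ravel())

Using slices rather than integer indices keeps each selected part two-dimensional, which is the shape that scikit-learn transformers expect for ``X``.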

.. figure:: https://raw.githubusercontent.com/mcasl/PipeGraph/master/examples/images/Diapositiva3.png

    Figure 1. PipeGraph diagram showing the steps and their connections


.. code-block:: python

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import GridSearchCV
    from pipegraph.base import PipeGraph, ColumnSelector, Reshape
    from pipegraph.demo_blocks import CustomPower
    import matplotlib.pyplot as plt


We create an augmented ``X`` in which all data except ``y`` is concatenated. In this case, we concatenate ``X`` and the ``sample_weight`` vector.


.. code-block:: python


    X = pd.DataFrame(dict(X=np.array([   1,    2,    3,    4,    5,    6,    7,    8,    9,   10,   11]),
              sample_weight=np.array([0.01, 0.95, 0.10, 0.95, 0.95, 0.10, 0.10, 0.95, 0.95, 0.95, 0.01])))
    y = np.array(                    [  10,    4,   20,   16,   25 , -60,   85,   64,   81,  100,  150])
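
Before running the search, it may help to see what raising these weights to a power does. The sketch below is an illustration added here, not part of the original script; it uses the weight values and the candidate powers of this example and expresses each powered weight relative to the largest one.

.. code-block:: python

    import numpy as np

    sample_weight = np.array([0.01, 0.95, 0.10, 0.95, 0.95, 0.10,
                              0.10, 0.95, 0.95, 0.95, 0.01])
    powers = [1, 5, 10, 20, 30]

    smallest_relative = {}
    for power in powers:
        powered = sample_weight ** power
        # Rescale so that the most trusted samples have relative weight 1.
        relative = powered / powered.max()
        smallest_relative[power] = relative.min()
        print(power, relative.min())

The relative weight of the least trusted samples shrinks quickly as the power grows, so higher powers make the fit depend almost exclusively on the samples weighted 0.95.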


Next we define the steps and connect them in a :class:`PipeGraph`, which is later used as the estimator for :class:`GridSearchCV`.


.. code-block:: python


    scaler = MinMaxScaler()
    polynomial_features = PolynomialFeatures()
    linear_model = LinearRegression()
    custom_power = CustomPower()
    selector = ColumnSelector(mapping={'X': slice(0, 1),
                                       'sample_weight': slice(1,2)})

    steps = [('selector', selector),
             ('custom_power', custom_power),
             ('scaler', scaler),
             ('polynomial_features', polynomial_features),
             ('linear_model', linear_model)]

    pgraph = PipeGraph(steps=steps)

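    # Each inject() call connects an output of a source step (or an external
    # input) to an input variable of a sink step.
    (pgraph.inject(sink='selector', sink_var='X', source='_External', source_var='X')
           # The sample_weight column is powered; the X column is scaled and expanded.
           .inject('custom_power', 'X', 'selector', 'sample_weight')
           .inject('scaler', 'X', 'selector', 'X')
           .inject('polynomial_features', 'X', 'scaler')
           .inject('linear_model', 'X',  'polynomial_features')
           .inject('linear_model', 'y', source_var='y')
           # The powered weights reach the model as its sample_weight.
           .inject('linear_model', 'sample_weight', 'custom_power'))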
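
The connections declared above form a directed acyclic graph rather than a chain. As a rough illustration, the sketch below lists the predecessors of each step and derives one valid execution order with the standard library. It is not a description of how PipeGraph schedules its steps internally, and the external inputs are left out for brevity.

.. code-block:: python

    from graphlib import TopologicalSorter

    # Each step maps to the set of steps whose outputs it consumes.
    predecessors = {
        'selector': set(),
        'custom_power': {'selector'},
        'scaler': {'selector'},
        'polynomial_features': {'scaler'},
        'linear_model': {'polynomial_features', 'custom_power'},
    }

    order = list(TopologicalSorter(predecessors).static_order())
    print(order)

Whatever order is produced, ``selector`` comes first and ``linear_model`` last, since the model needs both the transformed features and the powered weights.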


Then we define ``param_grid`` as expected by :class:`GridSearchCV`, exploring a few combinations of parameter values.


.. code-block:: python

    param_grid = {'polynomial_features__degree': range(1, 3),
                  'linear_model__fit_intercept': [True, False],
                  'custom_power__power': [1, 5, 10, 20, 30]}


    grid_search_regressor = GridSearchCV(estimator=pgraph, param_grid=param_grid, refit=True)
    grid_search_regressor.fit(X, y)
    y_pred = grid_search_regressor.predict(X)

    plt.scatter(X.loc[:,'X'], y)
    plt.scatter(X.loc[:,'X'], y_pred)
    plt.show()

    power = grid_search_regressor.best_estimator_.get_params()['custom_power']
    print('Power that obtains the best results in the linear model: \n {}'.format(power))


.. image:: /auto_examples/images/sphx_glr_plot_3_example_varying_sample_weights_001.png
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Power that obtains the best results in the linear model: 
     CustomPower(power=20)
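
To gauge the size of the search, the grid used in this example can be enumerated by hand. This is a small sketch built on the standard library only; :class:`GridSearchCV` does the equivalent enumeration itself.

.. code-block:: python

    from itertools import product

    param_grid = {'polynomial_features__degree': range(1, 3),
                  'linear_model__fit_intercept': [True, False],
                  'custom_power__power': [1, 5, 10, 20, 30]}

    names = list(param_grid)
    # One dictionary per combination of candidate values.
    candidates = [dict(zip(names, values))
                  for values in product(*param_grid.values())]

    print(len(candidates))

The grid contains 2 x 2 x 5 = 20 candidates, and each of them is fitted once per cross-validation fold before the best one is refitted on the full data.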


This example showed a nonlinear workflow successfully implemented by **PipeGraph**, and at the same time a way to circumvent current limitations of the standard :class:`GridSearchCV`, in particular the restriction on the number of input parameters.
The :ref:`next examples <example4>` present more elaborate cases in increasing order of complexity.


**Total running time of the script:** ( 0 minutes  0.519 seconds)


.. _sphx_glr_download_auto_examples_plot_3_example_varying_sample_weights.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download

     :download:`Download Python source code: plot_3_example_varying_sample_weights.py <plot_3_example_varying_sample_weights.py>`


  .. container:: sphx-glr-download

     :download:`Download Jupyter notebook: plot_3_example_varying_sample_weights.ipynb <plot_3_example_varying_sample_weights.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.readthedocs.io>`_