
Regression models

Setup

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import r2_score, mean_squared_error
import xgboost as xgb
from sklearn.neighbors import KNeighborsRegressor
import logger  # project-local module providing the App_Logger class


class RegressionModelTuner():

This class is used to find the best-suited regression model for the data.

def __init__(self):
        self.file_object = open('RegressionLogs.txt', 'a+')
        self.logger_object = logger.App_Logger()
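The `App_Logger` class comes from the project's own `logger.py` module, which isn't shown here. A minimal stand-in, assuming only the `log(file_object, message)` interface used later in this document, might look like:

```python
import io
from datetime import datetime

# Hypothetical stand-in for the project's App_Logger; only the
# log(file_object, message) interface seen in this document is assumed.
class App_Logger:
    def log(self, file_object, log_message):
        now = datetime.now()
        file_object.write(
            now.strftime("%Y-%m-%d") + "/" + now.strftime("%H:%M:%S")
            + "\t\t" + log_message + "\n")

# Usage: any writable file-like object works, e.g. an in-memory buffer.
buf = io.StringIO()
App_Logger().log(buf, "KNN tuning started")
```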

Method Name : get_tuned_knn_model

def get_tuned_knn_model(self, x_train, y_train):

Description : This method will be used to get a hyperparameter-tuned KNN model

x_train : Feature Columns of Training DataSet

y_train : Target Column of Training DataSet

Output : A hyperparameter-tuned model object

parameters

Let's set up a parameter grid to explore during the search. Note that you can use fewer parameters and fewer options per parameter, or more of both if you want to be very thorough. You can also plug in any other ML method in place of KNN and search for its optimal parameters.

knn_parameters = {'n_neighbors': [50, 100, 200, 250, 300, 350],
                  'weights': ['uniform', 'distance'],
                  'algorithm': ['ball_tree', 'kd_tree'],
                  'leaf_size': [20, 25, 30, 35, 40, 45, 50],
                  }
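The body of the search itself isn't shown here. A minimal sketch of how this grid could be fed to `RandomizedSearchCV`, assuming a small synthetic dataset (with `n_neighbors` values scaled down so the grid is valid for 200 samples):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical small dataset; the original grid assumes a much larger one.
x_train, y_train = make_regression(n_samples=200, n_features=5,
                                   noise=0.1, random_state=0)

knn_parameters = {'n_neighbors': [3, 5, 10, 20],
                  'weights': ['uniform', 'distance'],
                  'algorithm': ['ball_tree', 'kd_tree'],
                  'leaf_size': [20, 30, 40, 50]}

# Sample 5 random combinations from the grid and cross-validate each.
search = RandomizedSearchCV(KNeighborsRegressor(),
                            param_distributions=knn_parameters,
                            n_iter=5, cv=3, random_state=0, n_jobs=-1)
search.fit(x_train, y_train)

# Refit a fresh model with the best parameters found.
knn = KNeighborsRegressor(**search.best_params_).fit(x_train, y_train)
```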

Method Name: get_tuned_random_forest_regressor

Description: This method will be used to build a RandomForestRegressor model

Input Description:

x_train : Feature Columns of Training DataSet

y_train : Target Column of Training DataSet

Let's try hyperparameter tuning on the all-features data. This first section sets up the grid, imports the necessary modules, and fits x_train and y_train.

# n_estimators, max_depth, etc. come from the randomized search's best_params_
self.model = RandomForestRegressor(n_estimators=n_estimators,
                                   max_depth=max_depth,
                                   criterion=criterion,
                                   min_samples_leaf=min_samples_leaf,
                                   max_features=max_features,
                                   min_samples_split=min_samples_split,
                                   bootstrap=bootstrap,
                                   random_state=25,
                                   n_jobs=-1)

self.model = RandomForestRegressor(n_jobs=-1)
self.model.fit(x_train, y_train)
self.logger_object.log(self.file_object, "RandomForest Model Training Started.")

Method Name: get_tuned_xgboost_model

Description: This method will be used to build an XGBoost Regressor model

Input Description:

x_train : Feature Columns of Training DataSet

y_train : Target Column of Training DataSet

Parameters

self.xg_parameters = {"n_estimators": [10, 50, 100, 200],
                      "learning_rate": [0.05, 0.10, 0.15, 0.20, 0.25, 0.30],
                      "max_depth": [3, 4, 5, 6, 8, 10, 12, 15, 20],
                      "min_child_weight": [1, 3, 5, 7],
                      "gamma": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
                      "colsample_bytree": [0.3, 0.4, 0.5, 0.7]
                      }

self.rmdsearch = RandomizedSearchCV(xgb.XGBRegressor(objective='reg:squarederror'),
                                    param_distributions=self.xg_parameters,
                                    n_iter=10, cv=10, n_jobs=-1)
self.rmdsearch.fit(x_train, y_train)
hyperparameters = self.rmdsearch.best_params_
n_estimators = hyperparameters['n_estimators']
min_child_weight = hyperparameters['min_child_weight']
max_depth = hyperparameters['max_depth']
learning_rate = hyperparameters['learning_rate']
gamma = hyperparameters['gamma']
colsample_bytree = hyperparameters['colsample_bytree']
self.xgboost_model = xgb.XGBRegressor(objective='reg:squarederror',
                                      n_estimators=n_estimators,
                                      learning_rate=learning_rate,
                                      gamma=gamma,
                                      min_child_weight=min_child_weight,
                                      max_depth=max_depth,
                                      colsample_bytree=colsample_bytree)

Fitting x_train and y_train

self.xgboost_model = xgb.XGBRegressor(objective='reg:squarederror', n_jobs=-1)
self.xgboost_model.fit(x_train, y_train)
self.logger_object.log(self.file_object, "XGBoost Model Training Started.")
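The metrics imported in Setup (`r2_score`, `mean_squared_error`) aren't used above. A minimal evaluation sketch on a held-out split, using a synthetic dataset and a RandomForestRegressor as a stand-in for any of the tuned models:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical dataset standing in for the project's feature/target data.
X, y = make_regression(n_samples=300, n_features=6, noise=0.1, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

model = RandomForestRegressor(n_jobs=-1, random_state=0).fit(x_train, y_train)
preds = model.predict(x_test)

r2 = r2_score(y_test, preds)
rmse = np.sqrt(mean_squared_error(y_test, preds))  # RMSE via sqrt of MSE
```

Comparing R² and RMSE across the KNN, random forest, and XGBoost models on the same test split is one straightforward way to pick the best-suited regressor.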