Regression models¶
Setup¶
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.neighbors import KNeighborsRegressor
import xgboost as xgb

import logger  # custom application logging module providing App_Logger
class RegressionModelTuner:¶
This class is used to find the best-suited regression model for the data.
def __init__(self):
self.file_object = open('RegressionLogs.txt', 'a+')  # append-mode log file for regression runs
self.logger_object = logger.App_Logger()
Method Name : get_tuned_knn_model¶
def get_tuned_knn_model(self, x_train, y_train):
Description : This method is used to get a hyperparameter-tuned KNN model.
x_train : Feature Columns of Training DataSet
y_train : Target Column of Training DataSet
Output : A hyperparameter-tuned model object
Parameters¶
Let's set up a parameter grid to be explored during the search. Note that you can use fewer parameters and fewer options for each parameter, or more of both if you want to be very thorough. You can also plug in any other ML method in place of KNN and search for its optimal parameters. The search step itself is sketched after the grid below.
knn_parameters = {'n_neighbors': [50, 100, 200, 250, 300, 350],
'weights': ['uniform', 'distance'],
'algorithm': ['ball_tree', 'kd_tree'],
'leaf_size': [20, 25, 30, 35, 40, 45, 50],
}
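The original stops at the grid, so the search and final fit are not shown. Here is a minimal sketch of that step, assuming the same RandomizedSearchCV pattern used for XGBoost later in this section; the n_iter and cv values are illustrative assumptions.

# Sketch of the missing search step; n_iter and cv are assumptions,
# mirroring the RandomizedSearchCV call used for XGBoost below.
self.rmdsearch = RandomizedSearchCV(KNeighborsRegressor(),
                                    param_distributions=knn_parameters,
                                    n_iter=10, cv=10, n_jobs=-1)
self.rmdsearch.fit(x_train, y_train)
hyperparameters = self.rmdsearch.best_params_

# Rebuild the model with the best parameters found and fit it.
self.knn_model = KNeighborsRegressor(n_neighbors=hyperparameters['n_neighbors'],
                                     weights=hyperparameters['weights'],
                                     algorithm=hyperparameters['algorithm'],
                                     leaf_size=hyperparameters['leaf_size'],
                                     n_jobs=-1)
self.knn_model.fit(x_train, y_train)
return self.knn_model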
Method Name: get_tuned_random_forest_regressor¶
Description: This method is used to build a hyperparameter-tuned RandomForestRegressor model.
Input Description:
x_train : Feature Columns of Training DataSet
y_train : Target Column of Training DataSet
Let's try hyperparameter tuning on the all-features data. This section sets up the parameter grid, runs the randomized search, and fits the tuned model on x_train and y_train; a sketch of the grid and search follows.
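The grid and search that produce n_estimators, max_depth, and the other values used in the constructor below are not shown in the original. Here is a minimal sketch, assuming a RandomizedSearchCV pattern like the XGBoost method's; all value ranges are illustrative assumptions.

# Sketch of the missing grid/search; all value ranges are assumptions.
rf_parameters = {'n_estimators': [10, 50, 100, 200],
                 'max_depth': [5, 10, 15, 20, None],
                 # older sklearn versions name these criteria 'mse'/'mae'
                 'criterion': ['squared_error', 'absolute_error'],
                 'min_samples_leaf': [1, 2, 4],
                 'max_features': ['sqrt', 'log2', None],
                 'min_samples_split': [2, 5, 10],
                 'bootstrap': [True, False]}
self.rmdsearch = RandomizedSearchCV(RandomForestRegressor(),
                                    param_distributions=rf_parameters,
                                    n_iter=10, cv=10, n_jobs=-1)
self.rmdsearch.fit(x_train, y_train)
hyperparameters = self.rmdsearch.best_params_
n_estimators = hyperparameters['n_estimators']
max_depth = hyperparameters['max_depth']
criterion = hyperparameters['criterion']
min_samples_leaf = hyperparameters['min_samples_leaf']
max_features = hyperparameters['max_features']
min_samples_split = hyperparameters['min_samples_split']
bootstrap = hyperparameters['bootstrap']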
self.model = RandomForestRegressor(n_estimators=n_estimators,
max_depth=max_depth,
criterion=criterion,
min_samples_leaf=min_samples_leaf,
max_features=max_features,
min_samples_split=min_samples_split,
bootstrap=bootstrap,
random_state=25,
n_jobs=-1)
self.model.fit(x_train, y_train)  # fit the tuned model on the training data
self.logger_object.log(self.file_object, "Random Forest Model Training Completed.")
return self.model
Method Name: get_tuned_xgboost_model¶
Description: This method is used to build a hyperparameter-tuned XGBoost regressor model.
Input Description:
x_train : Feature Columns of Training DataSet
y_train : Target Column of Training DataSet
Parameters¶
self.xg_parameters = {"n_estimators": [10, 50, 100, 200],
"learning_rate": [0.05, 0.10, 0.15, 0.20, 0.25, 0.30],
"max_depth": [3, 4, 5, 6, 8, 10, 12, 15, 20],
"min_child_weight": [1, 3, 5, 7],
"gamma": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
"colsample_bytree": [0.3, 0.4, 0.5, 0.7]
}
self.rmdsearch = RandomizedSearchCV(xgb.XGBRegressor(objective='reg:squarederror'),
                                    param_distributions=self.xg_parameters,
                                    n_iter=10, cv=10, n_jobs=-1)
self.rmdsearch.fit(x_train, y_train)
hyperparameters = self.rmdsearch.best_params_
n_estimators = hyperparameters['n_estimators']
min_child_weight = hyperparameters['min_child_weight']
max_depth = hyperparameters['max_depth']
learning_rate = hyperparameters['learning_rate']
gamma = hyperparameters['gamma']
colsample_bytree = hyperparameters['colsample_bytree']
self.xgboost_model = xgb.XGBRegressor(objective='reg:squarederror',
                                      n_estimators=n_estimators,
                                      learning_rate=learning_rate,
                                      gamma=gamma,
                                      min_child_weight=min_child_weight,
                                      max_depth=max_depth,
                                      colsample_bytree=colsample_bytree,
                                      n_jobs=-1)
Fitting x_train and y_train¶
self.logger_object.log(self.file_object, "XGBoost Model Training Started.")
self.xgboost_model.fit(x_train, y_train)  # fit the tuned model on the training data
return self.xgboost_model
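The setup cell imports train_test_split, r2_score, and mean_squared_error, but none of them are used above. Here is a hypothetical usage sketch showing how the tuned models might be evaluated on a hold-out split; X and y are assumed feature/target arrays, and it assumes each get_tuned_* method returns its fitted model as the output descriptions state.

# Hypothetical usage: X and y are assumed to hold the features and target.
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=25)

tuner = RegressionModelTuner()
model = tuner.get_tuned_xgboost_model(x_train, y_train)

# Score the tuned model on the hold-out set.
predictions = model.predict(x_test)
print('R2 score:', r2_score(y_test, predictions))
print('MSE:', mean_squared_error(y_test, predictions))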