How To Predict Correctly In Sklearn RandomForestRegressor?

I'm working on a big data project for school. My dataset looks like this: https://github.com/gindeleo/climate/blob/master/GlobalTemperatures.csv I'm trying to predict th

Solution 1:

It's not enough to use only the year to predict temperature. You need to use the month too. Here is a working example for starters:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Load only the date and land temperature columns, parsing 'dt' as a datetime
df = pd.read_csv('https://raw.githubusercontent.com/gindeleo/climate/master/GlobalTemperatures.csv', usecols=['dt', 'LandAverageTemperature'], parse_dates=['dt'])
df = df.dropna()  # drop rows with missing values

# Derive numeric features from the date
df["year"] = df['dt'].dt.year
df["month"] = df['dt'].dt.month
X = df[["month", "year"]]
y = df["LandAverageTemperature"]

rf_reg = RandomForestRegressor(n_estimators=10, random_state=0)
rf_reg.fit(X, y)

# Note: these predictions are made on the same rows the model was trained on
y_pred = rf_reg.predict(X)
df_result = pd.DataFrame({'year': X['year'], 'month': X['month'], 'true': y, 'pred': y_pred})
print('True values and predictions')
print(df_result)
print('Feature importances', list(zip(X.columns, rf_reg.feature_importances_)))

And here is the output:

True values and predictions
      year  month    true     pred
0     1750      1   3.034   2.2944
1     1750      2   3.083   2.4222
2     1750      3   5.626   5.6434
3     1750      4   8.490   8.3419
4     1750      5  11.573  11.7569
...    ...    ...     ...      ...
3187  2015      8  14.755  14.8004
3188  2015      9  12.999  13.0392
3189  2015     10  10.801  10.7068
3190  2015     11   7.433   7.1173
3191  2015     12   5.518   5.1634

[3180 rows x 4 columns]
Feature importances [('month', 0.9543059863177156), ('year', 0.045694013682284394)]
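
The example above scores the model on the same rows it was fitted on, so the close match between true and pred says little about unseen dates. As a rough sketch, not part of the original answer, one way to check this is to hold out the most recent years and compare a year-only model against a month-and-year model. The data here is synthetic (a made-up seasonal cycle plus a small trend and noise) so that the snippet runs offline. The helper name holdout_mae and the year-2000 cutoff are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for the real CSV (an assumption, so the sketch runs offline):
# a seasonal cycle plus a small warming trend and noise.
rng = np.random.default_rng(0)
years = np.repeat(np.arange(1900, 2016), 12)
months = np.tile(np.arange(1, 13), 2016 - 1900)
temps = (
    8.5
    - 6.0 * np.cos(2 * np.pi * (months - 1) / 12)  # coldest in January, warmest in July
    + 0.01 * (years - 1900)                         # slow upward trend
    + rng.normal(0.0, 0.3, size=years.size)         # random noise
)
df = pd.DataFrame({"year": years, "month": months, "LandAverageTemperature": temps})

# Hold out the most recent years instead of scoring on the training rows.
train = df[df["year"] < 2000]
test = df[df["year"] >= 2000]


def holdout_mae(features):
    """Fit on the training years; return mean absolute error on the held-out years."""
    model = RandomForestRegressor(n_estimators=10, random_state=0)
    model.fit(train[features], train["LandAverageTemperature"])
    pred = model.predict(test[features])
    return mean_absolute_error(test["LandAverageTemperature"], pred)


mae_year_only = holdout_mae(["year"])
mae_month_year = holdout_mae(["month", "year"])
print("MAE, year only:     ", round(mae_year_only, 3))
print("MAE, month and year:", round(mae_month_year, 3))
```

On this synthetic data the year-only model should show a much larger error, because it cannot see the seasonal cycle. Keep in mind that a random forest predicts by averaging target values seen in training, so it cannot follow a trend beyond the last training year. Forecasts for future years will tend to repeat the most recent seasonal pattern.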
