Sklearn Prediction On Test Dataset With Different Shape From Training Dataset Shape

June 27, 2023

I'm new to ML and would be grateful for any assistance provided. I've run a linear regression prediction using test set A and training set A. I saved the linear regression model an

Solution 1:

The shapes are inconsistent, which is why the error is being thrown. Have you tried reshaping the data so that both have the same shape? From a quick look, it seems that testB has more samples and one more feature than testA.

Think about it: if you trained your model with 5 features, you cannot then ask the same model to make a prediction given 6 features. Since you are using a linear regressor, the equation is roughly:

y = b + w0*x0 + w1*x1 + w2*x2 + ... + w(N-1)*x(N-1)

where
    y    is your output/label
    N    is the number of features
    b    is the bias term
    w(i) is the i-th weight
    x(i) is the i-th feature value
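To make the formula concrete, here is a minimal sketch on synthetic data (the data, seed, and weights are assumptions, not taken from the question). It shows that scikit-learn's LinearRegression stores b as intercept_ and the weights w(i) as coef_, and that predict computes b plus the sum of w(i)*x(i). Because coef_ holds exactly one weight per training feature, the number of features is fixed once the model is fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): 100 samples, 5 features, known weights and bias.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([1.5, -2.0, 0.5, 3.0, 0.0])
y = 4.0 + X @ true_w + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)

# b is intercept_, w0..w(N-1) are coef_; predict() is b + sum(w(i) * x(i)).
manual = model.intercept_ + X @ model.coef_

print(model.coef_.shape)                     # (5,)  one weight per training feature
print(np.allclose(manual, model.predict(X)))  # True
```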

You have trained a linear regressor with 5 features, effectively producing the following:

y (your output/label) = b + w0*x0 + w1*x1 + w2*x2 + w3*x3 + w4*x4

You then ask it to make a prediction given 6 features, but it only knows how to handle 5.
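A quick way to reproduce this, sketched with random arrays standing in for your data (an assumption): fit on 5 columns, then predict on 6. The n_features_in_ attribute (available in scikit-learn 0.24 and later) records how many features the model was trained on, and predict raises a ValueError when the input has a different number of columns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 5))   # trained with 5 features
y_train = rng.normal(size=50)
model = LinearRegression().fit(X_train, y_train)

X_new = rng.normal(size=(10, 6))     # asked to predict with 6 features
print(model.n_features_in_)          # 5

caught = None
try:
    model.predict(X_new)
except ValueError as err:
    # The exact message depends on the scikit-learn version.
    caught = err
print(caught)
```

Checking X.shape[1] against model.n_features_in_ before calling predict makes the mismatch obvious early.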

Aside from that issue, you also have too many samples: testB has 2480 and testA has 1315. These need to match, because the model makes 2480 predictions but you only give it 1315 true outputs to compare them against. How can you get a score for the 1165 samples that have no label? Do you now see why the data has to be reshaped?
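The same kind of check applies to the sample count. In this sketch (random arrays with the question's sizes; the values themselves are made up), predict returns 2480 predictions without complaint, but score raises a ValueError because it has to pair each prediction with a true label and only 1315 labels exist.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
model = LinearRegression().fit(rng.normal(size=(100, 5)), rng.normal(size=100))

X_test = rng.normal(size=(2480, 5))  # 2480 samples -> 2480 predictions
y_test = rng.normal(size=1315)       # but only 1315 true labels

predictions = model.predict(X_test)
print(predictions.shape)             # (2480,)

error = None
try:
    # R^2 compares predictions to labels pairwise, so the lengths must match.
    model.score(X_test, y_test)
except ValueError as err:
    error = err
print(type(error).__name__)          # ValueError
```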

EDIT

Assuming your datasets now have an equal number of features, as discussed above, you can reshape testB by removing rows, like so:

testB = testB[0:1315, :]
testB.shape
(1315, 5)
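Slice stop indices in Python are exclusive, so testB[0:1314, :] keeps only 1314 rows, not 1315. A sketch with placeholder arrays of the question's sizes (assumed to have 5 columns each) shows the difference, and how deriving the stop from len(testA) avoids hard-coding the count:

```python
import numpy as np

# Placeholder arrays with the question's shapes (assumption: 5 features each).
testA = np.zeros((1315, 5))
testB = np.arange(2480 * 5).reshape(2480, 5)

# The stop index is exclusive, so [0:1314] keeps rows 0..1313 only.
print(testB[0:1314, :].shape)        # (1314, 5)

# Using len(testA) as the stop keeps exactly as many rows as testA has.
trimmed = testB[:len(testA), :]
print(trimmed.shape)                 # (1315, 5)
```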

Or, if you prefer a solution using the NumPy API:

import numpy as np
testB = np.delete(testB, np.s_[0:(len(testB)-len(testA))], axis=0)
testB.shape
(1315, 5)
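Note that the two approaches do not keep the same rows. np.delete with np.s_[0:n] removes the first n rows, so it keeps the last 1315 rows of testB, while the slice keeps the first 1315. A sketch with placeholder arrays (assumed shapes, made-up values):

```python
import numpy as np

# Placeholder arrays with the question's shapes (assumption).
testA = np.zeros((1315, 5))
testB = np.arange(2480 * 5).reshape(2480, 5)

n_extra = len(testB) - len(testA)    # 1165 rows to drop
deleted = np.delete(testB, np.s_[0:n_extra], axis=0)

print(deleted.shape)                                  # (1315, 5)
# np.delete dropped the first 1165 rows, so the last 1315 remain...
print(np.array_equal(deleted, testB[-len(testA):]))   # True
# ...which differs from slicing, which keeps the first 1315.
print(np.array_equal(deleted, testB[:len(testA)]))    # False
```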

Keep in mind that doing this discards samples. If those samples matter to you (and they can), it may be better to add a pre-processing step that handles the missing values, namely imputing them. It is also worth noting that the data you are trimming should be shuffled first (unless it already is), as otherwise you may be removing parts of the data the model should be learning about. Neglecting to do this could result in a model that does not generalise as well as you hoped.
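A minimal sketch of both suggestions, using placeholder arrays (assumed shapes and values): shuffle the rows with a random permutation before truncating, so the kept samples are a random subset rather than whatever came first in the file, and, for data that actually contains missing entries (NaN), fill them with scikit-learn's SimpleImputer instead of dropping rows. The mean strategy is only one choice; "median" or "most_frequent" may suit your data better.

```python
import numpy as np
from sklearn.impute import SimpleImputer

rng = np.random.default_rng(42)

# Placeholder arrays with the question's shapes (assumption).
testA = np.zeros((1315, 5))
testB = np.arange(2480 * 5, dtype=float).reshape(2480, 5)

# Shuffle rows first, then truncate, so the kept rows are a random subset.
shuffled = testB[rng.permutation(len(testB))]
subset = shuffled[:len(testA)]
print(subset.shape)  # (1315, 5)

# Imputation: replace each NaN with the mean of its column.
with_gaps = np.array([[1.0, np.nan],
                      [3.0, 4.0],
                      [np.nan, 8.0]])
filled = SimpleImputer(strategy="mean").fit_transform(with_gaps)
print(filled)  # column means are 2.0 and 6.0
```

In a real workflow the imputer would be fit on the training data and then applied to the test data with transform, so that test statistics do not leak into preprocessing.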
