Import the required modules:
Read the data sets:
(Adjust the paths to wherever you stored the data sets.)
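As a minimal sketch of this step, the two CSV files can be read with pandas. The file paths and the helper name `load_data` below are hypothetical placeholders, not taken from the original post; substitute your own locations.

```python
import pandas as pd

# Hypothetical paths: replace with the location of your own copies.
TRAIN_PATH = "data/train.csv"
TEST_PATH = "data/test.csv"


def load_data(train_path=TRAIN_PATH, test_path=TEST_PATH):
    """Read the training and test CSV files into two DataFrames."""
    train_data = pd.read_csv(train_path)
    test_data = pd.read_csv(test_path)
    return train_data, test_data
```

Calling `train_data, test_data = load_data()` and then inspecting `train_data.shape` and `test_data.shape` is a quick way to confirm that both files loaded.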
Basic information about the data set:
The first feature is Id. It carries no useful information for machine learning, so we do not use it as a feature.
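One way to drop the Id column is sketched below. It assumes that Id is the first column of both files and that the label is the last column of the training file; the helper name `combine_features` is made up for illustration. Training and test features are stacked so that later preprocessing sees both.

```python
import pandas as pd


def combine_features(train_data: pd.DataFrame, test_data: pd.DataFrame) -> pd.DataFrame:
    """Stack training and test features, leaving out Id (and the label)."""
    # Assumption: column 0 is Id in both frames, and the last column of the
    # training frame is the label.
    train_features = train_data.iloc[:, 1:-1]
    test_features = test_data.iloc[:, 1:]
    return pd.concat((train_features, test_features), ignore_index=True)
```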
Preprocess the data:
After standardization, the mean of each numerical feature becomes 0, so missing values can be replaced directly with 0.
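The standardize-then-fill idea can be sketched with pandas as follows. The function name `standardize_numeric` is hypothetical, and the sketch uses the pandas default sample standard deviation.

```python
import pandas as pd


def standardize_numeric(all_features: pd.DataFrame) -> pd.DataFrame:
    """Rescale numeric columns to zero mean and unit variance, then fill NaN with 0."""
    out = all_features.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    # mean() and std() skip missing values by default.
    mean = out[numeric_cols].mean()
    std = out[numeric_cols].std()
    out[numeric_cols] = (out[numeric_cols] - mean) / std
    # Each column mean is now 0, so filling with 0 is mean imputation.
    out[numeric_cols] = out[numeric_cols].fillna(0)
    return out
```

Filling with 0 after standardization has the same effect as filling with the column mean before it, which is why no separate imputation step is needed.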
Use one-hot encoding to decompose each discrete feature into multiple indicator features, each taking the value 0 or 1. This transformation increases the number of features from 79 to 331.
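A minimal sketch of the encoding step with `pandas.get_dummies` follows. Treating a missing value as its own category (`dummy_na=True`) and emitting floats are choices made for this sketch; the original post may have configured this differently.

```python
import pandas as pd


def one_hot_encode(all_features: pd.DataFrame) -> pd.DataFrame:
    """Expand each discrete (string or categorical) column into 0/1 indicator columns."""
    # dummy_na=True adds an extra indicator per encoded column for missing values.
    # Numeric columns pass through unchanged.
    return pd.get_dummies(all_features, dummy_na=True, dtype=float)
```

For example, a column `zone` with values `RL` and `RM` becomes indicator columns such as `zone_RL` and `zone_RM`, plus one indicator for missing entries.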
Define the functions required by the model:
Given predicted values $\hat y_1, \ldots, \hat y_n$ and the corresponding true labels $y_1, \ldots, y_n$, the root-mean-square error of the logarithms is defined as
$$\sqrt{\frac{1}{n}\sum_{i=1}^n\left(\log(y_i)-\log(\hat y_i)\right)^2}.$$
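The metric can be computed directly with NumPy, as in the sketch below. Clipping predictions to a minimum of 1 before taking the logarithm is an assumption added here to keep the value finite for non-positive predictions; it is not part of the formula itself.

```python
import numpy as np


def log_rmse(y_true, y_pred):
    """Root-mean-square error between log(y_true) and log(y_pred)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Assumption: clip predictions to at least 1 so the logarithm is defined.
    y_pred = np.clip(y_pred, 1.0, None)
    return float(np.sqrt(np.mean((np.log(y_true) - np.log(y_pred)) ** 2)))
```

Because the error is measured on logarithms, it penalizes relative rather than absolute differences: predicting 110,000 for a 100,000 house costs the same as predicting 1,100,000 for a 1,000,000 house.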
Implement K-fold cross-validation:
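A framework-independent sketch of K-fold cross-validation is shown below. The names `get_k_fold_data`, `k_fold`, and `fit_and_score` are illustrative; `fit_and_score` stands in for whatever routine trains a model on the training part and returns its validation error. This version drops the leftover rows when the number of samples is not divisible by k.

```python
import numpy as np


def get_k_fold_data(k, i, X, y):
    """Return (X_train, y_train, X_valid, y_valid) with fold i held out."""
    if k <= 1 or not 0 <= i < k:
        raise ValueError("need k > 1 and 0 <= i < k")
    fold_size = X.shape[0] // k
    start, stop = i * fold_size, (i + 1) * fold_size
    valid_idx = np.arange(start, stop)
    # Rows beyond k * fold_size (the remainder) are not used.
    train_idx = np.concatenate([np.arange(0, start), np.arange(stop, k * fold_size)])
    return X[train_idx], y[train_idx], X[valid_idx], y[valid_idx]


def k_fold(k, X, y, fit_and_score):
    """Average the validation score over the k folds."""
    # fit_and_score(X_train, y_train, X_valid, y_valid) -> float is user supplied.
    scores = [fit_and_score(*get_k_fold_data(k, i, X, y)) for i in range(k)]
    return sum(scores) / k
```

Each sample in the first `k * fold_size` rows is used for validation exactly once and for training `k - 1` times.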
Train the model:
Running the code above produces a submission.csv file. The file conforms to Kaggle's submission format, so you can submit it directly on the Kaggle competition page.
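Writing the file can be sketched as follows. The column name `SalePrice` and the helper name `write_submission` are assumptions made for illustration; check the competition's sample submission for the exact header it expects.

```python
import numpy as np
import pandas as pd


def write_submission(test_ids, predictions, path="submission.csv"):
    """Save one (Id, prediction) row per test sample as a CSV file."""
    submission = pd.DataFrame({
        "Id": np.asarray(test_ids),
        # Assumption: the target column is named SalePrice.
        "SalePrice": np.asarray(predictions, dtype=float).reshape(-1),
    })
    # index=False keeps the pandas row index out of the file.
    submission.to_csv(path, index=False)
    return submission
```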