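First, take the sample data that remains after feature selection and split it into training and test samples.
library(caret)  # provides createDataPartition, trainControl and train
newdata4 = newdata3[, Profile$optVariables]
inTrain = createDataPartition(mdrrClass, p = 3/4, list = FALSE)
trainx = newdata4[inTrain, ]
testx = newdata4[-inTrain, ]
trainy = mdrrClass[inTrain]
testy = mdrrClass[-inTrain]
Next, define the training control parameters: method sets the resampling scheme used for cross-validation, number sets the number of folds, and repeats sets how many times the whole cross-validation is repeated.
fitControl = trainControl(method = "repeatedcv", number = 10, repeats = 3, returnResamp = "all")
Then set the range of tuning parameters to search. This example uses the gbm algorithm, which has the following three parameters: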
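# Note: newer caret releases appear to expect these column names without the
# leading dot, plus an n.minobsinnode column for gbm.
gbmGrid = expand.grid(.interaction.depth = c(1, 3), .n.trees = c(50, 100, 150, 200, 250, 300), .shrinkage = 0.1)
Now train the model with the train function, using boosted decision trees as the modeling method: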
gbmFit1 = train(trainx, trainy, method = "gbm", trControl = fitControl, tuneGrid = gbmGrid, verbose = FALSE)
The results show that accuracy is highest (0.826) with interaction.depth = 1 and n.trees = 150:
interaction.depth  n.trees  Accuracy  Kappa  Accuracy SD  Kappa SD
1                  50       0.822     0.635  0.0577       0.118
1                  100      0.824     0.639  0.0574       0.118
1                  150      0.826     0.643  0.0635       0.131
1                  200      0.824     0.64   0.0605       0.123
1                  250      0.816     0.623  0.0608       0.124
1                  300      0.824     0.64   0.0584       0.119
3                  50       0.816     0.621  0.0569       0.117
3                  100      0.82      0.631  0.0578       0.117
3                  150      0.815     0.621  0.0582       0.117
3                  200      0.82      0.63   0.0618       0.125
3                  250      0.813     0.617  0.0632       0.127
3                  300      0.812     0.615  0.0622       0.126
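The best row can also be picked out programmatically instead of by eye. The following is a minimal sketch in base R that re-enters the Accuracy and Kappa columns of the table above as a data frame; the names results and best are illustrative and not part of the original code. On the fitted object itself, caret keeps the same information in gbmFit1$results and gbmFit1$bestTune.

```r
# Resampling results re-entered by hand from the table above
# (illustrative data frame, not produced by caret here).
results <- data.frame(
  interaction.depth = rep(c(1, 3), each = 6),
  n.trees  = rep(c(50, 100, 150, 200, 250, 300), times = 2),
  Accuracy = c(0.822, 0.824, 0.826, 0.824, 0.816, 0.824,
               0.816, 0.820, 0.815, 0.820, 0.813, 0.812),
  Kappa    = c(0.635, 0.639, 0.643, 0.640, 0.623, 0.640,
               0.621, 0.631, 0.621, 0.630, 0.617, 0.615)
)

# Select the parameter combination with the highest mean accuracy.
best <- results[which.max(results$Accuracy), ]
print(best)
```

Note that the differences between rows are small compared with the reported standard deviations (about 0.06 for Accuracy), so the neighboring settings perform almost as well.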
The same result can be seen graphically:
plot(gbmFit1)
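To get a feel for how much work the tuning run involves, the size of the grid and the number of resamples can be counted directly. This is a rough sketch in base R; the variable names are illustrative, and the grid columns are written without the leading dot purely for readability. The product is a count of parameter-combination evaluations, not necessarily of separate model fits, since caret may derive results for smaller n.trees values from a single larger gbm fit.

```r
# Rebuild the tuning grid used above (illustrative copy).
grid <- expand.grid(interaction.depth = c(1, 3),
                    n.trees = c(50, 100, 150, 200, 250, 300),
                    shrinkage = 0.1)

# 10-fold cross-validation repeated 3 times gives 30 resamples.
n_resamples <- 10 * 3

# Each grid row is evaluated on every resample.
n_evaluations <- nrow(grid) * n_resamples

cat("grid rows:", nrow(grid), "\n")
cat("resamples:", n_resamples, "\n")
cat("evaluations:", n_evaluations, "\n")
```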