Model Evaluation and Tuning: Cross-Validation / Hyperparameter Search / Model Selection
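1. Classification Metrics

```python
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, classification_report, roc_auc_score, roc_curve,
)

# Basic metrics ('weighted' averages per-class scores by class support)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Classification report
print(classification_report(y_test, y_pred))

# ROC-AUC (binary case: column 1 holds the positive-class probability)
y_proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, y_proba)
```

2. Regression Metrics

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'MSE: {mse:.4f}')
print(f'RMSE: {rmse:.4f}')
print(f'MAE: {mae:.4f}')
print(f'R²: {r2:.4f}')
```

3. Cross-Validation

```python
from sklearn.model_selection import (
    cross_val_score, KFold, StratifiedKFold, LeaveOneOut,
)

# K-fold cross-validation
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kfold, scoring='accuracy')
print(f'Accuracy: {scores.mean():.4f} ± {scores.std():.4f}')

# Stratified K-fold (recommended for classification: preserves class ratios per fold)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=skf, scoring='accuracy')

# Leave-one-out (for small datasets: fits one model per sample)
loo = LeaveOneOut()
scores = cross_val_score(model, X, y, cv=loo, scoring='accuracy')
```

4. Hyperparameter Search

```python
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Grid search: evaluates every combination (3 x 3 x 3 = 27 here)
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 6, 9],
    'learning_rate': [0.01, 0.1, 0.2],
}

grid = GridSearchCV(model, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid.fit(X_train, y_train)

print(f'Best parameters: {grid.best_params_}')
print(f'Best score: {grid.best_score_:.4f}')

# Randomized search: cost is fixed by n_iter, so it is cheaper than a grid
# when the search space is large
from scipy.stats import uniform, randint

param_dist = {
    'n_estimators': randint(50, 300),      # integers in [50, 300)
    'max_depth': randint(3, 10),           # integers in [3, 10)
    'learning_rate': uniform(0.01, 0.3),   # uniform(loc, scale): [0.01, 0.31]
}

random_search = RandomizedSearchCV(
    model, param_dist, n_iter=100, cv=5,
    scoring='accuracy', random_state=42,
)
random_search.fit(X_train, y_train)
```

5. Learning Curves

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve

train_sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=5, n_jobs=-1,
    train_sizes=np.linspace(0.1, 1.0, 10),
    scoring='accuracy',
)

# Each score array has shape (n_sizes, n_folds); average over the folds
plt.figure(figsize=(10, 6))
plt.plot(train_sizes, train_scores.mean(axis=1), label='Training')
plt.plot(train_sizes, val_scores.mean(axis=1), label='Validation')
plt.xlabel('Training Size')
plt.ylabel('Accuracy')
plt.title('Learning Curve')
plt.legend()
plt.grid(True)
plt.show()
```

Summary

| Metric | Task | Formula |
|---|---|---|
| Accuracy | Classification | correct / total |
| Precision | Classification | TP / (TP + FP) |
| Recall | Classification | TP / (TP + FN) |
| F1 | Classification | 2 × P × R / (P + R) |
| RMSE | Regression | √MSE |
| R² | Regression | 1 - SS_res / SS_tot |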
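To make the formulas in the summary concrete, here is a minimal sketch that computes each metric by hand in plain Python. The helper names (`binary_classification_metrics`, `regression_metrics`) and the toy labels are made up for illustration and are not part of scikit-learn. The classification values correspond to the binary case (positive class = 1), which is what scikit-learn reports with its default `average='binary'`; they will generally differ from the `average='weighted'` scores used in section 1.

```python
import math


def binary_classification_metrics(y_true, y_pred):
    """Accuracy, Precision, Recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def regression_metrics(y_true, y_pred):
    """RMSE and R² from the residual and total sums of squares."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)   # sqrt(MSE)
    r2 = 1 - ss_res / ss_tot       # undefined if y_true is constant
    return {"rmse": rmse, "r2": r2}


# Toy data (hypothetical): TP=2, FP=1, FN=2, TN=3
cls = binary_classification_metrics(
    [1, 0, 1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0, 1, 0],
)
reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])

print(cls)
print(reg)
```

On this toy data, precision (2/3) is higher than recall (1/2), and F1 (4/7) falls between them, closer to the lower value. That is the behavior of the harmonic mean in the F1 formula.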