This assignment analyzes two major American sports leagues, MLB and NBA.
library(e1071)
library(scales)
library(ggplot2)
library(reshape2)
library(stats)
library(jpeg)
library(factoextra)
MLB = read.csv("C:/Users/User/Desktop/R HW/HW5/HW5.csv")
This MLB analysis considers batting statistics only.
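# Pairwise correlations among the selected batting and win/loss columns
Cor <- cor(MLB[, c(4, 15, 16, 27:29)])
# Long format (one row per variable pair) for geom_tile()
Melt <- melt(Cor, varnames = c("x", "y"), value.name = "relation")
Melt <- Melt[order(Melt$relation), ]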
ggplot(Melt, aes(x = x, y = y)) +
  geom_tile(aes(fill = relation)) +
  scale_fill_gradient2(low = "red", mid = "white", high = "darkblue",
                       guide = guide_colorbar(ticks = FALSE, barheight = 10),
                       limits = c(-1, 1)) +
  theme_minimal() +
  labs(x = NULL, y = NULL)
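R.G = average runs scored per game, SO = strikeouts (times struck out), BA = batting average, LOB = runners left on base, W = wins, L = losses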
The heatmap shows the pairwise correlations among these statistics.
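The heatmap can be easier to read alongside a sorted list of the strongest pairs. The sketch below is a base-R alternative to the `melt()`/`order()` step: it keeps each variable pair only once (upper triangle, no diagonal) and sorts by absolute correlation, so strong negative relationships rank as high as strong positive ones. The helper name `strongest_pairs` and the toy matrix are hypothetical and are not taken from the MLB file.

```r
# Hypothetical helper: one row per unique variable pair, strongest |r| first.
strongest_pairs <- function(cor_mat) {
  # upper.tri() drops the diagonal (r = 1) and the mirrored lower triangle
  idx <- which(upper.tri(cor_mat), arr.ind = TRUE)
  pairs <- data.frame(
    x = rownames(cor_mat)[idx[, "row"]],
    y = colnames(cor_mat)[idx[, "col"]],
    relation = cor_mat[idx],
    stringsAsFactors = FALSE
  )
  pairs[order(-abs(pairs$relation)), ]
}

# Toy correlation matrix with made-up values (not the MLB data)
toy <- matrix(c( 1.0, 0.2, -0.9,
                 0.2, 1.0,  0.5,
                -0.9, 0.5,  1.0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
print(strongest_pairs(toy))
```

With the MLB data the same call would be `strongest_pairs(Cor)`.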
ANOVA
# W modeled on the continuous predictor R.G (one numeric term, so 1 Df)
fit <- aov(W ~ R.G, data = MLB)
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## R.G 1 2008 2008 30.44 6.76e-06 ***
## Residuals 28 1848 66
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
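The very small p-value (6.76e-06) indicates that runs per game (R.G) is a significant predictor of wins (W). Because R.G is continuous and enters with 1 Df, this aov() is equivalent to the F-test of a simple linear regression of W on R.G, not a comparison of group means.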
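To see why the table reads as a regression test, here is a sketch on simulated data. The `sim` data frame is made up and only stands in for HW5.csv. It checks that the F value from `aov()` matches `anova(lm())`, equals the squared t statistic of the slope, and can be rebuilt by hand as (regression SS / 1) / (residual SS / (n - 2)). Plugging the rounded sums printed above into that formula gives 2008 / (1848 / 28) ≈ 30.4, consistent with the reported F value of 30.44.

```r
# Simulated stand-in for the MLB table: 30 teams, runs per game and wins.
# These numbers are invented; they are NOT the HW5.csv data.
set.seed(42)
sim <- data.frame(R.G = runif(30, 3.5, 5.5))
sim$W <- round(20 * sim$R.G - 10 + rnorm(30, sd = 8))

fit_aov <- aov(W ~ R.G, data = sim)
fit_lm  <- lm(W ~ R.G, data = sim)

# F statistic for R.G from the aov() summary table
f_aov <- summary(fit_aov)[[1]][["F value"]][1]
# F-test of the same simple linear regression
f_lm <- anova(fit_lm)[["F value"]][1]
# With one numeric predictor, F equals the squared t statistic of the slope
t_slope <- summary(fit_lm)$coefficients["R.G", "t value"]

# F by hand: (regression SS / 1) / (residual SS / (n - 2))
ss <- summary(fit_aov)[[1]][["Sum Sq"]]
n <- nrow(sim)
f_hand <- (ss[1] / 1) / (ss[2] / (n - 2))
p_hand <- pf(f_hand, df1 = 1, df2 = n - 2, lower.tail = FALSE)

print(round(c(F_aov = f_aov, F_lm = f_lm, t_squared = t_slope^2, F_hand = f_hand), 4))
print(p_hand)
```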
Predicting the NBA Cleveland Cavaliers' wins and losses this season (29 games) with an SVM
NBA = read.csv("C:/Users/User/Desktop/R HW/HW5/HW.csv")
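# Games 1-20 are the training set; games 21-29 are held out for testing
traindata <- NBA[1:20, c(2, 5, 14)]   # column 2 = W.L outcome (coded 0/1); columns 5 and 14 = predictors
testdata <- NBA[21:29, c(5, 14)]      # same two predictors, outcome withheld
# Polynomial-kernel SVM on the unscaled predictors
svmfit = svm(as.factor(W.L) ~ ., data = traindata,
             kernel = "polynomial",
             cost = 10, scale = FALSE)
# Decision regions in the plane of the two predictors
plot(svmfit, traindata)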
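The fit above uses `kernel = "polynomial"` with `scale = FALSE`. The e1071 documentation gives the polynomial kernel as (gamma * u'v + coef0)^degree, with defaults degree = 3, coef0 = 0 and gamma = 1 / (data dimension), so with two numeric predictors gamma would default to 1/2. The sketch below recomputes that kernel by hand; `poly_kernel` and the two "games" are hypothetical illustrations, not part of the assignment.

```r
# Polynomial kernel in the form documented for e1071::svm():
#   K(u, v) = (gamma * <u, v> + coef0)^degree
# Defaults mirrored here: degree = 3, coef0 = 0, gamma = 1 / (number of predictors)
poly_kernel <- function(u, v, gamma = 1 / length(u), coef0 = 0, degree = 3) {
  (gamma * sum(u * v) + coef0)^degree
}

# Two hypothetical games, each described by two made-up predictor values
u <- c(2, 1)
v <- c(1, 3)
print(poly_kernel(u, v))   # (0.5 * (2*1 + 1*3))^3 = 2.5^3 = 15.625

# Inflating the first predictor's units by 100 (possible when scale = FALSE)
# lets that column dominate the dot product and hence the kernel value
print(poly_kernel(u * c(100, 1), v * c(100, 1)))
```

Because the raw values enter the dot product directly, leaving `scale = FALSE` means a predictor measured on a larger numeric scale can dominate the decision boundary.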
predict = predict(svmfit, testdata)
ans = table(predict,NBA[21:29,2])
print(ans)
##
## predict 0 1
## 0 0 0
## 1 1 8
# Accuracy = correctly classified games (diagonal) / all test games
acc <- sum(diag(ans)) / sum(ans)
print(acc)
## [1] 0.8888889
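The model's accuracy is 0.8888889 (8 of 9 test games), but it predicted class 1 for all nine test games, so it did no better than always predicting 1, which is also right 8 times out of 9. With only 20 training games, this accuracy should be treated with caution.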
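A minimal sketch of how to quantify that caution, using only the confusion matrix printed above (re-entered by hand as `cm`, so no CSV is needed). It compares the model with the majority-class baseline and adds an exact (Clopper-Pearson) 95% interval from base R's `binom.test()`; choosing this interval is an assumption of the sketch, not part of the original analysis.

```r
# Confusion matrix re-entered from the printed output above
# (rows = predicted class, columns = actual class)
cm <- matrix(c(0, 1, 0, 8), nrow = 2,
             dimnames = list(predict = c("0", "1"), actual = c("0", "1")))

accuracy <- sum(diag(cm)) / sum(cm)        # correct predictions / all test games
baseline <- max(colSums(cm)) / sum(cm)     # always predict the most frequent actual class
never_predicts_0 <- sum(cm["0", ]) == 0    # no test game was predicted as class 0

print(c(accuracy = accuracy, baseline = baseline))
print(never_predicts_0)

# Exact (Clopper-Pearson) 95% interval for the accuracy: 8 successes in 9 trials
ci <- binom.test(sum(diag(cm)), sum(cm))$conf.int
print(ci)
```

With only nine test games the interval is very wide, and the accuracy equals the baseline, so the result alone does not show that the SVM learned anything beyond the majority class.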