This assignment analyzes two major American professional sports leagues, MLB and the NBA.

library(e1071)
library(scales)
library(ggplot2)
library(reshape2)
library(stats)
library(jpeg)
library(factoextra)
MLB = read.csv("C:/Users/User/Desktop/R HW/HW5/HW5.csv")

The MLB part of this analysis covers batting only.

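# Correlation matrix of the six selected batting and win/loss columns
Cor <- cor(MLB[, c(4, 15, 16, 27:29)])
# Reshape the matrix to long format: one row per (x, y) pair
Melt <- melt(Cor, varnames = c("x", "y"), value.name = "relation")
Melt <- Melt[order(Melt$relation), ]
# Heatmap: red = negative, white = none, dark blue = positive correlation
ggplot(Melt, aes(x = x, y = y)) +
  geom_tile(aes(fill = relation)) +
  scale_fill_gradient2(low = "red", mid = "white", high = "darkblue",
                       guide = guide_colorbar(ticks = FALSE, barheight = 10),
                       limits = c(-1, 1)) +
  theme_minimal() +
  labs(x = NULL, y = NULL)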

R.G = runs scored per game, SO = times struck out, BA = batting average, LOB = runners left on base, W = wins, L = losses

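The heatmap shows the pairwise correlations among these variables: dark blue tiles mark strong positive correlations, red tiles mark strong negative ones, and white tiles mark little or no linear relationship.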
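Reading exact values off a heatmap is hard, so it can help to list the pairs by strength as well. The following is a minimal sketch on made-up data (the data frame `toy` and its columns `a` to `d` are hypothetical stand-ins, not columns of the MLB file); it keeps each pair once and sorts by absolute correlation.

```r
# Made-up data standing in for the selected MLB columns
set.seed(1)
toy <- data.frame(a = rnorm(30))
toy$b <- toy$a + rnorm(30, sd = 0.3)    # built to correlate positively with a
toy$c <- -toy$a + rnorm(30, sd = 0.3)   # built to correlate negatively with a
toy$d <- rnorm(30)                      # unrelated noise

cor_mat <- cor(toy)
# Blank the diagonal and lower triangle so each pair appears only once
cor_mat[lower.tri(cor_mat, diag = TRUE)] <- NA
ranked <- as.data.frame(as.table(cor_mat), responseName = "relation")
ranked <- ranked[!is.na(ranked$relation), ]
# Strongest relationships first, regardless of sign
ranked <- ranked[order(-abs(ranked$relation)), ]
print(ranked)
```

Replacing `toy` with `MLB[, c(4, 15, 16, 27:29)]` should give the same kind of ranking for the variables in the heatmap.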

ANOVA

anova<-aov(W~R.G ,MLB)
summary(anova)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## R.G          1   2008    2008   30.44 6.76e-06 ***
## Residuals   28   1848      66                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

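The very small p-value (6.76e-06) indicates that R.G is significantly related to W. Because R.G is a numeric variable (Df = 1), there are no groups being compared here: the test is equivalent to the F-test of a simple linear regression of W on R.G. From the sums of squares, R.G accounts for about 52% of the variation in wins (2008 / (2008 + 1848)).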
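To illustrate why a one-degree-of-freedom ANOVA on a numeric predictor is the same test as a simple linear regression, here is a small sketch on made-up data (the data frame `toy` and its values are hypothetical, not the MLB file). It compares the F statistic from `aov` with the one from `lm`, and derives the explained share of variance from the sums of squares.

```r
# Made-up stand-in for 30 teams; the numbers are for illustration only
set.seed(1)
toy <- data.frame(R.G = runif(30, 3.5, 5.5))
toy$W <- round(81 + 15 * (toy$R.G - 4.5) + rnorm(30, sd = 8))

fit_aov <- aov(W ~ R.G, data = toy)
fit_lm  <- lm(W ~ R.G, data = toy)

# F statistic taken from the ANOVA table and from the regression summary
aov_table <- summary(fit_aov)[[1]]
f_aov <- aov_table[["F value"]][1]
f_lm  <- unname(summary(fit_lm)$fstatistic["value"])
print(all.equal(f_aov, f_lm))

# Explained share of variance: SS(R.G) / (SS(R.G) + SS(Residuals))
ss <- aov_table[["Sum Sq"]]
r2_from_ss <- ss[1] / sum(ss)
print(all.equal(r2_from_ss, summary(fit_lm)$r.squared))

# The same ratio using the rounded sums of squares printed in the report
r2_report <- 2008 / (2008 + 1848)
print(r2_report)
```

The last value is only approximate, since the sums of squares in the printed table are rounded.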

Predicting wins and losses for the NBA's Cavaliers this season (29 games) with an SVM

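NBA = read.csv("C:/Users/User/Desktop/R HW/HW5/HW.csv")
# Games 1-20 for training: column 2 is the outcome W.L, columns 5 and 14 are the predictors
traindata <- NBA[1:20,c(2,5,14)]
# Games 21-29 for testing: predictors only
testdata <- NBA[21:29, c(5,14)]
# Polynomial-kernel SVM classifier on the unscaled predictors
svmfit = svm(as.factor(W.L) ~ ., data = traindata, 
             kernel = "polynomial", 
             cost = 10, scale = FALSE)
# Decision regions together with the training points
plot(svmfit, traindata)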

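# Predict the held-out games, then cross-tabulate predictions (rows) against actual outcomes (columns)
predict = predict(svmfit, testdata)
ans = table(predict,NBA[21:29,2])
print(ans)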
##        
## predict 0 1
##       0 0 0
##       1 1 8
# Accuracy = correctly classified games (diagonal) / all test games
t = (ans[1,1]+ans[2,2])/sum(ans)
print(t)
## [1] 0.8888889
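One way to put this accuracy in context is to compare it with a trivial baseline that always guesses the most frequent class among the test games. The sketch below re-enters the confusion matrix printed above by hand (the object name `conf` is just for illustration) and computes both numbers.

```r
# Confusion matrix copied from the printed table
# (rows = predicted class, columns = actual class; filled column by column)
conf <- matrix(c(0, 1, 0, 8), nrow = 2,
               dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))

# Accuracy of the SVM: correct predictions on the diagonal / all games
accuracy <- sum(diag(conf)) / sum(conf)

# Baseline: always guess the most frequent actual class in the test games
baseline <- max(colSums(conf)) / sum(conf)

# Number of games for which each class was predicted
predicted_counts <- rowSums(conf)

print(c(accuracy = accuracy, baseline = baseline))
print(predicted_counts)
```

In this table the model predicted class 1 for all nine test games, so its accuracy coincides with the baseline; the result does not yet show that the two predictors carry information about the outcome.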

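The model reaches an accuracy of 0.8888889 (8 of 9 test games). However, with only 20 training games and 9 test games the data set is small, so this accuracy estimate is unreliable and should be treated with caution.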
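With so few games, a single 20/9 split gives a noisy estimate. One option, sketched here under assumptions, is k-fold cross-validation through the `cross` argument of `svm()` in e1071. The data frame `toy` and its predictors `x1` and `x2` are made up for illustration and are not the columns of the NBA file; the sketch also uses the default scaling rather than `scale = FALSE`.

```r
library(e1071)

# Made-up data with 29 rows, mimicking the size of the season so far
set.seed(42)
n <- 29
outcome <- rep(c(1, 1, 0), length.out = n)     # fixed mix of wins (1) and losses (0)
toy <- data.frame(
  W.L = factor(outcome),
  x1  = rnorm(n) + outcome,                     # hypothetical predictor, shifted for wins
  x2  = rnorm(n)                                # hypothetical predictor, pure noise
)

# 5-fold cross-validation: every game is used for testing exactly once
cvfit <- svm(W.L ~ ., data = toy,
             kernel = "polynomial", cost = 10,
             cross = 5)

print(cvfit$accuracies)     # accuracy of each fold, on a 0-100 scale
print(cvfit$tot.accuracy)   # overall cross-validated accuracy, on a 0-100 scale
```

The spread of the per-fold accuracies gives a rough sense of how much the estimate depends on which games end up in the test set.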