Obtain the wine dataset from LIBSVM Data (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/wine.scale), then read it in with read.libsvm.R and tidy it.
#read in the data and convert to dataframe
source( 'read.libsvm.R' )
wine = read.libsvm( 'wine.scale', 13 )
wine = as.data.frame(wine)
#reassign attributes
names(wine) = c("type","alc","acid","ash","alk","mag","phenols","flav","nonflav","proanth","color","hue","OD","proline")
#encode wine type as factor
wine$type = as.factor(wine$type)
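The helper `read.libsvm.R` is not shown here. As a rough idea of what such a reader has to do, the sketch below parses LIBSVM-format lines (`label index:value ...`, where omitted indices mean zero) into a numeric matrix whose first column is the label. The function name `parse_libsvm_lines` and the two demo lines are made up for illustration; they are not taken from `read.libsvm.R`.

```r
# Hypothetical helper, not the contents of read.libsvm.R:
# turn LIBSVM-format text lines into a dense numeric matrix.
parse_libsvm_lines <- function(lines, n_features) {
  # Column 1 holds the label, columns 2..(n_features + 1) hold the features.
  out <- matrix(0, nrow = length(lines), ncol = n_features + 1)
  for (i in seq_along(lines)) {
    tokens <- strsplit(trimws(lines[i]), "[ \t]+")[[1]]
    out[i, 1] <- as.numeric(tokens[1])          # class label
    for (tok in tokens[-1]) {
      kv <- strsplit(tok, ":", fixed = TRUE)[[1]]
      # Feature indices are 1-based, so shift by one to skip the label column.
      out[i, as.integer(kv[1]) + 1] <- as.numeric(kv[2])
    }
  }
  out
}

# Two invented lines with 3 features; missing indices stay at 0.
demo <- parse_libsvm_lines(c("1 1:0.5 3:-1", "2 2:0.25"), n_features = 3)
print(demo)
```

With the real file, something like `parse_libsvm_lines(readLines('wine.scale'), 13)` would be the analogous call, assuming the file follows this layout.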
require(ggplot2)
## Loading required package: ggplot2
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, round.POSIXt, trunc.POSIXt, units
library(base)
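According to a British-American medical study (https://www.ncbi.nlm.nih.gov/pubmed/28714503), men who regularly eat foods rich in flavonoids have a significantly lower risk of developing Parkinson's disease. Using ANOVA, this problem examines how wine type affects flavonoid content in the wine dataset. Before running the analysis, we draw a boxplot and an interval plot (group means with confidence intervals) to see how flavonoid content is distributed across wine types.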
Draw the boxplot.
#set black-white theme
old <- theme_set(theme_bw())
#plot boxplot
ggplot(data = wine, aes(x = type, y = flav)) +
  geom_boxplot() +
  coord_flip() +
  labs(y = 'Flavanoids', x = 'Wine Type',
       title = 'Flavanoids in Wine')
#compute the mean of flav (already scaled in wine.scale) for each wine type
tapply(wine$flav, wine$type, mean)
## 1 2 3
## 0.1149253 -0.2654662 -0.8137306
#plot interval plot
ggplot(data = wine, aes(x = type, y = flav)) +
  stat_summary(fun.data = 'mean_cl_boot', size = 1) +
  scale_y_continuous(breaks = seq(-1, 0.2, by = 0.2)) +
  geom_hline(yintercept = mean(wine$flav),
             linetype = 'dotted') +
  labs(x = 'Wine Type', y = 'Flavanoids') +
  coord_flip()
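`mean_cl_boot` in the interval plot draws each group's mean together with a bootstrap confidence interval (ggplot2 delegates the computation to Hmisc). The sketch below illustrates the general idea with a percentile bootstrap in base R. The helper `boot_mean_ci` and the simulated vector `x` are assumptions made for illustration; this is not Hmisc's actual code and it does not use the wine data.

```r
# Hypothetical helper: percentile bootstrap confidence interval for the mean.
boot_mean_ci <- function(x, n_boot = 1000, conf = 0.95) {
  # Resample x with replacement n_boot times and record each resample's mean.
  boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))
  alpha <- (1 - conf) / 2
  # The interval limits are the alpha and (1 - alpha) quantiles of those means.
  limits <- quantile(boot_means, probs = c(alpha, 1 - alpha), names = FALSE)
  c(mean = mean(x), lower = limits[1], upper = limits[2])
}

set.seed(1)                       # make the resampling reproducible
x <- rnorm(50, mean = 0, sd = 1)  # simulated values, not the wine data
ci <- boot_mean_ci(x)
print(ci)
```

Applied per wine type, for example with `tapply(wine$flav, wine$type, boot_mean_ci)`, this would give intervals comparable to those in the plot, though not identical because the resamples are random.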
Both plots suggest that flavonoid content differs markedly between wine types.
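Because there are three wine types, we use ANOVA to examine the relationship between flavonoid content (a continuous dependent variable) and wine type (a categorical independent variable), and to test formally what the plots suggest.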
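H0: mu1 = mu2 = mu3, i.e., the population mean flavonoid content is the same for all three wine types, so wine type has no effect on flavonoid content.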
#ANOVA test
anova(m1 <- lm(flav ~ type, data = wine))
## Analysis of Variance Table
##
## Response: flav
## Df Sum Sq Mean Sq F value Pr(>F)
## type 2 22.8814 11.4407 233.93 < 2.2e-16 ***
## Residuals 175 8.5588 0.0489
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(m1)$r.squared
## [1] 0.7277755
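As a sanity check, the F value, the bound on the p-value, and the R-squared reported above can be recomputed from the sums of squares and degrees of freedom in the ANOVA table. The sketch below uses only the rounded figures printed in the table, so it agrees with the output up to rounding; the variable names are chosen here for illustration.

```r
# Figures copied from the ANOVA table above (rounded as printed).
ss_type <- 22.8814   # between-group sum of squares (type)
ss_res  <- 8.5588    # within-group sum of squares (Residuals)
df_type <- 2         # number of wine types minus 1
df_res  <- 175       # number of observations minus number of wine types

# F is the ratio of the between-group to the within-group mean square.
f_value <- (ss_type / df_type) / (ss_res / df_res)

# Upper-tail probability of the F distribution gives the p-value.
p_value <- pf(f_value, df_type, df_res, lower.tail = FALSE)

# R-squared is the share of total variation explained by wine type.
r_squared <- ss_type / (ss_type + ss_res)

print(f_value)    # about 233.9
print(p_value)    # far below 2.2e-16
print(r_squared)  # about 0.7278
```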
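The p-value is below 2.2e-16, far smaller than 0.05, so we reject the null hypothesis H0. Flavonoid content differs significantly between wine types, which agrees with the plots. The R-squared of about 0.73 indicates that wine type accounts for roughly 73% of the variance in flavonoid content.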