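Word cloud analysis of title keywords from PubMed papers on Acinetobacter baumannii
(Unresolved issue:
the search is run inside the PubMed site, so when moving to the next page of results the URL reverts to the base PubMed address, and the content of the later pages cannot be scraped.)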
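One possible way around the pagination issue noted above is to query NCBI's E-utilities web service instead of the search page, since it takes the result offset as an explicit URL parameter. The following is a minimal sketch, not a tested replacement for the scraping code in this report. The helper names `build_esearch_url`, `build_esummary_url`, and `fetch_titles` are hypothetical, and the XPath expressions assume the default XML layout returned by ESearch and ESummary.

```r
# Hypothetical helper: build an ESearch URL for one page of PubMed results.
# retstart is the zero-based offset of the first record, retmax the page size.
build_esearch_url <- function(term, retstart = 0, retmax = 20) {
  paste0(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    "?db=pubmed",
    "&term=", utils::URLencode(term, reserved = TRUE),
    "&retstart=", retstart,
    "&retmax=", retmax
  )
}

# Hypothetical helper: build an ESummary URL for a vector of PubMed IDs.
build_esummary_url <- function(ids) {
  paste0(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi",
    "?db=pubmed&id=", paste(ids, collapse = ",")
  )
}

# Hypothetical helper: fetch the titles on a given result page.
# Needs network access and the xml2 package (already pulled in by rvest).
fetch_titles <- function(term, page = 1, page_size = 20) {
  search <- xml2::read_xml(
    build_esearch_url(term, (page - 1) * page_size, page_size)
  )
  # Assumed layout: PubMed IDs are listed under <IdList><Id>
  ids <- xml2::xml_text(xml2::xml_find_all(search, "//IdList/Id"))
  if (length(ids) == 0) return(character(0))
  summary <- xml2::read_xml(build_esummary_url(ids))
  # Assumed layout: each record carries its title in <Item Name="Title">
  xml2::xml_text(xml2::xml_find_all(summary, "//Item[@Name='Title']"))
}
```

Under these assumptions, `fetch_titles("acinetobacter baumannii", page = 2)` would return the titles of the second page of results. When looping over many pages, a short `Sys.sleep()` between requests helps stay within NCBI's request limits.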
install.packages('rvest')
install.packages('NLP')
install.packages('RColorBrewer')
install.packages('wordcloud')
library('rvest')
## Warning: package 'rvest' was built under R version 3.3.3
## Loading required package: xml2
## Warning: package 'xml2' was built under R version 3.3.3
library('NLP')
## Warning: package 'NLP' was built under R version 3.3.3
library('RColorBrewer')
library('wordcloud')
## Warning: package 'wordcloud' was built under R version 3.3.3
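# Download the first page of PubMed search results for "acinetobacter baumannii"
doc <- read_html("https://www.ncbi.nlm.nih.gov/pubmed/?term=acinetobacter+baumannii")
# Extract the article titles (".title a" targets the title links in the page layout this script was written for)
a <- data.frame(html_nodes(doc, ".title a") %>% html_text())
# Save the titles to a text file and read them back as lines
write.table(a, file = "Ab.txt")
Abartical = readLines("Ab.txt")
# Remove the auto-generated column header that write.table() puts on the first line
Abartical = gsub('"html_nodes.doc....title.a.......html_text.."', ' ', Abartical)
# Remove any leftover italic tags
Abartical = gsub('<i>', ' ', Abartical)
Abartical = gsub('</i>', ' ', Abartical)
# Remove "Acinetobacter baumannii" itself, which appears in nearly every title, so the frequencies of the other terms are easier to compare
Abartical = gsub('Acinetobacter baumannii', ' ', Abartical)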
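The cleaning steps above go through a text file, which is why the generated column header has to be removed afterwards. Since `html_text()` already returns a character vector, the same cleanup could be done in memory. The sketch below shows one way to do that. `clean_titles` is a hypothetical helper, not part of this report's code, and the two example titles are made up for illustration.

```r
# Hypothetical helper: clean a character vector of titles in memory.
# `drop` is the phrase to remove; matching ignores case.
clean_titles <- function(titles, drop = "Acinetobacter baumannii") {
  titles <- gsub(drop, " ", titles, ignore.case = TRUE)
  titles <- gsub("\\s+", " ", titles)  # collapse runs of whitespace
  trimws(titles)                       # strip leading/trailing spaces
}

# Made-up titles, for illustration only
example_titles <- c(
  "Carbapenem resistance in Acinetobacter baumannii isolates",
  "Colistin therapy for acinetobacter baumannii pneumonia."
)
clean_titles(example_titles)
```

In the pipeline above, this could be applied as `clean_titles(html_text(html_nodes(doc, ".title a")))`, with the result passed straight to `wordcloud()`.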
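1. pneumonia: the disease most commonly caused by Acinetobacter baumannii.
2. isolate: most of the papers are experimental studies, so terms about isolating bacterial strains also appear frequently.
3. resistance: refers to antibiotic resistance, the biggest problem with Acinetobacter baumannii.
4. Carbapenem: a front-line antibiotic; if even this drug is ineffective against Acinetobacter baumannii, very few treatment options remain.
5. intensive, unit, patients: patients in intensive care units are prone to infection.
6. novel: refers to emerging multidrug-resistant bacteria.
7. colistin: another antibiotic, to which Acinetobacter baumannii has also developed resistance.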
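The terms discussed above are read off the word cloud by eye. To back that reading with numbers, the word counts can be tabulated directly. The following is a rough sketch in base R. `count_terms` is a hypothetical helper, and it drops words of one or two letters instead of using the stopword list that `wordcloud()` applies internally, so its counts may differ slightly from what the cloud shows.

```r
# Hypothetical helper: count how often each word occurs across the titles.
# Words are lowercased, split on anything that is not a letter, and words
# of one or two letters are dropped as a crude stand-in for stopword removal.
count_terms <- function(titles, min_freq = 2) {
  words <- unlist(strsplit(tolower(titles), "[^a-z]+"))
  words <- words[nchar(words) > 2]
  freq <- table(words)
  counts <- setNames(as.integer(freq), names(freq))
  counts <- sort(counts, decreasing = TRUE)
  counts[counts >= min_freq]
}

# Made-up titles, for illustration only
example_titles <- c(
  "Carbapenem resistance in clinical isolates",
  "Colistin resistance in intensive care unit patients",
  "Novel carbapenem resistance genes in isolates"
)
count_terms(example_titles)
```

Calling `count_terms(Abartical)` on the cleaned titles would give the terms that occur at least twice, the same threshold as `min.freq = 2` in the word cloud.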
wordcloud(Abartical, min.freq = 2, random.order = FALSE, scale=c(3.5, 0.5), color=brewer.pal(6, "Dark2"))
## Loading required package: tm
## Warning: package 'tm' was built under R version 3.3.3