Homework 3, B04611027, Biomechatronics Engineering (Year 3), 洪浩博

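First, load the libraries. The package-version warnings below come from these calls.

library(rvest)       # HTML scraping; also loads xml2
library(magrittr)    # the %>% pipe
library(jiebaR)      # Chinese word segmentation; also loads jiebaRD
library(wordcloud2)  # word cloud rendering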

## Warning: package 'rvest' was built under R version 3.4.2
## Loading required package: xml2
## Warning: package 'xml2' was built under R version 3.4.2
## Warning: package 'magrittr' was built under R version 3.4.2
## Warning: package 'jiebaR' was built under R version 3.4.2
## Loading required package: jiebaRD
## Warning: package 'jiebaRD' was built under R version 3.4.2
## Warning: package 'wordcloud2' was built under R version 3.4.2

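Next, scrape the website. This time the target is entertainment news: the headline list on the Liberty Times entertainment site.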

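# Download the page, select the headline nodes, extract their text,
# and convert it from UTF-8 to the native (locale) encoding
title <- read_html("http://ent.ltn.com.tw/") %>%
  html_nodes(".boxTitle .listA .list_title") %>%
  html_text() %>%
  iconv(from = "UTF-8")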
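The selector chain above keeps only elements with class list_title that sit inside an element with class listA, which in turn sits inside one with class boxTitle. A minimal sketch on an inline HTML snippet shows which nodes that selector matches without needing network access. The markup is hypothetical and does not reproduce the real structure of ent.ltn.com.tw.

```r
library(rvest)  # rvest re-exports the %>% pipe

# Hypothetical markup; the real page structure is not reproduced here
html <- '
<div class="boxTitle">
  <ul class="listA">
    <li><a class="list_title">Headline one 2017</a></li>
    <li><a class="list_title">Headline two</a></li>
  </ul>
</div>
<div class="sidebar">
  <a class="list_title">Not matched: outside .boxTitle</a>
</div>'

# read_html() treats a string containing "<" as literal HTML
titles <- read_html(html) %>%
  html_nodes(".boxTitle .listA .list_title") %>%
  html_text()

print(titles)
```

The sidebar link also has class list_title but is skipped, because the descendant selector requires the .boxTitle and .listA ancestors.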

Remove the digits from the titles, since numbers are not what we want to analyze.

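# gsub() is vectorised, so a single call strips the digits from every title
title <- gsub("[0-9]", "", title)
# Windows locale name "cht": Traditional Chinese, code page 950 (see the output below)
Sys.setlocale(category = "LC_ALL", locale = "cht")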
## [1] "LC_COLLATE=Chinese (Traditional)_Taiwan.950;LC_CTYPE=Chinese (Traditional)_Taiwan.950;LC_MONETARY=Chinese (Traditional)_Taiwan.950;LC_NUMERIC=C;LC_TIME=Chinese (Traditional)_Taiwan.950"
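A small check on hypothetical sample titles shows what the digit removal does. It also shows that an explicit loop gives the same result as the vectorised gsub() call, provided it indexes with seq_along(), because R vectors start at index 1. The sample titles are ASCII so the sketch runs in any locale.

```r
# Hypothetical sample titles
titles <- c("Top 10 songs of 2017", "iPhone X review", "No digits here")

# Vectorised: one call cleans every element
clean_vec <- gsub("[0-9]", "", titles)

# Equivalent explicit loop; seq_along() yields 1, 2, 3, matching R's 1-based indexing
clean_loop <- titles
for (i in seq_along(titles)) {
  clean_loop[i] <- gsub("[0-9]", "", titles[i])
}

# Both approaches agree
stopifnot(identical(clean_vec, clean_loop))
# clean_vec is c("Top  songs of ", "iPhone X review", "No digits here")
```

Removing the digits leaves the surrounding spaces in place, as in the first title.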

Next, segment the titles into words with a jiebaR worker, count each word, and convert the counts into a data frame.

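# Create a jiebaR segmenter with the default settings
cc <- worker()
# cc[title] splits every title into words; table() counts each distinct word
mlb <- table(cc[title])
# Convert to a data frame with columns Var1 (word) and Freq (count)
mlb <- data.frame(mlb)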
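The counting and sorting steps can be tried without jiebaR. This sketch uses a hypothetical, already-segmented token vector standing in for the output of cc[title], and renames the columns to word and freq for readability (an illustrative choice, not part of the original analysis).

```r
# Hypothetical tokens standing in for the output of cc[title]
tokens <- c("iPhone", "X", "iPhone", "review", "X", "iPhone")

# table() counts each distinct token; data.frame() turns the counts into rows
freq <- data.frame(table(tokens))
names(freq) <- c("word", "freq")

# Sort by count, most frequent first
top <- freq[order(freq$freq, decreasing = TRUE), ]
print(top)
```

The frequency column comes out as integer counts, so iPhone (3) ranks ahead of X (2) and review (1).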

Sort the words by frequency to check the most common ones, then draw the word cloud.

head(mlb[order(mlb$Freq,decreasing = TRUE),])
##       Var1 Freq
## 276 柯以柔   19
## 541   影音   12
## 166   老公   10
## 236     的   10
## 17  iPhone    8
## 30       X    8
wordcloud2(mlb)
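The frequency list above includes entries such as 的, a grammatical particle that adds little to a word cloud. A minimal sketch of filtering the table before plotting follows. It assumes a hand-written, hypothetical stop-word list and a minimum count of 2, both illustrative choices that are not part of the original analysis. It also uses hypothetical counts in the same Var1/Freq layout as mlb. The size argument of wordcloud2() scales the font size.

```r
library(wordcloud2)

# Hypothetical counts in the same Var1/Freq layout as mlb
mlb <- data.frame(Var1 = c("iPhone", "X", "\u7684", "review"),
                  Freq = c(8L, 8L, 10L, 1L))

# Hypothetical stop-word list; "\u7684" is the particle 的
stop_words <- c("\u7684")

# Keep words that are not stop words and appear at least twice
keep <- !(mlb$Var1 %in% stop_words) & mlb$Freq >= 2
mlb_clean <- mlb[keep, ]

# Build the cloud; print wc in an interactive session to display it
wc <- wordcloud2(mlb_clean, size = 0.5)
```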