Homework 3 B04611027 Bio-Industrial Mechatronics Engineering, Year 3 洪浩博
First, load the libraries.
## Warning: package 'rvest' was built under R version 3.4.2
## Loading required package: xml2
## Warning: package 'xml2' was built under R version 3.4.2
## Warning: package 'magrittr' was built under R version 3.4.2
## Warning: package 'jiebaR' was built under R version 3.4.2
## Loading required package: jiebaRD
## Warning: package 'jiebaRD' was built under R version 3.4.2
## Warning: package 'wordcloud2' was built under R version 3.4.2
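The chunk that produced these messages is not shown. A minimal sketch of what it might look like, assuming the package list is exactly the one named in the messages; the check-before-attach step is an addition for illustration, not part of the original report.

```r
# Packages named in the load messages above (xml2 and jiebaRD are pulled in
# automatically as dependencies of rvest and jiebaR).
pkgs <- c("rvest", "magrittr", "jiebaR", "wordcloud2")

# Check which packages are installed before attaching anything.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (all(installed)) {
  # Attach every package so its functions can be called directly
  for (p in pkgs) library(p, character.only = TRUE)
} else {
  message("Missing packages: ", paste(pkgs[!installed], collapse = ", "))
}
```

Any package reported as missing can be installed with install.packages() before rerunning.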
Next, scrape the website. The target this time is entertainment news.
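# Fetch the LTN entertainment page and extract the headline text
title <- read_html("http://ent.ltn.com.tw/") %>%
  html_nodes(".boxTitle .listA .list_title") %>%
  html_text() %>%
  iconv(from = "UTF-8")  # convert from UTF-8 to the native encoding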
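To see how the CSS selector picks out the headlines without hitting the network, the same pipeline can be tried on an inline HTML string. This is only a sketch: the markup below is invented for illustration and is an assumption about the page structure, not a copy of the real site.

```r
# Hypothetical markup that mimics the class names used by the selector
html <- paste0(
  '<div class="boxTitle"><ul class="listA">',
  '<li><a class="list_title">Sample headline 1</a></li>',
  '<li><a class="list_title">Sample headline 2</a></li>',
  '</ul></div>'
)

if (requireNamespace("rvest", quietly = TRUE)) {
  library(rvest)
  # read_html() also accepts literal HTML, not only a URL
  titles <- read_html(html) %>%
    html_nodes(".boxTitle .listA .list_title") %>%
    html_text()
  print(titles)
} else {
  titles <- NULL  # rvest is not installed, so nothing is parsed
}
```

Each space in the selector means "descendant of", so only elements with class list_title nested inside listA inside boxTitle are returned.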
Remove the digits from the data, since they are not what we are after.
# gsub() is vectorized, so all titles are cleaned in one call
title <- gsub("[0-9]", "", title)
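A small illustration of the digit removal on made-up titles (the sample strings are hypothetical). It also shows two details of R indexing that matter here: vectors start at 1, and assigning to index 0 silently does nothing.

```r
# Hypothetical titles, made up for illustration only
sample_titles <- c("1News A", "22 headline B7", "no digits")

# gsub() works on the whole vector at once
cleaned <- gsub("[0-9]", "", sample_titles)
print(cleaned)

# Equivalent explicit loop; seq_along() starts at 1, matching R's indexing
looped <- sample_titles
for (i in seq_along(looped)) {
  looped[i] <- gsub("[0-9]", "", looped[i])
}

# Assigning to index 0 changes nothing, so a loop counter must start at 1
unchanged <- sample_titles
unchanged[0] <- "ignored"
```

The vectorized call and the loop give the same result; the vectorized form is shorter and avoids counter mistakes.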
Sys.setlocale(category = "LC_ALL", locale = "cht")
## [1] "LC_COLLATE=Chinese (Traditional)_Taiwan.950;LC_CTYPE=Chinese (Traditional)_Taiwan.950;LC_MONETARY=Chinese (Traditional)_Taiwan.950;LC_NUMERIC=C;LC_TIME=Chinese (Traditional)_Taiwan.950"
Next, segment the titles into words and convert the word counts into a data frame.
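cc <- worker()            # default jiebaR segmentation worker
mlb <- table(cc[title])   # segment the titles and count each word
mlb <- data.frame(mlb)    # columns: Var1 (word), Freq (count)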
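The counting step can be sketched in base R alone. The token vector below is hypothetical and stands in for the output of the segmenter; the column name word is also chosen here for illustration (the report's own data frame uses the default name Var1).

```r
# Hypothetical tokens standing in for the segmented titles
tokens <- c("news", "video", "news", "star", "video", "news")

# table() counts each distinct token; as.data.frame() turns the counts
# into one row per token with its frequency in the Freq column
freq <- as.data.frame(table(word = tokens), stringsAsFactors = FALSE)

# Sort so the most frequent tokens come first
freq <- freq[order(freq$Freq, decreasing = TRUE), ]
print(freq)
```

A two-column data frame of words and counts is the shape that a word cloud function expects as input.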
Check the most frequent words, then build the word cloud.
head(mlb[order(mlb$Freq, decreasing = TRUE), ])
## Var1 Freq
## 276 柯以柔 19
## 541 影音 12
## 166 老公 10
## 236 的 10
## 17 iPhone 8
## 30 X 8
wordcloud2(mlb)
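The top rows include the particle 的, which carries no topical meaning. One possible refinement, sketched here as an assumption rather than something the report does, is to drop such words before drawing the cloud. The data frame copies the six rows printed above, and the stop-word list is hypothetical.

```r
# The six most frequent words, copied from the printed output above
top_words <- data.frame(
  Var1 = c("柯以柔", "影音", "老公", "的", "iPhone", "X"),
  Freq = c(19, 12, 10, 10, 8, 8),
  stringsAsFactors = FALSE
)

# Hypothetical stop-word list; extend it as needed
stop_words <- c("的")

# Keep only the rows whose word is not a stop word
filtered <- top_words[!(top_words$Var1 %in% stop_words), ]
print(filtered)
```

Passing the filtered data frame to wordcloud2() in place of mlb would then leave the particle out of the cloud.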