題目:比較美國前總統歐巴馬於總統任期前後的臉抒發文內容

以文字雲呈現美國前總統歐巴馬於總統任期前後的臉抒發文內容,分成任職前(Pre-Presidency)、第一任期(First Term)、第二任期(Second Term)、卸任後(Post-Presidency)。

套件安裝 - 抓臉書資料

install.packages("devtools")
library(devtools)
install_version("httr", version = "1.2.0", repos = "http://cran.us.r-project.org")
install.packages("rjson")
install.packages("httpuv")
install_github("pablobarbera/Rfacebook/Rfacebook")

套件執行 - 抓臉書資料

library(devtools)
library(httr)
library(rjson)
library(httpuv)
require(Rfacebook)
## Loading required package: Rfacebook
## 
## Attaching package: 'Rfacebook'
## The following object is masked from 'package:methods':
## 
##     getGroup

網路爬蟲

透過 fbOAuth function 取得 access token,再透過 token 取得歐巴馬於四段時期的發文。為減省作業,因此取百篇內的貼文。

# generate a 'long-lived' token using the fbOAuth function
#fb_oauth <- fbOAuth(app_id="1569896156438201", app_secret="8b593e1538dc2aae0ca986806bdd4093",extended_permissions = TRUE)

# save fb_oauth for future use
#save(fb_oauth, file="fb_oauth")
load("fb_oauth")

# set token
token = fb_oauth

# extract data from page
page_PreElect <- getPage("barackobama", token, n=100, since='2007/02/10', until ='2008/11/04')
## 25 posts 50 posts 65 posts
page_First <- getPage("barackobama", token, n=100, since='2009/01/20', until='2013/01/21')
## 25 posts 50 posts 75 posts 100 posts
page_Second <- getPage("barackobama", token, n=100, since='2013/01/21', until='2017/01/20')
## 25 posts 50 posts 75 posts 100 posts
page_Depart <- getPage("barackobama", token, n=100, since='2017/01/20', until='2017/10/21')
## 9 posts

套件安裝 - 製作文字雲

install.packages('rvest')
install.packages('NLP')
install.packages('tm')
install.packages('stringr')
install.packages('wordcloud')
install.packages('RColorBrewer')

套件執行 - 製作文字雲

library(rvest)
## Loading required package: xml2
library(NLP)
## 
## Attaching package: 'NLP'
## The following object is masked from 'package:httr':
## 
##     content
require(tm)
## Loading required package: tm
library(stringr)
library(wordcloud)
## Loading required package: RColorBrewer
require(RColorBrewer)

文本清理

先將四段時期的發文個別合併成單一 string,再合併成 corpus。

#Data-Preprocessing: combine posts as separate strings
page_PreElect_post <- paste(page_PreElect$message, collapse = " ")
page_First_post <- paste(page_First$message, collapse = " ")
page_Second_post <- paste(page_Second$message, collapse = " ")
page_Depart_post <- paste(page_Depart$message, collapse = " ")

#Data-Preprocessing: combine as corpus
corpus <- c(page_PreElect_post, page_First_post, page_Second_post, page_Depart_post)

清理 corpus,其中排除影響力過大且分析意義較低的自,比如網址連結中的“http”,“OFA”、總統名字“barack”,“obama”、職稱“president”。

#Data-Preprocessing: removing non-words
corpus2 <- gsub("\\W"," ",corpus)

#Data-Preprocessing: removing digits
corpus2 <- gsub("\\d"," ",corpus2)

#Data-Preprocessing: changing all to lower case
corpus2 <- tolower(corpus2)

#Data-Preprocessing: removing stopwords
corpus2 <- removeWords(corpus2,stopwords())

#Data-Preprocessing: removing single letters
corpus2 <- gsub("\\b[A-z]\\b{1}"," ",corpus2)

#Data-Preprocessing: removing whitespaces
corpus2 <- stripWhitespace(corpus2)

#Data-Preprocessing: removing irrelevant words
corpus2 <- gsub("http"," ",corpus2)
corpus2 <- gsub("ofa"," ",corpus2)
corpus2 <- gsub("barack"," ",corpus2)
corpus2 <- gsub("obama"," ",corpus2)
corpus2 <- gsub("president"," ",corpus2)

文字雲

由文字雲可以看出歐巴馬於四段時的發文內容趨勢,從中觀察各時期關注的議題有何差異。

#Create Comparison Word Cloud: create corpus vector
corpus3 <- Corpus(VectorSource(corpus2))

#Create Comparison Word Cloud: create Term Document Matrix
tdm <- TermDocumentMatrix(corpus3)
m <- as.matrix(tdm)

#Create Comparison Word Cloud: rename columns of tdm
colnames(m) <- c("Pre-Presidency", "First Term", "Second Term", "Post-Presidency")

#Create Comparison Word Cloud
comparison.cloud(m, max.words=200,scale=c(1.5,.3),random.order=FALSE,title.size=1.5)

  1. 任職前(Pre-Presidency)

    時間設定為歐巴馬在伊利諾州首府春田市正式宣布參選至選舉結果公告當日,而在這之前歐巴馬曾任伊利諾州參議員與聯邦參議員,可以看出常見的關鍵字“senator”, “supporters”, “campaign”等正可呼應這樣的時代背景。

  2. 第一任期(First Term)

    包含“friends”, “share”, “make”, “plan”等國家建設相關字眼,另因時間涵蓋到2012年總統大選,也可以看出“election”及前共和黨總統候選人“romney”出現頻率也相當高。

  3. 第二任期(Second Term)

    歐巴馬於第二任期間一再強調氣候變遷議題,反映在大量的關鍵字“climate”, “change”上。另外,2016年歐巴馬提名自由派法官賈蘭德(Garland)擔任美國最高法院大法官一事也獲一定的重視,從關鍵字“supreme”, “court”, “judge”, “garland”中可以略見一斑。

  4. 卸任後(Post-Presidency)

    關鍵字“young”, “people”, “country”顯示即便卸任後,歐巴馬依舊相當關心美國國民,尤其是年輕人的發展。而今年九月美國同婚推手溫莎伊迪絲(Edith “Edie” Windsor)辭世,歐巴馬發文緬懷盛讚其貢獻獲廣大回響,也可以從關鍵字“edie”看出。