R Data Analysis Made Accessible (深入淺出R語言數據分析)


Author: Mi Lin (米霖)
Publisher: Tsinghua University Press
Published: 2020-09-01
ISBN-13: 9787302543886
ISBN-10: 7302543887





Description


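This book first introduces the methodology of data analysis, then the models and methods used in it, and finally works through case studies that illustrate the reasoning, methods, and model implementation involved. Its focus is the application of R to data analysis, so that readers can quickly start analyzing data and building models in R.
The book has 17 chapters covering: acquiring data with R, data processing and data exploration, survival analysis, principal component analysis, multidimensional scaling, linear regression, logistic regression, clustering, association rules, random forests, support vector machines, neural networks, text mining, and social network analysis, plus two extended topics in R data analysis: machine learning with H2O and web scraping with R.
The book is easy to follow, rich in examples, and practical. It is particularly suited to beginner and intermediate R users, as well as data analysts, data miners, and other data science practitioners. It is also suitable for undergraduate and graduate students in statistics, computer science, machine learning, mathematics, and related fields.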


Table of Contents



Chapter 1 The Workflow of a Data Analysis Project
1.1 Roles in a Data Analysis Project
1.2 Stages of a Data Analysis Project
1.2.1 Setting Goals
1.2.2 Collecting Data
1.2.3 Data Processing and Analysis
1.2.4 Building Models
1.2.5 Evaluating Models
1.2.6 Presenting Results
1.2.7 Deploying and Maintaining Models
1.3 Summary

Chapter 2 Reading Data
2.1 RData Data
2.2 Reading Data Efficiently with readr
2.3 Reading Excel Data
2.4 Reading SPSS, SAS, and STATA Data
2.5 Working with Databases in R
2.6 Summary

Chapter 3 Data Exploration
3.1 Identifying and Handling Missing Values
3.1.1 Identification and Descriptive Statistics of Missing Values
3.1.2 Visualizing Missing Values
3.1.3 Methods for Handling Missing Values
3.2 Outliers
3.3 The dlookr Data Processing Package
3.3.1 General Diagnosis of All Variables
3.3.2 Diagnosis of Numeric Variables
3.3.3 Diagnosis of Categorical Variables
3.3.4 Diagnosis of Outliers
3.3.5 Creating a Diagnostic Report
3.3.6 Data Processing
3.3.7 Handling Missing Values
3.3.8 Handling Outliers
3.3.9 Data Transformation
3.3.10 Data Binning
3.3.11 Creating a Data Transformation Report
3.4 Data Correlation
3.5 Automatically Generating a Data Exploration Report
3.6 Summary

Chapter 4 Survival Analysis
4.1 Basics of Survival Analysis
4.2 Survival Analysis in R
4.3 Nonparametric Models
4.3.1 Fitting Data with the Kaplan-Meier Method
4.3.2 Visualizing the Kaplan-Meier Method
4.4 Semiparametric Survival Analysis Methods
4.4.1 Building a Cox Model
4.4.2 Checking Assumptions
4.4.3 Visualizing the Coxph Model
4.4.4 Prediction
4.4.5 Stratification
4.5 Parametric Models
4.6 Random Survival Forest Models
4.7 Summary

Chapter 5 Principal Component Analysis
5.1 Overview
5.1.1 Dimensionality-Related Problems
5.1.2 Detecting Multicollinearity
5.1.3 Variance Inflation Factor
5.2 Principal Component Analysis in Detail
5.2.1 Definition of Principal Component Analysis
5.2.2 The Basic Principle of Principal Component Analysis
5.2.3 The Principal Component Analysis Algorithm
5.3 Principal Component Analysis in R
5.3.1 Implementing Principal Component Analysis
5.3.2 A Principal Component Analysis Case Study
5.4 Summary

Chapter 6 Multidimensional Scaling
6.1 How MDS Works
6.2 Implementing MDS in R
6.3 Advantages of MDS
6.4 Summary

Chapter 7 Linear Regression Models
7.1 Overview of Linear Regression Models
7.2 Implementing Regression Models in R
7.2.1 Graphical Analysis
7.2.2 Building a Linear Model
7.2.3 Graphical Diagnostics for Regression Models
7.2.4 Predicting with the Model
7.2.5 Sampling Methods
7.3 Summary

Chapter 8 Logistic Regression Models
8.1 Principles of Logistic Regression
8.2 Implementing Logistic Regression Models in R
8.2.1 Data Exploration
8.2.2 Building a Logistic Regression Model
8.2.3 Prediction with Logistic Regression
8.2.4 Evaluating Logistic Regression Models
8.3 Summary

Chapter 9 Clustering Models
9.1 Overview
9.1.1 Clustering Algorithms
9.1.2 Principles of K-Means Clustering
9.2 Implementing Clustering Models in R
9.2.1 K-Means Clustering
9.2.2 Hierarchical Clustering
9.2.3 Medoids Clustering (PAM)
9.3 Summary

Chapter 10 Association Rules
10.1 Overview of Association Rules
10.2 Basic Concepts of Association Rules
10.3 Implementing Association Rules in R
10.3.1 Training the Model
10.3.2 Evaluating the Model
10.3.3 Improving Association Rule Performance
10.3.4 Visualizing Association Rules
10.4 Summary

Chapter 11 Random Forests
11.1 Basic Concepts of Random Forests
11.2 Implementing Random Forests in R
11.3 Summary

Chapter 12 Support Vector Machines
12.1 Overview
12.2 Implementing Support Vector Machines in R
12.3 Summary

Chapter 13 Neural Networks
13.1 Overview
13.2 Implementing Neural Networks in R
13.2.1 Building a Neural Network Model
13.2.2 Evaluating Model Performance
13.3 Summary

Chapter 14 Text Mining
14.1 Overview
14.2 Background and Basic Principles of text2vec
14.3 Principles and Implementation of DTM and TFIDF
14.3.1 Principles of DTM and TFIDF
14.3.2 Implementing DTM
14.3.3 Implementing TFIDF
14.4 Sentiment Analysis
14.5 The LDA Topic Model and Its Implementation
14.6 Building an Automatic Question-Answering System
14.7 Summary

Chapter 15 Social Network Analysis
15.1 Overview of Social Networks
15.2 Introduction to igraph
15.2.1 Preparation
15.2.2 Computing Graph Metrics
15.3 Common Structures of Social Networks
15.4 Social Network Analysis Algorithms
15.4.1 Girvan-Newman
15.4.2 Community Detection Based on Label Propagation
15.4.3 Community Detection Based on Greedy Modularity Optimization
15.4.4 Spin-Glass Communities
15.5 Analysis of Weibo Social Groups
15.5.1 Spin-Glass Communities
15.5.2 Community Detection
15.6 Summary

Chapter 16 Machine Learning with H2O
16.1 The H2O Machine Learning Platform
16.2 Using H2O in R
16.2.1 Installing H2O
16.2.2 Case Study
16.2.3 Common H2O APIs
16.2.4 General Model Parameters
16.2.5 Parameter Tuning
16.3 H2O Flow
16.3.1 Installing H2O Flow
16.3.2 Basic Usage of H2O Flow
16.4 Summary

Chapter 17 Web Scraping with R
17.1 Scraping Web Page Data Quickly
17.2 Introduction to rvest
17.2.1 rvest API
17.2.2 The rvest API in Detail
17.3 Scraping Data from BOSS Zhipin (BOSS 直聘)
17.4 Simulating Login




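Related Books

Introduction to AI Robotics, 2/e (Hardcover)

Author: Robin R. Murphy

2020-09-01

Full-Stack Data Analysis with Spark (Spark 全棧數據分析)

Author: Russell Jurney

2020-09-01

NLTK Essentials: Building Machine Learning Applications with NLTK and Python Libraries (NLTK基礎教程)

Author: Hardeniya (哈登尼亞)

2020-09-01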