Data Mining in Social Science

Updated 8 months ago


"Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius - and a lot of courage - to move in the opposite direction."

E. F. Schumacher


The flood of big data puts urgent pressure on scholars to level up their skills. This is especially challenging for social scientists who have no programming experience.

This book provides a broad but shallow introduction to the programming tools needed for a typical "data science" project, which usually covers three stages: data collection, analysis, and visualization.

This book requires no background in programming and is therefore particularly suitable for "computational social scientists". I would also be happy if scholars and professionals from other areas find it useful. Note that although the Python and Processing code provided in this book is generally easy to read and experiment with, customizing it for real projects may take a lot of extra practice.


1. Introduction

1.1 Beautiful Data and Human Behavior

1.2 Python for Basic Data Analysis

2. Data Collection

2.1 Connecting to Twitter API

2.2 Scraping Articles from The Washington Post

2.3 Processing the Dataset of Stack Exchange

2.4 Retrieving Raw Data from Figures

3. Data Analysis

3.1 Clustering Countries by Constitutions

3.2 Detecting Influential Papers in Citation Networks

3.3 Measuring the Difficulty of Questions on Q&A sites

3.4 Discovering the Hidden Structure of Global Money Flows

3.5 Modeling the Growth of Cities Using Satellite Images

4. Data Visualization

4.1 Network

4.2 Text

4.3 Map

4.4 Data Visualization using PlotDevice

