Course: Fundamentals of Quantitative Text Analysis

Section outline

Course Overview
The course surveys statistical methods and procedures for systematically analyzing text for social scientific purposes, starting with classical content analysis and dictionary-based methods and moving on to an introduction to scaling methods. The course lays a theoretical foundation for text analysis but also takes a practical, applied approach, so that students learn how to apply these methods in actual research. What all of these methods have in common is that they can be reduced to a three-step process: first, identifying texts and units for analysis; second, organizing the texts into a structured corpus in preparation for analysis; and third, extracting from the texts quantitatively measured features, such as coded content categories, word counts, word types, dictionary counts, or their authors. The course surveys these methods in a logical progression.
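
As a rough illustration of the three-step process described above, the following minimal sketch uses only the Python standard library. The two toy documents, the `tokenize` helper, and the variable names are assumptions made for this example; they are not part of the course materials, and the tools used in the course may organize these steps differently.

```python
import re
from collections import Counter

# Step 1: identify texts and units of analysis (here, one document per unit).
# These toy documents are invented for illustration only.
texts = {
    "doc_a": "Taxes should fall. Lower taxes help growth.",
    "doc_b": "Public spending should rise to protect services.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Step 2: organize the texts into a structured corpus (ids mapped to tokens).
corpus = {doc_id: tokenize(text) for doc_id, text in texts.items()}

# Step 3: extract quantitatively measured features (here, word counts)
# and arrange them as a document-feature matrix with one row per document.
vocabulary = sorted({token for tokens in corpus.values() for token in tokens})
counts = {doc_id: Counter(tokens) for doc_id, tokens in corpus.items()}
matrix = {
    doc_id: [counts[doc_id][word] for word in vocabulary]
    for doc_id in corpus
}

for doc_id, row in matrix.items():
    print(doc_id, row)
```

Each row of `matrix` lines up with `vocabulary`, so every column holds the count of a single word across documents. This is the kind of matrix that later statistical steps would take as input.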

This course contains many guided activities, and you are expected to immerse yourself in the tasks and to explore additional documentation and features of the tools presented.

Learning Outcomes

This course will help you to:
1. Explore the theoretical basis for Quantitative Text Analysis
2. Survey methods for systematically extracting quantitative information from text for social scientific purposes
3. Identify texts and units of texts for analysis
4. Convert texts into matrices for quantitative analysis
5. Analyze these matrices in order to generate inferences using statistical methods
Course Contributors
- Blake Miller
  Blake Miller is an Assistant Professor of Computational Social Science in the Methodology Department at the London School of Economics. They received their PhD in Political Science and Scientific Computing from the University of Michigan in 2018 where they were also a graduate research affiliate in the Lieberthal-Rogel Center for Chinese Studies. Before coming to LSE, they were a Post-Doctoral Fellow at the Dartmouth College Program in Quantitative Social Science. For more information, please visit www.blakeapm.com.
- Professor Jonathan Slapin
  Jonathan Slapin is Professor and Chair of Political Institutions and European Government in the Institute for Political Science at the University of Zürich. His research interests include comparative political institutions, parties, legislatures, quantitative content analysis, European politics, and European integration. His Cambridge University Press book The Politics of Parliamentary Debate (co-authored with Sven-Oliver Proksch) was awarded both the 2016 Richard Fenno Prize and the 2016 Leon Epstein Outstanding Book Award.
Course Resources

You will need to access certain files and resources throughout the course to get the most out of the activities. You can find them all here.
Video Transcripts

You can access all video transcripts here.
Pre-Course Self Assessment

Before you dive into this course, spend a few moments reflecting on your familiarity with the topic and your current level of skills confidence.

You will then re-visit the same questions in our Post-Course Self Assessment and reflect on how the course has helped you develop in confidence and grow your skills.
Module One: Introduction to Text Analysis and Conceptual Foundations
This module covers:
1. Introduction explaining course purpose: goals and objectives
2. Conceptual foundations of text analysis
3. Development of quantitative text analysis methods
4. Logistics and software - required setup and work files
5. A basic example of performing a text analysis
Module Two: The Basics of Working with Textual Data
This module covers:
1. Where to obtain textual data
2. Formatting and working with text files
3. Practical considerations of indexing and metadata
4. Units of analysis: strategies for selecting units of analysis
5. Overview and examination of complexity and readability measures
Module Three: Examining Individual Word Occurrences
This module covers:
1. Keywords in context: coverage and examples of KWIC
2. Consideration of concordance and dictionaries
3. Detecting and identifying collocations
4. Stemming: An in-depth discussion of text types, tokens, and equivalencies
5. Stop words and feature weighting: An in-depth discussion of text types, tokens, and equivalencies
Module Four: Comparing Across Texts
This module covers:
1. Euclidean distance and its use in comparing texts
2. Cosine similarity and its use in comparing texts; general principles and rationale for dictionaries
3. External dictionaries: How to add a third party dictionary
4. How to create your own dictionary
5. Overview of wordscores
6. Implementing in R - a basic model
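
To give a feel for the first two topics in this module, here is a minimal sketch of Euclidean distance and cosine similarity applied to word-count vectors. It is written in plain Python for illustration only (the module itself mentions R), and the function names and toy count vectors are assumptions made for this example.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy word-count vectors over the same three-word vocabulary (invented).
short_doc = [1, 2, 0]
long_doc = [2, 4, 0]   # same word proportions as short_doc, twice the length
other_doc = [0, 1, 3]  # different word proportions

print(euclidean_distance(short_doc, long_doc))
print(cosine_similarity(short_doc, long_doc))
print(cosine_similarity(short_doc, other_doc))
```

In this toy case `short_doc` and `long_doc` use words in identical proportions, so their cosine similarity is approximately 1 even though their Euclidean distance is greater than zero. Euclidean distance is sensitive to document length, whereas cosine similarity depends only on the direction of the count vectors.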
Post-Course Self Assessment

Now that you’ve completed the course, spend a few moments reflecting on your familiarity with the topics and your current level of skills confidence.

Has the course helped you develop new skills and grow your confidence?

You'll need to complete the Post-Course Self Assessment in order to download your certificate. If you didn't do the Pre-Course Self Assessment before starting the course, please go to the top of the page and reflect on your familiarity with the topic and your level of skills confidence before you started the course.
Completion: Certificate

Completing all modules (plus the pre and post-course assessments) will unlock the course certificate, which you can then download here. Your course certificate will only be made available once you have completed all these sections.

If you have difficulty accessing your certificate, please contact the Sage support team at: onlinesupport@sagepub.co.uk. You can also check out this FAQ page which may be helpful.
Give Feedback About This Course

Did you enjoy the course? Please take two minutes to share your feedback. We use learner feedback in future course updates and developments to provide an excellent learning experience.
Accessibility
We have high standards of accessibility on Sage Campus and as of May/June 2024 all activities within this course are keyboard and screen reader compatible. For more details on accessibility standards, please see the Sage Campus Accessibility Guide.

For those using assistive technology, please note that within this course:
- Tab components: JAWS and NVDA behave slightly differently. For NVDA to keep reading, it is best to exit focus mode and go back to browse mode.
- Matching: JAWS does not read out the question label on dropdown focus.
