Topic outline

  • Course Overview

    This course introduces the fundamentals of cleaning messy data. It explains what messy datasets are and why they need to be cleaned, and it provides many practical examples of cleaning datasets in different programs.

    Learning Outcomes

    This course will help you to:

    1. Recognize when data are messy and require cleaning
    2. Apply cleaning methods to messy datasets
    3. Understand how cleaning messy data contributes to good data management
    4. Perform quality control of data
    • Course Instructor: Dr. Alessandra Vigilante

        Dr. Alessandra Vigilante is a Senior Lecturer in Bioinformatics at the Center for Stem Cells and Regenerative Medicine, with a focus on genotype-phenotype interactions and data integration. She obtained her PhD in Bioinformatics in Naples (2008-2011) before moving to the UK to join the Nicholas Luscombe group, first as a visiting student at the EMBL-European Bioinformatics Institute (2011-2012) and then as a postdoctoral fellow at UCL (2012-2017). Alessandra is actively involved in a broad network of collaborations that develop multidisciplinary approaches to research, working with faculty members within King’s and at other research institutes. Her areas of interest include the implementation of novel computational methods for bespoke analyses that yield biological insights.

      • Module One: Help! My Data Are Messy

        This module will help you to: 

        1. Recognize what is meant by “messy data” 
        2. Identify when quantitative and qualitative data are messy    
        3. Predict common errors made while dealing with data   
      • Module Two: Why Clean Messy Data?

        This module will help you to: 

        1. Recognize how dealing with messy data makes analysis more complex    
        2. Discover the importance of the FAIR principles      
        3. Recognize that messy data can lead to inaccurate conclusions
        4. Appreciate cleaning data as a skill for employability 
      • Module Three: How Can I Clean My Messy Data?

        This module will help you to: 

        1. Develop key skills for manually cleaning data in a spreadsheet      
        2. Recognize and avoid formatting problems 
        3. Master basic Excel terminology and concepts   
        4. Set up quality control systems to keep data clean   
        5. Familiarize yourself with different spreadsheet programs 


        Note: Topic 4 (Cleaning Messy Data in LibreOffice Calc) and Topic 5 (Cleaning Messy Data in Apple Numbers) are optional and are not required to complete this course. Feel free to skip them if they are not relevant to you.