Section outline

  • Course Overview

    This course introduces the fundamentals of cleaning messy data. It provides a clear understanding of what messy datasets are and why they need to be cleaned, and gives many practical examples of cleaning datasets in different programs.

    Learning Outcomes

    This course will help you to:

    1. Recognize when data are messy and require cleaning
    2. Apply cleaning methods to messy datasets
    3. Recognize how cleaning messy data contributes to good data management
    4. Perform quality control of data
    • Course Instructor: Dr. Alessandra Vigilante

        Dr. Alessandra Vigilante is a Senior Lecturer in Bioinformatics at the Center for Stem Cells and Regenerative Medicine, with a focus on genotype-phenotype interactions and data integration. Alessandra obtained her PhD in Bioinformatics in Naples (2008-2011) before moving to the UK to join the Nicholas Luscombe group, first at the EMBL-European Bioinformatics Institute as a visiting student (2011-2012) and then at UCL as a postdoctoral fellow (2012-2017). Alessandra is actively involved in a wide network of collaborations that develop multidisciplinary approaches to research, working with faculty members within King’s and other research institutes. Her areas of interest include implementing novel computational methods for bespoke analyses to gain biological insights.

      • Course Resources

        You will need to access certain files and resources throughout the course to get the most out of the activities. You can find them all here.   

      • Video Transcripts

        You can access all video transcripts here.

      • Pre-Course Self Assessment

        Before you dive into this course, spend a few moments reflecting on your familiarity with the topic and your current level of skills confidence.  

        You will then revisit the same questions in the Post-Course Self Assessment and reflect on how the course has helped you grow your skills and confidence.

        • Module One: Help! My Data Are Messy

          This module will help you to: 

          1. Recognize what is meant by “messy data” 
          2. Identify when quantitative and qualitative data are messy    
          3. Predict common errors made while dealing with data   
        • Module Two: Why Clean Messy Data?

          This module will help you to: 

          1. Recognize how dealing with messy data makes analysis more complex    
          2. Discover the importance of the FAIR principles      
          3. Identify how messy data lead to inaccurate conclusions
          4. Appreciate cleaning data as a skill for employability 
        • Module Three: How Can I Clean My Messy Data?

          This module will help you to: 

          1. Develop key skills for manually cleaning data in a spreadsheet      
          2. Recognize and avoid formatting problems 
          3. Master basic Excel terminology and concepts   
          4. Set up quality control systems to keep data clean   
          5. Familiarize yourself with different spreadsheet programs 


          Note: Topic 4: Cleaning Messy Data in LibreOffice Calc and Topic 5: Cleaning Messy Data in Apple Numbers are optional, and are not a requirement for completing this course. Feel free to skip these topics if they are not relevant to you. 

        • Glossary of Key Terms

          In addition to the glossary you’ll find woven throughout the course, you can find the full glossary collated in one place here.  

        • Post-Course Self Assessment

          Now that you’ve completed the course, spend a few moments reflecting on your current familiarity with the topics and your level of skills confidence.

          Has the course helped you develop new skills and grow your confidence? 

          You'll need to complete the Post-Course Self Assessment in order to download your certificate. If you didn't do the Pre-Course Self Assessment before starting the course, please go to the top of the page and reflect on your familiarity with the topic and your level of skills confidence before you started the course.

          • Completion: Certificate

            Completing all modules, plus the Pre-Course and Post-Course Self Assessments, will unlock the course certificate, which you can then download here. Your certificate will only become available once you have completed all of these sections.

            If you have difficulty accessing your certificate, please contact the Sage support team at onlinesupport@sagepub.co.uk. You can also check this FAQ page, which may be helpful.

            • Give Feedback About This Course

              Did you enjoy the course? Please take two minutes to share your feedback. We use learner feedback in future course updates and developments to provide an excellent learning experience. 

            • Accessibility

              We have high standards of accessibility on Sage Campus and as of May/June 2024 all activities within this course are keyboard and screen reader compatible. For more details on accessibility standards, please see the Sage Campus Accessibility Guide.

              For those using assistive technology, please note that within this course:

              • Tab components: JAWS and NVDA behave slightly differently. For NVDA to keep reading, it is best to exit focus mode and go back to browse mode. 
              • Matching: JAWS does not read out the question label when a dropdown receives focus.