Understanding Missing Data & How to Deal with It

Description

Missing data has become an increasingly troublesome problem. As the volume of data has exploded, so has the amount of incomplete data. With the proliferation of data science and machine learning techniques, complete datasets are needed, because machine learning algorithms and many statistical estimators do not tolerate missing fields. This workshop explores the mechanisms of missingness and techniques to treat or impute missing values so that downstream tools can use the data.

This hands-on tutorial demonstrates the three categories of missingness mechanisms. Participants will also be shown how to determine the degree of damage caused by missing data and how to evaluate whether the data set is still viable for their purpose. Participants will apply various techniques for treating missing data, including a variety of imputation techniques. By the end of the workshop, participants should have acquired the skills to assess the degree of missing data in a data set and apply an appropriate imputation technique.
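
As a rough illustration of the two skills named above (assessing how much data is missing, then imputing it), here is a minimal pandas sketch. It is not taken from the tutorial materials: the toy data, the column names, and the choice of median/mode imputation are all assumptions made for this example, and the workshop itself may use different techniques.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the columns and values are illustrative only.
df = pd.DataFrame({
    "age": [34.0, np.nan, 51.0, 29.0, np.nan, 42.0],
    "income": [52000.0, 61000.0, np.nan, 45000.0, 58000.0, 70000.0],
    "city": ["A", "B", "B", None, "A", "B"],
})

# Step 1: assess the degree of missingness.
# Fraction of missing values in each column.
missing_fraction = df.isna().mean()
# Number of rows that have no missing field at all.
complete_rows = int(df.notna().all(axis=1).sum())

# Step 2: apply a simple imputation (an assumed baseline, not the
# tutorial's method): median for numeric columns, mode otherwise.
imputed = df.copy()
for col in imputed.columns:
    if pd.api.types.is_numeric_dtype(imputed[col]):
        imputed[col] = imputed[col].fillna(imputed[col].median())
    else:
        imputed[col] = imputed[col].fillna(imputed[col].mode().iloc[0])

print(missing_fraction)
print("complete rows:", complete_rows)
print(imputed)
```

Median and mode imputation are only a baseline. They ignore relationships between columns and shrink the variance of the imputed column, so whether they are appropriate depends on the missingness mechanism and on how the data will be used.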

This tutorial is intended for participants with an intermediate-level understanding of Python and a basic understanding of pandas. The majority of the tutorial uses Jupyter notebooks, so participants should be comfortable in that environment. While not necessary, a rudimentary familiarity with TensorFlow and Keras is helpful.

https://github.com/WestHealth/scipy2022-missingness-tutorial/blob/main/README.md