Drop Missing Data Solution

In this tutorial, we solve the Drop Missing Data problem and provide an implementation of the solution in Python using pandas.

Problem Description

DataFrame students
| Column Name | Type |
| student_id | int |
| name | object |
| age | int |

Some rows have missing values in the name column.

Write a solution to remove the rows with missing values.


Example 1:

Input:

| student_id | name | age |
| 32 | Piper | 5 |
| 217 | None | 19 |
| 779 | Georgia | 20 |
| 849 | Willow | 14 |

Output:

| student_id | name | age |
| 32 | Piper | 5 |
| 779 | Georgia | 20 |
| 849 | Willow | 14 |

The student with id 217 has an empty value in the name column, so that row is removed.


  • You must solve this using Python pandas only.

Solution for Drop Missing Data

import pandas as pd

def dropMissingData(students: pd.DataFrame) -> pd.DataFrame:
    # Keep only the rows whose 'name' value is not missing.
    return students[students['name'].notnull()]
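
To check the solution against Example 1, the sketch below rebuilds the input table by hand and runs the function on it. The way the DataFrame is constructed here (a dict of lists, with None marking the missing name) is an assumption made for illustration; the judge supplies its own input.

```python
import pandas as pd


def dropMissingData(students: pd.DataFrame) -> pd.DataFrame:
    # Keep only the rows whose 'name' value is not missing.
    return students[students["name"].notnull()]


# Example 1 input, rebuilt by hand; None marks the missing name.
students = pd.DataFrame(
    {
        "student_id": [32, 217, 779, 849],
        "name": ["Piper", None, "Georgia", "Willow"],
        "age": [5, 19, 20, 14],
    }
)

result = dropMissingData(students)
print(result)
```

The row for student 217 is gone, and the remaining rows keep their original index labels (0, 2, 3). If a contiguous index is needed, reset_index(drop=True) can be applied to the result.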

Complexity Analysis

  • Time Complexity: O(n)
  • Space Complexity: O(n)
  • Iterating over the DataFrame: notnull() on the 'name' column checks each of the n rows once, so building the boolean mask takes O(n) time, where n is the number of rows.
  • Boolean indexing: Building a new DataFrame from the boolean mask (the output of notnull()) copies the selected rows, which is typically also O(n).
  • New DataFrame: The function returns a new DataFrame that holds the filtered data. Its size equals the number of rows with a non-missing name, which in the worst case is n, so the space complexity is also O(n).
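
The steps in this analysis can be made visible by splitting the one-line solution into its two parts. This is only a sketch on the Example 1 data; the variable names mask, filtered, and kept are introduced here for illustration and do not appear in the solution.

```python
import pandas as pd

# Example 1 input, rebuilt by hand; None marks the missing name.
students = pd.DataFrame(
    {
        "student_id": [32, 217, 779, 849],
        "name": ["Piper", None, "Georgia", "Willow"],
        "age": [5, 19, 20, 14],
    }
)

# Step 1: one pass over the 'name' column yields a boolean mask of length n.
mask = students["name"].notnull()

# Step 2: boolean indexing builds a new DataFrame from the rows where the mask is True.
filtered = students[mask]

# The number of True entries in the mask is the number of rows kept.
kept = int(mask.sum())

print(mask.tolist())
print(len(students), len(filtered), kept)
```

The mask always has n entries, while the filtered DataFrame has as many rows as the mask has True values. The input DataFrame is left unchanged, which is why the extra space grows with the number of rows kept.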

