
Drop Missing Data Solution

In this tutorial, we solve the Drop Missing Data problem and implement the solution in Python with pandas.

Problem Description

DataFrame students
+-------------+--------+
| Column Name | Type   |
+-------------+--------+
| student_id  | int    |
| name        | object |
| age         | int    |
+-------------+--------+

Some rows have missing values in the name column.

Write a solution to remove the rows whose name value is missing.
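To make "missing" concrete, the sketch below checks a few hypothetical name values with `Series.isna()`. It shows that pandas treats both `None` and `NaN` as missing, while an empty string counts as a real value; `notnull()`, which the solution uses, is simply the complement of `isna()`.

```python
import pandas as pd

# Hypothetical 'name' values: a real name, None, NaN and an empty string
names = pd.Series(["Piper", None, float("nan"), ""], dtype=object)

# isna() flags both None and NaN as missing; an empty string is a real value
print(names.isna().tolist())     # [False, True, True, False]

# notnull() is the exact complement, so it marks the rows to keep
print(names.notnull().tolist())  # [True, False, False, True]
```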

Examples

Example 1:

Input:
+------------+---------+-----+
| student_id | name    | age |
+------------+---------+-----+
| 32         | Piper   | 5   |
| 217        | None    | 19  |
| 779        | Georgia | 20  |
| 849        | Willow  | 14  |
+------------+---------+-----+

Output:
+------------+---------+-----+
| student_id | name    | age |
+------------+---------+-----+
| 32         | Piper   | 5   |
| 779        | Georgia | 20  |
| 849        | Willow  | 14  |
+------------+---------+-----+

Explanation:
The student with id 217 has a missing value in the name column, so that row is removed.
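Example 1 can be reproduced with a minimal sketch like the following. It rebuilds the input table, assuming `None` marks the missing name, and applies the same `notnull()` filter as the solution.

```python
import pandas as pd

# Example 1 input; None stands in for the missing name of student 217
students = pd.DataFrame({
    "student_id": [32, 217, 779, 849],
    "name": ["Piper", None, "Georgia", "Willow"],
    "age": [5, 19, 20, 14],
})

# Keep only the rows whose name is present
result = students[students["name"].notnull()]
print(result["student_id"].tolist())  # [32, 779, 849]
```

The surviving rows keep their original index labels (0, 2, 3); if a contiguous index is wanted, `result.reset_index(drop=True)` renumbers them.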

Constraints

  • The solution must use Python and pandas only.

Solution for Drop Missing Data

import pandas as pd

def dropMissingData(students: pd.DataFrame) -> pd.DataFrame:
    # Keep only the rows whose 'name' value is not missing (None or NaN)
    return students[students['name'].notnull()]
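The same filter can also be written with pandas' built-in `DataFrame.dropna`, restricting the check to the name column via `subset`. This is an alternative formulation, not the solution above; the function name `drop_missing_data_dropna` is hypothetical.

```python
import pandas as pd

def drop_missing_data_dropna(students: pd.DataFrame) -> pd.DataFrame:
    # subset limits the missing-value check to the 'name' column,
    # so rows with missing values in other columns are kept
    return students.dropna(subset=["name"])
```

Both versions return the same rows with the same index labels; `dropna` states the intent directly, while the boolean mask generalizes more easily to other conditions.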

Complexity Analysis

  • Time Complexity: O(n)
  • Space Complexity: O(n)
  • Building the mask: calling notnull() on the 'name' column checks each of the n values once, where n is the number of rows, so it takes O(n) time and produces a boolean mask of length n.
  • Boolean indexing: selecting rows with that mask is also O(n), since pandas scans the mask and copies every row where it is True into a new DataFrame.
  • New DataFrame: the result holds only the rows with a non-missing name. In the worst case (no names are missing) that is all n rows, so the space complexity is O(n).
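The two O(n) steps described above can be observed separately on a larger synthetic frame. The sketch below assumes a hypothetical 1,000-row DataFrame in which every third name is missing, and compares the size of the mask with the size of the filtered result.

```python
import pandas as pd

n = 1_000
# Hypothetical data: names at positions 0, 3, 6, ... are missing
students = pd.DataFrame({
    "student_id": range(n),
    "name": [None if i % 3 == 0 else f"s{i}" for i in range(n)],
    "age": [10] * n,
})

mask = students["name"].notnull()  # one pass over n values, n booleans of extra space
filtered = students[mask]          # another pass, copying only the kept rows

print(len(mask), len(filtered))    # prints: 1000 666
```

The mask always has n entries, while the result's size depends on how many names are present; it only reaches n when nothing is missing.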
