Drop Missing Data Solution
In this tutorial, we will solve the Drop Missing Data problem. We will provide the implementation of the solution in Python.
Problem Description​
DataFrame students
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| student_id | int |
| name | object |
| age | int |
+-------------+--------+
There are some rows having missing values in the name column.
Write a solution to remove the rows with missing values.
Examples​
Example 1:
Input:
+------------+---------+-----+
| student_id | name | age |
+------------+---------+-----+
| 32 | Piper | 5 |
| 217 | None | 19 |
| 779 | Georgia | 20 |
| 849 | Willow | 14 |
+------------+---------+-----+
Output:
+------------+---------+-----+
| student_id | name | age |
+------------+---------+-----+
| 32 | Piper | 5 |
| 779 | Georgia | 20 |
| 849 | Willow | 14 |
+------------+---------+-----+
Explanation:
Student with id 217 havs empty value in the name column, so it will be removed.
Constraints​
- You have to solve using python pandas only.
Solution for Drop Missing Data​
import pandas as pd
def dropMissingData(students: pd.DataFrame) -> pd.DataFrame:
return students[students['name'].notnull()]
Complexity Analysis​
- Time Complexity:
- Space Complexity:
- Iterating over the DataFrame: The notnull() function on the 'name' column performs an O(n) operation, iterating through each row of the DataFrame, where n is the number of rows.
- Boolean indexing: Usually an O(n) process, this involves building a new DataFrame from the boolean mask (the output of notnull()).
- New DataFrame: To hold the filtered data, the function builds a new DataFrame. The number of rows with complete "name" values determines the size of this new DataFrame, and in the worst scenario, that number may reach n. As a result, O(n) is also the space complexity.
Authors:
Loading...