} -->

AWS Certified ML Exam Preparation Series: Data Cleaning, Imputation, Outlier Detection, and Feature Engineering with AWS Services (Part 6)

Welcome back to our AWS Machine Learning Associate Series, which we started so that our readers can learn AWS machine learning for free. In the last post, we looked at whether a business problem really needs an ML solution and covered the fundamentals of data. Now, let’s imagine the answer is yes. We’ve collected data, but raw data is messy. Before we can train any model, we first need to clean the data! So, in this post, let’s talk about data cleaning in machine learning, outlier detection, and the AWS services that make this easier.

Alright, let’s begin.

The Problem: Dirty Data, Data Quality & Data Preprocessing

Jake sat at his counter one evening, flipping through his old leather ledger. To him, it was just a habit: jotting down sales, customer notes, and little reminders. But as Ethan, his nephew and a budding data analyst, leaned over, he noticed something troubling.

Some entries were neat and clear:

  • “3 iPhone 15 Pro sold.”
  • “2 iPhone SE sold.”