Welcome back to our AWS Machine Learning Associate Series, a series we started so our readers can learn AWS machine learning for free.
In the last post, we looked at how to decide whether a business problem really needs an ML solution, and we covered the fundamentals of data. Now, let’s imagine the answer is yes. We’ve collected data, but raw data is messy. Before we can train any model, we first need to clean it! So, in this post, let’s talk about data cleaning in machine learning, outlier detection, and the AWS services that make this easier.
Alright, let’s begin.
The Problem: Dirty Data, Data Quality & Data Preprocessing
Jake sat at his counter one evening, flipping through his old leather ledger. To him, it was just a habit: jotting down sales, customer notes, and little reminders. But when Ethan, his nephew and a budding data analyst, leaned over, he noticed something troubling.
Some entries were neat and clear:
- “3 iPhone 15 Pro sold.”
- “2 iPhone SE sold.”