Welcome back to our AWS Machine Learning Associate Series, which we started so that our non-technical blog readers can learn AWS Machine Learning for free and clear the AWS ML Associate certification. In the last post, we saw how raw data must be cleaned and validated before training.
Now, let’s imagine Jake and Ethan have done that work. The ledger is neat, the blanks are filled, and the duplicates are gone. But Ethan knows there’s still a hidden danger: bias in the data before training. In this post, let’s look at pretraining bias concepts like class imbalance, label imbalance, SMOTE, and DL in AWS Machine Learning, using the same Jake and Ethan story so that non-technical readers can follow along. Let's begin!
Chapter 1: Understanding Class Imbalance: The Core ML Bias Problem
Jake slid his old leather ledger across the counter with a grin. “You said you’d do some training to predict my iPhone sales, right? Well, here’s the book. Go ahead, train your magic machine.”
Ethan opened the ledger, flipping through the pages. “Uncle Jake, this isn’t magic. It’s machine learning. But before I can train anything, I need to check if your data is fair.”
Jake raised an eyebrow. “Fair? It’s just sales. What’s unfair about that?”
What is Class Imbalance in ML?
Ethan tapped a page. “Look here. Out of 1,000 entries, 900 are men buying phones and only 100 are women. That’s called class imbalance.”
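To put a number on what Ethan is pointing at, here is a minimal Python sketch. It assumes the class imbalance (CI) measure that Amazon SageMaker Clarify reports, where the larger group's count minus the smaller group's count is divided by the total. The function name `class_imbalance` is made up for this illustration; the counts come from Jake's ledger.

```python
def class_imbalance(n_majority: int, n_minority: int) -> float:
    """Return (n_majority - n_minority) / (n_majority + n_minority).

    0 means the two groups are the same size; values close to 1 or -1
    mean one group heavily outnumbers the other.
    """
    total = n_majority + n_minority
    if total == 0:
        raise ValueError("at least one entry is required")
    return (n_majority - n_minority) / total


# Counts from Jake's ledger: 900 entries for men, 100 for women.
ci = class_imbalance(900, 100)
print(f"Class imbalance (CI): {ci:.2f}")  # prints "Class imbalance (CI): 0.80"
```

A score of 0.80 is a long way from 0, which is Ethan's worry in number form: a model trained on this ledger would learn mostly from men's purchases.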