} -->

AWS ML Exam preparation FREE Guide: Pretraining Bias on Class, Label Imbalance, SMOTE, DPL & Clarify Explained (Part 7)

Welcome back to our AWS Machine Learning Associate Series, which we started so our non-technical blog readers can learn AWS Machine Learning for free and clear the AWS ML Associate certification. In the last post, we saw how raw data must be cleaned and validated before training.

Now, let’s imagine Jake and Ethan have done that work. The ledger is neat, the blanks are filled, and the duplicates are gone. But Ethan knows there’s still a hidden danger: bias in the data before training. In this post, let’s look at pretraining bias concepts like class imbalance, label imbalance, SMOTE, and DPL in AWS Machine Learning, with the same Jake and Ethan story so non-technical readers can follow along. Let’s begin!

Chapter 1: Understanding Class Imbalance: The Core ML Bias Problem

Jake slid his old leather ledger across the counter with a grin. “You said you’d do some training to predict my iPhone sales, right? Well, here’s the book. Go ahead, train your magic machine.”

Ethan opened the ledger, flipping through the pages. “Uncle Jake, this isn’t magic. It’s machine learning. But before I can train anything, I need to check if your data is fair.”

Jake raised an eyebrow. “Fair? It’s just sales. What’s unfair about that?”

What is Class Imbalance in ML?

Ethan tapped a page. “Look here. Out of 1,000 entries, 900 are men buying phones and only 100 are women. That’s called class imbalance.”
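
For readers who want to see the arithmetic, here is a minimal Python sketch of Ethan's check. It assumes the class imbalance (CI) measure used for pretraining bias in SageMaker Clarify, CI = (n_a - n_d) / (n_a + n_d), where n_a is the count of the larger group and n_d the count of the smaller one. The function name `class_imbalance` and the variable names are ours, made up for illustration, and the counts are the ones from Jake's ledger.

```python
def class_imbalance(n_advantaged: int, n_disadvantaged: int) -> float:
    """Return the class imbalance score, a value between -1 and 1.

    0 means both groups are equally represented; values near 1 mean
    the first group dominates the dataset.
    """
    total = n_advantaged + n_disadvantaged
    if total == 0:
        raise ValueError("The dataset has no entries to compare.")
    return (n_advantaged - n_disadvantaged) / total


# Counts from Jake's ledger: 900 entries for men, 100 for women.
men_entries = 900
women_entries = 100

ci = class_imbalance(men_entries, women_entries)
print(f"Class imbalance: {ci:.2f}")  # prints "Class imbalance: 0.80"
```

A score of 0.80 is far from 0, which is the number behind Ethan's worry: a model trained on this ledger would see nine male buyers for every female buyer.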

AWS Certified ML Exam preparation series: Data Cleaning, Imputation, Outlier Detection, and Feature Engineering with AWS Services (Part 6)

Welcome back to our AWS Machine Learning Associate Series, a series we started so our readers can learn AWS Machine Learning for free. In the last post, we looked at whether a business problem really needs an ML solution, along with the fundamentals of data. Now, let’s imagine the answer is yes. We’ve collected data, but raw data is messy. Before we can train any model, we first need to clean the data! So, in this post, let’s talk about data cleaning in machine learning, outlier detection, and the AWS services that make this easier.

Alright, let’s begin.

The Problem: Dirty Data, Data Quality & Data Preprocessing

Jake sat at his counter one evening, flipping through his old leather ledger. To him, it was just a habit — jotting down sales, customer notes, and little reminders. But as Ethan, his nephew and budding data analyst, leaned over, he noticed something troubling.

Some entries were neat and clear:

  • “3 iPhone 15 Pro sold.”
  • “2 iPhone SE sold.”