Train in Data, learn machine learning online

Is Resampling Imbalanced Datasets a Thing of the Past?


Welcome to Data Bites!



Every Monday, I’ll drop a no-fluff, straight-to-the-point tip on a data science skill, tool, or
method to help you stay sharp in the field. I hope you find it useful!

🎉 Big December News: A New Way to Pay!

We’ve heard you...👂🏻 international fees and surprise charges can be a pain.



So this month, we’re testing something new… and you get first access. 🙌


💳 You can now pay in your own currency. NO hidden fees, NO extra charges.



For now, it’s available on just two courses (like a secret soft launch 😉):

Machine Learning Interpretability
Clustering and Dimensionality Reduction



And because you're helping us test this new feature, here’s a thank-you:

🎟 30% OFF all month with code HELPUSTEST
⏰ Until December 31st



If anything feels odd or doesn’t work smoothly, just email us at [email protected]. We’ll fix it right away.



Thank you for being part of our global learning family.



Here’s to smoother, simpler learning this December. 🌍✨

Enroll today and enjoy 30% OFF

Is Resampling Imbalanced Datasets a Thing of the Past?

Mounting evidence suggests that there are better ways to tackle imbalanced datasets that do not involve resampling.



For one, we can use strong classifiers like XGBoost and CatBoost.



Then, we should optimize the decision threshold for classification, rather than defaulting to 0.5.
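Here's a minimal sketch of threshold tuning with scikit-learn. The dataset and model are illustrative; in a real project you'd sweep thresholds on a validation set, not on the test set, to avoid leakage:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Toy imbalanced dataset: roughly 5% positives
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]

# Sweep candidate thresholds and keep the one that maximises F1
thresholds = np.linspace(0.05, 0.95, 19)
f1s = [f1_score(y_te, probs >= t) for t in thresholds]
best_t = thresholds[int(np.argmax(f1s))]

print(f"F1 at default 0.5:  {f1_score(y_te, probs >= 0.5):.3f}")
print(f"F1 at best {best_t:.2f}: {max(f1s):.3f}")
```

You can optimize any metric this way (F1, recall at fixed precision, expected cost); on imbalanced data, the best threshold for the minority class is usually well below 0.5.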



There is also cost-sensitive learning, which makes the model penalize minority-class errors more heavily during training, at no extra computational cost. It comes baked into most model implementations.
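For example, scikit-learn estimators expose this through the `class_weight` parameter (XGBoost has the analogous `scale_pos_weight`). A minimal sketch on the same kind of toy data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# class_weight="balanced" reweights the loss by inverse class frequency,
# so each minority-class mistake costs more during training
weighted = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)

print("plain recall:   ", recall_score(y_te, plain.predict(X_te)))
print("weighted recall:", recall_score(y_te, weighted.predict(X_te)))
```

The weighted model typically trades some precision for much better minority-class recall, with no change to the data itself.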



🤔 So then, when should we use resampling?



👉 Resampling can still be used when a strong classifier is not an option, for example in a legacy system. It has been shown to improve the performance of weaker learners like random forests, AdaBoost, SVMs, and MLPs.



👍 Resampling can also be useful if the model outputs only a class label, not a probability.
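If you do reach for resampling, the simplest form is random oversampling of the minority class. Libraries like imbalanced-learn offer fancier variants (SMOTE and friends), but here is a minimal sketch using only scikit-learn:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.utils import resample

X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)

# Split by class and oversample the minority with replacement
# until both classes have the same number of rows
X_min, X_maj = X[y == 1], X[y == 0]
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)

X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate([np.zeros(len(X_maj)), np.ones(len(X_min_up))])

print("class counts after resampling:", np.bincount(y_bal.astype(int)))
```

Remember to resample only the training split, never the validation or test data, or your metrics will be inflated.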



💡 So, as always, there is no one-size-fits-all solution. Depending on the project, the model, and the data, resampling may be a useful tool, or it may be better to stay away from it.


I hope this information was useful!



Wishing you a successful week ahead - see you next Monday! 👋🏻


Sole

Merry Christmas from Train in Data! 🎄✨


Wishing you a joyful and peaceful holiday season filled with warmth, gratitude, and meaningful moments.



Thank you for being part of our learning community this year. Your curiosity, dedication, and passion inspire everything we do.



May this festive season bring light to your days and hope to your year ahead. 



Warm wishes,

Train in Data Team 💙

Ready to enhance your skills?

Our specializations, courses and books are here to assist you:

More courses

Did someone share this email with you? Think it's pretty cool? Then just hit the button and subscribe to Data Bites. Don’t miss out on any of our tips and propel your data science career to new heights.

Subscribe

Hi…I’m Sole



I'm the main instructor at Train in Data. My work as a data scientist includes creating and implementing machine learning models for evaluating insurance claims, managing credit risk, and detecting fraud. In 2018, I was honoured with a Data Science Leaders' award, and in 2019 and again in 2024, I was recognized as one of LinkedIn's voices in data science and analytics.

View

You are receiving this email because you subscribed to our newsletter, signed up on our website, purchased or downloaded any products from us.

Follow us on social media

Copyright (C) 2025 Train in Data. All rights reserved.

If you would like to unsubscribe, please click here.