I've seen many discussions and videos claiming that Optuna magically finds the best hyperparameter configuration for your models. But let's be clear: Optuna relies on well-studied methods like Bayesian optimization and random search.
By default, Optuna uses Bayesian optimization with Tree-structured Parzen Estimators (TPE), which is also employed by Hyperopt. From a purely technical standpoint, Optuna isn't inherently better than Hyperopt.
What's unique about Optuna, and what skyrocketed it to success, is the design of the API. You no longer need lines and lines of code to define the hyperparameter space up front and then wire it through a separate objective function into your model.
Optuna has a define-by-run API, which means you can define the hyperparameter space as you need it, directly within the objective function. This design fully embraces the Pythonic way of coding, and it's a big part of why Optuna has become so popular. 🙌
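Here's a minimal sketch of what that looks like (the toy objective just minimizes a parabola, so it runs as-is with nothing but `optuna` installed):

```python
import optuna

def objective(trial):
    # The search space is declared on the fly, right where the value is used;
    # no separate space dictionary is passed around.
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2  # toy objective: minimum at x = 2

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)
print(study.best_params)  # should land close to {"x": 2}
```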
What else sets Optuna apart:
- Support for various sampling methods (random search, TPE, grid search, and more)
- Early stopping techniques like Successive Halving and Hyperband (see the sketch after this list)
- Good documentation that lets you get started fast
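Both the sampler and the pruner are plugged in when you create the study. A minimal sketch, assuming you want random search plus Hyperband (note that the pruner only kicks in once your objective reports intermediate values via `trial.report()` and checks `trial.should_prune()`):

```python
import optuna

study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.RandomSampler(seed=42),  # swap in TPESampler, GridSampler, ...
    pruner=optuna.pruners.HyperbandPruner(),         # or SuccessiveHalvingPruner()
)
```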
And I'll squeeze this in: to tune the hyperparameters of scikit-learn algorithms, I don't think you need to leave the scikit-learn ecosystem. Randomized search has been shown to be one of the best optimization methods when allowed enough trials. And scikit-learn also supports successive halving to stop training suboptimal configurations early, which can save you tons of time.
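A minimal sketch using scikit-learn's halving search (it still requires an explicit experimental import; the model and parameter ranges here are just illustrative):

```python
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingRandomSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from scipy.stats import randint

X, y = make_classification(n_samples=1000, random_state=0)

# Successive halving: many candidates start on a small budget,
# and only the best survive to train on more resources.
search = HalvingRandomSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "max_depth": randint(2, 20),
        "min_samples_split": randint(2, 20),
    },
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```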
So, when should you use Optuna? When training models outside the scikit-learn ecosystem, like XGBoost or LightGBM, and for neural networks.
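For example, here's a rough sketch of tuning XGBoost with Optuna (the parameter ranges are illustrative starting points, not recommendations):

```python
import optuna
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)

def objective(trial):
    # Illustrative ranges only; adjust them to your problem.
    params = {
        "max_depth": trial.suggest_int("max_depth", 2, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
    }
    model = xgb.XGBClassifier(**params)
    return cross_val_score(model, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```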
I hope you find it useful!