Hyperparameter Tuning: Grid Search, Random Search, and Bayesian Optimization

Hyperparameter tuning is the process of finding the best settings for a machine learning model. These settings control how the model learns, how complex it becomes, and how well it generalizes to unseen data.

Common tuning methods include Grid Search, Random Search, and Bayesian Optimization. Each method searches the hyperparameter space differently, with different trade-offs between simplicity, speed, and efficiency.

What are Hyperparameters?

Hyperparameters are model settings chosen before training begins. They are not learned directly from the data. Instead, they guide the learning process and influence the final model.

For example, in a Random Forest, the number of trees and maximum tree depth are hyperparameters. In logistic regression, regularization strength is a hyperparameter. In KNN, the value of K is a hyperparameter.

Core Idea: Parameters are learned by the model during training. Hyperparameters are chosen by the modeller before training and tuned using validation performance.

Parameters vs Hyperparameters

Concept	Meaning	Example	How It Is Determined
Model Parameter	Value learned from training data.	Regression coefficient, tree split, model weight.	Learned automatically during training.
Hyperparameter	Setting that controls the learning process.	Tree depth, learning rate, number of neighbors, regularization strength.	Chosen and tuned by the modeller.
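The distinction in the table can be made concrete with a minimal sketch. It assumes a one-feature ridge regression without an intercept; the function name fit_ridge_1d, the data points, and the lam values are illustrative and not part of the original material. The regularization strength lam is chosen before fitting, while the weight w is computed from the data.

```python
# One-feature ridge regression without an intercept (illustrative assumption):
# the learned weight w minimises sum((y - w*x)^2) + lam * w^2.

def fit_ridge_1d(xs, ys, lam):
    """Return the learned parameter w for a chosen hyperparameter lam."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Made-up data for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# lam is the hyperparameter: we pick it before fitting.
# w is the parameter: the fitting step derives it from the data.
weights = {lam: fit_ridge_1d(xs, ys, lam) for lam in (0.0, 1.0, 10.0)}
for lam, w in weights.items():
    print(f"lam={lam:>4}: learned w={w:.3f}")
```

Stronger regularization shrinks the learned weight, which is why the choice of lam changes the final model even though the training data stays the same.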

Why Hyperparameter Tuning Matters

The same algorithm can perform very differently with different hyperparameter settings. A decision tree with unlimited depth may overfit, while a tree with very low depth may underfit. A boosting model with a very high learning rate may become unstable, while a very low learning rate may require many trees.

Hyperparameter tuning helps find a balance between underfitting, overfitting, accuracy, training time, interpretability, and business usefulness.

Important: Base hyperparameter tuning on validation or cross-validation performance, not training performance. Training scores favor settings that memorize the training data, so tuning on them encourages overfitting.
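A small sketch of why training scores mislead, assuming scikit-learn is available and using a synthetic dataset with deliberately noisy labels (the dataset and depth values are illustrative assumptions). An unlimited-depth tree fits the training split perfectly, so training accuracy alone would always favor the most complex setting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% label noise (an assumption for illustration).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

scores = {}
for depth in (2, 4, 8, None):  # None means unlimited depth
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    scores[depth] = (tree.score(X_train, y_train), tree.score(X_val, y_val))
    print(f"max_depth={depth}: train={scores[depth][0]:.3f} "
          f"val={scores[depth][1]:.3f}")

# The setting that wins on training data versus the one that wins on validation.
best_by_train = max(scores, key=lambda d: scores[d][0])
best_by_val = max(scores, key=lambda d: scores[d][1])
print("picked by training score:", best_by_train)
print("picked by validation score:", best_by_val)
```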

Search Methods at a Glance

Visual Intuition

[Figure: three panels titled Grid Search, Random Search, and Bayesian Optimization.]

What is Grid Search?

Grid Search tests every combination of hyperparameter values from a predefined grid. If we define three values for maximum depth and four values for number of trees, grid search tests all twelve combinations.

It is simple, systematic, and easy to explain. However, it can become very slow when many hyperparameters and many possible values are included.

Hyperparameter	Values to Try	Grid Search Meaning
max_depth	3, 5, 7	Try each depth value.
n_estimators	100, 200, 300	Try each tree count.
learning_rate	0.01, 0.05, 0.1	Try each learning rate.

Grid Search Logic: If there are 3 depth values, 3 tree-count values, and 3 learning-rate values, the search tests 3 × 3 × 3 = 27 combinations.
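The counting argument above can be sketched with the standard library alone. The grid values are taken from the table; the variable names are illustrative.

```python
from itertools import product

# Candidate values from the table above.
grid = {
    "max_depth": [3, 5, 7],
    "n_estimators": [100, 200, 300],
    "learning_rate": [0.01, 0.05, 0.1],
}

# Cartesian product of all value lists: one dict per combination.
names = list(grid)
combinations = [dict(zip(names, values)) for values in product(*grid.values())]

print(len(combinations))  # 27
print(combinations[0])

# With k-fold cross-validation, every combination is trained k times.
n_folds = 5
print(len(combinations) * n_folds, "model fits with 5-fold CV")
```

Adding one more hyperparameter with three values would triple the count again, which is why grids grow quickly.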

Advantages and Limitations of Grid Search

Advantages

Simple and easy to understand.
Systematically checks all specified combinations.
Useful when the search space is small.
Good for final fine-tuning around a promising region.

Limitations

Can be very slow for large search spaces.
Wastes time testing unimportant combinations.
Only tests values explicitly listed in the grid.
Can become expensive with cross-validation.

What is Random Search?

Random Search randomly samples hyperparameter combinations from a defined search space. Instead of testing every possible combination, it tests a fixed number of random combinations.

Random Search is often more efficient than Grid Search when some hyperparameters matter much more than others. It can explore a wider range of values with fewer trials.

Grid Search	Random Search
Tests every combination in a fixed grid.	Tests randomly selected combinations.
Can be inefficient when many hyperparameters are included.	Can cover more diverse values with fewer trials.
Best for small, focused search spaces.	Best for broad exploration and large search spaces.
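As a rough sketch of the sampling idea, the snippet below draws a fixed number of random combinations using only the standard library. The ranges and the trial count are assumptions chosen for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the trials are reproducible

# Each hyperparameter gets a sampling rule instead of a fixed list of values.
# The ranges below are illustrative assumptions.
search_space = {
    "max_depth": lambda: rng.randint(2, 12),
    "n_estimators": lambda: rng.randrange(100, 501, 50),
    "learning_rate": lambda: rng.uniform(0.01, 0.1),
}

n_trials = 9  # the budget is set directly, independent of the space size
trials = [{name: draw() for name, draw in search_space.items()}
          for _ in range(n_trials)]

for trial in trials:
    print(trial)

# A 3 x 3 grid over two hyperparameters also costs 9 trials but only ever
# tries 3 distinct learning rates; random sampling tries a new one each time.
distinct_rates = {trial["learning_rate"] for trial in trials}
print(len(distinct_rates), "distinct learning rates in", n_trials, "trials")
```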

Advantages and Limitations of Random Search

Advantages

Often faster than Grid Search.
Explores wider hyperparameter ranges.
Works well when only some hyperparameters are highly important.
Lets you control the number of trials directly.

Limitations

Results can vary depending on random seed.
May miss the best region if too few trials are used.
Less systematic than Grid Search.
Still may be expensive for very slow models.

What is Bayesian Optimization?

Bayesian Optimization is an adaptive hyperparameter search method. Instead of choosing combinations independently of earlier results, it learns from previous trials and chooses the next combination based on what seems promising.

It builds a probabilistic model of the relationship between hyperparameters and validation performance. Then it uses that model to decide where to search next.

Simple Explanation: Bayesian Optimization remembers what worked and what did not, then uses that knowledge to make better choices for the next trial.
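One possible way to sketch this loop is shown below. It is a toy illustration, not a production implementation: it assumes NumPy, SciPy, and scikit-learn are available, uses a Gaussian process as the probabilistic model and expected improvement as the rule for picking the next trial, and replaces the expensive training run with a made-up function validation_score that peaks at a learning rate of 0.01.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def validation_score(log_lr):
    """Toy stand-in for an expensive train-and-validate run (assumption)."""
    return 0.9 - 0.05 * (log_lr + 2.0) ** 2

# Candidate values of log10(learning_rate) between 1e-4 and 1.
candidates = np.linspace(-4.0, 0.0, 81).reshape(-1, 1)

# Start with a few random trials.
rng = np.random.default_rng(0)
tried = [int(i) for i in rng.choice(len(candidates), size=3, replace=False)]
scores = [validation_score(candidates[i, 0]) for i in tried]

for _ in range(10):
    # Probabilistic model of score as a function of the hyperparameter.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  normalize_y=True, random_state=0)
    gp.fit(candidates[tried], np.array(scores))
    mean, std = gp.predict(candidates, return_std=True)

    # Expected improvement over the best score seen so far.
    improvement = mean - max(scores)
    z = improvement / np.maximum(std, 1e-12)
    ei = improvement * norm.cdf(z) + std * norm.pdf(z)
    ei[tried] = -np.inf  # never repeat a trial

    # Run the most promising untried candidate next.
    nxt = int(np.argmax(ei))
    tried.append(nxt)
    scores.append(validation_score(candidates[nxt, 0]))

best = int(np.argmax(scores))
best_log_lr = candidates[tried[best], 0]
print(f"best learning_rate ~ {10 ** best_log_lr:.4f}, score {scores[best]:.4f}")
```

Each iteration spends a little computation on the surrogate model in order to avoid spending a full training run on an unpromising setting.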

Advantages and Limitations of Bayesian Optimization

Advantages

More efficient than blind search in many cases.
Useful when model training is expensive.
Adapts based on previous results.
Can find strong configurations with fewer trials.

Limitations

More complex to understand and implement.
Requires careful setup of search spaces.
May still overfit validation results if overused.
Not always necessary for simple models or small search spaces.

Grid Search vs Random Search vs Bayesian Optimization

Method	How It Searches	Best Used When	Main Trade-Off
Grid Search	Tests all predefined combinations.	Search space is small and well understood.	Can be slow and rigid.
Random Search	Tests randomly sampled combinations.	Search space is large or uncertain.	May miss good regions if trials are too few.
Bayesian Optimization	Uses past results to choose promising next trials.	Training is expensive and efficient search matters.	More complex and requires careful setup.

Hyperparameter Tuning with Cross-Validation

Hyperparameter tuning is usually combined with cross-validation. Each hyperparameter combination is evaluated across multiple folds, and the average validation score is used to compare combinations.

This reduces the chance of choosing a setting that performs well only on one lucky validation split.

Cross-Validated Tuning Workflow

Define Model → Define Search Space → Run CV for Each Setting → Compare Mean Scores → Select Best Configuration
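The five workflow steps can be sketched with scikit-learn's GridSearchCV, assuming that library is available. The synthetic dataset, the decision tree model, and the grid values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# 1. Define model
model = DecisionTreeClassifier(random_state=0)

# 2. Define search space
param_grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 20]}

# 3. Run CV for each setting (5 folds per combination)
search = GridSearchCV(model, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

# 4. Compare mean scores across folds
results = search.cv_results_
for params, mean in zip(results["params"], results["mean_test_score"]):
    print(params, round(float(mean), 3))

# 5. Select best configuration
print("best:", search.best_params_, round(float(search.best_score_), 3))
```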

Choosing the Right Evaluation Metric

Hyperparameter tuning optimizes a metric. Therefore, choosing the wrong metric can lead to the wrong model. The metric should match the business objective.

Problem	Possible Tuning Metric	Why
House Price Prediction	MAE or RMSE.	Measures prediction error in business-relevant units.
Fraud Detection	Recall, F1, PR-AUC.	Positive class is rare and important.
Customer Churn	F1, recall, precision, lift, or business profit.	Depends on retention cost and customer value.
Loan Default	ROC-AUC, recall, precision, cost-sensitive score.	False approvals and false rejections have different costs.
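To illustrate why the metric matters for a rare positive class, here is a small sketch with made-up labels: 5 fraud cases among 100 transactions, and two hypothetical models. The helper name metrics and the predictions are assumptions for illustration only.

```python
def metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# 5 fraud cases followed by 95 legitimate transactions (made-up data).
y_true = [1] * 5 + [0] * 95

# Hypothetical model A never flags anything.
always_negative = [0] * 100
# Hypothetical model B catches 4 of 5 frauds but raises 10 false alarms.
flagging = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85

m_neg = metrics(y_true, always_negative)
m_flag = metrics(y_true, flagging)
print("never flags:", m_neg)
print("flags fraud:", m_flag)
```

Accuracy ranks the model that never flags fraud higher, while recall and F1 rank the flagging model higher, so the tuning metric decides which configuration wins.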

Important Hyperparameters by Model Type

Model	Important Hyperparameters	What They Control
Linear / Logistic Regression	Regularization strength, penalty type.	Coefficient shrinkage and overfitting control.
KNN	n_neighbors, distance metric, weights.	Local smoothness and similarity calculation.
Decision Tree	max_depth, min_samples_leaf, min_samples_split.	Tree complexity and overfitting control.
Random Forest	n_estimators, max_depth, max_features, min_samples_leaf.	Number of trees, tree complexity, and randomness.
Gradient Boosting / XGBoost / LightGBM	learning_rate, n_estimators, max_depth, num_leaves, subsample, regularization.	Sequential learning speed, complexity, and overfitting control.
SVM	C, kernel, gamma.	Margin flexibility, boundary type, and non-linear sensitivity.

Search Space Design

A search method is only as good as the search space. If the search space is too narrow, the best setting may be missed. If it is too wide, tuning may waste time on unrealistic combinations.

Good Search Space

Includes realistic values.
Covers both simple and complex models.
Uses domain and model knowledge.
Starts broad, then narrows for fine-tuning.

Poor Search Space

Too many irrelevant values.
Only tests extremely complex models.
Ignores known model behaviour.
Uses values copied blindly from another project.
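The "start broad, then narrow" idea might be sketched as follows. The helper refine_grid, the geometric spacing, and the assumed winning value are all hypothetical choices made for illustration, not a prescribed method.

```python
def refine_grid(best_value, factor=2.0, n_points=5):
    """Return n_points geometrically spaced values from best/factor to best*factor."""
    low, high = best_value / factor, best_value * factor
    ratio = (high / low) ** (1 / (n_points - 1))
    return [low * ratio ** i for i in range(n_points)]

# Stage 1: a broad search over several orders of magnitude.
broad_learning_rates = [0.001, 0.01, 0.1, 1.0]

# Suppose the broad search favoured 0.1 (assumed here, for illustration only).
best_from_broad_search = 0.1

# Stage 2: a focused grid around the promising region.
focused = refine_grid(best_from_broad_search)
print([round(value, 4) for value in focused])
```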

Example: Tuning a Random Forest

Customer Churn Classification

A telecom company wants to tune a Random Forest model for churn prediction. The team chooses F1 score as the primary metric because both false positives and false negatives matter.

Hyperparameter	Candidate Values	Expected Effect
n_estimators	100, 300, 500	More trees improve stability but increase training time.
max_depth	4, 8, 12, None	Controls tree complexity and overfitting.
min_samples_leaf	1, 5, 10, 20	Higher values create smoother trees and reduce overfitting.
max_features	sqrt, log2, 0.5	Controls randomness and feature selection at each split.
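A sketch of how this search space could be tuned with scikit-learn's RandomizedSearchCV, assuming that library is available. The candidate values and the F1 metric come from the example above; the synthetic imbalanced dataset, the number of trials, and the fold count are assumptions that stand in for real churn data and a real budget.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for churn data: roughly 25% positives (assumption).
X, y = make_classification(n_samples=300, n_features=12,
                           weights=[0.75, 0.25], random_state=0)

# Candidate values from the table above (3 * 4 * 4 * 3 = 144 combinations).
param_distributions = {
    "n_estimators": [100, 300, 500],
    "max_depth": [4, 8, 12, None],
    "min_samples_leaf": [1, 5, 10, 20],
    "max_features": ["sqrt", "log2", 0.5],
}

# Sample only 5 of the 144 combinations to keep the sketch fast.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=5,
    scoring="f1",
    cv=3,
    random_state=0,
)
search.fit(X, y)

print("best params:", search.best_params_)
print("mean CV F1:", round(float(search.best_score_), 3))
```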

Example: Tuning a Gradient Boosting Model

Sales Forecasting Regression

A retail company wants to tune a gradient boosting model for weekly sales prediction. The primary metric is MAE because business users want average error in units sold or revenue.

Hyperparameter	Candidate Values	Expected Effect
learning_rate	0.01, 0.03, 0.05, 0.1	Lower values learn more slowly and usually need more trees, but may generalize better.
n_estimators	200, 500, 1000	More trees may improve performance but can overfit without early stopping.
max_depth	2, 3, 5, 7	Controls interaction complexity.
subsample	0.6, 0.8, 1.0	Adds randomness and can reduce overfitting.
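The same search space could be explored as sketched below, assuming scikit-learn's GradientBoostingRegressor as the boosting model. The candidate values and the MAE metric follow the example above; the synthetic regression data, trial count, and fold count are assumptions. scikit-learn maximizes scores, so MAE is passed as "neg_mean_absolute_error" and the sign is flipped afterwards.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for weekly sales data (assumption).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Candidate values from the table above.
param_distributions = {
    "learning_rate": [0.01, 0.03, 0.05, 0.1],
    "n_estimators": [200, 500, 1000],
    "max_depth": [2, 3, 5, 7],
    "subsample": [0.6, 0.8, 1.0],
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions,
    n_iter=4,                            # small budget to keep the sketch fast
    scoring="neg_mean_absolute_error",   # higher is better, so MAE is negated
    cv=3,
    random_state=0,
)
search.fit(X, y)

best_mae = -search.best_score_  # convert back to a positive error
print("best params:", search.best_params_)
print("mean CV MAE:", round(float(best_mae), 2))
```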

Early Stopping in Tuning

Early stopping is commonly used with boosting models. It stops training when validation performance stops improving. This prevents the model from continuing to learn noise after it has learned useful signal.

Early stopping is especially useful when tuning the number of trees or boosting rounds. Instead of guessing the exact number of trees, the model can stop when validation error stops improving.

Practical Rule: Use early stopping with a validation set for boosting models. It often gives better generalization and reduces unnecessary training time.
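A minimal sketch of early stopping, assuming scikit-learn's GradientBoostingRegressor, which can hold out part of the training data internally through validation_fraction and n_iter_no_change. The dataset and the specific values are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data (assumption).
X, y = make_regression(n_samples=500, n_features=10, noise=20.0, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=1000,        # upper limit on boosting rounds, not a target
    learning_rate=0.1,
    validation_fraction=0.2,  # 20% of the training data held out internally
    n_iter_no_change=10,      # stop after 10 rounds without improvement
    random_state=0,
)
model.fit(X, y)

# n_estimators_ is the number of rounds actually trained.
print("rounds allowed:", model.n_estimators)
print("rounds used:", model.n_estimators_)
```

With this setup the number of trees no longer needs its own grid; it is enough to set a generous upper limit and tune the remaining hyperparameters.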

Nested Cross-Validation

Nested cross-validation is used when we want a less biased estimate of performance after hyperparameter tuning. It uses an inner loop for tuning and an outer loop for evaluation.

This is more computationally expensive, but it helps avoid overly optimistic results caused by repeatedly selecting the best hyperparameters on the same validation folds.

CV Loop	Purpose	Meaning
Inner Loop	Hyperparameter tuning.	Finds the best settings for each training split.
Outer Loop	Model evaluation.	Estimates how well the tuned model generalizes.
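The two loops can be sketched by passing a tuner to an outer cross-validation call, assuming scikit-learn is available. The KNN model, the candidate values, the fold counts, and the synthetic data are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)  # tuning
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # evaluation

# Inner loop: the tuner picks n_neighbors using only the data it is given.
tuner = GridSearchCV(KNeighborsClassifier(),
                     {"n_neighbors": [3, 5, 11, 21]},
                     cv=inner_cv)

# Outer loop: each outer training split is tuned from scratch, and the tuned
# model is scored on an outer fold that played no part in the tuning.
outer_scores = cross_val_score(tuner, X, y, cv=outer_cv)

print("outer fold scores:", [round(float(s), 3) for s in outer_scores])
print("estimated accuracy:", round(float(outer_scores.mean()), 3))
```

The cost is visible in the structure: 5 outer folds times 4 candidates times 3 inner folds, plus one refit per outer fold.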

Overfitting During Hyperparameter Tuning

Hyperparameter tuning itself can overfit. If many combinations are tested repeatedly on the same validation data, the selected configuration may perform well by chance rather than because it is truly better.
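A small simulation can illustrate selection by chance. It assumes 50 configurations that are all equally good in truth (the true score, noise level, and count are made-up values), each measured with some validation noise. Picking the maximum still produces a score above the true level.

```python
import random

rng = random.Random(0)

true_score = 0.80   # every configuration is equally good in truth (assumption)
noise_sd = 0.02     # measurement noise of one validation estimate (assumption)
n_configs = 50

# Each configuration's validation score is the true score plus random noise.
validation_scores = [true_score + rng.gauss(0.0, noise_sd)
                     for _ in range(n_configs)]

# Selecting the best-looking configuration also selects the luckiest noise.
best_validation = max(validation_scores)
print(f"true score of every configuration: {true_score:.3f}")
print(f"best validation score observed:    {best_validation:.3f}")
print(f"optimism from selection:           {best_validation - true_score:.3f}")
```

The gap grows with the number of configurations tried, which is why the winning validation score should not be reported as the expected real-world performance.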

High-Risk Mistake: Do not repeatedly tune models using the final test set. The test set should be used only once at the end for final evaluation.

Safe Tuning Workflow

Leakage-Safe Hyperparameter Tuning Workflow

Split Final Test Set → Tune on Training Data with CV → Select Best Hyperparameters → Refit on Training Data → Evaluate Once on Test Set
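The workflow might look like the following sketch, assuming scikit-learn is available. The scaler is placed inside a Pipeline so that it is refit on the training part of every fold. The synthetic data, the logistic regression model, the C values, and the F1 metric are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 1. Split off the final test set before any tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# 2. Tune on training data with CV. Scaling lives inside the pipeline, so each
#    fold's scaler sees only that fold's training part (no leakage).
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
search = GridSearchCV(pipeline, {"clf__C": [0.01, 0.1, 1.0, 10.0]},
                      cv=5, scoring="f1")

# 3-4. Select the best hyperparameters and refit on all training data
#      (GridSearchCV refits the best pipeline by default).
search.fit(X_train, y_train)
print("best C:", search.best_params_["clf__C"])
print("mean CV F1:", round(float(search.best_score_), 3))

# 5. Evaluate once on the untouched test set.
test_score = search.score(X_test, y_test)
print("test F1:", round(float(test_score), 3))
```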

Common Tuning Mistakes

Mistake	Why It Is Harmful	Better Approach
Tuning on the test set	Test performance becomes biased and unreliable.	Use validation or cross-validation for tuning; reserve test set for final evaluation.
Using the wrong metric	The selected model may not match business goals.	Choose the metric before tuning based on business cost.
Searching only complex settings	May overfit and ignore simpler, more stable models.	Include both simple and complex configurations.
Creating a huge grid blindly	Wastes time and resources.	Start with Random Search or informed ranges, then refine.
Ignoring preprocessing leakage	Validation scores become overly optimistic.	Put preprocessing inside the cross-validation pipeline.
Not recording experiments	Results become hard to reproduce.	Track search spaces, random seeds, metrics, and selected settings.

Best Practices for Hyperparameter Tuning

Hyperparameter Tuning Checklist

Start with a baseline: Tune only after you know how the default model performs.
Choose the metric first: The tuning metric should match the business objective.
Use cross-validation: Average validation performance gives a more stable estimate.
Keep preprocessing inside the CV pipeline: Avoid leakage from validation folds.
Start broad, then narrow: Use Random Search for exploration and Grid Search for focused refinement.
Use Bayesian Optimization when training is expensive: It can find good configurations with fewer trials.
Control complexity: Include hyperparameters that reduce overfitting, not only those that increase model power.
Use early stopping for boosting: Stop when validation performance no longer improves.
Reserve the final test set: Evaluate the selected model only once on untouched test data.
Document everything: Record search space, metric, random seed, CV strategy, and final parameters.

Why Hyperparameter Tuning is a Balance

Hyperparameter tuning is not about finding the most complex model. It is about finding the model configuration that performs best on unseen data while remaining stable, interpretable, efficient, and aligned with the business decision.

A well-tuned model can improve performance significantly. But careless tuning can cause overfitting, leakage, wasted computation, and misleading results.

Practical Insight: Good tuning improves generalization. Bad tuning only improves validation scores while making the model less trustworthy in the real world.

Key Takeaways

Hyperparameters are model settings chosen before training.
Hyperparameter tuning finds settings that improve validation performance.
Grid Search tests all combinations in a predefined grid.
Random Search tests randomly sampled combinations and is useful for broad exploration.
Bayesian Optimization uses previous results to choose promising future trials.
Tuning should use validation data or cross-validation, not training data alone.
The tuning metric should match the business objective.
Preprocessing must be kept inside the cross-validation pipeline to avoid leakage.
Repeated tuning on the test set creates biased performance estimates.
The final test set should be used only once after model selection is complete.
