Univariate and Bivariate Analysis

Exploratory Data Analysis becomes more powerful when we study variables at two levels: individually and in relation to each other. Univariate analysis helps us understand one variable at a time, while bivariate analysis helps us understand how two variables move together.

These two techniques help identify distributions, outliers, category imbalance, relationships, trends, and predictors that may be useful for machine learning models.

What is Univariate Analysis?

Univariate analysis means analysing one variable at a time. The goal is to understand the basic behaviour of a single feature or target variable without considering its relationship with other variables.

For example, if we analyse only customer age, monthly income, transaction amount, product category, or churn status individually, we are performing univariate analysis.

Core Idea: Univariate analysis answers the question: “What does this one variable look like?”

What is Bivariate Analysis?

Bivariate analysis means analysing the relationship between two variables. The goal is to understand whether one variable changes when another variable changes.

For example, if we analyse the relationship between house size and house price, income and loan default, advertising spend and sales, or customer complaints and churn, we are performing bivariate analysis.

Core Idea: Bivariate analysis answers the question: “How does one variable behave in relation to another variable?”

Univariate vs Bivariate Analysis

Aspect	Univariate Analysis	Bivariate Analysis
Number of Variables	One variable at a time.	Two variables together.
Main Question	What does this variable look like?	How are these two variables related?
Common Outputs	Distribution, frequency, spread, outliers.	Relationship, trend, comparison, association.
Common Visuals	Histogram, box plot, bar chart.	Scatter plot, grouped bar chart, box plot by category, cross-tabulation.
Modelling Use	Helps detect data quality issues and preprocessing needs.	Helps identify useful predictors and relationships with the target.

Figure: Simple difference between univariate and bivariate analysis. Univariate panel: one-variable distribution. Bivariate panel: relationship between two variables.

Why These Analyses Matter for Predictive Modelling

Understand Variables: Univariate analysis shows the distribution, range, missing values, and unusual values of each variable.
Find Relationships: Bivariate analysis reveals whether features are related to each other or to the target variable.
Guide Feature Engineering: Patterns discovered during analysis can inspire transformations, bins, ratios, and interaction features.
Improve Model Strategy: These analyses help choose algorithms, preprocessing steps, metrics, and validation methods.

Univariate Analysis for Numerical Variables

For numerical variables, univariate analysis focuses on central tendency, spread, distribution shape, skewness, and outliers.

Check	What to Look For	Recommended Visual	Modelling Action
Central Value	Mean, median, and whether they are far apart.	Summary table.	Decide whether mean or median better represents typical behaviour.
Spread	Minimum, maximum, range, standard deviation, IQR.	Box plot.	Check if scaling or outlier treatment is required.
Shape	Symmetry, skewness, long tails, multiple peaks.	Histogram.	Apply transformation if distribution is highly skewed.
Outliers	Extreme values that are unusually high or low.	Box plot or percentile table.	Investigate whether to remove, cap, transform, or keep.
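
The checks in the table above can be sketched with pandas. This is a minimal illustration, not a definitive procedure: the series name monthly_charges and its values are invented for the example, and the 1.5 × IQR outlier rule is a common rule of thumb chosen here as an assumption.

```python
import pandas as pd

# Hypothetical monthly charges; the last value is deliberately extreme.
charges = pd.Series(
    [20, 22, 25, 27, 30, 32, 35, 38, 40, 200], name="monthly_charges"
)

# Central value: a mean far above the median hints at right skew.
mean_value = charges.mean()
median_value = charges.median()

# Spread: interquartile range (IQR) from the 25th and 75th percentiles.
q1, q3 = charges.quantile([0.25, 0.75])
iqr = q3 - q1

# Shape: positive skewness indicates a long right tail.
skewness = charges.skew()

# Outliers: flag values outside 1.5 * IQR from the quartiles
# (a rule of thumb, assumed here for illustration).
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = charges[(charges < lower) | (charges > upper)]

print(charges.describe())
print(f"mean={mean_value:.1f}, median={median_value:.1f}, skew={skewness:.2f}")
print("outliers:", outliers.tolist())
```

Here the single extreme value pulls the mean well above the median, which is the kind of gap the "Central Value" check is meant to surface.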

Univariate Analysis for Categorical Variables

For categorical variables, univariate analysis focuses on frequency counts, category percentages, dominant categories, rare categories, and class imbalance.

Check	What to Look For	Recommended Visual	Modelling Action
Frequency Count	Number of records in each category.	Bar chart.	Understand category distribution.
Dominant Category	One category appearing much more than others.	Bar chart or frequency table.	Check whether the feature has low information value.
Rare Categories	Categories with very few observations.	Frequency table.	Group rare categories into “Other” before encoding.
Target Balance	Whether target classes are balanced or imbalanced.	Bar chart.	Use stratified splitting and appropriate metrics.
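
A short pandas sketch of these categorical checks follows. The column names payment_method and churn, the data, and the 10% rarity threshold are all assumptions made for illustration; a suitable threshold depends on the dataset.

```python
import pandas as pd

# Hypothetical payment-method column with two rare categories.
payment = pd.Series(
    ["card"] * 12 + ["bank_transfer"] * 6 + ["cheque"] + ["voucher"],
    name="payment_method",
)

# Frequency count and share of each category.
counts = payment.value_counts()
shares = payment.value_counts(normalize=True)

# Group categories below an assumed 10% share into "Other".
rare_threshold = 0.10
rare_categories = shares[shares < rare_threshold].index
payment_grouped = payment.where(~payment.isin(rare_categories), "Other")

# Target balance: share of each class in a hypothetical churn target.
churn = pd.Series([0] * 17 + [1] * 3, name="churn")
class_balance = churn.value_counts(normalize=True)

print(counts)
print(payment_grouped.value_counts())
print(class_balance)
```

With only 15% of records in the positive class, this hypothetical target would be a candidate for stratified splitting and metrics other than plain accuracy.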

Bivariate Analysis: Choosing the Right Technique

The best bivariate analysis method depends on the data types of the two variables being compared. Numerical-to-numerical relationships are different from categorical-to-numerical or categorical-to-categorical relationships.

Bivariate Analysis Matrix

Numerical vs Numerical

Use scatter plots, correlation, trend lines, and pair plots.

Example: House area vs. house price.

Categorical vs Numerical

Use grouped summaries, box plots, violin plots, and bar charts of averages.

Example: Product category vs. sales amount.

Categorical vs Categorical

Use cross-tabulation, stacked bar charts, and proportion tables.

Example: Plan type vs. churn status.

Time vs Numerical

Use line charts, rolling averages, seasonal plots, and trend analysis.

Example: Month vs. sales revenue.
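
For the time vs numerical case, a rolling average is one simple way to separate trend from month-to-month noise. The sketch below assumes a made-up monthly revenue series and a 3-month window; both are illustrative choices.

```python
import pandas as pd

# Hypothetical monthly revenue for one year.
months = pd.date_range("2023-01-01", periods=12, freq="MS")
revenue = pd.Series(
    [100, 120, 90, 130, 150, 110, 160, 180, 140, 190, 210, 170],
    index=months,
    name="revenue",
)

# 3-month rolling mean: the first two entries are NaN because
# a full window is not yet available.
rolling_mean = revenue.rolling(window=3).mean()

# Month-over-month change shows where the series rises or falls.
monthly_change = revenue.diff()

print(pd.DataFrame({"revenue": revenue, "rolling_mean": rolling_mean}))
```

Plotting revenue and rolling_mean on the same line chart would make the upward trend easier to see than the raw values alone.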

Numerical vs Numerical Analysis

When both variables are numerical, bivariate analysis helps identify whether they move together, move in opposite directions, or show no clear relationship.

Analysis Method	What It Shows	Example	Modelling Insight
Scatter Plot	Pattern between two continuous variables.	Advertising spend vs. sales.	Shows linear, non-linear, clustered, or outlier patterns.
Correlation	Strength and direction of linear relationship.	Income vs. credit limit.	Helps detect useful predictors and multicollinearity.
Trend Line	Average direction of relationship.	Property size vs. price.	Shows whether relationship may be linear or non-linear.
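
As a minimal sketch of the correlation row, the example below computes Pearson correlation (linear association) and Spearman correlation (rank-based, for monotonic association) on invented advertising data. The column names ad_spend and sales are assumptions. Spearman is computed here as Pearson correlation on ranks, which is its definition.

```python
import pandas as pd

# Hypothetical advertising spend and sales figures.
df = pd.DataFrame({
    "ad_spend": [10, 20, 30, 40, 50, 60],
    "sales": [25, 41, 62, 79, 102, 118],
})

# Pearson: strength and direction of the linear relationship.
pearson = df["ad_spend"].corr(df["sales"])

# Spearman: Pearson correlation of the ranks, so it captures any
# monotonic relationship, not only straight-line ones.
spearman = df["ad_spend"].rank().corr(df["sales"].rank())

# Full correlation matrix, useful for spotting multicollinearity
# when there are many numerical features.
corr_matrix = df.corr()

print(f"pearson={pearson:.3f}, spearman={spearman:.3f}")
print(corr_matrix)
```

A scatter plot of the same two columns is still worth drawing, since a single coefficient cannot show clusters or outliers.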

Categorical vs Numerical Analysis

When one variable is categorical and the other is numerical, we compare the distribution or average value of the numerical variable across categories.

For example, we may compare average monthly spending across customer segments or compare house prices across different locations.

Analysis Method	What It Shows	Example	Modelling Insight
Grouped Mean / Median	Average numerical value by category.	Average spend by customer segment.	Shows which categories have higher or lower outcomes.
Box Plot by Category	Distribution of numerical variable across groups.	Salary distribution by department.	Shows spread, outliers, and group differences.
Bar Chart of Averages	Comparison of average values between categories.	Average order value by region.	Helps identify categories with predictive value.
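
A grouped summary of this kind can be sketched with a pandas groupby. The segment and monthly_spend columns and their values are hypothetical.

```python
import pandas as pd

# Hypothetical customer spend by segment; one premium customer
# spends far more than the rest.
orders = pd.DataFrame({
    "segment": ["basic", "basic", "basic", "premium", "premium", "premium"],
    "monthly_spend": [20, 25, 30, 60, 70, 200],
})

# Count, mean, and median of spend within each segment.
summary = orders.groupby("segment")["monthly_spend"].agg(
    ["count", "mean", "median"]
)

print(summary)
```

In the premium group the mean (110) sits well above the median (70) because of one large value, which is why comparing both is safer than relying on averages alone.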

Categorical vs Categorical Analysis

When both variables are categorical, bivariate analysis focuses on frequency combinations and proportions. This is especially useful in classification problems.

For example, in customer churn prediction, we may analyse whether churn rate differs across plan types, regions, payment methods, or complaint categories.

Analysis Method	What It Shows	Example	Modelling Insight
Cross-Tabulation	Counts for combinations of two categories.	Plan type vs. churn status.	Shows whether categories are associated with the target.
Proportion Table	Percentage distribution within categories.	Churn rate by region.	Helps compare groups fairly even when group sizes differ.
Stacked Bar Chart	Visual comparison of category proportions.	Payment method vs. repeat purchase.	Highlights group-level differences in outcome behaviour.
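
The cross-tabulation and proportion table can be sketched with pandas.crosstab. The plan_type and churned columns and their values are invented for illustration.

```python
import pandas as pd

# Hypothetical plan type and churn status for ten customers.
customers = pd.DataFrame({
    "plan_type": ["monthly"] * 6 + ["annual"] * 4,
    "churned": ["yes", "yes", "yes", "no", "no", "no",
                "yes", "no", "no", "no"],
})

# Raw counts for each combination of plan type and churn status.
counts = pd.crosstab(customers["plan_type"], customers["churned"])

# Row-wise proportions: churn rate within each plan type, which
# stays comparable even though the groups differ in size.
row_shares = pd.crosstab(
    customers["plan_type"], customers["churned"], normalize="index"
)

print(counts)
print(row_shares)
```

The row-normalised table is the one to read when group sizes differ: here 3 of 6 monthly customers churned (0.50) against 1 of 4 annual customers (0.25).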

Target-Based Bivariate Analysis

In predictive modelling, one of the most important uses of bivariate analysis is studying each feature against the target variable. This helps identify which features may be useful predictors.

Feature-to-Target Analysis Workflow

Select Feature → Compare with Target → Identify Pattern → Check Business Logic → Use in Modelling

Example: Customer Churn Analysis

Business Problem

A telecom company wants to predict customer churn. Before building the model, analysts perform univariate and bivariate analysis to understand customer behaviour.

Analysis Type	Variable or Relationship	Finding	Modelling Decision
Univariate	Monthly charges	Distribution is right-skewed with a few high-value customers.	Check outliers and consider transformation if needed.
Univariate	Contract type	Most customers are on monthly contracts.	One-hot encode contract type and check target relationship.
Bivariate	Contract type vs. churn	Monthly contract customers have much higher churn rate.	Contract type is likely an important predictor.
Bivariate	Tenure vs. churn	New customers churn more frequently than long-term customers.	Create tenure groups or use tenure as a strong feature.
Bivariate	Support tickets vs. churn	Customers with repeated complaints show higher churn risk.	Create complaint frequency feature.
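
Two of the bivariate findings above can be sketched as group-wise churn rates. The data, the column names (contract_type, tenure_months, churn), the contract labels, and the tenure bin edges are all assumptions chosen for illustration, not values from a real telecom dataset.

```python
import pandas as pd

# Hypothetical telecom customers; churn is 1 if the customer left.
telecom = pd.DataFrame({
    "contract_type": ["monthly"] * 4 + ["yearly"] * 4,
    "tenure_months": [1, 3, 8, 30, 5, 14, 26, 48],
    "churn": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Churn rate by contract type: the mean of a 0/1 target is a rate.
churn_by_contract = telecom.groupby("contract_type")["churn"].mean()

# Tenure groups with assumed bin edges (right edge inclusive).
telecom["tenure_group"] = pd.cut(
    telecom["tenure_months"],
    bins=[0, 12, 24, 60],
    labels=["0-12", "13-24", "25-60"],
)
churn_by_tenure = telecom.groupby("tenure_group", observed=True)["churn"].mean()

print(churn_by_contract)
print(churn_by_tenure)
```

If real data showed a similar gap between groups, the tenure_group column created here could be kept as an engineered feature, in line with the modelling decision in the table.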

Example: House Price Analysis

Regression Problem

A real estate company wants to predict house prices. Univariate analysis helps understand individual variables, while bivariate analysis helps understand what drives price.

Univariate: Analyse distribution of price, area, rooms, property age, and location.
Bivariate: Analyse area vs. price, location vs. price, rooms vs. price, and property age vs. price.
Insight: If area has a strong positive relationship with price, it is likely to be an important model feature.
Insight: If price differs strongly by location, how location is encoded becomes an important modelling decision.
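
A feature-to-target pass for this regression problem might look like the sketch below: correlation with price for numerical features and median price per group for a categorical one. The houses table, its column names, and its values are hypothetical.

```python
import pandas as pd

# Hypothetical house data; price is the regression target.
houses = pd.DataFrame({
    "area_sqm": [50, 70, 90, 110, 130, 150],
    "rooms": [2, 3, 3, 4, 4, 5],
    "location": ["suburb", "suburb", "suburb",
                 "centre", "centre", "centre"],
    "price": [100, 135, 170, 260, 300, 345],
})

# Numerical features: Pearson correlation of each one with price.
numeric_features = ["area_sqm", "rooms"]
target_corr = (
    houses[numeric_features]
    .corrwith(houses["price"])
    .sort_values(ascending=False)
)

# Categorical feature: median price within each location.
price_by_location = houses.groupby("location")["price"].median()

print(target_corr)
print(price_by_location)
```

In this toy data, area and location are related to each other as well as to price, so the two findings are not independent; real data deserves the same check before drawing conclusions about either feature.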

Common Patterns Found During Bivariate Analysis

Pattern	Meaning	Possible Modelling Action
Positive Relationship	As one variable increases, the other also increases.	Use as predictor; consider linear relationship.
Negative Relationship	As one variable increases, the other decreases.	Use as predictor; check business interpretation.
No Clear Relationship	Variables do not show visible association.	Feature may have weak individual predictive power.
Non-Linear Relationship	Relationship changes direction or shape.	Use transformations, bins, or tree-based models.
Group Difference	Numerical outcome differs across categories.	Encode category carefully and consider interaction features.
Outlier Relationship	Some points behave very differently from the pattern.	Investigate outliers and decide whether to treat them.

Common Mistakes to Avoid

Mistake	Why It Is Harmful	Better Approach
Skipping univariate analysis	Data quality issues, skewness, and outliers may remain hidden.	Analyse every important variable individually first.
Using only correlation	Pearson correlation measures only linear association and can miss non-linear patterns.	Use scatter plots and grouped summaries along with correlation.
Ignoring categorical relationships	Important category-level patterns may be missed.	Use cross-tabs, stacked bars, and group-wise target rates.
Confusing association with causation	A relationship between two variables does not prove one causes the other.	Interpret relationships carefully and validate with business logic.
Not analysing features against target	Important predictive signals may remain undiscovered.	Perform feature-to-target analysis for every meaningful feature.
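
The "using only correlation" mistake can be demonstrated with a small constructed example: a perfect U-shaped relationship whose Pearson correlation is zero. The data is synthetic and the bin edges are arbitrary choices for illustration.

```python
import pandas as pd

# Synthetic data: y depends entirely on x, but not linearly.
x = pd.Series([-3, -2, -1, 0, 1, 2, 3])
y = x ** 2

# Pearson correlation is zero because the relationship is symmetric
# around x = 0, even though y is fully determined by x.
pearson = x.corr(y)

# A grouped summary over bins of x exposes the U-shape.
x_bins = pd.cut(x, bins=[-4, -1.5, 1.5, 4], labels=["low", "mid", "high"])
binned_mean = y.groupby(x_bins, observed=True).mean()

print(f"pearson={pearson:.3f}")
print(binned_mean)
```

The binned means are high at both ends and low in the middle, a pattern that a scatter plot or grouped summary reveals and a single correlation coefficient hides.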

Best Practices for Univariate and Bivariate Analysis

Analysis Checklist

Start with univariate analysis: Understand each variable before studying relationships.
Separate numerical and categorical variables: Use different summaries and visuals for each type.
Analyse the target variable carefully: Check class imbalance, skewness, and unusual values.
Use bivariate analysis with the target: Identify features that may have predictive value.
Use the right chart: Histograms for numerical distributions, bar charts for categories, scatter plots for numerical relationships.
Compare groups carefully: Use proportions, medians, and distributions, not only raw counts.
Look for non-linear relationships: Not all predictive patterns are straight-line relationships.
Connect findings to feature engineering: Convert EDA insights into useful model inputs.
Validate with business logic: Statistical patterns should make practical sense.

How This Analysis Improves Predictive Models

Univariate analysis improves modelling by revealing data quality issues, outliers, skewness, missingness, and imbalance. Bivariate analysis improves modelling by revealing relationships, target patterns, useful features, and possible transformations.

Together, these methods help analysts move from raw data to modelling strategy. They guide decisions about encoding, scaling, transformations, feature selection, feature engineering, and model evaluation.

Practical Rule: Do not start modelling before asking two questions: “What does each variable look like?” and “How does each important variable relate to the target?”

Key Takeaways

Univariate analysis studies one variable at a time.
Bivariate analysis studies the relationship between two variables.
Univariate analysis helps detect distributions, outliers, rare categories, and imbalance.
Bivariate analysis helps identify relationships and useful predictors.
Numerical, categorical, and time-based variables require different analysis techniques.
Feature-to-target analysis is especially important for predictive modelling.
EDA findings should guide preprocessing, feature engineering, model selection, and evaluation strategy.
Strong predictive modelling begins with careful univariate and bivariate analysis.
