Creating New Features: Mathematical Transforms, Binning, and Date/Time Features

Feature engineering is the process of creating useful input variables from raw data so that predictive models can learn better patterns. A strong feature can make a simple model perform well, while weak features can limit even advanced algorithms.

In this chapter, we will learn how to create new features using mathematical transformations, ratios, interaction features, binning, and date/time extraction.

What is Feature Engineering?

Feature engineering means transforming raw data into meaningful model inputs. These new or modified variables help machine learning algorithms understand hidden patterns more clearly.

For example, a raw date column like order_date may not be directly useful. But from it, we can create features such as order month, weekday, weekend flag, festival season, and days since last purchase.

Core Idea: Predictive models do not understand business context automatically. Feature engineering helps convert business understanding into numerical or categorical signals that models can learn from.

Why Creating New Features Matters

Reveals Hidden Patterns: New features can expose relationships that are not obvious in raw columns.
Improves Model Accuracy: Better features often improve prediction performance more than switching algorithms does.
Supports Simpler Models: Well-designed features can let linear models capture patterns they would otherwise miss.
Connects Data to Business Logic: Features such as customer tenure, average order value, and risk ratio reflect real business behaviour.

Feature Creation Workflow

Practical Feature Engineering Pipeline

Understand Business Problem → Explore Raw Variables → Create Candidate Features → Validate with EDA → Test Model Impact

Three Major Feature Creation Techniques

Feature Creation at a Glance (figure): Mathematical Transform; Binning into Low, Medium, and High bands; Date/Time Features.

1. Mathematical Transformations

Mathematical transformations change the scale, shape, or meaning of numerical variables. They are useful when data is skewed, has large differences in magnitude, or contains relationships that become clearer after transformation.

Transformation	What It Does	Best Used When	Example
Log Transform	Compresses large values and reduces right skew.	Data has long right tail or extreme high values.	Transform income, sales, transaction amount, or house price.
Square Root Transform	Moderately compresses large values.	Count data or moderately skewed numerical variables.	Transform number of visits or number of complaints.
Power Transform	Changes the relationship between variable and target.	Non-linear patterns are present.	Create area squared for property price modelling.
Ratio Feature	Compares two quantities meaningfully.	Relative value is more useful than raw value.	Debt-to-income ratio, profit margin, conversion rate.
Difference Feature	Measures gap between two variables.	The difference itself carries business meaning.	Delivery delay = actual delivery date − promised delivery date.
Interaction Feature	Combines two variables to capture joint effect.	Impact of one feature depends on another.	Discount × customer segment, price × quantity.
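
The square root, power, and difference rows of the table can be sketched in a few lines of pandas. This is a minimal illustration on made-up data; the column names (visits, area, promised_date, actual_date) are assumptions chosen to match the examples in the table, not fields from a real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "visits": [0, 4, 9, 100],
    "area": [50.0, 80.0, 120.0, 200.0],
    "promised_date": pd.to_datetime(
        ["2024-01-05", "2024-01-05", "2024-01-10", "2024-01-12"]),
    "actual_date": pd.to_datetime(
        ["2024-01-05", "2024-01-07", "2024-01-09", "2024-01-15"]),
})

# Square root transform: moderate compression for count data.
df["visits_sqrt"] = np.sqrt(df["visits"])

# Power transform: a squared term lets a linear model fit curvature.
df["area_squared"] = df["area"] ** 2

# Difference feature: positive means late, negative means early.
df["delivery_delay_days"] = (df["actual_date"] - df["promised_date"]).dt.days

print(df[["visits_sqrt", "area_squared", "delivery_delay_days"]])
```

Subtracting two datetime columns yields a timedelta, so `.dt.days` is what turns the gap into a plain number a model can use.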

Log Transformation

Log transformation is useful when a variable has many small values and a few extremely large values. It reduces skewness and makes the distribution more manageable for many models.

New Feature = log(Original Value + 1)

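Adding 1 keeps the transform defined when the original value is zero: log(0) is undefined, while log(0 + 1) = 0. The original values must be non-negative for this to work.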

For example, customer spending may range from ₹100 to ₹10,00,000. A log transform can reduce the dominance of extremely high spenders while preserving order.
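
As a small sketch of the formula above, NumPy's `np.log1p` computes log(1 + x) directly. The spending values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical customer spend with one very large value.
spend = pd.Series([0, 100, 1_000, 50_000, 1_000_000], name="spend")

# log1p(x) = log(1 + x); it is defined at zero and keeps the ordering.
log_spend = np.log1p(spend)

# expm1 is the inverse, useful for mapping values back to the original scale.
recovered = np.expm1(log_spend)

print(pd.DataFrame({"spend": spend, "log_spend": log_spend.round(3)}))
```

Because the transform is monotonic, the ranking of customers by spend is unchanged; only the gaps between them shrink.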

Ratio Features

Ratio features are often powerful because they express relationships between two quantities. In many business problems, relative values are more meaningful than absolute values.

Debt-to-Income Ratio = Monthly Debt Payment / Monthly Income

This feature is often more useful for credit risk than income or debt alone.
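
A ratio needs a guard for a zero denominator. The sketch below shows one possible choice, which is to treat zero income as missing rather than produce an infinite ratio; the column names and values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical applicants; the third has no recorded income.
df = pd.DataFrame({
    "monthly_debt": [500.0, 1200.0, 300.0],
    "monthly_income": [2500.0, 3000.0, 0.0],
})

# Replace zero income with NaN so the division yields NaN instead of inf.
safe_income = df["monthly_income"].replace(0, np.nan)
df["debt_to_income"] = df["monthly_debt"] / safe_income

print(df)
```

Whether a missing ratio should later be imputed, flagged, or dropped is a modelling decision that depends on the business context.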

2. Binning

Binning converts a continuous numerical variable into groups or intervals. Instead of using exact values, the model uses ranges such as low, medium, and high.

For example, instead of using exact age, we can create age groups such as 18–25, 26–35, 36–50, and 51+. This may make patterns easier to interpret and more stable.

Simple Explanation: Binning turns a continuous variable into categories so that the model can learn group-level behaviour.

Binning Method	Meaning	Example	Best Used When
Equal Width Binning	Divides the value range into intervals of equal size.	Age: 0–20, 21–40, 41–60, 61–80	Range-based interpretation is simple and meaningful.
Equal Frequency Binning	Each bin contains approximately the same number of observations.	Income divided into quartiles.	Data is skewed and balanced bin sizes are desired.
Business Rule Binning	Bins are created using domain knowledge.	Credit score: Poor, Fair, Good, Excellent.	Business interpretation matters.
Target-Based Binning	Bins are chosen based on target behaviour.	Age groups where churn rate changes significantly.	Predictive separation is important, but leakage must be avoided.
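
The first three methods in the table map onto two pandas functions: `pd.cut` for fixed edges and `pd.qcut` for quantile edges. The ages and the business-rule boundaries below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical ages.
age = pd.Series([19, 23, 31, 38, 45, 52, 67, 74], name="age")

# Equal width: four intervals of equal size across the observed range.
equal_width = pd.cut(age, bins=4)

# Equal frequency: quartiles, so each bin holds about the same number of rows.
equal_freq = pd.qcut(age, q=4, labels=["Q1", "Q2", "Q3", "Q4"])

# Business rule: explicit edges chosen from domain knowledge.
# Intervals are right-closed: (17, 25], (25, 35], (35, 50], (50, inf).
business = pd.cut(
    age,
    bins=[17, 25, 35, 50, np.inf],
    labels=["18-25", "26-35", "36-50", "51+"],
)

print(pd.DataFrame({
    "age": age,
    "equal_width": equal_width,
    "equal_freq": equal_freq,
    "business": business,
}))
```

The result of each call is a categorical column, which can then be one-hot or ordinal encoded depending on the model.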

When Binning is Useful

Improves Interpretability

Age groups are easier to understand than exact ages.
Risk bands are easier for business teams to use.
Segments can support dashboards and decision rules.

Handles Non-Linear Patterns

Risk may increase sharply after a threshold.
Customer behaviour may differ by income band.
Churn may be high only for very new customers.

Reduces Noise

Small fluctuations in exact values may not matter.
Grouping can make patterns more stable.
Useful when exact values are unreliable.

Potential Risk

Binning can lose detailed information.
Poorly chosen bins may hide useful patterns.
Target-based bins can cause leakage if created incorrectly.
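
One way to reduce the leakage risk is to learn bin edges from the training rows only and then reuse those edges on new data. The sketch below uses quantile edges as a stand-in, since the same discipline applies to edges derived from the target; the data and the bin labels B1 to B4 are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical train and test splits.
train = pd.DataFrame({"tenure": [1, 2, 3, 5, 8, 13, 21, 34]})
test = pd.DataFrame({"tenure": [0, 4, 40]})

# Learn quartile edges from the training rows only.
_, edges = pd.qcut(train["tenure"], q=4, retbins=True)

# Open the outer edges so unseen values outside the training range
# still fall into the first or last bin.
edges[0], edges[-1] = -np.inf, np.inf

labels = ["B1", "B2", "B3", "B4"]
train["tenure_bin"] = pd.cut(train["tenure"], bins=edges, labels=labels)
test["tenure_bin"] = pd.cut(test["tenure"], bins=edges, labels=labels)

print(test)
```

The test rows never influence where the boundaries sit, which is the property that matters for an honest evaluation.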

3. Date and Time Features

Date and time columns are extremely valuable in predictive modelling. Raw dates are rarely useful by themselves, but they can be converted into powerful features that capture seasonality, recency, frequency, customer lifecycle, and time-based behaviour.

Date/Time Feature	Meaning	Example Use Case	Why It Helps
Year	Extract year from date.	Long-term sales or price trends.	Captures annual growth or decline.
Month	Extract month number or month name.	Retail demand forecasting.	Captures seasonality and monthly demand cycles.
Day of Week	Extract weekday from date.	Restaurant orders, website traffic, delivery demand.	Captures weekday vs weekend behaviour.
Hour of Day	Extract hour from timestamp.	Ride booking, call centre volume, app usage.	Captures daily activity patterns.
Weekend Flag	Marks whether date is Saturday or Sunday.	Retail, tourism, entertainment, food delivery.	Weekend behaviour often differs from weekdays.
Holiday or Festival Flag	Marks special days or periods.	Sales forecasting, demand planning.	Captures demand spikes around events.
Days Since Last Event	Measures recency.	Customer churn, repeat purchase, engagement prediction.	Recent behaviour is often highly predictive.
Tenure	Time since customer joined or account opened.	Churn prediction, loyalty analysis.	Longer-tenure customers often behave differently from new customers.
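
Most rows of the table correspond to one line each with the pandas `.dt` accessor. The sketch below assumes hypothetical columns order_ts and signup_date and a one-entry holiday calendar; a real project would load its own calendar.

```python
import pandas as pd

# Hypothetical orders with a timestamp and the customer's signup date.
orders = pd.DataFrame({
    "order_ts": pd.to_datetime(
        ["2024-01-06 09:30", "2024-03-11 18:45", "2024-12-25 23:10"]),
    "signup_date": pd.to_datetime(
        ["2023-01-06", "2024-01-11", "2020-12-25"]),
})
ts = orders["order_ts"]

orders["year"] = ts.dt.year
orders["month"] = ts.dt.month
orders["day_of_week"] = ts.dt.dayofweek      # Monday = 0 ... Sunday = 6
orders["hour"] = ts.dt.hour
orders["is_weekend"] = ts.dt.dayofweek >= 5  # Saturday or Sunday

# Holiday flag from an assumed calendar (illustrative, one date only).
holidays = pd.to_datetime(["2024-12-25"])
orders["is_holiday"] = ts.dt.normalize().isin(holidays)

# Tenure: whole days between signup and the order date.
orders["tenure_days"] = (ts.dt.normalize() - orders["signup_date"]).dt.days

print(orders.drop(columns=["order_ts", "signup_date"]))
```

`normalize()` drops the time of day, so the holiday lookup and the tenure calculation compare dates rather than exact timestamps.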

Recency, Frequency, and Monetary Features

In customer analytics, one of the most useful feature engineering approaches is creating RFM features: Recency, Frequency, and Monetary value.

Recency

How recently did the customer act?
Example: Days since last purchase.
Useful for churn and repeat purchase prediction.

Frequency

How often does the customer act?
Example: Number of purchases in last 90 days.
Useful for engagement and loyalty prediction.

Monetary

How much value does the customer generate?
Example: Total spend or average order value.
Useful for customer value and targeting models.

Combined RFM Signal

Customers who bought recently, frequently, and with high value are often more valuable.
RFM features support segmentation and prediction.
They are widely used in marketing analytics.
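
RFM features are usually built by aggregating a transaction table to one row per customer. The following is a minimal sketch on invented transactions; the snapshot date and column names are assumptions, and frequency here counts all purchases before the snapshot rather than a 90-day window.

```python
import pandas as pd

# Hypothetical transaction log.
tx = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "C", "C"],
    "date": pd.to_datetime([
        "2024-01-10", "2024-03-01", "2023-11-20",
        "2024-02-15", "2024-02-28", "2024-03-20",
    ]),
    "amount": [120.0, 80.0, 300.0, 40.0, 60.0, 50.0],
})

# Assumed date at which the features are computed.
snapshot = pd.Timestamp("2024-04-01")

rfm = tx.groupby("customer_id").agg(
    last_purchase=("date", "max"),
    frequency=("date", "count"),
    monetary=("amount", "sum"),
)

# Recency: days between the snapshot and the most recent purchase.
rfm["recency_days"] = (snapshot - rfm["last_purchase"]).dt.days
rfm = rfm.drop(columns="last_purchase")

print(rfm)
```

Restricting `tx` to a recent window before the `groupby` would give windowed variants such as purchases in the last 90 days.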

Interaction Features

Interaction features are created when the effect of one variable depends on another variable. These features help models capture combined effects that may not be visible from individual variables alone.

Interaction Feature	Original Variables	Business Meaning
Price × Quantity	Price and quantity sold.	Total sales value.
Discount × Customer Segment	Discount rate and customer type.	Different customer groups may respond differently to discounts.
Income × Credit Score	Income and credit score.	Financial strength may depend on both income and repayment history.
Tenure × Complaint Count	Customer tenure and complaints.	Complaints may affect new and old customers differently.
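
Numeric interactions are plain products, while an interaction with a categorical variable can be built by multiplying the numeric column into each one-hot column. The sketch below shows both on hypothetical data; the segment names are invented.

```python
import pandas as pd

# Hypothetical order lines.
df = pd.DataFrame({
    "price": [10.0, 25.0, 8.0],
    "quantity": [3, 1, 5],
    "discount": [0.10, 0.00, 0.25],
    "segment": ["new", "loyal", "new"],
})

# Numeric × numeric: price × quantity gives total sales value.
df["sales_value"] = df["price"] * df["quantity"]

# Numeric × categorical: one discount column per customer segment,
# so a model can learn a separate discount effect for each group.
seg = pd.get_dummies(df["segment"], prefix="segment", dtype=float)
interactions = seg.mul(df["discount"], axis=0).add_prefix("discount_x_")
df = pd.concat([df, interactions], axis=1)

print(df)
```

Tree-based models can often find such interactions on their own, so explicit interaction columns tend to matter most for linear models.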

Example: Feature Engineering for Customer Churn

Business Problem

A telecom company wants to predict whether a customer will churn. The raw dataset contains customer join date, monthly charges, support tickets, payment history, data usage, and churn status.

Raw Data	New Feature	Feature Type	Why It Helps
Join Date	Customer tenure in months.	Date/Time	New customers may churn more frequently than long-term customers.
Support Tickets	Tickets per month.	Ratio	Normalizes complaints by customer tenure.
Monthly Charges	Charge band: Low, Medium, High.	Binning	Helps detect price sensitivity groups.
Last Payment Date	Days since last payment.	Recency	Recent payment behaviour may signal engagement or risk.
Data Usage	Log of data usage.	Transform	Reduces the effect of extremely high usage values.
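
The five features in the table can be sketched together as follows. Everything here is illustrative: the customer rows, the as-of date, and the charge band thresholds (40 and 80) are assumptions, not values from a real telecom dataset.

```python
import numpy as np
import pandas as pd

# Assumed date at which features are computed.
as_of = pd.Timestamp("2024-06-30")

customers = pd.DataFrame({
    "join_date": pd.to_datetime(["2024-04-30", "2022-06-30", "2023-12-31"]),
    "last_payment_date": pd.to_datetime(
        ["2024-06-25", "2024-05-01", "2024-06-30"]),
    "support_tickets": [3, 4, 0],
    "monthly_charges": [25.0, 60.0, 110.0],
    "data_usage_gb": [2.0, 40.0, 900.0],
})

# Date/Time: tenure in calendar months, at least 1 to avoid dividing by zero.
join = customers["join_date"]
customers["tenure_months"] = (
    (as_of.year - join.dt.year) * 12 + (as_of.month - join.dt.month)
).clip(lower=1)

# Ratio: tickets normalised by tenure.
customers["tickets_per_month"] = (
    customers["support_tickets"] / customers["tenure_months"])

# Binning: hypothetical thresholds for Low, Medium, High.
customers["charge_band"] = pd.cut(
    customers["monthly_charges"],
    bins=[0, 40, 80, np.inf],
    labels=["Low", "Medium", "High"],
)

# Recency: days since the last payment.
customers["days_since_payment"] = (
    as_of - customers["last_payment_date"]).dt.days

# Transform: log of data usage.
customers["log_data_usage"] = np.log1p(customers["data_usage_gb"])

print(customers.iloc[:, 5:])
```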

Example: Feature Engineering for Sales Forecasting

Business Problem

A retail company wants to forecast product demand. Raw sales data includes date, product ID, store location, price, discount, units sold, and inventory level.

Month: Captures seasonal buying behaviour.
Weekend flag: Captures differences between weekend and weekday demand.
Festival flag: Captures demand spikes during holidays.
Discount percentage: Captures promotion impact.
Previous week sales: Captures recent demand momentum.
Stockout flag: Helps explain zero or unusually low sales.

These features convert raw transaction data into signals that reflect real buying behaviour.
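
A few of the features listed above can be sketched on a weekly sales table. The data is invented, and two details are assumptions: discount is treated as an absolute amount off the price, and a stockout is taken to mean inventory at or below zero.

```python
import pandas as pd

# Hypothetical weekly sales for two products.
sales = pd.DataFrame({
    "week_start": pd.to_datetime(
        ["2024-01-01", "2024-01-08", "2024-01-15"] * 2),
    "product_id": ["P1"] * 3 + ["P2"] * 3,
    "price": [100.0] * 3 + [50.0] * 3,
    "discount": [0.0, 20.0, 0.0, 5.0, 5.0, 0.0],
    "units_sold": [10, 18, 0, 7, 9, 8],
    "inventory": [40, 12, 0, 30, 25, 20],
})

# Lag features need rows ordered in time within each product.
sales = sales.sort_values(["product_id", "week_start"])

sales["month"] = sales["week_start"].dt.month
sales["discount_pct"] = sales["discount"] / sales["price"] * 100

# Previous week sales: shift within each product so values never
# cross from one product to another.
sales["prev_week_units"] = sales.groupby("product_id")["units_sold"].shift(1)

# Stockout flag: helps explain zero or unusually low sales.
sales["stockout_flag"] = sales["inventory"] <= 0

print(sales[["product_id", "week_start", "discount_pct",
             "prev_week_units", "stockout_flag"]])
```

The first week of each product has no previous week, so its lag is missing; how to handle those rows is a modelling choice.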

Feature Engineering and Data Leakage

Feature engineering must be done carefully to avoid data leakage. Leakage happens when a feature uses information that would not be available at the time of prediction.

High-Risk Example: If you are predicting whether a customer will churn next month, you cannot use “cancellation date” or “reason for cancellation” as features because these values are known only after churn happens.

Feature	Safe or Leakage?	Reason
Number of complaints before prediction date	Safe	Available before prediction.
Cancellation reason	Leakage	Known only after customer has churned.
Sales from previous month	Safe	Past information used to predict future.
Sales from next month	Leakage	Future information used incorrectly.
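
A simple safeguard is to filter every source table to rows dated before the prediction date before computing any feature. The sketch below illustrates this on a hypothetical event log; the event types and dates are invented.

```python
import pandas as pd

# Hypothetical event log containing both past and future events.
events = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "B"],
    "event_date": pd.to_datetime([
        "2024-01-05", "2024-02-20", "2024-03-10",
        "2024-02-01", "2024-03-25",
    ]),
    "event_type": ["complaint", "complaint", "cancellation",
                   "complaint", "complaint"],
})

# Assumed moment at which the prediction would be made.
prediction_date = pd.Timestamp("2024-03-01")

# Keep only what would have been known at prediction time.
history = events[events["event_date"] < prediction_date]

# Safe feature: number of complaints before the prediction date.
complaints_before = (
    history[history["event_type"] == "complaint"]
    .groupby("customer_id")
    .size()
    .rename("complaints_before_prediction")
)

print(complaints_before)
```

The cancellation on 2024-03-10 and the later complaint from customer B are dropped by the filter, so neither can leak into a feature.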

How to Evaluate New Features

Not every new feature improves a model. Some features add noise, duplicate existing information, or cause leakage. Every engineered feature should be validated using business logic, EDA, and model performance.

Feature Validation Process

Create Feature → Check Business Meaning → Inspect Distribution → Check Leakage Risk → Test Model Performance
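
The last step, testing model performance, can be sketched as a with-and-without comparison on held-out data. The example below is deliberately artificial: the data is synthetic, the target is constructed to depend on the debt-to-income ratio, and the model is a plain least-squares fit in NumPy, so it only demonstrates the comparison procedure, not a realistic result.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Synthetic inputs (assumption for the demo).
debt = rng.uniform(100, 2000, n)
income = rng.uniform(2000, 8000, n)

# Synthetic target driven by the ratio plus noise (assumption for the demo).
y = 5.0 * debt / income + rng.normal(0, 0.1, n)


def holdout_rmse(X, y, n_train=300):
    """Fit least squares on the first n_train rows, score on the rest."""
    X = np.column_stack([np.ones(len(X)), X])  # add intercept column
    coef, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
    residuals = y[n_train:] - X[n_train:] @ coef
    return float(np.sqrt(np.mean(residuals ** 2)))


# Baseline uses raw columns; the candidate adds the engineered ratio.
baseline = np.column_stack([debt, income])
with_ratio = np.column_stack([debt, income, debt / income])

rmse_base = holdout_rmse(baseline, y)
rmse_ratio = holdout_rmse(with_ratio, y)

print(f"RMSE without ratio: {rmse_base:.3f}")
print(f"RMSE with ratio:    {rmse_ratio:.3f}")
```

On real data the same comparison is normally run with cross-validation and the model actually intended for production, and a feature is kept only if the improvement is consistent across folds.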

Common Mistakes in Feature Creation

Mistake	Why It Is Harmful	Better Approach
Creating too many random features	Adds noise and increases overfitting risk.	Create features guided by business logic and EDA.
Using future information	Causes data leakage and unrealistic model performance.	Use only information available at prediction time.
Binning without reason	Can lose useful numerical detail.	Use binning when it improves interpretability or captures thresholds.
Ignoring feature distribution	New features may be skewed, sparse, or full of missing values.	Perform EDA on every engineered feature.
Not testing model impact	A feature may look meaningful but not improve prediction.	Compare model performance with and without the feature.

Best Practices for Creating New Features

Feature Engineering Checklist

Start with business understanding: Create features that reflect real drivers of the outcome.
Use EDA findings: Let distributions, trends, and target relationships guide feature ideas.
Transform skewed variables: Use log or square root transformations when appropriate.
Create ratios carefully: Ratios often capture stronger business meaning than raw values.
Use binning when useful: Bins can capture thresholds and improve interpretability.
Extract date/time signals: Month, weekday, tenure, recency, and seasonality are often powerful.
Check leakage: Use only information available before the prediction moment.
Validate every feature: Inspect distribution, missingness, target relationship, and model impact.
Keep the feature set manageable: More features are not always better.

Why Feature Engineering is a Core Modelling Skill

Feature engineering is where data science meets domain understanding. Algorithms learn from the features we provide. If the features are weak, noisy, or poorly designed, the model may struggle. If the features are meaningful, clean, and predictive, the model can perform much better.

In real-world predictive analytics, thoughtful feature creation often makes the difference between an average model and a useful business solution.

Practical Insight: The best features are not always complicated. Often, simple features like customer tenure, days since last purchase, average order value, and complaint frequency are extremely powerful.

Key Takeaways

Feature engineering creates useful model inputs from raw data.
Mathematical transformations help handle skewness, scale, ratios, and non-linear patterns.
Binning converts continuous variables into meaningful groups or bands.
Date/time features capture seasonality, recency, frequency, tenure, and time-based behaviour.
Interaction features capture combined effects between variables.
Feature engineering should be guided by business logic, EDA, and validation performance.
Data leakage must be avoided by using only information available at prediction time.
Good features can significantly improve predictive model performance and interpretability.
