Most businesses pay, one way or another, to acquire every visitor to their website, yet typically only about 2% of those visitors will ever make a purchase. The remaining 98% are casually browsing, comparing prices, or clicking away - often consuming retargeting budget in the process.

But what if an algorithm could mathematically separate the window shoppers from the active buyers before they even click your retargeting ad?

What is Propensity Scoring?

Propensity scoring is a predictive machine learning technique that analyses historical user behaviour to estimate the probability that a specific visitor will convert. By training on this data, an algorithm learns to recognise the subtle behavioural patterns of a converting user. When a new visitor lands on the site, the model scores their behaviour in real time and outputs their estimated likelihood of converting.
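
To make the idea concrete, here is a minimal sketch of what "scoring a visitor" means in the simplest case: a weighted sum of behavioural signals squashed into a probability between 0 and 1. The feature names, weights, and bias are hypothetical values chosen for illustration; a real model would learn them from historical conversion data.

```python
import math

# Hypothetical weights for illustration only; a real model learns these
# from historical conversion data.
WEIGHTS = {
    "viewed_pricing": 1.2,
    "scrolled_75_percent": 0.8,
    "sessions_last_7_days": 0.4,
}
BIAS = -3.0  # negative baseline, because most visitors do not convert


def propensity_score(features: dict[str, float]) -> float:
    """Return an estimated conversion probability between 0 and 1."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function


# Two made-up visitors: a casual browser and an engaged prospect.
browser = {"viewed_pricing": 0, "scrolled_75_percent": 0, "sessions_last_7_days": 1}
buyer = {"viewed_pricing": 1, "scrolled_75_percent": 1, "sessions_last_7_days": 3}

print(f"browser: {propensity_score(browser):.2f}")
print(f"buyer:   {propensity_score(buyer):.2f}")
```

The engaged prospect receives a noticeably higher score than the casual browser, which is the whole point: the same site visit is no longer treated as the same level of intent.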

Descriptive Analytics vs. Predictive Propensity Modeling

Traditional analytics is strictly descriptive. A standard dashboard will tell you exactly what happened yesterday - how many users visited, which pages they viewed, and who ultimately converted. While accurate reporting is a fundamental baseline, it is inherently backward-looking. A business stuck exclusively in reporting mode is merely auditing the past, not architecting the future.

Propensity scoring represents a shift from descriptive to predictive analytics. By running thousands of historical interactions through a machine learning model, you generate a statistical forecast of the future. The algorithm stops telling you who has bought, and instead starts estimating who is likely to buy next.

[Interactive figure: 200 recent website visitors, represented with different characteristics (Shape, Colour, and Size), shown in three stages: 1. Raw Visitors, 2. Analyze Conversions, 3. Score Prospects.]

The Inefficiency of Generic Retargeting

Most digital strategies treat all retargeting traffic equally. Standard configurations simply blast the same Facebook and Google Ads to everyone who visited the site in the last 30 days. This is highly inefficient.

When a propensity scoring model is implemented, it unlocks the ability to bid dynamically based on calculated user intent. Ad platforms can be configured to bid aggressively for users with an 80% or higher propensity score, ensuring you win the auction for high-value prospects. Conversely, you can aggressively suppress users with a score under 20%, structurally preventing ad budgets from bleeding out on low-intent clicks.
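
As a rough sketch of that bidding logic, the function below maps a score to a bid multiplier using the 80% and 20% cut-offs mentioned above. The multiplier values (1.5, 1.0, 0.0) and the function name are assumptions for illustration; they are not defaults of any ad platform.

```python
def bid_modifier(score: float, high: float = 0.80, low: float = 0.20) -> float:
    """Map a propensity score (0-1) to a hypothetical bid multiplier.

    The multipliers are illustrative values, not platform defaults.
    A multiplier of 0.0 means the user is excluded from retargeting.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score >= high:
        return 1.5  # bid more aggressively for high-intent users
    if score < low:
        return 0.0  # suppress low-intent users entirely
    return 1.0  # leave the baseline bid unchanged


for score in (0.94, 0.50, 0.12):
    print(score, bid_modifier(score))
```

In practice these multipliers would usually be applied through audience lists or conversion values rather than set per user, but the decision rule is the same.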

Doesn't Google Ads Already Do This?

If you use automated bidding strategies in Google Ads or Meta (like Target CPA or Maximize Conversions), you are already relying on their internal propensity scoring. Those platforms use their own algorithms to predict which users are likely to click and convert.

So why build your own?

Because native ad platforms operate from a "long distance." Google's algorithm excels at identifying broad audience trends across the web to bring a user to your door. However, once that user lands on your site, Google relies on very basic signals - like a generic pageview or a final checkout - to judge the quality of the visit.

A bespoke propensity model handles the "close proximity" data. It tracks the hyper-specific, on-site behavioural micro-signals (like hovering over pricing tiers or expanding technical specs) that Google cannot see. It also connects to your CRM to understand post-conversion reality - did they refund the next day, or did they become a high-lifetime-value client?

These two systems are not competing; they are highly complementary. Google Ads acts as the long-distance radar, finding potential traffic across the web. Your internal propensity model acts as the close-proximity targeting system, analysing exact on-site behaviour and CRM data to calculate a true intent score. By feeding this highly refined signal back into the ad network, you force Google's algorithm to bid exclusively on your true ideal buyers.

Technical Architecture: Integrating GA4, BigQuery, and Machine Learning

Out of the box, Google Analytics 4 provides basic predictive metrics. However, these are "black box" solutions - they are opaque, difficult to tune, and only become available once a property records a minimum volume of converting and non-converting users, which in practice limits them to large e-commerce stores with high transaction volumes.

To build a truly potent scoring engine tailored to specific business logic, Pathfinder Digital bypasses the generic tools. By streaming raw GA4 event data directly into a Google BigQuery data warehouse, we can construct bespoke machine learning models that intimately understand the unique conversion pathways of your specific platform. Common model choices include:

  • XGBoost: Highly effective for tabular behavioural data and handling complex, non-linear relationships between user actions.
  • Logistic Regression: Offers excellent interpretability, allowing stakeholders to easily understand exactly which factors (like specific page views) are driving the conversion probability.
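
To illustrate the interpretability point, here is a small, hand-rolled logistic regression trained with gradient descent on a made-up history of eight visitors. It is a teaching sketch in plain Python, not BigQuery ML or XGBoost, and the feature names and data are invented. What matters is the output: one signed weight per feature that a stakeholder can read directly.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=20000):
    """Fit weights and a bias with batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this row drives the gradient.
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, x) -> float:
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Toy, made-up history. Columns: viewed_pricing, expanded_specs, came_from_social
FEATURES = ["viewed_pricing", "expanded_specs", "came_from_social"]
history = [
    [1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 1, 0],
    [0, 0, 1], [0, 0, 0], [1, 0, 1], [0, 0, 1],
]
converted = [1, 1, 1, 0, 0, 0, 0, 0]

weights, bias = train_logistic_regression(history, converted)
for name, weight in zip(FEATURES, weights):
    print(f"{name:>18}: {weight:+.2f}")
```

A positive weight means the behaviour pushes the predicted probability up, and a negative weight pushes it down. A boosted-tree model would typically capture more complex interactions, at the cost of this direct readability.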

[Diagram: User Visit (Website Behavior) → GA4 (Raw Event Capture) → BigQuery (Propensity Model) → Google Ads (Bid Optimization) and CRM System (Sales & Email Routing). Edge labels: Micro-Signals, Raw Event Export, Propensity Scores, Targeting Audiences.]

By capturing custom, high-resolution event data in GA4, the model can look far beyond standard pageviews. We can feed the algorithm hundreds of behavioural micro-signals.

Feature Engineering: The Micro-Signals That Drive the Model

  • Engagement Depth: Not just that they visited a page, but that they scrolled past the 75% mark on a technical specification sheet or watched an embedded video to completion.
  • Micro-Interactions: Tracking if a user expanded a specific FAQ accordion, hovered over a pricing tier, or interacted with a complex calculator widget.
  • Velocity and Recency: How much time elapsed between their first and second visit, and the frequency of their sessions over a rolling 7-day period.
  • Referral Quality: Cross-referencing their on-site behaviour with their exact acquisition source (e.g., distinguishing the behaviour of a high-intent organic search versus a casual social media click).
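
The four signal families above can be sketched as a single feature-building function. The event records here are simplified and hypothetical: a real GA4 export is nested and uses different field names, so treat the keys (`session`, `name`, `percent`) and the event names as assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified event records for one user.
events = [
    {"session": 1, "name": "session_start", "time": datetime(2024, 5, 1, 9, 0)},
    {"session": 1, "name": "scroll", "time": datetime(2024, 5, 1, 9, 2), "percent": 80},
    {"session": 2, "name": "session_start", "time": datetime(2024, 5, 3, 18, 30)},
    {"session": 2, "name": "faq_expand", "time": datetime(2024, 5, 3, 18, 31)},
    {"session": 2, "name": "video_complete", "time": datetime(2024, 5, 3, 18, 40)},
]


def build_features(events, now, source):
    """Turn one user's raw events into a flat feature row."""
    starts = sorted(e["time"] for e in events if e["name"] == "session_start")
    week_ago = now - timedelta(days=7)
    return {
        # Engagement depth
        "scrolled_75": int(
            any(e["name"] == "scroll" and e.get("percent", 0) >= 75 for e in events)
        ),
        "video_completed": int(any(e["name"] == "video_complete" for e in events)),
        # Micro-interactions
        "faq_expands": sum(e["name"] == "faq_expand" for e in events),
        # Velocity and recency
        "hours_between_first_two_visits": (
            (starts[1] - starts[0]).total_seconds() / 3600 if len(starts) > 1 else None
        ),
        "sessions_last_7_days": sum(week_ago <= t <= now for t in starts),
        # Referral quality
        "from_organic_search": int(source == "organic"),
    }


features = build_features(events, now=datetime(2024, 5, 5, 12, 0), source="organic")
for name, value in features.items():
    print(f"{name}: {value}")
```

Each key becomes one column in the training table, with one row per user.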

These micro-signals carry strong predictive value because they occur deep within the user journey, close to the actual conversion point. When you feed this rich, multi-dimensional dataset into a BigQuery ML model, the propensity score can become considerably more accurate than one built on pageviews alone. The algorithm isn't just looking at superficial, top-of-funnel clicks; it is estimating intent from how your own buyers actually behave, rather than from generic industry averages.

Example Output: Mapping Propensity Scores to Marketing Actions

The result of this pipeline is a constantly updating dataset of active users and their conversion probabilities. Here is a simplified example of what this BigQuery output looks like:

| User ID | Propensity Score | Automated Routing | Target Platform |
| --- | --- | --- | --- |
| user_12345 | 94% | Aggressive Retargeting / Send to Sales Team | Salesforce / Google Ads |
| user_67890 | 68% | Trigger Email Nurture Sequence | HubSpot / ActiveCampaign |
| user_54321 | 32% | Standard Newsletter List | Mailchimp |
| user_98765 | 12% | Exclude from Paid Campaigns | Meta Ads / Google Ads |
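
The routing in the example rows above boils down to a tiered lookup. Here is one way it might be sketched. The cut-offs (0.80, 0.50, 0.20) are assumptions chosen so that the four example users land on their listed actions; the right thresholds for a real business would come from testing.

```python
from typing import NamedTuple


class Route(NamedTuple):
    action: str
    platforms: tuple[str, ...]


# (minimum score, route), checked from highest to lowest.
# Cut-offs are illustrative assumptions, not recommended values.
ROUTES = [
    (0.80, Route("Aggressive retargeting / send to sales team", ("Salesforce", "Google Ads"))),
    (0.50, Route("Trigger email nurture sequence", ("HubSpot", "ActiveCampaign"))),
    (0.20, Route("Standard newsletter list", ("Mailchimp",))),
    (0.00, Route("Exclude from paid campaigns", ("Meta Ads", "Google Ads"))),
]


def route_user(score: float) -> Route:
    """Return the first route whose minimum score the user meets."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    for minimum, route in ROUTES:
        if score >= minimum:
            return route
    raise AssertionError("unreachable: the last tier starts at 0.0")


scores = {"user_12345": 0.94, "user_67890": 0.68, "user_54321": 0.32, "user_98765": 0.12}
for user_id, score in scores.items():
    print(user_id, "->", route_user(score).action)
```

Keeping the tiers in a single table like `ROUTES` makes it easy to adjust thresholds without touching the routing logic.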

CRM Activation: Beyond the Ad Network

While optimising ad spend is the most immediate use case, the real leverage happens when these scores are piped directly into your CRM (like HubSpot or Salesforce). By marrying online predictive behaviour with offline sales processes, you unlock highly targeted outreach strategies.

  • Sales Prioritisation (Phone Calls): Instead of a sales team cold-calling a list alphabetically, the CRM automatically surfaces users with a 90%+ propensity score to the top of the queue. Sales reps focus their time exclusively on prospects who are mathematically ready to buy, dramatically increasing close rates.
  • Dynamic Email Offerings: Email marketing shifts from generic newsletters to precision targeting. A user hovering at a 60% score might receive an automated nurture sequence designed to build trust. However, a user sitting at 85% might automatically trigger an aggressive, time-limited discount email to push them over the line.
  • High-ROI Direct Mail: Physical mail outs are expensive. By filtering your database to only include users with a high propensity score, you can send premium, physical marketing materials strictly to the users most likely to convert, ensuring a massive return on investment for your physical campaigns.

This makes the whole marketing and sales stack predictive, reacting to user intent without manual intervention.
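
Two of these CRM uses, the prioritised call queue and the filtered direct mail list, amount to a sort and a filter once scores are attached to lead records. The sketch below assumes a hypothetical list of lead dictionaries and an illustrative 0.80 cut-off for direct mail; a real CRM would expose this through its own API or list filters.

```python
# Hypothetical lead records as they might look after scores are synced.
leads = [
    {"user_id": "user_54321", "score": 0.32},
    {"user_id": "user_12345", "score": 0.94},
    {"user_id": "user_98765", "score": 0.12},
    {"user_id": "user_67890", "score": 0.68},
]


def call_queue(leads):
    """Order leads so the highest-propensity prospects are called first."""
    return sorted(leads, key=lambda lead: lead["score"], reverse=True)


def direct_mail_list(leads, minimum=0.80):
    """Keep only users whose score justifies the cost of physical mail.

    The 0.80 default is an illustrative assumption.
    """
    return [lead["user_id"] for lead in leads if lead["score"] >= minimum]


print([lead["user_id"] for lead in call_queue(leads)])
print(direct_mail_list(leads))
```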

Engineering Predictive Analytics

Transitioning from reporting on the past to predicting the future is one of the most impactful architectural upgrades a business can make. If a platform is generating consistent traffic but struggling with high acquisition costs, deploying a bespoke propensity model provides a mathematical, engineering-led solution to the problem.

Implementation Roadmap

  1. Data Collection: Export raw GA4 event data directly into Google BigQuery.
  2. Model Training: Use BigQuery ML to train a predictive model on historical conversions and engineered features.
  3. Activation: Pipe the resulting user scores back into your advertising platforms or CRM for automated, intent-based routing.
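
Steps 2 and 3 might look roughly like the following in BigQuery ML. The helpers below only build the SQL text; they do not run anything. The project, dataset, table, and column names are hypothetical, and the statements assume a prepared feature table with one row per user, a `user_id` column, and a `converted` label. Check the exact options against the BigQuery ML documentation before relying on them.

```python
def create_model_sql(model: str, training_table: str, label: str = "converted",
                     model_type: str = "LOGISTIC_REG") -> str:
    """Build a BigQuery ML CREATE MODEL statement (step 2).

    Swap model_type to 'BOOSTED_TREE_CLASSIFIER' for a boosted-tree model.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label}']) AS\n"
        f"SELECT * EXCEPT (user_id)\n"
        f"FROM `{training_table}`"
    )


def predict_sql(model: str, scoring_table: str) -> str:
    """Build an ML.PREDICT query that scores current users (input to step 3)."""
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`, (SELECT * FROM `{scoring_table}`))"
    )


# Hypothetical resource names, for illustration only.
MODEL = "my_project.marketing.propensity_model"
print(create_model_sql(MODEL, "my_project.marketing.user_features_history"))
print()
print(predict_sql(MODEL, "my_project.marketing.user_features_current"))
```

The prediction output can then be written to a table and synced to the ad platforms or CRM on a schedule.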

Written by Johari Lanng

Johari is a Principal Analyst and Data Engineer who loves turning chaotic marketing data into clear business strategies. When he isn't architecting BigQuery pipelines or building machine learning models, he's usually experimenting with WebGL and generative coding.