You spend thousands of dollars on agencies (like us!) to handle your SEO. But when it's time to report on performance, you likely face a frustrating reality: you can see the traffic going up, but you can't get the exact dollar figure generated by specific keywords. Google's privacy protocols cut the data pipeline between search terms and revenue, leaving you with a significant blind spot.
If you can't tie revenue back to the search query, how do you justify the SEO budget?
The answer is data engineering. Bridging the gap between Google Search Console (GSC) and Google Analytics 4 (GA4) in BigQuery using a Proxy Report allows teams to estimate the monetary value of individual search queries - from broad brand segmentations down to long-tail keywords.
The Disconnected Data Pipeline
Google cut the link between search terms and revenue over a decade ago for privacy reasons. When a user clicks a search result, Google essentially black-boxes the journey, stripping the query data from the referring URL. This leaves you flying blind with two disconnected datasets:
- Google Search Console (GSC)
- Tracks what people type into Google to find you, and how many times they click. It does not track post-click behavior or revenue.
- Google Analytics 4 (GA4)
- Tracks what people do (and buy) once they actually land on your website. It records the traffic as "Organic" but hides the specific search term they used.
Without a shared "Session ID" or "User ID," joining these databases 1-to-1 is impossible. The workaround requires building a proxy.
The Proxy Report
A proxy variable stands in for a value that cannot be directly measured. For example, if health officials cannot directly count the exact number of people who have the flu in a city, they might use the volume of cough medicine sales at local pharmacies as a proxy variable. It is not a perfect 1-to-1 measurement of actual flu cases, but the correlation is strong enough to guide their public health response. Just as cough medicine sales act as a proxy for the flu, a URL's total click-share acts as a proxy for keyword revenue.
Similarly, in marketing analytics, a proxy provides enough correlation for budget allocation and strategy adjustments when direct tracking fails.
To build this Proxy Report, we rely on the few dimensions that both datasets still share:
Using this combination as a "Proxy Key," we match GA4 revenue for a specific page, on a given day, from a specific device and country, with GSC query data sharing those exact same attributes. This multi-key approach significantly increases the accuracy of the proportional attribution.
Proportional Attribution
Revenue is assigned using Proportional Attribution. This method looks at the total clicks a landing page received in GSC, calculates a single keyword's percentage share of that traffic, and assigns it that exact percentage of the GA4 revenue.
A Practical Example
Consider a footwear retailer evaluating branded vs. generic search terms:
- In GA4, the
/mens/running-shoes/landing page generated $1,000 in organic search revenue on Tuesday. - In GSC, the
/mens/running-shoes/page received 100 organic clicks on Tuesday.
Within GSC, we can see exactly which keywords drove those 100 clicks: 50 came from "summitgear shoes" (Branded), 30 came from "mens running shoes" (Generic), and 20 came from "buy trail runners" (Generic).
Using proportional attribution, the revenue is distributed as follows:
- "summitgear shoes" (50% of total clicks) = $500 assigned.
- "mens running shoes" (30% of total clicks) = $300 assigned.
- "buy trail runners" (20% of total clicks) = $200 assigned.
The Takeaway: As the interactive diagram illustrates, proportional proxy reporting is "close but not 100% accurate." Because the model assumes all clicks convert at exactly the same rate, the estimated proxy revenue will vary slightly from the actual underlying truth. However, at scale, these values are close enough to be incredibly useful, providing the directional data required to justify SEO ROI.
Technical Execution in BigQuery
If you are managing a small website, you can often run these proportional calculations manually in a spreadsheet. But for enterprise sites with millions of rows of data, that approach quickly collapses.
For Business Owners: You now understand the strategy behind unlocking your SEO revenue! If you don't want to build this yourself, contact us and our engineers will deploy it for you.
For Data Analysts: Grab a coffee. Here is the exact technical blueprint to build this BigQuery architecture yourself:
Step 1: Extract and Clean GSC Data
Pull the query, URL, clicks, and impressions from the searchdata_url_impression table.
During this extraction, we also use a CASE statement with a Regular Expression to categorize queries into "Branded" vs. "Non-Branded." We strongly recommend this approach, as it lets you easily filter out branded traffic from your final ROI calculations - ensuring you are measuring true, net-new organic acquisition. You can also customize this logic to classify queries in other ways, such as by Search Intent (e.g., matching "buy" or "price" for Transactional) or by Product Line.
SELECT
data_date,
device,
country,
query,
REGEXP_EXTRACT(url, r'^([^?#]+)') AS cleaned_url,
CASE
WHEN REGEXP_CONTAINS(query, r'(?i)(yourbrand|your brand)') THEN 'Branded'
ELSE 'Non-Branded'
END AS brand_category,
SUM(clicks) AS query_clicks,
SUM(impressions) AS query_impressions,
SUM(sum_position) AS query_sum_position
FROM
`your-project.your_dataset.searchdata_url_impression`
GROUP BY
1, 2, 3, 4, 5, 6
Step 2: Calculate URL Totals
Calculate the total GSC clicks for the specific URL on that specific day to serve as the denominator.
Step 3: Extract and Clean GA4 Revenue
Filter the nested GA4 events_* tables for organic traffic, identify the landing page (page_view events where entrances = 1), and sum the ecommerce.purchase_revenue from purchase events.
Step 4: The Proxy Join
LEFT JOIN the GA4 revenue data onto the GSC keyword data, matching on data_date, cleaned_url, device, and country.
ROUND(SUM(
SAFE_DIVIDE(g.query_clicks, t.total_url_clicks) * COALESCE(ga.total_organic_revenue, 0)
), 2) AS est_revenue
The COALESCE(..., 0) function acts as a safety net. If GSC records clicks for a URL but GA4 tracked no revenue that day, it outputs $0.00 instead of a NULL error.
Once executed, your BigQuery view will output a clean, flat table that perfectly attributes your GA4 revenue to specific GSC queries:
| cleaned_url | query | brand_category | clicks | est_revenue |
|---|---|---|---|---|
| /mens/running-shoes/ | summitgear shoes | Branded | 50 | $500.00 |
| /mens/running-shoes/ | mens running shoes | Non-Branded | 30 | $300.00 |
| /mens/running-shoes/ | buy trail runners | Non-Branded | 20 | $200.00 |
Data Engineering Challenges
Three common data cleanliness issues can break proxy reports:
Hurdle 1: URL Sanitization
GA4 and GSC format URLs differently. GSC generally records the canonical URL (https://www.example.com/shoes/), while GA4 records the literal URL visited, including tracking parameters (https://www.example.com/shoes/?utm_source=google&gclid=123xyz).
Joining these natively fails. Use a Regular Expression like REGEXP_EXTRACT(url, r'^([^?#]+)') on both datasets before the join to remove parameters.
(Business Impact: If your engineers don't clean these URLs properly, your proxy report will break, and your dashboard will falsely tell you your campaigns are failing.)
Hurdle 2: Dataset Alignment
Verify dataset IDs match by running a UNION ALL on a sample of 5 URLs from both tables to verify the domain structures align exactly. A mismatch in www. vs non-www, or missing trailing slashes, will cause the join to fail.
(Business Impact: A single missing slash can cause thousands of dollars of revenue to vanish from your reporting.)
Hurdle 3: Anonymized Queries and Noise
GSC hides low-volume queries (is_anonymized_query). Explicitly filter these out. Otherwise, proportional math assigns them revenue, inflating the unanalyzable data. Additionally, filter out deep-page ranking noise (e.g., average positions > 20) to limit attribution to viable keywords.
(Business Impact: Failing to filter this "noise" will artificially inflate your numbers, leading to over-confident and flawed marketing budgets.)
Visualizing the Data: The Aggregation Trap
Piping this BigQuery view into a BI tool like Looker Studio or Tableau often breaks Click-Through Rates (CTR) and Average Positions.
The BigQuery view calculates data daily. When viewing a broader date range (e.g., "Last 30 Days"), BI tools sum the pre-calculated daily percentages. If a keyword had a 5% CTR on Monday and a 6% CTR on Tuesday, the tool reports an invalid 11% CTR for the week.
To get accurate rates, force the dashboard to divide after aggregating raw metrics. Do not import the ctr column. Instead, import raw metrics and create a Calculated Field: SUM(total_clicks) / SUM(total_impressions).
Limitations of Proxy Reporting
This is a marketing analysis tool, not financial accounting. It has specific limitations:
The Intent Assumption
Proportional attribution assumes all clicks to a page convert at the same rate. In reality, human intent varies. A branded search indicates high commercial intent, while a generic term may just be early-stage research. The proxy math distributes revenue based purely on click volume, ignoring underlying intent. (To move beyond proportional math and actually weigh individual human behavior, read our guide to Machine Learning Propensity Scoring).
Attribution Lag (The "Last Click" Limitation)
Because proxy joining relies on a single-day snapshot, it functions similarly to a strict Last Click attribution model. If a user searches organically on Monday (tracked in GSC), leaves, and returns on Friday via direct traffic to make a $500 purchase (tracked as Direct in GA4), the proxy join misses it entirely. It only assigns revenue to the keywords that drove the session on the exact day the purchase occurred.
Macro-Trend Accuracy
Despite these limitations, proxy reporting effectively measures SEO ROI. While exact dollar amounts per keyword are estimates, the macro-trends are highly accurate. If generic keyword revenue drops 40% month-over-month while branded revenue remains flat, teams can identify specific algorithmic shifts that GA4 or GSC alone cannot surface. Proxy reporting translates raw traffic into measurable revenue trends.
The Final Verdict
Google's privacy protocols created a massive blind spot for marketers, making it nearly impossible to justify SEO budgets with exact, 1-to-1 revenue tracking. But as we've demonstrated, you don't need perfect tracking to make highly profitable business decisions.
By engineering a proxy report in BigQuery, you can bridge the gap between Google Search Console and GA4. Yes, the proportional math provides an estimate rather than a flawless financial ledger. However, at scale, it restores crucial visibility to your organic acquisition funnel, allowing you to confidently allocate budgets to the keywords that are actually driving revenue.