Customer Cohort Analysis: Practical Guide
Customer Cohort Analysis: Practical Guide
Customer cohort analysis without a data warehouse. Group customers by signup month or plan tier using CRM fields and run retention reports in 60 minutes.
No credit card required
Free 100k syncs every month
Cohort analysis has a reputation problem. Every guide on the internet treats it as a warehouse-and-SQL problem: extract from Stripe to Snowflake, write a window function, visualize in Looker. That stack is fine if you already run it. For the thousands of teams under 200 people who do not, the entire technique gets shelved as something they will get to after hiring a data engineer.
This guide is for those teams. Customer cohort analysis is not a SQL problem. It is a data-availability problem. If signup date, plan tier, and activation events live in your CRM next to the contact record, you can run retention cohort analysis as a saved-list filter and a spreadsheet pivot. We cover what cohort analysis actually answers, why competitors gate it behind warehouses they want to sell you, how to do cohort analysis without SQL, two worked examples for SaaS and ecommerce, and how to push the resulting cohorts back into your activation tools. For a refresher on the metric most cohort grids ultimately feed into, see our guide to customer lifetime value.
What customer cohort analysis reveals about retention and revenue
A cohort is a group of customers who share a starting characteristic. The most common starting characteristic is signup month, but plan tier, acquisition channel, first product purchased, and activation week are all valid. Cohort analysis is what happens when you track that group's behavior over time and compare it against other groups.
The single-number version of churn hides what cohort analysis exposes. Saying "we churn at 4% monthly" averages across customers who joined two years ago and customers who joined last week. Maybe the old cohort is sticky and the new one leaks. Maybe a product change in March improved retention for every signup since. You cannot see either pattern in the aggregate. A retention cohort grid lays those differences side by side. The gap between cohorts often dwarfs the average, and the underlying research on customer retention economics has been clear on this point for decades.
A few of the questions a cohort grid actually answers:
Did the customers acquired through paid ads in Q1 retain better or worse than customers acquired through organic search?
After the pricing change in June, did new cohorts upgrade to a higher plan faster than older cohorts?
Do customers who hit the activation milestone in their first week churn at half the rate of customers who take three weeks?
Is the average revenue per customer in newer cohorts trending up or down? (This is the leading indicator that something in the funnel is changing.)
The output is usually a triangular grid: one row per cohort, one column per elapsed period, each cell containing a retention rate, revenue figure, or count. The triangle shape comes from the fact that newer cohorts have not lived long enough to fill in the later columns yet. That blank space on the lower right is the part of the chart that gets more interesting every month you keep the analysis going.
Why traditional cohort analysis requires a warehouse and SQL
Read any vendor guide on the topic and you get the same architecture: ingest source data into a warehouse, write SQL with date bucketing and window functions, render in a BI tool. The classic ELT-tool walkthrough always starts with "pull the raw data," which assumes the data is already centralized somewhere queryable. Most teams under 200 people do not have that. The technique itself, as Wikipedia's definition of cohort analysis makes clear, is about grouping and comparison, not about any specific data infrastructure.
The warehouse-first architecture exists for good reasons at scale. A cohort grid over 50 million users with second-by-second event resolution needs columnar storage and a query engine. Most cohort analyses do not. A SaaS team with 2,000 customers running monthly signup cohorts is dealing with a few thousand rows. A series of CRM filters and a spreadsheet pivot handles it without breaking a sweat.
The interesting thing about the vendor framing is that it sells the warehouse before it explains the analysis. ELT tool guides spend most of their words on the loading layer rather than the cohort logic. Enterprise CDP marketing pages treat cohort analysis as a feature you cannot run until you buy the CDP. For teams that already pay for Stripe, HubSpot, and Customer.io, that recommendation skips the question of whether you can run cohort analysis with what you have today.
You can. The rest of this guide shows how.
How to do cohort analysis without SQL or a data warehouse
The mechanical requirement for cohort analysis is simple: every customer record needs a cohort identifier (the field that puts them in a group) and a state field that changes over time (the thing you measure). Both can live in the CRM if you sync the right data to it.
1. Pick the cohort dimension. Signup month is the default. It works because most billing tools record subscription creation dates, and "did people who signed up in X retain?" is the most common business question. Plan tier (Free, Team, Enterprise) and acquisition channel are common alternatives. Activation week is the most useful one for product-led teams because it captures whether onboarding worked.
2. Make sure the cohort field is on the contact record. This is the step most teams skip and the one that breaks everything downstream. If signup_date lives only in Stripe and your CRM has no equivalent property, you cannot filter contacts by cohort in HubSpot. Sync it. Most CRMs will let you create a custom property called signup_date, signup_month, plan_name, or first_activated_at, and a sync tool can populate them from the source system.
3. Build one saved list per cohort row. In HubSpot or Attio, create a list filtered to contacts whose signup_date is between March 1 and March 31. That is your March 2026 cohort. Repeat for each month. Yes, this is tedious for 24 months of history. It is also a 10-minute one-time setup, after which the lists update themselves as new contacts come in.
4. Capture the state field for each elapsed period. This is where most CRMs hit their natural limit. A native CRM list can tell you "how many contacts in the March cohort have subscription_status = active right now," but it cannot tell you "how many were active 3 months after they joined" without a history table. For monthly retention, the workaround is a recurring snapshot: every month, export the cohort sizes and active counts into a sheet. After 12 months of snapshots, you have the full grid.
5. Build the grid in a spreadsheet. Google Sheets or Excel handles this cleanly with a pivot table. Cohort month is the row dimension, elapsed months are the column dimension, retention percentage is the value. Conditional formatting turns the grid into the classic green-to-red heat map you see in every retention deck.
6. Re-export each month. The grid grows by one column per month as cohorts age. New rows appear as new cohorts join. After a year of consistent snapshots, the grid becomes the most useful single chart in the business.
The whole loop assumes one thing: the cohort and state fields are in the CRM. If they are not, no amount of analysis tooling helps. Direct sync between billing, product, and CRM is what makes the rest possible.
Customer cohort analysis examples for SaaS retention and ecommerce repeat purchase
Two examples that show the same technique applied to different businesses.
SaaS signup-month retention cohort grid
A B2B SaaS company with 800 active customers wants to see whether retention has improved since they overhauled onboarding in February. They sync Stripe to HubSpot, mapping subscription.created to signup_date and subscription.status to subscription_status.
Cohort month | Cohort size | Month 1 retention | Month 3 retention | Month 6 retention |
|---|---|---|---|---|
Dec 2025 | 42 | 91% | 79% | 67% |
Jan 2026 | 51 | 90% | 78% | 65% |
Feb 2026 | 47 | 96% | 89% | 81% |
Mar 2026 | 55 | 95% | 88% | — |
Apr 2026 | 60 | 97% | — | — |
The Feb 2026 cohort retains 14 points higher at month 6 than the Jan 2026 cohort. The onboarding change worked, and the improvement is durable across the next two cohorts as well. That conclusion came from CRM data and a spreadsheet pivot, no warehouse involved.
Ecommerce repeat-purchase cohort grid
An ecommerce brand with 12,000 buyers tracks how many customers acquired in a given month make a second purchase within the next year. They sync Shopify customer and order data to their CRM, mapping order.created to first_order_date and counting subsequent orders per contact.
Cohort month | Cohort size | % with 2nd purchase by month 3 | % with 2nd purchase by month 6 | % with 2nd purchase by month 12 |
|---|---|---|---|---|
Q1 2025 | 1,180 | 22% | 38% | 51% |
Q2 2025 | 1,340 | 25% | 41% | 54% |
Q3 2025 | 1,510 | 24% | 39% | — |
Q4 2025 | 1,820 | 28% | — | — |
Q4 2025's month-3 repeat rate jumped 4 points over Q3. Holiday-acquired customers come back faster. That insight changes how the brand thinks about acquisition cost in November and December. Same technique, same toolchain, different cohort dimension.
Both grids took less than an hour of analyst time per quarter once the fields were synced. Neither required a data engineer. Both update themselves the moment new billing data lands in the CRM.
Cohort analysis tools and how to activate insights in your CRM, email, and ad tools
Most cohort analysis tutorials end at the grid. That is the wrong place to stop. A grid is a backwards-looking report. The reason to run cohort analysis is to change behavior, and that means pushing the cohort definition into the tools where you actually engage customers.
A few patterns we see work:
Declining cohort triggers a win-back sequence. A cohort whose month-6 retention dropped 10 points below the previous quarter is a signal to launch a win-back email campaign for the customers in that group. If the cohort is already a saved list in your CRM, syncing it into Customer.io or Mailchimp is one configuration step.
High-LTV cohort routes to CS. A cohort with above-average retention and revenue at month 6 is your best expansion target. Push that list to your CS platform as a tagged segment and have the team prioritize expansion conversations there.
Lookalike audiences for paid acquisition. Export the top-retaining cohort to Meta or Google Ads as a custom audience. The platforms will build lookalike audiences from it. You are now spending acquisition budget on the kind of customer who already proved they stick around.
Triggered upgrade campaigns for plan-tier cohorts. Group customers by their current plan, then watch each plan-tier cohort's usage. When a Free-tier cohort starts hitting limits in week 6 on average, that is your trigger to test upgrade prompts in week 5.
Bidirectional sync makes these activations cheap. The cohort list lives in your CRM as a saved list. The same sync tool that pulled signup_date from Stripe pushes the cohort tag back into Customer.io. The cohort definition stays in one place; the tools that need to act on it stay current automatically.
You still want a warehouse eventually. Once your cohort grids exceed a few hundred thousand customers, or once you start running multi-dimensional cohorts (signup month crossed with acquisition channel crossed with plan tier), a warehouse and a notebook will be faster than a spreadsheet. For most teams under 200 people, that day is not today. Today, the data you need is in Stripe and your CRM, and the gap between them is the only thing standing between you and a working cohort grid.
Can you do cohort analysis without a data warehouse?
What is the simplest cohort analysis example for a SaaS team?
What cohort analysis tools work without SQL?
How is retention cohort analysis different from churn rate?
How often should cohort grids be refreshed?