A growth engineer renames subscription_started to subscription_activated in the checkout flow on a Tuesday afternoon. The change ships at 3 PM. By 5 PM, the activation funnel in your analytics tool shows a 100% drop. The lifecycle email campaign that fires on "subscription started" goes dark. The Slack alert for new paying customers stops pinging. Nothing logged an error, because nothing failed. The event the producer sends and the event the consumers expect simply no longer agree. The schema document that was supposed to encode that agreement has been out of date for weeks.
A tracking plan is supposed to prevent exactly this. In practice, most are spreadsheets that nobody updates after the third week, and the schema contract they were meant to encode lives only in tribal memory.
What a tracking plan is and why teams build one
A tracking plan is the schema contract between the code that emits events and every tool that reads them. It lists each event your product sends, the properties attached to that event, the type of each property, the allowed values, and the team that owns the event. Some teams call it a data tracking plan, some call it a data plan, and a few call it an event tracking plan. The artifact is identical regardless of the label, and the work of building one is what people mean by data planning.
The reason teams build one is straightforward. As soon as more than one person writes instrumentation code, naming conventions diverge. One engineer writes signed_up. Another writes signup_completed. A third writes user_signup. Six months later the analytics team is stitching three event names together in every funnel query and quietly missing a fourth variant that nobody noticed. The plan exists so that the answer to "what do we call this event" lives in one place instead of in five Slack threads.
The plan also pins down properties. A subscription_started event whose plan property is sometimes the string "team" and sometimes an integer plan ID is unusable downstream. The plan says: plan is a string, the allowed values are free, team, business, and enterprise, and the destination tools should reject anything else.
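A minimal sketch of that rule, assuming the plan stores each event's properties as JSON Schema. The helper below is hand-rolled and checks only the required, additionalProperties, type, and enum keywords. In practice you would hand the same schema to a complete validator such as the jsonschema package for Python. The names SUBSCRIPTION_STARTED_SCHEMA and check_properties are hypothetical.

```python
# Hypothetical tracking-plan entry for subscription_started, expressed as JSON Schema.
SUBSCRIPTION_STARTED_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "subscription_started",
    "type": "object",
    "properties": {
        "plan": {"type": "string", "enum": ["free", "team", "business", "enterprise"]},
    },
    "required": ["plan"],
    "additionalProperties": False,
}

# JSON Schema type names this sketch understands, mapped to Python types.
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def check_properties(schema: dict, props: dict) -> list[str]:
    """Return a list of violations; an empty list means the event matches the schema.

    Handles only a small subset of JSON Schema: required, additionalProperties=False,
    type, and enum on a flat object.
    """
    errors = []
    for name in schema.get("required", []):
        if name not in props:
            errors.append(f"missing required property {name!r}")
    allowed = schema.get("properties", {})
    for name, value in props.items():
        spec = allowed.get(name)
        if spec is None:
            if schema.get("additionalProperties", True) is False:
                errors.append(f"unexpected property {name!r}")
            continue
        expected = _JSON_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it unless the schema asks for boolean.
        wrong_type = expected is not None and (
            not isinstance(value, expected)
            or (isinstance(value, bool) and spec["type"] != "boolean")
        )
        if wrong_type:
            errors.append(f"{name!r} should be {spec['type']}, got {type(value).__name__}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name!r} value {value!r} not in {spec['enum']}")
    return errors
```

With this schema, {"plan": "team"} passes. Both {"plan": 3} and {"plan": "TEAM_v2"} are rejected at emit time instead of surfacing later as a broken funnel.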
Most plans cover four things:
Naming conventions. Verb-noun or noun-verb, snake_case or camelCase, prefixes for product area.
Event catalog. Every event the product can emit, with a one-sentence description of when it fires.
Property schemas. For each event, the properties, their types, and any allowed-value enums.
Ownership. Which team is responsible for each event and who to ask before changing it.
That is the floor. Teams that take the discipline seriously also add downstream consumer columns (which tool reads this event), priority tiers (which events block dashboards if they break), and links to the code that emits them.
What goes in the plan: events, properties, types, and ownership
The format is less important than the columns. Most teams start in a spreadsheet, graduate to a JSON Schema or YAML file checked into a repo, and end up in a dedicated tooling layer once the plan crosses fifty events. A workable template has six columns.
| Column | Example | Notes |
|---|---|---|
| Event name | subscription_started | Past-tense verb, snake_case, no PII |
| Description | Fires when a paid subscription begins | One sentence, plain English |
| Properties | plan | Names follow the same convention |
| Types and allowed values | plan: string, one of free, team, business, enterprise | Use JSON Schema if you can |
| Owner | Billing team | A team, not a person |
| Consumers | Mixpanel, Customer.io, HubSpot, Stripe Sigma | Who breaks if this changes |
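Once the plan moves from a spreadsheet into a file checked into a repo, each row of the template maps naturally onto a typed record. Here is a sketch that assumes a hypothetical PlanEntry dataclass whose fields mirror the six columns. It comes with a completeness check that a CI job could run over every entry. The owner field is deliberately a single string, so the "a team, not a list" rule is enforced by the type itself.

```python
from dataclasses import dataclass, field


@dataclass
class PlanEntry:
    """One row of the tracking-plan template; field names mirror the six columns."""
    event_name: str
    description: str
    properties: list[str]
    types: dict[str, dict]  # property name -> JSON Schema fragment
    owner: str  # exactly one team
    consumers: list[str] = field(default_factory=list)


def missing_columns(entry: PlanEntry) -> list[str]:
    """Return the template columns that are empty or inconsistent for this entry."""
    problems = []
    for column in ("event_name", "description", "owner"):
        if not getattr(entry, column).strip():
            problems.append(column)
    if not entry.consumers:
        problems.append("consumers")
    # Every listed property needs a type, and every typed property must be listed.
    if set(entry.properties) != set(entry.types):
        problems.append("types")
    return problems


subscription_started = PlanEntry(
    event_name="subscription_started",
    description="Fires when a paid subscription begins",
    properties=["plan"],
    types={"plan": {"type": "string", "enum": ["free", "team", "business", "enterprise"]}},
    owner="Billing team",
    consumers=["Mixpanel", "Customer.io", "HubSpot", "Stripe Sigma"],
)
```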
A few rules earn their place once you have written more than a handful of events.
Use past-tense verbs for event names. subscription_started, not start_subscription. Events describe things that happened, not commands. Past tense also avoids the awkward "is this event the user's intent or the system's confirmation" ambiguity.
Treat enums as part of the schema. A plan property that accepts any string is a property that will eventually contain team-monthly, Team, and TEAM_v2. Specify allowed values in the plan and validate them at emit time, not later.
Avoid PII in event names. user_jane.doe@acme.com_logged_in is a real anti-pattern. Names should be cardinality-bounded and free of personal data. PII goes in properties on identified events, where access control and deletion can apply.
Pick one ownership team per event, not a list. Shared ownership means no ownership. If the billing team owns subscription_started, the billing team is the one who reviews changes, regardless of who writes the code that emits it.
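The naming rules are mechanical enough to lint before an event ever ships. This is a rough sketch that assumes event names are snake_case with a past-tense verb as the final word. The "-ed" suffix test and the small irregular-verb allowlist are crude heuristics, not real part-of-speech tagging. The email pattern only catches the most obvious PII. lint_event_name is a hypothetical helper.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
# Catches the obvious case of an email address embedded in a name; not a full PII scan.
EMAIL_LIKE = re.compile(r"[^\s_@]+@[^\s_@]+\.[a-z]{2,}", re.IGNORECASE)
# Heuristic allowlist of irregular past tenses that the "-ed" check would miss.
IRREGULAR_PAST = {"sent", "built", "paid", "made", "won", "lost", "read", "seen"}


def lint_event_name(name: str) -> list[str]:
    """Return naming-rule violations for one event name (empty list means it passes)."""
    problems = []
    if EMAIL_LIKE.search(name):
        problems.append("contains what looks like an email address (PII)")
    if not SNAKE_CASE.match(name):
        problems.append("not snake_case")
    last_word = name.rsplit("_", 1)[-1]
    if not (last_word.endswith("ed") or last_word in IRREGULAR_PAST):
        problems.append("does not end in a past-tense verb")
    return problems
```

subscription_started passes, and start_subscription is flagged. Note that a phrasal-verb name like signed_up also trips the last-word check, which may or may not match the convention your team picks.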
For the data types themselves, JSON Schema has become the lingua franca because it travels well across languages. Codegen tooling can generate strongly typed instrumentation stubs in TypeScript and Python from the same schema, and the validation layer in the CDP or warehouse can use it to reject malformed events at the door. PostHog, Amplitude, and most modern CDPs accept JSON Schema descriptions natively.
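To make the codegen idea concrete, here is a toy generator that turns one event's JSON Schema into Python source for a TypedDict, with a Literal type for any enum. When instrumentation code annotates its payload with the generated type, a static checker such as mypy can reject plan="TEAM_v2" before the code ships. It is a sketch under stated assumptions. It handles only flat objects with string, integer, number, and boolean properties, and it treats every property as required. generate_typed_stub is a hypothetical name, not the API of any real codegen tool.

```python
# JSON Schema type names mapped to Python annotation strings (flat schemas only).
_PY_TYPES = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}


def generate_typed_stub(event_name: str, schema: dict) -> str:
    """Emit Python source for a TypedDict describing the event's properties."""
    class_name = "".join(part.capitalize() for part in event_name.split("_")) + "Properties"
    lines = ["from typing import Literal, TypedDict", "", "", f"class {class_name}(TypedDict):"]
    for prop, spec in schema["properties"].items():
        if "enum" in spec:
            # Enums become Literal types so out-of-plan values fail type checking.
            annotation = "Literal[" + ", ".join(repr(v) for v in spec["enum"]) + "]"
        else:
            annotation = _PY_TYPES[spec["type"]]
        lines.append(f"    {prop}: {annotation}")
    return "\n".join(lines) + "\n"


# Hypothetical schema; "seats" is an illustrative extra property, not from the plan above.
EXAMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "plan": {"type": "string", "enum": ["free", "team", "business", "enterprise"]},
        "seats": {"type": "integer"},
    },
}

if __name__ == "__main__":
    print(generate_typed_stub("subscription_started", EXAMPLE_SCHEMA))
```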
Tracking plan vs data dictionary vs event taxonomy
People use these three terms as if they were interchangeable, which causes most of the confusion when a team tries to "build one" and ends up with a document that does none of the three jobs well.
A tracking plan governs the events your app emits. Its audience is the engineers writing instrumentation code and the analysts who consume the events downstream. It is forward-looking: it describes what the system should send.
A data dictionary describes fields on records already stored somewhere: the fields on the users table in your Postgres database, the columns in your BigQuery events table, the properties on a HubSpot contact. Its audience is anyone querying or modifying that store. It is backward-looking: it documents what is already there.
Event taxonomy lives inside the plan. The taxonomy is the naming and categorization scheme: the rule that says event names are past-tense verbs, the rule that says property names are snake_case, the rule that says events group into categories like auth, billing, and engagement. The plan applies the taxonomy to a specific catalog of events. Taxonomy without a catalog is theory.
Practically, most companies need both artifacts. The same property often appears in each. plan is a property on subscription_started events (tracking plan) and also a column on the customers table (data dictionary). They should agree, but they describe two different surfaces.
A useful mental check: if the value comes into existence by an action your app instruments, it belongs in the event schema. If it sits at rest in a store you query, it belongs in the data dictionary. Some fields legitimately appear in both, and that overlap is where most schema drift originates.
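The overlap in the plan example can be checked mechanically. Here is a minimal sketch that assumes both artifacts can be loaded as Python dicts. One holds the tracking plan's allowed values for the plan property on subscription_started. The other holds the data dictionary's allowed values for the plan column on the customers table. The structures, the compare_allowed_values helper, and the "legacy" value are hypothetical.

```python
def compare_allowed_values(plan_values: set[str], dictionary_values: set[str]) -> dict[str, set[str]]:
    """Report values that one artifact allows and the other does not."""
    return {
        "only_in_tracking_plan": plan_values - dictionary_values,
        "only_in_data_dictionary": dictionary_values - plan_values,
    }


# Hypothetical loaded artifacts; in practice these come from the plan file and the dictionary.
tracking_plan = {"subscription_started": {"plan": {"free", "team", "business", "enterprise"}}}
data_dictionary = {"customers": {"plan": {"free", "team", "business", "enterprise", "legacy"}}}

drift = compare_allowed_values(
    tracking_plan["subscription_started"]["plan"],
    data_dictionary["customers"]["plan"],
)
print(drift)
```

Here the hypothetical legacy value exists only on the stored table. That is exactly the kind of disagreement between the two surfaces that is worth surfacing before it reaches a dashboard.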
Why most plans drift from reality
Walk into any company older than two years and ask to see their event schema documentation. You will get one of three answers. There is no plan. There is a plan but it is six months out of date. There is a plan, it is current, and the company has paid a real ongoing cost to keep it that way.
The drift is structural. Engineers ship features on the engineering team's clock, and updating the schema document is on someone else's clock. A renamed property in code is a one-line diff. Updating the spreadsheet, telling analytics, updating dashboards, updating downstream filters in the CDP, and notifying the marketing team that their lifecycle trigger needs a new condition is a week of cross-team coordination. The cheaper path always wins in the short run.
There are four common drift patterns:
Silent renames. A property gets renamed in the producer. Consumers keep querying the old name, which now returns null. Dashboards quietly go to zero.
Stealth additions. A new event ships without a tracking-plan entry. It works fine for the producer. The downstream tool ignores it because no one configured it. The data is in the pipeline but invisible to the people who would use it.
Type creep. A property starts as an integer (plan_tier: 1). Someone changes it to a string (plan_tier: "starter"). Half the events have integers, half have strings, and queries break in subtle ways depending on which half they hit.
Orphan events. Old events keep emitting after the feature is deprecated because nobody removed the instrumentation. They show up in queries and confuse new hires for years.
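Each of the four patterns leaves a distinct fingerprint in a sample of raw events, so a simple audit can detect all of them. This is a minimal sketch that assumes the plan is a dict mapping event names to planned property names, plus a separate set of deprecated events. The sample is assumed to be a list of (event_name, properties) pairs pulled from your pipeline. None of these names come from a real tool, and the sample data is illustrative.

```python
from collections import defaultdict


def audit_events(plan: dict[str, set[str]], deprecated: set[str],
                 sample: list[tuple[str, dict]]) -> list[str]:
    """Flag the four drift patterns in a sample of (event_name, properties) pairs."""
    seen_props: dict[str, set[str]] = defaultdict(set)
    seen_types: dict[tuple[str, str], set[str]] = defaultdict(set)
    for event, props in sample:
        seen_props[event].update(props)  # iterating a dict yields its keys
        for name, value in props.items():
            seen_types[(event, name)].add(type(value).__name__)

    findings = []
    for event in sorted(seen_props):
        if event in deprecated:
            findings.append(f"orphan event: {event} is deprecated but still emitted")
            continue
        if event not in plan:
            findings.append(f"stealth addition: {event} is not in the plan")
            continue
        # A planned property that never appears, next to an unplanned one,
        # is the usual signature of a rename in the producer.
        for name in sorted(plan[event] - seen_props[event]):
            findings.append(f"possible silent rename: {event}.{name} never appears")
        for name in sorted(seen_props[event] - plan[event]):
            findings.append(f"unplanned property: {event}.{name}")
    for (event, name), types in sorted(seen_types.items()):
        if len(types) > 1:
            findings.append(f"type creep: {event}.{name} seen as {sorted(types)}")
    return findings


# Illustrative plan and sample; account_created and legacy_export_clicked are hypothetical.
PLAN = {"subscription_started": {"plan"}, "account_created": {"plan_tier"}}
DEPRECATED = {"legacy_export_clicked"}
SAMPLE = [
    ("subscription_started", {"plan_name": "team"}),  # plan renamed to plan_name
    ("account_created", {"plan_tier": 1}),
    ("account_created", {"plan_tier": "starter"}),    # integer became string
    ("feature_x_used", {}),                           # shipped without a plan entry
    ("legacy_export_clicked", {}),                    # feature gone, event still firing
]

findings = audit_events(PLAN, DEPRECATED, SAMPLE)
for line in findings:
    print(line)
```

A missing property can also just mean an optional field that the sample happened not to contain. That is why the rename finding is phrased as "possible".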
This isn't a discipline problem, because the validation layer, the thing that would catch the rename or the type change, usually sits inside the CDP. The CDP is the analytics team's tool, so the engineer making the rename has no reason to look at it before merging. By the time the validation error surfaces, the change is already in production.
You can paper over this with process. Schema reviews in PRs, automated checks on JSON Schema diffs, weekly tracking-plan audits. Some teams pull it off. Most don't, because the cost of the process is paid by engineering and the benefit accrues to analytics. The asymmetry guarantees the process slips.
For a deeper look at how this plays out in CDP-driven workflows, the CDP Institute publishes annual surveys on data governance practices. The surveys tend to confirm what most practitioners already suspect: fewer than half of teams with a formal plan report it as accurate.
How to keep your schema contract in sync
There are two strategies that actually work, and they are almost opposites.
The first is heavy enforcement. Treat the plan as a contract that the producer cannot violate. JSON Schema files in the repo. Codegen for instrumentation libraries so engineers cannot emit an event that isn't in the plan. Validation at the CDP boundary that rejects malformed events with a loud error, not a silent drop. CI checks on schema diffs that require sign-off from the consumer teams. Companies running on Segment Protocols, RudderStack, or Snowplow's Iglu schemas all use some version of this. It works when the company commits to it, and it costs ongoing engineering time forever.
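The CI check is the least familiar piece, so here is a rough sketch of what a schema-diff step might compute. It compares the old and new JSON Schema for one event (flat objects only) and classifies the breaking changes. It then returns the consumers from the plan whose owners would need to sign off. The function names and the breaking-change rules are assumptions for illustration, and the tools named above each define their own versioning and compatibility rules.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """List changes between two flat JSON Schemas for one event that can break someone."""
    changes = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    for name, old_spec in old_props.items():
        if name not in new_props:
            changes.append(f"removed property {name!r}")  # consumers querying it get nulls
            continue
        new_spec = new_props[name]
        if old_spec.get("type") != new_spec.get("type"):
            changes.append(f"{name!r} changed type {old_spec.get('type')} -> {new_spec.get('type')}")
        old_enum, new_enum = old_spec.get("enum"), new_spec.get("enum")
        if old_enum is not None and new_enum is not None:
            for value in sorted(set(old_enum) - set(new_enum)):
                changes.append(f"{name!r} no longer allows {value!r}")
    # A newly required property breaks every producer that does not send it yet.
    for name in sorted(set(new.get("required", [])) - set(old.get("required", []))):
        changes.append(f"newly required property {name!r}")
    return changes


def required_signoffs(old: dict, new: dict, consumers: list[str]) -> list[str]:
    """Return the consumers whose owners must approve; empty when the change is additive."""
    return sorted(consumers) if breaking_changes(old, new) else []


# Illustrative schemas: the Tuesday-afternoon rename from plan to plan_name.
OLD = {"properties": {"plan": {"type": "string", "enum": ["free", "team", "business", "enterprise"]}},
       "required": ["plan"]}
RENAMED = {"properties": {"plan_name": {"type": "string"}}, "required": ["plan_name"]}
CONSUMERS = ["Mixpanel", "Customer.io", "HubSpot", "Stripe Sigma"]

print(breaking_changes(OLD, RENAMED))
print(required_signoffs(OLD, RENAMED, CONSUMERS))
```

Purely additive changes, such as a new optional property, pass without sign-off. That keeps the review cost on the changes that actually break consumers.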
The second is to sidestep the problem by not generating custom events in the first place.
This sounds backwards, so consider the breakdown of where your customer data actually lives. For most teams under 200 people, the answer is: Stripe holds the billing data. HubSpot or Salesforce holds the CRM data. PostHog or Mixpanel holds the product analytics. The application database holds the user records. Each of those tools already has a schema. Each one already enforces its schema at write time. Each one already exposes the schema through an API or a documented integration spec.
If your downstream tools (the CRM, the marketing platform, the warehouse) read directly from those source schemas through a sync layer that respects them, there is no custom event stream to govern, because the events are owned by the source SaaS, not by your app. The plan you would have built to describe subscription_started is replaced by the Stripe subscription object. Stripe owns the schema. Stripe versions it. Your sync layer respects it.
This doesn't eliminate the need for a plan everywhere. Custom in-app events still need one: feature_x_used, dashboard_filter_applied, anything that only exists because your product instrumented it. But the surface shrinks dramatically when the billing schema is Stripe's problem and the CRM schema is HubSpot's problem.
We built Oneprofile on this assumption. Record types and properties are inherited from the source SaaS schemas, not declared in a separate document you maintain. When a property changes shape at the source, the sync engine flags it as a schema-drift error on the next run instead of silently writing bad data downstream. The governance artifact and the enforcement layer are the same surface, which is the one place tracking-plan workflows traditionally split apart.
If your data is mostly custom events you instrument yourself, build a real plan and invest in the enforcement layer. If your data mostly already exists in SaaS tools with their own schemas, you probably need less governance overhead than the standard advice suggests, and more of a sync layer that doesn't lose track of which schema is canonical. The right answer depends on where your data actually lives, not on what a CDP vendor tells you you should have built.
What is a tracking plan in simple terms?
What is the difference between a tracking plan and a data dictionary?
Do I need a tracking plan if I use a CDP or analytics tool?
What goes in a tracking plan template?
Why do tracking plans drift?
What is event taxonomy?
