Data Sync API Errors: A 5-Category Guide

Utku Zihnioglu

CEO & Co-founder

An expired OAuth token can produce 50,000 identical 401s in one sync run. Your engineer sees the alert the next morning, scrolls through page after page of the same response body, and eventually realizes it's a single root cause. By then the marketing team has noticed that nobody in HubSpot has a current plan status, and the sync queue is backed up by an entire day's worth of changes. That's how most data sync API errors end up wasting a morning.

This is not a debugger problem. It's a UI problem. The sync log faithfully recorded what happened, one row per failed record, which is exactly how a log file works and exactly the wrong way to think about it. A debugger view shows you every event. An operator view tells you what to do.

Most sync tools never make that distinction. They show you the debugger. The argument here is for the operator view: a taxonomy on the sync run itself, errors grouped by their underlying cause, each one tagged with whether it's actionable (you need to do something) or transient (the sync will keep trying), and a direct link to the page where you fix it.

Why a flat list of data sync API errors is the wrong abstraction

Open a typical sync tool's error tab and you'll see something like this:

2024-11-12 03:14:22  contact_id=8472  POST /contacts  401  {"category":"EXPIRED_AUTHENTICATION"}
2024-11-12 03:14:23  contact_id=8473  POST /contacts  401  {"category":"EXPIRED_AUTHENTICATION"}
2024-11-12 03:14:24  contact_id=8474  POST /contacts  401  {"category":"EXPIRED_AUTHENTICATION"}
...

Multiply that by 50,000 rows. The format makes sense for a debugger replaying a single record's journey, but a single record is rarely what an operator cares about. The operator cares about the run: how many things failed, why, and what to do next.

There's a deeper issue with the flat list, which is that it confuses noise with signal. A run with 50,000 auth failures is not 50,000 problems. It is one problem repeated 50,000 times. A flat log presents them as equal in weight, so a reader has to scan for patterns the tool already knows about. The destination API returned the same error code on every row. The tool collected it. Then it threw away the structure and gave you a stream.

This shows up most visibly in CRM sync failures because CRMs have rich error responses. HubSpot returns a category and a subcategory. Salesforce returns an errorCode enum with about thirty entries. Both APIs are already pre-classifying the failure for you. A good sync tool reads those fields. A flat-log sync tool ignores them.
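As a rough illustration of how little work it takes to read those fields, here is a minimal sketch that pulls the destination's own classification out of each response body and counts by it. The function names are hypothetical. The HubSpot `category` key comes from the log sample above; the list-shaped Salesforce body is an assumption about how that API wraps its `errorCode`.

```python
import json
from collections import Counter
from typing import Any, Optional


def vendor_error_code(body: Any) -> Optional[str]:
    """Return the destination's own error classification, if it sent one."""
    # HubSpot-style body: a JSON object carrying a "category" key.
    if isinstance(body, dict):
        return body.get("category") or body.get("errorCode")
    # Assumed Salesforce-style body: a JSON list of objects with "errorCode".
    if isinstance(body, list) and body and isinstance(body[0], dict):
        return body[0].get("errorCode")
    return None


def count_by_vendor_code(raw_bodies: list[str]) -> Counter:
    """Collapse a stream of response bodies into one count per vendor code."""
    counts: Counter = Counter()
    for raw in raw_bodies:
        try:
            code = vendor_error_code(json.loads(raw))
        except json.JSONDecodeError:
            code = None  # body was not JSON, so there is nothing to read
        counts[code or "UNCLASSIFIED"] += 1
    return counts


bodies = ['{"category":"EXPIRED_AUTHENTICATION"}'] * 50_000
print(count_by_vendor_code(bodies))
# Counter({'EXPIRED_AUTHENTICATION': 50000})
```

Fifty thousand rows become one line with a count, using only information the destination already sent.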

I keep coming back to this because it's the single biggest gap between how sync tools are built and how teams actually use them. We built Oneprofile after watching three different ops teams paste the same JSON snippets into Slack and ask each other "is this auth or rate limit?" The information was in the response body. The tool just wasn't reading it.

The five categories every data sync API error taxonomy needs

After enough time staring at destination API responses, the failure modes converge. There are really only five categories, and almost every error you'll see in production is one of them.

| Category | What it means | Who fixes it | Retry? |
| --- | --- | --- | --- |
| Auth | Token expired or revoked | Operator reconnects | No |
| Permissions | App lacks scope or role | Admin updates grants | No |
| Plan/quota | Destination billing tier exceeded | Owner upgrades plan | No |
| Validation | Payload rejected by schema | Operator fixes mapping | No |
| Transient | Rate limit, 5xx, network blip | Sync engine retries | Yes |

The categories are not equally common. In our data, transient errors make up roughly 70% of the raw error volume but almost none of the operator-actionable work. Auth and plan limits are 5% of the volume but close to 80% of the tickets. Validation errors are uneven: they cluster around a few records (a missing required field, an invalid date format) and a one-time fix usually resolves them all.

What this means for the UI is that grouping matters more than completeness. If your error view shows transient errors with the same prominence as auth errors, the operator drowns. If it folds 50,000 transient retries into a single "Rate limit, retried successfully" row and surfaces the one auth failure in red at the top, you've done the hardest part of the job.
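A minimal sketch of that folding and ordering, assuming each failed record has already been tagged with one of the five category names. The `summarize` helper and the lowercase category strings are illustrative, not any tool's actual API.

```python
from collections import Counter

# Every category except "transient" needs a human, per the taxonomy above.
ACTIONABLE = {"auth", "permissions", "plan", "validation"}


def summarize(categories: list[str]) -> list[tuple[str, int, bool]]:
    """Fold per-record failures into (category, count, actionable) rows.

    Actionable groups sort first; within each half, larger groups sort first.
    """
    counts = Counter(categories)
    rows = [(cat, n, cat in ACTIONABLE) for cat, n in counts.items()]
    return sorted(rows, key=lambda row: (not row[2], -row[1]))


failures = ["transient"] * 50_000 + ["auth"]
for category, count, actionable in summarize(failures):
    print(category, count, "ACTION NEEDED" if actionable else "retried")
# auth 1 ACTION NEEDED
# transient 50000 retried
```

The single auth failure lands above the 50,000 retries because the sort key puts actionability ahead of volume.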

A note on definitions. "Actionable" here means a human has to change something outside the sync tool itself. Reconnecting an integration, updating billing, fixing a field mapping in another system. Transient errors are non-actionable by definition because the sync engine handles them on its own.

What an actionable sync error looks like

There's a difference between an error message and an error object that can drive a UI. The first is a string. The second has structure. For sync error taxonomy to be useful, every error a tool surfaces should carry at least:

  • A category from the fixed taxonomy (auth, permissions, plan, validation, transient).

  • A count, not a list of rows.

  • A human-readable summary, not just a JSON dump.

  • An actionable flag. Does a human need to do something, or will the sync recover on its own?

  • A link to where the human goes to fix it.

That last one is where most tools fall short. An error that says "OAuth token expired for HubSpot" is informational. An error that says "OAuth token expired for HubSpot" with a button labeled "Reconnect HubSpot" that opens the integration page is actionable. The first one ends in a Slack thread. The second one ends in a fixed sync. Same information, different time-to-resolution.
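One way to model such an error object is sketched below. The class name, field names, and the `/integrations/hubspot` path are all hypothetical; the point is only that the fix link travels with the error instead of living in someone's memory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SyncErrorGroup:
    category: str                    # auth, permissions, plan, validation, or transient
    count: int                       # number of affected records, not a list of rows
    summary: str                     # human-readable, e.g. "OAuth token expired for HubSpot"
    actionable: bool                 # True if a human has to change something
    fix_url: Optional[str] = None    # hypothetical deep link to the settings page
    fix_label: Optional[str] = None  # button text, e.g. "Reconnect HubSpot"

    def render(self) -> str:
        """Render one line for the run summary, with the fix link if there is one."""
        line = f"[{self.category}] {self.summary} ({self.count} records)"
        if self.actionable and self.fix_url:
            line += f" -> {self.fix_label or 'Fix'}: {self.fix_url}"
        return line


group = SyncErrorGroup(
    category="auth",
    count=50_000,
    summary="OAuth token expired for HubSpot",
    actionable=True,
    fix_url="/integrations/hubspot",
    fix_label="Reconnect HubSpot",
)
print(group.render())
# [auth] OAuth token expired for HubSpot (50000 records) -> Reconnect HubSpot: /integrations/hubspot
```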

The OAuth spec leaves token lifetime mostly up to the provider, so refresh behavior varies, and the discussion of refresh tokens in the OAuth 2.0 RFC (RFC 6749) is worth reading if you're building any of this yourself. In practice you don't need to understand the spec to fix the error. You just need the link.

The same applies to validation errors. A typical validation response looks like {"message":"property values were not valid","errors":[{"isValid":false,"message":"Property \"plan_tier\" does not exist"}]}. The taxonomy view collapses this to "Field mapping invalid: plan_tier" with a link to the field mapping screen for that integration. The operator clicks once and is in the right place.
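That collapse can be sketched in a few lines, assuming the response body has the shape shown above. The regular expression and the `summarize_validation` name are illustrative; other destinations word their messages differently and would need their own patterns.

```python
import json
import re
from typing import Optional

# Matches the quoted property name in messages shaped like the example above.
_PROPERTY = re.compile(r'Property "([^"]+)"')


def summarize_validation(raw_body: str) -> Optional[str]:
    """Return a one-line summary naming the offending field, or None if not found."""
    body = json.loads(raw_body)
    for err in body.get("errors", []):
        match = _PROPERTY.search(err.get("message", ""))
        if match:
            return f"Field mapping invalid: {match.group(1)}"
    return None


raw = r'{"message":"property values were not valid","errors":[{"isValid":false,"message":"Property \"plan_tier\" does not exist"}]}'
print(summarize_validation(raw))
# Field mapping invalid: plan_tier
```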

Don't underestimate how much this matters. Search any RevOps Slack and you'll find threads that exist only because someone couldn't figure out which settings page would fix a specific error. The settings exist. The link from the error to the settings is what's missing.

Auth, rate limit, payload, plan limit, and outage: handling each kind of data sync API error

This is the operator's reference. For each category, what it looks like, what it means, and what the sync tool should do.

Auth (HTTP 401, sometimes 403): The OAuth token has expired or been revoked. Common across HubSpot, Salesforce, Pipedrive, Mailchimp, Intercom, and any other tool that uses OAuth. The destination response is almost always self-explanatory: HubSpot returns category: EXPIRED_AUTHENTICATION, Salesforce returns INVALID_SESSION_ID, Stripe (which uses API keys) returns Invalid API Key provided. The right response: pause the sync, surface a single auth error with a reconnect link, and stop retrying. Retrying an expired token is just generating noise.

Permissions (HTTP 403): The token is valid but the connected app doesn't have the right scope, or the user it represents doesn't have the right role. This shows up after admin changes ("we moved everyone to a new permission group"), after app updates that request new scopes, or when a sync writes to an object that was previously read-only. Salesforce in particular generates a specific code (INSUFFICIENT_ACCESS_OR_READONLY) for this. Surface as a single actionable error with a link to the admin role page in the destination tool.

Plan/quota (HTTP 402, 403, or sometimes a custom 4xx): The destination tool's billing tier or hard quota is exhausted. Mailchimp returns errors when you exceed your audience size; HubSpot returns specific codes when you hit a custom property cap; Intercom limits monthly active people on lower plans. These are easy to spot in retrospect ("oh, we hit the limit") but easy to miss in the moment because they often arrive on only a subset of records. The taxonomy view groups them under "Plan limit reached: HubSpot custom properties" with a link to the billing page.

Validation (HTTP 400, 422): The payload was rejected. Usually one of three things: a field doesn't exist on the destination, a value doesn't match the destination's type, or a required field was empty. The right grouping is by field, not by record. "82 records failed because phone is not a valid phone number" tells you to fix the source data or change the field mapping. "82 records failed" tells you nothing.

Transient (HTTP 429, 5xx, timeouts): The rate limit errors that data sync engines hit constantly, plus the occasional 502 or connection reset. The destination is alive but not happy. The right response is to back off, retry with jitter, and only escalate if the throttling persists across multiple runs. HubSpot's rate limit documentation describes their per-second and daily caps, and most CRMs publish similar guides. A sync engine that respects those caps will produce far fewer transient errors than one that hammers the API and gets throttled.
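A minimal sketch of backing off with jitter, using the "full jitter" variant where each delay is drawn uniformly between zero and an exponentially growing ceiling. `TransientError`, `call_with_retry`, and the default values are assumptions for illustration; the caller is expected to raise `TransientError` on a 429, a 5xx, or a timeout, and to pass along any wait hint the server sent (for example a parsed Retry-After header).

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Raised by the caller's request function on 429, 5xx, or a timeout."""

    def __init__(self, message: str, retry_after: Optional[float] = None):
        super().__init__(message)
        self.retry_after = retry_after  # seconds, if the server supplied a hint


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 60.0) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retry(
    request: Callable[[], T],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying on TransientError up to max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except TransientError as exc:
            if attempt == max_attempts - 1:
                raise  # still failing: let the caller surface it
            # Prefer the server's hint when present, otherwise use jittered backoff.
            hinted = exc.retry_after
            sleep(hinted if hinted is not None else backoff_delay(attempt))
    raise AssertionError("unreachable")
```

The randomness matters because many workers that were throttled at the same moment would otherwise all retry at the same moment too.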

There's a sixth thing I deliberately didn't include in the taxonomy: destination outages. Outages are technically transient (the API will come back) but they're not really an error in the sense the operator cares about. If HubSpot is down, no amount of sync configuration fixes it. Group these under "Destination unreachable" and show a status page link. Operators can stop asking questions and go work on something else.
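The reference above can be condensed into a small classifier, sketched here under some loud assumptions: the vendor code table holds only the three codes named in this section, and the status fallback resolves an ambiguous 403 to permissions, which will be wrong for destinations that use 403 for auth or plan limits. A real table would be longer and per-destination.

```python
from typing import Optional

# Vendor codes named in the reference above; illustrative, not exhaustive.
VENDOR_CODES = {
    "EXPIRED_AUTHENTICATION": "auth",                  # HubSpot
    "INVALID_SESSION_ID": "auth",                      # Salesforce
    "INSUFFICIENT_ACCESS_OR_READONLY": "permissions",  # Salesforce
}


def classify(status: Optional[int], vendor_code: Optional[str] = None) -> str:
    """Map a failed response to a taxonomy category.

    status is None when no response arrived at all (timeout, connection reset).
    """
    # The destination's own code is more specific than the HTTP status.
    if vendor_code in VENDOR_CODES:
        return VENDOR_CODES[vendor_code]
    if status is None or status == 429 or status >= 500:
        return "transient"
    if status == 401:
        return "auth"
    if status == 402:
        return "plan"
    if status == 403:
        return "permissions"  # assumption: 403 without a known vendor code
    if status in (400, 422):
        return "validation"
    return "unclassified"


print(classify(401, "EXPIRED_AUTHENTICATION"))  # auth
print(classify(503))                            # transient
```

Checking the vendor code first is the design choice that matters: it lets a 403 carrying a known auth code land in the right group.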

When to retry, surface, or abort on data sync API errors

The retry question is the cleanest test of whether a tool's error model is right. The answer comes directly from the taxonomy:

  • Transient → retry, surface a summary only if it persists.

  • Auth → don't retry, surface, pause the sync.

  • Permissions → don't retry, surface, keep syncing records that aren't affected.

  • Plan/quota → don't retry, surface, optionally pause if the whole sync is blocked.

  • Validation → don't retry the same payload, surface grouped by field.
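The list above is small enough to live in a lookup table. A minimal sketch follows; the `Policy` fields and string values are hypothetical names, not a real configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    retry: bool
    surface: str                # "always" or "if_persistent"
    pause: str                  # "always", "if_fully_blocked", or "never"
    group_by: str = "category"  # how to group rows in the error view


POLICIES = {
    "transient":   Policy(retry=True,  surface="if_persistent", pause="never"),
    "auth":        Policy(retry=False, surface="always", pause="always"),
    "permissions": Policy(retry=False, surface="always", pause="never"),
    "plan":        Policy(retry=False, surface="always", pause="if_fully_blocked"),
    "validation":  Policy(retry=False, surface="always", pause="never", group_by="field"),
}


def policy_for(category: str) -> Policy:
    """Look up the handling policy for a category.

    Unknown categories are surfaced without retrying, so nothing is
    retried forever or dropped silently.
    """
    return POLICIES.get(category, Policy(retry=False, surface="always", pause="never"))


print(policy_for("auth"))
```

Keeping the policy as data rather than scattered `if` statements makes it easy to audit that exactly one category retries.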

The two things to never do: retry indefinitely on non-transient errors (this turns 1 problem into 100,000 API calls), or drop records silently after a few retries (this loses data). Most flat-log sync tools commit one of these two sins. Surface and pause is almost always the right default for actionable errors.

A small but useful detail: the right point to abort the whole sync is when the actionable error blocks every record. If the OAuth token is expired, every record will fail, and continuing wastes both API quota and processing time. If a single field mapping is broken, only some records fail, and aborting the run would block everything else from syncing. Operate at the record level when you can, at the run level when you have to.
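As a rough sketch of that rule, assuming the engine tracks how many records it has attempted so far and how many failed in each error group. The function name and inputs are hypothetical, and a real engine would probably apply this to a sample window rather than the whole run.

```python
def should_abort_run(
    attempted: int,
    failed_by_group: dict[str, int],
    actionable: set[str],
) -> bool:
    """Abort only when one actionable group accounts for every attempted record."""
    if attempted == 0:
        return False
    return any(
        count >= attempted
        for group, count in failed_by_group.items()
        if group in actionable
    )


ACTIONABLE_GROUPS = {"auth", "permissions", "plan", "validation"}

# Expired token: every record fails, so stop and save the API quota.
print(should_abort_run(200, {"auth": 200}, ACTIONABLE_GROUPS))        # True
# One broken field mapping: only some records fail, so keep going.
print(should_abort_run(200, {"validation": 82}, ACTIONABLE_GROUPS))   # False
```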

We built integration API error handling into Oneprofile to behave this way by default. Every sync run shows a grouped error view classified by the taxonomy above. Each group has an actionable flag, a count, and a button that takes you to the right settings page in one click. Transient errors are retried in the background and only surfaced if they persist; the rate-limit spike that used to look like a thousand emergencies now looks like one line that says "rate limited, retried successfully." The auth failure that used to hide in a sea of 401s now sits at the top of the page in red with a reconnect button next to it.

The goal isn't a better log viewer. The goal is to get the human to the right settings page in one click, then get out of the way. A sync tool that does this turns most data sync API errors from a debugging session into a thirty-second fix. The five categories don't change. What changes is how long the error sits there before someone resolves it.
