What an SLA actually is in field service
Most of what's published about SLAs comes from IT operations — uptime percentages, ticket queues, P1 incident response. Useful framework, wrong vocabulary for your business. In field service, an SLA is a contractual commitment to a specific physical action: a qualified technician arrives at the customer's site, diagnoses a problem, and restores service within a defined window. The clock runs in real-world hours, not synthetic monitoring intervals.
Per the ITIL service-management framework, a service-level agreement is "an agreement between a service provider and a customer that documents service-level targets and specifies the responsibilities of the service provider and the customer." Field-service SLAs typically combine three concrete commitments: response time (how fast the request is acknowledged and a technician is assigned), arrival time (how fast the technician reaches the site), and resolution time (how fast service is restored). For commercial HVAC contracts, industry guides from facilities-maintenance and maintenance-management vendors put typical premium response targets at 2 hours, mid-tier at 4 hours, and standard at 8 to 24 hours.
The other thing that makes field-service SLAs different: the breach is physical. If your tech is stuck in traffic, the SLA breaches. If the truck stock is wrong and the tech needs a return visit, the resolution SLA breaches. If a snowstorm pushes every drive time by an hour, half your day's SLAs breach. None of those are software problems in the IT-SLA sense. They're operational problems that need an operational fix.
That's why most of the published advice — IT-SLA templates, ITIL framework summaries — gets you halfway there at best. The rest of this guide is the field-service-specific half.
Why most field service teams track SLAs badly
The pattern is consistent across the 5-to-100-tech B2B service businesses we talk to. SLA terms exist in the customer contract, sometimes negotiated months or years ago, and the operations team is supposed to honor them. But the tracking lives in a spreadsheet that gets updated weekly by someone who isn't in dispatch, isn't on the trucks, and only sees what's already happened.
Four failure modes show up over and over.
The clock starts in the wrong place. Most spreadsheet-based trackers timestamp the SLA from "ticket created" — but contractually, the clock often starts at "customer notified provider" or "service request received." Those can be hours apart. By the time the dispatcher logs the job, you've already burned a chunk of the response window without knowing it.
The clock doesn't stop in the right places. Real SLAs pause when the ball is in the customer's court — waiting for site access, waiting for parts the customer agreed to supply, waiting for approval on a change order. Spreadsheets don't pause. The breach gets recorded against you for time you weren't responsible for.
Reporting is retrospective. The monthly SLA report tells you what happened last month. By then, the customer has already noticed three slipped responses, and the only thing the report does is confirm what they were already mad about.
Tiers don't map to dispatch behavior. The contract says "4-hour response for Gold accounts." The dispatch board doesn't show which accounts are Gold, so the dispatcher treats every job equally — and the cherry-pick problem hits Gold customers the same way it hits everyone else.
Together, these failures produce the watermelon effect: green on the outside (your monthly report shows 92% attainment), red on the inside (the customer's experience tells them you're slipping). That gap is what kills renewals.
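To make that gap concrete, here is a small sketch using made-up job records. The customer names and job counts are hypothetical, chosen only so the aggregate lands on 92%. It computes attainment once across all jobs and once per customer.

```python
from collections import defaultdict

# Hypothetical month of jobs as (customer, met_sla) pairs.
# 50 jobs in total, 4 breaches, all at one account.
jobs = (
    [("Acme Retail", True)] * 20
    + [("Northside Clinic", True)] * 6
    + [("Northside Clinic", False)] * 4
    + [("Harbor Offices", True)] * 20
)


def attainment(records):
    """Percentage of jobs that met their SLA."""
    records = list(records)
    return 100.0 * sum(1 for _, met in records if met) / len(records)


def attainment_by_customer(records):
    """Same calculation, grouped by customer."""
    grouped = defaultdict(list)
    for customer, met in records:
        grouped[customer].append((customer, met))
    return {customer: attainment(rows) for customer, rows in grouped.items()}


overall = attainment(jobs)
per_customer = attainment_by_customer(jobs)

print(f"overall: {overall:.0f}%")
for customer, pct in per_customer.items():
    print(f"{customer}: {pct:.0f}%")
```

In this invented data set the aggregate is 92%, while the one account that absorbed every breach sits at 60%. The per-customer view is the number that customer actually experiences.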
Step 1: Define your SLA tiers
Before anything else, you need a tier structure. Most field-service contracts collapse into three or four tiers, and the structure below tracks closely with how public-sector facility maintenance frameworks define priority. For example, the published maintenance priority schedule for St. Mary's County, Maryland defines Emergency work orders with a 2-hour response during working hours, an 8-hour response off-hours, and a 24-hour completion target, with progressively longer windows for Urgent, Routine, and Minor work.
| Tier | Typical customer | Response time | Arrival time | Resolution target | First-time-fix target | Uptime guarantee |
|---|---|---|---|---|---|---|
| Bronze (standard) | Residential, light commercial, single-site | 24 hours | Next business day | 72 hours | 75% | None |
| Silver (priority) | Multi-site commercial, retail | 8 hours | 8 hours | 48 hours | 80% | None |
| Gold (premium) | Healthcare, critical commercial, high-volume retail | 4 hours | 4 hours | 24 hours | 85% | 98% |
| Platinum (mission-critical) | Data centers, hospitals, labs, life-safety systems | 1 hour | 2 hours | 8 hours | 90% | 99.5%+ |
Two things to flag. First, every commitment in this table costs you something — fewer dispatch options, more on-call coverage, more truck-stock investment, more buffer time built into other accounts. If you're going to commit to Platinum SLAs, you need to price them like Platinum: the operational load is real. Second, response time and arrival time are different things in field service even though IT-SLA templates often conflate them. Response is "we acknowledge your request and assign a technician." Arrival is "the technician is on-site." The customer cares more about arrival; your dispatcher's metric is response. Both belong in the contract.
Build your real tiers from the contracts you already have. Pull the response/resolution language out of every active service agreement, sort by tier, and look at the spread. You'll usually find one of two problems: too many bespoke tiers (every big customer negotiated their own variant) or too few (everybody is on the same template and Platinum customers are getting Bronze service). Consolidating to four tiers — even if it means renegotiating a handful of edge-case contracts at renewal — pays back fast in dispatch simplicity.
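Once the tiers are settled, it helps to hold them as structured data rather than as prose in a contract PDF. The sketch below encodes the example table above. The class name `SlaTier`, the field names, and the `hours` helper are illustrative assumptions, not a schema from any particular platform.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class SlaTier:
    name: str
    response: timedelta
    arrival: Optional[timedelta]       # None: "next business day", needs a calendar rule
    resolution: timedelta
    first_time_fix_target: float       # fraction of jobs fixed on the first visit
    uptime_guarantee: Optional[float]  # None: no uptime commitment


def hours(n: int) -> timedelta:
    return timedelta(hours=n)


# Values copied from the example table; replace them with your own contract terms.
TIERS = {
    "bronze": SlaTier("Bronze", hours(24), None, hours(72), 0.75, None),
    "silver": SlaTier("Silver", hours(8), hours(8), hours(48), 0.80, None),
    "gold": SlaTier("Gold", hours(4), hours(4), hours(24), 0.85, 0.98),
    # The table lists "99.5%+", so 0.995 is treated as a floor here.
    "platinum": SlaTier("Platinum", hours(1), hours(2), hours(8), 0.90, 0.995),
}

# Strictest response commitment first, for example to order a dispatch board.
by_urgency = sorted(TIERS.values(), key=lambda t: t.response)
print([t.name for t in by_urgency])
```

Keeping response and arrival as separate fields mirrors the point above: they are different commitments and should be tracked separately.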
Step 2: Map SLAs to customer accounts at intake
The tier structure means nothing if the dispatcher doesn't see it. Every customer record in your system needs a tier flag, and every job created against that customer needs to inherit it automatically. This is the single most under-implemented piece of SLA infrastructure in field service — the contract says Gold, the dispatch board says nothing, and the response time slips because the dispatcher had no way to know.
Three intake rules to enforce.
Tier flag on the customer master record. When a contract is signed or renewed, somebody updates the customer's tier in the system. This belongs in a closing checklist tied to contract execution, not a "we'll get to it" task. If the contract terms changed at renewal, the tier flag and the response/resolution clocks behind it have to change the same day.
Job inherits tier from customer at creation. The dispatch board shows the SLA clock and remaining time on every job, color-coded by tier. The Platinum job lands in red urgency the moment it's created, not when the dispatcher remembers to check.
Override path with audit. If a particular job needs a different SLA than the customer's default tier (an emergency walk-up from a standard customer; a non-critical PM at a Platinum site), the dispatcher can override — but the override is logged with reason and timestamp. This protects you from drift and gives you a clean audit trail when a customer disputes a service credit.
The reason this matters operationally: the dispatcher doesn't have time to look up tier terms in the middle of an emergency call. The system has to surface the relevant clock, qualified-tech list, and customer preferences in one screen. Anything more friction-laden than that gets bypassed.
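The three intake rules can be sketched in a few lines. This is a minimal illustration, not any product's data model: `Customer`, `Job`, `TierOverride`, `create_job`, and `override_tier` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Customer:
    name: str
    tier: str  # set from the contract at signing or renewal


@dataclass
class TierOverride:
    from_tier: str
    to_tier: str
    reason: str
    by: str
    at: datetime


@dataclass
class Job:
    customer: Customer
    tier: str
    overrides: List[TierOverride] = field(default_factory=list)


def create_job(customer: Customer) -> Job:
    """A new job inherits the customer's tier automatically."""
    return Job(customer=customer, tier=customer.tier)


def override_tier(job: Job, new_tier: str, reason: str, by: str,
                  at: Optional[datetime] = None) -> None:
    """Change the tier for one job and log who did it, when, and why."""
    if not reason.strip():
        raise ValueError("an override needs a reason")
    stamp = at or datetime.now(timezone.utc)
    job.overrides.append(TierOverride(job.tier, new_tier, reason, by, stamp))
    job.tier = new_tier


customer = Customer("Harbor Offices", tier="bronze")
job = create_job(customer)
override_tier(job, "gold", "emergency walk-up", by="dispatcher-1")
print(job.tier, len(job.overrides))
```

Refusing an override without a reason is the design choice that matters here: the audit trail is only useful in a service-credit dispute if every entry explains itself.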
Step 3: Set escalation triggers
Reactive SLA tracking — "the breach already happened, here's the report" — doesn't help you. Predictive escalation does. The pattern that works in field service is a 4-stage trigger model based on percentage of the SLA window consumed.
50% — Awareness. The clock has crossed the halfway mark. No action required, but the dispatch board flags the job in yellow. The dispatcher should know which jobs are mid-window so they don't accidentally make a reassignment decision that pushes a job past breach.
75% — Active monitoring. A push notification fires to the dispatcher: "Job #4231 is at 75% of SLA window. Tech currently 35 minutes from site." The dispatcher decides — keep current assignment, swap to a closer tech, or escalate to the on-call backup. The decision happens before breach is inevitable.
90% — Pre-breach intervention. Notification fires to dispatcher AND ops manager. At this stage, the question changes from "can we make it?" to "who do we call?" If the breach is about to happen, the customer hears from you first — not after.
100% — Breach logged with reason code. The breach is recorded against the job with a structured reason: traffic, parts unavailable, customer-caused delay, no qualified tech available, weather. Reason codes turn breach data into operational improvement data — which you'll need for Step 5.
Two technical notes. First, escalation channels need to match how your team actually works. Push notification to a phone the dispatcher checks every two minutes is useful. An email that lands in an inbox they read once a day is not. Second, escalation is only as good as the rule that triggers it — if your SLA clocks are wrong (Step 1) or the tier flag is wrong (Step 2), you're escalating noise.
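The four-stage model reduces to one calculation: the fraction of the SLA window consumed, compared against the thresholds. The sketch below assumes the clock start and window are already known. The stage names and action strings are illustrative labels for the stages described above.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# (threshold, stage, action), checked from most to least severe.
STAGES = [
    (1.00, "breach", "log breach with reason code"),
    (0.90, "pre_breach", "notify dispatcher and ops manager"),
    (0.75, "active_monitoring", "notify dispatcher"),
    (0.50, "awareness", "flag job yellow on the dispatch board"),
]


def window_consumed(started_at: datetime, window: timedelta, now: datetime) -> float:
    """Fraction of the SLA window used so far (1.0 means the deadline)."""
    return (now - started_at) / window


def escalation_stage(consumed: float) -> Optional[Tuple[str, str]]:
    """Return (stage, action) for the highest threshold crossed, or None."""
    for threshold, stage, action in STAGES:
        if consumed >= threshold:
            return stage, action
    return None


started = datetime(2024, 3, 4, 8, 0)   # request received
window = timedelta(hours=4)            # e.g. a Gold response window
for clock in (datetime(2024, 3, 4, 9, 0),
              datetime(2024, 3, 4, 10, 30),
              datetime(2024, 3, 4, 11, 15),
              datetime(2024, 3, 4, 11, 45),
              datetime(2024, 3, 4, 12, 30)):
    used = window_consumed(started, window, clock)
    print(clock.time(), f"{used:.0%}", escalation_stage(used))
```

In a live system this check would run on a schedule or on each board refresh, and a notification would fire only when a job moves into a new stage, so the dispatcher is not paged repeatedly for the same threshold.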
Step 4: Pause and resume SLAs correctly
Most spreadsheet trackers don't pause SLAs. They should. Real customer contracts almost always include conditions that stop the clock, and if you're not honoring them in your tracking, you're recording breaches you're not contractually responsible for, and the monthly report from Step 5 looks worse than reality.
Common pause conditions in field-service contracts:
- Waiting for customer-supplied access (security clearance, after-hours building access, key handoff)
- Waiting for customer-supplied parts or materials
- Waiting for customer approval on a quote, change order, or scope expansion
- Force-majeure conditions explicitly carved out in the contract (severe weather, regional emergencies, utility outages)
- Pre-scheduled maintenance windows where the customer has agreed downtime is acceptable
The technical implementation: every SLA clock needs a pause/resume action with a reason code, logged with a timestamp. When the dispatcher pauses for "waiting on customer key access," the clock stops; when access is granted, the clock resumes from the pause point. The audit trail proves the pause was legitimate when the customer reviews the monthly report.
The discipline implementation: train the dispatchers to use the pause feature aggressively but honestly. Aggressive use protects you from breach inflation. Dishonest use — pausing for "weather" when the real cause was a parts shortage — gets caught by the customer the second time and destroys trust permanently.
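A pausable clock is a small amount of logic. The sketch below is one way to model it, assuming the clock starts at the contractual event (request received, not ticket created). `SlaClock` and `Pause` are hypothetical names, and the reason strings are free text here where a real system would use a fixed code list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Pause:
    reason: str
    started_at: datetime
    ended_at: Optional[datetime] = None  # None while the clock is still paused


@dataclass
class SlaClock:
    started_at: datetime   # the contractual start event
    window: timedelta
    pauses: List[Pause] = field(default_factory=list)

    def pause(self, reason: str, at: datetime) -> None:
        if self.pauses and self.pauses[-1].ended_at is None:
            raise ValueError("clock is already paused")
        self.pauses.append(Pause(reason, at))

    def resume(self, at: datetime) -> None:
        if not self.pauses or self.pauses[-1].ended_at is not None:
            raise ValueError("clock is not paused")
        self.pauses[-1].ended_at = at

    def elapsed(self, now: datetime) -> timedelta:
        """Wall-clock time since start, minus all paused time."""
        paused = timedelta()
        for p in self.pauses:
            end = p.ended_at if p.ended_at is not None else now
            paused += end - p.started_at
        return (now - self.started_at) - paused

    def remaining(self, now: datetime) -> timedelta:
        return self.window - self.elapsed(now)


clock = SlaClock(datetime(2024, 3, 4, 8, 0), timedelta(hours=4))
clock.pause("waiting on customer key access", at=datetime(2024, 3, 4, 9, 0))
clock.resume(at=datetime(2024, 3, 4, 10, 30))
print(clock.remaining(datetime(2024, 3, 4, 12, 0)))
```

Without the pause, this job would show as breached at 12:00. With the 90-minute pause honored, it still has an hour and a half left, and the `pauses` list is the audit trail that backs that up.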
Step 5: Report on SLA performance monthly
The monthly SLA report is the artifact your customer renews on. Get it right and renewals are conversations, not negotiations. Get it wrong and every renewal is a fight.
What belongs in a monthly SLA report for each commercial customer:
- SLA attainment percentage broken down by metric — response, arrival, resolution, first-time-fix
- Breach detail — every breach in the month, with reason code, duration past breach, and corrective action
- Trend lines — current month vs. trailing 3 months, so improvement or drift is visible
- Service-credit calculation if applicable — what's owed under the contract terms, computed automatically
- Top jobs by complexity or repeat-visit count — gives the customer signal on which assets need replacement vs. continued repair
Two design rules. First, the report should run automatically on the first business day of the month and land in the customer's inbox the same day. If your account manager has to manually compile it from spreadsheets, three things happen: it ships late, the numbers are wrong, and the customer notices both. Second, the report should match exactly what the customer is seeing in their portal in real time. If the monthly summary disagrees with the live data, the customer trusts neither.
Industry guides from maintenance-management vendors and B2B-support platforms commonly suggest 95% SLA attainment as a strong target for commercial contracts and 85%+ first-time-fix as a healthy benchmark. Track both. Below those thresholds, the renewal conversation gets harder; above them, you have a story to tell about why your premium tier costs what it costs.
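The core of the report is a per-metric attainment rate checked against a target, plus the list of breaches behind it. The sketch below uses invented job data. Applying the 95% figure to response, arrival, and resolution individually is an assumption made for illustration, and `JobResult` and `monthly_summary` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Targets from the benchmarks above; the per-metric split is an assumption.
TARGETS = {"response": 0.95, "arrival": 0.95, "resolution": 0.95, "first_time_fix": 0.85}


@dataclass
class JobResult:
    job_id: str
    met: Dict[str, bool]                 # metric name -> met the commitment?
    breach_reason: Optional[str] = None


def monthly_summary(results):
    """Attainment per metric for one customer, plus the breach list."""
    summary = {}
    for metric, target in TARGETS.items():
        rate = sum(1 for r in results if r.met[metric]) / len(results)
        summary[metric] = {"attainment": rate, "target": target, "on_target": rate >= target}
    breaches = [(r.job_id, r.breach_reason) for r in results if r.breach_reason]
    return summary, breaches


# Ten hypothetical jobs for one customer: one late arrival, one repeat visit.
ALL_MET = {metric: True for metric in TARGETS}
results = [JobResult(f"J{i}", dict(ALL_MET)) for i in range(1, 11)]
results[2].met["arrival"] = False
results[2].breach_reason = "traffic"
results[6].met["first_time_fix"] = False
results[6].breach_reason = "parts unavailable"

summary, breaches = monthly_summary(results)
for metric, row in summary.items():
    print(f"{metric}: {row['attainment']:.0%} (target {row['target']:.0%})")
print(breaches)
```

Running the same function over the trailing three months gives the trend lines. The service-credit calculation depends on contract terms, so it is left out here.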
Common SLA tracking mistakes
Patterns we see repeatedly in mid-market field-service operations:
- Tracking only on emergency calls. Resolution-time SLAs on standard work matter too. A Gold customer with a 24-hour resolution commitment whose PM visit drifts to 36 hours has a breach, even though no one called it an emergency.
- One-size-fits-all dispatch. The dispatcher treats every job equally because the dispatch board doesn't show tiers. The Platinum job gets the same response as the Bronze walk-up, and you breach the contract you can least afford to lose.
- Counting business hours wrong. Some SLAs run on 24/7 clocks (data centers, hospitals); others pause overnight or on weekends. Mixing them in the same tracker silently rewrites your contracts.
- Watermelon reporting. Aggregate attainment is 92%, but the four breached jobs were all at one specific Gold customer. The customer's experience isn't 92% — it's a series of misses they remember vividly. Per-customer breakdown beats overall percentage every time.
- No exception handling. When the breach was customer-caused (no site access, no decision on quote), the system records it as your breach. Without pause/resume discipline (Step 4), your numbers look worse than reality and you can't defend them.
- Manual escalation only. The dispatcher is supposed to "watch the board" for breaches. During a busy day they can't. Automated 50/75/90 thresholds exist because human attention is the wrong tool for this.
- Ignoring first-time-fix as an SLA. Resolution time looks good if you keep returning until the problem is fixed. First-time-fix tells the truth about whether you actually solved it.
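The business-hours mistake is easy to see with one job measured two ways. The sketch below assumes an 08:00 to 17:00, Monday to Friday calendar and naive local datetimes. Real contracts will define their own hours, holidays, and time zones, and `business_elapsed` is a hypothetical helper.

```python
from datetime import datetime, time, timedelta


def business_elapsed(start: datetime, end: datetime,
                     open_at: time = time(8, 0), close_at: time = time(17, 0),
                     workdays=range(0, 5)) -> timedelta:
    """Time between start and end that falls inside business hours.

    workdays uses datetime.weekday() numbering: 0 is Monday, 6 is Sunday.
    """
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() in workdays:
            # Clip the interval to this day's opening hours.
            window_start = max(start, datetime.combine(day, open_at))
            window_end = min(end, datetime.combine(day, close_at))
            if window_end > window_start:
                total += window_end - window_start
        day += timedelta(days=1)
    return total


received = datetime(2024, 3, 1, 16, 0)   # a Friday afternoon
arrived = datetime(2024, 3, 4, 10, 0)    # the following Monday morning

wall_clock = arrived - received
business = business_elapsed(received, arrived)
print(f"24/7 clock: {wall_clock.total_seconds() / 3600:.0f} h")
print(f"business-hours clock: {business.total_seconds() / 3600:.0f} h")
```

The same job is 66 hours old on a 24/7 clock and 3 hours old on a business-hours clock. Against an 8-hour commitment, one reading is a serious breach and the other is comfortably inside the window, which is why the clock type has to be stored per contract.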
What good SLA software actually does
Most maintenance-management and field-service tools claim "SLA tracking." What separates a real implementation from a checkbox feature is whether the SLA clock is wired into the dispatch decision, not just the reporting layer.
A working SLA system shows the live clock on the dispatch board next to every job. It pauses and resumes with a logged reason. It fires escalation notifications at 50/75/90% of the window before breach happens — not after. It feeds tier and remaining time into the dispatch matching engine, so an SLA-at-risk job gets routed to the closest qualified tech automatically instead of waiting for the dispatcher to notice. And it generates per-customer monthly reports without anyone copy-pasting from a spreadsheet.
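As a rough illustration of what "feeding remaining time into the matching engine" can mean, here is a simple greedy heuristic: handle the job closest to breach first, and give it the nearest qualified technician. This is a teaching sketch under stated assumptions, not a description of how any particular product's engine works. The names, skills, and drive times are all invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Tech:
    name: str
    skills: Set[str]
    minutes_to_site: Dict[str, float]  # job_id -> estimated drive time


@dataclass
class OpenJob:
    job_id: str
    required_skill: str
    minutes_remaining: float  # time left on the SLA clock


def assign(jobs: List[OpenJob], techs: List[Tech]) -> Dict[str, Optional[str]]:
    """Most urgent job first; each job gets the closest qualified free tech."""
    assignments: Dict[str, Optional[str]] = {}
    available = list(techs)
    for job in sorted(jobs, key=lambda j: j.minutes_remaining):
        qualified = [t for t in available if job.required_skill in t.skills]
        if not qualified:
            assignments[job.job_id] = None  # needs a human decision
            continue
        best = min(qualified, key=lambda t: t.minutes_to_site[job.job_id])
        assignments[job.job_id] = best.name
        available.remove(best)
    return assignments


techs = [
    Tech("Ana", {"hvac"}, {"J1": 15, "J2": 10, "J3": 25}),
    Tech("Ben", {"hvac", "electrical"}, {"J1": 40, "J2": 20, "J3": 30}),
]
jobs = [
    OpenJob("J2", "hvac", minutes_remaining=200),
    OpenJob("J1", "hvac", minutes_remaining=30),
    OpenJob("J3", "refrigeration", minutes_remaining=90),
]
print(assign(jobs, techs))
```

Ana is the closest tech to both HVAC jobs, but she goes to J1 because it is nearest to breach, and J2 falls to Ben. A production engine would also weigh workload and customer preferences, which this sketch ignores.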
FSM Navigator's intelligent dispatch engine treats SLA urgency as a first-class input alongside skills, proximity, and workload — so when a Platinum job is at 75% of its window, the engine surfaces it for re-routing before breach. The dispatch board shows live SLA clocks on every job. Push notifications fire to the dispatcher's phone at the configured thresholds. Per-customer attainment reports run on schedule. Breach prediction is the difference between a system that documents your failures and a system that prevents them.
A 30-day SLA enforcement rollout plan
Most operations teams try to roll SLA tracking out as a 90-day project and lose momentum at week six. A 4-week plan with one milestone per week — and one accountable owner — gets you to live tracking faster, even if some details get refined later.
Week 1 — Audit and tier definition
Pull every active service agreement. Extract the SLA terms (response, arrival, resolution, first-time-fix, any uptime commitments). Sort customers by current commitment level. Define your 3 or 4 tiers and map every customer to one. Identify the 5-10 customer contracts that don't fit cleanly — these are renegotiation candidates at next renewal. Owner: ops manager + account management.
Week 2 — System configuration and tier mapping
Configure the tiers in your field-service platform. Add the tier flag to every customer master record. Verify the SLA clock starts at the right event (request received, not ticket created) and respects business-hour rules where applicable. Set up the pause/resume reason codes from Step 4. Owner: ops manager + admin user.
Week 3 — Escalation triggers and dispatcher training
Configure the 50/75/90% escalation thresholds. Wire push notifications to the dispatcher and the ops manager. Run a half-day training session with the dispatch team on tier-aware dispatch decisions, the pause/resume workflow, and the override audit trail. Run two days of pilot operation with the team watching for false positives or misconfigured rules. Owner: ops manager + lead dispatcher.
Week 4 — Reporting setup and customer communication
Configure the monthly per-customer SLA report. Run a sample report against last month's actual data and verify the numbers match what your accounts team would have compiled manually. Send a short note to your top 10 commercial customers letting them know they'll start receiving the monthly report on the first of next month — and walk one or two of them through it personally. Owner: ops manager + account management.
By the end of week 4 you have live SLA tracking with predictive escalation, tier-aware dispatch, and customer-facing reporting. Compared with a 90-day project, that puts tracking in place roughly two months sooner, in time to protect the renewals coming up next quarter.
From spreadsheet to system
SLA tracking in a spreadsheet is the silent breach generator your operation can't afford. Customers notice slips weeks before your monthly report does. The fix isn't more careful spreadsheet work — it's wiring the SLA clock into the daily dispatch decision so breaches get prevented, not documented.
FSM Navigator's intelligent dispatch engine treats SLA urgency as a first-class matching factor, with live clocks on the dispatch board, pause/resume discipline built in, and breach-prediction escalations that fire before the breach lands. Works from day one — no training period, no black box. Configure your tiers Tuesday, dispatch correctly Wednesday. For a deeper read on where the dispatch leak comes from in the first place, see the hidden cost of manual dispatch.