Designing an Offline-First Sync Engine for Mobile Apps

Dhruval Dhameliya · February 17, 2026 · 7 min read

A deep dive into building a reliable sync engine that keeps mobile apps functional without connectivity, covering conflict resolution, queue management, and real-world trade-offs.


Context

Mobile apps operate in unreliable network environments. Users expect instant feedback regardless of connectivity. An offline-first sync engine treats the local database as the source of truth and syncs with the server asynchronously.

Problem

Most mobile apps treat the network as a given. They show a spinner, make a request, render the response. This breaks in three common scenarios:

  • Flaky connections: elevators, tunnels, rural areas, crowded venues
  • High latency: emerging markets where round trips take 2 to 5 seconds
  • Aggressive battery optimization: the OS kills background connections on both Android and iOS

The core problem: how do you keep the app fully functional offline while ensuring data consistency when connectivity returns?

Constraints

  • Local database must be the single source of truth for reads
  • Mutations must be captured and queued for async sync
  • Conflict resolution must be deterministic and predictable
  • Sync must be idempotent (safe to retry any operation)
  • Battery and bandwidth must be respected (no sync on every keystroke)
  • The engine must recover from mid-sync crashes without data loss

Design

The sync engine sits between the app's data layer and the remote API. Four responsibilities:

  1. Local persistence: all reads and writes hit a local database
  2. Change tracking: mutations captured as an append-only operation log
  3. Sync scheduling: background process pushes and pulls when connectivity allows
  4. Conflict resolution: deterministic strategy when local and remote diverge
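
The four responsibilities above can be sketched as a single in-memory engine. This is an illustrative skeleton, not a real library: every name here (`InMemorySyncEngine`, `write`, `pendingOps`) is hypothetical, and a production version would back the store and log with a real local database.

```kotlin
// Minimal in-memory sketch of the engine's responsibilities (all names hypothetical).
class InMemorySyncEngine {
    private val store = mutableMapOf<String, Map<String, Any?>>()  // 1. local persistence
    private val opLog = mutableListOf<Pair<Long, String>>()        // 2. change tracking (append-only)
    private var clock = 0L                                         // logical timestamp source

    // Every mutation hits the local store AND appends to the operation log.
    fun write(entityId: String, payload: Map<String, Any?>) {
        store[entityId] = payload
        opLog += ++clock to entityId
    }

    // Reads never touch the network: the local store is the source of truth.
    fun read(entityId: String): Map<String, Any?>? = store[entityId]

    // 3. Sync scheduling and 4. conflict resolution would consume this log.
    fun pendingOps(): List<Pair<Long, String>> = opLog.toList()
}
```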

Figure: Offline-First Sync Engine Architecture


Operation Log

Every mutation gets written to an append-only log before touching the local database. Each entry contains:

  • Unique operation ID
  • Entity type and entity ID
  • Operation type (create / update / delete)
  • Logical timestamp (monotonically increasing counter, not wall clock)
  • Payload (for creates and updates)

data class SyncOperation(
    val id: String = UUID.randomUUID().toString(),
    val entityType: String,
    val entityId: String,
    val type: OperationType,
    val timestamp: Long,
    val payload: Map<String, Any?>?,
    val status: SyncStatus = SyncStatus.PENDING
)
 
enum class OperationType { CREATE, UPDATE, DELETE }
enum class SyncStatus { PENDING, IN_FLIGHT, SYNCED, FAILED }

Logical clocks avoid issues with users changing device time or timezone drift across devices.
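
A minimal sketch of such a clock, in the Lamport style: each local mutation ticks the counter, and each remote timestamp observed during a pull jumps the counter past it, so later local operations always sort after everything already seen. Persisting the counter across restarts (e.g. in the local database) is assumed but not shown.

```kotlin
import java.util.concurrent.atomic.AtomicLong

// Lamport-style logical clock: monotonic, immune to users changing device time.
// Persisting the counter across app restarts is assumed here.
class LogicalClock(start: Long = 0L) {
    private val counter = AtomicLong(start)

    // Called on every local mutation to stamp the operation.
    fun tick(): Long = counter.incrementAndGet()

    // Called when a remote timestamp is observed during sync: jump past it so
    // subsequent local writes sort after everything this device has seen.
    fun observe(remote: Long): Long =
        counter.updateAndGet { local -> maxOf(local, remote) + 1 }
}
```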

Sync Scheduling

Batch operations. Sync when conditions are favorable:

Trigger | Strategy
Network available | ConnectivityManager (Android) / NWPathMonitor (iOS)
Debounce | Wait 2 to 5 seconds after last write
Retry | Exponential backoff: 1s, 2s, 4s, 8s, capped at 60s
Periodic fallback | WorkManager / BGTaskScheduler every 15 minutes

import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch

class SyncScheduler(
    private val connectivityMonitor: ConnectivityMonitor,
    private val syncEngine: SyncEngine,
    private val scope: CoroutineScope
) {
    private var debounceJob: Job? = null

    // Debounce: each local write restarts the timer, so a burst of writes
    // results in a single sync once the user pauses.
    fun onLocalWrite() {
        debounceJob?.cancel()
        debounceJob = scope.launch {
            delay(3_000)
            if (connectivityMonitor.isConnected()) {
                syncEngine.push()
            }
        }
    }
}
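
The retry schedule from the table above (1s, 2s, 4s, 8s, capped at 60s) can be expressed as a pure function, which keeps it trivially testable. Jitter and the actual delay mechanics are left to the caller; the function name is illustrative.

```kotlin
// Backoff schedule from the table above: 1s, 2s, 4s, 8s, ... capped at 60s.
// Pure function; the caller handles actual delays and any jitter.
fun backoffMillis(attempt: Int, baseMillis: Long = 1_000, capMillis: Long = 60_000): Long {
    require(attempt >= 0) { "attempt must be non-negative" }
    // Clamp the exponent so the shift cannot overflow for pathological attempt counts.
    val exponent = minOf(attempt, 20)
    return (baseMillis shl exponent).coerceAtMost(capMillis)
}
```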

Conflict Resolution

Two devices edit the same record while both are offline. Three strategies, ordered by complexity:

Last-Write-Wins (LWW): the highest logical timestamp wins. Simple, but it silently discards the losing write. Acceptable for user preferences and read receipts.
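
A minimal LWW resolver might look like the following sketch. The tie-break on operation ID is an assumption added here so that two devices resolving the same conflict always pick the same winner; the `Versioned` wrapper is illustrative.

```kotlin
// Last-write-wins on logical timestamp. Equal timestamps are broken by
// operation ID so every device resolves the conflict identically
// (the tie-break rule is an assumption, not from a specific library).
data class Versioned(val opId: String, val timestamp: Long, val payload: Map<String, Any?>)

fun lww(local: Versioned, remote: Versioned): Versioned = when {
    local.timestamp != remote.timestamp ->
        if (local.timestamp > remote.timestamp) local else remote
    else ->
        if (local.opId > remote.opId) local else remote
}
```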

Field-Level Merge: merge at field level. Device A changes name, device B changes email, both survive. Conflict only when the same field is modified on both sides.

fun mergeFields(
    base: Map<String, Any?>,
    local: Map<String, Any?>,
    remote: Map<String, Any?>
): Map<String, Any?> {
    val merged = base.toMutableMap()
    for (key in (local.keys + remote.keys)) {
        val localChanged = local[key] != base[key]
        val remoteChanged = remote[key] != base[key]
        merged[key] = when {
            localChanged && !remoteChanged -> local[key]
            !localChanged && remoteChanged -> remote[key]
            localChanged && remoteChanged -> remote[key] // LWW fallback per field
            else -> base[key]
        }
    }
    return merged
}

Application-Level Resolution: domain-specific logic. Inventory systems sum deltas. Collaborative editors use CRDTs. Financial transactions require explicit user resolution.
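
The inventory case can illustrate why this differs from LWW: if two offline devices each record a sale against an absolute quantity, LWW loses one of them, whereas syncing signed deltas and summing them keeps both. A hedged sketch of that idea:

```kotlin
// Application-level resolution for an inventory count: each device records
// signed deltas instead of the absolute quantity, and the resolved value is
// the base plus the sum of all deltas from every device. With LWW on the
// absolute quantity, one device's sale would be silently lost.
fun resolveInventory(base: Int, localDeltas: List<Int>, remoteDeltas: List<Int>): Int =
    base + localDeltas.sum() + remoteDeltas.sum()
```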

Handling Deletes

Physical deletion creates a re-creation problem: if one device deletes a record and another hasn't synced, the un-synced device will re-create it.

Solution: tombstones. Mark records as deleted with a deletedAt timestamp. Propagate the tombstone via sync. Purge tombstones older than 30 days.

data class Entity(
    val id: String,
    val data: Map<String, Any?>,
    val updatedAt: Long,
    val deletedAt: Long? = null  // null = alive, non-null = tombstone
)
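
The 30-day purge can be sketched as a pure filter over that entity shape (repeated inside the snippet so it is self-contained). Injecting `now` rather than reading the system clock is an assumption made here for testability.

```kotlin
import java.util.concurrent.TimeUnit

data class Entity(
    val id: String,
    val data: Map<String, Any?>,
    val updatedAt: Long,
    val deletedAt: Long? = null  // null = alive, non-null = tombstone
)

// Purge tombstones older than the retention window (30 days in the text above).
// Live records and recent tombstones survive; `now` is injected for testability.
fun purgeTombstones(
    entities: List<Entity>,
    now: Long,
    retentionMillis: Long = TimeUnit.DAYS.toMillis(30)
): List<Entity> = entities.filter { e ->
    e.deletedAt == null || now - e.deletedAt < retentionMillis
}
```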

Ordering Guarantees

Operations on the same entity must be applied in order. Operations on different entities can be applied in any order.

  1. Group pending operations by entity ID
  2. Sort each group by logical timestamp
  3. Send sequentially per entity, wait for acknowledgment
  4. Different entities can sync concurrently
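
Steps 1 through 3 above can be sketched as follows. The `send` parameter stands in for the real network call and is assumed to return only after the server acknowledges the operation; concurrency across entities (step 4) is omitted to keep the sketch small.

```kotlin
// Steps 1-3: group pending ops by entity, sort each group by logical
// timestamp, and send sequentially per entity. `send` is a stand-in for the
// acknowledged network call (name and signature are assumptions).
data class Op(val entityId: String, val timestamp: Long, val payload: String)

fun pushInOrder(pending: List<Op>, send: (Op) -> Boolean): List<Op> {
    val acked = mutableListOf<Op>()
    pending.groupBy { it.entityId }                        // 1. group by entity ID
        .values
        .forEach { group ->
            for (op in group.sortedBy { it.timestamp }) {  // 2. sort by timestamp
                if (!send(op)) break                       // 3. on failure, stop this
                acked += op                                //    entity; later ops wait
            }
        }
    return acked
}
```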

Trade-offs

Gain | Cost
Works offline | Local database + operation log storage overhead
Instant UI feedback | Eventual consistency; UI may show stale data
Resilient to network failures | Conflict resolution complexity is domain-specific
Battery-friendly batching | Sync delay means data is not immediately available on other devices

For real-time multiplayer games or live auctions, this architecture is wrong. Know which category your app falls into before committing.

Failure Modes

Failure | Mitigation
Network drops mid-sync | Idempotent operations with the operation ID as a server-side idempotency key
App killed by OS during sync | Transactional batches: local DB update + queue insertion in one transaction
Double-send of operations | Mark as IN_FLIGHT during sync; reset to PENDING on failure
Permanently failing operations | Dead-letter queue after N retries for manual inspection
Clock skew between devices | Logical clocks instead of wall-clock timestamps
Tombstone not propagated | Periodic full-state reconciliation as fallback

Scaling Considerations

  • Operation log growth: compact the log periodically. Merge consecutive updates to the same entity into a single operation
  • Large backlogs: if a device comes online after an extended offline period, paginate the sync. Do not send 10,000 operations in one batch
  • Server-side fan-out: when multiple devices sync for the same user, the server must handle concurrent writes with proper locking or CAS (compare-and-swap)
  • Selective sync: not all entities need to be synced. Allow per-entity-type opt-in to reduce bandwidth and storage
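
The first point, log compaction, can be sketched as a single pass that collapses consecutive updates to the same entity into the latest one. This assumes updates carry the full payload rather than field diffs; with diffs you would merge the payload maps instead of keeping only the last.

```kotlin
// Compaction sketch: collapse consecutive UPDATEs to the same entity into the
// latest one, assuming full-payload updates (not field diffs). Creates and
// deletes are left untouched. The log is assumed to be in timestamp order.
data class LogOp(val entityId: String, val type: String, val timestamp: Long)

fun compact(log: List<LogOp>): List<LogOp> {
    val out = mutableListOf<LogOp>()
    for (op in log) {
        val last = out.lastOrNull()
        if (last != null && last.entityId == op.entityId &&
            last.type == "UPDATE" && op.type == "UPDATE"
        ) {
            out[out.size - 1] = op  // newer update supersedes the previous one
        } else {
            out += op
        }
    }
    return out
}
```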

Observability

Track these metrics to understand sync health in production:

  • Sync latency: time between local mutation and server acknowledgment
  • Queue depth: number of pending operations per device (alerts if consistently growing)
  • Conflict rate: percentage of sync operations that trigger conflict resolution
  • Failure rate: percentage of operations that enter the dead letter queue
  • Tombstone accumulation: count of active tombstones (indicates deletion patterns)

Instrument the sync engine to emit structured logs for each operation lifecycle: PENDING, IN_FLIGHT, SYNCED, FAILED, DEAD_LETTER.

Key Takeaways

  • Local database is the source of truth. The server is a peer that eventually catches up
  • Use logical clocks, not wall clocks
  • Conflict resolution strategy depends on the domain. Start with LWW, graduate to field-level merge when needed
  • Tombstones solve the delete propagation problem
  • Idempotency is non-negotiable. Every operation must be safe to retry
  • Start simple: local persistence, operation queue, LWW. Layer complexity as requirements demand

Final Thoughts

The best sync engines are invisible. The user edits data, puts the phone in a pocket, and everything converges. Building that experience requires careful thinking about operation logs, conflict resolution, ordering guarantees, and failure recovery.

Start with the minimum viable sync: local persistence, an operation queue, last-write-wins. Layer in field-level merging, smarter scheduling, and observability as usage patterns emerge. The architecture should grow with the product, not ahead of it.