# GSD: Asset Sync Retry Logic
## File: `cmms_import.go`

---

## WHAT TO ADD

When `SyncAllAssetsFromInnomaint` fails to fetch an asset from the Innomaint API,
it currently logs the error and moves on to the next asset. We need:

1. **Up to 3 attempts** per asset with exponential backoff between them (1s, then 2s)
2. **Track failed assets** in `ic3_sync_log` with error details
3. **Resume endpoint** — retry only previously failed assets
4. **Per-asset timeout** — 15s max per API call
5. **Context cancellation** — stop cleanly if server shuts down mid-sync

---

## STEP 1 — Add retry constants

At the top of `cmms_import.go` after existing constants, ADD:

```go
const (
    innomaintMaxRetries = 3
    innomaintRetryBase  = 1 * time.Second // doubles per retry: 1s, then 2s
    innomaintAPITimeout = 15 * time.Second
    innomaintRateMS     = 200 // ms between requests
)
```

---

## STEP 2 — Add FetchWithRetry function

ADD this new function after `FetchInnomaintAssetDetail` (it needs `context`, `fmt`, `log`, `net/http`, `strings`, and `time` imported):

```go
// FetchInnomaintAssetDetailWithRetry wraps FetchInnomaintAssetDetail
// with up to innomaintMaxRetries attempts using exponential backoff.
func FetchInnomaintAssetDetailWithRetry(ctx context.Context, client *http.Client, serialnumber string) (*InnomaintAPIResponse, error) {
    var lastErr error

    for attempt := 0; attempt < innomaintMaxRetries; attempt++ {
        if attempt > 0 {
            // Exponential backoff: 1s before attempt 2, 2s before attempt 3
            wait := innomaintRetryBase * time.Duration(1<<(attempt-1))
            log.Printf("Retry %d/%d for %q — waiting %s (last err: %v)",
                attempt, innomaintMaxRetries, serialnumber, wait, lastErr)

            select {
            case <-time.After(wait):
            case <-ctx.Done():
                return nil, fmt.Errorf("context cancelled during retry wait: %w", ctx.Err())
            }
        }

        // Per-call timeout
        callCtx, cancel := context.WithTimeout(ctx, innomaintAPITimeout)
        resp, err := FetchInnomaintAssetDetail(callCtx, client, serialnumber)
        cancel()

        if err == nil {
            if attempt > 0 {
                log.Printf("Retry succeeded for %q on attempt %d", serialnumber, attempt+1)
            }
            return resp, nil
        }

        lastErr = err

        // Don't retry on context cancellation
        if ctx.Err() != nil {
            return nil, fmt.Errorf("context cancelled: %w", ctx.Err())
        }

        // Don't retry on 404 — asset doesn't exist in Innomaint
        if strings.Contains(err.Error(), "status 404") {
            return nil, fmt.Errorf("asset not found in innomaint (404): %w", err)
        }
    }

    return nil, fmt.Errorf("all %d attempts failed for %q: %w", innomaintMaxRetries, serialnumber, lastErr)
}
```

---

## STEP 3 — Update SyncAllAssetsFromInnomaint

Find `func (db *DB) SyncAllAssetsFromInnomaint` and replace it entirely:

```go
// SyncAllAssetsFromInnomaint fetches all (or batched) assets from Innomaint API
// and enriches ic3_asset_master + related tables.
// batchSize: 0 = all assets (the helper queries cap this at 9999 rows)
// offsetStart: skip first N (for pagination/resume)
// retryFailed: true = only sync assets where last ic3_sync_log status='error'
func (db *DB) SyncAllAssetsFromInnomaint(ctx context.Context, batchSize, offsetStart int, retryFailed bool) (SyncResult, error) {
    start := time.Now()
    result := SyncResult{}

    var assetIDs []string
    var err error

    if retryFailed {
        // Only fetch assets whose most recent sync log entry is an error
        assetIDs, err = db.getFailedSyncAssets(ctx, batchSize)
    } else {
        assetIDs, err = db.getSyncAssetIDs(ctx, batchSize, offsetStart)
    }
    if err != nil {
        return result, fmt.Errorf("fetch asset ids: %w", err)
    }

    result.Total = len(assetIDs)
    if result.Total == 0 {
        result.Duration = "0"
        return result, nil
    }

    log.Printf("Starting sync: %d assets (offset=%d, retryFailed=%v)", result.Total, offsetStart, retryFailed)

    client := &http.Client{
        Timeout: innomaintAPITimeout,
        Transport: &http.Transport{
            MaxIdleConns:       10,
            IdleConnTimeout:    30 * time.Second,
            DisableCompression: false,
        },
    }

    for i, cmmsID := range assetIDs {
        // Check context before each asset
        select {
        case <-ctx.Done():
            log.Printf("Sync cancelled at asset %d/%d: %v", i+1, result.Total, ctx.Err())
            result.Duration = strconv.FormatInt(time.Since(start).Milliseconds(), 10)
            return result, nil
        default:
        }

        // Rate limit between requests
        if i > 0 {
            time.Sleep(time.Duration(innomaintRateMS) * time.Millisecond)
        }

        log.Printf("[%d/%d] Syncing: %s", i+1, result.Total, cmmsID)

        // Fetch with retry
        apiResp, fetchErr := FetchInnomaintAssetDetailWithRetry(ctx, client, cmmsID)
        if fetchErr != nil {
            result.Failed++
            errMsg := fetchErr.Error()
            result.Errors = append(result.Errors, fmt.Sprintf("%s: %s", cmmsID, errMsg))
            log.Printf("FAILED %s: %v", cmmsID, fetchErr)

            // Log failure to ic3_sync_log
            db.pool.Exec(ctx, `
                INSERT INTO ic3_sync_log (serialnumber, sync_type, status, error_message, synced_at)
                VALUES ($1, 'api_sync', 'error', $2, NOW())`,
                cmmsID, errMsg,
            )
            continue
        }

        // Enrich DB — each asset gets its own transaction
        tx, txErr := db.pool.Begin(ctx)
        if txErr != nil {
            result.Failed++
            result.Errors = append(result.Errors, fmt.Sprintf("%s: tx begin: %v", cmmsID, txErr))
            continue
        }

        enrichErr := db.EnrichAssetFromAPI(ctx, tx, apiResp)
        if enrichErr != nil {
            tx.Rollback(ctx)
            result.Failed++
            errMsg := enrichErr.Error()
            result.Errors = append(result.Errors, fmt.Sprintf("%s: enrich: %s", cmmsID, errMsg))
            log.Printf("ENRICH FAILED %s: %v", cmmsID, enrichErr)

            db.pool.Exec(ctx, `
                INSERT INTO ic3_sync_log (serialnumber, sync_type, status, error_message, synced_at)
                VALUES ($1, 'api_sync', 'error', $2, NOW())`,
                cmmsID, errMsg,
            )
            continue
        }

        if commitErr := tx.Commit(ctx); commitErr != nil {
            result.Failed++
            result.Errors = append(result.Errors, fmt.Sprintf("%s: commit: %v", cmmsID, commitErr))
            continue
        }

        result.Success++

        // Log success
        db.pool.Exec(ctx, `
            INSERT INTO ic3_sync_log (serialnumber, sync_type, status, synced_at)
            VALUES ($1, 'api_sync', 'success', NOW())`,
            cmmsID,
        )

        // Progress log every 50 assets
        if (i+1)%50 == 0 {
            log.Printf("Progress: %d/%d synced, %d failed", result.Success, result.Total, result.Failed)
        }
    }

    // Log batch summary to cmms_sync_log
    db.pool.Exec(ctx, `
        INSERT INTO cmms_sync_log (sync_type, status, assets_synced, duration_ms)
        VALUES ('api_sync', 'complete', $1, $2)`,
        result.Success,
        int(time.Since(start).Milliseconds()),
    )

    log.Printf("Sync complete: %d success, %d failed, %d skipped in %dms",
        result.Success, result.Failed, result.Skipped,
        time.Since(start).Milliseconds())

    result.Duration = strconv.FormatInt(time.Since(start).Milliseconds(), 10)
    return result, nil
}

// getSyncAssetIDs returns cmms_asset_ids in report_seq_no order
func (db *DB) getSyncAssetIDs(ctx context.Context, batchSize, offset int) ([]string, error) {
    limit := 9999
    if batchSize > 0 {
        limit = batchSize
    }
    rows, err := db.pool.Query(ctx, `
        SELECT cmms_asset_id FROM ic3_asset_master
        WHERE cmms_asset_id IS NOT NULL
        ORDER BY report_seq_no ASC NULLS LAST, asset_id ASC
        LIMIT $1 OFFSET $2`, limit, offset)
    if err != nil {
        return nil, err
    }
    defer rows.Close()
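    var ids []string
    for rows.Next() {
        var id string
        // Surface scan failures instead of appending an empty ID
        if err := rows.Scan(&id); err != nil {
            return nil, err
        }
        ids = append(ids, id)
    }
    // rows.Err reports any error that ended iteration early
    return ids, rows.Err()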
}

// getFailedSyncAssets returns cmms_asset_ids whose most recent api_sync log entry is an error
func (db *DB) getFailedSyncAssets(ctx context.Context, limit int) ([]string, error) {
    if limit <= 0 {
        limit = 9999
    }
    // Get assets whose last sync log entry was an error
    rows, err := db.pool.Query(ctx, `
        SELECT DISTINCT am.cmms_asset_id
        FROM ic3_asset_master am
        INNER JOIN (
            SELECT serialnumber, MAX(synced_at) as last_sync
            FROM ic3_sync_log
            WHERE sync_type = 'api_sync'
            GROUP BY serialnumber
        ) latest ON latest.serialnumber = am.cmms_asset_id
        INNER JOIN ic3_sync_log sl
            ON sl.serialnumber = am.cmms_asset_id
            AND sl.synced_at   = latest.last_sync
            AND sl.status      = 'error'
        WHERE am.cmms_asset_id IS NOT NULL
        ORDER BY am.cmms_asset_id
        LIMIT $1`, limit)
    if err != nil {
        return nil, err
    }
    defer rows.Close()
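    var ids []string
    for rows.Next() {
        var id string
        // Surface scan failures instead of appending an empty ID
        if err := rows.Scan(&id); err != nil {
            return nil, err
        }
        ids = append(ids, id)
    }
    // rows.Err reports any error that ended iteration early
    return ids, rows.Err()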
}
```

---

## STEP 4 — Update syncInnomaintAssetsHandler in handlers.go

Find `syncInnomaintAssetsHandler` and replace:

```go
// POST /api/admin/cmms/sync?batch=100&offset=0&retry_failed=false
func syncInnomaintAssetsHandler(db *DB) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")

        q := r.URL.Query()
        batch, _ := strconv.Atoi(q.Get("batch"))
        offset, _ := strconv.Atoi(q.Get("offset"))
        retryFailed := q.Get("retry_failed") == "true"

        if batch <= 0 {
            batch = 100
        }

        // Timeout: 300ms per asset + buffer
        timeout := time.Duration(batch)*300*time.Millisecond + 30*time.Second
        ctx, cancel := context.WithTimeout(r.Context(), timeout)
        defer cancel()

        result, err := db.SyncAllAssetsFromInnomaint(ctx, batch, offset, retryFailed)
        if err != nil {
            w.WriteHeader(http.StatusInternalServerError)
            json.NewEncoder(w).Encode(map[string]string{"error": err.Error()})
            return
        }

        w.WriteHeader(http.StatusOK)
        json.NewEncoder(w).Encode(result)
    }
}
```

---

## STEP 5 — Add retry-failed endpoint in handlers.go + main.go

In `handlers.go` ADD:

```go
// POST /api/admin/cmms/sync/retry-failed
// Retries only assets whose most recent sync log entry is an error
func retryFailedSyncHandler(db *DB) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")

        ctx, cancel := context.WithTimeout(r.Context(), 10*time.Minute)
        defer cancel()

        result, err := db.SyncAllAssetsFromInnomaint(ctx, 0, 0, true)
        if err != nil {
            w.WriteHeader(http.StatusInternalServerError)
            json.NewEncoder(w).Encode(map[string]string{"error": err.Error()})
            return
        }

        json.NewEncoder(w).Encode(result)
    }
}
```

In `main.go` ADD route:
```go
mux.HandleFunc("POST /api/admin/cmms/sync/retry-failed", retryFailedSyncHandler(db))
```

---

## STEP 6 — Add sync status endpoint

In `handlers.go` ADD:

```go
// GET /api/admin/cmms/sync/status
// Returns overall sync counts plus the most recent batch log entries
func syncStatusHandler(db *DB) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")

        var summary struct {
            TotalAssets  int              `json:"total_assets"`
            SyncedAssets int              `json:"synced_assets"`
            FailedAssets int              `json:"failed_assets"`
            NeverSynced  int              `json:"never_synced"`
            LastSyncAt   *time.Time       `json:"last_sync_at"`
            LastBatchLog []map[string]any `json:"last_batch"`
        }

        db.pool.QueryRow(r.Context(), `SELECT COUNT(*) FROM ic3_asset_master`).Scan(&summary.TotalAssets)

        db.pool.QueryRow(r.Context(), `
            SELECT COUNT(DISTINCT serialnumber) FROM ic3_sync_log
            WHERE sync_type='api_sync' AND status='success'`).Scan(&summary.SyncedAssets)

        db.pool.QueryRow(r.Context(), `
            SELECT COUNT(*) FROM ic3_asset_master am
            WHERE NOT EXISTS (
                SELECT 1 FROM ic3_sync_log sl
                WHERE sl.serialnumber = am.cmms_asset_id
                AND sl.sync_type = 'api_sync'
            )`).Scan(&summary.NeverSynced)

        // Failed = assets whose LAST sync was error
        db.pool.QueryRow(r.Context(), `
            SELECT COUNT(DISTINCT am.cmms_asset_id)
            FROM ic3_asset_master am
            INNER JOIN (
                SELECT serialnumber, MAX(synced_at) as last_sync
                FROM ic3_sync_log WHERE sync_type='api_sync'
                GROUP BY serialnumber
            ) latest ON latest.serialnumber = am.cmms_asset_id
            INNER JOIN ic3_sync_log sl
                ON sl.serialnumber = am.cmms_asset_id
                AND sl.synced_at = latest.last_sync
                AND sl.status = 'error'`).Scan(&summary.FailedAssets)

        // Last sync timestamp
        var lastSync time.Time
        if err := db.pool.QueryRow(r.Context(), `
            SELECT synced_at FROM cmms_sync_log
            WHERE sync_type='api_sync'
            ORDER BY synced_at DESC LIMIT 1`).Scan(&lastSync); err == nil {
            summary.LastSyncAt = &lastSync
        }

        // Last 5 batch log entries
        logs, _ := db.GetSyncLog(r.Context(), 5)
        summary.LastBatchLog = logs

        json.NewEncoder(w).Encode(summary)
    }
}
```

In `main.go` ADD route:
```go
mux.HandleFunc("GET /api/admin/cmms/sync/status", syncStatusHandler(db))
```
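
For reference, the response shape can be previewed by encoding the same field layout with placeholder values. The `syncSummary` type name, the `encode` helper, and the numbers below are made up for this sketch; the point is that a nil `LastSyncAt` pointer and a nil `LastBatchLog` slice both serialize as `null`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// syncSummary copies the field layout of the anonymous struct in the handler.
type syncSummary struct {
	TotalAssets  int              `json:"total_assets"`
	SyncedAssets int              `json:"synced_assets"`
	FailedAssets int              `json:"failed_assets"`
	NeverSynced  int              `json:"never_synced"`
	LastSyncAt   *time.Time       `json:"last_sync_at"`
	LastBatchLog []map[string]any `json:"last_batch"`
}

// encode marshals a summary to a JSON string, returning "" on error.
func encode(s syncSummary) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Placeholder values for illustration only.
	fmt.Println(encode(syncSummary{TotalAssets: 120, SyncedAssets: 100, FailedAssets: 5, NeverSynced: 15}))
	// {"total_assets":120,"synced_assets":100,"failed_assets":5,"never_synced":15,"last_sync_at":null,"last_batch":null}
}
```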

---

## USAGE AFTER APPLYING

```bash
# Normal batch sync
curl -X POST "http://localhost:9090/api/admin/cmms/sync?batch=100&offset=0" \
  -H "Authorization: Bearer <token>"

# Retry only failed assets
curl -X POST "http://localhost:9090/api/admin/cmms/sync/retry-failed" \
  -H "Authorization: Bearer <token>"

# Check sync status
curl "http://localhost:9090/api/admin/cmms/sync/status" \
  -H "Authorization: Bearer <token>"
```

---

## RETRY BEHAVIOR

```
Attempt 1 → fail → wait 1s
Attempt 2 → fail → wait 2s
Attempt 3 → fail → log to ic3_sync_log status='error' → move to next asset
```

After a full batch, run retry-failed to pick up assets that failed on transient network errors:
```bash
curl -X POST "http://localhost:9090/api/admin/cmms/sync/retry-failed" \
  -H "Authorization: Bearer <token>"
```

Expected: `{"total":N,"success":N,"failed":0,...}`
