
Conversation

@shubhammalhotra28 (Contributor) commented Oct 16, 2025

Analytics Consolidation - Issue #69

Overview

This PR implements comprehensive analytics consolidation by moving ALL performance metrics calculation from the application layer to the SDK layer. The app now acts purely as a display layer, with zero analytics calculations happening in the application code.

Problem Statement

Previously, analytics responsibilities were split between SDK and app:

  • Streaming mode: App manually tracked metrics (~105 lines of calculation code)
  • Non-streaming mode: SDK provided complete metrics
  • Result: Inconsistent calculations, code duplication, potential inaccuracies

Solution

Consolidated all analytics into the SDK with the following approach:

1. Created TokenCounter for Accurate Estimation

  • New TokenCounter.swift with improved heuristics that account for punctuation, whitespace, and newlines (sketched below)
  • Replaces the simple text.count / 4 estimation
  • Exposed via RunAnywhere.estimateTokenCount() for a consistent namespace
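
The exact weights live in TokenCounter.swift; below is a minimal sketch of this kind of heuristic, with illustrative constants and a hypothetical type name (not the SDK's actual values):

import Foundation

// Illustrative estimator; the SDK's TokenCounter may weight these differently.
enum RoughTokenCounter {
    static func estimateTokenCount(_ text: String) -> Int {
        guard !text.isEmpty else { return 0 }
        // Most English words map to roughly 1.3 BPE tokens.
        let words = text.split(whereSeparator: { $0.isWhitespace }).count
        // Punctuation marks often tokenize separately.
        let punctuation = text.filter { $0.isPunctuation }.count
        // Newlines are typically standalone tokens.
        let newlines = text.filter { $0.isNewline }.count
        let estimate = Double(words) * 1.3 + Double(punctuation) * 0.5 + Double(newlines)
        return max(1, Int(estimate.rounded()))
    }
}

let sample = "Hello, world!\nHow are you today?"
print(RoughTokenCounter.estimateTokenCount(sample)) // heuristic estimate
print(sample.count / 4)                             // old naive estimate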

2. Streaming with Built-in Analytics

  • New StreamingResult struct containing both token stream and metrics task
  • MetricsCollector actor for thread-safe metrics accumulation during streaming
  • Tracks: tokens/sec, time-to-first-token, thinking time, total latency
  • Metrics are calculated in real time as tokens are generated (sketched below)
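
To make the shapes concrete, here is a minimal sketch of the two pieces; everything beyond the stream and result fields is an assumption for illustration (see GenerationResult.swift and StreamingService.swift for the real definitions):

import Foundation

// Placeholder standing in for the SDK's real GenerationResult (illustration only).
struct GenerationResult {
    let text: String
    let tokensUsed: Int
}

// Pairs the live token stream with a task that resolves to the final metrics.
struct StreamingResult {
    let stream: AsyncThrowingStream<String, Error>
    let result: Task<GenerationResult, Error>
}

// Thread-safe accumulation while tokens are produced.
actor MetricsCollector {
    private let startTime = Date()
    private var firstTokenTime: Date?
    private var tokenCount = 0

    func recordToken(_ token: String) {
        if firstTokenTime == nil { firstTokenTime = Date() }
        tokenCount += 1
    }

    func tokensPerSecond() -> Double {
        let elapsed = Date().timeIntervalSince(startTime)
        return elapsed > 0 ? Double(tokenCount) / elapsed : 0
    }

    func timeToFirstTokenMs() -> Double? {
        firstTokenTime.map { $0.timeIntervalSince(startTime) * 1000 }
    }
}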

3. API Simplification

  • Single generateStream() method returns StreamingResult (includes both stream + metrics)
  • Removed redundant methods from public API
  • All streaming now includes analytics automatically

4. Application Layer Cleanup

  • Removed ~105 lines of manual analytics tracking from ChatViewModel
  • Deleted collectMessageAnalytics() method entirely
  • App now only reads SDK metrics and reformats for display

Code Changes

Before (App Layer Manual Tracking)

// App was doing calculations
let startTime = Date()
var tokensPerSecondHistory: [Double] = []
var totalTokensReceived = 0

for try await token in stream {
    totalTokensReceived += 1
    // Manual speed calculations...
    if totalTokensReceived % 10 == 0 {
        let elapsed = Date().timeIntervalSince(startTime)
        let currentSpeed = Double(totalTokensReceived) / elapsed
        tokensPerSecondHistory.append(currentSpeed)
    }
}

let analytics = collectMessageAnalytics(...) // 70 lines of calculation

After (SDK Provides Everything)

// SDK handles all calculations
let streamingResult = try await RunAnywhere.generateStream(prompt, options)

for try await token in streamingResult.stream {
    // Just display - NO calculation
    fullResponse += token
}

// Get complete SDK metrics
let sdkResult = try await streamingResult.result.value
let analytics = analyticsFromGenerationResult(sdkResult, ...) // Just reformat

Key Architecture Changes

SDK Layer Additions

  • TokenCounter.swift - Accurate token estimation and speed calculations
  • StreamingResult struct - Container for stream + metrics task
  • MetricsCollector actor - Thread-safe metrics accumulation
  • Enhanced PerformanceMetrics with comprehensive tracking
  • Simplified public API via RunAnywhere.* namespace

Application Layer Simplifications

  • Removed all analytics calculation logic
  • App reads SDK-provided metrics and displays them
  • No manual token counting, speed calculations, or time tracking

Bug Fixes

Fixed AsyncStream Double-Consumption Crash

  • Error: Fatal error: attempt to await next() on more than one task
  • Root Cause: AsyncStream can only be consumed once; we were consuming in both UI and metrics calculation
  • Solution: Track metrics during token generation (not by re-consuming the stream) and signal completion via an actor continuation; a sketch of the pattern follows
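
In miniature, the anti-pattern and the fix look like this (names are illustrative, not the SDK's internals):

import Foundation

actor TokenMetrics {
    private(set) var count = 0
    func recordToken(_ token: String) { count += 1 }
}

// Anti-pattern: two concurrent consumers of one stream trip the
// "attempt to await next() on more than one task" precondition.
func broken(stream: AsyncStream<String>) async {
    Task { for await _ in stream { /* metrics pass */ } }  // consumer 1
    for await token in stream { print(token) }             // consumer 2: fatal error
}

// Fix: record metrics where tokens are produced, so the UI remains
// the stream's single consumer.
func fixed(metrics: TokenMetrics) -> AsyncStream<String> {
    AsyncStream { continuation in
        Task {
            for token in ["Hello", " ", "world"] {         // stand-in for generation
                await metrics.recordToken(token)           // metrics at production time
                continuation.yield(token)
            }
            continuation.finish()
        }
    }
}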

Benefits

  • Single Source of Truth: All metrics calculated in one place (the SDK)
  • Consistency: Same calculations for streaming and non-streaming modes
  • Accuracy: Improved token counting with proper heuristics
  • Simplicity: App code reduced by ~105 lines
  • Thread Safety: Actor-based metrics collection
  • Clean API: One method that always includes metrics

Files Changed

SDK (11 files):

  • TokenCounter.swift (new) - Token estimation and speed calculations
  • GenerationResult.swift - Added StreamingResult struct
  • StreamingService.swift - Implemented streaming with metrics tracking
  • RunAnywhere.swift - Simplified API, added convenience methods
  • RunAnywhere+StructuredOutput.swift - Updated to use new API
  • PerformanceMetrics.swift - Enhanced metrics tracking
  • GenerationService.swift - Integrated TokenCounter
  • ThinkingTagPattern.swift - Minor adjustments
  • LLMHandler.swift - Voice capability updates
  • ModelInfo.swift - Model metadata enhancements

App (1 file):

  • ChatViewModel.swift - Removed manual analytics, uses SDK metrics exclusively

Testing

  • ✅ Build successful
  • ✅ All pre-commit hooks passed (SwiftLint, TODOs, merge conflicts)
  • ✅ Verified streaming displays tokens correctly
  • ✅ Verified analytics display in UI (per-message and modal)
  • ✅ No AsyncStream crashes
  • ✅ Token counts accurate with improved estimation

API Usage

Developers using the SDK now get analytics automatically:

// Single method call - analytics included
let streamingResult = try await RunAnywhere.generateStream(prompt, options)

// Display tokens in real-time
for try await token in streamingResult.stream {
    updateUI(with: token)
}

// Get complete analytics when done
let result = try await streamingResult.result.value
print("Tokens/sec: \(result.performanceMetrics.tokensPerSecond)")
print("Total tokens: \(result.tokensUsed)")
print("Time to first token: \(result.performanceMetrics.timeToFirstTokenMs)ms")

Migration Notes

For SDK users:

  • generateStream() now returns StreamingResult instead of AsyncThrowingStream
  • Access the stream via .stream property
  • Get metrics via the .result task (see the before/after sketch below)
  • No breaking changes to configuration or options
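
Concretely, a call site migrates roughly like this (the old signature is inferred from the notes above, and render(_:) is a hypothetical app function):

// Before (inferred old shape): the stream was the return value.
// let stream = try await RunAnywhere.generateStream(prompt, options)
// for try await token in stream { render(token) }

// After: unpack the StreamingResult.
let streaming = try await RunAnywhere.generateStream(prompt, options)
for try await token in streaming.stream {
    render(token)                               // display only
}
let final = try await streaming.result.value    // complete SDK metrics
print(final.performanceMetrics.tokensPerSecond)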

For app developers:

  • Remove any manual analytics tracking code
  • Read metrics from GenerationResult returned by SDK
  • Use RunAnywhere.estimateTokenCount() for token estimation

Related Issues

Closes #69

Summary by CodeRabbit

  • New Features

    • Added token counting utilities for text analysis.
    • Enhanced thinking content parsing and separation from response content.
    • Introduced comprehensive performance metrics tracking including thinking time and response time.
    • Improved streaming with integrated metrics collection.
  • Bug Fixes

    • Refined performance measurement accuracy across generation paths.

Moved ALL performance metrics calculation from app to SDK. Created TokenCounter for accurate token estimation, implemented streaming with built-in analytics via StreamingResult, removed ~105 lines of manual tracking from ChatViewModel. Fixed AsyncStream double-consumption bug using actor-based MetricsCollector.
@shubhammalhotra28 (Contributor, Author) commented:

@CodeRabbit review

coderabbitai bot commented Oct 16, 2025

Walkthrough

This PR implements SDK-level analytics infrastructure for text generation and streaming, replacing manual app-level tracking. It adds thinking-time tracking and thinking/response token separation, introduces model-specific thinking patterns, provides a token counter utility, and integrates comprehensive metrics into generation results while maintaining streaming functionality.

Changes

Cohort / File(s) — Summary

  • Analytics Data Model
    sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/GenerationResult.swift, sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/PerformanceMetrics.swift
    Adds thinkingTokens and responseTokens to GenerationResult; introduces the StreamingResult struct combining the token stream with an async metrics result; expands PerformanceMetrics with six new timing fields (timeToFirstTokenMs, thinkingTimeMs, responseTimeMs, thinkingStartTimeMs, thinkingEndTimeMs, firstResponseTokenTimeMs).
  • Metrics Collection (Generation)
    sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift
    Adds thinking-time tracking, per-model thinking-pattern selection with fallback, TokenCounter-based token counting, responseTimeMs calculation, and thinkingTokens/responseTokens fields to GenerationResult.
  • Metrics Collection (Streaming)
    sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift
    Introduces the public generateStreamWithMetrics() method, implements an internal MetricsCollector actor for accumulating streaming metrics (tokens, timing, thinking content, model/framework), tracks thinking duration separately, and returns a final GenerationResult with performance metrics alongside the token stream.
  • Token Utilities
    sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift
    New public TokenCounter utility class with three static methods: estimateTokenCount() for text token estimation, calculateTokensPerSecond() for throughput, and splitTokenCounts() for separating thinking vs. response tokens.
  • Model Configuration
    sdk/runanywhere-swift/Sources/RunAnywhere/Core/Models/Model/ModelInfo.swift, sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Models/ThinkingTagPattern.swift
    Adds an optional thinkingPattern property to ModelInfo with model-aware pattern selection; adds Sendable conformance to ThinkingTagPattern for concurrency.
  • Voice Pipeline
    sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/Voice/Handlers/LLMHandler.swift
    Integrates thinking-aware streaming with model-specific thinking-pattern parsing; distinguishes thinking tokens from content tokens; excludes thinking tokens from TTS processing; returns only response content when thinking parsing is enabled.
  • Public API Surface
    sdk/runanywhere-swift/Sources/RunAnywhere/Public/RunAnywhere.swift, sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/RunAnywhere+StructuredOutput.swift
    Updates generate() to return GenerationResult instead of String; updates generateStream() to return StreamingResult instead of AsyncThrowingStream; adds a public estimateTokenCount() utility; refactors structured output to use the new result types.
  • App Integration
    examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/ChatViewModel.swift
    Replaces manual analytics collection with SDK-provided metrics via the new analyticsFromGenerationResult() helper; uses streamingResult from generateStream and final metrics from the SDK rather than local accumulators; removes custom token timing and thinking-mode tracking logic.

Sequence Diagram(s)

sequenceDiagram
    participant App as ChatViewModel
    participant SDK as RunAnywhere SDK
    participant Gen as GenerationService
    participant Stream as StreamingService
    participant Counter as TokenCounter
    participant Collector as MetricsCollector

    alt Non-Streaming Path
        App->>SDK: generate(prompt)
        SDK->>Gen: generateText()
        Gen->>Counter: estimateTokenCount(text)
        Gen->>Counter: splitTokenCounts()
        Note over Gen: Track timing,<br/>thinkingTime, responseTime
        Gen-->>SDK: GenerationResult<br/>(with metrics)
        SDK-->>App: GenerationResult
        App->>App: analyticsFromGenerationResult()
    else Streaming Path
        App->>SDK: generateStream(prompt)
        SDK->>Stream: generateStreamWithMetrics()
        Stream->>Collector: create actor
        par Token Streaming
            Stream-->>App: stream<String>
            loop Each Token
                App->>Stream: iterate tokens
                Stream->>Collector: update metrics
            end
        and Async Metrics Collection
            Collector->>Counter: token counting
            Note over Collector: Accumulate thinking,<br/>timing, throughput
            Collector-->>Stream: GenerationResult
        end
        Stream-->>App: StreamingResult<br/>{stream, result}
        App->>App: await result.task
        App->>App: analyticsFromGenerationResult()
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The changes span multiple systems (generation, streaming, voice, public API) with significant logic density. Key concerns include: new MetricsCollector actor concurrency patterns, thinking token parsing across multiple services, breaking API changes to generate() and generateStream(), and token counting implementations. While individual files are moderately sized, the heterogeneous nature of changes across domains and the architectural shift from app-level to SDK-level analytics tracking demand careful cross-system validation.

Possibly related PRs

Suggested labels

ios-sdk

Poem

🐰 Thinking goes deep, tokens now tracked,
Metrics flow free from the SDK's pack,
Time measured fine, from thought to speech,
Analytics now within SDK's reach!
Voice and stream, all unified bright,
Every generation measured just right.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Description Check — ⚠️ Warning. The provided description, while thorough, does not follow the repository’s required template sections and is missing the “Type of Change”, “Testing”, “Labels”, “Checklist”, and “Screenshots” headings that are mandated by the description template. Resolution: restructure the PR description to include the template’s required sections: a brief Description, the Type of Change checklist, Testing checklist, Labels, general Checklist, and Screenshots sections.
  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 61.54%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (3 passed)

  • Title Check — ✅ Passed. The title succinctly captures the main change by indicating that metrics are consolidated in the SDK layer, is directly related to the core objective of the PR, and avoids extraneous details or unrelated terms.
  • Linked Issues Check — ✅ Passed. All coding objectives from linked issue #69 have been met: thinking-specific metrics were added to GenerationResult and PerformanceMetrics, ModelInfo gained configurable thinking patterns, both streaming and non-streaming services now track thinking and response timing and token counts, TokenCounter ensures accurate estimation, and LLMHandler was updated for voice-pipeline parity.
  • Out of Scope Changes Check — ✅ Passed. Every code change directly supports the objectives of centralizing analytics in the SDK or adjusts dependent API surfaces (such as structured-output helpers) to accommodate the new metrics-enabled interfaces, with no unrelated modifications included.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch shubham/issue_69



coderabbitai bot commented Oct 16, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift (1)

153-176: Compile error: tuple declared with let but assigned later

finalText/thinkingContent are mutated; declare as var.

-        let (finalText, thinkingContent): (String, String?)
+        var (finalText, thinkingContent): (String, String?)
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/ChatViewModel.swift (1)

800-869: Fix streaming analytics mode flag

analyticsFromGenerationResult always sets generationMode: .nonStreaming, but the streaming path reuses this helper. Every streamed message is therefore logged as non-streaming, breaking segmentation of streaming vs non-streaming analytics. Please thread the actual mode through the helper and set the field accordingly (e.g. add a generationMode parameter defaulting to .nonStreaming, pass .streaming from the streaming caller, and use that parameter when populating generationMode).

-    private func analyticsFromGenerationResult(
+    private func analyticsFromGenerationResult(
         _ result: GenerationResult,
         messageId: String,
         conversationId: String,
         startTime: Date,
         inputText: String,
-        wasInterrupted: Bool = false,
-        options: RunAnywhereGenerationOptions
+        wasInterrupted: Bool = false,
+        options: RunAnywhereGenerationOptions,
+        generationMode: MessageAnalytics.GenerationMode = .nonStreaming
     ) -> MessageAnalytics? {
 ...
-            tokensPerSecondHistory: [], // Not tracked in non-streaming
-            generationMode: .nonStreaming,
+            tokensPerSecondHistory: [],
+            generationMode: generationMode,
             contextWindowUsage: 0.0,
             generationParameters: generationParameters
         )
     }

And in the streaming path:

-                        let analytics = analyticsFromGenerationResult(
+                        let analytics = analyticsFromGenerationResult(
                             sdkResult,
                             messageId: messages[messageIndex].id.uuidString,
                             conversationId: conversationId,
                             startTime: Date(timeIntervalSinceNow: -(sdkResult.latencyMs / 1000)),
                             inputText: prompt,
-                            wasInterrupted: false,
-                            options: options
+                            wasInterrupted: false,
+                            options: options,
+                            generationMode: .streaming
                         )
🧹 Nitpick comments (11)
sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/RunAnywhere+StructuredOutput.swift (5)

6-16: Mark StreamToken as Sendable

StreamToken crosses concurrency boundaries (AsyncThrowingStream). Add Sendable to avoid strict-concurrency warnings.

-public struct StreamToken {
+public struct StreamToken: Sendable {

19-25: Make StructuredOutputStreamResult Sendable (conditional)

Expose as Sendable so it’s safe to pass across tasks. Use a conditional where T: Sendable.

-public struct StructuredOutputStreamResult<T: Generatable> {
+public struct StructuredOutputStreamResult<T: Generatable>: Sendable where T: Sendable {

130-137: Publish real metrics instead of zeros

You already have GenerationResult. Use its metrics in the event.

-            // Generate the text
-            let generationResult = try await RunAnywhere.generate(userPrompt, options: effectiveOptions)
+            // Generate the text (captures metrics)
+            let generationResult = try await RunAnywhere.generate(userPrompt, options: effectiveOptions)
...
-            events.publish(SDKGenerationEvent.completed(
-                response: "Structured output generated for \(String(describing: type))",
-                tokensUsed: 0,
-                latencyMs: 0
-            ))
+            events.publish(SDKGenerationEvent.completed(
+                response: "Structured output generated for \(String(describing: type))",
+                tokensUsed: generationResult.tokensUsed,
+                latencyMs: generationResult.latencyMs
+            ))

Also applies to: 138-143


191-221: Hook stream cancellation to producer Task

Avoid background work after consumer cancels.

-        let tokenStream = AsyncThrowingStream<StreamToken, Error> { continuation in
-            Task {
+        let tokenStream = AsyncThrowingStream<StreamToken, Error> { continuation in
+            let producer = Task {
                 do {
                     var tokenIndex = 0
@@
                     await accumulator.markComplete()
                     continuation.finish(throwing: error)
                 }
             }
+            continuation.onTermination = { _ in
+                producer.cancel()
+            }
         }

241-245: Use modern sleep API

Prefer Task.sleep(for:) in Swift 6.

-                        // Brief delay before retry
-                        try? await Task.sleep(nanoseconds: 100_000_000)
+                        // Brief delay before retry
+                        try? await Task.sleep(for: .milliseconds(100))
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift (2)

41-43: Remove dead code

The Date() placeholder is unused.

-        // Start performance tracking
-        _ = Date() // Will be used for performance metrics in future
+        // Start performance tracking (handled in per-path implementations)

189-199: Minor: centralize metric math to avoid drift

tokensPerSecond/responseTimeMs duplicated across paths. Consider a small helper for consistency.

Also applies to: 200-217

sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/Voice/Handlers/LLMHandler.swift (1)

85-92: Ensure onToken is serialized to avoid data races on buffer state

buffer/inThinkingSection are mutated across callbacks. If llmService calls onToken concurrently, this races.

  • Confirm onToken is invoked serially.
  • If not, guard with an actor or a serial DispatchQueue around parseStreamingToken.
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift (2)

101-107: Brittle text extraction leaves thinking tags in result.text and skews counts

Using replacingOccurrences(of: thinkingContent, with: "") on fullText:

  • Leaves tags in result.text.
  • Risks accidental removals if thinking text appears in response.
  • Skews response token counts.

Recommended: accumulate responseText separately during streaming and feed that into GenerationResult and TokenCounter.

Example refactor (outline):

  • In MetricsCollector:
    • Add var responseText = ""
    • Add func appendResponse(_ chunk: String) { responseText += chunk }
    • Use responseText for result.text and responseContent in TokenCounter.splitTokenCounts.
  • In onToken (content case): Task { await collector.appendResponse(cleanToken) }

If you prefer a minimal change: also strip the known thinking tags when building result:

  • Pass thinkingPattern into MetricsCollector and remove opening/closing tags from fullText before counting.

I can provide a concrete patch either way.

Also applies to: 135-149


39-40: Unused state: tokenCount

tokenCount is maintained but never used for metrics. Consider removing to avoid confusion.

sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift (1)

17-40: Count words by all whitespace, not just spaces

Split by Character.isWhitespace to avoid undercounting when there are tabs/newlines/multiple spaces.

Apply this diff:

-        let wordCount = text.split(separator: " ").count
+        let wordCount = text.split(whereSeparator: { $0.isWhitespace }).count
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e5d2d9e and 53dd5b9.

📒 Files selected for processing (11)
  • examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/ChatViewModel.swift (6 hunks)
  • sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Models/ThinkingTagPattern.swift (1 hunks)
  • sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift (3 hunks)
  • sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift (3 hunks)
  • sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift (1 hunks)
  • sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/Voice/Handlers/LLMHandler.swift (3 hunks)
  • sdk/runanywhere-swift/Sources/RunAnywhere/Core/Models/Model/ModelInfo.swift (5 hunks)
  • sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/RunAnywhere+StructuredOutput.swift (2 hunks)
  • sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/GenerationResult.swift (4 hunks)
  • sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/PerformanceMetrics.swift (1 hunks)
  • sdk/runanywhere-swift/Sources/RunAnywhere/Public/RunAnywhere.swift (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
{sdk/runanywhere-swift,examples/ios}/**/*.swift

📄 CodeRabbit inference engine (CLAUDE.md)

{sdk/runanywhere-swift,examples/ios}/**/*.swift: Always use the latest Swift 6 APIs and syntax (including modern concurrency features)
Do not use NSLock in Swift code

Files:

  • sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/RunAnywhere+StructuredOutput.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Models/ThinkingTagPattern.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/GenerationResult.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/PerformanceMetrics.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Public/RunAnywhere.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Core/Models/Model/ModelInfo.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/Voice/Handlers/LLMHandler.swift
  • examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/ChatViewModel.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift
sdk/runanywhere-swift/**/*.swift

📄 CodeRabbit inference engine (CLAUDE.md)

Use async/await for async APIs in the iOS SDK

Files:

  • sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/RunAnywhere+StructuredOutput.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Models/ThinkingTagPattern.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/GenerationResult.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/PerformanceMetrics.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Public/RunAnywhere.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Core/Models/Model/ModelInfo.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/Voice/Handlers/LLMHandler.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift
🧬 Code graph analysis (8)
sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/RunAnywhere+StructuredOutput.swift (3)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift (1)
  • generate (37-105)
sdk/runanywhere-swift/Sources/RunAnywhere/Public/RunAnywhere.swift (2)
  • generate (373-412)
  • generateStream (438-456)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/StructuredOutput/Services/StructuredOutputHandler.swift (1)
  • parseStructuredOutput (85-99)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift (2)
sdk/runanywhere-swift/Sources/RunAnywhere/Public/RunAnywhere.swift (1)
  • estimateTokenCount (692-694)
examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/llm/GenerationResult.kt (1)
  • text (6-11)
sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/PerformanceMetrics.swift (1)
examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/ui/models/components/EnhancedModelCard.kt (1)
  • tokensPerSecond (575-582)
sdk/runanywhere-swift/Sources/RunAnywhere/Public/RunAnywhere.swift (3)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift (1)
  • generate (37-105)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift (1)
  • generateStreamWithMetrics (27-256)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift (1)
  • estimateTokenCount (8-40)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift (2)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift (2)
  • splitTokenCounts (49-63)
  • calculateTokensPerSecond (43-46)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/ThinkingParser.swift (1)
  • parse (18-48)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/Voice/Handlers/LLMHandler.swift (3)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift (1)
  • getCurrentModel (27-29)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/ThinkingParser.swift (1)
  • parseStreamingToken (51-113)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/Voice/Operations/StreamingTTSOperation.swift (1)
  • processToken (34-70)
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/ChatViewModel.swift (2)
sdk/runanywhere-swift/Sources/RunAnywhere/Public/RunAnywhere.swift (3)
  • generateStream (438-456)
  • generate (373-412)
  • estimateTokenCount (692-694)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift (1)
  • estimateTokenCount (8-40)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift (5)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift (2)
  • splitTokenCounts (49-63)
  • calculateTokensPerSecond (43-46)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationOptionsResolver.swift (2)
  • resolve (28-84)
  • preparePrompt (91-111)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift (1)
  • getCurrentModel (27-29)
sdk/runanywhere-swift/Sources/RunAnywhere/Components/LLM/LLMComponent.swift (1)
  • streamGenerate (470-508)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/ThinkingParser.swift (1)
  • parseStreamingToken (51-113)
🔇 Additional comments (6)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Models/ThinkingTagPattern.swift (1)

4-4: Good addition: Sendable conformance

Matches Swift 6 concurrency expectations; safe to pass across tasks.

sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/Voice/Handlers/LLMHandler.swift (1)

173-177: Return content-only is correct; emit final event with same

Looks good and aligns with thinking-aware streaming.

sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/GenerationResult.swift (2)

3-20: StreamingResult API looks good

Pairing the token stream with a Task<GenerationResult, Error> is clean and ergonomic. Public init and types are appropriate.


60-99: GenerationResult token fields integration LGTM

thinkingTokens/responseTokens are well-integrated with sensible defaults; keeping the initializer internal preserves API control.

sdk/runanywhere-swift/Sources/RunAnywhere/Public/RunAnywhere.swift (2)

439-456: generateStream API change LGTM

Returning StreamingResult is a good simplification for callers and aligns with the new metrics pipeline.


676-695: Replace NSLock with an actor for device registration (guideline: no NSLock)

This file still uses NSLock for registration coordination (see Line 136). Per coding guidelines, prefer Swift 6 concurrency. Suggest an actor (e.g., RegistrationCoordinator) to serialize registration and expose async methods with await and backoff logic. I can provide a concrete refactor if you want.

As per coding guidelines

⛔ Skipped due to learnings
Learnt from: CR
PR: RunanywhereAI/runanywhere-sdks#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-07T00:42:41.855Z
Learning: Applies to {sdk/runanywhere-swift,examples/ios}/**/*.swift : Do not use NSLock in Swift code

Comment on lines 270 to 285
     let modelInfo = loadedModel.model
     let (finalText, thinkingContent): (String, String?)
     var thinkingTimeMs: TimeInterval? = nil

     if modelInfo.supportsThinking {
-        let pattern = ThinkingTagPattern.defaultPattern
+        // Use model-specific pattern or fall back to default
+        let pattern = modelInfo.thinkingPattern ?? ThinkingTagPattern.defaultPattern
         let parseResult = ThinkingParser.parse(text: generatedText, pattern: pattern)
         finalText = parseResult.content
         thinkingContent = parseResult.thinkingContent

         // Estimate thinking time if present
         if thinkingContent != nil && !thinkingContent!.isEmpty {
             thinkingTimeMs = latency * 0.6
         }
     } else {

⚠️ Potential issue | 🔴 Critical

Same tuple mutability bug in hybrid path

Make the tuple mutable here as well.

-        let (finalText, thinkingContent): (String, String?)
+        var (finalText, thinkingContent): (String, String?)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

        let modelInfo = loadedModel.model
        let (finalText, thinkingContent): (String, String?)
        var thinkingTimeMs: TimeInterval? = nil

        if modelInfo.supportsThinking {
            // Use model-specific pattern or fall back to default
            let pattern = modelInfo.thinkingPattern ?? ThinkingTagPattern.defaultPattern
            let parseResult = ThinkingParser.parse(text: generatedText, pattern: pattern)
            finalText = parseResult.content
            thinkingContent = parseResult.thinkingContent

            // Estimate thinking time if present
            if thinkingContent != nil && !thinkingContent!.isEmpty {
                thinkingTimeMs = latency * 0.6
            }
        } else {

After (only the tuple declaration changes):

        let modelInfo = loadedModel.model
        var (finalText, thinkingContent): (String, String?)
        var thinkingTimeMs: TimeInterval? = nil

        if modelInfo.supportsThinking {
            // Use model-specific pattern or fall back to default
            let pattern = modelInfo.thinkingPattern ?? ThinkingTagPattern.defaultPattern
            let parseResult = ThinkingParser.parse(text: generatedText, pattern: pattern)
            finalText = parseResult.content
            thinkingContent = parseResult.thinkingContent

            // Estimate thinking time if present
            if thinkingContent != nil && !thinkingContent!.isEmpty {
                thinkingTimeMs = latency * 0.6
            }
        } else {
🤖 Prompt for AI Agents
In
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift
around lines 270 to 285, the tuple (finalText, thinkingContent) is declared with
let inside the hybrid path making it immutable which prevents later updates;
change the declaration to var (finalText, thinkingContent): (String, String?) so
the tuple is mutable, allowing assignment to finalText and thinkingContent later
(mirror the mutable tuple used in the other code path), and ensure any
subsequent assignments remain valid.

Comment on lines +66 to +71
func recordError(_ err: Error) {
    error = err
    resultContinuation?.resume(throwing: err)
    resultContinuation = nil
}


⚠️ Potential issue | 🔴 Critical

Result task can hang on errors; ensure waitForResult throws immediately if an error occurred

If an error happens before waitForResult sets resultContinuation, recordError() resumes nothing and isComplete remains false. Later waiters will suspend forever.

Fix by marking completion on error and short‑circuiting in waitForResult:

             func recordError(_ err: Error) {
                 error = err
-                resultContinuation?.resume(throwing: err)
+                isComplete = true
+                resultContinuation?.resume(throwing: err)
                 resultContinuation = nil
             }
 
             func waitForResult() async throws -> GenerationResult {
-                // If already complete, return immediately
-                if isComplete, let modelName = modelName {
+                // If error occurred, throw immediately
+                if let err = error {
+                    throw err
+                }
+                // If already complete, return immediately
+                if isComplete, let modelName = modelName {
                     return buildResultSync(modelUsed: modelName, framework: framework)
                 }
 
                 // Otherwise, wait for completion
                 return try await withCheckedThrowingContinuation { continuation in
                     resultContinuation = continuation
                 }
             }

Also applies to: 85-95

🤖 Prompt for AI Agents
In
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift
around lines 66-71 (and likewise address lines 85-95), recordError currently
sets error and resumes the continuation only if present which can leave
isComplete false and later waitForResult suspended forever; update recordError
to set isComplete = true, store the error, resume any existing
resultContinuation with throwing the error and clear it, and ensure any
subsequent waitForResult checks for an already-set error and immediately throws
instead of suspending; also guard against double-resume by clearing
resultContinuation after resuming.

Comment on lines +187 to +188
await collector.recordToken("", isThinking: false) // Initialize timing


⚠️ Potential issue | 🟠 Major

TTFB skew: drop the pre-stream recordToken() call

Calling recordToken("") before any real token sets firstTokenTime immediately and inflates tokenCount. This makes time-to-first-token ≈ 0 and corrupts metrics.

Apply this diff:

-                    // Start timing
-                    await collector.recordToken("", isThinking: false) // Initialize timing
+                    // Start timing is set when first real token arrives; no-op here
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-                    await collector.recordToken("", isThinking: false) // Initialize timing
+                    // Start timing is set when first real token arrives; no-op here
🤖 Prompt for AI Agents
In
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift
around lines 187-188, remove the pre-stream call to await
collector.recordToken("", isThinking: false) because it sets firstTokenTime and
increments tokenCount prematurely; instead ensure firstTokenTime and token
counting are only initialized when the first real token is recorded (delete this
dummy call or guard it so it does not set timing/counting).

Comment on lines +194 to +229
Task {
    if shouldParseThinking {
        // Parse token for thinking content
        let (tokenType, cleanToken) = ThinkingParser.parseStreamingToken(
            token: token,
            pattern: thinkingPattern,
            buffer: &buffer,
            inThinkingSection: &inThinkingSection
        )

        // Track thinking content
        if tokenType == .thinking, let thinkingToken = cleanToken {
            accumulatedThinking += thinkingToken
        }

        // Record metrics
        await collector.recordToken(token, isThinking: inThinkingSection)

        // Only yield non-thinking tokens
        if tokenType == .content, let cleanToken = cleanToken {
            continuation.yield(cleanToken)
        }
    } else {
        // No thinking parsing
        await collector.recordToken(token, isThinking: false)
        continuation.yield(token)
    }
}
)

// Record thinking content if any
if !accumulatedThinking.isEmpty {
    await collector.recordThinkingEnd(accumulatedThinking)
}


⚠️ Potential issue | 🔴 Critical

Fix data race and preserve token order; record thinking-end when the tag closes

Spawning a Task per token and mutating buffer/inThinkingSection/accumulatedThinking inside those Tasks races and can yield tokens out of order. Also, thinkingEnd is recorded only after the stream ends, inflating thinkingTime.

Process parsing/yield synchronously; update metrics in a background Task; and mark thinking end immediately when closing tag is detected.

Apply this diff:

-                        onToken: { token in
-                            Task {
-                                if shouldParseThinking {
-                                    // Parse token for thinking content
-                                    let (tokenType, cleanToken) = ThinkingParser.parseStreamingToken(
-                                        token: token,
-                                        pattern: thinkingPattern,
-                                        buffer: &buffer,
-                                        inThinkingSection: &inThinkingSection
-                                    )
-
-                                    // Track thinking content
-                                    if tokenType == .thinking, let thinkingToken = cleanToken {
-                                        accumulatedThinking += thinkingToken
-                                    }
-
-                                    // Record metrics
-                                    await collector.recordToken(token, isThinking: inThinkingSection)
-
-                                    // Only yield non-thinking tokens
-                                    if tokenType == .content, let cleanToken = cleanToken {
-                                        continuation.yield(cleanToken)
-                                    }
-                                } else {
-                                    // No thinking parsing
-                                    await collector.recordToken(token, isThinking: false)
-                                    continuation.yield(token)
-                                }
-                            }
-                        }
+                        onToken: { token in
+                            if shouldParseThinking {
+                                let (tokenType, cleanToken) = ThinkingParser.parseStreamingToken(
+                                    token: token,
+                                    pattern: thinkingPattern,
+                                    buffer: &buffer,
+                                    inThinkingSection: &inThinkingSection
+                                )
+
+                                // Track thinking content
+                                if tokenType == .thinking, let thinkingToken = cleanToken {
+                                    accumulatedThinking += thinkingToken
+                                    // Mark end of thinking immediately on closing tag
+                                    Task { await collector.recordThinkingEnd(accumulatedThinking) }
+                                }
+
+                                // Yield first to preserve ordering
+                                if tokenType == .content, let cleanToken = cleanToken {
+                                    continuation.yield(cleanToken)
+                                }
+
+                                // Update metrics in background (non-blocking)
+                                Task { await collector.recordToken(token, isThinking: inThinkingSection) }
+                            } else {
+                                // No thinking parsing
+                                continuation.yield(token)
+                                Task { await collector.recordToken(token, isThinking: false) }
+                            }
+                        }
                     )
 
-                    // Record thinking content if any
-                    if !accumulatedThinking.isEmpty {
-                        await collector.recordThinkingEnd(accumulatedThinking)
-                    }
+                    // Thinking end recorded above on close; no-op here

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift
around lines 194 to 229, spawning a Task per token causes data races and
out-of-order yields because buffer, inThinkingSection, and accumulatedThinking
are mutated concurrently and thinking-end is only recorded at stream end; fix by
performing token parsing, buffer/inThinkingSection/accumulatedThinking mutation,
and yielding synchronously on the current actor/thread (no per-token Task) so
token order is preserved and shared state is updated safely, dispatch metric
calls (collector.recordToken) to a background Task if needed, and detect the
closing thinking tag during synchronous parsing to immediately call
collector.recordThinkingEnd(accumulatedThinking) right when the tag closes (then
reset accumulatedThinking) instead of waiting for the stream to finish.

Comment on lines +27 to 29
public let thinkingPattern: ThinkingTagPattern? // Custom thinking pattern (if supportsThinking)

// Optional metadata

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Persisting thinkingPattern requires DB migration and JSON mapping

  • Add a migration for models.thinkingPattern (e.g., TEXT) before writes.
  • Ensure GRDB can encode/decode ThinkingTagPattern:
    • Option A: mark property with @DatabaseJSON var thinkingPattern: ThinkingTagPattern? (GRDB wrapper).
    • Option B: make ThinkingTagPattern conform to DatabaseValueConvertible and store JSON.

Would you like a migration + DatabaseValueConvertible snippet generated?

Also applies to: 62-63, 81-82, 112-114, 148-149



---

🏁 Script executed:

#!/bin/bash
# Locate and inspect ModelInfo.swift
find . -name "ModelInfo.swift" -type f | head -5

Length of output: 77


🏁 Script executed:

#!/bin/bash
# Search for ThinkingTagPattern definition
rg -n "struct ThinkingTagPattern|class ThinkingTagPattern|enum ThinkingTagPattern" --type swift

Length of output: 157


🏁 Script executed:

#!/bin/bash
# Check for GRDB migrations and schema definitions
fd -e swift -e sql | xargs rg -l "migration|CreateTable" | head -10

Length of output: 293


🏁 Script executed:

#!/bin/bash
# Search for DatabaseJSON usage patterns in the codebase
rg -n "@DatabaseJSON" --type swift

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Search for DatabaseValueConvertible usage
rg -n "DatabaseValueConvertible" --type swift

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Examine the actual ModelInfo.swift file to see all the mentioned lines
fd "ModelInfo.swift" -x cat {}

Length of output: 5158



---

🏁 Script executed:

#!/bin/bash
# Inspect the migration to see schema definition
cat sdk/runanywhere-swift/Sources/RunAnywhere/Data/Storage/Database/Migrations/Migration001_InitialSchema.swift

Length of output: 8568


🏁 Script executed:

#!/bin/bash
# Search for how metadata (also Codable) is handled - to find patterns
rg -n "metadata" sdk/runanywhere-swift/Sources/RunAnywhere/Core/Models/Model/ModelInfo.swift -A2 -B2

Length of output: 1076


🏁 Script executed:

#!/bin/bash
# Search for other nested Codable properties in GRDB entities
rg -n "struct.*Codable.*PersistableRecord|class.*Codable.*PersistableRecord" --type swift -A30 | head -80

Length of output: 9119


🏁 Script executed:

#!/bin/bash
# Check GRDB version in Package.swift or dependencies
fd -e swift -e toml | xargs rg -n "GRDB|grdb" | grep -i "version\|package\|dependency"

Length of output: 217


🏁 Script executed:

#!/bin/bash
# Search for how arrays/complex types are stored in other models
rg -n "compatibleFrameworks" sdk/runanywhere-swift/Sources/RunAnywhere/Core/Models/Model/ModelInfo.swift -B5 -A5

Length of output: 2321


Add thinkingPattern JSON column via migration

  • Create a new migration that alters the models table to add a thinkingPattern BLOB column (JSON) immediately after supportsThinking.
  • GRDB’s built-in Codable support (used for other nested types) will automatically serialize/deserialize ThinkingTagPattern; no custom wrapper is required. A sketch follows.
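
A minimal sketch of such a migration, assuming GRDB's DatabaseMigrator and a models table (the migration name is illustrative; note that SQLite appends new columns at the end of the table regardless of intent):

import GRDB

var migrator = DatabaseMigrator()

migrator.registerMigration("addThinkingPatternToModels") { db in
    try db.alter(table: "models") { t in
        // JSON-encoded ThinkingTagPattern; GRDB's Codable support handles
        // encoding and decoding, so no custom wrapper is required.
        t.add(column: "thinkingPattern", .blob)
    }
}

// Applied at startup, after earlier migrations:
// try migrator.migrate(dbQueue)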


Development

Successfully merging this pull request may close these issues:

  • Refactor thinking tokens architecture for better model support and analytics