[iOS-SDK] Analytics consolidation - Move all metrics to SDK layer (#69) #116
Conversation
Moved ALL performance metrics calculation from app to SDK. Created TokenCounter for accurate token estimation, implemented streaming with built-in analytics via StreamingResult, removed ~105 lines of manual tracking from ChatViewModel. Fixed AsyncStream double-consumption bug using actor-based MetricsCollector.
@CodeRabbit review
Walkthrough

This PR implements SDK-level analytics infrastructure for text generation and streaming, replacing manual app-level tracking. It adds thinking time and token separation, introduces model-specific thinking patterns, provides a token counter utility, and integrates comprehensive metrics into generation results while maintaining streaming functionality.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant App as ChatViewModel
participant SDK as RunAnywhere SDK
participant Gen as GenerationService
participant Stream as StreamingService
participant Counter as TokenCounter
participant Collector as MetricsCollector
alt Non-Streaming Path
App->>SDK: generate(prompt)
SDK->>Gen: generateText()
Gen->>Counter: estimateTokenCount(text)
Gen->>Counter: splitTokenCounts()
Note over Gen: Track timing,<br/>thinkingTime, responseTime
Gen-->>SDK: GenerationResult<br/>(with metrics)
SDK-->>App: GenerationResult
App->>App: analyticsFromGenerationResult()
else Streaming Path
App->>SDK: generateStream(prompt)
SDK->>Stream: generateStreamWithMetrics()
Stream->>Collector: create actor
par Token Streaming
Stream-->>App: stream<String>
loop Each Token
App->>Stream: iterate tokens
Stream->>Collector: update metrics
end
and Async Metrics Collection
Collector->>Counter: token counting
Note over Collector: Accumulate thinking,<br/>timing, throughput
Collector-->>Stream: GenerationResult
end
Stream-->>App: StreamingResult<br/>{stream, result}
App->>App: await result.task
App->>App: analyticsFromGenerationResult()
end
```
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The changes span multiple systems (generation, streaming, voice, public API) with significant logic density. Key concerns include: the new MetricsCollector actor concurrency patterns, thinking token parsing across multiple services, and breaking API changes to `generateStream()`.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift (1)
153-176: Compile error: tuple declared with `let` but assigned later

finalText/thinkingContent are mutated; declare as var.
```diff
- let (finalText, thinkingContent): (String, String?)
+ var (finalText, thinkingContent): (String, String?)
```

examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/ChatViewModel.swift (1)
800-869: Fix streaming analytics mode flag
`analyticsFromGenerationResult` always sets `generationMode: .nonStreaming`, but the streaming path reuses this helper. Every streamed message is therefore logged as non-streaming, breaking segmentation of streaming vs non-streaming analytics. Please thread the actual mode through the helper and set the field accordingly (e.g. add a `generationMode` parameter defaulting to `.nonStreaming`, pass `.streaming` from the streaming caller, and use that parameter when populating `generationMode`).

```diff
- private func analyticsFromGenerationResult(
+ private func analyticsFromGenerationResult(
      _ result: GenerationResult,
      messageId: String,
      conversationId: String,
      startTime: Date,
      inputText: String,
-     wasInterrupted: Bool = false,
-     options: RunAnywhereGenerationOptions
+     wasInterrupted: Bool = false,
+     options: RunAnywhereGenerationOptions,
+     generationMode: MessageAnalytics.GenerationMode = .nonStreaming
  ) -> MessageAnalytics? {
      ...
-     tokensPerSecondHistory: [], // Not tracked in non-streaming
-     generationMode: .nonStreaming,
+     tokensPerSecondHistory: [],
+     generationMode: generationMode,
      contextWindowUsage: 0.0,
      generationParameters: generationParameters
  )
}
```

And in the streaming path:
```diff
- let analytics = analyticsFromGenerationResult(
+ let analytics = analyticsFromGenerationResult(
      sdkResult,
      messageId: messages[messageIndex].id.uuidString,
      conversationId: conversationId,
      startTime: Date(timeIntervalSinceNow: -(sdkResult.latencyMs / 1000)),
      inputText: prompt,
-     wasInterrupted: false,
-     options: options
+     wasInterrupted: false,
+     options: options,
+     generationMode: .streaming
  )
```
🧹 Nitpick comments (11)
sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/RunAnywhere+StructuredOutput.swift (5)
6-16: Mark StreamToken as Sendable

StreamToken crosses concurrency boundaries (AsyncThrowingStream). Add Sendable to avoid strict-concurrency warnings.
```diff
-public struct StreamToken {
+public struct StreamToken: Sendable {
```
19-25: Make StructuredOutputStreamResult Sendable (conditional)

Expose as Sendable so it's safe to pass across tasks. Use a conditional where T: Sendable.
```diff
-public struct StructuredOutputStreamResult<T: Generatable> {
+public struct StructuredOutputStreamResult<T: Generatable>: Sendable where T: Sendable {
```
130-137: Publish real metrics instead of zeros

You already have GenerationResult. Use its metrics in the event.
```diff
- // Generate the text
- let generationResult = try await RunAnywhere.generate(userPrompt, options: effectiveOptions)
+ // Generate the text (captures metrics)
+ let generationResult = try await RunAnywhere.generate(userPrompt, options: effectiveOptions)
  ...
- events.publish(SDKGenerationEvent.completed(
-     response: "Structured output generated for \(String(describing: type))",
-     tokensUsed: 0,
-     latencyMs: 0
- ))
+ events.publish(SDKGenerationEvent.completed(
+     response: "Structured output generated for \(String(describing: type))",
+     tokensUsed: generationResult.tokensUsed,
+     latencyMs: generationResult.latencyMs
+ ))
```

Also applies to: 138-143
191-221: Hook stream cancellation to producer Task

Avoid background work after the consumer cancels.
```diff
- let tokenStream = AsyncThrowingStream<StreamToken, Error> { continuation in
-     Task {
+ let tokenStream = AsyncThrowingStream<StreamToken, Error> { continuation in
+     let producer = Task {
          do {
              var tokenIndex = 0
@@
              await accumulator.markComplete()
              continuation.finish(throwing: error)
          }
      }
+     continuation.onTermination = { _ in
+         producer.cancel()
+     }
  }
```
241-245: Use modern sleep API

Prefer Task.sleep(for:) in Swift 6.
```diff
- // Brief delay before retry
- try? await Task.sleep(nanoseconds: 100_000_000)
+ // Brief delay before retry
+ try? await Task.sleep(for: .milliseconds(100))
```

sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift (2)
41-43: Remove dead code

The Date() placeholder is unused.
```diff
- // Start performance tracking
- _ = Date() // Will be used for performance metrics in future
+ // Start performance tracking (handled in per-path implementations)
```
189-199: Minor: centralize metric math to avoid drift

tokensPerSecond/responseTimeMs are duplicated across paths. Consider a small helper for consistency.
Also applies to: 200-217
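One possible shape for such a helper (a minimal sketch; the type name and placement are placeholders, not the SDK's actual API):

```swift
import Foundation

/// Centralizes the throughput / response-time math used by both generation paths.
enum GenerationMetricsMath {
    static func tokensPerSecond(tokenCount: Int, elapsedSeconds: TimeInterval) -> Double {
        guard elapsedSeconds > 0 else { return 0 }
        return Double(tokenCount) / elapsedSeconds
    }

    static func responseTimeMs(totalLatencyMs: Double, thinkingTimeMs: Double?) -> Double {
        // Response time excludes any estimated thinking time.
        max(0, totalLatencyMs - (thinkingTimeMs ?? 0))
    }
}
```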
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/Voice/Handlers/LLMHandler.swift (1)
85-92: Ensure onToken is serialized to avoid data races on buffer state

buffer/inThinkingSection are mutated across callbacks. If llmService calls onToken concurrently, this races.
- Confirm onToken is invoked serially.
- If not, guard with an actor or a serial DispatchQueue around parseStreamingToken.
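If serialization is not guaranteed, one option is an actor that owns the parser state, so every callback mutates it from a single isolation domain. A minimal sketch, assuming the default `<think>`/`</think>` tag pattern (the real ThinkingParser supports model-specific patterns and richer buffering):

```swift
import Foundation

// Owns the mutable parsing state so concurrent onToken callbacks cannot race.
// Simplified stand-in for ThinkingParser.parseStreamingToken's buffer handling.
actor TokenParseSerializer {
    private var inThinkingSection = false

    func parse(_ token: String) -> (isThinking: Bool, clean: String?) {
        if token.contains("<think>") { inThinkingSection = true; return (true, nil) }
        if token.contains("</think>") { inThinkingSection = false; return (true, nil) }
        return (inThinkingSection, token)
    }
}
```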
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift (2)
101-107: Brittle text extraction leaves thinking tags in result.text and skews counts

Using replacingOccurrences(of: thinkingContent, with: "") on fullText:
- Leaves tags in result.text.
- Risks accidental removals if thinking text appears in response.
- Skews response token counts.
Recommended: accumulate responseText separately during streaming and feed that into GenerationResult and TokenCounter.
Example refactor (outline):
- In MetricsCollector:
- Add var responseText = ""
- Add func appendResponse(_ chunk: String) { responseText += chunk }
- Use responseText for result.text and responseContent in TokenCounter.splitTokenCounts.
- In onToken (content case): Task { await collector.appendResponse(cleanToken) }
If you prefer a minimal change: also strip the known thinking tags when building result:
- Pass thinkingPattern into MetricsCollector and remove opening/closing tags from fullText before counting.
I can provide a concrete patch either way.
Also applies to: 135-149
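A sketch of the first option, using the names from the outline above (not necessarily the shipped declarations):

```swift
import Foundation

actor MetricsCollector {
    private(set) var responseText = ""

    // Accumulate only content tokens, so result.text and the response
    // token count never include thinking tags or thinking text.
    func appendResponse(_ chunk: String) {
        responseText += chunk
    }
}

// In the onToken content branch:
// if tokenType == .content, let cleanToken = cleanToken {
//     continuation.yield(cleanToken)
//     Task { await collector.appendResponse(cleanToken) }
// }
```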
39-40: Unused state: tokenCount

tokenCount is maintained but never used for metrics. Consider removing to avoid confusion.
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift (1)
17-40: Count words by all whitespace, not just spaces

Split by Character.isWhitespace to avoid undercounting when there are tabs/newlines/multiple spaces.
Apply this diff:
```diff
- let wordCount = text.split(separator: " ").count
+ let wordCount = text.split(whereSeparator: { $0.isWhitespace }).count
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/ChatViewModel.swift (6 hunks)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Models/ThinkingTagPattern.swift (1 hunks)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift (3 hunks)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift (3 hunks)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift (1 hunks)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/Voice/Handlers/LLMHandler.swift (3 hunks)
- sdk/runanywhere-swift/Sources/RunAnywhere/Core/Models/Model/ModelInfo.swift (5 hunks)
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/RunAnywhere+StructuredOutput.swift (2 hunks)
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/GenerationResult.swift (4 hunks)
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/PerformanceMetrics.swift (1 hunks)
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/RunAnywhere.swift (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
{sdk/runanywhere-swift,examples/ios}/**/*.swift
📄 CodeRabbit inference engine (CLAUDE.md)
{sdk/runanywhere-swift,examples/ios}/**/*.swift: Always use the latest Swift 6 APIs and syntax (including modern concurrency features)
Do not use NSLock in Swift code
Files:
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/RunAnywhere+StructuredOutput.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Models/ThinkingTagPattern.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/GenerationResult.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/PerformanceMetrics.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/RunAnywhere.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Core/Models/Model/ModelInfo.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/Voice/Handlers/LLMHandler.swift
- examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/ChatViewModel.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift
sdk/runanywhere-swift/**/*.swift
📄 CodeRabbit inference engine (CLAUDE.md)
Use async/await for async APIs in the iOS SDK
Files:
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/RunAnywhere+StructuredOutput.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Models/ThinkingTagPattern.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/GenerationResult.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/PerformanceMetrics.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/RunAnywhere.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Core/Models/Model/ModelInfo.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/Voice/Handlers/LLMHandler.swift
- examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/ChatViewModel.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift
🧬 Code graph analysis (8)
sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/RunAnywhere+StructuredOutput.swift (3)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift (1): generate (37-105)
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/RunAnywhere.swift (2): generate (373-412), generateStream (438-456)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/StructuredOutput/Services/StructuredOutputHandler.swift (1): parseStructuredOutput (85-99)

sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift (2)
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/RunAnywhere.swift (1): estimateTokenCount (692-694)
- examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/llm/GenerationResult.kt (1): text (6-11)

sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/PerformanceMetrics.swift (1)
- examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/ui/models/components/EnhancedModelCard.kt (1): tokensPerSecond (575-582)

sdk/runanywhere-swift/Sources/RunAnywhere/Public/RunAnywhere.swift (3)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift (1): generate (37-105)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift (1): generateStreamWithMetrics (27-256)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift (1): estimateTokenCount (8-40)

sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift (2)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift (2): splitTokenCounts (49-63), calculateTokensPerSecond (43-46)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/ThinkingParser.swift (1): parse (18-48)

sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/Voice/Handlers/LLMHandler.swift (3)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift (1): getCurrentModel (27-29)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/ThinkingParser.swift (1): parseStreamingToken (51-113)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/Voice/Operations/StreamingTTSOperation.swift (1): processToken (34-70)

examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/ChatViewModel.swift (2)
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/RunAnywhere.swift (3): generateStream (438-456), generate (373-412), estimateTokenCount (692-694)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift (1): estimateTokenCount (8-40)

sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift (5)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/TokenCounter.swift (2): splitTokenCounts (49-63), calculateTokensPerSecond (43-46)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationOptionsResolver.swift (2): resolve (28-84), preparePrompt (91-111)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift (1): getCurrentModel (27-29)
- sdk/runanywhere-swift/Sources/RunAnywhere/Components/LLM/LLMComponent.swift (1): streamGenerate (470-508)
- sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/ThinkingParser.swift (1): parseStreamingToken (51-113)
🔇 Additional comments (6)
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Models/ThinkingTagPattern.swift (1)
4-4: Good addition: Sendable conformance

Matches Swift 6 concurrency expectations; safe to pass across tasks.
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/Voice/Handlers/LLMHandler.swift (1)
173-177: Return content-only is correct; emit final event with same

Looks good and aligns with thinking-aware streaming.
sdk/runanywhere-swift/Sources/RunAnywhere/Public/Models/GenerationResult.swift (2)
3-20: StreamingResult API looks good

Pairing the token stream with a Task<GenerationResult, Error> is clean and ergonomic. Public init and types are appropriate.
60-99: GenerationResult token fields integration LGTM

thinkingTokens/responseTokens are well-integrated with sensible defaults; keeping the initializer internal preserves API control.
sdk/runanywhere-swift/Sources/RunAnywhere/Public/RunAnywhere.swift (2)
439-456: generateStream API change LGTM

Returning StreamingResult is a good simplification for callers and aligns with the new metrics pipeline.
676-695: Replace NSLock with an actor for device registration (guideline: no NSLock)This file still uses NSLock for registration coordination (see Line 136). Per coding guidelines, prefer Swift 6 concurrency. Suggest an actor (e.g., RegistrationCoordinator) to serialize registration and expose async methods with await and backoff logic. I can provide a concrete refactor if you want.
As per coding guidelines
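One possible shape for that refactor, sketched under the assumption that registration is an async throwing operation (the type name and retry policy are placeholders, not the SDK's API):

```swift
// Serializes device registration with Swift 6 concurrency only; no NSLock.
actor RegistrationCoordinator {
    private var registrationTask: Task<Void, Error>?

    func ensureRegistered(_ register: @escaping @Sendable () async throws -> Void) async throws {
        // Reuse an in-flight registration instead of starting a second one.
        if let task = registrationTask {
            return try await task.value
        }
        let task = Task { try await register() }
        registrationTask = task
        do {
            try await task.value
        } catch {
            registrationTask = nil // allow retry (backoff logic elided)
            throw error
        }
    }
}
```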
⛔ Skipped due to learnings
Learnt from: CR
PR: RunanywhereAI/runanywhere-sdks#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-07T00:42:41.855Z
Learning: Applies to {sdk/runanywhere-swift,examples/ios}/**/*.swift : Do not use NSLock in Swift code
```diff
  let modelInfo = loadedModel.model
  let (finalText, thinkingContent): (String, String?)
  var thinkingTimeMs: TimeInterval? = nil

  if modelInfo.supportsThinking {
-     let pattern = ThinkingTagPattern.defaultPattern
+     // Use model-specific pattern or fall back to default
+     let pattern = modelInfo.thinkingPattern ?? ThinkingTagPattern.defaultPattern
      let parseResult = ThinkingParser.parse(text: generatedText, pattern: pattern)
      finalText = parseResult.content
      thinkingContent = parseResult.thinkingContent

      // Estimate thinking time if present
      if thinkingContent != nil && !thinkingContent!.isEmpty {
          thinkingTimeMs = latency * 0.6
      }
  } else {
```
Same tuple mutability bug in hybrid path
Make the tuple mutable here as well.
```diff
- let (finalText, thinkingContent): (String, String?)
+ var (finalText, thinkingContent): (String, String?)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```swift
let modelInfo = loadedModel.model
var (finalText, thinkingContent): (String, String?)
var thinkingTimeMs: TimeInterval? = nil

if modelInfo.supportsThinking {
    // Use model-specific pattern or fall back to default
    let pattern = modelInfo.thinkingPattern ?? ThinkingTagPattern.defaultPattern
    let parseResult = ThinkingParser.parse(text: generatedText, pattern: pattern)
    finalText = parseResult.content
    thinkingContent = parseResult.thinkingContent

    // Estimate thinking time if present
    if thinkingContent != nil && !thinkingContent!.isEmpty {
        thinkingTimeMs = latency * 0.6
    }
} else {
```
🤖 Prompt for AI Agents
In
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/GenerationService.swift
around lines 270 to 285, the tuple (finalText, thinkingContent) is declared with
let inside the hybrid path making it immutable which prevents later updates;
change the declaration to var (finalText, thinkingContent): (String, String?) so
the tuple is mutable, allowing assignment to finalText and thinkingContent later
(mirror the mutable tuple used in the other code path), and ensure any
subsequent assignments remain valid.
```swift
func recordError(_ err: Error) {
    error = err
    resultContinuation?.resume(throwing: err)
    resultContinuation = nil
}
```
Result task can hang on errors; ensure waitForResult throws immediately if an error occurred
If an error happens before waitForResult sets resultContinuation, recordError() resumes nothing and isComplete remains false. Later waiters will suspend forever.
Fix by marking completion on error and short‑circuiting in waitForResult:
```diff
  func recordError(_ err: Error) {
      error = err
-     resultContinuation?.resume(throwing: err)
+     isComplete = true
+     resultContinuation?.resume(throwing: err)
      resultContinuation = nil
  }

  func waitForResult() async throws -> GenerationResult {
-     // If already complete, return immediately
-     if isComplete, let modelName = modelName {
+     // If error occurred, throw immediately
+     if let err = error {
+         throw err
+     }
+     // If already complete, return immediately
+     if isComplete, let modelName = modelName {
          return buildResultSync(modelUsed: modelName, framework: framework)
      }
      // Otherwise, wait for completion
      return try await withCheckedThrowingContinuation { continuation in
          resultContinuation = continuation
      }
  }
```

Also applies to: 85-95
🤖 Prompt for AI Agents
In
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift
around lines 66-71 (and likewise address lines 85-95), recordError currently
sets error and resumes the continuation only if present which can leave
isComplete false and later waitForResult suspended forever; update recordError
to set isComplete = true, store the error, resume any existing
resultContinuation with throwing the error and clear it, and ensure any
subsequent waitForResult checks for an already-set error and immediately throws
instead of suspending; also guard against double-resume by clearing
resultContinuation after resuming.
```swift
await collector.recordToken("", isThinking: false) // Initialize timing
```
TTFB skew: drop the pre-stream recordToken() call
Calling recordToken("") before any real token sets firstTokenTime immediately and inflates tokenCount. This makes time-to-first-token ≈ 0 and corrupts metrics.
Apply this diff:

```diff
- // Start timing
- await collector.recordToken("", isThinking: false) // Initialize timing
+ // Start timing is set when first real token arrives; no-op here
```
+ // Start timing is set when first real token arrives; no-op here📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
- await collector.recordToken("", isThinking: false) // Initialize timing
+ // Start timing is set when first real token arrives; no-op here
```
🤖 Prompt for AI Agents
In
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift
around lines 187-188, remove the pre-stream call to await
collector.recordToken("", isThinking: false) because it sets firstTokenTime and
increments tokenCount prematurely; instead ensure firstTokenTime and token
counting are only initialized when the first real token is recorded (delete this
dummy call or guard it so it does not set timing/counting).
```swift
Task {
    if shouldParseThinking {
        // Parse token for thinking content
        let (tokenType, cleanToken) = ThinkingParser.parseStreamingToken(
            token: token,
            pattern: thinkingPattern,
            buffer: &buffer,
            inThinkingSection: &inThinkingSection
        )

        // Track thinking content
        if tokenType == .thinking, let thinkingToken = cleanToken {
            accumulatedThinking += thinkingToken
        }

        // Record metrics
        await collector.recordToken(token, isThinking: inThinkingSection)

        // Only yield non-thinking tokens
        if tokenType == .content, let cleanToken = cleanToken {
            continuation.yield(cleanToken)
        }
    } else {
        // No thinking parsing, yield token as-is
        await collector.recordToken(token, isThinking: false)
        continuation.yield(token)
    }
}
)

// Record thinking content if any
if !accumulatedThinking.isEmpty {
    await collector.recordThinkingEnd(accumulatedThinking)
}
```
Fix data race and preserve token order; record thinking-end when the tag closes
Spawning a Task per token and mutating buffer/inThinkingSection/accumulatedThinking inside those Tasks races and can yield tokens out of order. Also, thinkingEnd is recorded only after the stream ends, inflating thinkingTime.
Process parsing/yield synchronously; update metrics in a background Task; and mark thinking end immediately when closing tag is detected.
Apply this diff:
```diff
- onToken: { token in
- Task {
- if shouldParseThinking {
- // Parse token for thinking content
- let (tokenType, cleanToken) = ThinkingParser.parseStreamingToken(
- token: token,
- pattern: thinkingPattern,
- buffer: &buffer,
- inThinkingSection: &inThinkingSection
- )
-
- // Track thinking content
- if tokenType == .thinking, let thinkingToken = cleanToken {
- accumulatedThinking += thinkingToken
- }
-
- // Record metrics
- await collector.recordToken(token, isThinking: inThinkingSection)
-
- // Only yield non-thinking tokens
- if tokenType == .content, let cleanToken = cleanToken {
- continuation.yield(cleanToken)
- }
- } else {
- // No thinking parsing
- await collector.recordToken(token, isThinking: false)
- continuation.yield(token)
- }
- }
- }
+ onToken: { token in
+ if shouldParseThinking {
+ let (tokenType, cleanToken) = ThinkingParser.parseStreamingToken(
+ token: token,
+ pattern: thinkingPattern,
+ buffer: &buffer,
+ inThinkingSection: &inThinkingSection
+ )
+
+ // Track thinking content
+ if tokenType == .thinking, let thinkingToken = cleanToken {
+ accumulatedThinking += thinkingToken
+ // Mark end of thinking immediately on closing tag
+ Task { await collector.recordThinkingEnd(accumulatedThinking) }
+ }
+
+ // Yield first to preserve ordering
+ if tokenType == .content, let cleanToken = cleanToken {
+ continuation.yield(cleanToken)
+ }
+
+ // Update metrics in background (non-blocking)
+ Task { await collector.recordToken(token, isThinking: inThinkingSection) }
+ } else {
+ // No thinking parsing
+ continuation.yield(token)
+ Task { await collector.recordToken(token, isThinking: false) }
+ }
+ }
)
- // Record thinking content if any
- if !accumulatedThinking.isEmpty {
- await collector.recordThinkingEnd(accumulatedThinking)
- }
+ // Thinking end recorded above on close; no-op here
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In
sdk/runanywhere-swift/Sources/RunAnywhere/Capabilities/TextGeneration/Services/StreamingService.swift
around lines 194 to 229, spawning a Task per token causes data races and
out-of-order yields because buffer, inThinkingSection, and accumulatedThinking
are mutated concurrently and thinking-end is only recorded at stream end; fix by
performing token parsing, buffer/inThinkingSection/accumulatedThinking mutation,
and yielding synchronously on the current actor/thread (no per-token Task) so
token order is preserved and shared state is updated safely, dispatch metric
calls (collector.recordToken) to a background Task if needed, and detect the
closing thinking tag during synchronous parsing to immediately call
collector.recordThinkingEnd(accumulatedThinking) right when the tag closes (then
reset accumulatedThinking) instead of waiting for the stream to finish.
```swift
public let thinkingPattern: ThinkingTagPattern? // Custom thinking pattern (if supportsThinking)

// Optional metadata
```
🧩 Analysis chain
Persisting thinkingPattern requires DB migration and JSON mapping
- Add a migration for models.thinkingPattern (e.g., TEXT) before writes.
- Ensure GRDB can encode/decode ThinkingTagPattern:
- Option A: mark property with @DatabaseJSON var thinkingPattern: ThinkingTagPattern? (GRDB wrapper).
- Option B: make ThinkingTagPattern conform to DatabaseValueConvertible and store JSON.
Would you like a migration + DatabaseValueConvertible snippet generated?
Also applies to: 62-63, 81-82, 112-114, 148-149
---

🏁 Script executed:

```shell
#!/bin/bash
# Locate and inspect ModelInfo.swift
find . -name "ModelInfo.swift" -type f | head -5
```

Length of output: 77

🏁 Script executed:

```shell
#!/bin/bash
# Search for ThinkingTagPattern definition
rg -n "struct ThinkingTagPattern|class ThinkingTagPattern|enum ThinkingTagPattern" --type swift
```

Length of output: 157

🏁 Script executed:

```shell
#!/bin/bash
# Check for GRDB migrations and schema definitions
fd -e swift -e sql | xargs rg -l "migration|CreateTable" | head -10
```

Length of output: 293

🏁 Script executed:

```shell
#!/bin/bash
# Search for DatabaseJSON usage patterns in the codebase
rg -n "@DatabaseJSON" --type swift
```

Length of output: 0

🏁 Script executed:

```shell
#!/bin/bash
# Search for DatabaseValueConvertible usage
rg -n "DatabaseValueConvertible" --type swift
```

Length of output: 0

🏁 Script executed:

```shell
#!/bin/bash
# Examine the actual ModelInfo.swift file to see all the mentioned lines
fd "ModelInfo.swift" -x cat {}
```

Length of output: 5158

---

🏁 Script executed:

```shell
#!/bin/bash
# Inspect the migration to see schema definition
cat sdk/runanywhere-swift/Sources/RunAnywhere/Data/Storage/Database/Migrations/Migration001_InitialSchema.swift
```

Length of output: 8568

🏁 Script executed:

```shell
#!/bin/bash
# Search for how metadata (also Codable) is handled - to find patterns
rg -n "metadata" sdk/runanywhere-swift/Sources/RunAnywhere/Core/Models/Model/ModelInfo.swift -A2 -B2
```

Length of output: 1076

🏁 Script executed:

```shell
#!/bin/bash
# Search for other nested Codable properties in GRDB entities
rg -n "struct.*Codable.*PersistableRecord|class.*Codable.*PersistableRecord" --type swift -A30 | head -80
```

Length of output: 9119

🏁 Script executed:

```shell
#!/bin/bash
# Check GRDB version in Package.swift or dependencies
fd -e swift -e toml | xargs rg -n "GRDB|grdb" | grep -i "version\|package\|dependency"
```

Length of output: 217

🏁 Script executed:

```shell
#!/bin/bash
# Search for how arrays/complex types are stored in other models
rg -n "compatibleFrameworks" sdk/runanywhere-swift/Sources/RunAnywhere/Core/Models/Model/ModelInfo.swift -B5 -A5
```

Length of output: 2321
Add thinkingPattern JSON column via migration

- Create a new migration that alters the `models` table to add a `thinkingPattern` BLOB column (JSON) immediately after `supportsThinking`.
- GRDB's built-in Codable support (used for other nested types) will automatically serialize/deserialize `ThinkingTagPattern`; no custom wrapper is required.
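A sketch of what that migration could look like with GRDB's migrator API (the migration identifier and struct name are hypothetical; the column type follows the recommendation above):

```swift
import GRDB

// Hypothetical follow-up migration adding the JSON-encoded pattern column.
struct MigrationAddThinkingPattern {
    static func register(in migrator: inout DatabaseMigrator) {
        migrator.registerMigration("addThinkingPattern") { db in
            try db.alter(table: "models") { t in
                // GRDB's Codable record support serializes ThinkingTagPattern? as JSON.
                t.add(column: "thinkingPattern", .blob)
            }
        }
    }
}
```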
Analytics Consolidation - Issue #69
Overview
This PR implements comprehensive analytics consolidation by moving ALL performance metrics calculation from the application layer to the SDK layer. The app now acts purely as a display layer, with zero analytics calculations happening in the application code.
Problem Statement
Previously, analytics responsibilities were split between SDK and app: the app duplicated metric bookkeeping (timing, token counts, throughput) around every SDK call, amounting to roughly 105 lines of manual tracking in ChatViewModel alone.
Solution
Consolidated all analytics into the SDK with the following approach:
1. Created TokenCounter for Accurate Estimation
- `TokenCounter.swift` with improved heuristics (accounts for punctuation, whitespace, newlines)
- Replaces the naive `text.count / 4` estimation
- Exposed as `RunAnywhere.estimateTokenCount()` for a consistent namespace
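The heuristic itself is not reproduced in this PR body; a minimal sketch of the kind of estimation described (constants are illustrative, not the SDK's actual values):

```swift
import Foundation

enum TokenEstimate {
    /// Rough token estimate: word count plus extra weight for punctuation
    /// and newlines, instead of the naive `text.count / 4`.
    static func count(for text: String) -> Int {
        let words = text.split(whereSeparator: { $0.isWhitespace }).count
        let punctuation = text.filter { $0.isPunctuation }.count
        let newlines = text.filter { $0.isNewline }.count
        // ~1.3 tokens per word is a common heuristic for English text.
        return Int((Double(words) * 1.3).rounded()) + punctuation / 2 + newlines
    }
}
```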
2. Streaming with Built-in Analytics

- `StreamingResult` struct containing both the token stream and a metrics task
- `MetricsCollector` actor for thread-safe metrics accumulation during streaming
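The pairing, as described in the review of GenerationResult.swift (a sketch of the shape with minimal stand-in types, not the SDK's exact declarations):

```swift
import Foundation

// Minimal stand-in so the shape compiles; the SDK's real type is richer.
public struct GenerationResult {
    public let text: String
    public let tokensUsed: Int
    public let latencyMs: Double
}

/// Pairs the live token stream with a task that resolves to the final metrics.
public struct StreamingResult {
    public let stream: AsyncThrowingStream<String, Error>
    public let result: Task<GenerationResult, Error>
}
```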
3. API Simplification

- `generateStream()` method returns `StreamingResult` (includes both stream + metrics)
4. Application Layer Cleanup

- Removed manual metrics tracking from `ChatViewModel`
- Removed the `collectMessageAnalytics()` method entirely

Code Changes
Before (App Layer Manual Tracking)
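The original code sample was not preserved in this export; a representative sketch of the pre-PR, app-side bookkeeping this PR removes (assuming `stream: AsyncThrowingStream<String, Error>` and a hypothetical `render(_:)` UI helper):

```swift
// Pre-PR style: the app times the stream and recomputes throughput by hand.
let start = Date()
var tokenCount = 0
var firstTokenDelay: TimeInterval?

for try await token in stream {
    if firstTokenDelay == nil { firstTokenDelay = Date().timeIntervalSince(start) }
    tokenCount += 1 // rough: chunk count, not real tokens
    render(token)
}

let elapsed = Date().timeIntervalSince(start)
let tokensPerSecond = elapsed > 0 ? Double(tokenCount) / elapsed : 0
```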
After (SDK Provides Everything)
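Also not preserved; a sketch of the new call shape based on the StreamingResult API described in this PR (`render(_:)` is hypothetical):

```swift
// Post-PR: consume the stream once; the SDK computes the metrics.
let streaming = try await RunAnywhere.generateStream(prompt, options: options)

for try await token in streaming.stream {
    render(token)
}

let result = try await streaming.result.value // GenerationResult with all metrics
print("tokens: \(result.tokensUsed), latency: \(result.latencyMs) ms")
```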
Key Architecture Changes
SDK Layer Additions
- `TokenCounter.swift` - Accurate token estimation and speed calculations
- `StreamingResult` struct - Container for stream + metrics task
- `MetricsCollector` actor - Thread-safe metrics accumulation
- Enhanced `PerformanceMetrics` with comprehensive tracking
- Convenience methods under the `RunAnywhere.*` namespace

Application Layer Simplifications
Bug Fixes
Fixed AsyncStream Double-Consumption Crash
`Fatal error: attempt to await next() on more than one task`
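The crash came from two tasks iterating the same AsyncStream (one for UI, one for metrics). A simplified sketch of the failure mode and the actor-based fix (types reduced for illustration; the real MetricsCollector lives in StreamingService.swift):

```swift
import Foundation

// Bug: AsyncStream supports a single consumer; a second iterator traps.
// Task { for try await t in stream { render(t) } }   // UI consumer
// Task { for try await t in stream { record(t) } }   // metrics consumer -> crash

// Fix: one consumer yields to the UI and forwards to an actor for metrics,
// so the stream is only ever read once.
actor MetricsCollector {
    private var tokenCount = 0
    private var firstTokenTime: Date?

    func recordToken(_ token: String) {
        if firstTokenTime == nil { firstTokenTime = Date() }
        tokenCount += 1
    }
}
```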
Benefits

✅ Single Source of Truth: All metrics calculated in one place (SDK)
✅ Consistency: Same calculations for streaming and non-streaming modes
✅ Accuracy: Improved token counting with proper heuristics
✅ Simplicity: App code reduced by ~105 lines
✅ Thread Safety: Actor-based metrics collection
✅ Clean API: One method, always includes metrics
Files Changed
SDK (11 files):
- `TokenCounter.swift` (new) - Token estimation and speed calculations
- `GenerationResult.swift` - Added `StreamingResult` struct
- `StreamingService.swift` - Implemented streaming with metrics tracking
- `RunAnywhere.swift` - Simplified API, added convenience methods
- `RunAnywhere+StructuredOutput.swift` - Updated to use new API
- `PerformanceMetrics.swift` - Enhanced metrics tracking
- `GenerationService.swift` - Integrated TokenCounter
- `ThinkingTagPattern.swift` - Minor adjustments
- `LLMHandler.swift` - Voice capability updates
- `ModelInfo.swift` - Model metadata enhancements

App (1 file):
- `ChatViewModel.swift` - Removed manual analytics, uses SDK metrics exclusively

Testing
API Usage
Developers using the SDK now get analytics automatically:
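The sample itself was not captured in this export; a minimal sketch of the non-streaming path, using the fields referenced throughout this PR:

```swift
// Non-streaming: the returned GenerationResult already carries the metrics.
let result = try await RunAnywhere.generate(prompt, options: options)

print(result.text)
print("tokens used: \(result.tokensUsed), latency: \(result.latencyMs) ms")
// thinkingTokens / responseTokens are populated as well per this PR
```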
Migration Notes
For SDK users:
- `generateStream()` now returns `StreamingResult` instead of `AsyncThrowingStream`
- Access the token stream via the `.stream` property
- Await final metrics via the `.result` task
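Illustrative before/after for a streaming call site (a sketch; `render(_:)` is hypothetical):

```diff
- let stream = try await RunAnywhere.generateStream(prompt, options: options)
- for try await token in stream { render(token) }
+ let streaming = try await RunAnywhere.generateStream(prompt, options: options)
+ for try await token in streaming.stream { render(token) }
+ let metrics = try await streaming.result.value // SDK-computed GenerationResult
```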
For app developers:

- Read metrics directly from the `GenerationResult` returned by the SDK
- Use `RunAnywhere.estimateTokenCount()` for token estimation

Related Issues
Closes #69