@BrandonWeng

dotnet run --project samples/SimpleStreamingCheck -- GPU
OpenVINO.NET GenAI - Simple Streaming Check
===========================================
Device: GPU

Prompt: Count to 5:
Streaming response:
Okay, the user wants me to count to 5. Let's see. If I were to count to 5, it would take 5 steps. But since I don't have a specific task or goal, I should focus on helping them. The user might be looking for a way to count or solve a problem. Maybe they need to count to 5? But the example given is about counting to 5. So, the task is to count to 5. Let

✓ Streaming works! Received 98 tokens

Non-streaming response:

Okay, the user wants me to count to 5. Let's see. If I count to 5, that means starting from 1 and adding 1 each time until reaching 5. So starting at 1, then 2, 3, 4, 5. So the first time is 1, then 2, 3, 4, 5. So the answer is 5. But maybe the user wants me to count to 5

✓ Both streaming and non-streaming work!

@BrandonWeng BrandonWeng merged commit bf84ef7 into main Aug 7, 2025
3 checks passed
@BrandonWeng BrandonWeng deleted the streaming-llm branch August 7, 2025 23:24
@github-actions
github-actions bot commented Aug 7, 2025

Performance Benchmark Results - Windows x64

Generated: 20250807-232624
Platform: win-x64

(Exit Code: 0, Time: 20030ms):
OpenVINO.NET Quick Demo
=======================
Model: FluidInference/qwen3-0.6b-int4-ov-npu
Temperature: 0.7, Max Tokens: 100

✓ Using model from environment variable: D:\a\Fluid.OpenVINO.GenAI\Fluid.OpenVINO.GenAI/samples/QuickDemo/Models/qwen3-0.6b-int4-ov-npu
Device: CPU

Setting DLL directory to: D:\a\Fluid.OpenVINO.GenAI\Fluid.OpenVINO.GenAI/build/native/runtimes/win-x64/native
Prompt 1: "Explain quantum computing in simple terms:"
Response: "<think>
Okay, the user wants a simple explanation of quantum computing. Let me start by breaking down what quantum computing is. It's a field that uses quantum particles, like superposits or qubits, to perform calculations. The key here is that it can solve specific problems faster than classical methods. I should explain the basics without getting too technical. Maybe start with what quantum computing does instead of solving problems. Use a metaphor: solving a puzzle or playing chess. Use a simple example. Use"
Performance: 16.6 tokens/sec, First token: 2989ms, Memory: 0.1MB

Prompt 2: "Write a short poem about artificial intelligence:"
Response: "<think>
Okay, the user wants a short poem about artificial intelligence. Let me start by breaking down the key elements. I need to capture the blend of science and human elements. Maybe use some metaphors or familiar terms to make it relatable. Let's think about the structure: maybe four lines or a stanza. The topic is important here.

First, I should think about the main aspects of AI. It could be about its potential for good or its limitations. Maybe mention both the benefits and"
Performance: 31.4 tokens/sec, First token: 231ms, Memory: 0.1MB

Prompt 3: "What are the benefits of renewable energy?"
Response: "<think>
Okay, the user is asking about the benefits of renewable energy. First, I need to recall what renewable energy means. It's a sustainable source of energy that can help the planet and future generations. The user might be looking for information on the benefits of renewable energy. Let me think about this topic.

The main benefits of renewable energy include its role in reducing greenhouse emissions and combating climate change. It also contributes to reducing long-term costs for energy production. These are three main benefits. The"
Performance: 31.4 tokens/sec, First token: 230ms, Memory: 0.1MB

Prompt 4: "Describe the process of making coffee:"
Response: "<think>
Okay, the user wants a description of the process of making coffee. I need to recall the steps involved. First, beans are roasted and ground. Then, they are usually roasted by a roaster, perhaps in a different way. They need to roast them, maybe in a different way. They need to brew it, maybe in a different way. Then they need to heat it, maybe in a different way. Then they need to clean it, maybe in a different way. Then"
Performance: 32.2 tokens/sec, First token: 229ms, Memory: 0.0MB

Summary Table:
==============
Prompt   | Tokens/sec |  First Token | Tokens |  Time (ms) |  Memory (MB)
----------------------------------------------------------------------
      1  |       16.6 |       2989ms |     98 |       5902 |          0.1
      2  |       31.4 |        231ms |     96 |       3057 |          0.1
      3  |       31.4 |        230ms |     96 |       3058 |          0.1
      4  |       32.2 |        229ms |     98 |       3045 |          0.0
----------------------------------------------------------------------
Average  |       27.9 |        920ms |      - |          - |          0.1
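
For anyone cross-checking the table, the figures can be recomputed from the Tokens and Time columns. The sketch below is not part of the benchmark itself. It assumes tokens/sec is tokens divided by total time, and that the Average row is the arithmetic mean of the per-prompt values; both assumptions happen to reproduce the numbers above. The `warm_*` values, which leave out prompt 1 (whose first-token latency is far higher than the rest), are an added illustration and not something the benchmark reports.

```python
# Rows copied from the summary table:
# (prompt, tokens, total time in ms, first-token latency in ms)
rows = [
    (1, 98, 5902, 2989),
    (2, 96, 3057, 231),
    (3, 96, 3058, 230),
    (4, 98, 3045, 229),
]


def tokens_per_sec(tokens, time_ms):
    """Throughput under the assumption: tokens / total generation time."""
    return tokens / (time_ms / 1000.0)


# Per-prompt throughput, rounded the way the table shows it.
rates = [round(tokens_per_sec(t, ms), 1) for _, t, ms, _ in rows]

# Average row: assumed to be the arithmetic mean of the per-prompt values.
avg_rate = round(sum(rates) / len(rates), 1)
avg_first_token_ms = round(sum(r[3] for r in rows) / len(rows))

# Illustration only: the same averages without prompt 1.
warm_rates = rates[1:]
warm_avg_rate = round(sum(warm_rates) / len(warm_rates), 1)
warm_first_token_ms = round(sum(r[3] for r in rows[1:]) / len(rows[1:]))

print("per-prompt tokens/sec:", rates)
print("average tokens/sec:", avg_rate)
print("average first token (ms):", avg_first_token_ms)
print("prompts 2-4 tokens/sec:", warm_avg_rate)
print("prompts 2-4 first token (ms):", warm_first_token_ms)
```

Under these assumptions, prompt 1 accounts for nearly all of the gap between the 920ms average first-token latency and the roughly 230ms seen on the later prompts.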

