Bot Analytics v4.2 Proposal

Priority 0: Address Bot Channel Service Top 4 Scenarios

We regularly receive issues about bots having issues and customers are left scratching their heads. The following are considered the top issues that cover a high percentage of issues encountered by customers' bots through the lens of the Channel Service:

500 errors on bot
Timeout (from channel to bot)
Bot Never responds
Bot works in ABS (Web Chat Test) but doesn’t work in channel X

In order to help customers quickly diagnose these issues (and enable UI/tools to surface to customers), the approach is to enable Application Insights for the Bot and beef up event data sent from the Channel Service (into the customer's Application Insights container) to completely diagnose issues.

Here is the proposal (Note: Ownership is shared across teams):

(This proposal was sent to Yochay/DanDris/Artur/Carlos/etc on a different thread)

Item	Owner	Description
Enable ABS to enable Application Insights	Bot Portal Team	Currently, if you provision a bot through the portal and opt into Application Insights and deploy the ABS template, Application Insights is not configured correctly. The portal team believes there's a way to enable this (bot configuration and appsettings.json)
Enable W3C distributed tracing	Bot Service Team	To correlate queries with Bot application insights Update! Craig claims this should be working.
Verify integration with Distributed Tracing	daveta/Bot Service Team	Need to test and verify operation id's are being transferred and correlated correctly. Ideally we set up an integration environment for testing.
Ensure Request/Dependency/Event/Trace telemetry being logged in Bot and correlated	daveta	Ensure the appropriate initializers installed.
Ensure ILogger telemetry being logged in Application Insights Trace	daveta	Application Insights has ability to intercept ILogger events.
Log Http Status code	Bot Service Team	To diagnose 500 errors on bot.
Log error on Channels into customer App Insights container	Bot Service Team	To diagnose never respond errors, and Channel X errors
Queries/Documentation to detect 500 errors	daveta/Bot Service Team	Once we have correlated data, we can write some documentation to demonstrate how to diagnose. For true "500" bot errors, we can demonstrate with existing tools and some queries. For Channel "500" errors, this will be a little more difficult depending where it fails within the channel. (Stretch) If 500 error is due to external dependency, clearly state the dependency and timeout in a query result. For example, if CosmosDB is timing out, it should state so.
Queries/Documentation to detect Bot Timeout errors	daveta/Bot Service Team	Once we have correlated data, we can write some documentation to demonstrate how to diagnose.
Queries/Documentation to detect Bot Never Responds errors	daveta/Bot Service Team	Once we have correlated data, we can write some documentation to demonstrate how to diagnose.
Queries/Documentation to detect Bot Works in Web Chat but not in Channel X	daveta/Bot Service Team	Once we have correlated data, we can write some documentation to demonstrate how to diagnose.

Priority 1: LUIS/QnA User Agent String

(Yochay's addition - John Taylor (?) doing work). Enable telemetry to understand usage of QnA and LUIS and be able to answer "How many distinct bots are using QnA or LUIS?"

(From Chris Mullins: Distinct? From Jim Lewallen: If we can! There's no PII concerns. From Carlos Castro: What about Http Headers?)

Here is the proposal:

Item	Deliverable	Description
LUIS Recognizer modifications	Product	Modify the recognizer user agent string
QnA Recognizer modifications	Product	Modify the recognizer user agent string

Priority 1: Enable Out-of-the-box reports

See the list of reports that should be enabled for bots by enabling Application Insights.

Here is the proposal:

Item	Deliverable	Description
Simplify Application Insights registration for Bot Component	Product	Application Insights is a train wreck configuring. Customers need carnal knowledge of their startup sequences which is terrible. Create a platform-level appropriate integration for ASP.Net Core/WebAPI/Node.js. Should be usable with MVC patterns. (Some checks: Ensure ILogger hooked up, ensure appsettings.json correct, ensure our initializer telemetry registered, ensure our ASP.Net Middleware properly registered, etc)
Simple Disable of Application Insights Feature	Product (Docs/Samples)	Disabling Application Insights should be simple (ideally single line) and any telemetry within the product should honor it.
AppInsight UserID/SessionID Telemetry	Product	Modifying the events to use Bot user/session ID's enables other Application Insights reports such as "Funnel" in show data in the appropriate user and session context.
LUIS Recognizer Telemetry	Sample	Keep a sample until such time that : (A) We're confident of our reporting scenarios that will satisfy 90%+ and/or (B) we create a flexible/configuration-based method of collecting data. (See below for details of a flexible storage system). By choosing to distribute as a sample, we won't be addressing endless customer enhancement issues of "Please add property X in LUIS intent telemetry because I need it for my report". The customer can modify for their telemetry themselves.
QnA Recognizer Telemetry	Sample	See LUIS recognizer above.
Bot Message Telemetry Middleware	Sample	See LUIS recognizer above.
(Stretch) Configuration-based mapping collector	Product	A configuration-based method of specifying which properties to pull from specific objects would allow customers flexibility to change the properties being logged, and where the properties will be logged. This will enable storage into the Application Insights "Custom Events" and "Custom Metrics" tables). This feature should support 1:M relationships (for example, storing all intent data from a single LUIS result as separate rows). (I have doubts whether this component should ever be developed - bang for the buck doesn't seem worth it at this time).

Priority 1: Satisfy Virtual Assistant scenarios

See nice writeup from Virtual Assistant team about the metrics they would like.

Item	Deliverable	Description
Telemetry Waterfall Dialog	Product	Enables logging to satisfy "Skills" analytics. More comfortable putting this into product, given the nature of the data being collected that can be enhanced over time.
Prioritize metrics	Doc	There are several metrics defined in the documentation. Need to prioritize and document order.
Review metrics with botmetrics@microsoft alias	Email	IPA team/Darren Jefford's team
Evaluate LUIS \ QnA Telemetry	Sample	Once metrics prioritized, reuse/modify existing telemetry components. Resulting telemetry should serve both sets of reports (superset)
New sample demonstrating Telemetry Waterfall	Sample	Something to generate data to drive the new PowerBI template
Build PowerBI Template	Sample	New reports showing off metrics
Review PowerBI Template	Meeting	Review with botmetrics@microsoft.com
(Stretch) Correlation/Causation for Skills	Sample	There are several "Surface to the user the reason why they aren't completing skills". This is semi related to some LUIS scenarios.

Priority 2: Enable close-the-loop scenarios for LUIS/QnA

LUIS team very interested in making a SAT/DSAT detector, segmentation/ recommender/etc.

From @Cez:

Item	Deliverable	Description
Baseline	Design Doc	Identification of apps we could use and collect/retain logs
Close the loop	Design doc	Once we have baseline measurement, attempt to close the loop to suggest labls for utterances.
Segmentation	Design doc	Additional feedback for improving the app and suggestions for content.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bot_analytics_one_pager.MD

bot_analytics_one_pager.MD

Bot Analytics v4.2 Proposal

Priority 0: Address Bot Channel Service Top 4 Scenarios

Priority 1: LUIS/QnA User Agent String

Priority 1: Enable Out-of-the-box reports

Priority 1: Satisfy Virtual Assistant scenarios

Priority 2: Enable close-the-loop scenarios for LUIS/QnA

Files

bot_analytics_one_pager.MD

Latest commit

History

bot_analytics_one_pager.MD

File metadata and controls

Bot Analytics v4.2 Proposal

Priority 0: Address Bot Channel Service Top 4 Scenarios

Priority 1: LUIS/QnA User Agent String

Priority 1: Enable Out-of-the-box reports

Priority 1: Satisfy Virtual Assistant scenarios

Priority 2: Enable close-the-loop scenarios for LUIS/QnA