Success Metrics
How to measure the success of AI deployment?
Measuring the success of an AI deployment is a critical but challenging topic. We normally use a combination of issue rate and accomplish rate to measure both the reliability and the effectiveness of the AI, which are often a trade-off in AI products.
Issue rate
Aissistant, built on Generative AI, is very powerful and delivers a much higher resolution rate and a higher-quality user experience than other technologies, but it inevitably and occasionally produces imperfect responses or tags, all of which we consider issues. We categorize issues into three priorities according to their impact. Below are high-level definitions of those priorities; the detailed definitions can differ by business.
P0: the issue created significant and unrecoverable impact on the business. Issues in this category must be avoided entirely and, once they happen, should be fixed within 24 hours.
P1: the issue created noticeable impact on the business, but the situation is recoverable. Issues in this category, once they happen, should be fixed within 48 hours.
P2: the issue may have created some, but negligible, impact on the business, and it is nice to resolve it at some point.
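These priorities and their fix windows lend themselves to a simple lookup. Below is a minimal sketch of how they could be encoded for triage tooling; the class and field names are illustrative, not part of the product:

```python
# A minimal sketch of encoding the issue priorities above for triage tooling.
# Field names are illustrative; P2 has no hard deadline, so its SLA is None.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class IssuePriority:
    name: str
    description: str
    fix_sla_hours: Optional[int]  # None = no hard deadline

PRIORITIES = {
    "P0": IssuePriority("P0", "significant, unrecoverable impact", 24),
    "P1": IssuePriority("P1", "noticeable but recoverable impact", 48),
    "P2": IssuePriority("P2", "negligible impact", None),
}

def fix_deadline_hours(priority: str) -> Optional[int]:
    """Return the fix window, in hours, for an issue of the given priority."""
    return PRIORITIES[priority].fix_sla_hours
```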
At Aissist.io, one thing we cherish with all our heart is the "Trust of AI," which requires the AI never to take catastrophic actions, such as using trash talk, saying negative things, or going far beyond its designed role to say or do things that create unethical or very negative impact on the business. We develop our core engine around "Trust of AI," and we encourage all our users to work with us to enforce it.
Accomplish rate
We divide the accomplish rate into two distinct metrics: contained rate and resolution rate, both measured from the percentage of sessions carrying system tags generated by our "self-aware" engine.
Contained Rate: 1 - the percentage of sessions that carry the "sys_human_help" tag, which usually means one or more of the following: (1) the user is not happy with the service and specifically asked for a human; (2) the AI doesn't have sufficient information to answer the user's questions or requests; (3) the AI detects a scenario it doesn't know how to handle.
Resolution Rate: 1 - the percentage of sessions that carry the "sys_human_help" or "sys_human_follow_up" tags. The increment over the contained rate is the latter tag, which means one or more of the following: (1) there is information worth the human team's attention, such as a user-submitted availability; (2) there is an action for the human team to execute and then follow up on; (3) there may be some unanswered questions (this overlaps slightly with the contained rate, with subtle nuance).
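To make the definitions concrete, here is a minimal sketch of how both rates fall out of session tags. It assumes each session is represented as the set of system tags it received; only the two tag names above come from the product, everything else is illustrative:

```python
# A minimal sketch of deriving contained rate and resolution rate from
# session tags. Assumes each session is the set of system tags it received;
# the tag names follow the definitions above.

HUMAN_HELP = "sys_human_help"
HUMAN_FOLLOW_UP = "sys_human_follow_up"

def contained_rate(sessions):
    """1 - percentage of sessions tagged sys_human_help."""
    escalated = sum(1 for tags in sessions if HUMAN_HELP in tags)
    return 1 - escalated / len(sessions)

def resolution_rate(sessions):
    """1 - percentage of sessions tagged sys_human_help or sys_human_follow_up."""
    unresolved = sum(
        1 for tags in sessions
        if HUMAN_HELP in tags or HUMAN_FOLLOW_UP in tags
    )
    return 1 - unresolved / len(sessions)

sessions = [
    {"sys_human_help"},        # user asked for a human -> not contained
    {"sys_human_follow_up"},   # contained, but needs human follow-up
    set(),                     # fully handled by the AI
    set(),                     # fully handled by the AI
]
print(f"Contained rate:  {contained_rate(sessions):.0%}")   # 75%
print(f"Resolution rate: {resolution_rate(sessions):.0%}")  # 50%
```

Because "sys_human_follow_up" adds to the unresolved count without affecting containment, the resolution rate is always at or below the contained rate.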
The targets for those rates vary by business, depending on the complexity of incoming traffic and the quality of instructions & assets. Below are rough targets for each rate and actions to improve them.
Contained Rate
Sales lead qualification -> 80% - 90%
Service -> 70% - 90% (depends on complexity; eCommerce will be higher than technology services)
A contained rate below target normally means there is a gap in either instructions or assets relative to the incoming traffic; closing that gap will lead to a higher contained rate.
Resolution Rate
Sales lead qualification -> 70% - 80%
Service -> 60% - 85% (depends on complexity; eCommerce will be higher than technology services)
A high contained rate with a low resolution rate can be normal, because in some scenarios you do want the human team to pay attention but not necessarily take action. Fine-tuning the AI to output fewer responses like "someone from the team will reach out" will improve the resolution rate.
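As a rough illustration, measured rates could be checked against the ranges above. The sketch below uses the service use-case targets; the thresholds and names are illustrative, not a fixed API:

```python
# Sketch: flag measured rates that fall below the rough service targets above.
# The ranges are illustrative, not a golden standard.
SERVICE_TARGETS = {
    "contained_rate": (0.70, 0.90),   # Service: 70% - 90%
    "resolution_rate": (0.60, 0.85),  # Service: 60% - 85%
}

def review_rates(measured):
    for metric, (low, high) in SERVICE_TARGETS.items():
        value = measured[metric]
        status = "below target" if value < low else "on target"
        print(f"{metric}: {value:.0%} ({status}, target {low:.0%} - {high:.0%})")

# Example: contained rate is healthy, resolution rate needs work
# (e.g., close instruction/asset gaps or trim "someone will reach out" replies).
review_rates({"contained_rate": 0.82, "resolution_rate": 0.55})
```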
It is important to note again that accomplish-rate targets depend on the business and use case, so there is no golden standard. However, with sufficient documentation, we see the AI achieve a 70% - 80% resolution rate and an 80% - 90% contained rate.