WEBVTT

1
00:00:00.211 --> 00:00:02.794
(gentle music)

2
00:00:10.440 --> 00:00:12.660
I've always been a keen violinist.

3
00:00:12.660 --> 00:00:16.290
Mastering a musical instrument requires constant feedback,

4
00:00:16.290 --> 00:00:18.131
practice, and fine tuning.

5
00:00:18.131 --> 00:00:20.370
(gentle music)

6
00:00:20.370 --> 00:00:22.770
AI systems are surprisingly similar.

7
00:00:22.770 --> 00:00:26.490
They need rigorous testing, evaluation, and adjustment

8
00:00:26.490 --> 00:00:28.560
to perform at their best.

9
00:00:28.560 --> 00:00:32.280
It's why we created GenAI Evaluator by BCG X.

10
00:00:32.280 --> 00:00:34.200
Many companies have already developed

11
00:00:34.200 --> 00:00:36.870
innovative AI use cases and want to scale them.

12
00:00:36.870 --> 00:00:40.020
A scalable evaluation process will balance risks

13
00:00:40.020 --> 00:00:43.140
with impact to maximize value.

14
00:00:43.140 --> 00:00:46.650
The first task is for people to establish the parameters

15
00:00:46.650 --> 00:00:49.890
to test the capabilities of a specific solution

16
00:00:49.890 --> 00:00:51.600
in a specific context.

17
00:00:51.600 --> 00:00:55.320
GenAI Evaluator can then be deployed to generate test data

18
00:00:55.320 --> 00:00:59.730
and automate testing based on an existing test design.

19
00:00:59.730 --> 00:01:03.360
GenAI Evaluator can also offer automated assessments

20
00:01:03.360 --> 00:01:04.950
and then provide new test data

21
00:01:04.950 --> 00:01:07.890
for areas which still need improvement.

22
00:01:07.890 --> 00:01:11.520
I believe this sort of targeted testing and evaluation

23
00:01:11.520 --> 00:01:15.420
of GenAI solutions is instrumental to success.

24
00:01:15.420 --> 00:01:18.780
Our experience has shown that following this process

25
00:01:18.780 --> 00:01:21.660
can make it two or three times faster

26
00:01:21.660 --> 00:01:24.390
to deploy new AI solutions in production.

27
00:01:24.390 --> 00:01:26.790
Getting it wrong can be costly,

28
00:01:26.790 --> 00:01:30.390
whether through reputational damage from unintended bias

29
00:01:30.390 --> 00:01:32.610
or the mishandling of sensitive data

30
00:01:32.610 --> 00:01:35.820
resulting in legal and compliance consequences.

31
00:01:35.820 --> 00:01:37.230
The biggest risk often lies

32
00:01:37.230 --> 00:01:39.330
in building a flawed business case

33
00:01:39.330 --> 00:01:42.030
when solutions fall short of promised impact

34
00:01:42.030 --> 00:01:43.560
due to poor quality.

35
00:01:43.560 --> 00:01:46.800
Or what about the damage caused by a breakdown in trust?

36
00:01:46.800 --> 00:01:48.900
Have you ever spoken to a chatbot

37
00:01:48.900 --> 00:01:50.280
that couldn't answer your query

38
00:01:50.280 --> 00:01:52.380
and just left you feeling frustrated?

39
00:01:52.380 --> 00:01:55.200
Customers don't like poor quality service.

40
00:01:55.200 --> 00:01:58.830
In fact, 30% of customers abandon a brand

41
00:01:58.830 --> 00:02:00.840
after a bad chatbot experience.

42
00:02:00.840 --> 00:02:02.190
To build trust,

43
00:02:02.190 --> 00:02:05.910
AI use cases need to be proficient, safe,

44
00:02:05.910 --> 00:02:08.490
secure, and compliant.

45
00:02:08.490 --> 00:02:10.380
As with playing the violin,

46
00:02:10.380 --> 00:02:13.440
the quality of the output is what really matters.

47
00:02:13.440 --> 00:02:16.190
(powerful music)

48
00:02:21.286 --> 00:02:23.869
(gentle music)