WEBVTT

1
00:00:00.125 --> 00:00:04.629
I’m Randi Griffin, I’m a Lead Data Scientist in the Boston office with BCG X.

2
00:00:05.130 --> 00:00:08.216
Data scientists are constantly having to learn new tools and techniques

3
00:00:08.299 --> 00:00:10.802
but there are fundamentals that apply across every problem.

4
00:00:10.969 --> 00:00:12.637
Take GenAI as an example.

5
00:00:13.054 --> 00:00:16.057
Data scientists everywhere are having to ramp up on this new technology.

6
00:00:16.224 --> 00:00:17.559
And there is a lot about it that is new,

7
00:00:17.767 --> 00:00:19.602
but the fundamentals still apply.

8
00:00:20.353 --> 00:00:22.022
To develop applications at scale,

9
00:00:22.564 --> 00:00:25.734
you need high-quality data, robust infrastructure,

10
00:00:26.359 --> 00:00:29.029
evaluation and monitoring of system performance.

11
00:00:30.864 --> 00:00:33.575
One of the biggest challenges around GenAI

12
00:00:33.908 --> 00:00:38.121
is bridging the gap between proofs of concept and scaled solutions.

13
00:00:38.329 --> 00:00:43.585
A major driver of this is gaps in tooling and capabilities around testing

14
00:00:43.585 --> 00:00:45.670
and evaluating GenAI systems

15
00:00:45.879 --> 00:00:49.507
to ensure they are consistently delivering the intended value.

16
00:00:49.758 --> 00:00:54.012
To help solve this, we’ve been developing some tools and playbooks

17
00:00:54.179 --> 00:00:57.265
focused on testing and evaluation of GenAI systems.

18
00:00:58.099 --> 00:01:01.269
I’ll tell you about one that I’m particularly excited about.

19
00:01:01.978 --> 00:01:06.066
We’ve developed a software library for using GenAI

20
00:01:06.399 --> 00:01:10.570
to automate the process of testing and evaluating another GenAI system.

21
00:01:10.820 --> 00:01:11.738
When I say automate,

22
00:01:11.821 --> 00:01:14.407
I don’t mean you just push a button and it’s one and done.

23
00:01:14.699 --> 00:01:19.162
There’s still a lot of thought and expertise that has to go into developing targeted,

24
00:01:19.287 --> 00:01:22.999
effective tests and evals for a specific GenAI use case.

25
00:01:23.500 --> 00:01:24.918
But what our tool does is

26
00:01:24.918 --> 00:01:28.088
it solves a lot of the engineering challenges, which allows teams

27
00:01:28.088 --> 00:01:33.218
to focus on developing a tailored solution for their business context.