SAM RANSBOTHAM: In AI projects, perfection is impossible, so when inevitable errors happen, how do you manage them? Find out how Nasdaq does it when we talk with Douglas Hamilton, the company's head of AI research.

Welcome to Me, Myself, and AI, a podcast on artificial intelligence in business. Each episode, we introduce you to someone innovating with AI. I'm Sam Ransbotham, professor of information systems at Boston College. I'm also the guest editor for the AI and Business Strategy Big Ideas program at MIT Sloan Management Review.

SHERVIN KHODABANDEH: And I'm Shervin Khodabandeh, senior partner with BCG, and I colead BCG's AI practice in North America. Together, MIT SMR and BCG have been researching AI for five years, interviewing hundreds of practitioners and surveying thousands of companies on what it takes to build and to deploy and scale AI capabilities across the organization and really transform the way organizations operate.

SAM RANSBOTHAM: Today we're talking with Douglas Hamilton. He's the associate vice president and head of AI research at Nasdaq. Doug, thanks for joining us. Welcome.

DOUG HAMILTON: Thanks, Sam and Shervin. Great to be here today.

SAM RANSBOTHAM: So our podcast is Me, Myself, and AI, so let's start with: Can you tell us a little bit about your current role at Nasdaq?

DOUG HAMILTON: In my current role, I head up AI research for Nasdaq at our Machine Intelligence Lab. The role itself here is a little bit unique, since many, many roles within Global Technology, which is our engineering organization, are very much business-unit-aligned, so they'll work with one of our four core business units, whereas this role really services every single area of the business.
That means that we're servicing market technology, which is the area of Nasdaq's business that produces software that powers 2,300 different companies in 50 different countries, powers 47 different markets around the world, as well as bank and broker operations, compliance, and [regulatory] tech for making sure that they are compliant with their local authorities. We service, of course, our investor intelligence line of business, which is how we get data from the market into the hands of the buy and sell side, so they can build products and trading strategies on top of those. We service, of course, the big one that people think about mostly, which is market services, which is the markets themselves; that's our core equities markets and a handful of options and derivatives markets as well. And then finally, corporate services -- that actually deals with the companies that are listed on our markets and their investor relationship departments.

So really, we get to work across all of these different lines of business, which means that we get to work on a huge number of very interesting and very diverse problems in AI. Really, the goal of the group is to leverage all aspects of cutting-edge artificial intelligence, machine learning, and statistical computing in order to find value in these lines of business, whether it's through productivity plays, differentiating capabilities, or just continued incremental innovation that keeps Nasdaq's products leading edge and keeps our markets at the forefront of the industry. In this role, I have a team of data scientists that are doing the work: writing the code, building the models, munging the data, wrapping it all up in optimizers, and creating automated decision systems. So my role, really, I think, day to day, is working with our business partners to find opportunities for AI.

SHERVIN KHODABANDEH: Doug, maybe to bring this to life a bit, can you put this in the context of a use case?
DOUG HAMILTON: I'll talk about one of our favorite use cases, which is a minimum volatility index that we run. The minimum volatility index is an AI-powered index that we partnered with an external [exchange-traded funds] provider, Victory Capital, on. The goal of this index is to basically mimic Nasdaq's version of the Russell 2000 -- it's a large- and mid-cap index -- and then essentially play with the weights of that index, which are normally market-cap-weighted, in such a way that it minimizes the volatility exposure of that portfolio.

What made that project really difficult is that minimizing volatility is actually a fairly easy and straightforward problem if you want to treat it linearly. That is, you look at a bunch of stocks, you look at their historical volatility performance, you pick a bunch of low-volatility shares, you slap them together, boom -- you get a pretty low-volatility portfolio. That's fairly straightforward to solve using linear methods, numerical programming, etc., and you can wrap linear constraints around it to make sure that you're not deviating too much from the underlying portfolio. You're still capturing the general themes of it. You're not overexposing yourself to different industries. That's actually fairly easy to do.

Where this becomes really interesting, however, is: Wouldn't it be cool if you found two stocks that worked against each other? They could each be quite volatile, but the portfolio, when they're mixed together, actually becomes less volatile than even two low-volatility shares, because they're constantly working against each other. That is, they have this nice contravarying action that cancels each other out, so you can capture the median growth without the volatility exposure. That'd be great. Now, that becomes a nonlinear problem. And it becomes a very noisy, almost nonconvex problem at that point too.
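[Editor's note: A minimal numerical sketch of the effect Doug describes, using made-up volatilities and correlations rather than anything from the actual index. Portfolio variance is the quadratic form w'Σw, so the covariance cross-term matters: two noisy but strongly anticorrelated stocks can combine into a calmer portfolio than two independent low-volatility stocks.]

import numpy as np

def portfolio_vol(weights, cov):
    # Portfolio volatility: sqrt(w' * Cov * w).
    w = np.asarray(weights)
    return float(np.sqrt(w @ cov @ w))

# Two low-volatility stocks (15% each) that are uncorrelated.
low_vol = np.array([0.15, 0.15])
cov_low = np.diag(low_vol ** 2)

# Two high-volatility stocks (40% each) that move strongly against each other.
high_vol = np.array([0.40, 0.40])
rho = -0.9
cov_high = np.array([
    [high_vol[0] ** 2, rho * high_vol[0] * high_vol[1]],
    [rho * high_vol[0] * high_vol[1], high_vol[1] ** 2],
])

w = np.array([0.5, 0.5])  # equal weights
print(portfolio_vol(w, cov_low))   # ~0.106
print(portfolio_vol(w, cov_high))  # ~0.089, calmer despite much noisier stocks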
DOUG HAMILTON: But you still have all these constraints you need to wrap around it. The tools for that kind of problem are simulated annealing, genetic algorithms, [Markov chain Monte Carlo]-style optimizers. And those behave pretty well when we have soft constraints that generally guide the solutions back into the feasibility zone. The problem they have is when you give them hard constraints. They don't like hard constraints; they break a lot. So what we had to do was rearchitect a lot of these algorithms to be able to handle these hard constraints as well.

SHERVIN KHODABANDEH: What would be a hard constraint?

DOUG HAMILTON: I'll give you an example of a soft constraint and a hard constraint. A soft constraint would be: It would be really nice if, when you go to rebalance a portfolio, its total turnover was less than 30%, let's say, because it gets really expensive to rebalance it otherwise. A hard constraint might be that no holding can vary by more than 2% between the optimized portfolio and the parent portfolio. So if the parent portfolio is 10% Microsoft, let's say, then the optimized portfolio has to be between 8% and 12%, right? That's an example of a hard constraint. If it's 7.9%, we're in violation of the governing documents of the index, and everybody gets into a lot of trouble.

SHERVIN KHODABANDEH: Got it. That's a good one. OK. So you're saying hard and soft constraints together form a tougher problem.

DOUG HAMILTON: A considerably tougher problem, because while these algorithms deal well with nonlinearity, these [Markov chain Monte Carlo]-style algos in particular do not deal well with hard constraints, where they must meet these criteria. And when you have -- I think in that one, we had 4,000 constraints, something like that -- almost nothing meets them. So if you take a hard culling approach, then you're left with no viable solutions to gain density around.
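[Editor's note: A hypothetical sketch, not Nasdaq's implementation, of one common way to rearchitect an annealing-style optimizer for this mix of constraints: enforce the hard per-holding band (the +/-2% example above) with a repair step applied to every proposal, and push the soft turnover target (the 30% example) into the objective as a penalty. The names, numbers, and cooling schedule are all invented. The point is only the division of labor: hard constraints are satisfied by construction rather than by rejection, which is what fails when thousands of constraints leave almost nothing feasible to sample from.]

import numpy as np

rng = np.random.default_rng(0)

def repair(weights, parent, band=0.02):
    # Hard constraint: clip each weight into [parent - band, parent + band],
    # then renormalize so the weights still sum to 1. (A production version
    # would re-project iteratively so renormalizing cannot nudge a weight back
    # outside its band; this is only a sketch.)
    w = np.clip(weights, parent - band, parent + band)
    return w / w.sum()

def objective(weights, cov, parent, turnover_cap=0.30, penalty=10.0):
    # Portfolio volatility plus a soft penalty once turnover exceeds the cap.
    vol = np.sqrt(weights @ cov @ weights)
    turnover = 0.5 * np.abs(weights - parent).sum()
    return vol + penalty * max(0.0, turnover - turnover_cap)

def anneal(parent, cov, steps=20_000, temp=0.05):
    current = repair(parent.copy(), parent)
    current_score = objective(current, cov, parent)
    best, best_score = current, current_score
    for i in range(steps):
        proposal = repair(current + rng.normal(0.0, 0.005, size=current.size), parent)
        score = objective(proposal, cov, parent)
        t = temp * (1.0 - i / steps) + 1e-9  # simple linear cooling schedule
        # Metropolis-style acceptance: always take improvements, and sometimes
        # take worse moves early on so the search can escape local minima.
        if score < current_score or rng.random() < np.exp((current_score - score) / t):
            current, current_score = proposal, score
            if score < best_score:
                best, best_score = proposal, score
    return best

# Usage sketch: best_weights = anneal(parent_weights, covariance_matrix)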
DOUG HAMILTON: So we had to spend a lot of time working with the team to figure out what the appropriate solution architecture should be -- algorithmically, etc. -- to overcome that challenge: how we set up those experiments, what sort of experiments we need to set up, how we test it, and, of course, how we actually communicate to the client that the solution is better than what they currently have.

SHERVIN KHODABANDEH: Doug, this example that you talked about on volatility -- is [it] one of hundreds of use cases that your team does, or one of tens of use cases? [I'm] just trying to get a sense of the scale of the operations here.

DOUG HAMILTON: Within Nasdaq, what we represent is the center of excellence for artificial intelligence. So this is one of, I'd say, dozens of use cases that are either live or that we're exploring at any point in time. On top of that, obviously, we have robust relationships across the business with third-party vendors that help us with all sorts of internal use cases -- where maybe it's not something we're looking to sell to the outside world, or something where we can leverage existing technology in a better way than building it in-house -- and those are certainly part of our AI story as well.

SAM RANSBOTHAM: I was thinking about your example of finding the matching [stocks]. We think about digital twins; it's almost a digital un-twin stock that you're trying to match with. That has to change, though, at some point. How often are you revisiting these? How are you keeping them up to date so that you don't end up with things suddenly moving together when you thought they were moving the opposite [way]?

DOUG HAMILTON: The nice thing about the world of indexing is that it's almost statutory how you do that. When we look at other models that we have in production, we usually do this in one of two ways.
We usually do it either in an ad hoc way, through telemetry -- looking at model performance and looking for some sort of persistent degradation in performance -- or, of course, through some sort of regularly scheduled maintenance for many of our products. For indexes, we're basically told, "Here's how often you rebalance, and here's how often you're allowed to make the change." So in this case, we rebalance twice a year, so every six months is when we go back and take a look.

SAM RANSBOTHAM: Let's switch a little bit to, say, how did you end up doing this? What in your background led you to be able to do all those things?

DOUG HAMILTON: I'm fortunate in that I got my first data science job in 2015. I'll tell you how I ended up there. My very first job was in the Air Force. I was enlisted in the Air Force in an operational position as an electronics technician; I spent a lot of time shocking myself. It was not the most fun thing in the world, but I was 22, so it was hard not to have fun. What I realized is that I had this exposure to an operational world and was able to gain some leadership experience early on through that as well.

I used the GI Bill to go to school -- the University of Illinois -- [where] I finished an undergraduate degree in math. I was very convinced I wanted to go become a professional mathematician, a professor. I had some great professors there that I was working with and was on the theoretical math track: real analysis, topology, etc. And that was great until the summer before I graduated: I had this wonderful internship in an astronomy lab, where we were studying a star in the last phase of its life, and it was going to have no earthly application whatsoever, and I was just bored and realized I didn't want to be in academia. As many people do who are in quant fields and faced with such an existential crisis, I decided I was going to go become a software developer.
And what being a software developer mainly helped me figure out was that I didn't want to be a software developer, so I went to MIT to study systems engineering and management and really focused a lot of my effort on operations research while I was there. I had a colleague in the class who was at Boeing and was looking to start up a data science group, so he suggested my name, and that's how I got started working at Boeing in manufacturing quality and standing up an advanced analytics and data science group there. I worked there for a couple of years and then, like many people who go and try to operate in the real world, became a little disillusioned by the real world and decided to retreat into the world of finance, where I found Nasdaq. I worked as a data scientist here for a few years before moving into a management position. I think that's the story in a nutshell.

SHERVIN KHODABANDEH: So Doug, from airplanes to financial markets, it seems like all of the examples you gave are ones where the stakes are quite high, right?

DOUG HAMILTON: Yes.

SHERVIN KHODABANDEH: I mean, the cost of being wrong or an error or a failure -- maybe not a catastrophic failure, but even that, I mean -- any kind of error is quite high. So how do you manage that in the projects and in the formulation of the projects?

DOUG HAMILTON: I'm really glad you asked that, because this is my opportunity to talk smack about academic AI for a little while, so I'm going to start off doing that.

SAM RANSBOTHAM: Be careful. There's a professor here, so --

SHERVIN KHODABANDEH: Keep going. Sam would love that. Keep going.

DOUG HAMILTON: Really, I think it all starts with being more concerned about your error rather than your accuracy.
One of the things I've been really disappointed about in academic AI over the last couple of years -- really, it's related to this AI ethics talk that we have these days -- is that people were shocked to find out that when you build a model to, let's say, classify some things, and you look at some minority cohort within the data, the model doesn't classify that cohort all that well. And it's like, "Yeah" -- because that's oftentimes, if you're not careful about it, what models learn. And you're absolutely right; the stakes here are quite high, so what we want to be very conscious of is not just trying to get the high score -- which, when I read a lot of papers, it seems like we're in high-score land rather than in utility land. Even when I talk to many entry-level candidates, a lot of them talk about trying to get the high score by juicing the data rather than being really careful about how they think about the modeling process -- so they're very focused on the score: "What's the accuracy? What's the accuracy? How do we get the accuracy higher? Let's get rid of the outliers; that'll make the accuracy higher." Well, it turns out the outliers are the only thing that matters.

So, what we are very concerned about, of course, is making sure our accuracy is very high, making sure our [R-squared] scores, whatever, are very high; making sure that the metrics that are associated with business value are incredibly high. However, in order to make sure we're hedging our risks, what is as important, if not more important, is being keenly aware of the distribution of the error associated with your model.
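[Editor's note: A hypothetical sketch of the habit Doug describes here and elaborates on next: report error for every cohort, not just one headline number, and know the range of inputs the model was actually calibrated on so you can warn users when they step outside it. The column names, thresholds, and data frame are invented for illustration.]

import pandas as pd

def error_by_group(eval_df, group_col, y_true="y_true", y_pred="y_pred"):
    # Mean absolute error and sample count for each cohort in the evaluation set.
    # A model can look great overall while one group carries most of the error.
    abs_err = (eval_df[y_pred] - eval_df[y_true]).abs()
    return (
        eval_df.assign(abs_error=abs_err)
               .groupby(group_col)["abs_error"]
               .agg(["mean", "count"])
               .sort_values("mean", ascending=False)
    )

def outside_calibrated_range(x, lo, hi):
    # True when an input falls outside the interval the model was validated on,
    # i.e., when the user deserves the "you're in the Wild West now" warning.
    return (x < lo) | (x > hi)

# Usage sketch:
# eval_df = pd.DataFrame({"sector": [...], "y_true": [...], "y_pred": [...]})
# print(error_by_group(eval_df, "sector"))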
DOUG HAMILTON: No matter what project we're working on -- whether it's in our index space, whether it's in our corporate services space, whether it's in productivity and automation, or if it's in new capabilities -- we want to make sure that our error is distributed very uniformly, or at least reasonably uniformly, across all the constituent groups that we might be unleashing this model on -- making sure that if there are areas where it doesn't perform well, we have a good understanding of the calibrated interval of our models and systems, so that when we're outside of that calibrated interval, frankly, at the very least, we can give somebody a warning to let them know that they're in the Wild West now and they should do this at their own risk. And maybe it's a little caveat emptor at that point, but at least you know.

Really, I think those are the two most important things to help manage those risks: being eminently concerned about the distribution of your error, and being really, really well aware of where your model works and where it doesn't. There are a number of other things that everybody does these days around [personally identifiable information] protection and making sure that there's a robust review process involved. More recently, we've been able to make sure that every single project we're working on has at least one other person on it, so that two people have to agree that this is the best path forward and that these are the right numbers that are coming out.

SHERVIN KHODABANDEH: So you gave a very good series of examples -- algorithmically, technically, and mindset-wise -- of some of the steps that folks need to take to manage and understand the errors and be ahead of them rather than being surprised by them. I mean, on one hand ... you have to have an eye toward the riskiness of it and how that could be managed. And on the other hand, you talked about being the center of excellence and the place within Nasdaq where the state of the art in this space is being defined.
How do you balance the need to watch out for all those pitfalls and errors, and that conservatism, with pushing the art forward? In terms of a managerial orientation, how do you do that?

DOUG HAMILTON: I think it's preaching that conservatism internally to your own team. When I first started, I had this great manager at Boeing. On the one hand, when she was reviewing our work, she was always very, very critical of what we were doing -- very careful about making sure we were being very careful and cautious. And then, as soon as we went to a business partner or a client: "Oh, this is the greatest thing ever. You're not going to believe it." And I think that's a very important part of this; those two angles of internal conservatism and external optimism are really very necessary to making sure that you don't just build high-performing, risk-averse AI systems, but also that you see rapid and robust maturation and adoption of the technology.

SAM RANSBOTHAM: Well, it ties back to your point about understanding the error distribution. You can't really strike that balance unless you do understand that error distribution well. Shervin and I have been talking recently -- it's come up a few times; he'll remember better than I do -- about this whole idea of noninferiority: that the goal of perfection is just unattainable, and if we set that out for any of these AI systems, then we're never going to adopt any of them. And the question is, like you say, it's a balancing act of "How much off of that perfection do we accept?" We certainly want improvements over humans, but we only need those improvements eventually. It doesn't have to be improvement right out of the gate, if you think that there's some potential for that.

SHERVIN KHODABANDEH: Let me use that as a segue to ask my next question. You've been in the AI business for some time. How do you think the state of the art is evolving, or has evolved, or is going to evolve in the years to come?
Obviously, technically it has been [evolving], and it will. But I'm more interested in [the] nontechnical aspects of that evolution. How do you see that?

DOUG HAMILTON: When I first got started, the big papers that came out were probably [on] the [generative adversarial network] and [the residual neural network]; both actually came out at about the same time. [In a] lot of ways, to me that represented the pinnacle of technical achievement in AI. Obviously, there's been more since then, obviously we've done a lot, obviously a lot of things have been solved. But at that point, we figured a lot of things out. And it opened the door to a lot of really good AI and machine learning solutions.

When I look at the way the technology has progressed since then, I see it as a maturing ecosystem that enables business use. Whether it's things like transfer learning -- making sure that when we solve one problem, we can solve another problem, which is incredibly important for achieving economies of scale with AI groups -- or things like AutoML, which help make everybody into at least this kind of citizen data scientist, where software engineers and analysts can do enough machine learning work to prove something out before they bring it to a team like ours or their software engineering team. I think these are the sorts of maturing technologies that we've seen come along that make machine learning much more usable in business cases.

Beyond that, historically what we've seen is that the traditional business cases for artificial intelligence have all been scale plays. These maturing technologies are allowing us to mature models, reuse them, and achieve economies of scale around the AI development cycle. As these get better and better, we're going to see more use cases open up for "Computers are good at it." And we've certainly seen it when we look at how hedge funds and high-frequency traders operate.
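[Editor's note: A generic, hypothetical sketch of the transfer-learning reuse Doug mentions: take a network trained on one problem, freeze its body, and retrain only a small head for the new one. The example uses a torchvision ResNet (fitting, given the residual-network paper he cites) with an invented five-class task; nothing here reflects Nasdaq's tooling.]

import torch
import torch.nn as nn
from torchvision.models import resnet18

# Start from a network that already solved problem one (ImageNet classification).
backbone = resnet18(weights="IMAGENET1K_V1")  # torchvision >= 0.13 weights API

# Freeze everything the first problem taught the network.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final layer with a fresh, trainable head for problem two.
backbone.fc = nn.Linear(backbone.fc.in_features, 5)

# Only the new head's parameters are optimized; training then proceeds as usual
# on the (typically much smaller) dataset for the second problem.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)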
DOUG HAMILTON: They're all using machine learning all over the place, because it's better for research purposes than ad hoc trial and error and ad hoc rules. By the same token, we've seen it in game-playing machines for years. So the idea that we'll have more and more of these situations where [the] computer is just better at it, I think we're going to see that more and more. Certainly, this is, I think, the thesis behind self-driving cars, right? Driving is the thing that people do worst that we do most often, and, provided that you can work out the edge cases, which is really hard, there's no reason why computers shouldn't be better at driving than people are.

SHERVIN KHODABANDEH: I was going to ask, what about those problems where computers alone or humans alone can't be as good, but the two of them together are far better than each of them on their own?

DOUG HAMILTON: When there is a computer-aided process or an AI-aided process, we can usually break that down into two things -- at least two processes. One is a process that the person is good at doing, and the other is a thing that the computer is good at doing. If you think about computer-aided design, there are many things that a computer is good at in computer-aided design that it is helping the person with. What it's not good at is coming up with creative solutions and creative ways to draw out the part they're trying to design, but it's very good at things like keeping track of which pixels are populated and which aren't, the 3D spatial geometry of it, etc. That's what it's good at -- and then, the actual creative part is what the person's good at.

Maybe a person is not so good at generating new and novel designs for, let's say, furniture. Maybe you're Ikea and you want to design new furniture. So maybe people aren't particularly good at generating these things out of the blue, but they're pretty good at looking at it and saying, "Well, hang on a second.
If you design the chair that way, it's got a giant spike in the back, and it's going to be very uncomfortable, so let's get rid of that, and then let's try again." So there's this process of generating and fixing, or generating and editing, that we can break it down into. And the computer might be better at generating, and the person better at editing for these real-world or latent requirements that are very difficult to encode.

SAM RANSBOTHAM: All right. Well, thanks for taking the time to talk with us and to help us learn about all that you, and in particular Nasdaq, are doing. We've heard about, for example, project selection, balancing risk, and how you pick those projects. We learned about how important understanding error is and all the different possible cases that you see for artificial intelligence. It's a pretty healthy bit to cover in just one session. We appreciate your input on all those topics.

DOUG HAMILTON: Thanks, Sam. Thanks, Shervin. It's been a pleasure speaking with you.

SAM RANSBOTHAM: Please join us next time. We'll talk with Paula Goldman, chief ethical and humane use officer at Salesforce.

ALLISON RYDER: Thanks for listening to Me, Myself, and AI. We believe, like you, that the conversation about AI implementation doesn't start and stop with this podcast. That's why we've created a group on LinkedIn specifically for leaders like you. It's called AI for Leaders, and if you join us, you can chat with show creators and hosts, ask your own questions, share insights, and gain access to valuable resources about AI implementation from MIT SMR and BCG. You can access it by visiting mitsmr.com/AIforLeaders. We'll put that link in the show notes, and we hope to see you there.