SAM RANSBOTHAM: In AI projects, perfection is impossible, so when inevitable errors happen, how do you manage them? Find out how Nasdaq does it when we talk with Douglas Hamilton, the company's head of AI research.

Welcome to Me, Myself, and AI, a podcast on artificial intelligence in business. Each episode, we introduce you to someone innovating with AI. I'm Sam Ransbotham, professor of information systems at Boston College. I'm also the guest editor for the AI and Business Strategy Big Ideas program at MIT Sloan Management Review.

SHERVIN KHODABANDEH: And I'm Shervin Khodabandeh, senior partner with BCG, and I colead BCG's AI practice in North America. Together, MIT SMR and BCG have been researching AI for five years, interviewing hundreds of practitioners and surveying thousands of companies on what it takes to build and to deploy and scale AI capabilities across the organization and really transform the way organizations operate.

SAM RANSBOTHAM: Today we're talking with Douglas Hamilton. He's the associate vice president and head of AI research at Nasdaq. Doug, thanks for joining us. Welcome.

DOUG HAMILTON: Thanks, Sam and Shervin. Great to be here today.

SAM RANSBOTHAM: So our podcast is Me, Myself, and AI, so let's start with: Can you tell us a little bit about your current role at Nasdaq?

DOUG HAMILTON: In my current role, I head up AI research for Nasdaq at our Machine Intelligence Lab. The role itself here is a little bit unique, since many, many roles within Global Technology, which is our engineering organization, are very much business-unit-aligned, so they'll work with one of our four core business units, whereas this role really services every single area of the business.
That means that we're servicing market technology, which is the area of Nasdaq's business that produces software that powers 2,300 different companies in 50 different countries, powers 47 different markets around the world, as well as bank and broker operations, compliance, and [regulatory] tech for making sure that they are compliant with their local authorities. We service, of course, our investor intelligence line of business, which is how we get data from the market into the hands of the buy and sell side, so they can build products and trading strategies on top of those. We service, of course, the big one that people think about mostly, which is market services, which is the markets themselves; that's our core equities markets and a handful of options and derivatives markets as well. And then finally, corporate services -- that actually deals with the companies that are listed on our markets and their investor relationship departments.

So really, we get to work across all of these different lines of business, which means that we get to work on a huge number of very interesting and very diverse problems in AI. Really, the goal of the group is to leverage all aspects of cutting-edge artificial intelligence, machine learning, and statistical computing in order to find value in these lines of business, whether it's through productivity plays, differentiating capabilities, or just continued incremental innovation that keeps Nasdaq's products leading edge and keeps our markets at the forefront of the industry. In this role, I have a team of data scientists that are doing the work: writing the code, building the models, munging the data, wrapping it all up in optimizers, and creating automated decision systems. So my role, really, I think, day to day, is working with our business partners to find opportunities for AI.

SHERVIN KHODABANDEH: Doug, maybe to bring this to life a bit, can you put this in the context of a use case?
DOUG HAMILTON: I'll talk about one of our favorite use cases, which is a minimum volatility index that we run. The minimum volatility index is an AI-powered index that we partnered with an external [exchange-traded funds] provider, Victory Capital, on. The goal of this index is to basically mimic Nasdaq's version of the Russell 2000 -- it's a large- and mid-cap index -- and then essentially play with the weights of that index, which are normally market-cap-weighted, in such a way that it minimizes the volatility exposure of that portfolio.

What made that project really difficult is that minimizing volatility is actually a fairly easy and straightforward problem if you want to treat it linearly. That is, you look at a bunch of stocks, you look at their historical volatility performance, you pick a bunch of low-volatility shares, you slap them together, boom -- you get a pretty low-volatility portfolio. That's fairly straightforward to solve using linear methods, numerical programming, etc., and you can wrap linear constraints around it to make sure that you're not deviating too much from the underlying portfolio. You're still capturing the general themes of it. You're not overexposing yourself to different industries. That's actually fairly easy to do.

Where this becomes really interesting, however, is: Wouldn't it be cool if you found two stocks that worked against each other? They could each be quite volatile, but the portfolio, when they're mixed together, actually becomes less volatile than even two low-volatility shares, because they're constantly working against each other. That is, they have this nice contravarying action that cancels each other out, so you can capture the median growth without the volatility exposure. That'd be great. Now, that becomes a nonlinear problem. And it becomes a very noisy, almost nonconvex problem at that point too.
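[Editor's note: A minimal numerical sketch of the effect Doug describes, using made-up volatilities and correlations rather than anything from the actual index. Portfolio variance is the quadratic form w'Σw, so the covariance cross-term matters: two noisy but strongly anticorrelated stocks can combine into a calmer portfolio than two independent low-volatility stocks.]

import numpy as np

def portfolio_vol(weights, cov):
    # Portfolio volatility: sqrt(w' * Cov * w).
    w = np.asarray(weights)
    return float(np.sqrt(w @ cov @ w))

# Two low-volatility stocks (15% each) that are uncorrelated.
low_vol = np.array([0.15, 0.15])
cov_low = np.diag(low_vol ** 2)

# Two high-volatility stocks (40% each) that move strongly against each other.
high_vol = np.array([0.40, 0.40])
rho = -0.9
cov_high = np.array([
    [high_vol[0] ** 2, rho * high_vol[0] * high_vol[1]],
    [rho * high_vol[0] * high_vol[1], high_vol[1] ** 2],
])

w = np.array([0.5, 0.5])  # equal weights
print(portfolio_vol(w, cov_low))   # ~0.106
print(portfolio_vol(w, cov_high))  # ~0.089, calmer despite much noisier stocks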
DOUG HAMILTON: But you still have all these constraints you need to wrap around it. The tools for that kind of problem are simulated annealing, genetic algorithms, [Markov chain Monte Carlo]-style optimizers. And those behave pretty well when we have soft constraints that generally guide the solutions back into the feasibility zone. The problem they have is when you give them hard constraints. They don't like hard constraints; they break a lot. So what we had to do was rearchitect a lot of these algorithms to be able to handle these hard constraints as well.

SHERVIN KHODABANDEH: What would be a hard constraint?

DOUG HAMILTON: I'll give you an example of a soft constraint and a hard constraint. A soft constraint would be: It would be really nice if, when you go to rebalance a portfolio, its total turnover was less than 30%, let's say, because it gets really expensive to rebalance it otherwise. A hard constraint might be that no holding can vary by more than 2% between the optimized portfolio and the parent portfolio. So if the parent portfolio is 10% Microsoft, let's say, then the optimized portfolio has to be between 8% and 12%, right? That's an example of a hard constraint. If it's 7.9%, we're in violation of the governing documents of the index, and everybody gets into a lot of trouble.

SHERVIN KHODABANDEH: Got it. That's a good one. OK. So you're saying hard and soft constraints together form a tougher problem.

DOUG HAMILTON: A considerably tougher problem, because while these algorithms deal well with nonlinearity, these [Markov chain Monte Carlo]-style algos in particular do not deal well with hard constraints, where they must meet these criteria. And when you have -- I think in that one, we had 4,000 constraints, something like that -- almost nothing meets them. So if you take a hard culling approach, then you're left with no viable solutions to gain density around.
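[Editor's note: A hypothetical sketch, not Nasdaq's implementation, of one common way to rearchitect an annealing-style optimizer for this mix of constraints: enforce the hard per-holding band (the +/-2% example above) with a repair step applied to every proposal, and push the soft turnover target (the 30% example) into the objective as a penalty. The names, numbers, and cooling schedule are all invented. The point is only the division of labor: hard constraints are satisfied by construction rather than by rejection, which is what fails when thousands of constraints leave almost nothing feasible to sample from.]

import numpy as np

rng = np.random.default_rng(0)

def repair(weights, parent, band=0.02):
    # Hard constraint: clip each weight into [parent - band, parent + band],
    # then renormalize so the weights still sum to 1. (A production version
    # would re-project iteratively so renormalizing cannot nudge a weight back
    # outside its band; this is only a sketch.)
    w = np.clip(weights, parent - band, parent + band)
    return w / w.sum()

def objective(weights, cov, parent, turnover_cap=0.30, penalty=10.0):
    # Portfolio volatility plus a soft penalty once turnover exceeds the cap.
    vol = np.sqrt(weights @ cov @ weights)
    turnover = 0.5 * np.abs(weights - parent).sum()
    return vol + penalty * max(0.0, turnover - turnover_cap)

def anneal(parent, cov, steps=20_000, temp=0.05):
    current = repair(parent.copy(), parent)
    current_score = objective(current, cov, parent)
    best, best_score = current, current_score
    for i in range(steps):
        proposal = repair(current + rng.normal(0.0, 0.005, size=current.size), parent)
        score = objective(proposal, cov, parent)
        t = temp * (1.0 - i / steps) + 1e-9  # simple linear cooling schedule
        # Metropolis-style acceptance: always take improvements, and sometimes
        # take worse moves early on so the search can escape local minima.
        if score < current_score or rng.random() < np.exp((current_score - score) / t):
            current, current_score = proposal, score
            if score < best_score:
                best, best_score = proposal, score
    return best

# Usage sketch: best_weights = anneal(parent_weights, covariance_matrix)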
DOUG HAMILTON: So we had to spend a lot of time working with the team to figure out what the appropriate solution architecture should be -- algorithmically, etc. -- to overcome that challenge: how we set up those experiments, what sort of experiments we need to set up, how we test it, and, of course, how we actually communicate to the client that the solution is better than what they currently have.

SHERVIN KHODABANDEH: Doug, this example that you talked about on volatility -- is [it] one of hundreds of use cases that your team does, or one of tens of use cases? [I'm] just trying to get a sense of the scale of the operations here.

DOUG HAMILTON: Within Nasdaq, what we represent is the center of excellence for artificial intelligence. So this is one of, I'd say, dozens of use cases that are either live or that we're exploring at any point in time. On top of that, obviously, we have robust relationships across the business with third-party vendors that help us with all sorts of internal use cases -- where maybe it's not something we're looking to sell to the outside world, or something where we can leverage existing technology in a better way than building it in-house -- and those are certainly part of our AI story as well.

SAM RANSBOTHAM: I was thinking about your example of finding the matching [stocks]. We think about digital twins; it's almost a digital un-twin stock that you're trying to match with. That has to change, though, at some point. How often are you revisiting these? How are you keeping them up to date so that you don't end up with things suddenly moving together when you thought they were moving the opposite [way]?

DOUG HAMILTON: The nice thing about the world of indexing is that it's almost statutory how you do that. When we look at other models that we have in production, we usually do this in one of two ways.
We usually do it either in an ad hoc way, through telemetry -- looking at model performance and looking for some sort of persistent degradation in performance -- or, of course, through some sort of regularly scheduled maintenance for many of our products. For indexes, we're basically told, "Here's how often you rebalance, and here's how often you're allowed to make the change." So in this case, we rebalance twice a year, so every six months is when we go back and take a look.

SAM RANSBOTHAM: Let's switch a little bit to, say, how did you end up doing this? What in your background led you to be able to do all those things?

DOUG HAMILTON: I'm fortunate in that I got my first data science job in 2015. I'll tell you how I ended up there. My very first job was in the Air Force. I was enlisted in the Air Force in an operational position as an electronics technician; I spent a lot of time shocking myself. It was not the most fun thing in the world, but I was 22, so it was hard not to have fun. What I realized is that I had this exposure to an operational world and was able to gain some leadership experience early on through that as well.

I used the GI Bill to go to school -- the University of Illinois -- [where] I finished an undergraduate degree in math. I was very convinced I wanted to go become a professional mathematician, a professor. I had some great professors there that I was working with and was on the theoretical math track: real analysis, topology, etc. And that was great until the summer before I graduated: I had this wonderful internship in an astronomy lab, where we were studying a star in the last phase of its life, and it was going to have no earthly application whatsoever, and I was just bored and realized I didn't want to be in academia. As many people do who are in quant fields and faced with such an existential crisis, I decided I was going to go become a software developer.
And what being a software developer mainly helped me figure out was that I didn't want to be a software developer, so I went to MIT to study systems engineering and management and really focused a lot of my effort on operations research while I was there. I had a colleague in the class who was at Boeing and was looking to start up a data science group, so he suggested my name, and that's how I got started working at Boeing in manufacturing quality and standing up an advanced analytics and data science group there. I worked there for a couple of years and then, like many people who go and try to operate in the real world, became a little disillusioned by the real world and decided to retreat into the world of finance, where I found Nasdaq. I worked as a data scientist here for a few years before moving into a management position. I think that's the story in a nutshell.

SHERVIN KHODABANDEH: So Doug, from airplanes to financial markets, it seems like all of the examples you gave are ones where the stakes are quite high, right?

DOUG HAMILTON: Yes.

SHERVIN KHODABANDEH: I mean, the cost of being wrong or an error or a failure -- maybe not a catastrophic failure, but even that, I mean -- any kind of error is quite high. So how do you manage that in the projects and in the formulation of the projects?

DOUG HAMILTON: I'm really glad you asked that, because this is my opportunity to talk smack about academic AI for a little while, so I'm going to start off doing that.

SAM RANSBOTHAM: Be careful. There's a professor here, so --

SHERVIN KHODABANDEH: Keep going. Sam would love that. Keep going.

DOUG HAMILTON: Really, I think it all starts with being more concerned about your error rather than your accuracy.
One of the things I've been really disappointed about in academic AI over the last couple of years -- really, it's related to this AI ethics talk that we have these days -- is that people were shocked to find out that when you build a model to, let's say, classify some things, and you look at some minority cohort within the data, the model doesn't classify that cohort all that well. And it's like, "Yeah" -- because that's oftentimes, if you're not careful about it, what models learn. And you're absolutely right; the stakes here are quite high, so what we want to be very conscious of is not just trying to get the high score -- which, when I read a lot of papers, it seems like we're in high-score land rather than in utility land. Even when I talk to many entry-level candidates, a lot of them talk about trying to get the high score by juicing the data rather than being really careful about how they think about the modeling process -- so they're very focused on the score: "What's the accuracy? What's the accuracy? How do we get the accuracy higher? Let's get rid of the outliers; that'll make the accuracy higher." Well, it turns out the outliers are the only thing that matters.

So, what we are very concerned about, of course, is making sure our accuracy is very high, making sure our [R-squared] scores, whatever, are very high; making sure that the metrics that are associated with business value are incredibly high. However, in order to make sure we're hedging our risks, what is as important, if not more important, is being keenly aware of the distribution of the error associated with your model.
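[Editor's note: A hypothetical sketch of the habit Doug describes here and elaborates on next: report error for every cohort, not just one headline number, and know the range of inputs the model was actually calibrated on so you can warn users when they step outside it. The column names, thresholds, and data frame are invented for illustration.]

import pandas as pd

def error_by_group(eval_df, group_col, y_true="y_true", y_pred="y_pred"):
    # Mean absolute error and sample count for each cohort in the evaluation set.
    # A model can look great overall while one group carries most of the error.
    abs_err = (eval_df[y_pred] - eval_df[y_true]).abs()
    return (
        eval_df.assign(abs_error=abs_err)
               .groupby(group_col)["abs_error"]
               .agg(["mean", "count"])
               .sort_values("mean", ascending=False)
    )

def outside_calibrated_range(x, lo, hi):
    # True when an input falls outside the interval the model was validated on,
    # i.e., when the user deserves the "you're in the Wild West now" warning.
    return (x < lo) | (x > hi)

# Usage sketch:
# eval_df = pd.DataFrame({"sector": [...], "y_true": [...], "y_pred": [...]})
# print(error_by_group(eval_df, "sector"))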
DOUG HAMILTON: No matter what project we're working on -- whether it's in our index space, whether it's in our corporate services space, whether it's in productivity and automation, or if it's in new capabilities -- we want to make sure that our error is distributed very uniformly, or at least reasonably uniformly, across all the constituent groups that we might be unleashing this model on -- making sure that if there are areas where it doesn't perform well, we have a good understanding of the calibrated interval of our models and systems, so that when we're outside of that calibrated interval, frankly, at the very least, we can give somebody a warning to let them know that they're in the Wild West now and they should do this at their own risk. And maybe it's a little caveat emptor at that point, but at least you know.

Really, I think those are the two most important things to help manage those risks: being eminently concerned about the distribution of your error, and being really, really well aware of where your model works and where it doesn't. There are a number of other things that everybody does these days around [personally identifiable information] protection and making sure that there's a robust review process involved. More recently, we've been able to make sure that every single project we're working on has at least one other person on it, so that two people have to agree that this is the best path forward and that these are the right numbers that are coming out.

SHERVIN KHODABANDEH: So you gave a very good series of examples -- algorithmically, technically, and mindset-wise -- of some of the steps that folks need to take to manage and understand the errors and be ahead of them rather than being surprised by them. I mean, on one hand ... you have to have an eye toward the riskiness of it and how that could be managed. And on the other hand, you talked about being the center of excellence and the place within Nasdaq where the state of the art in this space is being defined.
How do you balance the need to watch out for all those pitfalls and errors, and that conservatism, with pushing the art forward? In terms of a managerial orientation, how do you do that?

DOUG HAMILTON: I think it's preaching that conservatism internally to your own team. When I first started, I had this great manager at Boeing. On the one hand, when she was reviewing our work, she was always very, very critical of what we were doing -- very careful about making sure we were being very careful and cautious. And then, as soon as we went to a business partner or a client: "Oh, this is the greatest thing ever. You're not going to believe it." And I think that's a very important part of this; those two angles of internal conservatism and external optimism are really very necessary to making sure that you don't just build high-performing, risk-averse AI systems, but also that you see rapid and robust maturation and adoption of the technology.

SAM RANSBOTHAM: Well, it ties back to your point about understanding the error distribution. You can't really strike that balance unless you do understand that error distribution well. Shervin and I have been talking recently -- it's come up a few times; he'll remember better than I do -- about this whole idea of noninferiority: that the goal of perfection is just unattainable, and if we set that out for any of these AI systems, then we're never going to adopt any of them. And the question is, like you say, it's a balancing act of "How much off of that perfection do we accept?" We certainly want improvements over humans, but we only need those improvements eventually. It doesn't have to be improvement right out of the gate, if you think that there's some potential for that.

SHERVIN KHODABANDEH: Let me use that as a segue to ask my next question. You've been in the AI business for some time. How do you think the state of the art is evolving, or has evolved, or is going to evolve in the years to come?
Obviously, technically it has been [evolving], and it will. But I'm more interested in [the] nontechnical aspects of that evolution. How do you see that?

DOUG HAMILTON: When I first got started, the big papers that came out were probably [on] the [generative adversarial network] and [the residual neural network]; both actually came out at about the same time. [In a] lot of ways, to me that represented the pinnacle of technical achievement in AI. Obviously, there's been more since then, obviously we've done a lot, obviously a lot of things have been solved. But at that point, we figured a lot of things out. And it opened the door to a lot of really good AI and machine learning solutions.

When I look at the way the technology has progressed since then, I see it as a maturing ecosystem that enables business use. Whether it's things like transfer learning -- making sure that when we solve one problem, we can solve another problem, which is incredibly important for achieving economies of scale with AI groups -- or things like AutoML, which help make everybody into at least this kind of citizen data scientist, where software engineers and analysts can do enough machine learning work to prove something out before they bring it to a team like ours or their software engineering team. I think these are the sorts of maturing technologies that we've seen come along that make machine learning much more usable in business cases.

Beyond that, historically what we've seen is that the traditional business cases for artificial intelligence have all been scale plays. These maturing technologies are allowing us to mature models, reuse them, and achieve economies of scale around the AI development cycle. As these get better and better, we're going to see more use cases open up for "Computers are good at it." And we've certainly seen it when we look at how hedge funds and high-frequency traders operate.
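[Editor's note: A generic, hypothetical sketch of the transfer-learning reuse Doug mentions: take a network trained on one problem, freeze its body, and retrain only a small head for the new one. The example uses a torchvision ResNet (fitting, given the residual-network paper he cites) with an invented five-class task; nothing here reflects Nasdaq's tooling.]

import torch
import torch.nn as nn
from torchvision.models import resnet18

# Start from a network that already solved problem one (ImageNet classification).
backbone = resnet18(weights="IMAGENET1K_V1")  # torchvision >= 0.13 weights API

# Freeze everything the first problem taught the network.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final layer with a fresh, trainable head for problem two.
backbone.fc = nn.Linear(backbone.fc.in_features, 5)

# Only the new head's parameters are optimized; training then proceeds as usual
# on the (typically much smaller) dataset for the second problem.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)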
DOUG HAMILTON: They're all using machine learning all over the place, because it's better for research purposes than ad hoc trial and error and ad hoc rules. By the same token, we've seen it in game-playing machines for years. So the idea that we'll have more and more of these situations where [the] computer is just better at it, I think we're going to see that more and more. Certainly, this is, I think, the thesis behind self-driving cars, right? Driving is the thing that people do worst that we do most often, and, provided that you can work out the edge cases, which is really hard, there's no reason why computers shouldn't be better at driving than people are.

SHERVIN KHODABANDEH: I was going to ask, what about those problems where computers alone or humans alone can't be as good, but the two of them together are far better than each of them on their own?

DOUG HAMILTON: When there is a computer-aided process or an AI-aided process, we can usually break that down into two things -- at least two processes. One is a process that the person is good at doing, and the other is a thing that the computer is good at doing. If you think about computer-aided design, there are many things that a computer is good at in computer-aided design that it is helping the person with. What it's not good at is coming up with creative solutions and creative ways to draw out the part they're trying to design, but it's very good at things like keeping track of which pixels are populated and which aren't, the 3D spatial geometry of it, etc. That's what it's good at -- and then, the actual creative part is what the person's good at.

Maybe a person is not so good at generating new and novel designs for, let's say, furniture. Maybe you're Ikea and you want to design new furniture. So maybe people aren't particularly good at generating these things out of the blue, but they're pretty good at looking at it and saying, "Well, hang on a second.
If you design the chair that way, it's got a giant spike in the back, and it's going to be very uncomfortable, so let's get rid of that, and then let's try again." So there's this process of generating and fixing, or generating and editing, that we can break it down into. And the computer might be better at generating, and the person better at editing for these real-world or latent requirements that are very difficult to encode.

SAM RANSBOTHAM: All right. Well, thanks for taking the time to talk with us and to help us learn about all that you, and in particular Nasdaq, are doing. We've heard about, for example, project selection, balancing risk, and how you pick those projects. We learned about how important understanding error is and all the different possible cases that you see for artificial intelligence. It's a pretty healthy bit to cover in just one session. We appreciate your input on all those topics.

DOUG HAMILTON: Thanks, Sam. Thanks, Shervin. It's been a pleasure speaking with you.

SAM RANSBOTHAM: Please join us next time. We'll talk with Paula Goldman, chief ethical and humane use officer at Salesforce.

ALLISON RYDER: Thanks for listening to Me, Myself, and AI. We believe, like you, that the conversation about AI implementation doesn't start and stop with this podcast. That's why we've created a group on LinkedIn specifically for leaders like you. It's called AI for Leaders, and if you join us, you can chat with show creators and hosts, ask your own questions, share insights, and gain access to valuable resources about AI implementation from MIT SMR and BCG. You can access it by visiting mitsmr.com/AIforLeaders. We'll put that link in the show notes, and we hope to see you there.