WEBVTT

1
00:00:00.000 --> 00:00:04.960
The explosion in the number of parameters in large language models,

2
00:00:04.960 --> 00:00:06.560
does this really matter?

3
00:00:06.560 --> 00:00:08.280
Does it make a difference?

4
00:00:08.280 --> 00:00:10.400
Should users care?

5
00:00:15.440 --> 00:00:19.480
Well, today's wisdom says: the more parameters, the better.

6
00:00:19.480 --> 00:00:21.960
But, of course, more parameters

7
00:00:21.960 --> 00:00:25.000
come at a cost, because in order to train models

8
00:00:25.000 --> 00:00:27.360
with a lot of parameters, you need a lot of data.

9
00:00:27.360 --> 00:00:29.960
So the advances we talk about today came from OpenAI

10
00:00:29.960 --> 00:00:33.640
training very, very large models on very large datasets:

11
00:00:33.640 --> 00:00:38.080
GPT-3 and GPT-4, with GPT-3 having 175 billion parameters,

12
00:00:38.080 --> 00:00:41.080
and we haven't seen such great performance from smaller models.

13
00:00:41.080 --> 00:00:42.560
But at the same time,

14
00:00:42.560 --> 00:00:45.480
what we have observed recently is that researchers are capable

15
00:00:45.480 --> 00:00:49.040
of building and training much smaller models

16
00:00:49.040 --> 00:00:54.040
that provide maybe 80% of the quality of those large models,

17
00:00:54.840 --> 00:00:59.840
and models of a few billion parameters are actually pretty good.

18
00:01:00.040 --> 00:01:01.880
There are more interesting developments happening.

19
00:01:01.880 --> 00:01:03.600
There was a paper published recently,

20
00:01:03.600 --> 00:01:08.040
"Textbooks Are All You Need", where researchers trained a model

21
00:01:08.040 --> 00:01:12.560
with a few billion parameters, but trained it on very high-quality data.

22
00:01:12.560 --> 00:01:14.720
And those models also exhibit

23
00:01:14.720 --> 00:01:18.600
very good properties, comparable to large models.

24
00:01:18.600 --> 00:01:20.560
So there is a big trade-off:

25
00:01:20.560 --> 00:01:23.960
the larger the model you want to build, the more parameters you need to train,

26
00:01:23.960 --> 00:01:27.000
the more data you need, and the more energy it takes.

27
00:01:27.000 --> 00:01:32.000
And so the question is: do you really need such a big model for your problem?

28
00:01:32.000 --> 00:01:35.160
What is the impact on business decision-makers?

29
00:01:35.160 --> 00:01:38.320
Should they choose small or medium models?

30
00:01:38.320 --> 00:01:39.800
This is a very practical question.

31
00:01:39.800 --> 00:01:44.800
Businesses should look into why they really need those models.

32
00:01:45.560 --> 00:01:47.840
Do you really need your model to speak 200 languages,

33
00:01:47.840 --> 00:01:50.600
or do you want your model to be a chatbot

34
00:01:50.600 --> 00:01:55.000
that can work as a customer service representative?

35
00:01:55.000 --> 00:01:59.320
Or do you want your model to help maintenance workers?

36
00:01:59.320 --> 00:02:02.320
Depending on that, you might need a different type of model:

37
00:02:02.320 --> 00:02:04.720
a universal model,

38
00:02:04.720 --> 00:02:06.680
or a model adjusted to your business.

39
00:02:06.680 --> 00:02:10.400
Does model size have an impact on the cost for users?

40
00:02:10.400 --> 00:02:13.400
Well, using these large language models can be quite costly.

41
00:02:13.400 --> 00:02:15.560
Training them costs a lot.
42
00:02:15.560 --> 00:02:18.680
Plus, if you're using a model trained

43
00:02:18.680 --> 00:02:22.800
by a model provider, they will still charge you for those costs.

44
00:02:22.800 --> 00:02:24.000
What you shouldn't forget

45
00:02:24.000 --> 00:02:28.280
is the cost of inference, that is, calling the model to make predictions.

46
00:02:28.280 --> 00:02:30.560
Every time you want the model to make a prediction,

47
00:02:30.560 --> 00:02:33.720
it goes through all those billions of parameters, doing all

48
00:02:33.720 --> 00:02:38.720
that multiplication, which takes a lot of energy and, of course, costs you.

49
00:02:38.760 --> 00:02:42.040
And so this cost adds up very, very quickly.

50
00:02:42.040 --> 00:02:47.040
So as a result, you need to balance your business needs and the size

51
00:02:47.480 --> 00:02:50.680
of the model for it to make economic sense overall.
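
NOTE
To give a sense of how inference cost adds up: a common rule of thumb is that a decoder-only transformer spends roughly 2 x (number of parameters) floating-point operations per generated token, so each token from a 175-billion-parameter model is on the order of 10^11 operations. The Python sketch below is only an illustration of that aggregation; the traffic volume and the per-million-token prices are assumed figures, not quotes from any provider.

NOTE
# Back-of-envelope inference cost: all numbers below are assumptions.
def flops_per_token(n_params):
    # Rule of thumb: ~2 FLOPs per parameter per generated token.
    return 2.0 * n_params
def monthly_cost(requests_per_day, tokens_per_request, price_per_million_tokens):
    # Aggregate an assumed per-token price over 30 days of traffic.
    tokens = requests_per_day * tokens_per_request * 30
    return tokens / 1_000_000 * price_per_million_tokens
# Hypothetical workload: 50,000 requests per day, 500 tokens per request.
large = monthly_cost(50_000, 500, price_per_million_tokens=10.0)  # assumed price
small = monthly_cost(50_000, 500, price_per_million_tokens=0.50)  # assumed price
print(f"{flops_per_token(175e9):.1e} FLOPs per token at 175B parameters")
print(f"Large model: ${large:,.0f} per month; small model: ${small:,.0f} per month")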