WEBVTT

1
00:00:00.000 --> 00:00:04.960
The explosion in the number of parameters in large language models,

2
00:00:04.960 --> 00:00:06.560
does this really matter?

3
00:00:06.560 --> 00:00:08.280
Does it make a difference?

4
00:00:08.280 --> 00:00:10.400
Should users care?

5
00:00:15.440 --> 00:00:19.480
Well, today's wisdom says: the more parameters, the better.

6
00:00:19.480 --> 00:00:21.960
But, of course, more parameters

7
00:00:21.960 --> 00:00:25.000
come at a cost, because in order to train models

8
00:00:25.000 --> 00:00:27.360
with a lot of parameters, you need a lot of data.

9
00:00:27.360 --> 00:00:29.960
So the advances we talk about today came from OpenAI

10
00:00:29.960 --> 00:00:33.640
training very, very large models on very large datasets:

11
00:00:33.640 --> 00:00:38.080
GPT-3 and GPT-4, with GPT-3 having 175 billion parameters,

12
00:00:38.080 --> 00:00:41.080
and we haven't seen such great performance from smaller models.

13
00:00:41.080 --> 00:00:42.560
But at the same time,

14
00:00:42.560 --> 00:00:45.480
what we have observed recently is that researchers are capable

15
00:00:45.480 --> 00:00:49.040
of building and training much smaller models

16
00:00:49.040 --> 00:00:54.040
that provide maybe 80% of the quality of those large models,

17
00:00:54.840 --> 00:00:59.840
and models of a few billion parameters are actually pretty good.

18
00:01:00.040 --> 00:01:01.880
There are more interesting developments happening.

19
00:01:01.880 --> 00:01:03.600
There was a paper published recently,

20
00:01:03.600 --> 00:01:08.040
"Textbooks Are All You Need", where researchers trained a model

21
00:01:08.040 --> 00:01:12.560
with a few billion parameters, but trained it on very high-quality data.

22
00:01:12.560 --> 00:01:14.720
And those models also exhibit

23
00:01:14.720 --> 00:01:18.600
very good properties, comparable to large models.

24
00:01:18.600 --> 00:01:20.560
So there is a big trade-off:

25
00:01:20.560 --> 00:01:23.960
the larger the model you want to build, the more parameters you need to train,

26
00:01:23.960 --> 00:01:27.000
the more data you need, and the more energy it takes.

27
00:01:27.000 --> 00:01:32.000
And so the question is: do you really need such a big model for your problem?

28
00:01:32.000 --> 00:01:35.160
What is the impact on business decision-makers?

29
00:01:35.160 --> 00:01:38.320
Should they choose small or medium models?

30
00:01:38.320 --> 00:01:39.800
This is a very practical question.

31
00:01:39.800 --> 00:01:44.800
Businesses should look into why they really need those models.

32
00:01:45.560 --> 00:01:47.840
Do you really need your model to speak 200 languages,

33
00:01:47.840 --> 00:01:50.600
or do you want your model to be a chatbot

34
00:01:50.600 --> 00:01:55.000
that can work as a customer service representative?

35
00:01:55.000 --> 00:01:59.320
Or do you want your model to help maintenance workers?

36
00:01:59.320 --> 00:02:02.320
Depending on that, you might need a different type of model:

37
00:02:02.320 --> 00:02:04.720
a universal model,

38
00:02:04.720 --> 00:02:06.680
or a model adjusted to your business.

39
00:02:06.680 --> 00:02:10.400
Does model size have an impact on the cost for users?

40
00:02:10.400 --> 00:02:13.400
Well, using these large language models can be quite costly.

41
00:02:13.400 --> 00:02:15.560
Training them costs a lot.
42
00:02:15.560 --> 00:02:18.680
Plus, if you're using a model trained

43
00:02:18.680 --> 00:02:22.800
by a model provider, they will still charge you for those costs.

44
00:02:22.800 --> 00:02:24.000
What you shouldn't forget

45
00:02:24.000 --> 00:02:28.280
is the cost of inference, that is, calling the model to make predictions.

46
00:02:28.280 --> 00:02:30.560
Every time you want the model to make a prediction,

47
00:02:30.560 --> 00:02:33.720
it goes through all those billions of parameters, doing all

48
00:02:33.720 --> 00:02:38.720
that multiplication, which takes a lot of energy and, of course, costs you.

49
00:02:38.760 --> 00:02:42.040
And so this cost adds up very, very quickly.

50
00:02:42.040 --> 00:02:47.040
So as a result, you need to balance your business needs and the size

51
00:02:47.480 --> 00:02:50.680
of the model for it to make economic sense overall.
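
NOTE
To give a sense of how inference cost adds up: a common rule of thumb is that a decoder-only transformer spends roughly 2 x (number of parameters) floating-point operations per generated token, so each token from a 175-billion-parameter model is on the order of 10^11 operations. The Python sketch below is only an illustration of that aggregation; the traffic volume and the per-million-token prices are assumed figures, not quotes from any provider.

NOTE
# Back-of-envelope inference cost: all numbers below are assumptions.
def flops_per_token(n_params):
    # Rule of thumb: ~2 FLOPs per parameter per generated token.
    return 2.0 * n_params
def monthly_cost(requests_per_day, tokens_per_request, price_per_million_tokens):
    # Aggregate an assumed per-token price over 30 days of traffic.
    tokens = requests_per_day * tokens_per_request * 30
    return tokens / 1_000_000 * price_per_million_tokens
# Hypothetical workload: 50,000 requests per day, 500 tokens per request.
large = monthly_cost(50_000, 500, price_per_million_tokens=10.0)  # assumed price
small = monthly_cost(50_000, 500, price_per_million_tokens=0.50)  # assumed price
print(f"{flops_per_token(175e9):.1e} FLOPs per token at 175B parameters")
print(f"Large model: ${large:,.0f} per month; small model: ${small:,.0f} per month")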