AI wars: the model wars
If you think you've suddenly woken up in the Neuromancer novel, it's just that you are in 2023. The news: LLMs are going mainstream.
If you've been keeping up to date, you know that a few companies are fighting against each other in what looks like an AI arms race.
The current winner is OpenAI with GPT-4 and its famous public user interface: chatGPT. OpenAI has partnered with Microsoft, and that company is currently putting GPT-4 into almost all of its products, starting with Bing.
Thanks but no thanks
I would have loved to try out Bing-GPT, but it looks like you need to install Microsoft's web browser Edge in order to do that. I'm perfectly fine with giving $20 per month to OpenAI to use chatGPT Plus. But installing Edge to use Bing-GPT is something I won't lower myself to.
I'm a veteran of the Microsoft Network vs AOL vs CompuServe vs Internet wars, and I joined the Internet army on day one. We fought this war and we won it. Almost 30 years later, I'm not going to swallow Microsoft's web browser just to access Bing-GPT, because I know very well that as soon as Edge has enough market share, Microsoft will start poisoning the web with Edge-only features, and will try to break the web into pieces and ...
Oops, wrongly bitching again
Well, actually, I've just tried again and, when you scroll below the big "unlock the full experience with Edge" message, you actually can use Bing-GPT with any web browser.
Sorry lads, I might have overreacted here.
... well, actually, I've retried, and there is no way to actually chat with Bing-GPT without installing Edge ... so Microsoft is using Bing to force you to install Edge ... you can clearly see that Micro$oft hasn't left all of its bad habits behind.
Just imagine a world where Micro$oft were the only big corp with access to the best LLM!! 😱
Back to the models
Yeah, that's right, I was talking about GPT-4 as the best LLM (Large Language Model) around as of March 2023.
But Google is fighting back with Bard ... I haven't tried it yet, but it does not look as good.
The thing is, these models have billions of parameters (GPT-3 has 175 billion; GPT-4 is rumoured to have as many as 100 trillion), and they seem to run only on giant computers.
Are LLMs a big-corporation only playing field? Can't anyone hope to run one of these wonderful models on their own computer?
Well, currently, we don't even know if we can, because these expensively trained models are kept private. So we cannot even try.
Here comes LLaMA
LLaMA is Facebook's Large Language Model, and guess what, it has recently been released for research use. Thank you Mark. That's an unexpected but kind move. So if you are a researcher, or if you happen to have the proper link because someone leaked the LLaMA weights on BitTorrent, you can start to play with it. And LLaMA comes in several sizes (from 7 billion up to 65 billion parameters), so you can run the smaller ones on any decent computer.
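To see why the smaller sizes fit on an ordinary machine, here is a quick back-of-envelope memory calculation (the bytes-per-parameter figures are standard assumptions: 2 bytes per parameter in fp16, 0.5 bytes with 4-bit quantisation):

```shell
# rough memory footprint of a 7-billion-parameter model
python3 - <<'EOF'
params = 7e9
print(f"fp16 : {params * 2 / 2**30:.1f} GiB")   # 2 bytes per parameter
print(f"4-bit: {params * 0.5 / 2**30:.1f} GiB") # 0.5 bytes per parameter
EOF
```

Roughly 13 GiB in fp16, but only about 3.3 GiB once quantised to 4 bits, which is why a quantised 7B model runs on a laptop.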
Well, good for the democratisation of LLMs, but it seems like LLaMA is far behind GPT. Crap, what can we do? Is there a way to improve LLaMA? Could we somehow continue its training?
Welcome to Alpaca
That's exactly what a small team at Stanford University did: they took LLaMA and kept training it. But LLM training is terribly expensive: GPT-3 is estimated to have cost more than 4 million dollars to train. Can a small Stanford team afford that?
No, they cannot. So instead of feeding LLaMA millions of documents to train it, they ... used one of OpenAI's GPT models (text-davinci-003) to generate LLaMA's training data.
That detail aside, it only took about $600 of computing power to turn Facebook's LLaMA model into a super-llama model called Alpaca.
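For the curious, the trick is essentially distillation: you ask the big model to generate thousands of instruction-following examples, then fine-tune the small model on them. The Stanford team's dataset uses an instruction/input/output JSON format; here is a hypothetical sketch of one such example (the content is made up):

```shell
# write one (made-up) Alpaca-style training example and read it back
cat > example.json <<'EOF'
{"instruction": "Explain photosynthesis to a 10-year-old.",
 "input": "",
 "output": "Plants use sunlight, water and air to make their own food."}
EOF
python3 -c "import json; print(json.load(open('example.json'))['instruction'])"
```

Multiply that by ~50,000 generated examples and you have an instruction-tuning dataset for a few hundred dollars of API calls.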
It seems that Alpaca is better than LLaMA, but still way behind GPT-4. Alpaca is also known to sometimes hallucinate (we say a model hallucinates when it confidently makes things up).
LLMs going mainstream
The question here is whether LLMs will remain the oligopoly of a handful of big corporations, or be democratised to the point where anybody can run a decent LLM on their own computer.
It seems that the second option is what is happening right now.
Let's try it then
```shell
# my-own-private-LLM.sh
npm install -g dalai
npx dalai alpaca install 7B
npx dalai serve
```
The first command did not work for me, even after accepting the Xcode licence agreement. It looks like node-gyp fails to install properly because some low-level Python 2.7 version is messing with the install script. This is exactly what I hate about Python: you always end up not having the right version, even when you actually do have the right version ... your system just isn't able to use it correctly.
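If you hit the same node-gyp/Python problem, one workaround that sometimes helps (assuming the issue really is node-gyp picking up a stray Python 2.7, and that a Python 3 is on your PATH) is to point npm explicitly at the right interpreter before retrying:

```shell
# tell npm (and therefore node-gyp) which Python to use, then retry
npm config set python "$(command -v python3)"
npm install -g dalai
```

No guarantee: this only fixes the "wrong Python" class of node-gyp failures.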
So there is another way, if you happen to have Docker installed:
```shell
# try-with-docker-instead.sh
# first clone the dalai repository
git clone https://github.com/cocktailpeanut/dalai.git
# go into the new directory
cd dalai
# then build the docker image
docker compose build
# then install alpaca 7B (the 7-billion-parameter Alpaca LLM)
docker compose run dalai npx dalai alpaca install 7B
# then launch the bloody thing
docker compose up -d
```
Once you've run these commands, you can access dalai's web interface at http://localhost:3000/
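To check from the command line that the container is actually answering (assuming the default port 3000 used above):

```shell
# prints "dalai is up" if the web UI answers, a warning otherwise
curl -sf -o /dev/null http://localhost:3000/ \
  && echo "dalai is up" \
  || echo "dalai is not responding"
```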
It worked for me. The result isn't that impressive:
- the GUI is much more cryptic than chatGPT
- the output is nowhere near as user-friendly
- the responses are not that good (based on my one test)
So, if you are used to GPT-4 via chatGPT, you'll feel like browsing the internet with a 56k modem after getting used to fibre. But I only downloaded the Alpaca 7B model (while GPT-4 is rumoured to be a 100T model), and this is just a very first step in the direction of "LLMs for the masses".
Here is my test below, in case you are curious:
Obviously, this code won't run, and GPT-3.5 had a much smarter approach to the same problem when I asked it the same question.
Personally, I'm just amazed that this thing exists and is running on my MacBook.
Here is a more detailed presentation of Alpaca:
LLMs are a major breakthrough in AI and for humanity in general. It would be a pain if only a handful of big corps could control that technology, but it looks like any small company, state, person or organisation will soon have access to decent LLMs and be able to build on top of them.
Of course, bad guys will use them (think fake news campaigns, mass hacking, mass manipulation, ...) but good guys might end up using them too (think automated debunking, fake news flagging, hack protection, productivity, automation, AI-powered personal assistants, a spam filter that finally works, being able to run your own local mini-GPT, ...).
And if what is missing is human training, I'm pretty sure we'll end up with an army of volunteers from all around the world training a Wikipedia-like LLM.
Some more links
Just because I don't have the time to write a full new story about these new things (because I want to try out Auto-GPT):
- GPT-J: a 2-year-old open source model
- Dolly: a model trained on top of GPT-J
- Vicuna: a model trained on top of LLaMA, using conversations from ShareGPT, with GPT-4 as evaluator. Training cost: $300. Claimed to reach about 90% of chatGPT's quality. Online demo of Vicuna.
- FastChat: open software that can train, serve and evaluate LLM based chatbots. So basically, it's a chatbot, but you have to add the LLM for it to work.
- ShareGPT: a website where people share their conversations with chatGPT. Very good for finding good prompts and seeing what people talk about with chatGPT. As a side effect, these conversations can be used to train another LLM. There is a ShareGPT browser extension, so it's very easy to share your own conversations with chatGPT.