What do they mean by instruction? Is it just a regular LLM?

A regular LLM just predicts the next token given the previous tokens (this can be trained without manual labelling by humans, because the "label" is simply the next token in the text).
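
To make the "no manual labelling" part concrete, here is a toy sketch of how training pairs fall out of raw text. The whitespace tokenizer and the `next_token_pairs` helper are my own simplifications for illustration, not how any real model tokenizes.

```python
def next_token_pairs(tokens):
    """Turn a token sequence into (context, target) pairs.

    Every target is just the token that follows its context in the
    original text, so no human has to write any labels.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Toy whitespace "tokenizer"; real models use subword tokenizers.
tokens = "the cat sat on the mat".split()
pairs = next_token_pairs(tokens)

for context, target in pairs:
    print(context, "->", target)
```

A real model is trained to assign high probability to each target given its context, but the data preparation idea is the same.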

InstructGPT and ChatGPT use reinforcement learning from human feedback (RLHF) to align the model with human intent, so that it follows instructions rather than just continuing the text.

https://huggingface.co/blog/rlhf

Note that Alpaca is NOT using RLHF. It explicitly states it used supervised finetuning.

It says:

> We train the Alpaca model on 52K instruction-following demonstrations generated in the style of self-instruct using text-davinci-003
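
Roughly, supervised fine-tuning on a demonstration is still next-token prediction, only on (instruction, response) pairs. A minimal sketch, assuming a made-up prompt template, a toy whitespace tokenizer, and the common (but here assumed) practice of computing the loss only on the response tokens:

```python
MASKED = None  # hypothetical marker: this position is assumed not to count toward the loss


def build_sft_example(instruction, response):
    """Build (tokens, labels) for one instruction-following demonstration.

    The template below is illustrative only, not necessarily the one
    Alpaca uses.
    """
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    prompt_tokens = prompt.split()      # toy whitespace tokenizer
    response_tokens = response.split()
    tokens = prompt_tokens + response_tokens
    # Prompt positions are masked; only response tokens serve as targets.
    labels = [MASKED] * len(prompt_tokens) + response_tokens
    return tokens, labels


tokens, labels = build_sft_example("Name a primary color.", "Blue")
print(list(zip(tokens, labels)))
```

There is no reward model and no reinforcement learning step here, which is the difference from RLHF.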

Which leads to self-instruct https://github.com/yizhongw/self-instruct

From a quick glance, they used an LM to generate and classify instructions and then trained the model on them, which IMHO is very similar to RLHF.