Fine-tuning may be well-suited to certain niche purposes, but I don’t see it as the future of adapting generative AI in industry. Here’s why:
1. Dependency on high-quality annotation. Cliché as it sounds, high-quality annotation is hard and expensive to get. The process is not always controllable and is often very time-consuming (e.g. the annotation team may misinterpret the annotation guidelines).
2. The foundational capabilities of LLMs are only getting better. With appropriate prompts, a strong base model can already solve a large chunk of problems out of the box.
3. Fine-tuning can make a model worse. If the training data contains misleading or low-quality signals, it can actively mislead the model and “erase” some of the alignment or intelligence the foundation model originally possessed. A version of “garbage in, garbage out”.
4. Human cost is much higher than computational cost. With tools and techniques like LoRA and llama-factory, the computational cost of fine-tuning is actually low (see the sketch after this list); the human time spent updating annotation guidelines, debating wordings and standards, and so on is the larger expense.
5. The marginal cost of obtaining new annotations is much higher than that of updating prompts. Fixing an imperfection in an annotation guideline can take weeks to turn around. A prompt fix, by contrast, can turn around in hours or even minutes.
6. It is much easier to manage a library of prompts than a library of fine-tuned models. Prompts are plain text that can be diffed, reviewed, and versioned like code (a sketch of such a prompt library follows below), whereas each fine-tuned model is an opaque artifact with its own weights, serving footprint, and evaluation burden.
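
To make point 4 concrete, here is a minimal sketch of a LoRA setup using the Hugging Face `peft` library. The model name and hyperparameters are illustrative assumptions, not recommendations; the takeaway is how little code, and how few trainable parameters, the computational side involves. Everything that produces the training data is where the real cost lives.

```python
# Minimal LoRA sketch with Hugging Face `peft`.
# Model name and hyperparameters are illustrative assumptions only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=8,                                   # low-rank dimension; keeps trainable params tiny
    lora_alpha=16,                         # scaling factor for the LoRA updates
    target_modules=["q_proj", "v_proj"],   # attach adapters to the attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model's weights
```

From here the model trains like any other, so the GPU bill is modest; none of this reduces the weeks of annotation work that feed it.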
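And to illustrate point 6, a prompt “library” can be as simple as versioned templates addressable by name. The sketch below is hypothetical (all names are made up for illustration); the point is that diffing, reviewing, and rolling back a prompt is as cheap as editing a file, with no model registry involved.

```python
# A hypothetical versioned prompt library: plain-text templates
# addressable by (name, version). All names here are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class Prompt:
    name: str
    version: str
    template: str

    def render(self, **kwargs: str) -> str:
        return self.template.format(**kwargs)

PROMPTS = {
    ("summarize_ticket", "v2"): Prompt(
        name="summarize_ticket",
        version="v2",
        template="Summarize the following support ticket in one sentence:\n{ticket}",
    ),
}

prompt = PROMPTS[("summarize_ticket", "v2")]
print(prompt.render(ticket="Customer cannot reset their password."))
```

Swapping "v2" for "v3" is a one-line change under source control; the equivalent change to a fine-tuned model means a new training run, a new artifact, and a new round of evaluation.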