Big tech company fine-tunes generative AI with 235 domain experts
How to differentiate a GenAI open-source LLM: have it fine-tuned by data annotators who are qualified experts in their field
Our client wanted to fine-tune its GenAI open-source large language model (LLM) to increase its accuracy, safety and robustness. Realizing that those goals would be hard to achieve with a conventional crowdsourcing approach to data annotation, the company reached out to RWS, which used its TrainAI team to quickly recruit, train and manage a scalable team of qualified subject-matter experts as data annotators to complete the work.
TrainAI by RWS follows the principles of responsible AI to deliver dependable LLM training and fine-tuning data that’s ethically sourced, fair, accurate and reliable, transparent and explainable, and private and secure.
Challenges
- Maximize LLM accuracy by training it on specific topic areas
- Improve safety and security by mitigating the risk of generating hallucinations or harmful content
- Achieve a standard that makes the LLM a resource for professionals
Solution
- TrainAI by RWS
- Generative AI data services
- Domain expertise: recruiting, training and managing subject-matter experts as data annotators
- Content creation: prompt engineering
- Model fine-tuning: prompt-response QA, fact extraction and verification
- Risk mitigation: red teaming
Results
- 4-week project ramp-up
- 235 domain experts recruited as part-time RWS employees
- 32,000 hours of work done in the first 3 months
- Supported training and rollout of the client's latest LLM version