Structuring legal data faster: a multilingual data annotation success story
TrainAI validates and scales a complex workflow for a digital contract management technology provider
A global provider of cloud-based electronic contract workflow solutions needed to improve how it arranged data within its workflows across legal documents. The initiative represented a sophisticated AI contract analysis project designed to develop an automated model capable of reviewing legal documents and extracting 10-15 critical data points.
The goal was to replicate capabilities already achieved in English across additional languages. This would enable the organization’s software to autonomously identify contract dates, participating legal entities and specific liability clauses.
The organization partnered with TrainAI by RWS to validate this new workflow and expand production at scale. Here’s how our team delivered above and beyond expectations, setting new operational standards for the client.
800+ legal contracts annotated in French and German
3-4 days average turnaround time per batch
90% minimum quality threshold maintained
Key benefits
Established a scalable model for multilingual legal data annotation
Overcame significant technical limitations through custom tracking solutions
Identified and resolved critical gaps in client guidelines
Delivered high-volume batches rapidly without sacrificing accuracy
Provided a flexible workforce that adapted to changing requirements
The client, a global electronic contract workflow provider, required a more effective way to structure critical metadata within legal documents. Better-structured metadata would support smarter search, improved analytics and more efficient digital operations. It also called for accurate data annotation of key entities across multilingual content.
These entities included contracting parties, contract dates and document classifications: the most critical data points in each contract submitted to the company's solution. Each of them needed to be annotated accurately and represented consistently across every digital document.
The organization partnered with TrainAI to validate a new data annotation workflow and subsequently scale up production.
The client wanted to extract critical metadata, such as contracting parties and contract dates, from legal documents in French and German. Their internal team had successfully handled English documents but lacked the localized setup and know-how to scale into other languages.
Scaling the model to French and German presented significant difficulties that extended beyond simple translation. The challenge required navigating fundamentally different legal systems where contract structures and data placement vary significantly by country.
Unlike typical localization projects, this demanded data annotators with legal experience. They had to be able to interpret complex legal documents and correctly label data for the model.
They required a partner who could manage the complexities of foreign legal texts while navigating a specific, developing toolset.
Building a Foundation
Over a two-month project window, TrainAI mobilized a talented legal data annotation team, established a structured quality assurance process and delivered more than 800 annotated contracts in French and German.
The project began with a focus on validating the annotation workflow before moving to full-scale production. TrainAI assembled a team of experienced annotators and reviewers to handle the specific linguistic nuances of the target languages.
This initial phase was crucial for setting the standards that would guide the rest of the engagement.
To ensure success, the team implemented a rigorous two-week training program in which potential team members completed validation tasks before working on the project. These tasks included studying guidelines, watching instructional recordings, engaging in project briefings with the client and completing test tasks in the client’s preferred tool.
Only data annotators who demonstrated a strong grasp of the requirements were onboarded to the live project. The engagement set a strong foundation for future multilingual expansion and enhanced document intelligence.
Challenges
Train an AI model to complete sophisticated contract analysis in French and German.
The existing data annotation tool lacked API access, making data tracking difficult.
Poor User Experience (UX) and stability issues caused frequent delays.
Source documents often contained Optical Character Recognition (OCR) errors.
Initial guidelines did not account for specific grammar rules in French and German.
Quality reviews from the client side were sometimes delayed.
The client had no administrative view to manage vendor access directly.
Solutions
Created semi-automated tracking sheets using raw data exports to compensate for tool limitations.
Developed custom scripts to monitor production and quality metrics.
Provided detailed feedback to help the client refine their annotation guidelines.
Implemented a multi-stage internal review process to catch errors early.
Maintained constant communication to resolve "edge cases" quickly.
Results
Delivered over 800 fully annotated contracts.
Achieved consistent delivery times despite tool instability.
Helped the client improve their own process documentation.
Demonstrated scalability for potential future language expansion.
Maintained a strong partnership through transparent collaboration.
Overcoming Technical Hurdles
The project faced significant hurdles related to the client’s chosen data annotation platform. The tool did not support direct PDF annotation, forcing the team to work with raw text generated by OCR.
This text was often riddled with errors, sometimes making documents unintelligible. Additionally, the lack of API access meant managers could not easily track progress or quality data automatically.
Rather than accepting these limitations, the TrainAI team innovated.
Our team built a manual workaround using Google Sheets and created scripts to process raw JSON data exports. This allowed the team to track production metrics and flag quality issues programmatically.
By catching obvious errors before they reached the client, the team ensured a higher standard of delivery.
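As an illustration only, a tracking script of this kind might look like the sketch below. It is not the team's actual script: the export schema (the field names language, status and review_passed) is hypothetical, and treating the 90% quality threshold as the share of reviewed documents that pass review is an assumption made for the example.

```python
import json
from collections import defaultdict

# Minimum quality threshold cited in this case study. Interpreting it as a
# review pass rate is an assumption made for this sketch.
QUALITY_THRESHOLD = 0.90


def summarize_export(raw_json: str) -> dict:
    """Aggregate per-language production and quality figures.

    Assumes a hypothetical export format: a JSON list of records with
    'language', 'status' and 'review_passed' fields. The real tool's
    schema is not described in the source.
    """
    records = json.loads(raw_json)
    stats = defaultdict(lambda: {"completed": 0, "reviewed": 0, "passed": 0})

    for rec in records:
        lang_stats = stats[rec["language"]]
        if rec.get("status") == "completed":
            lang_stats["completed"] += 1
        # review_passed stays None until a reviewer has checked the document
        if rec.get("review_passed") is not None:
            lang_stats["reviewed"] += 1
            lang_stats["passed"] += int(rec["review_passed"])

    summary = {}
    for language, s in stats.items():
        pass_rate = s["passed"] / s["reviewed"] if s["reviewed"] else None
        summary[language] = {
            **s,
            "pass_rate": pass_rate,
            # Flag only when there is review data and it falls short
            "below_threshold": pass_rate is not None
            and pass_rate < QUALITY_THRESHOLD,
        }
    return summary
```

The resulting summary could then be pasted or uploaded into a tracking sheet, with the below_threshold flag marking batches that need a second look before delivery.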
The linguistic complexity of legal documents also presented unique difficulties. For instance, the team discovered discrepancies in how specific articles, like the French "Les," should be treated in different contexts.
These "edge cases" were not covered in the original instructions. The team proactively flagged these issues, leading the client to update their guidelines for better clarity.
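One way to surface such edge cases for discussion is to flag annotated spans that begin with a definite article, so that a reviewer can check them against the current guideline. The sketch below is an assumption-laden illustration, not the team's method: the article lists, the function names and the annotation format (a dict with a text field) are all hypothetical, and the source does not say which rule the updated guidelines adopted.

```python
# Hypothetical, deliberately short lists of leading definite articles.
LEADING_ARTICLES = {
    "fr": ("le", "la", "les", "l'"),
    "de": ("der", "die", "das"),
}


def starts_with_article(span_text: str, language: str) -> bool:
    """Return True if the annotated span begins with a listed article."""
    # Normalize case and typographic apostrophes before comparing
    text = span_text.strip().lower().replace("\u2019", "'")
    for article in LEADING_ARTICLES.get(language, ()):
        if article.endswith("'"):
            # Elided articles attach directly to the next word (l'...)
            if text.startswith(article):
                return True
        elif text == article or text.startswith(article + " "):
            # Require a following space so names like "Lesage" do not match
            return True
    return False


def flag_for_review(annotations: list, language: str) -> list:
    """Collect annotations whose text starts with an article.

    The function does not decide whether the article belongs in the
    span; it only queues the annotation for a human reviewer.
    """
    return [a for a in annotations if starts_with_article(a["text"], language)]
```

Keeping the check advisory rather than corrective reflects the situation described above: until the guideline is explicit, the right treatment of the article is a question for the client, not for a script.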
Through regular meetings and open feedback loops, the partnership grew stronger. The client was impressed by the team’s ability to navigate technical instability without missing deadlines. Even when the tool timed out or access was blocked, the TrainAI team found ways to keep the work moving.
This resilience turned a potentially chaotic pilot into a coordinated, reliable operation.
Rigorous data annotator training
Proactive quality management
Transparent communication
"Thank you for the great collaboration during the year…and looking forward to working together in the year ahead."
Project Lead
Turning data annotations into operational insight
We ultimately delivered more than 800 annotated contracts, laying a solid groundwork for the client’s document intelligence goals. An internal analysis revealed that quality variations between languages were often due to guideline ambiguity rather than data annotator skill.
This insight was valuable for the client, as it highlighted the need for language-specific instructions on future projects. By solving these problems, TrainAI helped the client mature their own internal processes.
The project also highlighted a long-term strategic need among organizations creating AI-powered products: niche domain experts who can work across multiple languages are in high demand. Professionals with a background in localization can deliver immense value by assisting with the training and validation of LLMs.
Overall, the client expressed great satisfaction with the attention to detail and the thorough approach to AI data quality management. They specifically cited this diligence as a primary reason for expanding the partnership after the initial pilot.
Despite the tight deadlines and technical roadblocks, the project was a resounding success. The work performed provided the client with the clean, structured AI data necessary to power their next generation of smart features.