Comparing Translation Tools

With a variety of translation tools available, from Google Translate to ChatGPT, it's important to understand which tool is best suited to which task. Does customising a prompt make a difference, or can standard translation apps effectively translate something like a product description?


14 October 2024 6-minute read

Translation Landscape

Translation apps provide fast and user-friendly options for general-purpose translations. On the other hand, AI models using advanced prompts deliver more accurate and context-aware translations, especially for complex content. However, they often require more input and processing time.

AI Task

The goal of this AI task is to translate product descriptions for a premium electric kettle from English to Dutch, aimed at environmentally conscious Dutch consumers. Our objective is to compare how effectively various translation tools handle this kind of product description.

Methodology

Research Design

We analysed 10 product descriptions for electric kettles and translated them using two common apps (Google Translate and DeepL) and three AI models (ChatGPT 4o, Google Gemini, and Windows Copilot). Each AI model was tested under two conditions, a simple one-line prompt and a detailed prompt, giving eight translation methods in total. The descriptions contained a mix of legal, technical, and promotional content. The prompts used are described in detail in our article on translation prompts.

Evaluation Criteria

We used 16 criteria to evaluate the translation results:

  • Language pair specificity: Ensuring the tools recognise the task as translating from English to Dutch.
  • Contextual understanding: Assessing whether the tools capture the purpose of the description for an eco-conscious audience.
  • Tone and style: Evaluating if the translation reflects the intended promotional and engaging tone.
  • Terminology consistency: Assessing how consistently product-specific terms are translated.
  • Accuracy: Verifying that the meaning and key details are preserved in the translation.
  • Fluency and readability: Ensuring the translation flows naturally in Dutch.
  • Handling of complex sentences: Checking how well the tools manage complex sentences without losing clarity.
  • Cultural adaptation: Assessing whether the translation is appropriately localised for a Dutch audience.
  • Tone and style consistency: Ensuring the tone and style remain uniform throughout the description.
  • Error rate: Identifying grammar, spelling, or syntax errors.
  • Instruction adaptability: Comparing how well the tools handle simple versus detailed prompts.
  • Conciseness: Ensuring the translation is concise while retaining key information.
  • Handling of special characters/HTML: Checking whether special characters and formatting are preserved correctly.
  • Scalability: Testing consistency of quality across multiple product descriptions.
  • Iteration and refinement: Evaluating the ease of refining translations based on feedback.
  • Speed and ease of use: Assessing how character limits affect speed, especially for longer texts that need to be divided into smaller sections.

Table 1: Maximum character limits for each translation method

Translation Method     | Max Characters per Translation
Google Translate       | 5000
DeepL (free version)   | 1500
Google Gemini          | 5000 (estimated)
Windows Copilot        | 1500
GPT-4                  | Approx. 5000 (or more, no strict limit)
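When a description exceeds a tool's limit, as happens with DeepL (free) and Windows Copilot at 1500 characters, it has to be split into sections before translation. The sketch below shows one way to do that, assuming the limits in Table 1 (some of which are estimates) and preferring to break at sentence boundaries so that no sentence is translated out of context. The helper name `split_for_limit` and the simple sentence-splitting rule are our own illustrative choices, not features of any of the tools tested.

```python
import re

# Per-translation character limits taken from Table 1
# (Google Gemini and GPT-4 values are estimates/approximations).
CHAR_LIMITS = {
    "Google Translate": 5000,
    "DeepL (free version)": 1500,
    "Google Gemini": 5000,
    "Windows Copilot": 1500,
    "GPT-4": 5000,
}


def split_for_limit(text: str, limit: int) -> list[str]:
    """Split text into sections of at most `limit` characters.

    Hypothetical helper: breaks on sentence boundaries where possible and
    only hard-cuts a sentence that is longer than the limit on its own.
    Whitespace between sentences is normalised to a single space.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single over-long sentence is cut into limit-sized pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate  # sentence still fits in the current section
        else:
            chunks.append(current)  # close the section, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


# Usage: how many sections would a long (dummy) description need per tool?
description = "Our premium kettle boils water fast. " * 100
for tool, limit in CHAR_LIMITS.items():
    sections = split_for_limit(description, limit)
    print(f"{tool}: {len(sections)} section(s)")
```

Splitting at sentence boundaries keeps each section self-contained, which matters for the AI models because they rely on context; the hard cut is only a fallback for unusually long sentences.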

Results

Key Findings

  1. GPT-4 (advanced prompt):
    • Contextual understanding and accuracy: Consistently provided the most accurate translations across technical, legal, instructional, and promotional texts. It excelled at managing long and complex sentence structures while preserving clarity, making it the top performer.
    • Cultural adaptation: Adapted well to the Dutch context, especially for promotional content, showing a localised understanding in its translations.
    • Iteration and refinement: Easiest to refine based on feedback, making it ideal for users who need to tweak translations for precision.
  2. Google Translate:
    • Speed: Delivered near-instant translations, even for texts up to 5000 characters. It's the best choice for users prioritising speed over accuracy.
    • Ease of use: Thanks to its high character limit and simple interface, Google Translate was the most convenient option for general-purpose translations, but it struggled with complex or technical content.
  3. Google Gemini (advanced prompt):
    • Contextual understanding and terminology consistency: Performed strongly on legal and technical content, handling detailed prompts and providing highly accurate, context-sensitive translations.
    • Cultural adaptation: Adapted well to Dutch cultural expectations, especially in promotional and instructional texts.
  4. DeepL (free):
    • Accuracy in short texts: Delivered highly accurate translations for shorter, simpler texts, particularly for technical descriptions and instructions. However, its character limit (1500) was a drawback for longer texts.
    • Fluency and readability: Demonstrated remarkable fluency in short texts, making it ideal for concise technical or promotional content where clarity is key.
  5. Windows Copilot:
    • Prompt adaptability: Showed significant improvement with advanced prompts. While it struggled with complex content in simple prompt mode, it performed better when detailed prompts were used, particularly regarding consistency and terminology.

Notable Weaknesses

  • Character limit impact: Both DeepL (free) and Windows Copilot were hindered by their low character limits, requiring longer texts to be split into multiple sections, which slowed down the process.
  • Inconsistent results: Google Translate produced inconsistent results across texts of differing complexity. While fast, it lacked the accuracy and fluency needed for technical or legal content.

Results by Criteria

A summarised evaluation of the overall results across all cases, organised by criterion for the eight translation methods, can be found in this table.

Basic vs. Detailed Prompts

  • Basic prompt:
    • Strengths: Quick and simple to use, ideal for straightforward translations of short or non-specialised texts.
    • Weaknesses: Lacks the nuance, adaptability, and precision needed for complex or specialised content.
  • Advanced prompt:
    • Strengths: Best for high-quality, accurate translations where tone, context, and detail are critical. Excellent for complex or specialised texts such as legal, technical, or promotional content.
    • Weaknesses: Requires more time and user input, making it slower and more effort-intensive than a basic prompt.

Overall Summary

  1. Best overall performance: GPT-4 (advanced prompt) consistently performed the best across all criteria, handling complex sentences, providing accurate terminology, and adapting well to tone and cultural nuances. It was also the most scalable and the easiest to refine, especially for detailed technical and legal texts.
  2. Strong performance for most cases:
    • Google Gemini (advanced prompt) followed closely behind, particularly excelling in technical contexts where accuracy, tone, and cultural adaptation were essential.
    • GPT-4 (basic prompt) offered a good balance between accuracy, speed, and fluency, making it suitable for most moderately complex use cases.
  3. Moderate performance: DeepL (free) performed well in shorter texts and simple technical content, but its character limit hindered performance with longer documents.
  4. Inconsistent performance: Google Translate was fast and user-friendly but struggled in complex cases, particularly with terminology consistency, contextual understanding, and handling long or complex sentences.

Best Translation Tools for Specific Needs

  • For speed and simplicity: Google Translate is the quickest and easiest for general, non-complex texts.
  • For high-quality translations: GPT-4 (advanced prompt) and Google Gemini (advanced prompt) excel in legal, technical, and detailed promotional content, though they require more user input.
  • For short, high-accuracy translations: DeepL (free) is excellent for concise technical or instructional content but is limited by its 1500-character limit.