The Great AI Debate: Which Models Craft the Best Arguments?

This study compares several leading AI models in a debate about logic and persuasion. The task was simple: convincingly argue that prompt engineering is not a job in itself, but an essential skill for anyone in the AI field. The arena featured widely used models, including Windows Copilot, Google Gemini, and several versions of ChatGPT.


15 October 2024 · 4-minute read

The Challenge

Each AI model was given the same prompt. The challenge was designed to push each model to its rhetorical limits: every response had to leverage a range of persuasive techniques and demonstrate the model's ability to engage with complex, nuanced reasoning.

Prompt
Write the most convincing argument that being a prompt engineer is not a real job but a necessary skill for anyone working with AI technology.

Evaluation and refinement of this argumentative prompt »

The Arena

The models' responses were evaluated across thirteen meticulously chosen criteria that tested their argument structure, persuasive techniques, writing quality, and overall effectiveness. These criteria were designed to dissect how each model constructed its argument, the logical coherence of its points, and its ability to persuade a knowledgeable audience.

The Results

Overall Results

  • Top performers: ChatGPT 4o-mini and ChatGPT o1-preview were the standout performers, each scoring at or near the maximum. Their responses were noted for their clarity, logical structure, and persuasive techniques that resonated strongly with an audience well-versed in AI technologies. They exemplified AI's ability to combine logical reasoning with emotional appeal into powerful arguments.
  • Honourable mentions: Not far behind were Meta Llama 3.1 and Anthropic Claude 3.5. Both showed robust argumentative structures and effective logical flow, yet lacked the creative flair needed to truly captivate and persuade the audience.
  • Creative but less focused: Windows Copilot, despite a lower overall score, was praised for its creative use of analogies, which made its arguments engaging and relatable. However, this creativity sometimes came at the cost of clarity and focus, affecting the overall strength of its argumentation.

Key Insights

  • Balancing act: The best-performing models demonstrated a keen ability to balance logical and emotional appeals, crafting arguments that were not only sound but also engaging. This balance is crucial in making a compelling case to a professional audience.
  • Creativity vs. coherence: The study highlighted a trade-off between creativity and coherence. While creative approaches like those employed by Windows Copilot can make arguments more engaging, they must not sacrifice the logical structure and clarity needed in professional discourse.
  • Evolving AI capabilities: The debate also highlighted AI's rapidly evolving capabilities, suggesting that as AI becomes more intuitive and integrated, the role of the specialised prompt engineer may diminish, making prompting an integral part of broader AI proficiency.

Individual Results

The individual results and evaluations are listed below, ranked from highest to lowest score (Table 1):

  1. ChatGPT 4o-mini
  2. ChatGPT o1-preview
  3. Meta Llama 3.1
  4. Anthropic Claude 3.5
  5. Windows Copilot
  6. Google Gemini
  7. ChatGPT 4o with canvas
Table 1: Detailed results of the evaluation criteria (maximum score = 5)
Criteria                   | 4o-mini | o1-preview | Llama 3.1 | Claude 3.5 | Copilot | Gemini | 4o Canvas
Clarity of thesis          |    5    |     5      |     4     |     4      |    4    |   4    |     3
Logical coherence          |    5    |     5      |     4     |     4      |    3    |   4    |     3
Relevance                  |    5    |     5      |     5     |     4      |    3    |   4    |     3
Evidence and support       |    5    |     5      |     4     |     4      |    3    |   3    |     3
Counterarguments           |    5    |     5      |     4     |     4      |    2    |   3    |     3
Use of rhetorical devices  |    5    |     5      |     4     |     4      |    4    |   3    |     3
Audience awareness         |    5    |     5      |     5     |     4      |    3    |   4    |     4
Persuasiveness             |    5    |     5      |     4     |     4      |    3    |   3    |     3
Language and tone          |    5    |     5      |     4     |     4      |    3    |   4    |     3
Organisation               |    5    |     5      |     4     |     4      |    3    |   4    |     3
Creativity and originality |    4    |     5      |     4     |     4      |    5    |   3    |     4
Conclusion                 |    5    |     5      |     4     |     4      |    3    |   3    |     3
Brevity and focus          |    5    |     5      |     4     |     4      |    3    |   4    |     3
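The article does not specify how the overall ranking was derived from the per-criterion scores. As a rough sketch only, an unweighted sum of the thirteen criteria (one plausible aggregation, assumed here for illustration) can be computed directly from Table 1:

```python
# Per-criterion scores transcribed from Table 1 (13 criteria, max 5 each).
# Criteria order: clarity, coherence, relevance, evidence, counterarguments,
# rhetoric, audience, persuasiveness, language, organisation, creativity,
# conclusion, brevity.
scores = {
    "ChatGPT 4o-mini":        [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 5],
    "ChatGPT o1-preview":     [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    "Meta Llama 3.1":         [4, 4, 5, 4, 4, 4, 5, 4, 4, 4, 4, 4, 4],
    "Anthropic Claude 3.5":   [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
    "Windows Copilot":        [4, 3, 3, 3, 2, 4, 3, 3, 3, 3, 5, 3, 3],
    "Google Gemini":          [4, 4, 4, 3, 3, 3, 4, 3, 4, 4, 3, 3, 4],
    "ChatGPT 4o with canvas": [3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 4, 3, 3],
}

# Unweighted sum per model (an assumption; the study may have weighted
# criteria differently when producing its overall ranking).
totals = {model: sum(marks) for model, marks in scores.items()}
for model, total in totals.items():
    print(f"{model}: {total}/65")
```

A weighted aggregation (for example, emphasising persuasiveness over brevity) would shift the totals, which is why the summing rule above is flagged as an assumption rather than the study's actual method.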

Conclusion

This comparative analysis not only sheds light on the current capabilities of AI models in constructing sophisticated arguments but also hints at the potential future roles AI might play in professional settings. As AI continues to advance, its ability to understand and engage with complex argumentative structures will be crucial in fields that rely heavily on effective communication and persuasion. The possibilities are vast and varied, from training professionals in argumentation to helping with policy-making.

Our take on this thesis comes in two different styles: an analytical article and a narrative version.

On-the-Job AI Coaching »