Detecting AI-Generated Content

GLTR (Giant Language model Test Room) is a tool developed jointly by the MIT-IBM Watson AI Lab and HarvardNLP, with researchers from Harvard University and the Massachusetts Institute of Technology (MIT). It is designed to help detect and highlight text generated by AI language models, and its analysis is based primarily on OpenAI’s GPT-2 117M model. As AI-written content proliferates online, tools like GLTR become useful for judging whether a piece of text was written by a person.

How GLTR Works:

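  • Visual Analysis: Upon input, GLTR examines each word in the text and checks how highly the language model ranked that word among its predictions for that position, given the preceding text. Text made up mostly of highly ranked (predictable) words is more likely to be machine-generated. Each word is color-coded by its rank: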
    • Green: Top 10 predicted words
    • Yellow: Top 100 predicted words
    • Red: Top 1000 predicted words
    • Purple: Words not fitting the above categories
  • Histogram Analysis: GLTR provides three histograms as part of its forensic examination:
    • Distribution of word categories
    • Ratio between the probabilities of the top predicted word and the next word
    • Distribution over the prediction entropies

Applications of GLTR:

  • Detect fake reviews, comments, or news articles generated by AI
  • Identify AI-generated content on social media platforms like Facebook and Twitter
  • Assist website owners in ensuring content authenticity to avoid potential SEO penalties from Google
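
The rank-based color coding and the three histogram statistics described under How GLTR Works can be sketched in a few lines of Python. This is a minimal illustration, not GLTR's actual code: the function names are hypothetical, the per-position probability lists stand in for the output of a language model such as GPT-2, and the ratio is assumed to compare the probability of the word that actually appears with the probability of the top prediction.

```python
import math
from collections import Counter

# Rank thresholds taken from the color scheme described above.
BUCKETS = [(10, "green"), (100, "yellow"), (1000, "red")]


def rank_of(probs, token_id):
    """1-based rank of token_id when the vocabulary is sorted by probability.

    Tokens with equal probability share the same rank.
    """
    p = probs[token_id]
    return 1 + sum(1 for q in probs if q > p)


def color_for_rank(rank):
    """Map a prediction rank to its color bucket; anything past 1000 is purple."""
    for limit, color in BUCKETS:
        if rank <= limit:
            return color
    return "purple"


def actual_to_top_ratio(probs, token_id):
    """Probability of the observed token divided by the top probability.

    Assumption: this is one reading of the 'ratio' histogram.
    """
    return probs[token_id] / max(probs)


def entropy_bits(probs):
    """Shannon entropy (in bits) of one predicted distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def analyze(steps):
    """Summarize a text given (probs, observed_token_id) pairs, one per position.

    Returns the per-word colors plus the raw values behind the three histograms.
    """
    colors = [color_for_rank(rank_of(p, t)) for p, t in steps]
    return {
        "colors": colors,
        "category_counts": Counter(colors),
        "ratios": [actual_to_top_ratio(p, t) for p, t in steps],
        "entropies": [entropy_bits(p) for p, _ in steps],
    }


if __name__ == "__main__":
    # Toy 4-token vocabulary; a real model would supply ~50,000 probabilities.
    toy = [0.5, 0.25, 0.125, 0.125]
    print(analyze([(toy, 0), (toy, 1)]))
```

In practice the probability lists would come from running the text through the model one position at a time. A text whose words fall almost entirely in the green bucket, with ratios near 1 and low entropies, is the pattern GLTR flags as likely machine-generated.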

Advantages of GLTR:

  • Offers direct visual feedback on the likelihood of AI-generated content
  • In the authors’ human-subject study, GLTR’s annotations improved the rate at which people detected generated text from 54% to 72%, without prior training
  • Open-sourced and publicly available for use and further research

Associated Concerns with AI-Generated Content:

  • AI-generated content might be perceived as unoriginal since it is based on existing information
  • Such content can lack depth, emotion, and personal insight
  • While it can save time and cost in content creation, it may contain inaccuracies or false information that damage a brand’s reputation
  • AI tools can sometimes hinder human creativity


GLTR uses the GPT-2 language model to help detect artificially generated content by making the statistical predictability of a text visible. It relies on the unpredictability and nuance of human writing remaining discernible as AI content generation advances. With increasing reliance on digital content for decision-making, tools like GLTR provide a useful check against misinformation and help maintain content integrity. GLTR is available as a live demo, its source code is on GitHub, and more detailed research findings can be found in the ACL 2019 demo track paper.
