Benchmarking AI chatbots: Assessing their accuracy in identifying hijacked medical journals
Publication Name: Diagnosis
Publication Date: 2025-01-01
Volume: Unknown
Issue: Unknown
Page Range: Unknown
Description:
The challenges that questionable journals pose to academia are real, and the ability to detect hijacked journals would be valuable to the research community. Using an artificial intelligence (AI) chatbot may be a promising approach to early detection. The purpose of this research is to analyze and benchmark the performance of different AI chatbots in identifying hijacked medical journals. The study used a dataset of 21 previously identified hijacked journals and 10 newly detected hijacked journals, alongside their respective legitimate versions. ChatGPT, Gemini, Copilot, DeepSeek, Qwen, Perplexity, and Claude were selected for benchmarking. Three question types were developed to assess the chatbots' performance in providing information about hijacked journals, identifying hijacked websites, and verifying legitimate ones. The results show that current AI chatbots can provide general information about hijacked journals but cannot reliably identify either legitimate or hijacked journal websites. Although Copilot performed better than the others, it was not error-free. Current AI chatbots are therefore not yet reliable for detecting hijacked journals and may inadvertently promote them.
Open Access: Yes
DOI: 10.1515/dx-2025-0043