At ACL 2025, a premier conference for natural language processing (NLP), a paper on AI safety vulnerabilities showcased a rigorous methodology and novel findings.
The paper ranked among the top 8.2% of submissions. It introduced Tempest, a framework that systematically compromises safety boundaries in large language models (LLMs) through natural conversations, achieving a 100% success rate on GPT-3.5-turbo and 97% on GPT-4.
But what made it remarkable was that the research was conducted by an AI system called Zochi, developed by the company Intology. A preliminary version of the work, then known as Siege, was accepted at workshops of the International Conference on Learning Representations (ICLR).
Intology describes Zochi as an AI research agent capable of autonomously completing the entire scientific process, from literature analysis to peer-reviewed publication. The system operates through a multi-stage pipeline ‘designed to emulate the scientific method’.
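Intology has not published the pipeline’s internals, so the following is only a minimal sketch of what such a stage-by-stage loop could look like; every stage name and function below is a hypothetical placeholder, not Zochi’s actual design:

```python
# Hypothetical sketch of a multi-stage research-agent pipeline.
# Stage names and bodies are illustrative assumptions, not Zochi's design.

from dataclasses import dataclass, field


@dataclass
class ResearchState:
    """Shared state that each stage reads and enriches."""
    topic: str
    literature: list[str] = field(default_factory=list)
    hypothesis: str = ""
    results: dict = field(default_factory=dict)
    draft: str = ""


def survey_literature(state: ResearchState) -> ResearchState:
    # Stand-in for retrieving and summarising related work.
    state.literature = [f"related paper on {state.topic} #{i}" for i in range(3)]
    return state


def propose_hypothesis(state: ResearchState) -> ResearchState:
    # Stand-in for an LLM proposing a research direction.
    state.hypothesis = f"A new method improves {state.topic}"
    return state


def run_experiments(state: ResearchState) -> ResearchState:
    # Stand-in for executing experiments; metrics here are placeholders.
    state.results = {"baseline": 0.71, "proposed": 0.78}
    return state


def write_paper(state: ResearchState) -> ResearchState:
    # Stand-in for drafting the manuscript from the accumulated state.
    state.draft = f"{state.hypothesis}: {state.results}"
    return state


PIPELINE = [survey_literature, propose_hypothesis, run_experiments, write_paper]


def run_pipeline(topic: str) -> ResearchState:
    state = ResearchState(topic=topic)
    for stage in PIPELINE:  # each stage consumes and extends the shared state
        state = stage(state)
    return state


if __name__ == "__main__":
    print(run_pipeline("sample-efficient fine-tuning").draft)
```

Threading a single state object through the stages is what lets weak output from one step, say, unconvincing experimental results, be inspected before the next step builds on it.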
So, are we heading towards a Cursor moment for scientific research publishing?
For context, AIM spoke to Raj Palleti, a researcher at Stanford University. Palleti said AI models today serve more as assistants than co-scientists, much like coding tools such as Cursor or GitHub Copilot, rather than end-to-end systems like Devin.
“That being said, people are hard at work pushing the frontiers of AI models in science,” he said.
“One promising avenue is in areas of research that don’t require a physical lab, like AI research,” Palleti noted, pointing towards Intology’s work with Zochi.
Zochi, he suggested, represents a potential “Devin moment” for scientific research. The system processes thousands of papers, identifies promising directions, and uncovers non-obvious connections across disparate work.
This is not an isolated case. Earlier this year, Anthropic’s Claude Opus was credited with contributing substantially to a research paper challenging one of Apple’s studies on reasoning models. The model was said to have done the ‘bulk of the writing’.
In another instance, Japanese AI lab Sakana AI announced the ‘AI Scientist V2’, an end-to-end agentic system capable of generating peer-reviewed papers.
“We evaluated The AI Scientist V2 by submitting three fully autonomous manuscripts to a peer-reviewed ICLR workshop. Notably, one manuscript achieved high enough scores to exceed the average human acceptance threshold, marking the first instance of a fully AI-generated paper successfully navigating a peer review,” said Sakana AI in a report.
The system begins with ideas, conducts experiments in phases, explores variations through an agentic search tree, and automatically generates papers. It even uses a vision-language model to review figures and captions.
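Sakana AI’s report does not spell out the search in code, so the following is only a rough sketch of a best-first search over experiment variants; `mutate` and `score` are hypothetical stand-ins for the LLM-driven proposal and evaluation steps:

```python
# Hypothetical best-first search over experiment variants, loosely mirroring
# the "agentic search tree" idea; mutate() and score() stand in for LLM calls
# and for actually running experiments.

import heapq
import itertools
import random


def mutate(config: dict) -> dict:
    """Stand-in for an agent proposing a variation of an experiment."""
    child = dict(config)
    child["lr"] = config["lr"] * random.choice([0.5, 2.0])
    return child


def score(config: dict) -> float:
    """Stand-in for running the experiment and measuring its quality."""
    return -abs(config["lr"] - 3e-4)  # pretend 3e-4 is the sweet spot


def search(root: dict, budget: int = 20, branching: int = 2) -> dict:
    counter = itertools.count()  # tie-breaker so the heap never compares dicts
    frontier = [(-score(root), next(counter), root)]
    best = root
    for _ in range(budget):
        if not frontier:
            break
        neg_score, _, node = heapq.heappop(frontier)  # expand most promising node
        if -neg_score > score(best):
            best = node
        for _ in range(branching):  # branch into candidate variations
            child = mutate(node)
            heapq.heappush(frontier, (-score(child), next(counter), child))
    return best


if __name__ == "__main__":
    print(search({"lr": 1e-2}))
```

The priority queue always expands the most promising experiment first, so the compute budget concentrates on the branches of the tree that look strongest so far.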
Still, limits remain. Intology stressed that while Zochi demonstrated fully autonomous research capabilities, the company manually reviewed the results, verified the code, corrected minor errors, and wrote the rebuttal without relying on the system. “The Zochi system will be offered primarily as a research copilot designed to augment human researchers through collaboration,” Intology said.
Palleti echoed the sentiment, noting that AI is most useful for easing repetitive tasks. He said writing papers is time-consuming, with researchers often overloaded with citations and lengthy introductions. “AI can certainly help here,” he said. “With idea generation, I can see AI being useful as a soundboard for researchers to bounce ideas off. As for AI carrying the bulk of the intellectual work, I am more sceptical.”
Ethics and Limitations
Even Sakana AI admits its AI Scientist V2 is best suited for workshops reporting preliminary work, as it does not yet meet the rigour of top-tier conferences. The company’s internal review revealed hallucinations and a lack of methodological depth in some cases.
A recent Stanford study also found that while LLM-generated research ideas may appear novel at first, their quality drops significantly upon execution compared with human-generated ideas.
Ethics also remain contentious. When Claude Opus contributed to a paper, it was initially credited as an author, but the credit was later revoked because arXiv policies prohibit listing AI tools as authors. Sakana AI, by contrast, disclosed AI involvement to organisers and even withdrew an accepted paper to avoid setting a premature precedent.
“We emphasise that the community has not yet reached a consensus on integrating AI-generated research into formal scientific publications,” Sakana AI said. “Careful and transparent experimentation is essential at this preliminary stage.”