Paul L. Caron
Dean





Wednesday, June 12, 2024

Updated Stanford Report Finds High Hallucination Rates On Westlaw AI

Legal Tech News, Updated Stanford Report Finds High Hallucination Rates on Westlaw AI:

After pushback following Stanford’s research study “Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools,” which many saw as an apples-to-oranges comparison between Thomson Reuters’ (TR) Ask Practical Law and LexisNexis’ Lexis+ AI, the researchers released an updated study on May 30.

This time, they included results from AI-Assisted Research on Westlaw Precision, TR’s flagship, generative AI-powered legal research solution that is more comparable to Lexis+ AI than Ask Practical Law.

Unexpectedly for many, the new results were less flattering to TR than the first time around.

Stanford’s researchers found that Westlaw’s AI-Assisted Research tool hallucinated nearly twice as often as Lexis+ AI—with Lexis+ AI hallucinating 17% of the time, and Westlaw hallucinating 33% of the time, according to the paper. Additionally, findings showed that Lexis+ AI provided accurate answers 65% of the time, whereas Westlaw’s AI-Assisted Research provided accurate answers 42% of the time.

[Chart: Lexis, Westlaw, Practical Law, GPT-4 (2024)]

After resistance to the initial study from TR, which said that Ask Practical Law was not the right tool to test when it comes to legal research, the company put out a blog post following the updated paper. The post, written by Mike Dahn, head of Westlaw Product Management, thanked Stanford for the study, and stated that the company is keen to work alongside Stanford’s researchers to further dig into how to create AI benchmarks.

As in its comments after the first Stanford report was released, the company still maintains that its internal testing shows lower rates of hallucination in Westlaw’s AI. Dahn writes, “A key lesson learned here is that user experiences in these products could be more explicit about specific limitations of the system,” referring to Stanford’s inclusion of questions intended to trick the tools, which the researchers acknowledge as such in the study.

https://taxprof.typepad.com/taxprof_blog/2024/06/updated-stanford-report-finds-high-hallucination-rates-on-westlaw-ai.html
