
AI in the patent industry: Don’t believe the hype. Believe the data.

  • Sector: Patent law
  • 9th June 2026
Many in the IP profession remain considerably sceptical of AI. On this view, AI may be useful for checking for typos and for simple deadline calculations, but it cannot replace in-depth human reasoning about complex scientific and legal issues. However, the data suggests something different.
 

Originally posted on IPKat.

How do we know LLMs are any good? 

Generative AI is an exceedingly competitive field, possibly one of the most competitive commercial arenas today. The foundational labs are in a race to produce the best models, so what it means to be “the best model” matters to a lot of people. The competitive nature of the field, together with the widespread adoption of AI, means that there is considerable interest in evaluating and comparing models on every possible metric. Consequently, there is a wealth of comparative data at our disposal.

The website Artificial Analysis is a good source of independent LLM evaluation data. As a casual perusal of Artificial Analysis shows, there are many possible ways to evaluate an LLM, including hallucination rate, speed, coding ability, ability to follow instructions, and mathematical reasoning. The MATH-500 Benchmark, for example, is a collection of 500 maths problems including algebra, geometry, intermediate algebra, number theory, precalculus, and probability, requiring step-by-step solutions and precise mathematical reasoning (GPT-5 is the current leader, scoring 99.4%). Given the focus on coding in the LLM world, it is unsurprising that many of the benchmarks evaluate this type of ability. However, what we need the models to do in the IP profession is very different from coding and solving maths problems.

Evaluating AI for patent work 

The LLM evaluation that is most relevant to the patent profession, in our opinion, is the assessment of the models’ ability to perform long-context reasoning (LCR). If an LLM is going to be of any use to the patent profession, it needs to understand and reason about long, complex documents. As luck would have it, there is an evaluation designed to test exactly this ability.

The current LCR evaluation for LLMs from Artificial Analysis consists of 100 questions relating to diverse document types, including academic papers, company financials, government consultations, legal documents, industry reports, and marketing materials. The documents average 100,000 tokens each (i.e. about 75,000 words, or 300 pages). The 23 legal documents in the test contributed the most tokens, with an average of 116,000 tokens. According to the Artificial Analysis summary, the LCR eval requires “genuine reasoning” rather than simple data extraction, comprising multi-step analysis to synthesize information from dispersed sections, the ability to understand complex domain-specific content, and clear and unambiguous answers free from errors and hallucinations. 
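
The token-to-page conversion quoted above can be reproduced with a few lines of Python. This is a rough sketch only: the ratios of 0.75 words per token and 250 words per page are assumptions inferred from the figures in the paragraph (75,000 words per 100,000 tokens, spread over 300 pages), not values published by Artificial Analysis, and real tokenisers vary by model and by text.

```python
# Rough size estimate for a long document, using ratios inferred from the
# figures quoted in the article (assumptions, not published constants).
WORDS_PER_TOKEN = 0.75   # 75,000 words / 100,000 tokens
WORDS_PER_PAGE = 250     # 75,000 words / 300 pages


def estimate_size(tokens: int) -> tuple[int, int]:
    """Return the approximate (words, pages) for a given token count."""
    words = round(tokens * WORDS_PER_TOKEN)
    pages = round(words / WORDS_PER_PAGE)
    return words, pages


if __name__ == "__main__":
    for label, tokens in [("average document", 100_000),
                          ("average legal document", 116_000)]:
        words, pages = estimate_size(tokens)
        print(f"{label}: {tokens:,} tokens ~ {words:,} words ~ {pages} pages")
```

On these assumptions, the average legal document in the test (116,000 tokens) comes to roughly 87,000 words, or about 350 pages.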

According to Artificial Analysis, humans have traditionally dramatically outscored LLMs on LCR. Indeed, up until around 2024, even the best frontier models, such as ChatGPT, Claude and Gemini, achieved less than 50% accuracy in the LCR evaluation. Given this, it is entirely unsurprising that, if you tried to use an LLM for patent work in early 2024, it probably wasn’t very good. In 2024, it was a huge struggle to get LLMs to achieve good outcomes for complex tasks such as patent drafting, prosecution or prior art analysis without a highly sophisticated workflow, a great many separate prompting steps and complex coding loops, all of which needed a lot of underlying programming and software engineering. In the world of 2024, AI wrapper software made a lot of sense. It was also at this time that many of the AI wrapper companies for IP were founded. After all, in early 2024, we needed them.

The rate of change

According to the data, therefore, in 2024 LLMs were fairly bad at understanding and reasoning about long, complicated documents. However, in AI, things change and they change fast, and the LCR evaluation data for the latest models tells us exactly how much. According to the independent assessment of Artificial Analysis, the best models currently available (ChatGPT 5, Claude Opus 4.6, Gemini Pro 3.1) score around 75% on the LCR test, a big jump up from 50%.

But 75% is still pretty far off 100%, I hear readers cry. However, the key piece of comparative information is how well humans perform in this test. According to Artificial Analysis, human domain experts also struggle with it. Whilst human evaluation confirmed that each question could be answered correctly, expert humans typically answered only 40-60% of questions correctly on the first attempt. In other words, with scores of around 75%, the best models are now better at long-context reasoning than the average human domain expert (and in a fraction of the time).

The current ability of the frontier models on LCR tasks means that a lot of the programming and software solutions that the AI wrapper companies were originally built on are no longer needed. The frontier models now simply do this by default. Indeed, it is important to keep up to date with the capabilities of the underlying model, so as to avoid over-engineered solutions that actually prevent the models from performing well (Evolve Insights). The skill of prompt engineering has become as much about what you don’t need to say as what you do.

Another key message from the LLM eval data on LCR is that, for patent work, we need to be using the best models. Interestingly, on the LCR benchmark, Grok 4.20 languishes down at 58%, whilst DeepSeek v3.2 scores a respectable 65%. If you are using a free version of an LLM, or the “fast” non-reasoning/thinking model, performance will be far worse, and probably no better than 50%. However, the better models are also far more expensive per token. If you are speaking to an AI wrapper software company, “what model are you using?” is therefore one of the first questions you need to ask.
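
To put the scores quoted above side by side, the short Python sketch below compares each one with the 40-60% range reported for human experts. Because the test has only 100 questions, it also attaches a rough sampling margin to each score. The margin uses a simple binomial approximation that treats the questions as independent, which is our assumption for illustration and not part of the Artificial Analysis methodology. The 50% figure for free or “fast” models is the approximate ceiling suggested above rather than a measured result.

```python
import math

N_QUESTIONS = 100            # size of the LCR test, as quoted in the article
HUMAN_RANGE = (0.40, 0.60)   # typical expert first-attempt score range

# Scores as quoted in the article (approximate, rounded figures).
SCORES = {
    "Best frontier models": 0.75,
    "DeepSeek v3.2": 0.65,
    "Grok 4.20": 0.58,
    "Free / fast models (approx. ceiling)": 0.50,
}


def margin_95(score: float, n: int = N_QUESTIONS) -> float:
    """Approximate 95% margin of error for a score on n questions.

    Assumption: questions are independent (normal approximation to the
    binomial). This is an illustrative simplification.
    """
    return 1.96 * math.sqrt(score * (1 - score) / n)


def compare_to_humans(score: float) -> str:
    """Classify a score against the human range, allowing for the margin."""
    margin = margin_95(score)
    low, high = score - margin, score + margin
    if low > HUMAN_RANGE[1]:
        return "above the human range"
    if high < HUMAN_RANGE[0]:
        return "below the human range"
    return "overlapping the human range"


if __name__ == "__main__":
    for name, score in SCORES.items():
        m = margin_95(score)
        print(f"{name}: {score:.0%} +/- {m:.1%} -> {compare_to_humans(score)}")
```

On these assumptions, only the 75% score sits clearly above the human range once the sampling margin is taken into account. Scores of 65% and 58% overlap with it, which is consistent with the point that the choice of model matters.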

The demise of the AI wrapper?

A previous post discussed the increasing redundancy of AI wrapper software for IP (Evolve Insights). These days, there is not much difference between the output of such tools and the output of a foundational LLM such as ChatGPT, Gemini or Claude combined with prompt engineering by an experienced patent attorney. In many cases, the output of the wrapper is worse, as it will lack specific technical field and jurisdictional expertise. However, a strong argument for using wrappers was that they do generally provide a user-friendly interface with, for example, the ability to combine prompting and tracked changes.

That all changed last weekend, with Anthropic’s release of the Claude plug-in for Word. The Claude plug-in (which, interestingly, appears to be marketed at lawyers specifically) allows Claude users to prompt within Word, incorporating tracked-changes functionality. Word is, by all accounts, a horrible piece of clunky software to deal with, and it is notable that Microsoft themselves haven’t yet worked out how to combine Copilot prompting and tracked changes in a usable way. Whilst tracked changes and version control combined with prompting have been available for ages for code, for text editing some people were even predicting a shift away from Word to Markdown or LaTeX editors. Anthropic has, however, clearly recognised the importance of Word integration as a bottleneck for AI adoption, and thrown everything at providing its own plug-in. As with all AI use, the Claude plug-in uses Claude itself, and therefore users need to have the appropriate confidentiality provisions in place (Evolve Insights).

Analysis

It is clear from the benchmark data that, not only are LLMs now very good, but their abilities are also increasing very rapidly. In our view, the launch of the Claude Word plug-in removes one of the few arguments remaining for relying on AI wrapper software instead of upskilling ourselves as attorneys to be able to use AI effectively. Whilst the proud dinosaurs may be content to wait out the rise of AI until they can take early retirement, those of us who are enthusiastic about the future of the profession should view LLMs as an opportunity for learning something new. To do this, we need to be learning how to use AI to enhance our own expert capabilities, and not relying on someone else’s. 

Author: Rose Hughes

Rose is a biotech and pharmaceutical patent specialist with over a decade of experience in intellectual property. Rose is a patent attorney at Evolve, where she leverages our unique fractional in-house model to provide clients with deep patent law expertise combined with the strategic commercial oversight typically associated with senior in-house counsel.

With a PhD in Immunology from UCL, Rose applies her technical background to complex innovations in biologics, cell and gene therapies, and the rapidly emerging field of AI-assisted drug development. Previously, Rose held the role of Director, Patents at AstraZeneca, where she was responsible for global IP portfolios and IP strategy at every stage of the pharmaceutical pipeline, from platform development and on-market commercialization to SPCs and patent term extensions.

A recognized thought leader in the field, Rose has been a regular contributor to IPKat since 2018, offering practical insights into European patent law developments. She is also a frequent speaker on the epi podcast, a guest lecturer for the Brunel University IP law Postgraduate Certificate, and a contributing author to the books A User’s Guide to Intellectual Property in Life Sciences (2021) and Developments and Directions in Intellectual Property Law (2023).


evolve® is a trading entity of Evolve Intellectual Property Limited. Evolve Intellectual Property Limited is regulated by the Intellectual Property Regulation Board (IPReg). Details of the UK professional rules can be found on the IPReg website

registered address: 49 Greek Street, London, England, W1D 4EG
