Is it time for patent offices to enter the bioinformatic age?

Sector: Biotechnology

13th June 2025

In a world in which incalculable amounts of sophisticated sequence data are freely available, are the clunky processes necessary to input patent sequence data really fit for purpose?

Originally published on IPKat.

The dual purpose of patent sequence listings

All patent applications containing sequence data in the claims, figures or description must include a sequence listing. The sequence listing provides the sequences in a prescribed format, each with a unique sequence identification number (SEQ ID NO:), together with additional data such as the organism and the position of any unusual features in the sequence.

The sequence listing serves two purposes. First, the sequence listing is used by the patent office to search for the sequences disclosed in the patent application. The prescribed format of the sequence listing assists in the automation of this search. Importantly, all the sequences in the patent application must be included in the sequence listing, regardless of whether they are part of the invention or just relate to tool compounds used in the examples. If a sequence is disclosed in the specification, the sequence must be included in the sequence listing (with a few exceptions, such as very short sequences). This allows the patent office to search for all of the disclosed sequences in the application.

The second function of the sequence listing is to facilitate public access to the sequence information disclosed in patent applications. There is an understandable desire for the sequence data in patents, which may not be published elsewhere, to be searchable in public sequence databases such as GenBank. One of the purposes of the shift to the ST.26 sequence listing format was to facilitate better integration between patent sequence data and these public sequence databases.

Introduction of ST.26

From 1 July 2022, the old international standard for sequence listings, ST.25, was replaced by the new ST.26 standard. The shift to ST.26, and the introduction of the new WIPO software for preparing ST.26 sequence listings, had the aim of increasing public access to patent sequences. The process of preparing ST.26 sequence listings offers some improvements over ST.25, but also introduces some new disadvantages. One of the main issues with ST.26 is that the sequence listing is no longer submitted in a human-readable txt format, but instead as a complex XML file. In an effort to address this problem, WIPO has now introduced the ability to visualise ST.26 sequence listings in an internet browser directly from PATENTSCOPE, so that users no longer have to download the XML sequence listing and import it into WIPO Sequence.
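Although the XML format is hard to read by eye, it is straightforward to read by machine. The following is a minimal sketch in Python, not a validated ST.26 parser. The sample listing is heavily simplified and hypothetical: it omits the document type declaration, the applicant and application metadata, and the feature tables that a real listing contains. The element and attribute names used here (SequenceData, sequenceIDNumber, INSDSeq_length, INSDSeq_sequence) reflect our understanding of the standard and should be checked against the official WIPO DTD before being relied upon.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified listing. A real ST.26 file carries
# much more metadata; element names should be verified against the DTD.
SAMPLE = """<ST26SequenceListing>
  <SequenceData sequenceIDNumber="1">
    <INSDSeq>
      <INSDSeq_length>10</INSDSeq_length>
      <INSDSeq_moltype>DNA</INSDSeq_moltype>
      <INSDSeq_sequence>atggccattg</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
  <SequenceData sequenceIDNumber="2">
    <INSDSeq>
      <INSDSeq_length>9</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_sequence>MKTAYIAK</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
</ST26SequenceListing>"""


def read_sequences(xml_text):
    """Return {SEQ ID number: (declared length, residues)} from the XML."""
    root = ET.fromstring(xml_text)
    sequences = {}
    for data in root.iter("SequenceData"):
        seq_id = int(data.get("sequenceIDNumber"))
        declared = int(data.findtext("INSDSeq/INSDSeq_length", default="0"))
        residues = data.findtext("INSDSeq/INSDSeq_sequence", default="").strip()
        sequences[seq_id] = (declared, residues)
    return sequences


def length_mismatches(sequences):
    """List the SEQ ID numbers whose declared length differs from the actual length."""
    return [seq_id for seq_id, (declared, residues) in sorted(sequences.items())
            if declared != len(residues)]


listing = read_sequences(SAMPLE)
bad_ids = length_mismatches(listing)  # SEQ ID 2 declares 9 residues but has 8
```

A simple consistency check of this kind does not prove that a sequence is correct, but it can flag a truncated or mis-pasted entry before filing.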

Further information about ST.26 and WIPO Sequence can be found at the WIPO Sequence and ST.26 Knowledge Base. Users can also subscribe to the WIPO sequence listing newsletter.

The risks of errors in patent sequence data

Sequence information is often the most important part of a patent. Many inventions in the biotech field will be defined in the claims by their sequences, and usually by a SEQ ID for that sequence, as provided in the sequence listing. Therefore, if the sequence listing is incorrect even by a single letter, then the claims of the patent may define completely different subject matter to what the applicant intended to claim. In some cases, this could mean that the patent does not cover the commercial embodiment of an invention.

The decision in T 1213/05 demonstrates the potentially fatal consequences of errors in a patent’s sequence data. In this case, the sequences provided in the priority document contained inadvertent errors. The Board of Appeal found that a priority claim for the corrected sequences in the European patent was therefore invalid. The Board of Appeal cited with agreement the reasoning in T 70/05 that a priority claim to an incorrect sequence cannot be maintained, regardless of the reasons for the possible mistakes, whether arising from unintended sequencing errors or typing errors.

The Board of Appeal in T 1213/05 also rejected the patentee’s argument that the skilled person’s knowledge of a certain margin of error in sequence data permitted some deviation between the sequence in the priority document and the sequence in the patent application claiming priority. For the Board of Appeal, the DNA sequences had to be identical to relate to “the same invention” and permit a valid priority claim. Claims directed to the corrected sequences were consequently found invalid in view of intervening prior art disclosing the sequences.

The decision in T 1213/05 shows that anything less than 100% accuracy in the sequence data can be fatal for a patent. In this context, it is worth bearing in mind that sequence data consists of a list of many strings of letters, where each string (“sequence”) can be thousands of letters long. Making a mistake in just one letter out of the potentially millions of letters in a sequence listing is at once very easy to do and almost impossible to detect by eye. The burden on applicants in preparing sequence data for a patent application therefore does not just include the time and cost associated with preparing the sequence listing. In order to avoid potentially fatal errors in the sequence data, robust procedures for checking and validating the sequence listing are also necessary. However, the only way to effectively minimize errors in sequence listings that may contain hundreds or even thousands of sequences is to automate the process.
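To illustrate what such automation might look like, the following is a minimal sketch in Python of a letter-by-letter comparison between two versions of the same sequences, for example those in a priority document and those in the later application. It is an illustration only, not a tool used by any patent office. The function names, record identifiers and sequences are all hypothetical, and the sketch assumes both versions are available as FASTA text.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of {record id: sequence in upper case}."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record id is the first word after the ">" marker.
            current_id = line[1:].split()[0]
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line.upper())
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def diff_sequences(reference, candidate):
    """Return (id, position, reference value, candidate value) tuples.

    Positions are 1-based. Missing records and length differences are
    reported with a position of None.
    """
    problems = []
    for seq_id, ref_seq in reference.items():
        cand_seq = candidate.get(seq_id)
        if cand_seq is None:
            problems.append((seq_id, None, "present", "missing"))
            continue
        if len(ref_seq) != len(cand_seq):
            problems.append((seq_id, None, len(ref_seq), len(cand_seq)))
        for pos, (a, b) in enumerate(zip(ref_seq, cand_seq), start=1):
            if a != b:
                problems.append((seq_id, pos, a, b))
    for seq_id in candidate:
        if seq_id not in reference:
            problems.append((seq_id, None, "missing", "present"))
    return problems


# Hypothetical example: one residue differs in the second sequence.
priority = ">SEQ1\nATGGCCATTG\nTAATGGGCCG\n>SEQ2\nMKTAYIAK\n"
filed = ">SEQ1\nATGGCCATTG\nTAATGGGCCG\n>SEQ2\nMKTAYLAK\n"
mismatches = diff_sequences(parse_fasta(priority), parse_fasta(filed))
```

Here the only entry in `mismatches` is the single substitution at position 6 of SEQ2 (I in the priority version, L in the filed version), which is exactly the kind of one-letter discrepancy that is hard to spot manually.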

Automated processes for dealing with sequences – Lessons to be learnt from bioinformatics?

With the arrival of high throughput sequencing, it became necessary for the academic community to devise automated processes for dealing with vast quantities of sequence data and for uploading these sequences to public sequence databases. Compared with the automation tools used by bioinformaticians, the process of preparing and validating sequence listings for patent applications is exceedingly clunky.

In order to prepare an ST.26 sequence listing it is necessary to input each sequence and its features into the purpose-built WIPO Sequence tool. Unlike with ST.25, it is possible to import multiple sequences for an ST.26 sequence listing at once, e.g. in FASTA format, instead of copying and pasting each individual sequence. However, the “features” of each sequence in a sequence listing, such as unusual amino acids, still have to be entered manually into WIPO Sequence. The manual process of adding features can take an extraordinary amount of time. The growth in next-generation oligonucleotide technologies also means that there is an increasing amount of “unusual” sequence information that must be entered as features of the sequence.

In contrast to WIPO Sequence, public sequence databases have purpose-built submission tools that facilitate the upload of vast quantities of annotated sequence information with little manual input. The submission tools for GenBank (e.g. BankIt), for example, allow automated input of sequence information in a format that includes feature information in the form of a 5-column feature table.
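As a rough illustration of why this format lends itself to automation, the sketch below generates a tab-delimited feature table from structured data in Python. It follows our understanding of the 5-column layout (a ">Feature" header line, then start, stop and feature key in the first three columns, with qualifier key and value in the fourth and fifth columns on the lines that follow). The sequence identifier, coordinates and annotations are hypothetical, and the exact requirements should be checked against the GenBank submission documentation.

```python
def feature_table(seq_id, features):
    """Build 5-column feature table text for one sequence.

    `features` is a list of dicts with "start", "stop" and "key" entries
    and an optional "qualifiers" list of (qualifier key, value) pairs.
    """
    lines = [f">Feature {seq_id}"]
    for feat in features:
        # Columns 1-3: start, stop and feature key.
        lines.append(f"{feat['start']}\t{feat['stop']}\t{feat['key']}")
        for qual_key, qual_value in feat.get("qualifiers", []):
            # Columns 4-5: qualifier key and value, after three empty columns.
            lines.append(f"\t\t\t{qual_key}\t{qual_value}")
    return "\n".join(lines) + "\n"


# Hypothetical annotations for a hypothetical sequence "Seq1".
features = [
    {"start": 1, "stop": 300, "key": "CDS",
     "qualifiers": [("product", "hypothetical protein")]},
    {"start": 12, "stop": 12, "key": "modified_base",
     "qualifiers": [("mod_base", "m5c")]},
]
table = feature_table("Seq1", features)
```

Because the table is plain text generated from data, hundreds of annotated sequences could be produced from a laboratory database in one pass, rather than each feature being typed in by hand.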

There is thus a huge disconnect between the automation tools used for high-throughput processing of sequence data in academia and the clunky tools available for preparing patent sequence listings, despite the similar aims of both processes. Aligning patent sequence data submission with the automated processes for submitting sequences to publicly available sequence databases would facilitate access to patent sequence data whilst simultaneously improving the process of sequence submission for applicants.

Final thoughts

The dual function of the sequence listing, first as a search tool for the patent office and second as a tool for increasing accessibility to patent sequence information, has resulted in a prescriptive sequence listing format that does not satisfactorily fulfil either purpose. Applicants are forced to submit lengthy sequence listings, in which only a small fraction of the sequences actually relate to the invention, using a manual process for inputting feature data that creates a substantial risk of sequence errors. Given that the tools for automating sequence submission to public sequence databases already exist, a radical rethinking of how patent sequence data is submitted is called for. In the bioinformatic age, the present situation in which patent applicants are forced to manually input sequence data would be almost comical, if it didn’t have such potentially dire consequences for the accuracy of patent sequence data.

Author: Rose Hughes
