Patent searching has evolved tremendously in recent years and it’s time to consider what the next decade of patent search looks like.
Why do we search patents? Historically, the primary use-case was to find prior art for a potential patent application or for an existing patent. The patent world was effectively self-contained, with inventors and specialist searchers manually scanning for patents that related to their idea.
Today, patents are seen as an enormous repository of human innovation and knowledge, as well as a class of property that can be owned, traded and leveraged. What can be done with all this information about innovation within the patents, and all the information about patents as assets?
Armed with complete and accurate information about the universe of patents, business leaders can and should leverage patent analysis to inform a myriad of business decisions – such as acquisitions, competitive monitoring, R&D planning, and strategic decision-making. These use-cases require an entirely new way of searching for, accessing and analysing patents.
As a result, prior search approaches and tools are becoming more inadequate and obsolete by the day – modern use-cases demand a new generation of patent search and analysis tools.
1. The First Age: Manual Search
For many decades, patent searching meant one thing – going to the public search room in the U.S. Patent and Trademark Office in Virginia (or a few dozen satellite search rooms at Patent and Trademark Depository Libraries) to manually search through books of published patents.
The patents themselves weren’t actually searched – the searching was done on the U.S. Patent Classification codes. The searcher would find appropriate classifications by searching for relevant classification titles, then retrieve the patents that belonged to those classifications in date order, and read each patent individually. In fact, this could also be called “index search” – like searching via the index at the back of a book to find specific pages.
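The index-style lookup described above can be sketched in a few lines. This is only an illustration: the classification codes, titles, patent numbers and dates are invented for the example and are not real U.S. Patent Classification entries.

```python
from datetime import date

# Hypothetical classification index: code -> title (invented for illustration).
CLASS_TITLES = {
    "A01": "Fastening devices",
    "B02": "Optical lenses",
    "C03": "Lens fastening mounts",
}

# Hypothetical patents filed under each code: (patent number, issue date).
PATENTS_BY_CLASS = {
    "A01": [("P-1003", date(1975, 6, 1)), ("P-1001", date(1971, 3, 9))],
    "B02": [("P-2001", date(1980, 1, 15))],
    "C03": [("P-3002", date(1978, 11, 2)), ("P-3001", date(1969, 7, 4))],
}


def find_classes(term):
    """Step 1: search the classification titles, not the patents themselves."""
    term = term.lower()
    return sorted(code for code, title in CLASS_TITLES.items() if term in title.lower())


def index_search(term):
    """Steps 2 and 3: pull every patent in the matching classes, in date order."""
    hits = []
    for code in find_classes(term):
        hits.extend(PATENTS_BY_CLASS.get(code, []))
    hits.sort(key=lambda patent: patent[1])
    return [number for number, _ in hits]


# The searcher still has to read every patent on this list by hand.
reading_list = index_search("lens")
```

Note that a relevant patent filed under a class whose title does not mention the search term never reaches the reading list, which is the weakness of index search.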
2. Manual index searching by patent classification codes
Patents moved from books to microfilm in the U.S. in the ’70s, then to CD-ROM in the ’80s, but searches mostly remained the same: classification code search, patent document retrieval, and patent reading. The CD-ROM library for Japanese patents in 1994 ran to 140 CD-ROMs, so a search within a single CD covered less than 1% of the possible matches (one disc in 140 is roughly 0.7%, assuming the discs held similar numbers of patents). A librarian’s guide to patent search published as late as 2008 still described this approach.
Not only is this search approach highly manual, but it is very likely to miss relevant patents, because applicable patents aren’t always in the expected classifications. A few tools emerged during the first generation to speed the process a bit, including patent summaries and third-party classification systems to try to make the patent terminologies more consistent, but these didn’t change the process or its limitations.
3. The Second Age: Keyword Search
Computerisation of the patent documents in a single database, accessible from the web starting in the late '90s, allowed basic keyword search of the actual patent documents, from anywhere in the world. Nirvana!
Searching the actual patent documents enabled far faster searches and higher relevance by refining multiple keyword searches and using Boolean operators. Patent summaries were still helpful to speed up the review of potentially thousands of patents found from keyword searching.
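As a rough sketch of what Boolean keyword search does behind the scenes, the example below builds a small inverted index and combines keyword hits with set operations. The three documents and their identifiers are made up for illustration, and no real search tool's syntax is implied.

```python
import re
from collections import defaultdict

# Hypothetical patent abstracts, invented for illustration.
DOCS = {
    "P1": "A wireless network router with adaptive antenna",
    "P2": "Antenna assembly for a satellite receiver",
    "P3": "Method for routing packets in a wireless mesh network",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: each word maps to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


INDEX = build_index(DOCS)


def lookup(word):
    """Documents containing the exact keyword (empty set if none)."""
    return set(INDEX.get(word.lower(), set()))


# Boolean operators map directly onto set operations.
and_hits = lookup("wireless") & lookup("network")   # wireless AND network
or_hits = lookup("antenna") | lookup("receiver")    # antenna OR receiver
not_hits = lookup("wireless") - lookup("antenna")   # wireless AND NOT antenna
```

Refining a search is then a matter of adding or removing keywords and operators until the result set is small enough to review.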
Patent search still required training and specialists because most search tools required selecting which database and which text field to search in, knowing special database codes, and even using specialised syntax. Google Patents finally broke through this with a simple keyword search anywhere in the patent document in late 2006.
The USPTO patent search website in early 1999. As a second-age patent search tool, it still looks much the same.
While searching is faster with keyword search, it will still miss relevant patents that use different terminology. Patents are notoriously varied in how they describe technologies: terminology shifts as a technology changes over time, the same technology appears under different names in different industries, and some patent filers deliberately use unusual vernacular to make a patent harder to find.
Even “natural language” and many semantic search tools are really keyword searches in disguise – they strip out some words and apply a few synonyms, and then do a keyword search behind the scenes.
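To make the "keyword search in disguise" point concrete, here is a minimal sketch of such a pipeline: drop stop words, expand each remaining word with a few synonyms, then run an ordinary keyword match. The stop-word list, the synonym table and the sample texts are assumptions made up for this example, not taken from any real product.

```python
import re

# Hypothetical stop words and synonyms, invented for illustration.
STOPWORDS = {"a", "an", "the", "for", "of", "to", "that", "in", "how", "is", "are"}
SYNONYMS = {"car": {"automobile", "vehicle"}, "brake": {"braking"}}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_keyword_query(question):
    """Turn a 'natural language' question into groups of alternative keywords."""
    groups = []
    for word in tokenize(question):
        if word in STOPWORDS:
            continue  # strip out some words
        groups.append({word} | SYNONYMS.get(word, set()))  # apply a few synonyms
    return groups


def matches(groups, text):
    """Plain keyword match: every group needs at least one word in the text."""
    words = set(tokenize(text))
    return all(group & words for group in groups)


query = to_keyword_query("A brake for the car")
found = matches(query, "Braking system of an automobile")
missed = matches(query, "Deceleration apparatus for a motorcar")
```

The second text describes the same idea but uses terms outside the synonym table, so it is missed, exactly as with a plain keyword search.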
In the end, the second generation’s computer-assisted approach to patent search is basically the same as manual search – first search to define the set of patents to review, then retrieve the patent documents, and finally read each one to find the ones that are most relevant.
Over the last few years, companies have come to realise that second-generation patent search is becoming ever more expensive and time-consuming, and in many cases cannot produce quick or helpful answers to business questions, due to several fundamental limitations:
Disparate data sets: Patent ownership changes and patent litigation cases are stored in different databases, and second-generation tools require the analyst to look these up separately. As a result, getting a complete picture of a patent (Who is the current patent owner? Has the owner been acquired by someone else? Has the patent been abandoned? Has it been litigated?) requires going to multiple databases and assembling the information by hand.
Data errors: Second-generation tools require manual cleanup of data to find a company’s portfolio before actual analysis can even start. Company names are misspelled in enormous numbers of ways (a list of just the top 100 misspellings of IBM illustrates the scale). This adds hours and significant cost to any analysis, and rules out ever getting a quick answer.
No prioritisation or “80:20” guidance: Second-generation tools are great at generating lists of hundreds or thousands of patents, but cannot tell you which ones are the 10% to focus on for your analysis. So you have to scan all of the patents without any guidance about which patents are strongest or most relevant. Again, this multiplies the cost and time to get an answer.
Doesn’t help with the explosion of patent data: When the USPTO enabled search on the web in 1998, the quantity of patents that were searchable – all of the active US grants at the time – was 2 million patents. Today, with the ever-increasing patent grants over the last 20 years, the publishing of patent applications starting in 2001, the addition of foreign jurisdictions that need to be analysed, and the rise of China to the number-one patent filing jurisdiction worldwide, the worldwide patent database now totals over 100 million documents. The second-generation search technique of manually reviewing and fixing the owner of every patent in the results cannot begin to keep up with the explosion of patent documents.
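The owner-name cleanup described under "Data errors" is the kind of work that can be automated. The sketch below is one simple way to do it, assuming a known list of canonical company names and using fuzzy string matching from Python's standard library. The records, the suffix list, the canonical list and the 0.85 cutoff are all illustrative assumptions, and no claim is made that any commercial tool works this way.

```python
import difflib
import re
from collections import defaultdict

# Assumed list of known canonical owner names (illustrative only).
CANONICAL = ["international business machines"]

# Legal suffixes ignored when comparing names (illustrative, not exhaustive).
SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc"}


def clean(name):
    """Lower-case, strip punctuation and drop legal suffixes."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(word for word in words if word not in SUFFIXES)


def canonical_owner(raw_name, cutoff=0.85):
    """Map a raw assignee string to a canonical name when it is close enough."""
    cleaned = clean(raw_name)
    match = difflib.get_close_matches(cleaned, CANONICAL, n=1, cutoff=cutoff)
    return match[0] if match else cleaned


def group_portfolio(records):
    """Group patent numbers under their corrected owner name."""
    portfolio = defaultdict(list)
    for patent_number, assignee in records:
        portfolio[canonical_owner(assignee)].append(patent_number)
    return dict(portfolio)


# Hypothetical records with the kinds of name variants found in real data.
RECORDS = [
    ("P1", "International Business Machines Corp."),
    ("P2", "Internatonal Busines Machines Corporation"),
    ("P3", "INTERNATIONAL BUSINESS MACHINES CORP"),
    ("P4", "Acme Widgets Ltd"),
]

portfolio = group_portfolio(RECORDS)
```

A fixed similarity cutoff is a blunt instrument: set too low it merges unrelated companies, set too high it misses genuine variants, so real data would still need review of borderline matches.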
4. The Third Age: Patent Analytics
Third-age patent analytics solutions have emerged to address the problems above. The explosion of patent data, multiple unconnected data sets, myriad of data errors and lack of prioritisation are all problems that modern big-data software can fix. By providing multiple ways to find relevant patents, improving information where possible, automatically linking patents to other helpful information, prioritising results, and analysing groups of patents with a single click, third-age patent analytics solutions greatly speed up existing analyses and allow entirely new types of analyses to be performed.
This allows the IP specialist to address the underlying need more quickly and readily, which is really about answering business questions such as: Is this a new patentable concept? Who should we partner with to access this technology? Which patents should we try to license, and who should we approach to license the most likely candidates? Should we renew this patent?
As a result, patent analytics platforms dramatically increase the impact of patent information in business decision-making. Where second-age patent search tools simply show a list of patents to manually wade through before analysis can be performed, third-age tools allow the IP specialist to quickly and accurately perform analyses that directly address business questions. Thus third-age tools broaden both the range of questions that patent analyses can provide insights into, and the overall impact that an IP specialist can have.
An example third-age visual, in this case a PatentScape™ of the patents most relevant to software-defined networking, organised by sub-topic and current patent owner. The third-age tool automatically determines relevance from a description of the technology, discovers the most prevalent sub-topics, derives the current patent owner, sorts the most relevant patents to the center of each large cell, and presents an interactive visualisation that can be customised and drilled into for further analysis. Amazon.com clearly has the strongest coverage in virtual computing and computing nodes.
A Third-age Platform in Action
While I was in China last year, a last-minute prospect meeting was added. With only a few minutes in a taxi before the meeting, I pulled out my iPhone, opened the Innography website in Safari, entered a few characters of the prospect’s name and selected the company from a drop-down. I then ran a text cluster on their active patent portfolio, which showed all the areas of focus of the prospect’s patenting efforts globally and left me much better informed for the meeting. Behind the scenes, Innography automatically incorporated dozens of subsidiaries, found translated patents from dozens of jurisdictions, corrected hundreds of company-name permutations, and discovered the common themes in the full text of 10,000 patents, all in less than a minute.