CPA Global contributes data to Google Patents Public Datasets.

Google recently announced the new Google Patents Public Datasets, to which CPA Global’s data and algorithm team contributed data to inform users’ R&D and corporate efforts.


CPA Global mines and correlates more than 200 data sources, including worldwide patent data, litigation, financials, trademarks, geographies, products, and people/inventor data, among many others. This creates a massive amount of data that requires data scientists and specialised algorithms to connect and simplify, so that business users can access information with a single click and visualize it in a form they can present to their business.

The biggest challenge is not the sourcing and extraction, but connecting the data into a single view. Companies have found it difficult to connect data because even something as simple as a patent number can be represented in many different ways across data sets.
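As a rough illustration of the problem, the Python sketch below collapses a few common spellings of the same publication number into one key. The function name, the regular expression, and the hyphenated COUNTRY-NUMBER-KIND target form are assumptions made for this example, not the normalization logic actually used by CPA Global or Google, and real-world formats vary far more than this.

```python
import re

# Assumed input shape: two-letter country code, a serial number that may
# contain separators, and an optional kind code such as B2 or A1.
_PUBLICATION = re.compile(r"^([A-Z]{2})[\s\-]*([\d,./\s]+?)[\s\-]*([A-Z]\d?)?$")


def normalize_publication_number(raw: str) -> str:
    """Return a hyphenated COUNTRY-NUMBER-KIND key for one publication number."""
    text = raw.strip().upper()
    match = _PUBLICATION.match(text)
    if match is None:
        raise ValueError(f"unrecognised publication number: {raw!r}")
    country, digits, kind = match.groups()
    serial = re.sub(r"\D", "", digits)  # drop commas, spaces, dots, slashes
    if not serial:
        raise ValueError(f"no serial number found in: {raw!r}")
    parts = [country, serial]
    if kind:
        parts.append(kind)
    return "-".join(parts)


# Placeholder number written three different ways.
variants = ["US 7,654,321 B2", "US7654321B2", "us-7654321-b2"]
keys = {normalize_publication_number(v) for v in variants}
print(keys)  # {'US-7654321-B2'}
```

Once every source is reduced to the same key, records that refer to the same patent can be matched with a simple equality test.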

This is particularly challenging for companies trying to connect their IP management system to public data sets, and it makes data verification and consistency difficult. We are glad to see that Google is also interested in solving this challenge of data interoperability between companies with Google Patents Public Datasets, and we believe our experience in connecting data can help.

We wanted to contribute standards-essential patent data to Google Patents Public Datasets. Given the recent court challenges and licensing issues around standards, providing the ETSI (European Telecommunications Standards Institute) data in a clean, accessible way would help researchers study the economic impact of standards patents.

To illustrate, in Google BigQuery you can use the following query to analyse ETSI standards patents against Patent Trial and Appeal Board (PTAB) challenges and see which companies are challenging the validity of those patents. This is particularly interesting because those patents have been disclosed to the standards body as related to the standard. The fact that PTAB challenges are registered against them also suggests that these patents are being asserted, or that they hold enough value for companies to spend the time and resources to challenge them.

Here is the query:

FROM `patents-public-data.patents.publications_201710` pat
JOIN `innography-174118.technical_standards.etsi` etsi
    ON etsi.PublicationNumber = pat.publication_number
JOIN `patents-public-data.uspto_ptab.match_201710` pm
    ON pm.application_number = pat.application_number
JOIN `patents-public-data.uspto_ptab.trials_201710` ptab
    ON ptab.ApplicationNumber = pm.ApplicationNumber

Note: the number formats have been transformed to be consistent across datasets by each individual data provider participating in the system (Google and CPA Global in this case), so you can easily connect disparate datasets like chemistry, litigation, standards, patents, examination and more uploaded by different companies. You can also mix in your own private data.
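To make the note concrete, here is a minimal Python sketch of why a shared number format matters: once two record sets use the same publication-number string, joining them is a plain key lookup. The field names (`publication_number`, `docket`, `standard`) and all sample values are hypothetical placeholders, not rows from the actual datasets.

```python
from typing import Dict, List

Row = Dict[str, str]


def join_on_publication_number(private_rows: List[Row], public_rows: List[Row]) -> List[Row]:
    """Inner-join two record lists on an already-normalized publication number."""
    # Index the public rows by key; if a key repeats, the last row wins.
    public_by_number = {row["publication_number"]: row for row in public_rows}
    joined = []
    for row in private_rows:
        match = public_by_number.get(row["publication_number"])
        if match is not None:
            # Private fields are merged last, so they win on any name clash.
            joined.append({**match, **row})
    return joined


# Hypothetical toy data standing in for a private portfolio and a public dataset.
private_portfolio = [
    {"publication_number": "US-0000001-B2", "docket": "DOCKET-A"},
    {"publication_number": "US-0000002-B2", "docket": "DOCKET-B"},
]
public_standards = [
    {"publication_number": "US-0000001-B2", "standard": "EXAMPLE-STD"},
]

matches = join_on_publication_number(private_portfolio, public_standards)
print(matches)
```

Only the first portfolio record appears in `matches`, because it is the only one whose key also exists in the public rows. If the two sides spelled the number differently, the lookup would silently find nothing.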

The above query generates the following results

Sources: “CPA Global ETSI Data” by CPA Global (through ETSI IPR), CC BY 4.0, “USPTO PTAB API” by the USPTO, for public use.

These results come from merging ETSI, PTAB, and bibliographic patent data (more fields are available than those displayed here). Being able to cross-validate information between the data sources provides keen insights into competitive and research activities.

This example draws on just one of the 200+ data sets that we compare and draw insight from in our IP intelligence software suite, Innography and IP One Data. We enjoy working with companies to further the patent research agenda, and we encourage communities to become more aware of and versed in Big Data issues. We will continue striving to provide connected, clean data for business-critical decision making.