Blog Post

Some tools for lifting the patent data treasure

Bruegel contributes to the stream of research on PATSTAT by providing two algorithms that try to minimize the amount of manual work that has to be performed. We also provide data obtained by the application of these methods.

Date: December 9, 2014 Topic: Innovation & Competition Policy

Patent applications are a product of research and development, but they are also a subject of research. Studying patents and the patenting behaviour of individuals and organizations is important for understanding how innovation works. Indeed, the relevance of patent data has long been recognized:

Patent statistics loom up as a mirage of wonderful plentitude and objectivity. They are available; they are by definition related to inventiveness, and they are based on what appears to be an objective and only slowly changing standard.

Zvi Griliches, 1987 (Patent Statistics as Economic Indicators: A Survey)

However, data helps only if it includes the kind of information that is necessary to answer some research question. Indeed, researchers have been involved in the construction and enrichment of datasets related to patents for at least 30 years. The problems we faced in our work with patent data have roots that go back to the eighties.

First of all, not all patentees that appear in the records under the same name are the same patentee, and conversely, two different names may actually correspond to a single patentee. The parent-subsidiary structure is an additional complication for a researcher who is interested in assigning a patent to a single entity.

Because the patent office does not employ a consistent company code in its computer record, except for the “top patenting companies” where the list of subsidiaries is checked manually, the company patenting numbers produced by a simple aggregation of its computer records can be seriously incomplete.

Zvi Griliches, 1987 (Patent Statistics as Economic Indicators: A Survey)
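To make the name problem concrete, here is a minimal sketch of a name clean-up step in Python. It is not taken from any of the papers discussed in this post: the function `normalize_name`, the list `LEGAL_FORMS` and the company names are hypothetical and only serve as an illustration.

```python
import re
import unicodedata

# Hypothetical, deliberately short list of legal-form tokens (an assumption;
# a real clean-up would need a much longer, country-specific list).
LEGAL_FORMS = {"inc", "ltd", "gmbh", "ag", "sa", "co", "corp", "limited"}


def normalize_name(name: str) -> str:
    """Return a crude comparison key for an applicant name."""
    # Strip accents, e.g. "é" becomes "e".
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Lowercase and drop periods so that "G.M.B.H." becomes "gmbh".
    lowered = ascii_name.lower().replace(".", "")
    # Replace any remaining punctuation with spaces and split into tokens.
    tokens = re.sub(r"[^a-z0-9]+", " ", lowered).split()
    # Drop legal-form tokens.
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


# Made-up spellings of one fictional company.
names = [
    "Acme Widgets GmbH",
    "ACME WIDGETS G.M.B.H.",
    "Acmé Widgets",
    "Acme Widgets Gesellschaft mit beschraenkter Haftung",
]
for n in names:
    print(repr(n), "->", repr(normalize_name(n)))
```

The first three spellings collapse to the same key, but the fourth does not, and two genuinely different firms that share a name would still be merged. Name cleaning alone therefore cannot settle which records belong to the same patentee.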

The second problem is that patent data needs to be complemented by company data to be of most use, and efforts to obtain this kind of dataset have always required extensive manual input (see Bound et al., 1984 and Griliches, Pakes, Hall 1988).

The manual work that was cumbersome thirty years ago is even more difficult today, given that the available data is hundreds of times larger than what was in the hands of researchers in the past. Take the PATSTAT database: it is a useful source of standardized information for millions of patents from all over the world, but its usefulness is limited by the type of information that is included. For example, PATSTAT does not assign patent applicants to different categories, making it difficult to distinguish companies from public institutions or even individuals. It is also affected by the two problems mentioned above, namely duplicate entries and the missing link to other sources of firm-level data.

For this reason, some efforts are devoted to the enrichment of PATSTAT and its integration with external information. For example, the EEE-PPAT table assigns a category to every patentee, establishing whether it is an individual, a company, or another kind of organization.

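As the map shows, many PATSTAT patents can be geolocated using only information inside the database (left), but we can do much better once we link the patentees to companies, for which we have more precise information (right).

[Map: geolocated PATSTAT patents, using PATSTAT information only (left) and after linking patentees to companies (right)]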

We contribute to this stream of research on PATSTAT by providing two algorithms that tackle the two problems described above and that try to minimize the amount of manual work that has to be performed. We also provide data obtained by the application of these methods. Our work can be summarized as follows:

  1. We provide an algorithm that allows researchers to find the duplicates inside PATSTAT in an efficient way.
  2. We provide an algorithm to connect PATSTAT to other kinds of information (CITL, Amadeus).
  3. We publish the results of our work in the form of source code and data for the October 2011 edition of PATSTAT.

More technically, we used or developed probabilistic supervised machine-learning algorithms that minimize the need for manual checks on the data, while keeping performance at a reasonably high level.

The data and source code are accompanied by three working papers:

A flexible, scaleable approach to the international patent “name game”

by Mark Huberty, Amma Serwaah, and Georg Zachmann

In this paper, we address the problem of having duplicated patent applicants’ names in the data. We use an algorithm that efficiently de-duplicates the data, needs minimal manual input and works well even on consumer-grade computers. Comparisons between entries are not limited to their names, and thus this algorithm is an improvement over earlier ones that required extensive manual work or overly cautious clean-up of the names.

Source code

Data
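The idea of comparing entries on more than their names can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of the paper: the records, the fields, the hand-set weights in `similarity`, the blocking rule and the `threshold` are all hypothetical, and the paper's approach learns how to compare records rather than relying on fixed weights.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical applicant records: (id, name, country code, technology classes).
records = [
    (1, "acme widgets gmbh", "DE", {"F03D", "H02J"}),
    (2, "acme widgts", "DE", {"F03D"}),
    (3, "acme widgets", "US", {"A61K"}),
    (4, "borealis energy", "DK", {"F03D"}),
]


def similarity(a, b):
    """Combine name, country and class overlap; the weights are assumptions."""
    name_sim = SequenceMatcher(None, a[1], b[1]).ratio()
    country_sim = 1.0 if a[2] == b[2] else 0.0
    union = a[3] | b[3]
    class_sim = len(a[3] & b[3]) / len(union) if union else 0.0
    return 0.6 * name_sim + 0.2 * country_sim + 0.2 * class_sim


def deduplicate(records, threshold=0.7):
    """Group records whose similarity reaches the threshold (union-find)."""
    parent = {r[0]: r[0] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Blocking: only compare records that share the first three letters of
    # the name, which avoids comparing every record with every other one.
    blocks = {}
    for r in records:
        blocks.setdefault(r[1][:3], []).append(r)

    for block in blocks.values():
        for a, b in combinations(block, 2):
            if similarity(a, b) >= threshold:
                parent[find(a[0])] = find(b[0])

    clusters = {}
    for r in records:
        clusters.setdefault(find(r[0]), []).append(r[0])
    return sorted(sorted(c) for c in clusters.values())


print(deduplicate(records))
```

In this toy example records 1 and 2 end up in the same group despite the misspelling, because country and technology classes agree. Record 3 has almost the same name but stays separate, because nothing else matches.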

A scaleable approach to emissions-innovation record linkage

by Mark Huberty, Amma Serwaah, and Georg Zachmann

PATSTAT focuses on patent applications, so it lacks important information on the applicants and inventors. To obtain more information on the applicants, we link PATSTAT to the CITL database (the Community Independent Transaction Log of the EU Emissions Trading System). This way, patenting behaviour can be linked to climate policy. Because of the structure of the data, we can adapt the deduplication algorithm to use it as a matching tool, retaining all of its advantages.

Source code

Data
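Turning a de-duplication comparison into a matching tool between two sources can be sketched like this. The example is only an illustration: the two record lists, the field names, the weights in `score` and the `threshold` are hypothetical and do not reflect the actual PATSTAT or CITL layout.

```python
from difflib import SequenceMatcher

# Hypothetical patent applicants (first source).
applicants = [
    {"id": "P1", "name": "borealis energy a/s", "country": "DK"},
    {"id": "P2", "name": "acme widgets gmbh", "country": "DE"},
    {"id": "P3", "name": "jane doe", "country": "FR"},
]

# Hypothetical account holders in an emissions registry (second source).
operators = [
    {"id": "C1", "name": "borealis energy", "country": "DK"},
    {"id": "C2", "name": "acme widgets", "country": "DE"},
    {"id": "C3", "name": "acme widgets", "country": "PL"},
]


def score(a, b):
    """Same kind of comparison as in de-duplication; weights are assumptions."""
    name_sim = SequenceMatcher(None, a["name"], b["name"]).ratio()
    same_country = 1.0 if a["country"] == b["country"] else 0.0
    return 0.7 * name_sim + 0.3 * same_country


def link(left, right, threshold=0.8):
    """For each left record, keep the best right candidate above the threshold."""
    links = {}
    for a in left:
        best = max(right, key=lambda b: score(a, b))
        if score(a, best) >= threshold:
            links[a["id"]] = best["id"]
    return links


print(link(applicants, operators))
```

The only change from de-duplication is that pairs are formed across the two sources instead of within one. The country field decides between the two identically named candidates for P2, and P3 is left unlinked because no candidate scores high enough.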

Remerge: regression-based record linkage with an application to PATSTAT

by Michele Peruzzi, Georg Zachmann, Reinhilde Veugelers

We further extend the information content in PATSTAT by linking it to Amadeus, a large database of companies that includes financial information. Patent microdata is now linked to financial performance data of companies. This algorithm compares records using multiple variables, learning their relative weights by asking the user to find the correct links in a small subset of the data. Since it is not limited to comparisons among names, it is an improvement over earlier efforts and is not overly dependent on the name-cleaning procedure in use. It is also relatively easy to adapt the algorithm to other databases, since it uses the familiar concept of regression analysis.

Source code

Data
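Learning the relative weights of several variables from a small hand-checked subset can be sketched with a plain logistic regression. This is an assumption made for illustration: the exact regression specification of the paper may differ, and the labelled pairs, the three features and the functions `fit` and `match_probability` are hypothetical.

```python
import math

# Hypothetical hand-labelled pairs of records:
# (name similarity, same country, same city) -> 1 if a true link, else 0.
labelled = [
    ((0.95, 1.0, 1.0), 1),
    ((0.90, 1.0, 0.0), 1),
    ((0.85, 1.0, 1.0), 1),
    ((0.97, 1.0, 0.0), 1),
    ((0.92, 0.0, 0.0), 0),
    ((0.40, 1.0, 0.0), 0),
    ((0.30, 0.0, 0.0), 0),
    ((0.55, 1.0, 1.0), 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit(data, steps=20000, rate=0.5):
    """Fit logistic-regression weights by batch gradient descent."""
    n_features = len(data[0][0])
    weights = [0.0] * (n_features + 1)  # the last entry is the intercept
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in data:
            xs = list(x) + [1.0]
            err = sigmoid(sum(w * v for w, v in zip(weights, xs))) - y
            for j, v in enumerate(xs):
                grad[j] += err * v
        weights = [w - rate * g / len(data) for w, g in zip(weights, grad)]
    return weights


def match_probability(weights, x):
    """Estimated probability that a pair with features x is a true link."""
    xs = list(x) + [1.0]
    return sigmoid(sum(w * v for w, v in zip(weights, xs)))


weights = fit(labelled)
print("learned weights (name, country, city, intercept):", weights)
print("strong candidate:", match_probability(weights, (0.98, 1.0, 1.0)))
print("weak candidate:", match_probability(weights, (0.20, 0.0, 0.0)))
```

Once the weights are estimated on the labelled subset, every remaining candidate pair can be scored automatically, so manual checking is confined to the small training sample.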


Republishing and referencing

Bruegel considers itself a public good and takes no institutional standpoint. Anyone is free to republish and/or quote this post without prior consent. Please provide a full reference, clearly stating Bruegel and the relevant author as the source, and include a prominent hyperlink to the original post.
