Bruegel datasets

Remerge: regression-based record linkage with an application to PATSTAT

We further extend the information content of PATSTAT by linking it to Amadeus, a large company database that includes financial information. Patent microdata can thus be analysed alongside firms' financial performance.

Last update: 28 September 2014


Record linkage algorithms typically find matches by comparing records on the fields they share. However, PATSTAT shares very little information with company databases. We introduce REMERGE: a flexible, open-source algorithm that allows PATSTAT, the worldwide patent database, to be intelligently linked with company databases, without limiting the comparisons to the shared fields.
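For contrast, the conventional shared-field approach mentioned above can be sketched as follows. This is a generic illustration and not the REMERGE algorithm itself: the record layout, the sample names, the `link_on_shared_field` helper and the 0.85 threshold are all hypothetical, and only the Python standard library is used.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase the name and replace punctuation with spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """String similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_on_shared_field(applicants, companies, field="name", threshold=0.85):
    """Link each applicant to its most similar company on one shared field.

    A link is kept only if the best similarity reaches the (hypothetical)
    threshold; otherwise the applicant stays unmatched.
    """
    links = []
    for applicant in applicants:
        best = max(companies, key=lambda c: similarity(applicant[field], c[field]))
        score = similarity(applicant[field], best[field])
        if score >= threshold:
            links.append((applicant["id"], best["id"], score))
    return links


# Made-up records standing in for patent applicants and company entries.
applicants = [
    {"id": "P1", "name": "ACME Robotics Ltd."},
    {"id": "P2", "name": "Globex Pharma"},
]
companies = [
    {"id": "C1", "name": "Acme Robotics Ltd"},
    {"id": "C2", "name": "Initech Software GmbH"},
]

links = link_on_shared_field(applicants, companies)
print(links)
```

A matcher of this kind can use only the fields that both sources contain, here a single name field. That is the limitation that motivates an approach not restricted to shared fields.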

The results of this matching application can be used to improve research into the economics of innovation. The algorithm could also be adapted for similar problems. We provide a description of our algorithm, together with details of its coverage by country and by sector, performance measures, and suggestions for future research. We also show results from an additional application of REMERGE to the European Commission’s Tenders Electronic Daily database.

Click here to download the related Working Paper.