Blog post

Search engines, big data and network effects

Search engines are intermediaries in a two-way market between users and advertisers. Their huge stocks of data about users and their preferences can h

Publishing date: 22 November 2016
Authors: Georgios Petropoulos

Search engines respond to queries by providing relevant and valuable information about the topics that their users are looking for. They also act as intermediaries which match consumers with providers of services or sellers of products. To monetise their work, search engines collect and process data from users and sell advertising slots to companies. By analysing the users’ data, they can improve the quality of the search engine algorithm and provide more relevant organic search results, but they can also design personalised advertising strategies for companies’ products and services. Companies are thus more successful in placing their products, and consumers receive recommendations tailored to their interests.

Advertising slots are allocated on a competitive basis to the companies that are willing to pay the highest amount for prominence on the platform when a user searches for relevant terms. Slots are auctioned, and the winners pay a fee to the search engine whenever a potential consumer clicks on the displayed ad (the pay-per-click model). Search engines base their revenues almost exclusively on selling advertising slots (see, for example, the statistics in Liem and Petropoulos, 2016).
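To make the mechanism concrete, here is a minimal sketch in Python of slot allocation and pay-per-click revenue. The advertiser names, bids and click counts are hypothetical, and the pricing rule (each winner pays its own bid per click) is an assumption made for simplicity; real search engines use more elaborate auction rules that the post does not describe.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    advertiser: str        # hypothetical advertiser name
    bid_per_click: float   # amount offered for each click


def allocate_slots(bids, num_slots):
    """Give the available slots to the highest bidders, best slot first."""
    ranked = sorted(bids, key=lambda b: b.bid_per_click, reverse=True)
    return ranked[:num_slots]


def revenue(winners, clicks):
    """Search engine revenue under pay-per-click.

    Assumption: each winner pays its own bid for every click it receives.
    """
    return sum(w.bid_per_click * clicks.get(w.advertiser, 0) for w in winners)


# Hypothetical auction: three advertisers compete for two slots.
bids = [Bid("A", 0.50), Bid("B", 1.20), Bid("C", 0.80)]
winners = allocate_slots(bids, num_slots=2)

# Hypothetical click counts for the ads that were actually displayed.
clicks = {"B": 10, "C": 5}
total = revenue(winners, clicks)
print([w.advertiser for w in winners], round(total, 2))
```

In this toy setting, revenue rises either when more users click (a larger pool of consumers) or when advertisers bid more per click, which are the two channels discussed in the text.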

In this two-sided market, advertisers place a higher value on participating in the search engine platform when they expect a higher probability of matching with as many consumers as possible. More consumers can increase the value to companies of advertising on the search engine in two main ways. First, companies can be connected with more potential buyers as the pool of relevant consumers grows. Second, search engines with more users can analyse the additional data and improve their matching algorithms. They can then provide more personalised services to advertisers, who in turn have a better chance of attracting consumers. Given the competitive allocation of advertising slots, having more consumers therefore implies higher revenues for the search engine, as advertisers are willing to bid more for prominence.

Consumers, on the other side of the market, value information that leads them directly and quickly to what they are looking for, to minimise their search costs (for example, consumption of time). Hence, they mostly prefer to only visit the search engine of the highest quality (according to their standards and criteria) even if they have the option to use different search engines simultaneously.

In between these two sides, search engines compete on the quality and efficiency of search (for example, accuracy of results, page load speed, real-time relevance) in order to attract more consumers. More consumers in turn bring more advertising revenue, which can be used to improve quality further and attract even more consumers, and so on. It is therefore in the interest of search engines to build up a large stock of user information that can be analysed to provide good-quality services, which attract more consumers and advertisers.

How important is the scale of data collection for the improvement of services?

In general, the answer depends on the complexity of the algorithm in place, the number of observations and the amount of noise in them. But in the context of data on human behaviour and preferences, the amount of data and predictive power seem to be positively related.

When data are drawn from human actions, noise rates are often high. Social scientists have long argued that one way to circumvent the poor predictive validity of attitudes and traits is to aggregate data across occasions, situations, and forms of actions. This provides an early suggestion that more (and more varied) data might indeed be useful when modelling human behaviour data. The implication for predictive analytics based on data drawn from human behaviours is that by gathering more data from individuals (aggregated by the modelling), one could indeed hope for better predictions.
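The aggregation argument can be illustrated with a small simulation. This is only a sketch under strong assumptions that are not in the original: a single fixed "true" preference, independent Gaussian noise on each observed action, and an estimate formed by simple averaging. The function name and parameter values are hypothetical.

```python
import random
import statistics


def mean_abs_error(n_obs, true_value=0.3, noise_sd=1.0, trials=2000, seed=0):
    """Average estimation error when averaging n_obs noisy observations.

    Each observation is the (assumed) true preference plus Gaussian noise.
    The estimate is the mean of the observations.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    errors = []
    for _ in range(trials):
        sample = [true_value + rng.gauss(0.0, noise_sd) for _ in range(n_obs)]
        errors.append(abs(statistics.fmean(sample) - true_value))
    return statistics.fmean(errors)


# Error of the aggregated estimate for 1, 10 and 100 observations per person.
for n in (1, 10, 100):
    print(n, round(mean_abs_error(n), 3))
```

Under these assumptions the error shrinks as more observations are aggregated, roughly in proportion to one over the square root of the number of observations.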

Fortuny, Martens and Provost (2013) demonstrate that when predictive models are built from sparse, fine-grained data (such as data on low-level human behaviour), economies of scale are important, and marginal increases in predictive performance continue even at very large scale. However, the curve relating predictive power to data size does seem to show some diminishing returns to scale, albeit without any ceiling that limits the increasing pattern.
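The shape described here, gains that shrink but do not stop, can be sketched with a stylised learning curve. The power-law form and the parameters `a` and `b` below are assumptions chosen purely for illustration; they are not estimates from the study.

```python
def assumed_error(n, a=1.0, b=0.3):
    """Hypothetical learning curve: prediction error a * n**(-b).

    The functional form and parameter values are illustrative assumptions.
    """
    return a * n ** (-b)


# Dataset sizes from 1,000 to 10,000,000 observations, growing tenfold each step.
sizes = [10 ** k for k in range(3, 8)]

# Reduction in error obtained from each tenfold increase in data.
gains = [assumed_error(small) - assumed_error(large)
         for small, large in zip(sizes, sizes[1:])]

for size, gain in zip(sizes[1:], gains):
    print(f"up to {size:>10,} observations: error falls by {gain:.4f}")
```

Each tenfold increase in data still reduces the assumed error, but by less than the previous one: diminishing returns without a point at which more data stops helping.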

Their empirical results are based on data drawn from nine different predictive modelling applications, from book reviews to banking transactions. The study provides a clear illustration that larger datasets can indeed be more valuable assets for predictive analytics similar to those used in search engines.

Big data and quality driven competition

Since the quality of search engines increases with data gathering and analysis, search platforms with a larger stock of information than their competitors can have a competitive advantage. They can perform better and attract more users and consequently more advertisers. This suggests that the amount of data processed is linked with market power in the search engine market. Incumbents with a large stock of information and experience in data analytics can protect their market position against new entrants and firms that are far behind in data accumulation. To move to the frontier of the search market and establish a large user base, new entrants must offer very attractive prices for advertising slots and at the same time bear the costs of developing a good-quality dynamic algorithm that can attract consumers. This is an investment that can only pay off in the long run. Bing, for example, incurred losses for many years in its effort to increase its installed base of users before becoming profitable.

Note that in many markets product differentiation among competitors is multi-dimensional. Consumers’ preferences are affected differently by each of the multiple dimensions. If I need to subscribe to a mobile operator, my choice depends not only on the quality of the network but also on whether I want to combine mobile with TV and home internet, whether I often call another country, and how much mobile internet I intend to use. The plurality of options and preferences leads to multi-dimensional market competition in which there is room for the entry and growth of firms that specialise in specific dimensions.

The search engine market is by construction centralised, and product differentiation has only one dimension: the quality of service. (Some could argue that privacy is another dimension, but this is a secondary choice parameter, especially since it is difficult for users to rank competitors with respect to the privacy protection they provide.) That means that a search engine of higher quality than its rivals can increase its market power relatively fast or even dominate the market, especially when it implements efficient methods of machine learning and data analytics.

Moreover, in many cases, companies that own a popular search engine can use the collected data in other data-driven markets in which they operate, to improve the quality of their services and consequently their market position. For example, since 30% of mobile searches are about location, search queries about finding a particular location generate additional data that can improve the quality of navigation services. So, the search engine can help the “mother” company to enter new markets and improve its market position through data accumulation and analysis.

Are there network effects in the search engine market?

Direct network effects in an online platform exist when an increase in the number of users or in usage of the platform directly increases the users’ value from accessing the platform. Facebook is a typical example of a platform where the value of using the network depends heavily on the number of friends who also use the network. In search engine platforms, on the other hand, such direct effects are absent, as each user derives individual benefits from using the platform without any direct effects on other individual users.

Indirect network effects exist when an increase in the number of users or usage of the network spawns increases in the value of a complementary product or network, which can in turn increase the value of the original. The relevant question is whether there are indirect network effects in the search engine market that increase the value of participation for users.

Hal Varian, the chief economist at Google, argued at a recent Bruegel event on big data and market competition that it is the accumulated experience in the market that helps search engines to learn how to improve the quality of their services (learning by doing effect) just as it happens in normal businesses. He claims there are no indirect network effects involved on the users’ side: “The higher the number of customers a business has, the higher the revenue of the business, revenue which can be reinvested in the maintenance and improvement of the business so as to attract more users.”

While experience clearly matters, the impact of the learning by doing effect is greater than in normal businesses because of the importance of data in improving the search engine algorithm. The addition of new users makes the search engine better able to provide high-quality services, through the processing of the additional data collected from the newcomers. That generates extra value for the advertisers and further increases the search engine’s revenue as well as its incentives to improve quality and attract even more consumers.

However, I would argue that indirect network effects are also present. We should not ignore that search engines are two-sided markets in which both sides are inter-dependent. What happens on one side has an impact on the other side and affects the value of using the search engine for both sides of the market. The addition of new users increases the value of using the search engine for existing users not only because of the increase in quality (due to the learning by doing effect) but also because more users attract more advertisers. That leads to more matching opportunities for existing users through the sponsored links of the search engine, and therefore the value they derive from access increases.

Network effects may also extend to organic search results, given that the ranking of firms depends to some extent on firms’ choices, such as their search engine optimisation strategies. While the algorithm of the search engine is secret, published guidelines provide an overview of how firms can increase their chances of ranking highly in the organic search results. The arrival of new users increases the incentives for firms to achieve a better ranking position, in order to attract more users, by investing in search engine optimisation and the quality of their website. So, they provide better web services to their visitors (existing and new) and help them to reach what they are looking for in a more timely and efficient fashion.

When we assess competition in the search engine market, we need to take into account the importance of big data in defining market power, especially when users do not pay any price to access search platforms and the latter compete in other key dimensions such as the (data-driven) quality of service.

About the authors

Georgios Petropoulos

Georgios Petropoulos joined Bruegel as a visiting fellow in November 2015 and was a resident fellow from April 2016 to February 2022. Since March 2022, he has been a non-resident fellow. He is a Research Associate at MIT, a Digital Fellow at Stanford University and a CESifo Network affiliate. Georgios’ research focuses on the implications of digital technologies for innovation, competition policy and labour markets. He is currently studying how digital platforms should be regulated, what the relationship between big data and market competition is, and how the adoption of robots and information technologies affects labour markets, employment and wages. He holds a Bachelor’s degree in Physics, Master’s degrees in mathematical economics and econometrics, and a PhD in Economics. He has also studied Astrophysics at Master’s level.