It has been almost two decades since LinkedIn, the best-known business-oriented social network, was founded. Today, it has expanded to nearly 450 million members and $6.8 billion in annual revenue.
LinkedIn was established by Reid Hoffman, Allen Blue, Konstantin Guericke, Eric Ly, and Jean-Luc Vaillant in 2002. It has changed the way we find jobs and establish professional connections, creating new opportunities for people. The story of how it was made therefore deserves to be told from the beginning.
LinkedIn was created in 2002 in the living room of co-founder Reid Hoffman. According to him, they built LinkedIn because:
“We’re here to build a business, not to create something cool. MySpace and Facebook have done really well. And I think they can monetize what they have built, probably by adding in more e-commerce. But I think the opportunity on the business side is ultimately larger.”
Finally, in 2003, it was launched to the public. Reid had experience at multiple startups before LinkedIn. He was a board member of Google, eBay, and PayPal. The main goal of creating LinkedIn was to build connections in the professional world, and in its first month of operation, it gained 4,500 members! And this was despite the very basic first LinkedIn web design:
In 2005, LinkedIn started to level up its service. It launched job listings and paid subscription options, and by then the number of users had increased rapidly, by approximately 3.3 million at the beginning of the year.
In 2006, LinkedIn released the “People You May Know” feature to suggest new connections, which we are still using today. The company also recorded its first profit since being founded, an impressive outcome after only four years of operation.
LinkedIn continued its success and expanded its operations to the United Kingdom, Spain, and France in 2008. Around this time, LinkedIn had 33 million members.
By 2010, LinkedIn needed to keep up with its rapid expansion, as its membership had now reached about 65 million. Because of this, A. George “Skip” Battle was appointed to the board. He has also served on the boards of prominent companies such as Expedia, Netflix, and Workday, to name a few. According to LinkedIn CEO Jeff Weiner:
“We are positioning the company for significant growth this year and over the long term. Skip Battle brings a unique combination of consumer web and enterprise experience that will help guide our company.”
In 2013, LinkedIn celebrated its 10th anniversary. It was a double celebration, as 225 million users were registered across the globe. LinkedIn also focused on targeting recent graduates and young professionals as new members. We can see below that LinkedIn’s web design had by now been improved to offer a more modern and user-friendly experience.
In 2016, Microsoft acquired LinkedIn for about $26 billion, making it one of the largest acquisitions ever made by Microsoft. According to Microsoft CEO Satya Nadella:
“Together we can accelerate the growth of LinkedIn, as well as Microsoft Office 365 and Dynamics as we seek to empower every person and organization on the planet.”
In 2017, LinkedIn launched its new desktop version. This new version gives users a seamless experience across both the mobile and desktop apps.
More recently, in 2019, LinkedIn rolled out worldwide a new feature, Open for Business, which allows freelancers to be discovered on the platform. Here is a sneak peek of the feature, which we are still using today:
Today, professionals continue to benefit from LinkedIn, and the number of users continues to climb, surpassing 675 million worldwide as of this writing.
Now that we are aware of LinkedIn’s history, let’s look at how its technology was built.
In the beginning, LinkedIn started as a single monolithic application doing anything and everything. That single application was called Leo. It hosted web servlets for all LinkedIn’s pages, handled business logic, and connected to a handful of LinkedIn databases. The setup was really, really, basic:
However, for a social network, it is crucial to manage member-to-member connections. LinkedIn needed a system that queried connection data using graph traversals and lived in memory for top performance. It quickly became clear that this workload needed to scale independently of Leo, so a separate system for LinkedIn’s member graph, called Cloud, was born. To keep this graph service separate from Leo, LinkedIn used Java RPC (Remote Procedure Calls) for communication.
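To make the idea of an in-memory graph traversal concrete, here is a minimal sketch of how a connection degree (1st, 2nd, 3rd) could be computed with a breadth-first search. The graph layout, the member names, and the `connection_degree` function are illustrative assumptions, not LinkedIn's actual implementation.

```python
from collections import deque


def connection_degree(graph, source, target, max_degree=3):
    """Return the number of hops between two members, or None if they are
    not connected within max_degree hops (breadth-first search)."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        member, degree = queue.popleft()
        if degree == max_degree:
            continue  # do not expand beyond the allowed number of hops
        for neighbor in graph.get(member, ()):
            if neighbor == target:
                return degree + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, degree + 1))
    return None


# Hypothetical in-memory member graph: member id -> direct connections.
graph = {
    "ana": {"ben"},
    "ben": {"ana", "cho"},
    "cho": {"ben", "dev"},
    "dev": {"cho"},
}

print(connection_degree(graph, "ana", "dev"))  # 3rd-degree connection
```

Keeping the whole graph in memory is what makes this kind of traversal cheap: every hop is a dictionary lookup rather than a database query.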
It was around this time that LinkedIn needed to add search capabilities to their service. This caused the member graph service to start feeding data into a new search service running Lucene.
Scaling a monolithic system
As the LinkedIn site grew, so did Leo, increasing its role, responsibility, and as always when it comes to systems, its complexity. Load balancing helped as multiple instances of Leo were spun up. But the added load was taxing LinkedIn’s most critical system – its member profile database.
An easy fix LinkedIn applied was classic vertical scaling: throwing more CPUs and memory at it! While that bought some time, it didn’t solve the underlying problem. The profile database handled both read and write traffic, so in order to scale, replica (slave) DBs were introduced. The replica DBs were a copy of the member database, staying in sync using the earliest version of Databus, which is now open-sourced. The replica DBs were set up to handle all read traffic, and logic was built to know when it was safe (and consistent) to read from a replica versus the main master DB.
With more and more traffic, the single monolithic app Leo was often going down in production, and it was difficult to troubleshoot, recover, and release new code. High availability of the site was critical to LinkedIn’s business, so it quickly became clear that they needed to solve the underlying problem by “killing Leo” and breaking it up into many small functional and stateless services, known as a Service Oriented Architecture (SOA).
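The article does not say what the "safe to read from a replica" logic looked like. One common approach, shown here purely as an assumption, is to send a member's reads to the master for a short window after that member writes, and to the replica otherwise. The class name, the two-second window, and the dictionaries standing in for databases are all hypothetical.

```python
import time


class ReplicaAwareRouter:
    """Route writes to the master and reads to a replica, except shortly
    after a write, when the replica may not have caught up yet."""

    def __init__(self, master, replica, lag_window_seconds=2.0,
                 clock=time.monotonic):
        self.master = master          # dict standing in for the master DB
        self.replica = replica        # dict standing in for a read replica
        self.lag_window = lag_window_seconds
        self.clock = clock
        self._last_write = {}         # member id -> time of last write

    def write(self, member_id, profile):
        self.master[member_id] = profile
        self._last_write[member_id] = self.clock()

    def read(self, member_id):
        last = self._last_write.get(member_id)
        if last is not None and self.clock() - last < self.lag_window:
            return self.master.get(member_id)   # replica may be stale
        return self.replica.get(member_id)      # safe to offload the read
```

In this sketch, replication itself (the job Databus did at LinkedIn) is left out: something else is assumed to copy changes from `master` to `replica` within the lag window.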
Service Oriented Architecture
The engineering team at LinkedIn started by extracting microservices to hold APIs and business logic, such as the search, profile, communications, and groups platforms. Later, presentation layers were extracted for areas like LinkedIn’s recruiter product and public profile. For new products, brand new services were created outside of Leo, and over time, vertical stacks emerged for each functional area.
With the move to microservices for newer services and products, LinkedIn built frontend servers that fetched data from the various domains and built the HTML (via JSPs). By 2010, LinkedIn had over 150 independent services, and today it has more than 750. Here is a graphical representation of how the architecture evolved into microservices:
Because the services were stateless, scaling LinkedIn could be achieved by spinning up new instances of any service and using hardware load balancers between them. The engineering team started actively redlining each service to know how much load it could take, and built out early provisioning and performance monitoring capabilities.
A common trick for sites that experience rapid growth and handle a lot of traffic is to reduce the load altogether by adding more layers of cache. Many applications started to introduce mid-tier caching layers like memcached or Couchbase, and LinkedIn was no different, adding caches to its data layers and starting to use Voldemort with precomputed results when appropriate.
Over time, LinkedIn went the other way and removed many mid-tier caches. Mid-tier caches were storing derived data from multiple domains. While caches appear at first to be a simple way to reduce load, the complexity around invalidation and the call graph (described below) was getting out of control. Keeping the cache as close to the data store as possible kept latencies low, allowed LinkedIn to scale horizontally, and reduced cognitive load.
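As a rough illustration of why statelessness makes this kind of scaling easy, the sketch below spreads requests across interchangeable instances in round-robin order. LinkedIn used hardware load balancers; the `RoundRobinBalancer` class and the instance names here are hypothetical stand-ins written only to show the principle.

```python
import itertools


class RoundRobinBalancer:
    """Spread requests evenly across interchangeable service instances."""

    def __init__(self, instances):
        self._instances = list(instances)   # assumed to be non-empty
        self._counter = itertools.count()

    def add_instance(self, instance):
        # Scaling out is just registering one more identical instance.
        self._instances.append(instance)

    def route(self, request):
        index = next(self._counter) % len(self._instances)
        return self._instances[index](request)


def make_instance(name):
    # A stateless handler: the response depends only on the request,
    # so any instance can serve any request.
    return lambda request: f"{name} handled {request}"


balancer = RoundRobinBalancer([make_instance("svc-1"), make_instance("svc-2")])
print([balancer.route(i) for i in range(3)])
```

Because no instance keeps per-user state, adding capacity never requires migrating data or pinning users to a particular server.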
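One way to picture a cache that sits right next to the data store is a read-through cache owned by the store itself, so that every write passes through the same component that invalidates the cached entry. The `CachedStore` class below is a hypothetical sketch of that idea, with a plain dictionary standing in for the database.

```python
class CachedStore:
    """Read-through cache that wraps a single data store, so writes and
    cache invalidation always happen in the same place."""

    def __init__(self, backing):
        self._backing = backing     # dict standing in for the database
        self._cache = {}
        self.backing_reads = 0      # counts how often the database is hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.backing_reads += 1
        value = self._backing.get(key)
        self._cache[key] = value
        return value

    def put(self, key, value):
        self._backing[key] = value
        self._cache.pop(key, None)  # invalidate so the next read is fresh
```

Compare this with a mid-tier cache holding data derived from several domains: there, a write in any one domain would have to find and invalidate every derived entry, which is the complexity the paragraph above describes.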
Kafka, Rest.li and Super Blocks
With the growth of LinkedIn, the data flow was also increasing rapidly, which meant the platform needed an efficient data pipeline for streaming and queuing data. For example, LinkedIn needed data to flow into a data warehouse; it needed to send batches of data into its Hadoop workflow for analytics; it collected and aggregated logs from every service; it collected tracking events like pageviews; it needed queueing for its InMail messaging system; and it needed to keep its people search system up to date whenever someone updated their profile.
As a result, LinkedIn created a new custom pipeline, called Kafka. It was built for scalability and high speed. Kafka allowed near real-time access to any data source, which helped the company build real-time analytics, greatly improved its site monitoring, and let it track the call graphs described below. Nowadays, Kafka handles over 500 billion events per day.
Despite all their efforts to build efficient microservices, LinkedIn was still experiencing issues with their Java-based RPC (Remote Procedure Calls). This led them to create a new API model, dubbed Rest.li. It is an open-source REST framework for building a robust, scalable architecture, using type-safe bindings and asynchronous, non-blocking IO. Rest.li also uses JSON over HTTP, which made it easier for the company to support non-Java clients.
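The core idea behind this kind of pipeline can be sketched as an append-only log that many consumers read at their own pace, each tracking its own offset. The toy `TopicLog` class below is not the Kafka API and leaves out partitioning, persistence, and replication; all names in it are illustrative assumptions.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log: producers append events, and each consumer
    reads independently by remembering its own offset."""

    def __init__(self):
        self._events = []
        self._offsets = defaultdict(int)   # consumer name -> next offset

    def publish(self, event):
        self._events.append(event)
        return len(self._events) - 1       # offset of the new event

    def poll(self, consumer, max_events=10):
        start = self._offsets[consumer]
        batch = self._events[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch


log = TopicLog()
log.publish({"type": "profile_updated", "member": "m1"})
log.publish({"type": "page_view", "member": "m2"})

# Two hypothetical consumers see the same events without affecting each other.
print(log.poll("search-indexer"))
print(log.poll("data-warehouse"))
```

Because reading never removes an event, the same stream can feed search, the data warehouse, and monitoring at once, which matches the mix of use cases listed earlier.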
Today, LinkedIn is still predominantly developed in Java, yet the company runs many services using Python, Ruby, Node.js, and C++. Here is a quick look at the latest LinkedIn technology:
Service oriented architectures work well to decouple domains and scale services independently. But there are downsides. Many of LinkedIn’s applications fetch many different types of data, in turn making hundreds of downstream calls. This is typically referred to as a “call graph”, or “fanout” when considering all the downstream calls. For example, any Profile page request fetches much more than just profile data: it also fetches photos, connections, groups, subscription info, following info, long form blog posts, connection degrees from the member graph, recommendations, etc. This call graph can be difficult to manage and was only getting more and more unruly. As a result, LinkedIn introduced the concept of super blocks: groupings of backend services with a single access API. This allows LinkedIn to have a specific team optimize the block, while keeping the call graph in check for each client.
LinkedIn started out with a simple idea and simple technology, and it has since become the fastest-growing professional social network, building bridges to many companies. There is no doubt that everything begins with an idea. Do you have a dream of making the next LinkedIn? Come and talk to Wiredelta! We’ll figure it out!
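A super block can be pictured as a facade: the client makes one call, and the block fans out to its backend services internally. The sketch below assumes three made-up backends (`fetch_profile`, `fetch_connections`, `fetch_groups`) and a hypothetical `ProfileSuperBlock` class; it only illustrates the shape of the idea, not LinkedIn's real services.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical backend services, each returning one slice of the page data.
def fetch_profile(member_id):
    return {"name": f"member-{member_id}"}


def fetch_connections(member_id):
    return ["m2", "m3"]


def fetch_groups(member_id):
    return ["python-devs"]


class ProfileSuperBlock:
    """Single access API that hides the fanout to several backends."""

    def __init__(self, backends):
        self._backends = backends   # section name -> callable(member_id)

    def get_profile_page(self, member_id):
        # Call every backend concurrently and merge the results by name.
        with ThreadPoolExecutor(max_workers=len(self._backends)) as pool:
            futures = {
                name: pool.submit(fetch, member_id)
                for name, fetch in self._backends.items()
            }
            return {name: future.result() for name, future in futures.items()}


block = ProfileSuperBlock({
    "profile": fetch_profile,
    "connections": fetch_connections,
    "groups": fetch_groups,
})
print(block.get_profile_page(1))
```

The client's call graph shrinks to a single edge, and the team that owns the block is free to add caching, batching, or timeouts behind that one API.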