📋
space
  • SDE Interview
  • Multi-threading
    • Mutex vs Semorphore
    • Thread vs process
  • Design Pattern
  • Java
    • Polymorphism
    • Encapsulation
    • Inheritance
    • Override vs overload
  • MySQL
    • DB transaction
  • Data Structure
    • Design hashset
    • AVL / red black tree
    • LinkedList vs arrayList
    • HashMap vs HashTable
    • Binary Tree
    • Heap
  • System Design
    • Session Cookie
    • GFS / BigTable / MapReduce
    • Zookeeper
    • gRPC vs thrift
    • Amazon RDS vs Oracle
    • Microservices
    • REST vs RPC
    • Database design
    • idempotent in HTTP
    • Optimistic locking / Pessimistic locking
    • Partitioning / Sharding data
    • Consistent Hashing
    • Case Study
      • Design Delay Task scheduler
      • Design View Count
      • Design Twitter
      • Design Web Crawler
      • Design Uber
      • Design Netflix
      • Design Google Doc
      • Design Monitoring System
      • Design Dropbox
      • Distributed Lock
      • Design Instagram
      • Design Yelp
      • Design Amazon
      • Design Google Search
      • Distributed Database System Key Value Store
      • Design Facebook message / Whatsapp
      • Design Logging Systems
      • Design Movie booking system
      • Design Google Autocomplete Feature
      • Design Twitter Search
    • Message Broker
      • Kafka
    • Design Data Intensive Application
      • Chapter 8
    • SQL vs NoSQL
      • Cassandra
      • MongoDB vs Cassandra vs MySQL vs HBase
    • TCP vs UDP
    • Load Balancer
    • Cache
      • Memcached
      • Redis
    • DNS
    • CDN
    • Strong consistency vs eventual consistency
    • Scalability
Powered by GitBook
On this page
  • Requirements
  • Capacity Estimation and Constraints
  • System APIs
  • High-Level Design

Was this helpful?

  1. System Design
  2. Case Study

Design Twitter Search

PreviousDesign Google Autocomplete FeatureNextMessage Broker

Last updated 4 years ago

Was this helpful?

Requirements

Capacity Estimation and Constraints

System APIs

High-Level Design

Now let us built the most complicated Index Server. Since our tweets consist of words, we will build our index across words. So indexes in most layman terms are nothing just distributed hash table where the key is the word, and the set of tweet id that contains the words are the values.

We can use either of two technique to shard our data :

  1. Shard based on words: We first calculate the hash based on word and then find the server based on the hash.

Problems with this approach :

  1. There might be too much load on the single server when a search term becomes hot.

  2. No uniform distribution of load

2. Shard based on tweet id: We calculate the hash based on tweet id and find the server based on this hash. All the servers maintain the index for all the words . while fetching results, the result is first fetched from each server then compiled together on a central server and returned to the user. -> avoid hot partitions, slow search

two level shard, first shard by word

LogoDesign Twitter SearchMedium