Design Twitter Search

Requirements

Capacity Estimation and Constraints

System APIs

High-Level Design

Now let us build the most complicated component, the index server. Since tweets consist of words, we build the index over words. In the simplest terms, the index is just a distributed hash table in which the key is a word and the value is the set of IDs of the tweets that contain that word.
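The word-to-tweet-IDs mapping described above can be sketched on a single machine as follows. This is a minimal illustration, not the actual index server: the class name `InvertedIndex`, the helper `tokenize`, and the whitespace-based tokenization are all assumptions made for the example.

```python
from collections import defaultdict


def tokenize(text):
    # Simplifying assumption: lowercase the text and split on whitespace.
    return text.lower().split()


class InvertedIndex:
    """Hypothetical single-node index: word -> set of tweet IDs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_tweet(self, tweet_id, text):
        # Record the tweet ID under every word the tweet contains.
        for word in tokenize(text):
            self.postings[word].add(tweet_id)

    def search(self, query):
        # Return the IDs of tweets that contain every word in the query.
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


index = InvertedIndex()
index.add_tweet(1, "Design Twitter search")
index.add_tweet(2, "Twitter index server")
print(index.search("twitter"))
print(index.search("twitter search"))
```

A multi-word query is answered here by intersecting the per-word sets; a real system would also rank and paginate the results.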

We can use either of two techniques to shard our data:

1. Shard based on words: We first compute a hash of the word and then use that hash to find the server that holds the index for the word.

Problems with this approach:

There might be too much load on a single server when a search term becomes hot.
Load is not distributed uniformly across the servers.
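The hot-term problem with word-based sharding can be illustrated with a small sketch. Everything here is an assumption for the example: the shard count `NUM_SHARDS`, the helper names `shard_for_word` and `query_load`, the use of MD5 as a stable hash, and the made-up traffic mix.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical number of index servers


def shard_for_word(word, num_shards=NUM_SHARDS):
    # Use a stable hash so a word maps to the same shard on every run.
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def query_load(query_words, num_shards=NUM_SHARDS):
    # Count how many lookups each shard receives for a stream of query words.
    load = Counter()
    for word in query_words:
        load[shard_for_word(word, num_shards)] += 1
    return load


# Made-up traffic: one hot term dominates the query stream.
traffic = ["worldcup"] * 1000 + ["coffee", "rain", "music", "python"] * 10
load = query_load(traffic)
hot_shard = shard_for_word("worldcup")
print("hot shard:", hot_shard, "lookups:", load[hot_shard])
print("all shards:", dict(load))
```

Every lookup for the hot term lands on the same shard, so that one server absorbs at least 1000 of the 1040 lookups regardless of how many servers are added.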

2. Shard based on tweet ID: We compute a hash of the tweet ID and find the server based on this hash. Every server maintains an index over all the words that appear in its own tweets. To answer a query, results are first fetched from each server, then merged on a central server and returned to the user. This avoids hot partitions, but search is slower because every query has to contact every server.
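The fetch-from-every-server-then-merge flow can be sketched as below. This is an in-process illustration under stated assumptions: `IndexShard` and `TweetShardedIndex` are hypothetical names, `tweet_id % num_shards` stands in for the hash of the tweet ID, and the shards are plain objects rather than remote servers.

```python
from collections import defaultdict


class IndexShard:
    """Hypothetical index server: indexes every word of the tweets it owns."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_tweet(self, tweet_id, text):
        for word in text.lower().split():
            self.postings[word].add(tweet_id)

    def search(self, word):
        return self.postings.get(word.lower(), set())


class TweetShardedIndex:
    """Routes each tweet to one shard and queries all shards on search."""

    def __init__(self, num_shards):
        self.shards = [IndexShard() for _ in range(num_shards)]

    def add_tweet(self, tweet_id, text):
        # Assumption: integer tweet IDs, with modulo standing in for the hash.
        shard = self.shards[tweet_id % len(self.shards)]
        shard.add_tweet(tweet_id, text)

    def search(self, word):
        # Scatter the query to every shard, then gather and merge the results.
        merged = set()
        for shard in self.shards:
            merged |= shard.search(word)
        # Assumption: larger IDs are newer, so return newest first.
        return sorted(merged, reverse=True)


index = TweetShardedIndex(num_shards=3)
for tweet_id in range(1, 7):
    index.add_tweet(tweet_id, "worldcup final tonight")
print(index.search("worldcup"))
```

Tweets for the hot term end up spread across all shards, so no single server holds the whole set; the cost is that each search loops over every shard before the merge.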

Alternative: a two-level shard, sharding first by word and then applying a second level of sharding within each word's index.


