Kafka
An open-source publish/subscribe messaging system.

Topic: categorizes the data; each topic is divided into partitions.

1. What is Kafka
Kafka is a very popular distributed stream-processing platform. Put simply, Kafka is more than a message broker; it also provides two additional capabilities:
- Store: it can retain messages for a certain period of time.
- Process: it can process messages in real time.
(Figure: Kafka logo, from Kafka Documentation.)
Because of these two additional capabilities, Kafka supports not only the traditional message broker use case (passing messages between systems) but also reacting to and transforming streams of messages. The sections below look at storage and real-time processing in detail.
(Figure from Kafka Documentation, Introduction.)
2. The origin of Kafka
Jay Kreps, one of Kafka's main authors, explained in his article why LinkedIn designed Kafka. The scenario he describes is quite representative and shows well what a message broker is worth to a large service, so we use it here as a concrete example.
At first, LinkedIn wanted to copy data out of its Oracle data warehouse and process it in Hadoop. Along the way, its engineers noticed several characteristics of the project and some potential extensions:
- A great deal of time went into making the data transfer reliable, because any problem during the transfer made the subsequent Hadoop analysis meaningless.
- Configuring each new data source took a lot of time, which was far from ideal. The fix was to standardize the interfaces of all data systems so that Hadoop could load data automatically.
- Many other teams wanted their data sources integrated as well, because bringing together data scattered across services enables analyses that were otherwise impossible and benefits everyone involved.
- Even with many sources integrated, full data coverage remained hard to reach, so integrating a data source had to be made simpler still.
- On the receiving side, besides Hadoop there can be many other systems, such as monitoring and databases.
- On the sending side, besides the Oracle data warehouse there can be Voldemort (key-value store), Espresso (document store), ...
With the naive approach, full many-to-many coverage requires O(N^2) data pipelines in total, which is unacceptable.
(Figure from Jay Kreps. (2013). The Log: What every software engineer should know about real-time data's unifying abstraction.)
An intermediary, a message broker, greatly simplifies the data-flow handling inside each system by limiting it to one pipeline per service.
(Figure from Jay Kreps. (2013). The Log: What every software engineer should know about real-time data's unifying abstraction.)
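The difference between the two layouts can be made concrete with a small calculation. This is only a sketch: the function names are made up for illustration, and it assumes the worst case in which every system needs a dedicated pipeline to every other system.

```python
def point_to_point_pipelines(num_systems: int) -> int:
    """Worst case: every system has a dedicated pipeline to every other one."""
    return num_systems * (num_systems - 1)


def brokered_pipelines(num_systems: int) -> int:
    """With a broker in the middle, each system maintains a single pipeline."""
    return num_systems


for n in (3, 10, 50):
    print(n, point_to_point_pipelines(n), brokered_pipelines(n))
```

With 10 systems this is 90 pipelines against 10, and the gap grows quadratically as more systems are added.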
3. Message delivery
The topic is a core concept in Kafka: it is the category name of a stream of messages. Delivery starts when a producer sends a message to a specific topic.
(Figure from Kafka Documentation, Introduction.)
For a given topic, a producer can write messages to multiple partitions, and ordering is guaranteed within each partition. A producer can simply distribute messages over the partitions in round-robin fashion, or route each message to a specific partition based on its content.
(Figure from Kafka Documentation, Introduction.)
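The two routing strategies can be sketched as follows. This is an illustration, not Kafka's partitioner: `NUM_PARTITIONS` and `choose_partition` are hypothetical names, and CRC32 merely stands in for a stable hash of the message key.

```python
import itertools
import zlib
from typing import Optional

NUM_PARTITIONS = 3  # hypothetical partition count for one topic
_round_robin = itertools.cycle(range(NUM_PARTITIONS))


def choose_partition(key: Optional[bytes]) -> int:
    """Pick a partition: hash of the key if present, otherwise round-robin."""
    if key is None:
        return next(_round_robin)
    # CRC32 is only a stand-in for a stable hash; it is not Kafka's hash.
    return zlib.crc32(key) % NUM_PARTITIONS


keyless = [choose_partition(None) for _ in range(4)]
same_key = {choose_partition(b"user-42") for _ in range(5)}
print(keyless, same_key)
```

Messages without a key cycle through the partitions, while all messages with the same key land in one partition and therefore keep their relative order.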
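The only state a consumer maintains is its offset, shown in the figure; the consumer uses it to control where in the log it starts reading.
(Figure from Kafka Documentation, Introduction.)
All consumers within a single consumer group split the messages among themselves. Consumer groups are independent of one another, and each group reads all messages.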
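A toy model may help here. It assumes a single partition held in a Python list and one offset per consumer group; `committed` and `poll` are illustrative names, not the Kafka client API.

```python
from collections import defaultdict

log = ["m0", "m1", "m2", "m3"]   # one partition, already written
committed = defaultdict(int)     # group name -> next offset to read


def poll(group: str, max_records: int) -> list:
    """Return the next records for a group and advance its offset."""
    start = committed[group]
    records = log[start:start + max_records]
    committed[group] = start + len(records)
    return records


fast = poll("analytics", 4)      # reads everything at once
slow = poll("billing", 1)        # reads only one record for now
slow_next = poll("billing", 2)   # continues where it left off
```

Both groups see every message, and each one advances through the log at its own pace because its offset is tracked separately.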
4. Design philosophy
Kafka's core abstraction is the log. To understand Kafka's design, it helps to briefly review what characterizes a log.
A log is the simplest possible storage mechanism: an append-only, immutable sequence of records ordered by time. Broken down:
- Writes always happen at the end, and reads always proceed in order from left to right; no random access is needed.
- The sequence number of each record expresses its order in time.
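These two properties fit in a few lines. The class below is a minimal in-memory sketch with made-up names (`AppendOnlyLog`, `read_from`); it says nothing about how Kafka lays out its log on disk.

```python
class AppendOnlyLog:
    """A minimal in-memory log: records are only ever added at the end."""

    def __init__(self) -> None:
        self._records = []

    def append(self, record: str) -> int:
        """Append a record and return its offset (its sequence number)."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int):
        """Yield (offset, record) pairs sequentially, starting at offset."""
        for i in range(offset, len(self._records)):
            yield i, self._records[i]


log = AppendOnlyLog()
first = log.append("created")
second = log.append("updated")
tail = list(log.read_from(1))
```

There is deliberately no method for updating or deleting a record in place, and a smaller offset always means an earlier write.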
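Taking this one level of abstraction higher: what can a log with these properties be used for? It can be used to record changes.
- Source-code version control systems use a log to record every historical version.
- Disaster-recovery mechanisms, for example in databases, use a log to restore a system to its previous state.
- Distributed systems use a log to propagate updates to multiple machines.
At this point you may think of the more familiar database: a database records state, a log records changes. In all three examples, the changes recorded in the log produce one or more historical versions from an initial version. More abstractly, recording changes is equivalent to recording every historical version, which a database that only records the current state cannot do.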
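The equivalence between a log of changes and the full set of historical versions can be sketched as a replay. The change format used here, `(key, value)` pairs with `None` marking a deletion, is an assumption made for the example.

```python
def replay(changes, upto):
    """Rebuild the state as it was after the first `upto` changes."""
    state = {}
    for key, value in changes[:upto]:
        if value is None:
            state.pop(key, None)   # None marks a deletion
        else:
            state[key] = value
    return state


changes = [("a", 1), ("b", 2), ("a", 3), ("b", None)]
history = [replay(changes, n) for n in range(len(changes) + 1)]
```

`history` holds every version the state ever went through, whereas a store that keeps only the current state would hold just the last entry.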
You may now be wondering what all of this has to do with Kafka.
As Kafka's core abstraction, the log has two very useful properties for stream processing.
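- Messages produced by a producer can be processed by multiple consumers at different speeds.
  Because the log records changes, and thereby implicitly every historical version, producers and consumers are free to send and process messages at their own pace. This gives a higher degree of decoupling and lays the foundation for scaling further. Even if a consumer crashes, no messages are lost.
- The data structure is stored directly on disk.
  The access pattern is very simple; in particular, no random reads or writes are needed. An inexpensive hard disk (HDD) can reach about 200 MB/s for sequential access, which is sufficient when downstream consumers have to process every message individually. (For comparison, DRAM reaches 2-20 GB/s.)
  Kafka can therefore keep data for much longer and does not need to delete it as soon as it has been processed.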
5. RabbitMQ vs Kafka
We previously studied RabbitMQ, a message broker in the traditional sense whose core abstraction is the queue (see "How to implement a Message Broker | RabbitMQ internals").
Comparing the two deepens our understanding of Kafka.
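| | RabbitMQ | Kafka |
|---|---|---|
| Nature | Message broker | Distributed stream-processing platform |
| Message retention | Deleted once the consumer has read it | Longer; set by the retention configuration (broker or topic level) |
| Message storage | In memory, with optional persistence to disk | Log + index on disk; the index is also loaded into memory |
| Message reading | Pushed to the consumer | Pulled by the consumer |
| Delivery guarantee | At most once or at least once | At most once, at least once, or exactly once |
| Message priority | Configurable | Not supported |
| Performance | Medium | Very high |
| Consumers | All consumers split the messages | Each consumer group reads all messages; consumers within a group split them |

On closer inspection, the differences in message retention, storage, and reading all follow from the different core abstractions.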
6. Implementation details
6.1 Scalability
- Partitions are spread over multiple machines in the Kafka cluster, which process messages and serve requests.
- Each partition can be replicated to multiple servers for fault tolerance.
- Each partition has one machine acting as leader and the others acting as followers. The leader handles reads and writes, and the followers replicate the leader. If the leader fails, a follower is promoted to leader.
- Kafka uses ZooKeeper to coordinate the machines in a Kafka cluster. (Newer Kafka releases can instead run in KRaft mode, without ZooKeeper.)
- Within one consumer group, a topic supports at most as many active consumers as it has partitions.
Let us look at concrete examples of writing and reading.
(Figure: Kafka Write Scalability, from InsideBigData Editorial Team. (2018). Developing a Deeper Understanding of Apache Kafka Architecture Part 2: Write and Read Scalability.)
On the write side, the messages of a topic are spread over the different partition leaders, and each partition leader's data is replicated to its followers.
(Figure: Kafka Read Scalability, from InsideBigData Editorial Team. (2018). Developing a Deeper Understanding of Apache Kafka Architecture Part 2: Write and Read Scalability.)
On the read side, consumers can read from the individual partition leaders in parallel. Recall that within one consumer group a topic supports at most as many active consumers as it has partitions; as long as this limit is respected, all consumers can take part in reading at the same time.
In summary, Kafka can scale extremely well. This still depends on the user configuring a sufficiently large number of partitions for the number of producers and consumers, so that Kafka's built-in scalability can come into play.
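The limit on consumers per group follows from how partitions are handed out. The sketch below uses a plain round-robin split; it is a simplification for illustration and not one of Kafka's actual assignment strategies, and `assign` is a made-up name.

```python
def assign(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = [0, 1, 2]
two = assign(partitions, ["c1", "c2"])
four = assign(partitions, ["c1", "c2", "c3", "c4"])
```

With three partitions and four consumers, `c4` receives nothing and stays idle, which is why adding consumers beyond the partition count does not increase read parallelism.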
6.2 Data structure
As mentioned, Kafka's core data structure is the log. Let us look at how this log is implemented.
Recall that each topic can be divided into multiple partitions. When the oldest messages have to be deleted at regular intervals, modifying a single file would be cumbersome. Kafka therefore introduces segments: a partition is further divided into segments.
(Figure from Travis Jeffery. (2016). How Kafka's Storage Internals Work.)
A write to a partition actually goes to the last segment, the one that is not yet full. Once that segment is full, a new segment is created, named after its base offset (the offset of its first message).
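Segment rolling and cheap deletion can be sketched as below. One simplifying assumption is labeled in the code: this toy rolls a segment after a fixed number of records, whereas a real deployment rolls by size or age. The class and method names are made up for the example.

```python
class Partition:
    """Toy partition made of segments, each keyed by its base offset."""

    def __init__(self, max_records_per_segment: int) -> None:
        # Assumption: roll by record count to keep the example small.
        self.max_records = max_records_per_segment
        self.segments = {0: []}   # base offset -> records
        self.next_offset = 0

    def append(self, record: str) -> int:
        active_base = max(self.segments)
        if len(self.segments[active_base]) >= self.max_records:
            # Active segment is full: roll a new one named by its base offset.
            active_base = self.next_offset
            self.segments[active_base] = []
        self.segments[active_base].append(record)
        self.next_offset += 1
        return self.next_offset - 1

    def drop_oldest_segment(self) -> None:
        """Retention removes a whole segment instead of editing a file."""
        if len(self.segments) > 1:
            del self.segments[min(self.segments)]

    def segment_file_names(self):
        # Segment files are named by the zero-padded base offset.
        return [f"{base:020d}.log" for base in sorted(self.segments)]


p = Partition(max_records_per_segment=2)
for i in range(5):
    p.append(f"m{i}")
names_before = p.segment_file_names()
p.drop_oldest_segment()
names_after = p.segment_file_names()
```

Dropping old data never touches the segment that is currently being written, which is the point of splitting the partition in the first place.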
With the concepts covered, let us see how partitions and segments map onto the file system.
(Figure from Kafka Documentation, 5.4 Log.)
On the file system, a partition is a directory and each segment is a file name. Every segment has two files, an index file and a log file; the latter contains the messages themselves together with their metadata.
(Figure from Travis Jeffery. (2016). How Kafka's Storage Internals Work.)
The index file has a copy in memory and helps locate the starting point for reading when a consumer comes online.
(Figure: Index & Log File Content, from Travis Jeffery. (2016). How Kafka's Storage Internals Work.)
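How an index helps locate the starting point can be sketched with a binary search. The sketch assumes a sparse index that maps some offsets to byte positions in the log file; the entries are made-up illustration values and `scan_start` is a hypothetical helper.

```python
import bisect

# Sparse index for one segment: (offset, byte position in the .log file).
# The entries below are made-up illustration values.
index = [(0, 0), (4, 410), (8, 835)]
index_offsets = [offset for offset, _ in index]


def scan_start(target_offset: int) -> int:
    """Return the byte position from which to scan for target_offset."""
    i = bisect.bisect_right(index_offsets, target_offset) - 1
    if i < 0:
        raise KeyError("offset is before the first entry of this segment")
    return index[i][1]


pos_exact = scan_start(4)    # indexed offset: jump straight to it
pos_between = scan_start(6)  # not indexed: start at the entry for offset 4
```

The lookup finds the closest indexed offset at or before the target, and a short sequential scan of the log file from that position finds the message itself.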
7. High-level APIs
On top of the Producer API and Consumer API that a traditional message broker offers, Kafka provides additional high-level APIs. They extend the functionality of a traditional message broker and offer a higher level of abstraction, which makes the interfaces more convenient to use.
7.1 Kafka Streams
7.1.1 Overview
Kafka Streams makes real-time processing of messages (Process) easier.
The name is apt: a stream is an unbounded flow of messages with no beginning and no end. Each individual message is called a data record and is implemented as a key-value pair.
Its main characteristics are:
- It is a client library that runs directly in the client application, not on the Kafka broker cluster.
- It scales together with the client.
- It can guarantee that each message is processed exactly once, even if the client or a Kafka broker fails.
7.1.2 Stream processing
This section takes a closer look at how Kafka Streams, Kafka's high-level API for message processing, helps its users.
Kafka Streams supports the following operations:
- Stateful operations, such as aggregations and joins.
- Stateless operations, such as map and filter.
- Stateful operations over a time window (for example, counting or summing).
- Custom operations, through the Processor API.

Each intermediate stream processor (that is, neither a source nor a sink) can implement any of these operations. Together they form a directed graph that integrates multiple message sources step by step.
Now consider what changes if we use a traditional message broker instead.
Real-time processing of messages, stateful operations in particular, is needed in many situations. Suppose we want to count the messages in each category. With RabbitMQ, we would have to implement that logic in the client ourselves, possibly even a distributed version of it, and we would also have to consider how accurate the count remains under failures (RabbitMQ does not guarantee that each message is processed exactly once). Kafka Streams therefore provides quite practical functionality: a single API call does the job.
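To make the distinction between stateless and stateful operations concrete, here is a plain-Python imitation of a filter, a map, and a per-key count. Kafka Streams itself is a Java library and is not used here; the records and names are hypothetical, and the sketch has none of the fault tolerance discussed above.

```python
from collections import Counter

# Hypothetical (key, value) data records, standing in for a topic.
records = [("click", "home"), ("view", "cart"), ("click", "cart"),
           ("click", ""), ("view", "home")]

# Stateless operations: filter out empty values, then map to upper case.
non_empty = (r for r in records if r[1])
mapped = ((key, value.upper()) for key, value in non_empty)

# Stateful operation: a running count per key, kept across records.
counts = Counter()
for key, _value in mapped:
    counts[key] += 1
```

The filter and map look at one record at a time, while the count has to remember earlier records. Keeping that memory correct across machines and failures is the part that is hard to write by hand.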
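7.2 Kafka Connect
Kafka Connect standardizes the interface between Kafka and other data systems.
Its purpose is easy to see. Section 2 on the origin of Kafka already pointed out that connecting to different data sources takes a great deal of work, and the only way out is to standardize the interfaces between them.
A connector can be used either to ingest data from other data systems or to write data to other systems.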
8. References
Kafka Documentation. Introduction
Kafka Documentation. Kafka Streams Introduction
Kafka Documentation. Kafka Streams Core Concepts
Kafka Documentation. Streams DSL
Kafka Documentation. Kafka Connect
InsideBigData Editorial Team. (2018). Developing a Deeper Understanding of Apache Kafka Architecture Part 2: Write and Read Scalability
Travis Jeffery. (2016). How Kafka's Storage Internals Work