When to create new Consumer in ConsumerGroup











up vote
0
down vote

favorite












I am newbie in Kafka world and was reading about Consumer and ConsumerGroup.I got the difference between them and understand why we need ConsumerGroup in Kafka.



But here my question is When we should decide when to create new Consumer within same Group.
When we have huge amount of data?



Could someone help me to understand any real use case.



Thanks










share|improve this question


























    up vote
    0
    down vote

    favorite












    I am newbie in Kafka world and was reading about Consumer and ConsumerGroup.I got the difference between them and understand why we need ConsumerGroup in Kafka.



    But here my question is When we should decide when to create new Consumer within same Group.
    When we have huge amount of data?



    Could someone help me to understand any real use case.



    Thanks










    share|improve this question
























      up vote
      0
      down vote

      favorite









      up vote
      0
      down vote

      favorite











      I am newbie in Kafka world and was reading about Consumer and ConsumerGroup.I got the difference between them and understand why we need ConsumerGroup in Kafka.



      But here my question is When we should decide when to create new Consumer within same Group.
      When we have huge amount of data?



      Could someone help me to understand any real use case.



      Thanks










      share|improve this question













      I am newbie in Kafka world and was reading about Consumer and ConsumerGroup.I got the difference between them and understand why we need ConsumerGroup in Kafka.



      But here my question is When we should decide when to create new Consumer within same Group.
      When we have huge amount of data?



      Could someone help me to understand any real use case.



      Thanks







      apache-kafka






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 21 at 7:26









      Still Learning

      5283826




      5283826
























          3 Answers
          3






          active

          oldest

          votes

















          up vote
          1
          down vote



          accepted










          I think some very good points have already been mentioned and here are my few cents. As your primary question seems to be "When" to add a consumer in a group...



          There are 2 scenarios I could think of:




          1. If one or more consumers in a Consumer group are overloaded by consumption from multiple partitions and you intend to distribute that load and increase parallelism. In this case, you could add consumers and trigger a rebalance.



          2. If the partitions in a topic are increasing. This is quite a tricky scenario and may disturb the existing consumers in some ways. Following are a few examples of when this might happen:



            a) If the semantics of your data are changing as partitioning a topic
            based on the semantics is quite a common use case



            b) If the data volume is increasing and the semantics are also changing



            c) If only the volume is increasing that is leading to Scenario 1




          However, as you've pointed out in your question - if only the volume is increasing and the consumers in a group are nicely mapped to the partitions on a 1-to-1 basis then you may be better off leaving things as they are. Otherwise, you might end up in the Scenario 2b.



          Hope this helps!






          share|improve this answer




























            up vote
            0
            down vote













            In Apache Kafka, the level of parallelism is defined by the number of partitions. The higher the number of partitions, the higher the level of parallelism one can achieve. Depending on the volume of data, you should set the number of partitions to the desired value. Note that you can not have more active consumers than number of partitions.



            For example, assume that you have a topic test with 5 partitions and a consumer group test-group. At any given time, only 5 consumers can be active withing test-group. Say we've got 1000 messages in topic test, then each of the 5 active consumers will consume (approximately) 200 messages. In case you run more than 5 partitions, the remaining will be inactive meaning that they won't consumer any messages at all. Similarly, if you have less consumers than partitions, then some of your active consumers will consumer messages from more than one partition.



            Another -less straight-forward- example would be the following (taken from):
            enter image description here



            In this scenario, we do have two topics (A and B), each of which has 3 partitions. Two consumers belonging to the same consumer group are consuming messages from both topics.






            share|improve this answer






























              up vote
              0
              down vote













              As mentioned above, Kafka scales the topic consumption by distributing partitions among a consumer group. A consumer group is nothing, but a set of consumers sharing the common identifier.



              A consumer is responsible to consumer messages from one or more partitions. If there is a single consumer running in the consumer group, it will consume data from all partitions. If there are multiple consumers running with in same group, they distribute the load in consumes from different-different partitions.



              Maximum number of consumers are equal to the maximum number of partitions. If the consumers number exceeds than number of partitions, excessive consumers will be idle.



              Let's say if there is a topic with 4 partitions. There are two consumer groups A and B. Group A has two consumers C1,C2. Both consumers will consume from approx 2 and 2 partitions.



              While in Consumer Group B, there are 4 consumers, each consumer will consume from one partition.
              enter image description here



              When to use single consumer or multiple consumer : It depends on the use case. If you want a consolidated output from the processing where the calculations are based on the entire data in the topic, you should use single consumer unless you have a post processing logic to merge the output from each consumer.



              If you are just reading the data and want to parallelize the process by distributing load, use multiple consumers






              share|improve this answer





















                Your Answer






                StackExchange.ifUsing("editor", function () {
                StackExchange.using("externalEditor", function () {
                StackExchange.using("snippets", function () {
                StackExchange.snippets.init();
                });
                });
                }, "code-snippets");

                StackExchange.ready(function() {
                var channelOptions = {
                tags: "".split(" "),
                id: "1"
                };
                initTagRenderer("".split(" "), "".split(" "), channelOptions);

                StackExchange.using("externalEditor", function() {
                // Have to fire editor after snippets, if snippets enabled
                if (StackExchange.settings.snippets.snippetsEnabled) {
                StackExchange.using("snippets", function() {
                createEditor();
                });
                }
                else {
                createEditor();
                }
                });

                function createEditor() {
                StackExchange.prepareEditor({
                heartbeatType: 'answer',
                convertImagesToLinks: true,
                noModals: true,
                showLowRepImageUploadWarning: true,
                reputationToPostImages: 10,
                bindNavPrevention: true,
                postfix: "",
                imageUploader: {
                brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
                contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
                allowUrls: true
                },
                onDemand: true,
                discardSelector: ".discard-answer"
                ,immediatelyShowMarkdownHelp:true
                });


                }
                });














                 

                draft saved


                draft discarded


















                StackExchange.ready(
                function () {
                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53407112%2fwhen-to-create-new-consumer-in-consumergroup%23new-answer', 'question_page');
                }
                );

                Post as a guest















                Required, but never shown

























                3 Answers
                3






                active

                oldest

                votes








                3 Answers
                3






                active

                oldest

                votes









                active

                oldest

                votes






                active

                oldest

                votes








                up vote
                1
                down vote



                accepted










                I think some very good points have already been mentioned and here are my few cents. As your primary question seems to be "When" to add a consumer in a group...



                There are 2 scenarios I could think of:




                1. If one or more consumers in a Consumer group are overloaded by consumption from multiple partitions and you intend to distribute that load and increase parallelism. In this case, you could add consumers and trigger a rebalance.



                2. If the partitions in a topic are increasing. This is quite a tricky scenario and may disturb the existing consumers in some ways. Following are a few examples of when this might happen:



                  a) If the semantics of your data are changing as partitioning a topic
                  based on the semantics is quite a common use case



                  b) If the data volume is increasing and the semantics are also changing



                  c) If only the volume is increasing that is leading to Scenario 1




                However, as you've pointed out in your question - if only the volume is increasing and the consumers in a group are nicely mapped to the partitions on a 1-to-1 basis then you may be better off leaving things as they are. Otherwise, you might end up in the Scenario 2b.



                Hope this helps!






                share|improve this answer

























                  up vote
                  1
                  down vote



                  accepted










                  I think some very good points have already been mentioned and here are my few cents. As your primary question seems to be "When" to add a consumer in a group...



                  There are 2 scenarios I could think of:




                  1. If one or more consumers in a Consumer group are overloaded by consumption from multiple partitions and you intend to distribute that load and increase parallelism. In this case, you could add consumers and trigger a rebalance.



                  2. If the partitions in a topic are increasing. This is quite a tricky scenario and may disturb the existing consumers in some ways. Following are a few examples of when this might happen:



                    a) If the semantics of your data are changing as partitioning a topic
                    based on the semantics is quite a common use case



                    b) If the data volume is increasing and the semantics are also changing



                    c) If only the volume is increasing that is leading to Scenario 1




                  However, as you've pointed out in your question - if only the volume is increasing and the consumers in a group are nicely mapped to the partitions on a 1-to-1 basis then you may be better off leaving things as they are. Otherwise, you might end up in the Scenario 2b.



                  Hope this helps!






                  share|improve this answer























                    up vote
                    1
                    down vote



                    accepted







                    up vote
                    1
                    down vote



                    accepted






                    I think some very good points have already been mentioned and here are my few cents. As your primary question seems to be "When" to add a consumer in a group...



                    There are 2 scenarios I could think of:




                    1. If one or more consumers in a Consumer group are overloaded by consumption from multiple partitions and you intend to distribute that load and increase parallelism. In this case, you could add consumers and trigger a rebalance.



                    2. If the partitions in a topic are increasing. This is quite a tricky scenario and may disturb the existing consumers in some ways. Following are a few examples of when this might happen:



                      a) If the semantics of your data are changing as partitioning a topic
                      based on the semantics is quite a common use case



                      b) If the data volume is increasing and the semantics are also changing



                      c) If only the volume is increasing that is leading to Scenario 1




                    However, as you've pointed out in your question - if only the volume is increasing and the consumers in a group are nicely mapped to the partitions on a 1-to-1 basis then you may be better off leaving things as they are. Otherwise, you might end up in the Scenario 2b.



                    Hope this helps!






                    share|improve this answer












                    I think some very good points have already been mentioned and here are my few cents. As your primary question seems to be "When" to add a consumer in a group...



                    There are 2 scenarios I could think of:




                    1. If one or more consumers in a Consumer group are overloaded by consumption from multiple partitions and you intend to distribute that load and increase parallelism. In this case, you could add consumers and trigger a rebalance.



                    2. If the partitions in a topic are increasing. This is quite a tricky scenario and may disturb the existing consumers in some ways. Following are a few examples of when this might happen:



                      a) If the semantics of your data are changing as partitioning a topic
                      based on the semantics is quite a common use case



                      b) If the data volume is increasing and the semantics are also changing



                      c) If only the volume is increasing that is leading to Scenario 1




                    However, as you've pointed out in your question - if only the volume is increasing and the consumers in a group are nicely mapped to the partitions on a 1-to-1 basis then you may be better off leaving things as they are. Otherwise, you might end up in the Scenario 2b.



                    Hope this helps!







                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Nov 21 at 12:32









                    Lalit

                    341210




                    341210
























                        up vote
                        0
                        down vote













                        In Apache Kafka, the level of parallelism is defined by the number of partitions. The higher the number of partitions, the higher the level of parallelism one can achieve. Depending on the volume of data, you should set the number of partitions to the desired value. Note that you can not have more active consumers than number of partitions.



                        For example, assume that you have a topic test with 5 partitions and a consumer group test-group. At any given time, only 5 consumers can be active withing test-group. Say we've got 1000 messages in topic test, then each of the 5 active consumers will consume (approximately) 200 messages. In case you run more than 5 partitions, the remaining will be inactive meaning that they won't consumer any messages at all. Similarly, if you have less consumers than partitions, then some of your active consumers will consumer messages from more than one partition.



                        Another -less straight-forward- example would be the following (taken from):
                        enter image description here



                        In this scenario, we do have two topics (A and B), each of which has 3 partitions. Two consumers belonging to the same consumer group are consuming messages from both topics.






                        share|improve this answer



























                          up vote
                          0
                          down vote













                          In Apache Kafka, the level of parallelism is defined by the number of partitions. The higher the number of partitions, the higher the level of parallelism one can achieve. Depending on the volume of data, you should set the number of partitions to the desired value. Note that you can not have more active consumers than number of partitions.



                          For example, assume that you have a topic test with 5 partitions and a consumer group test-group. At any given time, only 5 consumers can be active withing test-group. Say we've got 1000 messages in topic test, then each of the 5 active consumers will consume (approximately) 200 messages. In case you run more than 5 partitions, the remaining will be inactive meaning that they won't consumer any messages at all. Similarly, if you have less consumers than partitions, then some of your active consumers will consumer messages from more than one partition.



                          Another -less straight-forward- example would be the following (taken from):
                          enter image description here



                          In this scenario, we do have two topics (A and B), each of which has 3 partitions. Two consumers belonging to the same consumer group are consuming messages from both topics.






                          share|improve this answer

























                            up vote
                            0
                            down vote










                            up vote
                            0
                            down vote









                            In Apache Kafka, the level of parallelism is defined by the number of partitions. The higher the number of partitions, the higher the level of parallelism one can achieve. Depending on the volume of data, you should set the number of partitions to the desired value. Note that you can not have more active consumers than number of partitions.



                            For example, assume that you have a topic test with 5 partitions and a consumer group test-group. At any given time, only 5 consumers can be active withing test-group. Say we've got 1000 messages in topic test, then each of the 5 active consumers will consume (approximately) 200 messages. In case you run more than 5 partitions, the remaining will be inactive meaning that they won't consumer any messages at all. Similarly, if you have less consumers than partitions, then some of your active consumers will consumer messages from more than one partition.



                            Another -less straight-forward- example would be the following (taken from):
                            enter image description here



                            In this scenario, we do have two topics (A and B), each of which has 3 partitions. Two consumers belonging to the same consumer group are consuming messages from both topics.






                            share|improve this answer














                            In Apache Kafka, the level of parallelism is defined by the number of partitions. The higher the number of partitions, the higher the level of parallelism one can achieve. Depending on the volume of data, you should set the number of partitions to the desired value. Note that you can not have more active consumers than number of partitions.



                            For example, assume that you have a topic test with 5 partitions and a consumer group test-group. At any given time, only 5 consumers can be active withing test-group. Say we've got 1000 messages in topic test, then each of the 5 active consumers will consume (approximately) 200 messages. In case you run more than 5 partitions, the remaining will be inactive meaning that they won't consumer any messages at all. Similarly, if you have less consumers than partitions, then some of your active consumers will consumer messages from more than one partition.



                            Another -less straight-forward- example would be the following (taken from):
                            enter image description here



                            In this scenario, we do have two topics (A and B), each of which has 3 partitions. Two consumers belonging to the same consumer group are consuming messages from both topics.







                            share|improve this answer














                            share|improve this answer



                            share|improve this answer








                            edited Nov 21 at 7:54

























                            answered Nov 21 at 7:46









                            Giorgos Myrianthous

                            3,64121233




                            3,64121233






















                                up vote
                                0
                                down vote













                                As mentioned above, Kafka scales the topic consumption by distributing partitions among a consumer group. A consumer group is nothing, but a set of consumers sharing the common identifier.



                                A consumer is responsible to consumer messages from one or more partitions. If there is a single consumer running in the consumer group, it will consume data from all partitions. If there are multiple consumers running with in same group, they distribute the load in consumes from different-different partitions.



                                Maximum number of consumers are equal to the maximum number of partitions. If the consumers number exceeds than number of partitions, excessive consumers will be idle.



                                Let's say if there is a topic with 4 partitions. There are two consumer groups A and B. Group A has two consumers C1,C2. Both consumers will consume from approx 2 and 2 partitions.



                                While in Consumer Group B, there are 4 consumers, each consumer will consume from one partition.
                                enter image description here



                                When to use single consumer or multiple consumer : It depends on the use case. If you want a consolidated output from the processing where the calculations are based on the entire data in the topic, you should use single consumer unless you have a post processing logic to merge the output from each consumer.



                                If you are just reading the data and want to parallelize the process by distributing load, use multiple consumers






                                share|improve this answer

























                                  up vote
                                  0
                                  down vote













                                  As mentioned above, Kafka scales the topic consumption by distributing partitions among a consumer group. A consumer group is nothing, but a set of consumers sharing the common identifier.



                                  A consumer is responsible to consumer messages from one or more partitions. If there is a single consumer running in the consumer group, it will consume data from all partitions. If there are multiple consumers running with in same group, they distribute the load in consumes from different-different partitions.



                                  Maximum number of consumers are equal to the maximum number of partitions. If the consumers number exceeds than number of partitions, excessive consumers will be idle.



                                  Let's say if there is a topic with 4 partitions. There are two consumer groups A and B. Group A has two consumers C1,C2. Both consumers will consume from approx 2 and 2 partitions.



                                  While in Consumer Group B, there are 4 consumers, each consumer will consume from one partition.
                                  enter image description here



                                  When to use single consumer or multiple consumer : It depends on the use case. If you want a consolidated output from the processing where the calculations are based on the entire data in the topic, you should use single consumer unless you have a post processing logic to merge the output from each consumer.



                                  If you are just reading the data and want to parallelize the process by distributing load, use multiple consumers






                                  share|improve this answer























                                    up vote
                                    0
                                    down vote










                                    up vote
                                    0
                                    down vote









                                    As mentioned above, Kafka scales the topic consumption by distributing partitions among a consumer group. A consumer group is nothing, but a set of consumers sharing the common identifier.



                                    A consumer is responsible to consumer messages from one or more partitions. If there is a single consumer running in the consumer group, it will consume data from all partitions. If there are multiple consumers running with in same group, they distribute the load in consumes from different-different partitions.



                                    Maximum number of consumers are equal to the maximum number of partitions. If the consumers number exceeds than number of partitions, excessive consumers will be idle.



                                    Let's say if there is a topic with 4 partitions. There are two consumer groups A and B. Group A has two consumers C1,C2. Both consumers will consume from approx 2 and 2 partitions.



                                    While in Consumer Group B, there are 4 consumers, each consumer will consume from one partition.
                                    enter image description here



                                    When to use single consumer or multiple consumer : It depends on the use case. If you want a consolidated output from the processing where the calculations are based on the entire data in the topic, you should use single consumer unless you have a post processing logic to merge the output from each consumer.



                                    If you are just reading the data and want to parallelize the process by distributing load, use multiple consumers






                                    share|improve this answer












                                    As mentioned above, Kafka scales the topic consumption by distributing partitions among a consumer group. A consumer group is nothing, but a set of consumers sharing the common identifier.



                                    A consumer is responsible to consumer messages from one or more partitions. If there is a single consumer running in the consumer group, it will consume data from all partitions. If there are multiple consumers running with in same group, they distribute the load in consumes from different-different partitions.



                                    Maximum number of consumers are equal to the maximum number of partitions. If the consumers number exceeds than number of partitions, excessive consumers will be idle.



                                    Let's say if there is a topic with 4 partitions. There are two consumer groups A and B. Group A has two consumers C1,C2. Both consumers will consume from approx 2 and 2 partitions.



                                    While in Consumer Group B, there are 4 consumers, each consumer will consume from one partition.
                                    enter image description here



                                    When to use single consumer or multiple consumer : It depends on the use case. If you want a consolidated output from the processing where the calculations are based on the entire data in the topic, you should use single consumer unless you have a post processing logic to merge the output from each consumer.



                                    If you are just reading the data and want to parallelize the process by distributing load, use multiple consumers







                                    share|improve this answer












                                    share|improve this answer



                                    share|improve this answer










                                    answered Nov 21 at 9:05









                                    Nishu Tayal

                                    11.1k73381




                                    11.1k73381






























                                         

                                        draft saved


                                        draft discarded



















































                                         


                                        draft saved


                                        draft discarded














                                        StackExchange.ready(
                                        function () {
                                        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53407112%2fwhen-to-create-new-consumer-in-consumergroup%23new-answer', 'question_page');
                                        }
                                        );

                                        Post as a guest















                                        Required, but never shown





















































                                        Required, but never shown














                                        Required, but never shown












                                        Required, but never shown







                                        Required, but never shown

































                                        Required, but never shown














                                        Required, but never shown












                                        Required, but never shown







                                        Required, but never shown







                                        Popular posts from this blog

                                        Berounka

                                        Sphinx de Gizeh

                                        Different font size/position of beamer's navigation symbols template's content depending on regular/plain...