When to create new Consumer in ConsumerGroup

I am new to Kafka and have been reading about Consumer and ConsumerGroup. I understand the difference between them and why Kafka needs consumer groups.

My question is: when should we create a new consumer within the same group? Is it when we have a huge amount of data?

Could someone help me understand this with a real use case?

Thanks

asked Nov 21 at 7:26 by Still Learning

apache-kafka

3 Answers

accepted

Some very good points have already been mentioned, so here are my two cents. Your primary question seems to be "when" to add a consumer to a group.

There are two scenarios I can think of:

1. One or more consumers in a consumer group are overloaded because they consume from multiple partitions, and you want to distribute that load and increase parallelism. In this case, you can add consumers, which triggers a rebalance.

2. The number of partitions in a topic is increasing. This is a tricky scenario and may disturb the existing consumers. A few examples of when this might happen:

a) The semantics of your data are changing (partitioning a topic based on the semantics of the data is quite a common use case).

b) The data volume is increasing and the semantics are also changing.

c) Only the volume is increasing, which leads to Scenario 1.
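One way a growing partition count can disturb existing consumers: for keyed records, Kafka's default partitioner derives the partition from a hash of the key modulo the number of partitions, so the same key can land on a different partition once the count changes. The sketch below only illustrates that effect. It uses `zlib.crc32` as a stand-in hash (Kafka itself uses murmur2), and the keys and the helper `partition_for` are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition: hash(key) mod partition count.

    crc32 is only a stand-in here; Kafka's default partitioner uses murmur2.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"customer-{i}" for i in range(20)]  # hypothetical record keys

before = {k: partition_for(k, 4) for k in keys}  # topic with 4 partitions
after = {k: partition_for(k, 6) for k in keys}   # same topic grown to 6

# Keys whose partition changed: a consumer that relied on seeing all
# records for such a key on one partition no longer does.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys now map to a different partition")
```

If consumers depend on all records for a key arriving on one partition (per-key ordering or local state), this remapping is the kind of disturbance to plan for before adding partitions.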

However, as you've pointed out in your question, if only the volume is increasing and the consumers in the group are already mapped to the partitions on a 1-to-1 basis, you may be better off leaving things as they are, because an extra consumer would have no partition to read from. Otherwise, you might end up in Scenario 2b.
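The effect of adding consumers to a group can be sketched without a broker. The helper `assign` below is hypothetical: it simply spreads the partitions of one topic as evenly as possible, which approximates what Kafka's built-in assignors aim for, and it does not model the actual rebalance protocol.

```python
from typing import List


def assign(num_partitions: int, num_consumers: int) -> List[List[int]]:
    """Spread partitions 0..n-1 over the consumers as evenly as possible."""
    assignment: List[List[int]] = [[] for _ in range(num_consumers)]
    for partition in range(num_partitions):
        assignment[partition % num_consumers].append(partition)
    return assignment


# A topic with 4 partitions, growing the group from 1 to 5 consumers.
for consumers in range(1, 6):
    plan = assign(4, consumers)
    busiest = max(len(parts) for parts in plan)      # heaviest consumer's share
    idle = sum(1 for parts in plan if not parts)     # consumers with nothing to read
    print(f"{consumers} consumer(s): {plan} busiest={busiest} idle={idle}")
```

With 4 partitions, going from 2 to 3 consumers does not reduce the busiest consumer's share (it still owns 2 partitions), 4 consumers give the 1-to-1 mapping, and a fifth consumer stays idle.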

Hope this helps!

answered Nov 21 at 12:32 by Lalit

In Apache Kafka, the level of parallelism is defined by the number of partitions: the more partitions a topic has, the higher the parallelism you can achieve. Choose the number of partitions according to the volume of data. Note that, within a consumer group, you cannot have more active consumers than partitions.

For example, assume that you have a topic test with 5 partitions and a consumer group test-group. At any given time, at most 5 consumers can be active within test-group. Say there are 1000 messages in topic test, spread evenly over the partitions; then each of the 5 active consumers will consume approximately 200 messages. If you run more than 5 consumers, the extra ones will be inactive, meaning that they won't consume any messages at all. Similarly, if you have fewer consumers than partitions, some of your active consumers will consume messages from more than one partition.
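The arithmetic in this example can be checked with a small simulation. It assumes the 1000 messages are spread evenly over the 5 partitions (200 each), which holds only approximately in practice, and `messages_per_consumer` is a hypothetical helper rather than a Kafka API.

```python
from typing import List


def messages_per_consumer(partition_sizes: List[int], num_consumers: int) -> List[int]:
    """Total messages each consumer reads when partitions are dealt out in turn."""
    totals = [0] * num_consumers
    for partition, size in enumerate(partition_sizes):
        totals[partition % num_consumers] += size
    return totals


sizes = [200] * 5  # topic "test": 5 partitions, 1000 messages in total

print(messages_per_consumer(sizes, 5))  # one partition per consumer
print(messages_per_consumer(sizes, 7))  # two consumers receive nothing
print(messages_per_consumer(sizes, 3))  # some consumers read two partitions
```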

Another, less straightforward, example would be the following (taken from):
[image not available]

In this scenario, there are two topics (A and B), each with 3 partitions. Two consumers belonging to the same consumer group consume messages from both topics.
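How the six partitions are split between the two consumers depends on the group's assignment strategy. The sketch below mimics the documented behaviour of two of Kafka's built-in strategies, `RangeAssignor` (applied per topic) and `RoundRobinAssignor` (applied across all subscribed partitions). The functions are simplified stand-ins written for illustration, and the consumer names `C1` and `C2` are assumed.

```python
from typing import Dict, List


def range_assign(topics: Dict[str, int], consumers: List[str]) -> Dict[str, List[str]]:
    """Per topic: contiguous ranges, with the first consumers taking any extras."""
    ordered = sorted(consumers)
    result: Dict[str, List[str]] = {c: [] for c in ordered}
    for topic in sorted(topics):
        base, extra = divmod(topics[topic], len(ordered))
        start = 0
        for index, consumer in enumerate(ordered):
            count = base + (1 if index < extra else 0)
            result[consumer] += [f"{topic}-{p}" for p in range(start, start + count)]
            start += count
    return result


def round_robin_assign(topics: Dict[str, int], consumers: List[str]) -> Dict[str, List[str]]:
    """Lay out every partition of every topic, then deal them out one at a time."""
    ordered = sorted(consumers)
    result: Dict[str, List[str]] = {c: [] for c in ordered}
    all_partitions = [f"{t}-{p}" for t in sorted(topics) for p in range(topics[t])]
    for index, partition in enumerate(all_partitions):
        result[ordered[index % len(ordered)]].append(partition)
    return result


topics = {"A": 3, "B": 3}  # two topics with 3 partitions each
print(range_assign(topics, ["C1", "C2"]))
print(round_robin_assign(topics, ["C1", "C2"]))
```

Under the range strategy C1 ends up with four partitions and C2 with two, while round robin gives each consumer three, so the strategy matters when one group subscribes to several topics.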

answered Nov 21 at 7:46 (edited Nov 21 at 7:54) by Giorgos Myrianthous

As mentioned above, Kafka scales topic consumption by distributing partitions among the consumers of a consumer group. A consumer group is simply a set of consumers sharing a common group identifier.

A consumer is responsible for consuming messages from one or more partitions. If there is a single consumer running in the consumer group, it will consume data from all partitions. If there are multiple consumers running within the same group, they share the load, each consuming from different partitions.

The maximum number of active consumers in a group equals the number of partitions. If the number of consumers exceeds the number of partitions, the excess consumers will be idle.

Say there is a topic with 4 partitions and two consumer groups, A and B. Group A has two consumers, C1 and C2; each will consume from 2 partitions.

In consumer group B there are 4 consumers, so each consumer will consume from one partition.
[image not available]
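Consumer groups are independent of each other: every group receives all messages in the topic, and only within a group are the partitions divided up. A minimal sketch of the example above, with made-up message payloads, a hypothetical `consume` helper, and assumed names `C3` to `C6` for the consumers of group B:

```python
from typing import Dict, List

# Topic with 4 partitions; the payloads are made up for illustration.
topic: Dict[int, List[str]] = {
    0: ["m0", "m4"],
    1: ["m1", "m5"],
    2: ["m2", "m6"],
    3: ["m3", "m7"],
}


def consume(topic: Dict[int, List[str]], consumers: List[str]) -> Dict[str, List[str]]:
    """Messages each consumer of ONE group reads; partitions are dealt out in turn."""
    received: Dict[str, List[str]] = {c: [] for c in consumers}
    for partition in sorted(topic):
        owner = consumers[partition % len(consumers)]
        received[owner].extend(topic[partition])
    return received


group_a = consume(topic, ["C1", "C2"])              # 2 partitions per consumer
group_b = consume(topic, ["C3", "C4", "C5", "C6"])  # 1 partition per consumer

print("Group A:", group_a)
print("Group B:", group_b)
```

Both groups end up with all eight messages; the only difference is how many consumers share the work inside each group.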

When to use a single consumer or multiple consumers: it depends on the use case. If you want a consolidated output where the calculations are based on the entire data in the topic, you should use a single consumer, unless you have post-processing logic to merge the output from each consumer.

If you are just reading the data and want to parallelize the processing by distributing the load, use multiple consumers.
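The point about consolidated output can be illustrated with a running total. When several consumers each see only their own partitions, each produces a partial result, and a merge step is needed to arrive at the figure a single consumer would compute directly. The partition contents and the ownership map below are made-up values for illustration.

```python
# Made-up numeric messages in a topic with 4 partitions.
partitions = {0: [5, 7], 1: [1, 9], 2: [4], 3: [6, 8]}

# Single consumer: it reads every partition, so it computes the total directly.
single_total = sum(value for values in partitions.values() for value in values)

# Two consumers in one group: each owns two partitions and sees only those.
ownership = {"C1": [0, 1], "C2": [2, 3]}
partial_totals = {
    consumer: sum(sum(partitions[p]) for p in owned)
    for consumer, owned in ownership.items()
}

# Post-processing step: merge the partial results into one answer.
merged_total = sum(partial_totals.values())

print(single_total, partial_totals, merged_total)
```

A sum merges trivially; for calculations that do not combine so easily, the merge logic is the extra cost of using multiple consumers.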

answered Nov 21 at 9:05 by Nishu Tayal
