How to design a table in Cassandra with an event date and four columns to filter on
Here is my case: I have a stream of CSV files representing 1-2 million events per day, which must be filtered on two variables: a date range and one of four different columns. My only constraint is that the data has to be stored on a single server.
I'm an advanced database user, and I got some interesting results on Postgres and MySQL with one partition per day and indexes on the four columns. I made some attempts with Cassandra but was quite disappointed with the performance. Can Cassandra be competitive on a single server for this kind of filtering, compared to a relational database? I tried different table structures without any significant change in performance. Do you have any recommendations?
cassandra
If you have a requirement to store the data on a single machine, then why would you choose Cassandra over the alternatives you mentioned? And what was the design of the table that you benchmarked Cassandra with?
– ernest_k
Nov 22 at 17:26
I will choose the most performant solution. My conclusion so far is that Cassandra on a single server is an interesting solution when the filtering is done on one or two dimensions. The performance is bad if the filtering can be done on several columns. However, I'm not an advanced Cassandra user, and before concluding anything I prefer to check here whether I missed something.
– Ignatius J. Reilly
Nov 22 at 20:26
You might already know that you must specify the partition key to query a Cassandra table, and that the clustering columns you restrict in the WHERE clause must follow the order in which you declared them in the table definition. You can also create secondary indexes, but consider using them together with the partition key in your WHERE clause.
– Praneeth Gudumasu
Nov 26 at 7:30
asked Nov 22 at 17:20
Ignatius J. Reilly