Re-processing data for Elasticsearch with a new pipeline
I have an ELK-stack server that is being used to analyse Apache web log data. We're loading ALL of the logs, going back several years. The purpose is to look at some application-specific trends over this time period.
The data-processing pipeline is still being tweaked, as this is the first time anyone has looked at this data in detail, and some people are still deciding how they want it to be processed.
Some changes have been suggested, and while they're easy enough to make in the Logstash pipeline for new, incoming data, I'm not sure how to apply them to the data that's already in Elasticsearch. It took several days to load the current data set, and quite a bit more data has been added since, so re-processing everything through Logstash with the modified pipeline will probably take even longer.
What's the best way to apply these changes to data that has already been ingested into Elasticsearch? In the early stages of testing this set-up, I would just remove the index and rebuild from scratch, but that was done with very limited data sets; with the amount of data in use here, I'm not sure that's feasible. Is there a better way?
elasticsearch logstash
asked Nov 21 at 21:48
FrustratedWithFormsDesigner
1 Answer
Set up an ingest pipeline and use the Reindex API to move the data from the current index to a new index, with the pipeline configured for the destination index.
Ingest Node
answered Nov 22 at 0:31
ben5556
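For illustration, here is a minimal sketch of that approach using the Elasticsearch REST API. The pipeline contents and the index names (apache-logs, apache-logs-v2) are hypothetical; substitute your own processors and indices:

    # Define an ingest pipeline containing only the new processing steps
    PUT _ingest/pipeline/apache_updates
    {
      "description": "New processing steps not yet applied to existing data",
      "processors": [
        { "set": { "field": "reprocessed", "value": true } }
      ]
    }

    # Reindex the existing data into a new index, running every document
    # through the pipeline on the way in
    POST _reindex
    {
      "source": { "index": "apache-logs" },
      "dest": {
        "index": "apache-logs-v2",
        "pipeline": "apache_updates"
      }
    }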
It sounds like this would process all the existing data through the updated pipeline, populating a new index, and then I'd drop the old one when the operation is finished. I guess this is better than reloading all of the log files from scratch. Would it be much faster? I was hoping for a way to update the index in-place, but I guess that's because I'm used to doing that in relational databases. ;)
– FrustratedWithFormsDesigner
Nov 22 at 15:44
...or should I create a special pipeline that only has the updates, and reindex existing data through that, while new data goes through the regular (and newly updated) logstash pipeline?
– FrustratedWithFormsDesigner
Nov 22 at 15:59
Another issue is that some of the changes to the Logstash pipeline involve the Aggregate filter plugin, and it doesn't look like there is any Ingest Node processor equivalent to that.
– FrustratedWithFormsDesigner
Nov 22 at 17:00
Yep, it will be much faster. And yes, create a pipeline that only has the updates and use it while reindexing; new data will continue to use your Logstash pipeline. Unfortunately, while ingest processors cover most use cases, they are not as powerful as Logstash pipelines when it comes to things like aggregation.
– ben5556
Nov 22 at 18:47
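If the aggregation steps really can't be expressed as ingest processors, one alternative (a common workaround, not part of the answer above) is to reindex through Logstash itself, reading documents back out of Elasticsearch with the elasticsearch input plugin. A minimal sketch, with hypothetical index names and the updated filters left as a placeholder:

    input {
      elasticsearch {
        hosts   => ["localhost:9200"]
        index   => "apache-logs"   # hypothetical source index
        docinfo => true            # expose _id via [@metadata]
      }
    }

    filter {
      # Only the *new* processing steps go here, e.g. the aggregate
      # filter that has no Ingest Node equivalent.
    }

    output {
      elasticsearch {
        hosts       => ["localhost:9200"]
        index       => "apache-logs-v2"        # hypothetical destination
        document_id => "%{[@metadata][_id]}"   # preserve original IDs
      }
    }

This is slower than the Reindex API because every document makes a round trip through Logstash, but it gives you the full filter set, including aggregate.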
To keep the same index name with the updated data, after reindexing you can take a snapshot of the new index and restore it to your old index name. The new index can be deleted after the restore.
– ben5556
Nov 22 at 18:49
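A rough sketch of that rename-via-snapshot step, assuming a snapshot repository (here called my_repo) has already been registered, and using the same hypothetical index names as above:

    # Snapshot only the newly reindexed index
    PUT _snapshot/my_repo/reindexed?wait_for_completion=true
    {
      "indices": "apache-logs-v2"
    }

    # Drop the old index, then restore the snapshot under the old name
    DELETE apache-logs

    POST _snapshot/my_repo/reindexed/_restore
    {
      "indices": "apache-logs-v2",
      "rename_pattern": "apache-logs-v2",
      "rename_replacement": "apache-logs"
    }

    # Once the restore completes, the intermediate index can be removed
    DELETE apache-logs-v2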