Continuously getting "INFO JobScheduler:59 - Added jobs for time *** ms" in my Spark Standalone Cluster

We are working with a Spark Standalone cluster of 3 nodes, each with 8 cores and 32 GB of RAM (all nodes have the same configuration).



Sometimes a streaming batch completes in less than 1 second; other times it takes more than 10 seconds, and when that happens the log below appears in the console.



2016-03-29 11:35:25,044  INFO TaskSchedulerImpl:59 - Removed TaskSet 18.0, whose tasks have all completed, from pool 
2016-03-29 11:35:25,044 INFO DAGScheduler:59 - Job 18 finished: foreachRDD at EventProcessor.java:87, took 1.128755 s
2016-03-29 11:35:31,471 INFO JobScheduler:59 - Added jobs for time 1459231530000 ms
2016-03-29 11:35:35,004 INFO JobScheduler:59 - Added jobs for time 1459231535000 ms
2016-03-29 11:35:40,004 INFO JobScheduler:59 - Added jobs for time 1459231540000 ms
2016-03-29 11:35:45,136 INFO JobScheduler:59 - Added jobs for time 1459231545000 ms
2016-03-29 11:35:50,011 INFO JobScheduler:59 - Added jobs for time 1459231550000 ms
2016-03-29 11:35:55,004 INFO JobScheduler:59 - Added jobs for time 1459231555000 ms
2016-03-29 11:36:00,014 INFO JobScheduler:59 - Added jobs for time 1459231560000 ms
2016-03-29 11:36:05,003 INFO JobScheduler:59 - Added jobs for time 1459231565000 ms
2016-03-29 11:36:10,087 INFO JobScheduler:59 - Added jobs for time 1459231570000 ms
2016-03-29 11:36:15,004 INFO JobScheduler:59 - Added jobs for time 1459231575000 ms
2016-03-29 11:36:20,004 INFO JobScheduler:59 - Added jobs for time 1459231580000 ms
2016-03-29 11:36:25,139 INFO JobScheduler:59 - Added jobs for time 1459231585000 ms


Can you please help me work out how to solve this problem?

apache-spark spark-streaming apache-spark-standalone

edited Nov 23 '18 at 14:24 by user10465355
asked Mar 29 '16 at 10:26 by Charan Adabala

3 Answers

Change the spark-submit master from local to local[2]:

spark-submit --master local[2] --class YOURPROGRAM YOUR.jar

Or set:

new SparkConf().setAppName("SparkStreamingExample").setMaster("local[2]")
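To make this concrete, here is a minimal Java sketch (the class name, host, and port are illustrative placeholders, not taken from the question) showing where the master URL and the batch interval are configured:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StreamingMasterExample {  // hypothetical example class
    public static void main(String[] args) throws InterruptedException {
        // local[2]: one thread for the socket receiver, one for processing the batches
        SparkConf conf = new SparkConf()
                .setAppName("SparkStreamingExample")
                .setMaster("local[2]");
        // 5-second batch interval, matching the 5 s spacing of the
        // "Added jobs for time ..." lines in the question's log
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));
        JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);
        lines.foreachRDD(rdd -> System.out.println("records in this batch: " + rdd.count()));
        jssc.start();
        jssc.awaitTermination();
    }
}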


If you are still facing the same problem after changing the number to 2, try changing it to a bigger number.



Reference: http://spark.apache.org/docs/latest/streaming-programming-guide.html

When running a Spark Streaming program locally, do not use “local” or “local[1]” as the master URL. Either of these means that only one thread will be used for running tasks locally. If you are using an input DStream based on a receiver (e.g. sockets, Kafka, Flume, etc.), then the single thread will be used to run the receiver, leaving no thread for processing the received data. Hence, when running locally, always use “local[n]” as the master URL, where n > the number of receivers to run (see Spark Properties for information on how to set the master).

Extending the logic to running on a cluster, the number of cores allocated to the Spark Streaming application must be more than the number of receivers. Otherwise, the system will receive data, but not be able to process them.

Credit to bit1129: http://bit1129.iteye.com/blog/2174751
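For the standalone cluster described in the question, one hedged way to apply this (reusing the SparkConf import from the sketch above) is to cap the application's cores explicitly so that they exceed the number of receivers; the master URL and numbers are illustrative only, not a recommendation for your workload:

// sketch only: on a standalone cluster, request enough cores that
// cores > number of receivers (all values here are placeholders)
SparkConf clusterConf = new SparkConf()
        .setAppName("SparkStreamingExample")
        .setMaster("spark://master-host:7077")   // hypothetical master URL
        .set("spark.cores.max", "6");            // e.g. 1 receiver + 5 cores for processing

The same limit can also be passed on the command line with spark-submit's --total-executor-cores option.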

answered Oct 14 '16 at 9:29 by Wei Chen

I solved this problem by setting the master from local to local[2]. The following related quote is from the Spark Streaming documentation:




But note that a Spark worker/executor is a long-running task, hence it occupies one of the cores allocated to the Spark Streaming application. Therefore, it is important to remember that a Spark Streaming application needs to be allocated enough cores (or threads, if running locally) to process the received data, as well as to run the receiver(s).

answered May 5 '17 at 3:09 by LU KONG, edited May 5 '17 at 9:09 by Amit Joshi

• thx for your suggestions. – LU KONG, May 5 '17 at 6:32

This is not actually a problem: these INFO lines are just log messages, which you can suppress by changing the log level from INFO to WARN or ERROR in conf/log4j.properties.
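As a sketch, assuming you start from the conf/log4j.properties.template that ships with Spark and copy it to conf/log4j.properties, the change is just the root logger line:

# conf/log4j.properties
# raise the root log level so routine INFO messages (such as the JobScheduler lines) are hidden
log4j.rootCategory=WARN, console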



Spark Streaming buffers your input data into small batches and periodically submits each batch for execution, so this is expected behaviour rather than a problem.

answered Mar 29 '16 at 15:42 by Yijie Shen