Pandas - groupby multiple columns and the compare averages of counts












0















I have a 4000 record plus pandas dataframe with records for individual events by time stamp



Timestamp            Date        Holiday  DayOfWeek
2017-01-01 02:25:00 2017-01-01 True Monday
2017-01-01 12:25:00 2017-01-01 True Monday
2017-01-02 03:45:00 2017-01-02 False Tuesday
2017-01-02 15:55:00 2017-01-02 False Tuesday
2017-02-03 01:01:00 2017-02-03 False Thursday
2017-02-03 4:25:00 2017-02-03 False Thursday
2017-04-03 4:25:00 2017-04-03 True Monday


What I'm trying to do is compare the means of events per day by day of the week and if it was on a holiday.

So for each day of the week, compare the the average number of events per day for when that day was a holiday vs when that day was NOT a holiday.



events.groupby(['DayOfWeek', 'Holiday']).count()


Will get me the number of events for each day of the week by holiday



DayOfWeek Holiday  Count
Monday True 50
False 34
Tuesday True 32
False 23
...


But I can't figure out how to combine this with the number of events per individual date



events.groupby('Date').count()
Date Count
01-01-2017 2
01-02-2017 2
01-03-2017 4
....


I want a data frame more like



DayOfWeek Holiday  Mean
Monday True 4.5
False 3.23
Tuesday True 2.1
False 3.2
...


And then ideally make a bar chart from it.



But can't figure out how to combine the operations to create what I want first.










share|improve this question



























    0















    I have a 4000 record plus pandas dataframe with records for individual events by time stamp



    Timestamp            Date        Holiday  DayOfWeek
    2017-01-01 02:25:00 2017-01-01 True Monday
    2017-01-01 12:25:00 2017-01-01 True Monday
    2017-01-02 03:45:00 2017-01-02 False Tuesday
    2017-01-02 15:55:00 2017-01-02 False Tuesday
    2017-02-03 01:01:00 2017-02-03 False Thursday
    2017-02-03 4:25:00 2017-02-03 False Thursday
    2017-04-03 4:25:00 2017-04-03 True Monday


    What I'm trying to do is compare the means of events per day by day of the week and if it was on a holiday.

    So for each day of the week, compare the the average number of events per day for when that day was a holiday vs when that day was NOT a holiday.



    events.groupby(['DayOfWeek', 'Holiday']).count()


    Will get me the number of events for each day of the week by holiday



    DayOfWeek Holiday  Count
    Monday True 50
    False 34
    Tuesday True 32
    False 23
    ...


    But I can't figure out how to combine this with the number of events per individual date



    events.groupby('Date').count()
    Date Count
    01-01-2017 2
    01-02-2017 2
    01-03-2017 4
    ....


    I want a data frame more like



    DayOfWeek Holiday  Mean
    Monday True 4.5
    False 3.23
    Tuesday True 2.1
    False 3.2
    ...


    And then ideally make a bar chart from it.



    But can't figure out how to combine the operations to create what I want first.










    share|improve this question

























      0












      0








      0








      I have a 4000 record plus pandas dataframe with records for individual events by time stamp



      Timestamp            Date        Holiday  DayOfWeek
      2017-01-01 02:25:00 2017-01-01 True Monday
      2017-01-01 12:25:00 2017-01-01 True Monday
      2017-01-02 03:45:00 2017-01-02 False Tuesday
      2017-01-02 15:55:00 2017-01-02 False Tuesday
      2017-02-03 01:01:00 2017-02-03 False Thursday
      2017-02-03 4:25:00 2017-02-03 False Thursday
      2017-04-03 4:25:00 2017-04-03 True Monday


      What I'm trying to do is compare the means of events per day by day of the week and if it was on a holiday.

      So for each day of the week, compare the the average number of events per day for when that day was a holiday vs when that day was NOT a holiday.



      events.groupby(['DayOfWeek', 'Holiday']).count()


      Will get me the number of events for each day of the week by holiday



      DayOfWeek Holiday  Count
      Monday True 50
      False 34
      Tuesday True 32
      False 23
      ...


      But I can't figure out how to combine this with the number of events per individual date



      events.groupby('Date').count()
      Date Count
      01-01-2017 2
      01-02-2017 2
      01-03-2017 4
      ....


      I want a data frame more like



      DayOfWeek Holiday  Mean
      Monday True 4.5
      False 3.23
      Tuesday True 2.1
      False 3.2
      ...


      And then ideally make a bar chart from it.



      But can't figure out how to combine the operations to create what I want first.










      share|improve this question














      I have a 4000 record plus pandas dataframe with records for individual events by time stamp



      Timestamp            Date        Holiday  DayOfWeek
      2017-01-01 02:25:00 2017-01-01 True Monday
      2017-01-01 12:25:00 2017-01-01 True Monday
      2017-01-02 03:45:00 2017-01-02 False Tuesday
      2017-01-02 15:55:00 2017-01-02 False Tuesday
      2017-02-03 01:01:00 2017-02-03 False Thursday
      2017-02-03 4:25:00 2017-02-03 False Thursday
      2017-04-03 4:25:00 2017-04-03 True Monday


      What I'm trying to do is compare the means of events per day by day of the week and if it was on a holiday.

      So for each day of the week, compare the the average number of events per day for when that day was a holiday vs when that day was NOT a holiday.



      events.groupby(['DayOfWeek', 'Holiday']).count()


      Will get me the number of events for each day of the week by holiday



      DayOfWeek Holiday  Count
      Monday True 50
      False 34
      Tuesday True 32
      False 23
      ...


      But I can't figure out how to combine this with the number of events per individual date



      events.groupby('Date').count()
      Date Count
      01-01-2017 2
      01-02-2017 2
      01-03-2017 4
      ....


      I want a data frame more like



      DayOfWeek Holiday  Mean
      Monday True 4.5
      False 3.23
      Tuesday True 2.1
      False 3.2
      ...


      And then ideally make a bar chart from it.



      But can't figure out how to combine the operations to create what I want first.







      python pandas data-science






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 23 '18 at 18:26









      crackernuttercrackernutter

      444




      444
























          1 Answer
          1






          active

          oldest

          votes


















          1














          If I understand correctly, what you're looking for should be



          df.groupby(['Date', 'DayOfWeek', 'Holiday']).count().reset_index().groupby(['DayOfWeek', 'Holiday']).mean()['Timestamp']


          First we group by date (and DayOfWeek and Holiday to preserve the columns - they will always be the same for any single date), count the records per date, reset the index, group by DayOfWeek and Holiday and calculate the mean.



          For the sample data you provided this results in



          DayOfWeek  Holiday
          Monday True 1.5
          Thursday False 2.0
          Tuesday False 2.0





          share|improve this answer



















          • 1





            Thanks! That was a lot easier than I imaged it would be....

            – crackernutter
            Nov 26 '18 at 15:09











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53451453%2fpandas-groupby-multiple-columns-and-the-compare-averages-of-counts%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          1














          If I understand correctly, what you're looking for should be



          df.groupby(['Date', 'DayOfWeek', 'Holiday']).count().reset_index().groupby(['DayOfWeek', 'Holiday']).mean()['Timestamp']


          First we group by date (and DayOfWeek and Holiday to preserve the columns - they will always be the same for any single date), count the records per date, reset the index, group by DayOfWeek and Holiday and calculate the mean.



          For the sample data you provided this results in



          DayOfWeek  Holiday
          Monday True 1.5
          Thursday False 2.0
          Tuesday False 2.0





          share|improve this answer



















          • 1





            Thanks! That was a lot easier than I imaged it would be....

            – crackernutter
            Nov 26 '18 at 15:09
















          1














          If I understand correctly, what you're looking for should be



          df.groupby(['Date', 'DayOfWeek', 'Holiday']).count().reset_index().groupby(['DayOfWeek', 'Holiday']).mean()['Timestamp']


          First we group by date (and DayOfWeek and Holiday to preserve the columns - they will always be the same for any single date), count the records per date, reset the index, group by DayOfWeek and Holiday and calculate the mean.



          For the sample data you provided this results in



          DayOfWeek  Holiday
          Monday True 1.5
          Thursday False 2.0
          Tuesday False 2.0





          share|improve this answer



















          • 1





            Thanks! That was a lot easier than I imaged it would be....

            – crackernutter
            Nov 26 '18 at 15:09














          1












          1








          1







          If I understand correctly, what you're looking for should be



          df.groupby(['Date', 'DayOfWeek', 'Holiday']).count().reset_index().groupby(['DayOfWeek', 'Holiday']).mean()['Timestamp']


          First we group by date (and DayOfWeek and Holiday to preserve the columns - they will always be the same for any single date), count the records per date, reset the index, group by DayOfWeek and Holiday and calculate the mean.



          For the sample data you provided this results in



          DayOfWeek  Holiday
          Monday True 1.5
          Thursday False 2.0
          Tuesday False 2.0





          share|improve this answer













          If I understand correctly, what you're looking for should be



          df.groupby(['Date', 'DayOfWeek', 'Holiday']).count().reset_index().groupby(['DayOfWeek', 'Holiday']).mean()['Timestamp']


          First we group by date (and DayOfWeek and Holiday to preserve the columns - they will always be the same for any single date), count the records per date, reset the index, group by DayOfWeek and Holiday and calculate the mean.



          For the sample data you provided this results in



          DayOfWeek  Holiday
          Monday True 1.5
          Thursday False 2.0
          Tuesday False 2.0






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 23 '18 at 18:51









          andersourceandersource

          51418




          51418








          • 1





            Thanks! That was a lot easier than I imaged it would be....

            – crackernutter
            Nov 26 '18 at 15:09














          • 1





            Thanks! That was a lot easier than I imaged it would be....

            – crackernutter
            Nov 26 '18 at 15:09








          1




          1





          Thanks! That was a lot easier than I imaged it would be....

          – crackernutter
          Nov 26 '18 at 15:09





          Thanks! That was a lot easier than I imaged it would be....

          – crackernutter
          Nov 26 '18 at 15:09


















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53451453%2fpandas-groupby-multiple-columns-and-the-compare-averages-of-counts%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Berounka

          Different font size/position of beamer's navigation symbols template's content depending on regular/plain...

          Sphinx de Gizeh