How to run my tokeniser functions in lists - module object is not callable?

Task: In the code cell below write code to run both the NLTK_Tokenise and your own Tokenise function on a sample of 10 sentences from the Reuters corpus.



I've written the following code:



import pandas as pd
sample_size = 10
r_list = []

for sentence in rcr.sample_raw_sents(sample_size):
    r_list.append(sentence)

my_list = r_list

????
my_list=[i.split(tokenise) for i in my_list]
r_list=[i.split(nltk.tokenize) for i in r_list]

pd.DataFrame(list(zip(my_list,r_list)),columns=["MINE","NLTK"])


I have also considered (from just past the "????"):



my_list = [i.split() for i in my_list]
r_list = [i.split() for i in r_list]

tok = tokenise(my_list)
cortok = nltk.tokenize(r_list)

pd.DataFrame(list(zip(tok,cortok)),columns=["MINE","NLTK"])


Now I've got 2 lists holding the same corpus sentences, and I want to apply my functions to them, but I can't figure out how to apply a function rather than a string. Should I just copy and paste my tokenisers in as strings? I'm sure there is a better way to do this. For the second option, I doubt I need the 2 separate lists; I could tokenise the one list and assign the results to new variables.
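
For reference, str.split expects a separator string (or None), so passing it a function raises a TypeError; applying a function to each sentence means calling it inside a comprehension. A minimal sketch, assuming a hypothetical whitespace-based simple_tokenise standing in for the tokenise function, which is not shown in the question:

```python
def simple_tokenise(sentence):
    """Hypothetical stand-in for the asker's tokenise(): split on whitespace."""
    return sentence.split()


sentences = ["Oil prices rose.", "Shares fell sharply."]

# str.split takes a separator string, not a function, so this fails
try:
    sentences[0].split(simple_tokenise)
    split_error = None
except TypeError as exc:
    split_error = exc

# Call the function on each element instead
tokenised = [simple_tokenise(s) for s in sentences]
```

The comprehension yields one token list per sentence, which is the shape the later DataFrame step needs.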



Further progress, in case anyone can help:



import pandas as pd
sample_size = 10
r_list = []

for sentence in rcr.sample_raw_sents(sample_size):
    r_list.append(sentence)

new_list = [i.split()[0] for i in r_list]

tok = tokenise(new_list)
cortok = nltk.tokenize(new_list)

pd.DataFrame(list(zip(tok,cortok)),columns=["MINE","NLTK"])


What I think I want to do is separate the list into different variables and then make a DataFrame with 10 rows (sample_size). However, I have no idea how to split a list of that length into different variables unless I literally handle items 1, 2, 3, 4, ..., 10 one by one.
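
Separate variables are probably not needed: each sentence's token list is already one element of a list, and zip pairs the i-th result of each tokeniser into one row. A rough sketch, where whitespace_tokenise and regex_tokenise are hypothetical stand-ins for the two tokenisers being compared:

```python
import re


def whitespace_tokenise(sentence):
    """Hypothetical stand-in for a hand-written tokeniser."""
    return sentence.split()


def regex_tokenise(sentence):
    """Hypothetical stand-in for a library tokeniser: words and punctuation."""
    return re.findall(r"\w+|[^\w\s]", sentence)


sentences = ["Oil prices rose.", "Shares fell sharply."]

mine = [whitespace_tokenise(s) for s in sentences]
other = [regex_tokenise(s) for s in sentences]

# zip pairs the i-th result of each tokeniser: one row per sentence
rows = list(zip(mine, other))
```

A list like rows can be passed to pd.DataFrame(rows, columns=["MINE", "NLTK"]), which gives one row per sentence regardless of sample_size.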



I've made further progress and realised I need to use map():



import pandas as pd
sample_size = 10
r_list = []

for sentence in rcr.sample_raw_sents(sample_size):
    r_list.append(sentence)

tok = map(tokenise,r_list)
cortok = map(nltk.tokenize,r_list)

pd.DataFrame(list(zip(tok,cortok)),columns=["MINE","NLTK"])


However, something is still wrong with my final line: it raises TypeError: 'module' object is not callable. I've googled the error but I'm still not sure what the problem is, given that pandas has already been imported.



I've now realised I had a silly error: I passed nltk.tokenize (a module) rather than the word_tokenize function.
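
The same error can be reproduced without NLTK. A small sketch using the standard-library json module as a stand-in (an assumption made so the snippet runs anywhere): json plays the role of the nltk.tokenize module, and json.dumps the role of the word_tokenize function inside it.

```python
import json

# A module is not callable; a function defined inside it is
module_is_callable = callable(json)
function_is_callable = callable(json.dumps)

# Calling the module itself raises the TypeError from the question
try:
    json("[1, 2]")
    message = None
except TypeError as exc:
    message = str(exc)
```

Because map() only calls its first argument when the result is consumed, the error surfaces on the line that builds the list for the DataFrame rather than on the map() line.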










Tags: python pandas list module token

asked Nov 23 '18 at 16:51 by bemzoo (edited Nov 24 '18 at 14:51)

1 Answer

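Make use of map(), and pass it the word_tokenize function rather than the nltk.tokenize module:

from itertools import zip_longest  # needed for zip_longest below
from nltk.tokenize import word_tokenize
import pandas as pd

sample_size = 10
r_list = []

# rcr and tokenise are assumed to be defined earlier in the notebook
for sentence in rcr.sample_raw_sents(sample_size):
    r_list.append(sentence)

# Apply each tokeniser to every sentence
tok = map(tokenise, r_list)
cortok = map(word_tokenize, r_list)

pd.DataFrame(list(zip_longest(tok, cortok)), columns=["MINE", "NLTK"])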
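
Two details of this approach are worth illustrating. In Python 3, map() returns a lazy iterator that can be consumed only once, and zip_longest pads the shorter input with None where zip would truncate. A small standalone sketch with made-up data (the names below are illustrative only):

```python
from itertools import zip_longest

words = ["a", "b", "c"]

# map() is lazy: nothing is computed until the iterator is consumed
upper = map(str.upper, words)
first_pass = list(upper)
second_pass = list(upper)  # the iterator is already exhausted

short = [1, 2]
longer = [10, 20, 30]

truncated = list(zip(short, longer))       # stops at the shorter input
padded = list(zip_longest(short, longer))  # pads the shorter input with None
```

Since both map() calls in the answer run over the same r_list, the two results have equal length, so zip and zip_longest should produce the same rows here.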





answered Nov 24 '18 at 14:51 by bemzoo