Count unique words in all text files in a directory, and delete those having fewer than 2?























This gets me the count. But how do I delete the files whose count is less than 2?



$ cat ./a1esso.doc | grep -o -E '\w+' | sort -u -f | wc --words
1
$ cat ./a1brit.doc | grep -o -E '\w+' | sort -u -f | wc --words
4


How can I grab the names of the files that have fewer than 2 unique words, so that they can be deleted? I will be scanning millions of files. A find command can find all the files, but it seems the filename needs to be propagated through the pipeline, with the rm command used at the right-hand end.



Thanks for reading.



Update:



The correct answer is going to use an input pipeline to feed filenames. This is not negotiable. This program is not for use on the one input file shown in the example; its input comes from a dynamic list of many files.



The accepted answer will also include a filter that identifies the names of the files meeting the criterion. This is not negotiable either.










      bash uniq wc






      edited Nov 21 at 3:07

























      asked Nov 21 at 0:22









      Geoffrey Anderson

























          1 Answer













          You could do this …



           test "$(grep -o -E '\w+' ./a1esso.doc | sort -u -f | wc --words)" -lt 2 && rm ./a1esso.doc


          Update: removed useless cat as per David's comment.
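The one-file test above could be extended to a list of filenames arriving on standard input. The following is only a sketch under stated assumptions: the function names unique_word_count, few_words_filter and delete_listed are hypothetical, the files are assumed to be plain text, filenames are assumed to contain no newline characters, and grep is assumed to understand \w (as GNU grep does).

```bash
# Hypothetical helper: print the number of distinct words in file "$1",
# compared case-insensitively, using the same pipeline as the question.
unique_word_count() {
    grep -o -E '\w+' -- "$1" | sort -u -f | wc -w
}

# Hypothetical filter: read one filename per line on stdin and print
# only the names of files that have fewer than 2 distinct words.
few_words_filter() {
    while IFS= read -r file; do
        if [ "$(unique_word_count "$file")" -lt 2 ]; then
            printf '%s\n' "$file"
        fi
    done
}

# Hypothetical final stage: read one filename per line on stdin and
# delete each named file.
delete_listed() {
    while IFS= read -r file; do
        rm -- "$file"
    done
}

# Usage sketch (assumes the files of interest end in .doc):
#   find . -type f -name '*.doc' | few_words_filter                  # dry run, lists names only
#   find . -type f -name '*.doc' | few_words_filter | delete_listed  # actually deletes
```

Running the pipeline without the last stage first shows which files would be removed. Keeping the filter separate from the deletion means the filename travels through the pipeline as plain text, and rm only appears at the right-hand end.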

























          • cat ./a1esso.doc is an Unnecessary Use Of Cat (UUOC). Instead: grep -o -E '\w+' ./a1esso.doc | ...
            – David C. Rankin
            Nov 21 at 0:37










          • Yes, good point.
            – Red Cricket
            Nov 21 at 0:37










          • The answer cannot be accepted as written. The correct answer is going to use cat to feed filenames, as I already showed. This is not negotiable.
            – Geoffrey Anderson
            Nov 21 at 3:05










          • You don't need cat to "feed filenames". grep takes a filename as an argument. cat file | grep ... is equivalent to grep … file; it is just that the former is considered bad form.
            – Red Cricket
            Nov 21 at 3:19
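The point made in the last comment, that grep can open the file named as its argument, can be checked with a small sketch. The sample file and the variable names via_cat and via_arg are made up for this comparison, and grep is again assumed to understand \w.

```bash
# Hypothetical sample file, created only for this comparison.
sample=$(mktemp)
printf 'one two two THREE three\n' > "$sample"

# Form 1: cat feeds the file contents to grep on standard input.
via_cat=$(cat "$sample" | grep -o -E '\w+' | sort -u -f | wc -w)

# Form 2: grep opens the file itself, so no extra cat process is started.
via_arg=$(grep -o -E '\w+' "$sample" | sort -u -f | wc -w)

echo "via cat: $via_cat, via argument: $via_arg"
rm -- "$sample"
```

Both forms should report the same count, because grep sees the same bytes either way. The second form also has the filename at hand, which is what a later rm needs.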













          edited Nov 21 at 0:38

























          answered Nov 21 at 0:29









          Red Cricket
