Count unique words in all text files in directory, and delete those having less than 2?
This gets me the count. But how do I delete the files whose count is < 2?
$ cat ./a1esso.doc | grep -o -E '\w+' | sort -u -f | wc --words
1
$ cat ./a1brit.doc | grep -o -E '\w+' | sort -u -f | wc --words
4
How can I capture the names of the files that have fewer than 2 unique words, so that they can be deleted? I will be scanning millions of files. A find command can list all the files, but it seems the filename needs to be propagated through the pipeline so that rm can be applied at the right-hand end.
Thanks for reading.
Update:
The correct answer is going to use an input pipeline to feed filenames. This is not negotiable. The program is not meant for the single input file shown in the example; its input comes from a dynamic list of many files.
The accepted answer will also include a filter that identifies the names of the files meeting the criterion. This is not negotiable either.
bash uniq wc
edited Nov 21 at 3:07
asked Nov 21 at 0:22
Geoffrey Anderson
534514
1 Answer
You could do this …
test $(grep -o -E '\w+' ./a1esso.doc | sort -u -f | wc --words) -lt 2 && rm ./a1esso.doc
Update: removed the useless cat as per David's comment.
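One possible way to extend the one-liner above to the many-file case described in the question is sketched below. It assumes bash plus a grep that understands `\w` (GNU grep does), and it passes filenames NUL-delimited so that names containing spaces or newlines survive the pipeline. The function names `count_unique_words` and `few_word_files` are hypothetical, introduced only for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the number of distinct words in one file,
# ignoring case. Each word lands on its own line, so counting lines
# after "sort -u -f" gives the number of unique words.
count_unique_words() {
    grep -o -E '\w+' -- "$1" | sort -u -f | wc -l
}

# Hypothetical filter: read NUL-delimited filenames on stdin and write
# back (also NUL-delimited) only those with fewer than 2 unique words.
few_word_files() {
    local file count
    while IFS= read -r -d '' file; do
        count=$(count_unique_words "$file")
        if (( count < 2 )); then
            printf '%s\0' "$file"
        fi
    done
}
```

A dry run could then look like `find . -type f -name '*.doc' -print0 | few_word_files | xargs -0 echo rm --`, which only prints the rm command; dropping the `echo` would perform the deletion once the list looks right. Note that this starts a few processes per file, so over millions of files it may be slow.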
cat ./a1esso.doc is an Unnecessary Use Of cat (UUOc). Instead: grep -o -E '\w+' a1esso.doc | ...
– David C. Rankin
Nov 21 at 0:37
Yes, good point.
– Red Cricket
Nov 21 at 0:37
The answer cannot get chosen as written. The correct answer is going to use cat to feed filenames, as I already showed. This is not negotiable.
– Geoffrey Anderson
Nov 21 at 3:05
You don't need cat to "feed filenames". grep takes a filename as an argument. cat file | grep ... is equivalent to grep … file; it is just that the former is considered bad form.
– Red Cricket
Nov 21 at 3:19
edited Nov 21 at 0:38
answered Nov 21 at 0:29
Red Cricket
4,03283381