linux – Check if folder contains files with extensions and write directories into categories

The "unary operator expected" error occurs because [ and * (in your *fastq.gz) work independently.

[ is not shell syntax. [ is a regular command (a builtin in Bash, but still a command) and ] is its last argument, a mandatory one. Anything in between is an argument too.

The shell expands /path/to/dir/*fastq.gz to one or more words before it calls [. [ will see these words plus the mandatory ] as arguments. Depending on the number of arguments and what they are, [ expects zero or more argument(s) to be operators (like -f).

Your [ /path/to/dir/*fastq.gz ] will be valid if /path/to/dir/*fastq.gz expands to a single argument (note that “will be valid” is not equivalent to “will do what you want”). This includes cases where * matches nothing; traditionally (and by default in Bash) if there is no match then /path/to/dir/*fastq.gz is passed on as-is. It may also happen that /path/to/dir/*fastq.gz expands to multiple words, none of which looks like an operator [ understands. The error you got is most likely from a case where the pattern expanded to two words.
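
To see the expansion at work, here is a small sketch. count_words is a hypothetical helper (not something from your script) that only reports how many arguments it received, and the scratch directory comes from mktemp -d. The sketch assumes default glob settings (neither nullglob nor failglob is set).

```shell
# count_words is a hypothetical helper: it prints the number of arguments it got.
count_words () { echo "$#"; }

tmp=$(mktemp -d)                    # scratch directory for the demo

# No match: the pattern is passed through unchanged, as one word.
count_words "$tmp"/*fastq.gz        # prints 1

touch "$tmp/a.fastq.gz"
count_words "$tmp"/*fastq.gz        # prints 1 (one real match)

touch "$tmp/b.fastq.gz"
count_words "$tmp"/*fastq.gz        # prints 2

# With two words [ receives an invalid expression and complains.
[ "$tmp"/*fastq.gz ] 2>/dev/null
echo "exit status of [: $?"         # Bash reports status 2 here

rm -r "$tmp"
```

Note that the first two cases both give 1, which is why counting words alone cannot tell "no match" from "one match" under the default settings.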

Later you used [ "$in"/*spring -f ]. This is even worse, because you probably wanted something like [ -f some/path ] where -f is before the path to test. Still, [ -f "$in"/*spring ] is not a robust fix because "$in"/*spring may in general expand to multiple arguments, and [ cannot cope with them. You wrote there is at most one *spring file per directory, so in your case code like this may kinda work; it’s still poor code though.

With [, do not use wildcards like * that may expand to multiple words; this will fail immediately or soon. [[ is different under the hood but it’s not good for your purpose either.
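
A quick sketch of why [[ does not help, assuming Bash and a scratch directory made with mktemp -d: Bash does not perform pathname expansion on the words inside [[ ]], so the pattern is tested as a literal file name.

```shell
#!/usr/bin/env bash
# Assumption: run in Bash; [[ is not available in plain POSIX sh.
tmp=$(mktemp -d)
touch "$tmp/sample.spring"

# [ gets the expanded file name (exactly one match here), so the test succeeds.
if [ -f "$tmp"/*spring ]; then echo "[ : found"; else echo "[ : not found"; fi

# Inside [[ ]] the * is not expanded; Bash looks for a file literally named "*spring".
if [[ -f "$tmp"/*spring ]]; then echo "[[ : found"; else echo "[[ : not found"; fi

rm -r "$tmp"
```

With the single file above this prints "[ : found" and then "[[ : not found".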

You want to know how many files a pattern like /path/to/dir/*fastq.gz matches. The right way to do it is to assign the result of the expansion to an array. Portably there’s only one array: the array of arguments of the shell script (or the shell function); and you need extra code to detect a case of zero matches (that still generates one word: the unexpanded pattern string). Your question is tagged bash, so I will use a named array and a few other non-portable features:

# non-portable code, works in Bash
check_dir () (
   dir="${1-.}"
   dir="${dir%/}/"
   [ -d "$dir" ] || { echo "Not a directory." >&2; return 1; }
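   shopt -s nullglob                 # a pattern with no match expands to nothing
   files=( "$dir"*fastq.gz )         # $dir already ends with /
   nf="${#files[@]}"                 # number of *fastq.gz files
   files=( "$dir"*spring )
   ns="${#files[@]}"                 # number of *spring files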
   printf '%s\t%s\t%s\n' "$nf" "$ns" "$dir"
)

Usage: check_dir path/to/dir or check_dir (the default path is .). The function will print the number of *fastq.gz files, a tab, the number of *spring files, a tab, finally the examined path (printed with a trailing /).
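
As an aside, the portable approach mentioned earlier (the positional parameters as the only array, plus an extra check for zero matches) could be sketched as follows. count_matches is a hypothetical name and the suffix argument is my own addition; check_dir above remains the function used in the rest of this answer.

```shell
# Portable sketch (POSIX sh). count_matches is a hypothetical helper.
# Usage: count_matches [dir [suffix]]   e.g. count_matches path/to/dir fastq.gz
count_matches () (
   dir="${1-.}"
   dir="${dir%/}/"
   suffix="${2-}"
   set -- "$dir"*"$suffix"     # the positional parameters now hold the expansion
   # Zero matches still leave one word: the unexpanded pattern.
   # Treat it as "no match" only if no such file (or dangling symlink) exists.
   if [ "$#" -eq 1 ] && [ ! -e "$1" ] && [ ! -L "$1" ]; then
      echo 0
   else
      echo "$#"
   fi
)
```

The body runs in a subshell, so overwriting the positional parameters with set -- does not affect the caller.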

Now you can analyze a directory tree (the below function requires the above function to be defined):

# non-portable code, works in Bash
check_dirs () (
   dir="${1-.}"
   dir="${dir%/}/"
   [ -d "$dir" ] || { echo "Not a directory." >&2; return 1; }
   shopt -s nullglob globstar
   for d in "$dir"**/; do
      check_dir "$d"
   done
)

Usage: check_dirs path/to/dir or check_dirs (the default path is .).

Notes:

  • For a large directory tree check_dirs may seem to initially stall. This is because for d in "$dir"**/ needs to be fully expanded before check_dir is ever called and prints anything.

  • The functions are deliberately defined as subshells (check_dir () ( as opposed to check_dir () {), so shell options (shopt) and all variables are local.

  • If you want check_dir to count hidden files, you need dotglob in this function (i.e. shopt -s nullglob dotglob).

  • If you want check_dirs to descend to hidden directories, you need dotglob in this function (i.e. shopt -s nullglob globstar dotglob).

  • Unless the names of your directories contain newline characters, the output from check_dir or check_dirs is easily parsable with standard tools. Useful commands: sort -n, grep $'^2\t1\t', cut -f 3-.

    E.g. to find directories under ./ with exactly one *fastq.gz file and exactly zero *spring files:

    check_dirs | grep $'^1\t0\t' | cut -f 3-
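
Regarding the note about subshells: a minimal sketch of the difference, assuming Bash. with_subshell and with_braces are hypothetical functions that do the same thing and differ only in how the body is grouped.

```shell
#!/usr/bin/env bash
# Assumption: run in Bash (shopt is a Bash builtin).
with_subshell () ( shopt -s nullglob; v=changed; )   # body runs in a subshell
with_braces   () { shopt -s nullglob; v=changed; }   # body runs in the current shell

v=original
shopt -u nullglob

with_subshell
shopt -q nullglob && echo "nullglob on" || echo "nullglob off"   # nullglob off
echo "$v"                                                        # original

with_braces
shopt -q nullglob && echo "nullglob on" || echo "nullglob off"   # nullglob on
echo "$v"                                                        # changed
```

This is why check_dir and check_dirs can enable nullglob and globstar without changing how globs behave in the rest of your session.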
    
