Regarding UNIX Grep Command

Developer https://www.devze.com 2022-12-21 07:32 Source: Internet
I need to write a shell script that picks up all the files (not directories) in the /exp/files directory. For each file, I want to check whether its last line has been received. The last line of the file is a trailer record, and its third field is the number of data records, e.g. 2315 (the total number of lines in the file minus 2 for the header and trailer). In my UNIX shell script I want to check that the last line is a trailer record by checking for T, and that the number of lines in the file equals (2315+2). If both checks succeed, I want to move the file to a different directory, /exp/ready.

tail -1 test.csv 
T,Test.csv,2315,80045.96

Also, in the input file, any of the trailer record's fields may sometimes be enclosed in double quotes:

"T","Test.csv","2315","80045.96"
"T", Test.csv, 2212,"80045.96"
T,Test.csv,2315,80045.96


You can test for the presence of the last line with the following:

tail -n 1 "${filename}" | grep -E -q '^T,|^"T",'
rc=$?

At that point $rc will be 0 if the line started with either T, or "T",, assuming that's enough to catch the trailer record.
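As a quick sanity check of that pattern, here is a small sketch that runs it against the three trailer styles from the question plus a header line. The function name is_trailer is my own invention for illustration, not part of the script further down.

```shell
#!/bin/bash
# is_trailer (hypothetical helper): succeeds when the line passed as $1
# starts with T, or "T", using the same pattern as the test above.
is_trailer() {
    printf '%s\n' "$1" | grep -E -q '^T,|^"T",'
}

for line in '"T","Test.csv","2315","80045.96"' \
            '"T", Test.csv, 2212,"80045.96"' \
            'T,Test.csv,2315,80045.96' \
            'H,Test.csv,other rubbish goes here'; do
    if is_trailer "$line"; then
        echo "trailer:     $line"
    else
        echo "not trailer: $line"
    fi
done
```

Only the header line should be reported as "not trailer". Note that the pattern only looks at the first field, so a data line that happens to start with T, would also pass.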

Once you've established that, you can extract the line count with:

lc=$(wc -l < "${filename}")

and you can get the expected line count with:

elc=$(tail -n 1 "${filename}" | awk -F, '{sub(/^"/,"",$3);print 2+$3}')

and compare the two.
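To make the arithmetic concrete, the sketch below computes the expected line count for the trailer variants in the question. The helper name expected_lines is hypothetical, and as a small variation it strips both quotes and blanks from the third field rather than only a leading quote.

```shell
#!/bin/bash
# expected_lines (hypothetical helper): takes a trailer line in $1 and
# prints the record count from field 3 plus 2 (header and trailer).
expected_lines() {
    printf '%s\n' "$1" | awk -F, '{ gsub(/[" ]/, "", $3); print 2 + $3 }'
}

expected_lines 'T,Test.csv,2315,80045.96'           # prints 2317
expected_lines '"T","Test.csv","2315","80045.96"'   # prints 2317
expected_lines '"T", Test.csv, 2212,"80045.96"'     # prints 2214
```

A file is complete when this number equals the output of wc -l for that file.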

So, tying all that together, this would be a good start. It outputs the file itself (my test files num[1-9].tst) along with a message indicating whether the file is okay or why it is not okay.

#!/bin/bash
cd /exp/files || exit 1
for fspec in *.tst ; do
    if [[ -f ${fspec} ]] ; then
        # Show the file contents, indented.
        sed 's/^/   /' "${fspec}"
        # Trailer check: the last line must start with T, or "T",
        tail -n 1 "${fspec}" | grep -E -q '^T,|^"T",'
        rc=$?
        if [[ ${rc} -eq 0 ]] ; then
            # Actual line count versus trailer count plus header and trailer.
            lc=$(wc -l < "${fspec}")
            elc=$(tail -n 1 "${fspec}" | awk -F, '{sub(/^"/,"",$3);print 2+$3}')
            if [[ ${lc} -eq ${elc} ]] ; then
                echo '***' File ${fspec} is done and dusted.
            else
                echo '***' File ${fspec} line count mismatch: ${lc}/${elc}.
            fi
        else
            echo '***' File ${fspec} has no valid trailer.
        fi
    else
        ls -ald "${fspec}" | sed 's/^/   /'
        echo '***' File ${fspec} is not a regular file.
    fi
done

The sample run, showing the test files I used:

   H,Test.csv,other rubbish goes here
   this file does not have a trailer
*** File num1.tst has no valid trailer.
   H,Test.csv,other rubbish goes here
   this file does have a trailer with all quotes and correct count
   "T","Test.csv","1","80045.96"
*** File num2.tst is done and dusted.
   H,Test.csv,other rubbish goes here
   this file does have a trailer with all quotes but bad count
   "T","Test.csv","9","80045.96"
*** File num3.tst line count mismatch: 3/11.
   H,Test.csv,other rubbish goes here
   this file does have a trailer with all quotes except T, and correct count
   T,"Test.csv","1","80045.96"
*** File num4.tst is done and dusted.
   H,Test.csv,other rubbish goes here
   this file does have a trailer with no quotes on T or count and correct count
   T,"Test.csv",1,"80045.96"
*** File num5.tst is done and dusted.
   H,Test.csv,other rubbish goes here
   this file does have a trailer with quotes on T only, and correct count
   "T",Test.csv,1,80045.96
*** File num6.tst is done and dusted.
   drwxr-xr-x+ 2 pax None 0 Feb 23 09:55 num7.tst
*** File num7.tst is not a regular file.
   H,Test.csv,other rubbish goes here
   this file does have a trailer with all quotes except the bad count
   "T","Test.csv",8,"80045.96"
*** File num8.tst line count mismatch: 3/10.
   H,Test.csv,other rubbish goes here
   this file does have a trailer with no quotes and a bad count
   T,Test.csv,7,80045.96
*** File num9.tst line count mismatch: 3/9.
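The script above only reports on each file, while the question also asks for complete files to be moved to /exp/ready. One way that step might look is sketched below. The function name move_if_complete and its two arguments are my own assumptions, and the checks are the same trailer and line-count tests described earlier.

```shell
#!/bin/bash
# move_if_complete FILE DESTDIR (hypothetical helper)
# Moves FILE into DESTDIR when its last line is a trailer and the trailer's
# third field plus 2 equals the number of lines in FILE.
# Returns non-zero, leaving FILE in place, when any check fails.
move_if_complete() {
    local file=$1 dest=$2 last lc elc
    [ -f "$file" ] || return 1
    last=$(tail -n 1 "$file")
    printf '%s\n' "$last" | grep -E -q '^T,|^"T",' || return 1
    lc=$(wc -l < "$file")
    elc=$(printf '%s\n' "$last" | awk -F, '{ gsub(/[" ]/, "", $3); print 2 + $3 }')
    [ "$lc" -eq "$elc" ] || return 1
    mv -- "$file" "$dest"/
}

# Usage, with the directories from the question:
# for f in /exp/files/*; do move_if_complete "$f" /exp/ready; done
```

Because wc -l counts newline characters, a trailer line that is not terminated by a newline makes the actual count one short, so such a file is left where it is.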


If you want to move the files after they've been written and closed then you should consider using something like inotify, incron, FAM, gamin, etc.
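As a rough illustration of the inotify route, the sketch below assumes Linux with the inotifywait command from the inotify-tools package installed. The function names handle_closed_file and watch_dir are hypothetical, and the handler is only a placeholder for the trailer and line-count check.

```shell
#!/bin/bash
# handle_closed_file (hypothetical placeholder): this is where the trailer
# and line-count check would go; here it only reports the file name.
handle_closed_file() {
    echo "closed after write: $1"
}

# watch_dir DIR: run the handler each time a file in DIR is closed after
# having been opened for writing. Assumes inotifywait is installed.
watch_dir() {
    inotifywait -m -q -e close_write --format '%w%f' "$1" |
    while IFS= read -r path; do
        handle_closed_file "$path"
    done
}

# Usage (runs until interrupted):
# watch_dir /exp/files
```

The close_write event fires when a writer closes the file, which avoids inspecting a file that is still being written.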


This code does all of the logic calculations via a single call to awk which makes it very efficient. It also does NOT hardcode the example value of 2315 but rather uses the value contained in the trailer line as I believe this was your intent.

Remember to remove the echo if you are satisfied with the results.

#!/bin/bash

for file in /exp/files/*; do
  if [[ -f "$file" ]]; then
    if nawk -F, '{v0=$0;v1=$1;v3=$3}END{gsub(/"/,"",v1);gsub(/"/,"",v3);exit !(v1 == "T" && NR == v3+2)}' "$file"; then
      echo mv "$file" /exp/ready
    fi
  fi
done

Update

I had to add {v0=$0;v1=$1;v3=$3} because SunOS's implementation of awk does not let END{} access the field variables ($0, $1, $2, etc.); they must be saved to user-defined variables if you want to work on them inside END{}. See the last row of the first table in this awk feature comparison link.
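The same idea in isolation, as a minimal sketch: the last record is copied into an ordinary variable in the main rule, and END works only on that copy. The helper name trailer_count is hypothetical, and it uses plain awk rather than nawk.

```shell
#!/bin/bash
# trailer_count FILE (hypothetical helper): prints field 3 of the last line
# of FILE with quotes and blanks removed. The record is saved in "last" by
# the main rule, so END never reads $0 or the field variables directly.
trailer_count() {
    awk -F, '{ last = $0 }
             END { split(last, f, ","); gsub(/[" ]/, "", f[3]); print f[3] }' "$1"
}

# Usage:
# trailer_count /exp/files/Test.csv
```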


I don't have a UNIX shell handy here, but

#!/bin/bash
mapfile -t files < <(find /exp/files -type f)

should put all the files into a Bash array (note that find also descends into subdirectories); iterating through them as paxdiablo suggested above should then get you sorted.
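For file names that may contain spaces or newlines, one possible variant reads the find output NUL-delimited. This is only a sketch: the function name list_regular_files is hypothetical, it needs Bash, and it assumes a find that supports -maxdepth and -print0 (GNU and BSD find do).

```shell
#!/bin/bash
# list_regular_files DIR (hypothetical helper): fills the global array
# "files" with the regular files directly inside DIR, without descending
# into subdirectories.
list_regular_files() {
    files=()
    local f
    while IFS= read -r -d '' f; do
        files+=("$f")
    done < <(find "$1" -maxdepth 1 -type f -print0)
}

# Usage:
# list_regular_files /exp/files
# for f in "${files[@]}"; do echo "$f"; done
```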


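destination=/exp/ready
for file in /exp/files/*.csv
do
    # Strip double (\042) and single (\047) quotes from the trailer, then
    # require T in field 1 and the hardcoded count 2315 in field 3.
    var=$(tail -1 "$file" | awk -F"," '{ gsub(/\042|\047/,"") }
    $1=="T" && $3 == "2315" { print "ok" }')
    if [ "$var" = "ok" ]; then
        # Remove the echo to actually move the file.
        echo mv "$file" "$destination"
    else
        echo "invalid: $file"
    fi
done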


#!/bin/bash

# Write the helper script that checks a single file.
cat > findready.sh <<'HERE'
#!/bin/bash

# Line count of the file, and the first character of its last line
# after removing double quotes.
NUMLINES=$(wc -l < "$1")
TRAILER=$(tail -1 "$1" | tr -d '"' | sed 's/^\(.\).*$/\1/')

if [[ $NUMLINES -eq 2317 && $TRAILER == "T" ]] ; then
    mv "$1" /exp/ready/
fi
HERE

chmod a+x findready.sh

find /exp/files/ -type f -name '*.csv' -exec ./findready.sh {} ';' > /dev/null 2>&1