A few notes on awk and regular expressions

秋日摘桂花做饼

2020-01-13

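It has been more than half a year since I graduated. People say that once you walk out of the university gates, society changes you, mostly meaning that you become smoother and more worldly. Yet I don't feel I have changed at all. I am still the same me, and I can't tell whether I have moved forward or slipped back.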

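Hiding a column in awk output

awk is a command I use fairly often, but I never really learned its more advanced syntax. I recently found one trick that is especially handy, so I'm writing it down.

The requirement: drop the first column of the data and print only the remaining columns. I could of course write $2, $3 ... $n, but there are a lot of columns, so is there a better way? There is: assign an empty string to the first field.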

awk '{$1=""; print $0}' somefile
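Two details are easy to trip over here. Inside a single-quoted awk program the empty string has to be written with double quotes, because a pair of single quotes would just close and reopen the shell quoting. Also, assigning to $1 makes awk rebuild the line with single spaces between fields, so the output keeps a leading space where the first column used to be. A minimal sketch that trims it as well, assuming whitespace-separated input (the function name and the sample data are made up for illustration):

```shell
# Drop the first column and the separator it leaves behind.
drop_first_column() {
    awk '{ $1 = ""; sub(/^[ \t]+/, ""); print }' "$@"
}

# Made-up sample input, for illustration only.
printf 'a b c\nd e f\n' | drop_first_column
```

Note that the rebuilt line uses single spaces between the remaining columns, so the original alignment is not preserved.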

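What [] does in a regular expression

Until now I only knew patterns like [0-9a-zA-Z], meaning the character can be a digit or an upper- or lowercase letter. My understanding stopped there, so I couldn't write one myself when I actually needed it.

This example happens to be an Oracle regular expression: I needed to remove the colon-prefixed tokens from a long string.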

select regexp_replace(':dfsa_kj^dfk_a+234^:jkj_2+34.^we09',':[0-9a-zA-Z_.+-]+','') from dual;

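This replaces all of those odd, irregular colon-prefixed runs with the empty string. The brackets match any single one of the listed characters, and the plus sign after them repeats that match one or more times, so every position in the run can be any of those characters. Put together, the pattern matches a colon followed by a consecutive run of characters from the brackets, and each match is replaced with nothing. For the sample string above, what remains is ^dfk_a+234^^we09, because ^ is not in the bracket list and therefore ends each match.

[] matches any one of the characters listed inside it, and the order in which they are listed does not matter. Inside the brackets . and + are literal characters, and - is literal here because it comes last. Combined with +, the brackets become really useful: together they describe a consecutive substring in which every position can be any of the listed characters.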
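The same bracket expression can be tried outside the database. This is only a sketch, assuming a sed that supports extended regular expressions through -E; the function name is made up. The g flag makes sed replace every match on the line, which corresponds to regexp_replace replacing all occurrences by default.

```shell
# Remove every colon-prefixed run of characters taken from the bracket list.
strip_colon_tokens() {
    sed -E 's/:[0-9a-zA-Z_.+-]+//g'
}

printf '%s\n' ':dfsa_kj^dfk_a+234^:jkj_2+34.^we09' | strip_colon_tokens
# prints: ^dfk_a+234^^we09
```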

This version of the shadowsocks script adds a speed test

#!/bin/zsh


# Clear the screen before asking any questions.
clear

# sslink is a symlink to this script: resolve it and work from the script's real directory.
realdir=$(dirname "$(readlink -f "$(command -v sslink)")")
cd "$realdir" || exit 1

# Build the server list (ss.hl) from the manually downloaded page (ss.html).
if [[ -e ss.html ]];then
    echo -e "\033[32m <<<<<<<<<< the file exists, do you want to get a new one? (default no) >>>>>>>>>>  \033[0m"
    echo -e -n "\033[32m >>>>>>>>>>  \033[0m"
    read answer
    if [[ $answer == 'yes' ]];then
        # Keep the plain <td align=...> cells; field 3 is the text between the tags.
        grep '<td align=' ss.html | grep -v 'class' | grep -E '^<' | awk -F'>|<' '{print $3}' > ss.hl
    fi
else
    echo -e "\033[32m <<<<<<<<<< the file doesn't exist, you need to get it >>>>>>>>>>   \033[0m"
    exit 1
fi

echo -e "\033[32m <<<<<<<<<< do you want to retest and pick the fastest one? (default no) >>>>>>>>>>  \033[0m"
echo -e -n "\033[32m >>>>>>>>>>  \033[0m"
read choose
if [[ $choose == "yes" ]];then
    # Start from empty result files; i is the current line of ss.hl, j the lines left to read.
    :>ss.tmp && :>ss.final
    i=1 && j=$(wc -l < ss.hl)
    while [[ $j -gt 0 ]]
    do
        # Each server takes four consecutive lines: address, port, password, cipher.
        s=$(sed -n "${i}p" ss.hl) && ((i++))
        p=$(sed -n "${i}p" ss.hl) && ((i++))
        k=$(sed -n "${i}p" ss.hl) && ((i++))
        m=$(sed -n "${i}p" ss.hl) && ((i++))
        ss="sslocal -s $s -p $p -k $k -m $m -l 1080"

        # Start the local proxy in the background and give it time to come up.
        eval $ss >/dev/null 2>&1 &
        sleep 2
        # Time the request in epoch seconds; %S (second of the minute) wraps around at 60.
        start=$(date +%s)
        timeout 6 proxychains -q curl www.google.com >/dev/null 2>&1
        ok=$?
        stop=$(date +%s)

        tm=$((stop-start))
        # Record the server only if curl succeeded before the 6-second timeout.
        if [[ $ok -eq 0 ]];then
            echo "$tm sslocal -s $s -p $p -k $k -m $m -l 1080" >> ss.tmp
        fi
        j=$((j-4))
        killall sslocal 2>/dev/null
        killall proxychains 2>/dev/null
    done
    sort -g ss.tmp | awk '{$1="";print $0}' > ss.final
else
    cat ss.final
    total=`cat ss.final | wc -l`
    while :
    do
        echo -e "\033[32m <<<<<<<<<< choose the number of the server to connect to, or an app to run through the proxy >>>>>>>>>> \033[0m"
        echo -e -n "\033[32m >>>>>>>>>>  \033[0m"
        read info
        if [[ $info -gt 0 && $info -le $total ]];then
            # Line number $info of ss.final holds the sslocal command to run.
            command=$(sed -n "${info}p" ss.final)
            killall sslocal 2>/dev/null
            eval $command >/dev/null 2>&1 &
        elif [[ $info == 'exit' ]];then
            exit
        elif [[ $info == 'clear' ]];then
            clear
        elif [[ $info == 'menu' ]];then
            cat ss.final
        elif [[ $info -gt $total ]];then
            echo "total: $total, the number cannot be greater than $total"
        else
            which $info >/dev/null 2>&1
            if [[ $? == 0 ]];then
                proxychains -q $info > /dev/null 2>&1 &
            fi
        fi
    done
fi
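The script pulls each server out of ss.hl with four separate sed -n calls. As an alternative, the four-line groups could be read in a single pass with read. This is only a sketch, not what the script above does: the function name build_commands and the sample values are made up, and the field order (address, port, password, cipher) is taken from the order of the sslocal flags in the script.

```shell
# Read address, port, password, and cipher from four consecutive lines
# and print one sslocal command per complete group.
build_commands() {
    while IFS= read -r host && IFS= read -r port &&
          IFS= read -r password && IFS= read -r method; do
        printf 'sslocal -s %s -p %s -k %s -m %s -l 1080\n' \
            "$host" "$port" "$password" "$method"
    done
}

# Made-up sample values, for illustration only.
printf '%s\n' 1.2.3.4 8388 secret aes-256-cfb | build_commands
```

A trailing group with fewer than four lines is skipped, because the loop ends as soon as one of the reads hits the end of the input.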

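The script is not perfect: the web page is protected by an image CAPTCHA, so I can't fetch the data automatically and have to download the page by hand. The next step is to use Python to recognize the image and download the page source.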