Thursday, May 19, 2011

[Repost] A script for filtering log information

仲佑's blog
http://yowlab.shps.kh.edu.tw/wordpress/?p=1294


Check who is hammering the login (failed login attempts):
1. grep 'login error' /var/log/openwebmail.log > auth_error.log
2. sed 's/^.*(\([0-9]*\.[0-9]*\.[0-9]*\.[0-9]*\)).*/\1/' auth_error.log > ip_list.txt
3. sort ip_list.txt > sort_ip_list.txt
4. uniq -c sort_ip_list.txt


Check who is sending mail:
1. grep 'send message' /var/log/openwebmail.log.1 > send-mail.log
2. sed 's/^.*([0-9]*\.[0-9]*\.[0-9]*\.[0-9]*) \([a-zA-Z][^ ]*\).*send message.*/\1/' send-mail.log > sender-list.log
3. sort sender-list.log > sort-sender.log
4. uniq -c sort-sender.log
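
Both checks follow the same four-step pattern: filter the relevant lines, extract a key with sed, sort, then count. If you prefer, the steps collapse into a single pipeline with the counts sorted in descending order; a minimal sketch for the login-error case, assuming the same log path and format as above:

grep 'login error' /var/log/openwebmail.log \
| sed 's/^.*(\([0-9]*\.[0-9]*\.[0-9]*\.[0-9]*\)).*/\1/' \
| sort | uniq -c | sort -rn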


http://www.ruanyifeng.com/blog/2012/01/a_bash_script_of_apache_log_analysis.html


I pored over the man pages again and again to confirm usage and the right parameters. Below is my log-analysis script. It is not general-purpose, but I believe the commands it uses can satisfy typical log-analysis needs, and it also makes a good worked example for learning Bash. If you know every command below, I think you can call yourself a proficient Bash user.
1. Operating environment
Before introducing the script, let me describe my server environment.
My web server is Apache, and it logs every HTTP request, like this one:
  203.218.148.99 - - [01/Feb/2011:00:02:09 +0800] "GET /blog/2009/11/an_autobiography_of_yang_xianyi.html HTTP/1.1" 200 84058 "http://www.ruanyifeng.com/blog/2009/11/freenomics.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13"
It says that on February 1, 2011, a visitor at IP address 203.218.148.99 requested the URL /blog/2009/11/an_autobiography_of_yang_xianyi.html from the server.
All the access records for one day make up one log file. Over the past year, 365 log files were generated. They are stored in 12 directories, one per month (2011-01, 2011-02, ... 2011-12), and the files inside each are named www-01.log, www-02.log, ... www-31.log (assuming a 31-day month).
Uncompressed, the 365 log files take up 10GB of space. My goal was to analyze these 10GB of logs and end up with a page-view ranking of the following form:
  hits URL 1
  hits URL 2
  hits URL 3
  ...... ......
2. Why Bash
Many programming languages could be used for this task, but for simple log analysis I think a Bash script is the most suitable tool.
There are two main reasons. First, development is fast: a Bash script is just a combination of Linux commands, so as long as you know how to use the commands you can write the script, with essentially no new syntax to learn; and because it needs no compilation and runs directly, you can test as you write, which makes development very pleasant. Second, it is powerful: Bash was designed to process input and output, especially single lines of text, so it is a natural fit for log files; the many ready-made options of each command, combined with the pipe mechanism, make it enormously powerful.
As mentioned above, the final script is only about 20 lines long, and it processes the 10GB of logs in roughly 20 seconds. Considering the huge amount of computation that the sorting involves, that result is very satisfying and amply demonstrates the power of Bash.
3. Overall approach
My overall plan was:
  Step 1: process each individual log file, counting each article's hits for that day.
  Step 2: produce the monthly ranking by aggregating the daily counts into monthly totals.
  Step 3: produce the annual ranking by aggregating the 12 monthly totals and sorting them.
4. Processing a single log
Take the log for January 1, 2011 as an example: it sits in the directory 2011-01, its file name is www-01.log, and it contains 100,000 records in the following format:
  203.218.148.99 - - [01/Feb/2011:00:02:09 +0800] "GET /blog/2009/11/an_autobiography_of_yang_xianyi.html HTTP/1.1" 200 84058 "http://www.ruanyifeng.com/blog/2009/11/freenomics.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13"
Processing this log took me just one line of code:
  awk '$9 == 200 {print $7}' www-01.log | grep -i '^/blog/2011/.*\.html$' | sort | uniq -c | sed 's/^ *//g' > www-01.log.result
It pipes together five commands, each very simple. Let's look at them in turn:
(1) awk '$9 == 200 {print $7}' www-01.log
By default, awk splits each line of text into fields on whitespace. Count them and you will see that what we need is the 7th field, the URL of the HTTP request. {print $7} outputs that field, giving:
  /blog/2009/11/an_autobiography_of_yang_xianyi.html
Since we only want to count successful requests, we add a condition: the server status code must be 200 (meaning success). That is what "$9 == 200" does: unless the 9th field equals 200, the 7th field is not printed.
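As a quick side check, the same field layout lets you see how the non-200 responses are distributed, e.g. how many 404s the day produced; a throwaway sketch, not part of the final script:
  awk '$9 != 200 {print $9}' www-01.log | sort | uniq -c | sort -rn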
A more refined count should also separate web spiders from real visitors; since I couldn't think of a simple way to tell them apart, I ignored the issue here, though a rough cut is sketched below.
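If a rough heuristic is acceptable, one could drop records whose user-agent string (the quoted field at the end of each line) mentions a crawler, before the awk step; the keyword list here is my assumption, not a reliable bot detector:
  grep -viE 'bot|spider|crawl' www-01.log | awk '$9 == 200 {print $7}'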
(2) grep -i '^/blog/2011/.*\.html$'
Not every record that survives the previous step needs to be counted. Given how my articles are named, their URLs all start with "/blog/2011/" and end with ".html", so I use the regular expression "^/blog/2011/.*\.html$" to pick them out. The -i option makes the match case-insensitive.
(3) sort
At this point all the records we want to count are listed, but in arbitrary order. Next comes the sort command, not to rank anything, but to bring identical URLs together, which is what the upcoming uniq command requires.
(4) uniq -c
uniq filters out duplicate records, keeping only one line of each. The -c option prefixes each line with the number of times that record occurred. The output now looks like this:
  32 /blog/2011/01/guidelines_for_english_translations_in_public_places.html
  32 /blog/2011/01/api_for_google_s_url_shortener.html
  30 /blog/2011/01/brief_history_of_arm.html
It means that those three articles had 32, 32 and 30 access records (i.e., hits) respectively in the January 1 log.
(5) sed 's/^ *//g' > www-01.log.result
The counts that uniq added in the previous step carry leading whitespace; that is, a run of spaces precedes the 32, 32 and 30 above. To make later processing easier, we delete it. sed is a line-oriented stream editor; 's/^ *//g' is a substitution (note the space between ^ and *) that replaces the run of spaces at the start of each line with nothing, i.e. removes it. The output is then redirected to the file www-01.log.result, and the single-log analysis is done.
5. Monthly aggregation and ranking
After the previous step, the 31 log files of January have produced 31 corresponding result files. To summarize the whole month, these 31 result files must be merged.
(6) Merge the daily results
  for i in www-*.log.result
  do
    cat $i >> log.result
  done
This loop appends every file matching www-*.log.result to the file log.result.
Then a single line computes the monthly ranking:
  sort -k2 log.result | uniq -f1 --all-repeated=separate |./log.awk |sort -rn > final.log.result
This line consists of three commands and one awk script:
(7) sort -k2 log.result
Since log.result is the concatenation of 31 files, its records are in no particular order, so sort is needed to group records with the same URL together. At this point the hit count is the first field and the URL is the second, so the -k2 option sorts on the second field.
(8) uniq -f1 --all-repeated=separate
uniq filters duplicate records. The -f1 option makes it skip the first field (the hit count) and compare only what follows (the URL). The --all-repeated=separate option discards every record that occurs only once, keeps all the repeated ones, and separates each group with a blank line. After this step the output looks like this:
  617 /blog/2011/01/guidelines_for_english_translations_in_public_places.html
  455 /blog/2011/01/guidelines_for_english_translations_in_public_places.html
  223 /blog/2011/01/2010_my_blogging_summary.html
  253 /blog/2011/01/2010_my_blogging_summary.html
Identical URLs are grouped together, with a blank line between groups. For brevity the example shows only two records per group; in reality each group contains 31 records (the hit counts for each day of the month).
(9) The log.awk script
I racked my brains over how to add up the 31 daily counts. In the end I concluded that the only way was the awk command, and that I had to write a separate awk script.
  #!/usr/bin/awk -f
  BEGIN {
    RS="" # use a blank line as the record separator, so each group becomes one record
  }
  {
    sum=0 # running total, initialized to 0
    for(i=1;i<=NF;i++){ # iterate over all fields
      if((i%2)!=0){ # odd-numbered field?
        sum += $i # if so, add its value to the total
      }
    }
    print sum,$2 # print the total followed by the corresponding URL
  }
The log.awk script above is already commented in detail, but a few points deserve emphasis. First, by default awk uses "\n" as the record separator; setting RS="" changes the separator to a blank line, so each multi-line group becomes a single record. Second, NF is an awk built-in variable holding the number of fields in the current record. Since every line of the input contains two fields, the hit count followed by the URL, we add a condition: odd-numbered fields are accumulated, even-numbered fields are all skipped. Finally, for each record we output the accumulated total and the URL, separated by a space.
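As a quick sanity check, log.awk can be exercised on a hand-made two-group input (the URLs below are made up for illustration):
  chmod +x log.awk
  printf '32 /blog/2011/01/a.html\n17 /blog/2011/01/a.html\n\n9 /blog/2011/01/b.html\n4 /blog/2011/01/b.html\n' | ./log.awk
  # expected output:
  # 49 /blog/2011/01/a.html
  # 13 /blog/2011/01/b.html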
(10) sort -rn > final.log.result
Finally we sort the awk script's output. By default sort uses the first field; -r means reverse order, from largest to smallest, and -n means numeric sort rather than the default lexicographic sort, without which 10 would come before 2. The result is redirected to final.log.result, and the monthly ranking is complete.
6. The script file
A single script wraps up everything in the previous two sections.
  #!/bin/bash
  if ls ./*.result &> /dev/null # check whether any files ending in .result exist in the current directory
  then
    rm *.result # if so, delete them
  fi
  touch log.result # create an empty file
  for i in www-*.log # iterate over all log files in the current directory
  do
    echo $i ... # print a line announcing that processing of this file has started
    awk '$9 == 200 {print $7}' $i|grep -i '^/blog/2011/.*\.html$'|sort|uniq -c|sed 's/^ *//g' > $i.result # produce this log's result file
    cat $i.result >> log.result # append the result to log.result
    echo $i.result finished # print a line announcing that this file is done
  done
  echo final.log.result ... # print a line announcing that the final aggregation has started
  sort -k2 log.result | uniq -f1 --all-repeated=separate |./log.awk |sort -rn > final.log.result # produce the final result file final.log.result
  echo final.log.result finished # print a line announcing that the final aggregation is done
That is the final script for the monthly ranking. When writing it, I assumed that this script and log.awk sit in the same directory as the log files, and that both scripts have execute permission.
The annual ranking is handled in much the same way, so I won't repeat it; a sketch of one possible approach follows.
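For completeness, here is one way the annual step could look, assuming the twelve monthly files are kept as 2011-*/final.log.result (the layout is my assumption, not the author's script); an awk associative array sums the monthly totals per URL:
  cat 2011-*/final.log.result \
  | awk '{sum[$2] += $1} END {for (url in sum) print sum[url], url}' \
  | sort -rn > year.result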
=================================================================

Sunday, May 8, 2011

25 Best SSH Commands / Tricks

http://blog.urfix.com/25-ssh-commands-tricks/



Friday, November 19, 2010

OpenSSH is a FREE version of the SSH connectivity tools that technical users of the Internet rely on. Users of telnet, rlogin, and ftp may not realize that their password is transmitted across the Internet unencrypted, but it is. OpenSSH encrypts all traffic (including passwords) to effectively eliminate eavesdropping, connection hijacking, and other attacks. Additionally, OpenSSH provides secure tunneling capabilities and several authentication methods, and supports all SSH protocol versions.

SSH is an awesomely powerful tool; the possibilities are nearly unlimited. Here are the top-voted SSH commands:


1) Copy ssh keys to user@host to enable password-less ssh logins.

ssh-copy-id user@host
To generate the keys use the command ssh-keygen

2) Start a tunnel from some machine's port 80 to your local port 2001

ssh -N -L2001:localhost:80 somemachine
Now you can access the website by going to http://localhost:2001/

3) Output your microphone to a remote computer’s speaker

dd if=/dev/dsp | ssh -c arcfour -C username@host dd of=/dev/dsp
This will output the sound from your microphone port to the ssh target computer’s speaker port. The sound quality is very bad, so you will hear a lot of hissing.

4) Compare a remote file with a local file

ssh user@host cat /path/to/remotefile | diff /path/to/localfile -
Useful for checking if there are differences between local and remote files.

5) Mount folder/filesystem through SSH

sshfs name@server:/path/to/folder /path/to/mount/point
Install SSHFS from http://fuse.sourceforge.net/sshfs.html
This will allow you to mount a folder securely over a network.

6) SSH connection through host in the middle

ssh -t reachable_host ssh unreachable_host
unreachable_host is not reachable from your local network, but it is reachable from reachable_host's network. This command creates a connection to unreachable_host through a "hidden" connection to reachable_host.

7) Copy from host1 to host2, through your host

ssh root@host1 "cd /somedir/tocopy/ && tar -cf - ." | ssh root@host2 "cd /samedir/tocopyto/ && tar -xf -"
Good if only you have access to host1 and host2, but they have no access to your host (so ncat won’t work) and they have no direct access to each other.

8) Run any GUI program remotely


ssh -fX user@host program
The SSH server configuration requires:
X11Forwarding yes # this is default in Debian
And it’s convenient too:
Compression delayed

9) Create a persistent connection to a machine

ssh -MNf user@host
Create a persistent SSH connection to the host in the background. Combine this with settings in your ~/.ssh/config:
Host host
ControlPath ~/.ssh/master-%r@%h:%p
ControlMaster no
All the SSH connections to the machine will then go through the persistent SSH socket. This is very useful if you use SSH to synchronize files (using rsync/sftp/cvs/svn) on a regular basis, because it won't create a new socket each time you open an ssh connection.
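For instance (a sketch assuming the ControlPath config above is in place), once the background master is running, subsequent connections reuse its socket instead of performing a new handshake:
ssh -MNf user@host                        # start the persistent master once
ssh user@host uptime                      # reuses the master socket, no new authentication
rsync -e ssh -av ./docs/ user@host:docs/  # hypothetical rsync transfer over the same socket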

10) Attach screen over ssh

ssh -t remote_host screen -r
Directly attach a remote screen session (saves a useless parent bash process)

11) Port Knocking!

knock host 3000 4000 5000 && ssh user@host && knock host 5000 4000 3000
Knock on ports to open a port to a service (ssh for example) and knock again to close the port. You have to install knockd.
See example config file below.
[options]
logfile = /var/log/knockd.log
[openSSH]
sequence = 3000,4000,5000
seq_timeout = 5
command = /sbin/iptables -A INPUT -i eth0 -s %IP% -p tcp --dport 22 -j ACCEPT
tcpflags = syn
[closeSSH]
sequence = 5000,4000,3000
seq_timeout = 5
command = /sbin/iptables -D INPUT -i eth0 -s %IP% -p tcp --dport 22 -j ACCEPT
tcpflags = syn

12) Remove a host from known_hosts. Useful to fix "ssh host key change" warnings

ssh-keygen -R hostname
In this case it's better to use the dedicated tool.

13) Run complex remote shell cmds over ssh, without escaping quotes

ssh host -l user $(<cmd.txt) 
Much simpler method. More portable version: ssh host -l user "`cat cmd.txt`"

14) Copy a MySQL Database to a new Server via SSH with one command

mysqldump --add-drop-table --extended-insert --force --log-error=error.log -uUSER -pPASS OLD_DB_NAME | ssh -C user@newhost "mysql -uUSER -pPASS NEW_DB_NAME"
Dumps a MySQL database over a compressed SSH tunnel and feeds it straight into mysql on the new host; I think that is the fastest and best way to migrate a DB to a new server!

15) Remove a line in a text file. Useful to fix “ssh host key change” warnings

sed -i 8d ~/.ssh/known_hosts

16) Copy your ssh public key to a server from a machine that doesn’t have ssh-copy-id

cat ~/.ssh/id_rsa.pub | ssh user@machine "mkdir ~/.ssh; cat >> ~/.ssh/authorized_keys"
If you use Mac OS X or some other *nix variant that doesn’t come with ssh-copy-id, this one-liner will allow you to add your public key to a remote machine so you can subsequently ssh to that machine without a password.
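A slightly more defensive variant of the same one-liner (a sketch) creates the directory only if missing and tightens the permissions that some sshd setups insist on:
cat ~/.ssh/id_rsa.pub | ssh user@machine "mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"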

17) Live ssh network throughput test

yes | pv | ssh $host "cat > /dev/null"
Connects to host via ssh and displays the live transfer speed, directing all transferred data to /dev/null.
Needs pv installed.
Debian: 'apt-get install pv'
Fedora: 'yum install pv' (may need the 'extras' repository enabled)

18) How to establish a remote Gnu screen session that you can re-connect to

ssh -t user@some.domain.com /usr/bin/screen -xRR
Long before tabbed terminals existed, people were using GNU screen to open many shells in a single text terminal. Combined with ssh, it gives you the ability to have many open shells in a single remote connection using the options above. If you detach with "Ctrl-a d" or if the ssh session is accidentally terminated, all processes running in your remote shells remain undisturbed, ready for you to reconnect. Other useful screen commands are "Ctrl-a c" (open new shell) and "Ctrl-a a" (alternate between shells). Read this quick reference for more screen commands: http://aperiodic.net/screen/quick_reference

19) Resume scp of a big file

rsync --partial --progress --rsh=ssh $file_source $user@$host:$destination_file
It can resume a failed secure copy (useful when you transfer big files like db dumps through vpn) using rsync.
It requires rsync installed on both hosts.
rsync --partial --progress --rsh=ssh $file_source $user@$host:$destination_file # local -> remote
or
rsync --partial --progress --rsh=ssh $user@$host:$remote_file $destination_file # remote -> local

20) Analyze traffic remotely over ssh w/ wireshark

ssh root@server.com 'tshark -f "port !22" -w -' | wireshark -k -i -
This captures traffic on a remote machine with tshark, sends the raw pcap data over the ssh link, and displays it in wireshark. Hitting Ctrl+C will stop the capture and unfortunately close your wireshark window. This can be worked around by passing -c # to tshark to capture only a certain number of packets, or by redirecting the data through a named pipe rather than piping directly from ssh to wireshark. I recommend filtering as much as you can in the tshark command to conserve bandwidth. tshark can be replaced with tcpdump like so:
ssh root@example.com tcpdump -w - 'port !22' | wireshark -k -i -

21) Have an ssh session open forever

autossh -M50000 -t server.example.com 'screen -raAd mysession'
Keeps an ssh session open forever; great on laptops that lose Internet connectivity when switching WIFI spots.

22) Harder, Faster, Stronger SSH clients

ssh -4 -C -c blowfish-cbc
We force IPv4, compress the stream, and specify the cipher to be Blowfish. I suppose you could use aes256-ctr as well for the cipher spec. I'm of course leaving out things like master control sessions, as they may not be available on your shell, although they would speed things up as well.

23) Throttle bandwidth with cstream

tar -cj /backup | cstream -t 777k | ssh host 'tar -xj -C /backup'
This bzips a folder and transfers it over the network to "host" at a limited rate (cstream's -t option takes the limit in bytes per second, so 777k is roughly 777 kB/s).
cstream can do a lot more, have a look http://www.cons.org/cracauer/cstream.html#usage
for example:
echo w00t, i'm 733+ | cstream -b1 -t2

24) Transfer SSH public key to another machine in one step

ssh-keygen; ssh-copy-id user@host; ssh user@host
This command sequence allows simple setup of (gasp!) password-less SSH logins. Be careful: if you already have an SSH keypair in your ~/.ssh directory on the local machine, ssh-keygen may overwrite it. ssh-copy-id copies the public key to the remote host and appends it to the remote account's ~/.ssh/authorized_keys file. When trying ssh, if you used no passphrase for your key, the remote shell appears soon after invoking ssh user@host.

25) Copy stdin to your X11 buffer

ssh user@host cat /path/to/some/file | xclip
Have you ever had to scp a file to your work machine just to copy its contents into an email? xclip can help you with that. It copies its stdin to the X11 buffer, so all you have to do is middle-click to paste the content of that looong file :)


Have Fun

Please comment if you have any other good SSH Commands OR Tricks.
