WordPress: Dealing with Comment and User Registration Spam.

This is a fine one.  Anyone who uses WordPress may need to deal with this.  Unfortunately, despite using reCAPTCHA, Akismet still picks up tons of spam on my blog.  So why am I getting spam comments when I have reCAPTCHA? The answer could surprise you.

I really had no idea, but it turned out reCAPTCHA marks the comments it rejects as spam, and as a result they land in the Akismet spam folder, making it appear that Akismet caught them.  How's that for mistaken identity!  Some time back, when I installed reCAPTCHA, ALL comment spam suddenly stopped, so I could hardly imagine Akismet was doing that.  I would imagine spammers are not breaking through the CAPTCHA box, but I have been reading online that either:

  1. CAPTCHAs are potentially breakable by software.
  2. Some foreign workers get paid to post that sort of spam by hand, which makes CAPTCHA look breakable.

I really don't know, and I would like to believe they are unbreakable.  I do know I'm getting spam and I don't want it.  So off I go to see how I can deal with this, because:

  1. I won't read, buy or visit any link in a message I feel is spam.
  2. The only thing I'll do with it is try to prevent it when I see it.
  3. I really don't want these things hitting my site and generating extra bandwidth to begin with.

I've decided to deal with this from the database side, using my provider's phpMyAdmin.  (WORD OF CAUTION: if you haven't done any sort of SQL before, I recommend you proceed carefully, even though we're not deleting or modifying anything.)  So once I log in to phpMyAdmin, I select the wp_comments table and run this query:

-- Count spam-flagged comments per source IP, most frequent first.
SELECT comment_author_IP, COUNT(comment_author_IP) AS Occurrance
FROM `wp_comments`
WHERE comment_approved LIKE '%spam%'
GROUP BY comment_author_IP
ORDER BY Occurrance DESC
LIMIT 0, 500

With this rather simple query I tried to mimic the results Akismet reports on the WordPress Dashboard.  I get this list of offending IPs:

comment_author_IP Occurrance
194.8.75.141 178
194.8.74.171 53
212.117.176.186 30
91.214.44.201 10
194.8.75.161 4
86.122.164.46 4
70.70.10.78 2
114.127.246.36 2
194.8.75.159 2
194.8.74.133 2
212.95.54.40 2
209.162.3.99 1
76.125.194.28 1
188.16.124.183 1
85.13.138.96 1
69.42.209.2 1
211.141.86.152 1
220.199.184.27 1
194.8.75.153 1
189.202.11.120 1
62.175.249.249 1
208.115.135.106 1
212.117.187.10 1
79.142.207.54 1
90.198.135.211 1
202.239.242.75 1
67.234.218.99 1
188.16.118.12 1
24.44.166.244 1
94.142.128.140 1
94.181.233.87 1
220.72.71.220 1
98.130.2.75 1
92.113.234.71 1
70.38.38.164 1
217.170.53.71 1
92.112.50.231 1
201.76.212.243 1
79.116.143.11 1
193.231.72.188 1
83.233.30.77 1
188.16.117.72 1
86.108.136.123 1
124.173.195.8 1
115.124.102.182 1
96.9.170.124 1
95.133.64.118 1
190.38.153.184 1
62.147.192.173 1
58.185.196.82 1
69.133.77.123 1

 
So what does someone do with a list like this?  In my case, as I have no access to the firewall on my host, I'm going to use .htaccess to block the IPs.  Before I just plug all the IPs in there, I'm going to check some of the less frequent IPs above with the query below, and omit the ones that have only one occurrence.  My reasons:

  1. A comment from one of the single-occurrence IPs might not really be spam.
  2. As a rule of thumb, I'll say anything with around five occurrences that Akismet has already marked as spam really is spam.
  3. I don't want to get overly complicated and keep a large .htaccess file.

So I use this query to check a few of the less frequent ones, just to see if they are spam or not:

SELECT *
FROM wp_comments
WHERE comment_author_IP IN (
'194.8.75.161',
'86.122.164.46',
'70.70.10.78',
'114.127.246.36',
'194.8.75.159',
'194.8.74.133',
'212.95.54.40')
LIMIT 0, 500

Going over the list quickly, I see it's all spam.  So everything with two or more occurrences gets a spot in my .htaccess file.  I decide to use this query to automate some of the labour and generate the correct .htaccess syntax automagically:

SELECT CONCAT("deny from", " ", comment_author_IP) as Action
FROM wp_comments
WHERE comment_approved LIKE '%spam%'
GROUP BY comment_author_IP
HAVING COUNT(comment_author_IP) >= 2
LIMIT 0, 500

Action
deny from 114.127.246.36
deny from 194.8.74.133
deny from 194.8.74.171
deny from 194.8.75.141
deny from 194.8.75.159
deny from 194.8.75.161
deny from 212.117.176.186
deny from 212.95.54.40
deny from 70.70.10.78
deny from 86.122.164.46
deny from 91.214.44.201

The relevant .htaccess code:

.
.
.
<Limit GET POST>
order deny,allow
# Old Entries
deny from 209.47.94.52
deny from 72.20.4.30
deny from 92.48.193.55
deny from 87.118.104.158

# New Entries
deny from 114.127.246.36
deny from 194.8.74.133
deny from 194.8.74.171
deny from 194.8.75.141
deny from 194.8.75.159
deny from 194.8.75.161
deny from 212.117.176.186
deny from 212.95.54.40
deny from 70.70.10.78
deny from 86.122.164.46
deny from 91.214.44.201
</Limit>
.
.
.

And that is that.  The only thing that's left now is to save and upload the new .htaccess file to your web root and see if there is any improvement.
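
A side note on syntax: the order and deny from directives used here are the Apache 2.2 style of access control.  Apache 2.4 replaced them with Require directives, and the old ones only keep working there through mod_access_compat.  Below is a minimal sketch, not a drop-in replacement, of how the same list could be converted to the 2.4 style.  The function name deny_to_require is made up for illustration, and which Apache version your host runs is something you would need to check first:

```shell
#!/bin/bash
# Sketch: turn "deny from <ip>" lines (Apache 2.2 style) into an
# Apache 2.4 style <RequireAll> block. Function name is hypothetical.
deny_to_require() {
    echo "<RequireAll>"
    echo "    Require all granted"
    # Strip Windows line endings, then keep only "deny from <ip>" lines,
    # skipping comments, other directives and "deny from all".
    awk '{ sub(/\r$/, "") }
         tolower($1) == "deny" && tolower($2) == "from" && tolower($3) != "all" {
             print "    Require not ip " $3
         }'
    echo "</RequireAll>"
}

# Example with two addresses from the list above.
printf 'deny from 194.8.75.141\ndeny from 194.8.74.171\n' | deny_to_require
```

Saving the deny lines to a file and feeding it to deny_to_require on standard input prints a block that could stand in for the order and deny lines.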

The baffling thing for me is user registrations.  I'll use a combination of Linux and SQL here.  I'm interested in all the user names that have registered on my site but are listed as spam bots.  I also want to know the IPs of the registrations, though really I would prefer a CAPTCHA style code to filter stuff like this.  So the first thing I get is a list of users that have registered recently:

SELECT user_email, user_registered, user_nicename
FROM wp_users
WHERE user_login NOT LIKE '%admin%'
LIMIT 0 , 100

This gives me a list of email addresses, registration dates and nicenames, which I then save to a file on my UNIX box called ureg.txt:

# cat ureg.txt
fdghjweudyf@konversia-aero.ru   2009-04-20 06:16:04     analia
draimacleroic@gmail.com         2009-04-20 21:16:31     sopssheerce
changfuuu@gmail.com     2009-04-21 02:08:15     anavoinkemi
actichziniunc@gmail.com         2009-04-21 11:32:10     joannahopkin
katyai4857@atlaskit.com         2009-04-21 13:25:08     hewatmom
qapocahemiekid12763@gmail.com   2009-04-23 06:27:04     kesenasikacusa
.
.
.
#

Now that I have this file, I would like to check whether these are legitimate users or spammers, using sites that track known spambots:

http://www.stopforumspam.com
http://www.botscout.com

I'll then query the first of these, stopforumspam.com, with the tiny script below to find out which registrations are legitimate and which ones I should get rid of:

#!/bin/bash

# Short script for checking forum spam user registrations.
# Pass the registrations file (email address in the first column) as the first argument.

currdate=$(date +%d_%m_%Y-%H_%M_%S)
tmpfile="ureg.tran.$currdate.dat"
ipfile="ureg.ip.$currdate.dat"

# Start with empty work files.
: > "$tmpfile"
: > "$ipfile"

for email in $(awk '{ print $1 }' "$1"); do
        # Look the address up and keep the XML reply for parsing.
        lwp-request "http://www.stopforumspam.com/search?q=$email&export=xml" > "$tmpfile"
        if grep -q "no results found\." "$tmpfile"; then
                printf "%50s%-30s\n" "$email" " : "
        else
                printf "%50s%-30s\n" "$email" ": LISTED"
                # Collect every distinct IP reported for this address.
                grep "<ip>" "$tmpfile" | sed -e "s/[<>]/ /g" | awk '{ print $2 }' | sort -u >> "$ipfile"
        fi
        # Pause between lookups so as not to hammer the service.
        sleep 2
done

echo "UNIQUE IP'S"
sort "$ipfile" | uniq -c
rm -f "$tmpfile"

This will give something like:

fdghjweudyf@konversia-aero.ru: LISTED
draimacleroic@gmail.com: LISTED
changfuuu@gmail.com: LISTED
actichziniunc@gmail.com: LISTED
katyai4857@atlaskit.com: LISTED
qapocahemiekid12763@gmail.com: LISTED
payomacon@gmail.com :
rackflinciatt@gmail.com: LISTED

The script also generates an IP file called something like ureg.ip.<DATE>.dat.  Unfortunately, the IPs reported for the email addresses above all vary, with only a few sharing the same subnets, which made it impractical to block them one by one in .htaccess.  The reason is that most of the IPs associated with these addresses can be, and usually are, fake addresses that have nothing to do with the spam and were put there by the spammers to confuse things.  So it simply wasn't worth blocking the IPs tied to the emails.  The method did, however, tell me which of the above registrations are valid, based on whether the email address itself appears on the spambot monitoring websites above.  That verification was the main thing I wanted: it showed me the kind of email addresses typically used to register spam accounts.
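
To judge whether blocking whole subnets would be worthwhile, it helps to see how many of the collected addresses actually share a prefix.  Here is a rough sketch, assuming one IPv4 address per line (as in the ureg.ip.<DATE>.dat file) and treating the first three octets as the subnet, which is a simplification.  The function name count_by_subnet24 is hypothetical:

```shell
#!/bin/bash
# Sketch: count how many addresses in a list share the same /24 prefix.
# Reads one IPv4 address per line on stdin; function name is hypothetical.
count_by_subnet24() {
    # Keep the first three octets, then count identical prefixes,
    # printing the most common prefix first.
    awk -F. 'NF == 4 { print $1 "." $2 "." $3 ".0/24" }' | sort | uniq -c | sort -rn
}

# Example with a few addresses from the comment spam list earlier in the post.
printf '194.8.75.141\n194.8.75.161\n194.8.74.171\n212.95.54.40\n' | count_by_subnet24
```

If nearly every prefix comes back with a count of 1, subnet blocks would not buy much over single IPs.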

Last but not least, I'll add a plugin here called WP-reCAPTCHA to help handle these when they come in, so I don't have to repeat this procedure too often.

 

FAST FORWARD TO THE FUTURE

So what did my spambox look like after about one month?  Here is a basic breakdown:

  • Thanks to WP-reCAPTCHA, I received no further spam user registrations.  All gone.  100%.
  • Since blocking the IPs, my Akismet stats now look like this:

     

     

    Historical Stats

    Month    Spam detected  Ham detected  Missed spam  False positives
    2009-08  95             0             0            3
    2009-07  92             10            0            0
    2009-06  562            0             0            0
    2009-05  541            1             0            0
    2009-04  814            8             0            0
    2009-03  413            12            0            0
    2009-02  34             4             0            0
    2009-01  13             20            0            0
    2008-12  39             35            0            0
    2008-11  34             0             0            0

So for July, I got about 7 times less spam.  Mind you, 100% of the spam was caught and put in the spam bucket.  What still shows up in the Historical Stats above is basically spam from IPs that are not yet blocked in my .htaccess file and can still reach the comment post page to attempt to spam.

What's also not shown is a large spike that brought in about 53 comments on August 4th.  A spike like that typically indicates it came from only one or two IPs.  Running the first SQL query above promptly verified that:

comment_author_IP Occurrance
91.214.44.229 25
94.142.129.98 7
92.241.160.24 6
194.8.75.147 4
78.110.175.31 3
93.185.199.117 3
74.86.148.194 2
212.95.54.235 2
221.178.181.198 2
94.102.51.196 2
148.233.229.235 2
119.95.171.110 2
85.11.66.105 2
143.161.248.25 2

and off it went to the .htaccess file.  :)
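
When the query is rerun every few weeks, some of the addresses it returns may already be in the .htaccess file.  Here is a small sketch for filtering those out, assuming the new deny from lines are piped in and a copy of the existing .htaccess is passed as the first argument.  The function name new_deny_lines is hypothetical:

```shell
#!/bin/bash
# Sketch: print only the "deny from" lines that are not already present
# in an existing .htaccess file. $1 = existing file, new lines on stdin.
# Function name is hypothetical.
new_deny_lines() {
    local known
    known=$(mktemp)
    # Collect the entries already in the file, without Windows line endings.
    grep -i '^deny from' "$1" | tr -d '\r' > "$known"
    # -v invert, -i ignore case, -x whole line, -F fixed strings, -f patterns from file.
    tr -d '\r' | grep -vixFf "$known"
    rm -f "$known"
}
```

Only the lines this prints would then need to be added under a new comment heading in the file.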

 

Later on you may want to filter based on how often offenders repeat.  That is to say, undeny or unblock the IPs if they haven't been hitting your site for a while (a month or two).  To that end, here is a little bash script to check which of the IP addresses blocked in .htaccess still continue to hit your site and try to post spam.  Those that haven't hit your site in months are likely no longer being used to spam and could be unblocked:

  • Download the access and error log files from your site.
  • Download the .htaccess file from the site to the same folder as above.
  • In that same folder, run this script (you need Linux for it to work).

for ipv in $(grep -Ei "^deny from" mds.com-htaccess | grep -v all | awk '{ print $3 }' | tr -d '\r'); do
        # Count log lines containing this exact IP (-F: literal dots, -w: whole address only).
        ipcnt=$(cat access.log* 2>/dev/null | grep -cwF -- "$ipv")
        # Keep the block only for IPs that still show up in the logs.
        if [ "$ipcnt" -ne 0 ]; then echo "deny from $ipv"; fi
done
 

The above will identify any IPs that still hit your site and print them out in the format deny from XX.XX.XX.XX so you can copy and paste them into your .htaccess file.
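
The opposite list, blocked IPs that no longer show up in the logs, gives the candidates for unblocking mentioned earlier.  Here is a hedged sketch along the same lines.  The function name stale_blocks is made up, and it assumes a copy of the .htaccess file is given first, followed by one or more access log files:

```shell
#!/bin/bash
# Sketch: list blocked IPs that no longer appear in the access logs.
# $1 = copy of the .htaccess file, remaining args = access log files.
# Function name is hypothetical.
stale_blocks() {
    local htaccess=$1
    shift
    grep -Ei '^[[:space:]]*deny from' "$htaccess" | grep -vi 'from all' |
        awk '{ print $3 }' | tr -d '\r' |
    while read -r ip; do
        # -F: treat the dots literally; -w: do not match inside longer numbers.
        # stdin is redirected so grep never eats the list of IPs being looped over.
        if ! grep -qwF -- "$ip" "$@" < /dev/null 2>/dev/null; then
            echo "$ip"
        fi
    done
}

# Example call (file names as used in this post):
#   stale_blocks mds.com-htaccess access.log*
```

An address only counts as present if it appears as a whole token in some log line, so 1.2.3.4 is not confused with 11.2.3.4.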

 

JAN 6 2011

Sometime back in July-August of last year, I decided to remove some IPs from the .htaccess file to see what effect that would have on the spam I receive.  Here is the result:

[Image: Akismet-block-spam-ip-effect.jpg]

It is worth noting that the spam I got shot up nearly 8 fold.  So I followed the above instructions again to reblock whatever was in my current spam box.  I can already guess the effect from the above image.

It looks like many of the IPs I unblocked earlier really do continue to spam on an ongoing basis.  And I always thought they would change their IPs when one is blocked….  :)

Update: Oct 1 2011

Here's an updated version of the query showing how many times each IP was marked as spam:

SELECT CONCAT("deny from ", comment_author_IP) as WebAction,
'#' as CommentSeparator,
COUNT(comment_author_IP) as IPCount
FROM wp_comments
WHERE comment_approved LIKE '%spam%'
GROUP BY comment_author_IP
HAVING COUNT(comment_author_IP) >= 5
ORDER BY IPCount DESC
LIMIT 0, 200

This also orders the results by count, which makes it easier to choose a cutoff.
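
One caution before pasting rows like these straight into .htaccess: Apache only treats # as a comment marker at the start of a line, so a trailing # and count on the same line as deny from would not be ignored as a comment.  Here is a minimal sketch, assuming rows shaped like deny from IP # COUNT, that moves the count onto its own comment line.  The function name split_count_comment is hypothetical:

```shell
#!/bin/bash
# Sketch: rewrite "deny from <ip> # <count>" rows so the count sits on its
# own comment line above the directive. Function name is hypothetical.
split_count_comment() {
    # Strip Windows line endings; rows that do not fit the shape
    # (such as the column header row) are dropped.
    awk '{ sub(/\r$/, "") }
         $1 == "deny" && $2 == "from" && $4 == "#" {
             print "# " $5 " spam comments"
             print "deny from " $3
         }'
}

# Example with a row shaped like the query output.
printf 'deny from 198.200.33.49 # 142\n' | split_count_comment
```

The count stays visible next to each entry, and each deny from line carries nothing but the address.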

April 3 2013

So another bout with comment spam reveals these new sources:

WebAction CommentSeparator IPCount
deny from 198.200.33.49 # 142
deny from 142.0.136.21 # 130
deny from 74.126.177.42 # 88
deny from 142.0.137.225 # 86
deny from 96.46.3.90 # 65
deny from 142.0.132.220 # 60
deny from 218.6.9.118 # 60
deny from 192.74.240.140 # 60
deny from 96.46.5.74 # 55
deny from 142.4.117.124 # 46
deny from 218.6.9.246 # 43
deny from 96.46.6.194 # 43
deny from 192.151.156.250 # 41
deny from 218.6.9.237 # 40
deny from 192.74.234.44 # 36
deny from 192.74.228.68 # 36
deny from 218.6.8.166 # 35
deny from 192.95.32.53 # 33
deny from 218.6.8.179 # 32
deny from 142.4.126.244 # 32
deny from 216.152.251.6 # 27
deny from 208.177.76.14 # 27
deny from 88.190.241.38 # 26
deny from 218.6.9.125 # 26
deny from 71.199.48.67 # 23
deny from 198.50.154.185 # 23
deny from 88.190.240.66 # 23
deny from 125.78.241.4 # 22
deny from 96.46.7.219 # 21
deny from 88.190.241.168 # 20
deny from 58.221.58.22 # 19
deny from 137.175.18.85 # 19
deny from 218.6.8.19 # 18
deny from 192.74.244.97 # 18
deny from 58.49.50.234 # 17
deny from 192.74.244.100 # 16
deny from 110.90.61.198 # 15
deny from 137.175.15.132 # 15
deny from 137.175.68.178 # 15
deny from 91.207.5.14 # 14
deny from 198.2.204.77 # 14
deny from 74.126.180.243 # 13
deny from 27.153.231.219 # 13
deny from 137.175.118.100 # 13
deny from 27.155.88.80 # 12
deny from 88.190.241.178 # 12
deny from 208.177.76.7 # 12
deny from 88.190.61.95 # 11
deny from 218.207.132.212 # 11
deny from 88.190.47.234 # 11
deny from 74.126.182.76 # 11
deny from 192.99.1.164 # 11
deny from 110.85.103.64 # 11
deny from 88.190.47.233 # 10
deny from 88.190.47.232 # 10
deny from 120.37.239.249 # 9
deny from 218.6.9.14 # 9
deny from 120.42.64.116 # 9
deny from 61.147.120.17 # 9
deny from 88.190.61.100 # 9
deny from 192.151.145.28 # 9
deny from 91.207.7.69 # 9
deny from 208.177.76.8 # 8
deny from 91.207.7.150 # 8
deny from 137.175.15.131 # 8
deny from 125.78.241.10 # 8
deny from 112.111.39.131 # 7
deny from 175.42.11.217 # 7
deny from 88.190.47.222 # 7
deny from 198.2.204.156 # 7
deny from 88.190.61.98 # 7
deny from 198.2.204.76 # 7
deny from 110.85.100.24 # 7
deny from 198.2.207.248 # 7
deny from 137.175.105.28 # 7
deny from 142.4.126.28 # 6
deny from 198.2.204.153 # 6
deny from 88.190.241.111 # 6
deny from 198.2.203.16 # 6
deny from 220.200.32.197 # 6
deny from 36.251.47.53 # 6
deny from 61.241.222.8 # 6
deny from 183.5.140.64 # 6
deny from 218.6.9.102 # 6
deny from 220.250.40.36 # 6
deny from 88.190.61.96 # 6
deny from 137.175.14.33 # 6
deny from 211.97.109.246 # 6
deny from 142.0.32.3 # 6
deny from 120.37.235.217 # 6
deny from 208.177.76.13 # 6
deny from 112.111.37.96 # 6
deny from 110.80.74.199 # 6
deny from 112.111.15.79 # 6
deny from 112.111.36.168 # 6
deny from 137.175.18.84 # 6
deny from 175.42.10.6 # 6
deny from 112.111.55.101 # 5
deny from 113.119.55.235 # 5
deny from 198.100.149.163 # 5
deny from 112.111.12.4 # 5
deny from 220.249.165.206 # 5
deny from 208.177.76.11 # 5
deny from 208.177.76.12 # 5
deny from 88.190.240.28 # 5
deny from 58.22.22.192 # 5
deny from 183.14.109.186 # 5
deny from 116.21.65.174 # 5
deny from 88.190.241.198 # 5
deny from 137.175.68.177 # 5
deny from 89.230.79.125 # 5
deny from 202.105.89.171 # 5
deny from 36.248.71.91 # 5
deny from 220.250.40.71 # 5
deny from 120.37.234.226 # 5
deny from 36.248.21.247 # 5
deny from 199.15.234.218 # 5
 

 

Good Luck!

6 Responses to “WordPress: Dealing with Comment and User Registration Spam.”

  1. Hi,

    Due to the implementation of the reCAPTCHA plugin, spam caught by reCAPTCHA appears in the akismet queue. When one makes a request in the browser, the comment gets deleted from the akismet queue automatically, however many bots don’t trigger this code path.

    Our recommendation is to turn off akismet (avoiding false positives) and then ignore the spam queue.

  2. Tom Kacperski on July 2nd, 2009 at 11:59 pm

    Thanks Ben!

    Yes. Now that you mention, I also recall part of the WP-reCAPTCHA FAQ mentioned this. Once I find out that WordPress not Akismet deletes comments automatically after 15 days, I’ll disable Akismet. Until then I sort of rely on the Akismet auto delete functionality. :)

    In a way this is good that reCAPTCHA makes comments indistinguishable (?) from Akismet caught spam and places them in the same queue. I’ll take advantage of it to block the worst offenders if I can from getting through the web server very far, in the way I outlined above.

    Personally I’m fond of reCAPTCHA and can’t imagine any effective way against it regardless of various opinions. So because of this the fact that it wasn’t catching SPAM posts seemed very unlikely to me. I’m glad it is, as since I installed it, I have seen a complete stop in SPAM ‘new comment’ messages in my inbox. On top of that I’m watching to see what effect it will have on reducing spam user registrations, as I just enabled it on that.

    Again, thanks Ben! Keep up the great work!

  3. WordPress: Dealing with Comment and User Registration Spam….

    Kudos for a great SEO article – Trackback from SEOKudos…

  4. Ever since our site got bumped to PR4 the amount of spam has been staggering. Once we installed Spam Free WordPress plugin, it stopped immediately. I guess the use of a password (you just have to copy/paste) instead of captcha is more effective. I wonder how long this will work though. We are now getting a wave of spam user registrations….any solution for this one?

  5. There’s a captcha for user registrations I believe. Look for:

    Enable reCAPTCHA on registration form.

    Cheers!
    TK

  6. [...] using .htaccess IP blocks.  This had a dramatic effect.  This post is an extension of WordPress: Dealing with Comment and User Registration Spam. where we describe how to identify spamming IP's to your blog or site and how to use the [...]


 


     
  Copyright © 2003 - 2013 Tom Kacperski (microdevsys.com). All rights reserved.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License