Hosting News & Commentary Hosting News & Commentary Hosting News & Commentary

As designers most of us have been victim at some time or other to design theft. Those little warts of the web that come along and think that it’s ok to rip off your web design. So what do you do when someone steals your design? I fell victim around July of last year - some 14 year old from the Netherlands decided to copy our design, and change the main image - but stupidly left the CSS links intact - linking back to me, and therefore showing up in my referral logs. Sometimes we aren’t so lucky, and the design theft goes unnoticed until someone points it out.

The normal recourse for this sort of thing (and works rather well) is to report the offending site to the web host hosting it. Assuming that the thief in question hasn’t responded well to gentle persuasion, it may be in breach of their hosting agreement. But I’m a big believer in prevention being better than cure.

So lets change things for the better - we can block User Agents from accessing our sites - so why not block those web scrapers that rip content and design directly? Lets face it those that decide to rip off a website aren’t too bright, this should help prevent the majority of design stealing.

PHP code

Php makes things easy pie with mod_rewrite - as shown below. This should block about 99% of the screenscrapers out there at the minute.

RewriteEngine OnRewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]RewriteCond %{HTTP_USER_AGENT} ^ZeusRewriteRule ^.* - [F,L]

ASP Code

You should just be able to create an include file and drop this in.

<% Sub AddViolation(objDict, strWord)'Adds a violation (a robot in this case)objDict.Add strWord, FalseEnd SubFunction objDict)'Determines if the string strString has any violationsDim bolViolationsbolViolations = FalseDim strKeyFor Each strKey in objDictIf InStr(1, strString, strKey, vbTextCompare) > 0bolViolations = TrueobjDict(strKey) = TrueEnd IfNextCheckStringForViolations = bolViolationsEnd FunctionDim objDictViolationsSet objDictViolations = objDictViolations, objDictViolations, “ChinaClaw”AddViolation objDictViolations, “DISCo”AddViolation objDictViolations, “Download Demon”AddViolation objDictViolations, “eCatch”AddViolation objDictViolations, “EirGrabber”AddViolation objDictViolations, “EmailSiphon”AddViolation objDictViolations, “EmailWolf”Dim strCheck, strKeystrCheck = Len(strCheck) > 0 thenIf objDictViolations) IfEnd If%>

I’ll be putting together a ASP.NET version - using HTTP Handlers soon. Feel free to rewrite this code and link up in the comments if you are using a different platform

  1. No user reviews yet.


Leave a Reply