The Blogging Arms Race

November 19th, 2005

What is it about spam that makes even a laid-back, tolerant person’s blood boil? What is it about spam that makes someone who opposes the death penalty even for child molesters and serial killers start ranting about burning spammers at the stake? Sometimes our hatred for spam borders on the irrational… but hate it I do.

And I think there is good reason to loathe spammers and fight them actively. I’ve seen people argue that spam is just commercialism brought to the Internet, and if you don’t like it, delete it. (Spammers sometimes even put disclaimers at the bottom of their spam, explaining with tortured, disingenuous logic why “This is not spam,” or else screeching that they have a “First Amendment right” to spam you.) I think the “free speech” and “commercial enterprise” arguments are obviously flawed.

It is interesting to watch the ever-evolving technological arms race between spammers and those who hate spam.

Blog spam is an interesting case in point. When blogs became popular, no one would have thought at first that spammers would have any reason to start posting links to porn sites and online “pharmacies” in blog comments. But then Google became the number one search engine, and as we all know, one of the criteria Google uses to rank sites in a search is the number of other sites that link to it. So, if you have a link to your site posted in forty bajillion blog entries, then as Google starts indexing all those blog entries, your site will rise higher in search rankings. Thus, suddenly bloggers were having their blogs inundated with irrelevant links to porn, gambling, and Viagra sites.
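To make the incentive concrete, here is a deliberately simplified Java sketch that scores sites purely by how many distinct pages link to them. This is an assumption for illustration only. Google's real ranking (PageRank plus many other signals) weights links rather than merely counting them. The class name and the page and site names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Simplified model: a site's "score" is just the number of distinct pages linking to it.
// Real search engines weight links (e.g. PageRank); this only illustrates the incentive.
final class InboundLinks {
    /** outLinks maps each page to the sites it links to (hypothetical data). */
    static Map<String, Integer> count(Map<String, List<String>> outLinks) {
        Map<String, Integer> inbound = new HashMap<>();
        for (Map.Entry<String, List<String>> page : outLinks.entrySet()) {
            // A page that links to the same site twice still counts once.
            for (String target : new HashSet<>(page.getValue())) {
                inbound.merge(target, 1, Integer::sum);
            }
        }
        return inbound;
    }

    public static void main(String[] args) {
        Map<String, List<String>> web = new HashMap<>();
        // Three spammed blog entries, each carrying the same pharmacy link.
        web.put("blog-a.example/post1", Arrays.asList("pills.example", "news.example"));
        web.put("blog-b.example/post7", Arrays.asList("pills.example"));
        web.put("blog-c.example/post3", Arrays.asList("pills.example", "pills.example"));
        System.out.println(count(web).get("pills.example")); // prints 3
    }
}
```

Every additional spammed comment adds one more inbound link, which is exactly why spammers want their links in forty bajillion blogs.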

It didn’t take much for blogging software to add functionality to delete comments the administrator didn’t like (most had such features already, since it was anticipated that someone might post things in your blog you didn’t want to allow). The initial response from the spammers was to try to “disguise” their spam as something relevant: “Hey man, I love your site! Keep posting!” followed by the link. But this isn’t hard to spot, especially since the sheer redundancy of such posts creates patterns that anyone can recognize instantly after being hit with them a few times.

But deleting every spam post you get is a tedious process when your blog is spammed en masse. (I’ve had scores of blog spam posts put in my blog at once; some people have reported being bombarded with hundreds at a time.)

So the writers of blogging software started adding filters and blacklists. You could filter by known URLs or phrases and have posts containing those phrases or URLs automatically get labeled as spam and removed. And bloggers could share blacklists so that as soon as a new site sprang up trying to blogspam its way up the Google rankings, everyone would add it to their blacklist. Some blog software (like WordPress) has added Bayesian filters so you can not only delete a blog comment, but mark it as “spam” much as e-mail clients do, so that over time it will become “smarter” at recognizing spam.
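Here is a minimal Java sketch of how a Bayesian comment filter can "learn" from the comments you mark. It is a naive Bayes classifier under stated assumptions: tokens are split on anything that is not a letter, digit or dot, probabilities use add-one (Laplace) smoothing, and a comment is called spam when its log-odds are above zero. It is not the code any particular blog package uses, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical multinomial naive Bayes comment filter with add-one (Laplace) smoothing.
final class CommentClassifier {
    private final Map<String, Integer> spamCounts = new HashMap<>();
    private final Map<String, Integer> hamCounts = new HashMap<>();
    private int spamTokens, hamTokens, spamDocs, hamDocs;

    // Lower-case, then split on anything that is not a letter, digit, or dot (keeps domains whole).
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9.]+");
    }

    // Called when the blog owner marks a comment as spam (true) or legitimate (false).
    void train(String text, boolean isSpam) {
        Map<String, Integer> counts = isSpam ? spamCounts : hamCounts;
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum);
            if (isSpam) spamTokens++; else hamTokens++;
        }
        if (isSpam) spamDocs++; else hamDocs++;
    }

    // Naive Bayes estimate of log(P(spam | text) / P(ham | text)); positive means "more likely spam".
    double spamLogOdds(String text) {
        Set<String> vocabulary = new HashSet<>(spamCounts.keySet());
        vocabulary.addAll(hamCounts.keySet());
        int v = vocabulary.size() + 1; // +1 reserves probability mass for never-seen tokens
        double score = Math.log((spamDocs + 1.0) / (hamDocs + 1.0)); // smoothed class prior
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            double pSpam = (spamCounts.getOrDefault(token, 0) + 1.0) / (spamTokens + v);
            double pHam = (hamCounts.getOrDefault(token, 0) + 1.0) / (hamTokens + v);
            score += Math.log(pSpam / pHam);
        }
        return score;
    }

    boolean isSpam(String text) {
        return spamLogOdds(text) > 0;
    }
}
```

Each token contributes its own log-ratio to the score. Padding a comment with lots of innocuous words therefore adds many small terms that can drag the total toward zero, which is what the filler tactic described next is counting on.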

Blog spammers began doing the same thing that e-mail spammers do to try to circumvent filters: posts with long strings of unrelated words, random phrases copied from books and poetry and quotations, anything to fill their post with so much unrelated text that Bayesian filters will have a hard time classifying it. The problem with this tactic is that (a) filters are getting smarter all the time, and (b) a person looking at such posts can still recognize them for what they are very quickly.

Recently, though, bloggers have seen the latest escalation in the blogging arms race: posts that look like typical blog spam… but the links are to sites like Yahoo! and Apple and Adobe, CNN and Microsoft, websites for charities and museums and the like.

What is the point of this? To fill blacklists with popular high-traffic sites that don’t belong there and thus, render blacklisting useless or counterproductive.

For now, what this does is force bloggers to be more careful about removing spam from their blogs, and to make sure to manually delete posts like the above without marking them as “spam” and thereby adding their links to the filters. So the spammers are simultaneously compromising the utility of blacklists and making us do more tedious work to keep them off our blogs.
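To see why the decoy links hurt, here is a minimal Java sketch of a naive link blacklist. It assumes (hypothetically) that "mark as spam" adds every host a comment links to. Once a decoy comment pointing at a major news site has been marked, a legitimate comment citing that site is rejected too. The class name and hostnames are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical host blacklist in which "mark as spam" adds every linked host to the list.
final class LinkBlacklist {
    // Captures the host part of http:// or https:// links.
    private static final Pattern LINK =
            Pattern.compile("https?://([^/\\s:\"]+)", Pattern.CASE_INSENSITIVE);
    private final Set<String> blockedHosts = new HashSet<>();

    static List<String> hostsIn(String comment) {
        List<String> hosts = new ArrayList<>();
        Matcher m = LINK.matcher(comment);
        while (m.find()) {
            hosts.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return hosts;
    }

    // Naive "mark as spam": blacklist every host the comment links to.
    void markAsSpam(String comment) {
        blockedHosts.addAll(hostsIn(comment));
    }

    boolean isSpam(String comment) {
        for (String host : hostsIn(comment)) {
            if (blockedHosts.contains(host)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        LinkBlacklist filter = new LinkBlacklist();
        // A decoy comment in the style described above, mixing a real site with a spam site.
        filter.markAsSpam("Great site!!! http://www.cnn.com/ http://cheap-pills.example/");
        // Later, a genuine reader cites the same news site and is blocked.
        System.out.println(filter.isSpam("Good point, see http://www.cnn.com/2005/TECH/ for background."));
        // prints true: a false positive
    }
}
```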

I am sure new measures and countermeasures will evolve. But this is certainly clear evidence that spammers (at least some of them) are an organized, cynical cabal who are very deliberately trying to force everyone on the Internet to simply accept their “right” to use every resource on the ‘net for their own ends. In this way, they’re on precisely the same ethical level as virus-writers and hackers.

And this is why spam is neither “free speech” nor “advertising,” and why it is right and proper to try to stop spammers by any legal means available, including by writing laws against spam. They aren’t just trying to advertise to you (even if we discount all the spammers selling products and services that are illegal). They are trying to take over your computer, your webserver, your bandwidth, your resources. They actively engage in technological warfare to commandeer resources that do not belong to them, whether by sending you unsolicited e-mail that consumes bandwidth that you pay for and they do not, or by going further and trying to sabotage, in an organized fashion, the countermeasures people take to “opt out” of their endeavors.

And that’s only limiting the discussion to spamming tactics that are at least technically legal. We now know that the biggest spammers make use of botnets, which are large clusters of PCs that have been hacked and turned into “zombies” that can be used to anonymously relay spam, launch DDoS attacks, or do anything else the owner wants. Spammers rent these botnets from the gangs that control them so they can distribute their spam without fear of it being traced back to them.

And this is why I really, really hate spammers.

(Check out Inside the Spam Cartel for more.)

SmallRoller Update (v. 1.3)

May 30th, 2005

SmallRoller lives! At long last, a new version!

The most frequently requested feature was the ability to calculate odds when rolling a collection consisting of different types of dice (for example, 2D6, 1D4, 1D8, and 1D10). Now you can do this by entering each new die type in a dialog for this purpose.
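One standard way to get exact odds for a mixed pool is to convolve the dice one at a time, counting how many face combinations reach each total. The following Java sketch illustrates that technique. It is not SmallRoller's actual code (SmallRoller is written in Smalltalk), and the class and method names are hypothetical. A pool like 2D6 is entered as one die size per die.

```java
// Hypothetical sketch: exact odds for a mixed pool such as 2D6, 1D4, 1D8, 1D10,
// computed by convolving one die at a time. Not SmallRoller's actual code.
final class DicePool {
    /** ways[t] = number of face combinations whose faces sum to t. */
    static long[] waysByTotal(int... sides) {
        long[] ways = {1}; // empty pool: total 0 can happen exactly one way
        for (int s : sides) {
            long[] next = new long[ways.length + s]; // new maximum total = old maximum + s
            for (int total = 0; total < ways.length; total++) {
                if (ways[total] == 0) continue;
                for (int face = 1; face <= s; face++) {
                    next[total + face] += ways[total];
                }
            }
            ways = next;
        }
        return ways;
    }

    /** Probability that the pool totals exactly target. */
    static double probability(int target, int... sides) {
        long[] ways = waysByTotal(sides);
        long outcomes = 1;
        for (int s : sides) outcomes *= s; // every face combination is equally likely
        return (target >= 0 && target < ways.length) ? (double) ways[target] / outcomes : 0.0;
    }

    public static void main(String[] args) {
        int[] pool = {6, 6, 4, 8, 10}; // 2D6, 1D4, 1D8, 1D10
        System.out.println(probability(20, pool));
    }
}
```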

Packaging a VisualWorks image for Distribution, Part 2

May 29th, 2005

Less than a year later, I finally got around to writing the second part of my VisualWorks application packaging tutorial.

Firefox Rocks too!

September 22nd, 2004

Mozilla Firefox, Thunderbird’s browser counterpart, is rapidly becoming the browser of choice for Internet “power surfers.” It’s more secure than Microsoft Internet Explorer, and has all the functionality and then some!

It is customizable with themes that let you choose the look and feel of your browser, and with a vast assortment of extensions that add useful little features you never would have thought of on your own, but quickly find indispensable.

Here are some of my favorite extensions:

  • Sage — Collects all your RSS and Atom feeds, making it quick and easy to check for updates on your favorite sites.
  • Adblocker — A spam filter for the WWW. As with an e-mail filter, you have to spend some time “training” it by adding regexes for images to block, but soon you can surf the most banner-laden sites (like Yahoo!) with nary an image to bother you. You can also be extremely selective (for example, I generally like to see Amazon’s recommendations, but I do not want to see Paris Hilton’s horsey face in that jewelry ad they keep pushing, so Zap! No more Paris Hilton ads!)
  • Bookmark Backup — Ever lost your bookmarks file because you had to reinstall your browser? I have. Now you can have your bookmarks backed up to a separate directory automatically, every time you close Firefox.
  • TinyURL Creator — Automatically create a TinyURL from your current page or any other link, for quick pasting into a message.
  • Bandwidth Tester — Test your current connection speed any time.
  • IE View — If you’re like me and you use a bank or other financial institution that uses stupid IE ActiveX controls or other non-standard crap that makes their site unusable without Internet Explorer, this extension lets you quickly pop open the page you are looking at in MSIE. Also useful for cross-browser web development.
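The idea behind regex-based image blocking can be sketched in a few lines of Java. This is only an illustration with hypothetical rules and URLs. Adblock's own filter syntax (wildcards, /regex/ rules) and its matching engine are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical regex-based image blocker; Adblock's real filter syntax and matching differ.
final class ImageBlocker {
    private final List<Pattern> rules = new ArrayList<>();

    // Each rule is a Java regular expression matched anywhere in the image URL.
    void addRule(String regex) {
        rules.add(Pattern.compile(regex, Pattern.CASE_INSENSITIVE));
    }

    boolean isBlocked(String imageUrl) {
        for (Pattern rule : rules) {
            if (rule.matcher(imageUrl).find()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ImageBlocker blocker = new ImageBlocker();
        blocker.addRule("/ads?/");                  // a path segment named "ad" or "ads"
        blocker.addRule("banner\\d*\\.(gif|jpg)$"); // banner.gif, banner2.jpg, ...
        System.out.println(blocker.isBlocked("http://images.example.com/ads/123.gif")); // true
        System.out.println(blocker.isBlocked("http://shop.example.com/img/cover.jpg")); // false
    }
}
```

The "training" described above amounts to growing the rule list. Each new rule is tried against every image URL on the page.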

Thunderbird Rocks

September 16th, 2004

I have been using Mozilla Thunderbird as my e-mail client for quite some time now. Only one little feature (or rather, the lack of one) has annoyed me:

I normally have it set to block images in my e-mail from being loaded by default, since this is one way spammers harvest “live” e-mail addresses — even if you immediately delete the e-mail, as soon as you click on it you’ve connected to the remote server where the linked image is hosted, and an embedded link can tell the spammer that a live person just viewed e-mail sent to goober@iluvspam.com.

However, I also regularly receive e-mail with embedded images that I want to view: my daily MyComics subscription and newsletters from some 3D graphics forums, for example. So before viewing those, I have to manually go to the Tools|Options menu, choose “Advanced Settings”, unblock loading of images, view the e-mail, and then go back and block loading of images again.

Not anymore! :)

Thunderbird 0.8 added a feature where you can block loading of remote images except in e-mails sent from addresses in your personal and/or collected addresses list. So now I can add the addresses of those graphics-laden lists to my collected addresses book, and voilà! I automatically get to view images from senders I have designated, while everyone else is blocked!
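The decision behind this feature is essentially a sender whitelist lookup. Here is a minimal Java sketch, assuming (as a simplification) that addresses are compared case-insensitively after trimming whitespace. It is not Thunderbird's code, and the class, method, and address names are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of "load remote images only for senders in my address book".
final class RemoteImagePolicy {
    private final Set<String> trustedSenders = new HashSet<>();

    // Adding a sender to the address book makes their images load automatically.
    void trust(String address) {
        trustedSenders.add(normalize(address));
    }

    // Everyone not in the address book gets remote images blocked.
    boolean shouldLoadRemoteImages(String fromAddress) {
        return trustedSenders.contains(normalize(fromAddress));
    }

    // Simplification: treat the whole address as case-insensitive.
    private static String normalize(String address) {
        return address.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        RemoteImagePolicy policy = new RemoteImagePolicy();
        policy.trust("daily@mycomics.example");
        System.out.println(policy.shouldLoadRemoteImages("Daily@MyComics.example")); // true
        System.out.println(policy.shouldLoadRemoteImages("offers@spam.example"));    // false
    }
}
```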

It’s a small thing, it was probably trivial to program, but it has just made Thunderbird significantly more pleasant and convenient for me.

If you are still using Outlook or Outlook Express… why???

Packaging a VisualWorks image for Distribution

August 16th, 2004

This is my first attempt at a tutorial. The process of packaging a VisualWorks development image as a runtime image for deployment is covered in much more detail in Cincom’s Application Developer’s Guide. However, I thought it would be worthwhile to write a simple illustrated guide to cover the basics.

The tutorial is here.

I welcome comments, corrections, etc. I will shortly finish the tutorial with a how-to on compressing the image and using ResourceHacker to make a Windows executable.

SmallRoller Update (v. 1.2)

August 12th, 2004

Just a small update to SmallRoller, with an auto-scaling probability chart. I have also made the source code available.
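One simple way to make a chart "auto-scale" is to size every bar relative to the most likely total, so the tallest bar always fills the available width. Here is a hypothetical Java sketch of that idea, not SmallRoller's code, using the 2D6 combination counts as sample data.

```java
// Hypothetical auto-scaling bar chart: the most likely total always gets maxWidth characters.
final class ProbabilityChart {
    /** Scales each count relative to the largest one, rounding to whole characters. */
    static int[] barLengths(long[] counts, int maxWidth) {
        long max = 0;
        for (long c : counts) max = Math.max(max, c);
        int[] bars = new int[counts.length];
        if (max == 0) return bars; // nothing to draw
        for (int i = 0; i < counts.length; i++) {
            bars[i] = (int) Math.round((double) counts[i] * maxWidth / max);
        }
        return bars;
    }

    public static void main(String[] args) {
        long[] twoD6 = {0, 0, 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1}; // ways to roll each total on 2D6
        int[] bars = barLengths(twoD6, 30);
        for (int total = 2; total < twoD6.length; total++) {
            StringBuilder line = new StringBuilder(String.format("%2d ", total));
            for (int i = 0; i < bars[total]; i++) line.append('#');
            System.out.println(line);
        }
    }
}
```

Because the scale is taken from the data, the same code works whether the pool is 2D6 or a dozen mixed dice.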

What Java Owes to Smalltalk

August 12th, 2004

I have been writing programs in Smalltalk at home for my own edification, while in my regular job I am mostly working in Java nowadays.

When I first started learning Java, I had no idea — no idea — just how much Java owed to Smalltalk! I still didn’t really appreciate it until recently, when I started digging into Smalltalk more seriously. From the notion of everything descending from “Object” to garbage collection to the inheritance model to polymorphism, almost everything Java does is basically aping Smalltalk. Often not very well. I am using Java’s Collections a lot, and even though Collections are a relatively new transplant from Smalltalk, compare this:


for (Iterator iter = myList.iterator(); iter.hasNext(); ) {
    MyThing thing = (MyThing) iter.next();
    thing.doSomething();
}

with this:


myList do: [:thing | thing doSomething ]

I’m not just griping about syntax here; which one, conceptually, looks and feels more elegant, more intuitive? Look at all the method calls and casts you have to do just to iterate through a list of arbitrary objects in Java.
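For comparison, here is a self-contained version of the Java loop above, with a hypothetical Thing class standing in for the list's elements. It is shown alongside the enhanced for loop that Java 5 (J2SE 5.0, released shortly after this post) introduced together with generics. Java 5 removes the explicit Iterator and the cast, though it is still not a block passed to the collection the way Smalltalk's do: is.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class IterationDemo {
    // Hypothetical element type standing in for "myThing".
    static final class Thing {
        final String name;
        Thing(String name) { this.name = name; }
        String doSomething() { return name + " done"; }
    }

    // Pre-Java 5 style: a raw Iterator plus an explicit cast on every element.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List<String> withIterator(List myList) {
        List<String> results = new ArrayList<>();
        for (Iterator iter = myList.iterator(); iter.hasNext(); ) {
            Thing thing = (Thing) iter.next();
            results.add(thing.doSomething());
        }
        return results;
    }

    // Java 5 generics and the enhanced for loop: no Iterator variable, no cast.
    static List<String> withForEach(List<Thing> myList) {
        List<String> results = new ArrayList<>();
        for (Thing thing : myList) {
            results.add(thing.doSomething());
        }
        return results;
    }
}
```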

That’s the sort of thing I didn’t really notice until I started working with Smalltalk. Now I shake my head and wonder how much better life would be if it had been Smalltalk rather than Sun’s Java that had been poised to catch the wave in the 90s.

Now, some Smalltalk constructs still feel more awkward to me than the Java equivalents. For example:


if (boolValue) {
    doThis();
} else {
    doThat();
}

feels more intuitive to me than:


boolValue
    ifTrue: [ self doThis ]
    ifFalse: [ self doThat ]

But I realize this is a matter of taste, and being more comfortable with familiar syntax.

Frankly, I like Perl’s flexibility even better in this regard:


doThis() if $boolVal;

or


$boolVal or doThat();

Actually, this is all a good argument for programmers to be multi-lingual. I don’t think anyone can be a “master” of any language until they are fluent in many.

Here is a good article on the advantages of dynamic typing. I am becoming convinced that the “advantage” of static typing is that it provides minor babysitting services for bad programmers. Try writing a Java package that performs all sorts of calculations with int, double, float, and long data types and has to go back and forth between them… the need to cast, convert, and develop special ways of handling “loss of precision” will drive you nuts.
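A few lines of Java show the kind of conversion surprises meant here. Narrowing from double to int demands a cast and truncates. Widening from long to double needs no cast at all, yet it can silently lose precision above 2^53. And int arithmetic overflows before its result is widened to long. The class and method names below are hypothetical.

```java
// Hypothetical demo of Java numeric conversion pitfalls.
final class NumericConversions {
    // double -> int is a narrowing conversion: it needs a cast and truncates toward zero.
    static int truncate(double d) {
        return (int) d;
    }

    // long -> double is widening (no cast needed), but a double has only 53 bits of mantissa.
    static long roundTripThroughDouble(long value) {
        double asDouble = value;
        return (long) asDouble;
    }

    public static void main(String[] args) {
        System.out.println(truncate(3.99));                            // 3
        System.out.println(roundTripThroughDouble(9007199254740993L)); // 9007199254740992 (2^53 + 1 became 2^53)
        long overflowed = Integer.MAX_VALUE + 1;                       // int addition overflows, then widens
        long widened = Integer.MAX_VALUE + 1L;                         // long addition does not overflow
        System.out.println(overflowed);                                // -2147483648
        System.out.println(widened);                                   // 2147483648
    }
}
```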

SmallRoller update

August 9th, 2004

SmallRoller has been updated. Version 1.1 adds the ability to display a probability graph for the dice settings you choose. I have also made available an image file which will run on multiple platforms (including Macs and Linux), though it requires that you have VisualWorks from Cincom installed first. Feel free to leave comments or bug reports.

The next improvements I plan to add, as time allows, will be the ability to calculate “open” die rolls and other non-standard dice combinations. Eventually I’d like to allow probability calculations for multiple game systems and user-defined dice types. DicePro does a good job of handling almost every imaginable dice type, but does not calculate probabilities.

Syntax Highlighting in Smalltalk/VisualWorks

July 24th, 2004

I have been using Cincom’s VisualWorks for my Smalltalk programming. It’s a nice tool, but one thing I was really missing was syntax highlighting. Nowhere in VisualWorks’ Application Developer’s Guide is there any mention of syntax highlighting.

Finally, I stumbled across a reference implying that there was a way to do this in VisualWorks, and after a Google search, learned about RB Code Highlighter. This is a package in Cincom’s StORE open repository. StORE is accessible from within VisualWorks, but the documentation isn’t exactly clear. You have to poke around on the Cincom community blogs and Smalltalk wikis and newsgroups before you really start learning about these things.

IMO, Cincom has a great Smalltalk product but they don’t do a good job of marketing it or letting new users know about the tools available for it.

[Screenshot: VisualWorks with syntax highlighting]

Why Beginning Programmers Should Not Use IDEs

July 24th, 2004

As any programmer knows, a good Integrated Development Environment makes your programming time more productive and more enjoyable. IDEs do not make the actual think-work of software engineering and programming easier, however. As my philosophy of coding evolves, I am arriving at some programming principles. So I hereby present:

Fnordistan Software Engineering Principle #1:

Don’t do it in an IDE until you can do it in a text editor.

I first started learning Java in a teaching IDE called BlueJ. You know what? It hobbled my ability to program in Java until I cast off the shackles of the IDE and realized instead how things worked in the “real world” (inside my computer). I had trouble writing Java code, let alone compiling it, unless I had BlueJ to help me. I didn’t really understand where Java “existed” on my machine, or how I could have multiple JVMs in the same development environment, or how you dealt with things like classpaths and external libraries in JAR files.

When I finally took the time to do my coding in a plain old text editor (OK, I used vim — it’s cruel to expect even a beginner to do without syntax highlighting, at least!), and ran javac and java on the command line, things began to make sense. My IDE-constrained conception of “Java programming” fell away, and what now seems blindingly obvious came as revelation and increased competency.

This is not to bash BlueJ — I understand that it is a teaching tool, and it’s primarily intended to teach Object-Oriented concepts, not Java. But I didn’t really understand that at the time, and I chose BlueJ because all the “professional” Java IDEs intimidated me (as well they might a beginner!).

But the fact remains, operating solely within an IDE can leave you as a programmer “tightly coupled” to that IDE. I have even seen it happen professionally. Developers who do all their work within an IDE flounder when they have to step outside of it.

Now, bearing in mind all of the above, I love IDEs. They simplify and automate so many tasks, and put everything before you in such a visually appealing and easy-to-browse format, that going back to a plain old text editor and command line can feel like using long division with 10-digit numbers. Why should you, when a calculator can do the same thing in a fraction of the time and with zero chance of error?

But you shouldn’t use a calculator unless you can do without it if you have to. And you shouldn’t program in an IDE unless you can program without one.

I wouldn’t want to do a Java or C++ project without Eclipse or Microsoft Visual C++, respectively. But many times, the ability to step outside the IDE and run things on a command line, or package up a project so that it will build independently of the IDE, has been a lifesaver. (MSVC++ is particularly bad about this; by default it sticks all kinds of Microsoft-specific header files into every project, leaving you unable to compile with any other tool unless you strip them out.) I have become a firm believer in using Ant for my Java projects, and making sure that my Ant build script is set up so that it will run in whichever Java IDE I choose, or right from the command line, without any difference.

I would tell someone just learning any programming language that until they can write a hello world program, then compile and run it, without using anything more than Notepad.exe and the command shell (or on a Unix machine, vi and the command line), without even thinking about it, they should not touch an IDE. Really, they should be able to build at least a small project, preferably with some external libraries, with only a text editor and a command line. Once they can do that, then they can learn the power and convenience of an IDE.
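For Java, the whole exercise fits in one file and two commands. Here is a minimal sketch, assuming a JDK whose javac and java commands are on the PATH. The library jar in the comments is hypothetical and only shows where a classpath entry goes.

```java
// HelloWorld.java
// Build and run from a command shell, no IDE required:
//   javac HelloWorld.java
//   java HelloWorld
// With an external library (hypothetical lib/example.jar), put it on the classpath:
//   javac -cp lib/example.jar HelloWorld.java
//   java -cp lib/example.jar:. HelloWorld      (use ; instead of : on Windows)
final class HelloWorld {
    static String greeting() {
        return "Hello, world!";
    }

    public static void main(String[] args) {
        System.out.println(greeting()); // prints Hello, world!
    }
}
```

Knowing what javac, java and the classpath are doing here is exactly the knowledge an IDE hides behind its Build and Run buttons.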

That sounds pretty darn simple, right? Well, I have a sneaking suspicion that for some Java “developers,” that should have been a question during their interview….

(Java Solution)

SmallRoller

July 19th, 2004

I’m pleased to announce my first project available for public download: SmallRoller.

It’s really a simple little application, but I am pleased with it, and it’s my first real Smalltalk project.

Feel free to post bug reports or comments here.

Code is Poetry

July 16th, 2004

I am a computer programmer. I like coding. In the IT industry, coding is near the bottom of the professional ladder. If you want real money, you need to do high-level infosec work, become a Project Manager (i.e. a “suit”), or join a start-up. Some people think of “coding” as little more than typing into an IDE, but I think writing computer programs is a creative endeavor. You should take pride in the code you write. Good code is enjoyable to write and satisfying to reread. You look at it and think “This makes sense” or even “This is pretty clever,” and if it’s really good, “This is some fine-looking code!” Hopefully anyone else who reads it will think so too.

Bad code is hackwork, and you know it when you write it, and you really know it when you read it. When you read such code, you can tell that its author lacked either the time or the inclination to make it beautiful.

Of course, in the real world the objective is to get things done, finish your project, make it work. The end users don’t care about “beautiful” code, and if your code works well, it’s likely that no one else will ever see it, unless you’re writing a library or something else intended for reuse. So of course many people never bother trying to write code as poetry; some probably think the idea is silly.

I want my code to be functional, but I also want it to be beautiful. I don’t yet have the 10+ years of experience that some people say are necessary to become a really good software engineer. But I think that coding, like any other form of writing, requires devotion to the craft to become good. Some people never devote themselves; others have enough natural talent that they start out writing good code. Most of us can never become Big Names in computer science, but we can at least write programs we are pleased with and which are useful to others. I think devotion to the craft requires enjoying it enough that you spend free time working on it, not just doing all your coding at work.

I am very interested in Richard Gabriel’s efforts to introduce a Master of Fine Arts in Software. I have flirted with the idea of going back to school for a Ph.D. in Computer Science, but if there were a program for a Doctorate of Fine Arts in Software, I’d definitely be interested in that.