Archive for the 'Campfire Stories' Category

Two (or three) waves of broken Captchas

Posted in Blogs on blogs, Campfire Stories on May 4th, 2009

Captcha is that technology you see when a site (like mine) is trying to keep spambots from posting spam comments or creating junk accounts. Captcha uses a distorted image of a word like this:

Captcha

The idea is that only an actual human can read this image and see the word within. The human is then asked to type this word correctly before they are allowed to enter a comment or create an account, etc.

There were three waves of failure for the Captcha technique.

  1. Initially, webmasters did not even distort the word in the picture. Spammers simply used OCR technology to “read” the picture.
  2. Next, webmasters used scrambled words like above. Spammers used better OCR.
  3. Webmasters improved the scrambling (see below). Spammers enlisted humans to do the OCR (!)

Improved Captcha

Enlisting humans to do OCR? Basically, the image is relayed to people who are paid, either in money or in access to porn, to solve the captcha. The solution is returned to the spammer, who can then use it to gain machine or script access to the target system.

Wikipedia does its usual great job explaining the whole Captcha thing.

Jeff Atwood does it best at Coding Horror: March 04, 2008 — CAPTCHA is Dead, Long Live CAPTCHA! 

Jeff also has a great thread on it in his discussion of his ongoing efforts to manage bad behaviour at his excellent StackOverflow site.

NTFS tricks and the full c: disk

Posted in Campfire Stories, Windows Details on March 9th, 2009

In the last week or so I’ve hit a point on my workstation where booting and running everyday apps have just slowed to an unbearable crawl. This has happened to me before, but I switch machines often enough that I usually avoid the issue by, well, switching machines.

This time I think I have to actually stop and fix it — because I won’t have a new machine for another week or so.

Dang. Forced learning.

DiskThrash

Using perfmon it is clear the disk is thrashing. Note the top two counters,  pages/sec (yellow) and Average disk access queue (blue)  are pegged but CPU (green) is mild.

This is with the machine basically idle. I have the usual suite of things like Google Desktop and TortoiseSVN running so I’m not shocked things are accessing the disk, but the machine is almost unusable and it was running fine about a month ago. No, I checked, I don’t have a virus. In my life the problem has never been “it’s a virus.” It has always been “I did something stupid.” I guess viruses also fall into that I-did-something-stupid bin …

I’m sure some of you are ahead of the story and saying “has he checked his disk space?” Turns out I have almost 21% free space on a 320 GB drive. Is that a problem?

Yes.

This is a Windows XP system with NTFS on the C: drive. I actually have a couple more 320 GB drives on this machine but they’re basically empty. Why are they empty? Dumb reasons. I have a bunch of alpha and beta quality projects going, each of which has all kinds of massive data sets and each of which the developers insist “install it on the C: drive; it doesn’t work quite right in other locations.” Sigh. They are mostly not my developers so I can’t explain (or yell) to them HOW STUPID IT IS TO WELD YOUR APP TO THE C: DRIVE.

But why is an 80% full NTFS partition a problem? When I started this I actually did not know why but I’ve known for years that NTFS disk performance goes to crap once you get north of 70% full or so — but why? I found some of the best information on this topic on this page by Mitch Tulloch at O’Reilly Windows Devcenter.

Based on Mitch’s descriptions, my best theory is that it’s a combination of the Master File Table (MFT) getting fragmented as well as space needed for the pagefile. This hints at two fixes: move the paging file and clear some disk space.

Getting into the disk management applet,

Diskmanager

I saw that Dell helpfully left me a 3.11 GB partition that was unused so I formatted it FAT32, declared a 3067 MB page file there, and removed the one on the C: drive.  Note: Ideally the pagefile partition would be on a physically separate disk but I’m working with what I have.

After a reboot I have:

Disks

Performance is much better — the machine now thrashes for about 3 minutes after boot and login as opposed to 15+ (!) minutes before moving the pagefile location.

However, I’m at 79% full (down from 82%) on the C: drive so I’m still in serious risk of MFT fragmentation so let’s clean up the disk.
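The free-space check itself is easy to script. This is a minimal sketch, not a tool used in this post: the function names are made up for illustration, and the 30% default simply mirrors my "north of 70% full" rule of thumb rather than any documented NTFS limit.

```python
import os
import shutil


def percent_free(path):
    """Return free space on the volume holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.free / usage.total


def is_crowded(path, min_free_pct=30.0):
    """True when free space is below the threshold (30% is an assumed rule of thumb)."""
    return percent_free(path) < min_free_pct


if __name__ == "__main__":
    root = os.path.abspath(os.sep)  # root of the current drive
    print(f"{root}: {percent_free(root):.1f}% free, crowded={is_crowded(root)}")
```

Run it from a scheduled task and you get a warning before the thrashing starts, instead of after.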

Gina Trapani wrote this very helpful post about using WinDirStat to see what’s using space on your disk.

WinDirStat is a really cool tool!  I’ve learned about (and subsequently forgotten) this tool several times. I have on the order of 1,000,000 files on this system so the graphical tool to help me home in on the disk hogs is really helpful.

windirstat

After some quality time marvelling at all the cruft I had accumulated on my machine (why did I have two cygwin installations? Why did I have one? ) I moved or deleted  about 60 GB of stuff and got to around 38% free and the machine is running much better now.
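If you just want the list of disk hogs without the pretty picture, the same idea fits in a few lines. This is a rough sketch of the concept only, not how WinDirStat works internally; the function names are hypothetical.

```python
import os


def tree_size(path):
    """Total bytes of all files beneath `path` (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # skip files that vanish or cannot be read
    return total


def biggest(root, count=10):
    """Return the `count` largest immediate children of `root` as (name, bytes)."""
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            sizes[entry.name] = tree_size(entry.path)
        elif entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:count]
```

Calling `biggest("C:\\")` on a drive with a million files will take a while, which is one more reason to appreciate the real tool.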

The scary thing is that I absolutely “need” the 180 GB in use now. It was only a few years ago that 30 GB drives were ok…

It’s Done™ (a.k.a. it Works™)

Posted in Campfire Stories, The Art of Programming on June 10th, 2007

Have you ever had a disagreement about when something is actually done (as opposed to “not done yet”)?

I started learning about “done” when my parents started making me do chores around the house.

Doing dishes: Walk to sink, wash dishes, put in drainer. Done. Am I right? I thought I was right.

“Did you empty out the dishpan?” my mom asked.

“No.”

“Well, you’re not done yet.”

Later, I say “now I’m done.”

“Did you wipe down the sink, counters, and stove?”

“That’s part of ‘washing the dishes?’” Even a ten-year-old understands the concept of scope creep. The answer was yes, but through my push-back I won a change in the name of the task. From then on “Doing the dishes” had to be called “Doing the dishes and cleaning up the kitchen.”

The bottom line: in a team environment the implementer can influence but does not own the determination of when a task is done. Like project scope, doneness is a value negotiated with the customer of the work.

I was recently asked to define “done” in my organization. This is a clear sign that something might be broken. Maybe it just means that there have been recent disagreements about done?

What is Not Done?

I often find that a helpful way to define a positive value is by defining the corresponding negative value.

Not Done is also “it doesn’t work.” To me this is an obvious corollary but I find that this does not occur to some people. The reverse is certainly true.

Not Done is when code “Works on My Machine.” I cringe when I hear “it’s working on my machine,” especially since this is typically a response to someone saying “I can’t get it to work.” This is ESPECIALLY bad when heard in response to “I can’t get it to build.” When a developer says “it works on my machine” they are really saying:

I do not know what this software depends on.

I do not know what’s involved in moving this software to a clean machine.

I have not checked in all the required code into the source code control system (and I probably don’t know what’s missing).

I do not really care that it does not work on your machine. I’m hoping that’s your problem…

Not Done is when code has not been tested. Untested code does not really work. This is because code that has not been tested invariably has some issues when actually exercised. I consider unit testing the absolute bare minimum level of testing before code can be considered done. An alternative to unit testing? Sure, a full blown test team.

Not done is when code cannot be reliably deployed. This is related to the “it works on my machine” issue. Reliable deployment means the developer has thought about all the steps, interactions and dependencies their code has and has documented them in detail sufficient to allow installation. Maybe the developer even went wild and wrote some install scripts or set up a full featured installer. It could happen.

Documentation

Is documentation required before code can be considered Done? If the code cannot be replicated, tested, or deployed without some written assistance then yes, it cannot be considered done until enough documentation has been written.

I find explaining to a developer that this piece of code is theirs to work on forever and ever until they write enough documentation to hand it off works as a powerful motivator to getting documentation written.

So What does Done mean?

Based on the discussion above I think I’ve outlined my definition of Done:

It has been tested.
It is reproducible and deployable on all supported environments.
It is documented sufficiently.
Most importantly (recalling my experience with the dishes), the customer agrees that it does what they want.

My little piece of Windows Vista

Posted in Campfire Stories, Windows Details on February 6th, 2007


“In addition to our summer and winter estate, he owned a valuable piece of land. True, it was a small piece, but he carried it with him wherever he went.”

From Woody Allen’s Love and Death.

So, what HAVE I been spending my time on? My little piece of the Windows Vista operating system.

For the last 20 months I’ve been building the digital locker assistant (DLA), a dedicated download client that works with Microsoft’s online digital locker, which is in turn part of Microsoft’s Windows Marketplace. Windows Marketplace is where that mysterious “Windows Catalog” link on your Start/Programs menu goes to.

Windows Marketplace supports direct browser based downloading. However, when the download is greater than 1-2 Gigabytes using the DLA is a much better way to go. The most popular use of the DLA so far has been buying and downloading entire copies of Windows Vista and Office 2007.

We were rather skeptical that users would want to download Vista or Office since they are really big downloads. However, earlier in the year the success selling and downloading super large games from Windows Marketplace convinced everyone that downloading Vista would be attractive to consumers. And indeed it has!

You can get the digital locker assistant two ways: If you have Windows XP, go to the Windows Marketplace website, create an account and download and install the MSI. It’s only a 1 meg download.

Or, it’s built into every copy of Windows Vista (except Server versions).

Actually building a part of Windows Vista was a huge effort but it’s really neat to install Vista and see my little piece in there. When I say “my,” it’s more like I’m using the Pluralis Majestatis, the Royal We. I was part of a team and We had LOTS of help.

I was the dev lead for the DLA for XP and Vista. Two very senior Windows developers with me at Vertigo, Chris Idzerda and Ralph Arvesen, rounded out the dev team (that is, they actually did most of the work). Initially, I was dev lead and PM but soon we needed more help with the process and got a full-time program manager, Anne Warren, who was also PM for the Windows Marketplace (WMP) website. The website dev team was some 15 developers and we had a build team of one (that should have been three). Our test team was in India so the dev/test cycle was almost 24/7, something like 24/6 – we’d hand off work in the afternoon and it would be tested all (our) night with a nice bug list waiting for us in the morning.

And then there’s the rest of the Vista team at Microsoft: really a cast of thousands. I think they ALL emailed me at least once. The High DPI functionality team. The Localization team (“do you know your UI looks really bad in Arabic?”), the Group Policy team, Remote Desktop team… you get the picture.

Let’s look at the app

In Vista there are two ways the digital locker assistant (DLA) may be invoked. The primary way is when you buy something at Windows Marketplace and it’s in your digital locker and you click “download.”

You may also browse to the DLA by finding it under the Vista Control Panel.

Then look under Programs or Programs and Features (using Classic view) and find “Manage programs you buy on line.” If you open this link you will invoke the DLA and if you have never sync’d up with your online digital locker you will see this:

If you have software in your online digital locker you can see it listed here by clicking on “Sign in if you already have a digital locker account.” Digital locker accounts are Windows LiveID (a.k.a. Passport) accounts mapped to a Windows Marketplace account. You’ll get a login prompt:

and after synchronizing with your online digital locker you’ll see all your purchased, free, and trial items listed. In my case (below) I clearly have a bunch of games in my locker. These were just to test downloading large items. Right.

Technology under the covers

The DLA is built in Win32/C++ as an ATL Windows application but we get some goodies from WTL as well. For those going “huh?” look at my post ATL and WTL resources.

At this point most people are asking me: why C++? Why not .NET and/or WPF? Or, if you’re using C++, why not MFC?

The DLA started (and is still available) as a downloadable application for XP. Our target users are what Alan Cooper would call “permanent beginners” (like that relative that always calls you for tech support…) — with a modem.

This means making the download as small as possible. Vertigo is a premier .NET shop but we could not use .NET because the 22 MB .NET runtime install kills us (that .NET never made it into the XP Service Packs… argh). Fortunately, we happen to have a few developers around (i.e., old geezers) who can do C++. We used ATL again to keep the size of the executable small.

In hindsight, it was just as well that the XP effort started in C++. Once we expanded the project to include being built into Vista we found that, in Windows System programming and the Vista source tree, C++ is expected and still king (See my post Has Microsoft flipped the Bozo bit on .NET? for a full discussion).

This meant that we could develop one source code base and, with some care, make it build in the Windows OS build system for Vista and VS2005 for XP.

Single source is nice but why not make a single binary that runs on Vista and XP? Sigh. We do — sort of, but it’s complicated.  From a programmer’s perspective, Vista makes one dramatic change from traditional Win32 applications and that’s in how localized resources are loaded.

To handle localization traditional Windows practice is to create an RC file for all resources (dialogs, images, sounds, strings, keyboard shortcuts, etc.) which are compiled into the resource DLL. Localization teams produce localized RC files based on your master RC file and these are all built into a suite of resource DLLs. At run time the application loads the appropriate resource DLL based on logic you have to write which looks at the calling thread’s locale settings.

Internal to the application is a language-neutral block of resources (typically English-US based) and if an appropriate external resource DLL cannot be found for the current locale settings, this internal block is used instead. This is known as “fallback” behavior.
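The traditional lookup-with-fallback logic can be sketched like this. It is an illustration of the idea only, not the DLA's actual code and not a Win32 API: the function name, the locale strings, and the "neutral" label are all hypothetical, and trying the parent language (for example "fr" for "fr-CA") before falling back is an assumption on my part.

```python
def pick_resources(locale, available, fallback="neutral"):
    """Choose which resource set to load for `locale`.

    `available` is the set of locales that have an external resource DLL.
    `fallback` stands in for the language-neutral block built into the app.
    """
    candidates = [locale]
    if "-" in locale:
        # Assumed extra step: try the parent language before giving up.
        candidates.append(locale.split("-", 1)[0])
    for name in candidates:
        if name in available:
            return name
    return fallback  # no external match: use the internal block


if __name__ == "__main__":
    installed = {"fr", "de-DE"}
    for wanted in ("de-DE", "fr-CA", "ja-JP"):
        print(wanted, "->", pick_resources(wanted, installed))
```

The point is that in the traditional model this decision lives in the application, so the application author owns every branch of it.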

Here’s the new twist in Vista: in Vista the OS loader (not the app) picks the resource DLL and locates it in memory where the app thinks its internal fallback resources are. This is expected behavior and currently only appears to work for a native Vista-built application so our “legacy” resource loading technique as used in XP was not acceptable to those who guard the Windows Source tree. Did I mention all the code reviews? Making Vista-style resource loading work in XP, while theoretically possible, was a task we did not choose to take on. So we ended up with one set of source code feeding two build processes; one for XP and one for Vista. Through careful coding there are remarkably few “if Vista do this, if XP do that” points in the code. 

While we currently block running the XP installer on Vista (in theory blocking installation of the XP DLA on Vista), it turns out that the XP DLA runs fine on Vista. I should not be surprised by this because we did quite a bit of casual sanity testing on it, but it was not initially part of the test matrix. We found out somewhat by accident, as users were upgrading their XP machines (where the user had added the XP DLA) to Vista and then running the XP DLA.

For our downloading mechanism we hand off all download jobs to Microsoft BITS (Background Intelligent Transfer Service). While BITS works well for us I still think Microsoft is tempting the gods by including “Intelligent” in their product name. BITS is the guts behind how Microsoft Updates are downloaded. I’ve also discovered that Google Updater uses BITS as well. What we gained by using BITS was automatic download management including background downloads, downloads that persist when our application is not running, downloads that seamlessly restart when the machine is rebooted, and lots of error handling algorithms that we did not have to write or maintain. I’d use BITS again if needed. We did have to build a simple HTTP download as well because some modem-based accelerators do not play nice with BITS.
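To give a flavor of what "restartable" means for a plain HTTP download, here is one small ingredient: asking the server only for the bytes you do not have yet. This is a sketch under assumptions, not BITS and not the DLA's fallback downloader. The function name is hypothetical, and it only works against servers that honor the standard HTTP `Range` header.

```python
import os


def resume_headers(partial_path):
    """Build request headers asking only for the bytes not yet on disk."""
    try:
        have = os.path.getsize(partial_path)
    except OSError:
        have = 0  # nothing downloaded yet
    # "bytes=N-" means "send everything from offset N to the end".
    return {"Range": f"bytes={have}-"} if have else {}
```

Pass the result to whatever HTTP client you use. If the server answers 206 Partial Content, append to the existing file; if it answers 200, it ignored the range and you start over from byte zero. Everything else BITS does (scheduling, surviving reboots, retry policy) is the part we were happy not to write.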

Overall it was a great experience. While it was sometimes chaotic and exhausting, it was a lot of fun too. 

I’d do it again.

Really.

After I’ve had a couple years to rest.

The Dead PC

Posted in Campfire Stories, Hard Stuff on April 30th, 2006

One of our PCs died last week. What a chore. The machine was about three years old and there were no warning signs. The machine is at a desk in the bedroom and while I was watching TV one evening I heard it stop. I tried restarting; it whirred for a bit and then shut down. Uh-oh. Pulling the machine out I’m thinking “oh, it’s just a power supply.”

It’s never the power supply 

I have a bunch of really nice power supplies in their boxes from all the previous occasions I thought it was the power supply. My experience in over 20 years of “modern” PC ownership: it’s never the power supply. Not to say power supply failures don’t happen, it’s just that they don’t happen to me (yet). My experience has been: if the machine dies, it’s dead. There’s no fixing it. Pull the hard drives out and move on.

Ok, I heard a bunch of people say to themselves “wait, you could troubleshoot it and have it working in a few weeks after a few dozen trips to Fry’s and Radio Shack for some simple parts. An oscilloscope would show you …” Right. Look, $699 buys a lot of machine these days. $699 is much cheaper than several weeks of my “spare” time lost to chasing a problem (that the CPU really did fry because the heat sink was too full of dust).

What really fails? 

For me the devil has been miscellaneous “motherboard/CPU issues” (three times, counting this failure), and disk failure (once). I typically have 4-5 machines in use in the house at any one time. Over 20 years x 4.5 machines = 90 machine-years of PC use so with only four failures I think I’ve had very few problems. The one disk issue gave me lots of warning: it was an NT 4.0 system and the system log started showing disk errors. I was able to get everything off onto a new drive so I’ve never lost data (so far). I still have emails stored from 1990 (I’m not sure that’s a good thing).
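Spelled out, the back-of-the-envelope arithmetic looks like this. It uses only the numbers quoted above; the 4.5 is just the midpoint of "4-5 machines", so treat the result as a rough estimate.

```python
def failure_stats(years, machines, failures):
    """Return (machine_years, failures per machine-year, machine-years per failure)."""
    machine_years = years * machines
    return machine_years, failures / machine_years, machine_years / failures


if __name__ == "__main__":
    # 20 years, 4.5 machines on average, 3 motherboard/CPU failures + 1 disk failure
    machine_years, rate, between = failure_stats(20, 4.5, 3 + 1)
    print(f"{machine_years:.0f} machine-years")
    print(f"{rate:.3f} failures per machine-year")
    print(f"one failure every {between:.1f} machine-years")
```

That works out to one dead machine per 22.5 machine-years, or with four to five machines running, roughly one failure every five years.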

What works?

Running machines all the time 24×7 is definitely better than turning them on and off once a day. I have a domain controller that is a Pentium Celeron 300 MHz built ten years ago. It’s had three drive upgrades but still runs fine.

Distribute critical data. I almost never make backups. There. I said it. Yeah yeah yeah, I know you’re supposed to make backups. EVERYONE knows you’re supposed to make backups.

At least I’m honest enough to admit I generally do not make backups. Anyone else who says they are continuously backed up is simply a liar.

Who has time to row fifty (or one hundred) CDs through the CD burner? It’s not that I have not tried. I backed up a 20-meg hard drive onto 3.5″ floppies once (remember 3.5 floppies?). I bought a tape drive once. Two hundred fifty whole Megabytes a tape. Modern hard drive capacity growth devours any backup strategy I can think of. Modern hard drives are also your fastest cheapest back up medium (see Raid 1 below).

Source Code Control: I use Source Safe for a lot of my data backup. I copy stuff to my laptop. I copy stuff to my work computer.

Raid 1 your disks: I buy SATA Raid 1 on all my new desktops. An extra $80 for a 200 Gig backup drive? That’s a no-brainer. Why bother with Raid 1 if I seemingly do not believe in backups? While I have never lost any critical data I DO mind how long it takes to rebuild a machine and get all my critical applications and tools re-installed.

Vacuum once in a while: I now buy those cool cases with windows in the side and lights inside so it’s visually really offensive when it’s all full of dust inside.

Do we watch TV anymore?

Posted in Campfire Stories on April 2nd, 2006

One of my coworkers works remote from somewhere that’s, well, remote. A recent thunderstorm blew down his antenna and from three fuzzy stations he’s now down to one channel. And it’s not PBS.

He emailed us all at work and asked about satellite options and what kinds of things we watch on TV. Making a list I realized, I do not watch much TV.

When we were in grad school in Bloomington, IN, if you did not have cable you could only get the local college-hosted PBS station and frankly, between that and video tapes we were happy with it.

At my house these days we have something like 100 channels on cable (but none of the Movie channels like HBO — call me cheap) but I have noticed that we only watch a very few of those. Listing them out we watch (by network):

1.  ** FX
2.  ** TNT
3.  ** AMC
4.  ** BRAVO
5.  *  PBS
6.  *  CNN/Headline News (two channels but I think of them as one)
7.  *  HIST (History)
8.  one of the local news shows for 45 minutes in morning (CBS)
9.  A&E
10. DSC     (Discovery)
11. TWC     (Weather Channel)
12. COMEDY

** = watch these most – lots of movies.
* = watch more than some other channels.
No * = I watch occasionally.

Conclusions:

I could live with just the ** and the * channels. If I could ONLY have the 12 channels on the list I would almost never notice since this is really a two-sigma list (roughly 95%) of what we watch.

Interestingly, 12 channels fits in the classic TV band: channels 2-13.

Other than the local news, I never watch any of the classic networks (ABC, CBS, NBC). I’m right at the cusp of the Baby Boomer Generation (the young end, thank you!). The networks are dead.

I never watch ANY serial shows (“series”). Ok, I’m lying, I watch series that are on HBO when they come out on DVD. Series on the networks are double dead.

Of course, my wife and I have our Netflix habit running at a steady 4 movies a week. For scale, those movies make up fully 80% of the time we spend watching TV. We also tend to multi-thread when watching broadcast channels but single-thread when watching a DVD. Movies still Rock – but not at the theater.

When did this happen? When I was growing up we watched network TV and we went to the movies every other weekend.

I’m looking at the listing in the local TV Guide and I see that if we subscribed to everything, there are 851 channels available. 

Guess somebody’s watching.