It is June, it is raining and our dog has very damp paws. His paws are like sponges and there are enormous wet paw marks on the floor. Happy days. (I’ll post a picture of said hound soon.)

I haven’t posted for a while, and that is partly because my every waking moment (slight exaggeration) has been spent trying to create dictionaries using Wiktionary (as permitted by the Creative Commons license).


I looked into licensing dictionaries to add to SL. I naively thought that a big dictionary could be licensed for maybe £100 a year. It turns out that the real cost is prohibitive. Add-some-zeros kind of prohibitive. Welcome to the real world.

But I need dictionaries …

1. I need them for part of my secret project. It is as yet unstarted, but dictionaries will be required.

2. I need them so I can create free dictionary apps for iPhone and Android devices.

3. There is a lack of free dictionaries (of a decent size) on the internet for languages with fewer speakers.

4. I want a dictionary page on SL with dictionaries and dictionary games, designed to work well on tablets, pads and phones.

And so my only option was to create dictionaries myself. The way to do this is to use the data from Wiktionary, parse it using some whizzy coding, create a flat file, index it and create a database.
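To give a flavour of that pipeline, here is a minimal sketch in Python. It is not my actual code, just an illustration under some loud assumptions: the sample page is a made-up, heavily simplified piece of wikitext, the function names (extract_definitions, to_flat_lines, build_database, lookup) are invented for this post, and SQLite stands in for whatever database ends up being used.

```python
import re
import sqlite3

# Made-up, simplified wikitext for one page (assumption: real pages are far messier).
SAMPLE_PAGES = {
    "cane": """==Italian==
===Noun===
{{it-noun|m}}
# [[dog]]
# {{lb|it|firearms}} [[hammer]] of a gun

==Portuguese==
===Noun===
# [[cane]]
""",
}


def extract_definitions(wikitext, language="Italian"):
    """Return the cleaned '# ' definition lines from one language section."""
    definitions = []
    in_section = False
    for line in wikitext.splitlines():
        # Only level-2 headings (==Language==) switch the section on or off.
        heading = re.fullmatch(r"==([^=]+)==", line.strip())
        if heading:
            in_section = heading.group(1).strip() == language
            continue
        if in_section and line.startswith("# "):
            text = line[2:]
            text = re.sub(r"\{\{[^{}]*\}\}", "", text)  # drop simple templates
            text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)  # unwrap links
            text = " ".join(text.split())  # tidy whitespace
            if text:
                definitions.append(text)
    return definitions


def to_flat_lines(pages, language="Italian"):
    """Flatten pages into 'word<TAB>definition' lines, one per definition."""
    lines = []
    for word, wikitext in sorted(pages.items()):
        for definition in extract_definitions(wikitext, language):
            lines.append(f"{word}\t{definition}")
    return lines


def build_database(flat_lines, db_path=":memory:"):
    """Load the flat lines into SQLite and index them by word."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE entries (word TEXT, definition TEXT)")
    conn.executemany(
        "INSERT INTO entries VALUES (?, ?)",
        (line.split("\t", 1) for line in flat_lines),
    )
    conn.execute("CREATE INDEX idx_word ON entries (word)")
    conn.commit()
    return conn


def lookup(conn, word):
    """Return all definitions stored for a word, in insertion order."""
    rows = conn.execute(
        "SELECT definition FROM entries WHERE word = ? ORDER BY rowid", (word,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_database(to_flat_lines(SAMPLE_PAGES))
    print(lookup(conn, "cane"))
```

The sketch only handles the easy cases: flat templates, plain links and top-level definition lines. Nested templates, inflected forms, example sentences and everything else a real Wiktionary page contains would need a lot more care.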

I started with Italian (as I’m learning the language), and assumed that I would crack this relatively trivial task within days. An hour here. An hour there. Bing. I would have a dictionary.

It turns out that, while Wiktionary is very easy for a human to read, it is a non-trivial task to write a program to parse it and spit out a dictionary. In fact, it is tedious, unforgiving and difficult.

Anyway, I’m a l33t programmer and I have teh skillz:)

So, after a lot of frustration, I have almost written some code to parse the Wiktionary and produce a dictionary.

I almost have an Italian dictionary with hundreds of thousands of words. And when I do, I will add it to SL. I hope it will be as good as or better than any of the expensive branded dictionaries.

It is, as they say, coming soon …



