Finally I can tweet in style... (using my own domain)


I have my own 'ShortURL service' now!

But, very much in the Roderik spirit, while investigating this one task I discovered that it could be done better if I tackled another task first. And I wanted that other thing done right, so I had to do yet another thing at the same time. Etc. :)

It started out with Twitter. I got myself a Twitter account, for no reason really... I've just been cranking up my 'online presence' recently, and wanted to find out whether Twitter would be fun/useful in any way. But of course I want integration with my own site, or it ain't fun for a Drupal tinkerer :) So I tried it out one lost afternoon in between job interviews... and out of the box, the twitter module could at least:

  • display my tweets on my site
  • post a new tweet along with a new post, containing a URL to the post. (A shortened URL, of course, to save precious character space.)

(It may be able to do more quite easily, with a few triggers/actions defined. I haven't tried that out yet because I got sidetracked with the other interesting stuff.)
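The twitter module does this composing for you; purely to show why every character of the URL counts, here is a minimal sketch of fitting a post title plus a short URL into one tweet. The 140-character limit is the one Twitter used at the time, and the function name and the short code `x1` are hypothetical, not the module's code.

```python
# Sketch: compose a tweet that always keeps the (short) URL intact.
# Assumption: a 140-character limit, as Twitter had at the time of writing.
TWEET_LIMIT = 140


def compose_tweet(title: str, short_url: str, limit: int = TWEET_LIMIT) -> str:
    """Return 'title short_url', cutting the title with '...' if it doesn't fit."""
    suffix = " " + short_url
    room = limit - len(suffix)  # characters left for the title
    if room <= 3:
        raise ValueError("URL leaves no room for a title")
    if len(title) > room:
        # Keep as much of the title as fits and mark the cut with an ellipsis.
        title = title[: room - 3].rstrip() + "..."
    return title + suffix


# Hypothetical short code "x1" on the author's short domain.
print(compose_tweet("Finally I can tweet in style", "http://ro.muit.nl/x1"))
```

The shorter the URL, the more of the title survives, which is the whole point of a short domain.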

Then I discovered that with the shorten module installed, the shortURL service used for the links to new posts in tweets (shorturl.com by default) is configurable. And with the shorturl module installed, you can use your own Drupal installation to generate short URLs on your own domain name.
And I had a short domain! ro.muit.nl/CODE is shorter than shorturl.com/CODE. It's still longer than some other external services like is.gd, but having your own shortURL service has a certain coolness factor ;) After trying the shorturl module, I discovered that any domain you point to your Drupal installation can be used for the same short URL (so ro.muit.nl/CODE currently resolves to the same thing as roderik.muit.nl/CODE). So I decided to make up an even shorter domain, g.wyz.nl, which I'll probably use to post short URLs to other sites. For links to my own site I'll use ro.muit.nl; it just seems more fitting.
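I don't know exactly how the shorturl module generates its codes, but a common technique is to encode a numeric ID in base 62. The sketch below uses a made-up in-memory table in place of the module's storage, and also shows why any domain pointing at the installation works: only the path is used to look the code up, and the host name is ignored. The target URL `node/1` is hypothetical.

```python
import string
from typing import Dict, Optional
from urllib.parse import urlsplit

# 0-9, a-z, A-Z: 62 characters per "digit" keeps codes short.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode(n: int) -> str:
    """Turn a non-negative integer ID into a short code."""
    if n < 0:
        raise ValueError("IDs must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode(code: str) -> int:
    """Turn a short code back into its integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


def resolve(url: str, table: Dict[int, str]) -> Optional[str]:
    """Look up the long URL for a short URL; the host part is deliberately ignored."""
    code = urlsplit(url).path.strip("/")
    if not code or any(ch not in ALPHABET for ch in code):
        return None
    return table.get(decode(code))


# Hypothetical table: ID 125 points at some page on the main site.
table = {125: "http://roderik.muit.nl/node/1"}
code = encode(125)
print(resolve("http://ro.muit.nl/" + code, table))
print(resolve("http://g.wyz.nl/" + code, table))  # same target, different domain
```

Because the lookup never looks at the host, adding another domain is only a matter of DNS and web server configuration.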

Of course, as is usual practice, while installing the shorten and shorturl modules I found bugs, which I fixed and posted patches for. (That took me about a day. One of the bugs generated some discussion, which generated more work for me. I'll follow up on that, but haven't had time to do it properly yet.)

Then there was another issue... ro.muit.nl was still taken by a set of crappy old HTML 4 pages from 2001 :) The task of cleaning those up, 'freeing' the domain and pointing it to my current site had been lying around for ages. I didn't want to go through all the content to decide what should be kept and what not... I just wanted to incorporate the pages into my Drupal site and worry about them later.

How to do that? I came across the import_html module and decided to try it out, especially since I know of some other ancient HTML sites that I should ideally import (or have other people import) into Drupal.

I was afraid that this module would be an impossible-to-tame beast. But testing it went a lot better than I expected. There are some 'UI WTFs' in the import/settings screens (and of course there was the obligatory PostgreSQL fix), but otherwise the module is quality stuff. It's well thought out, does good reporting, and the maintainer has really made an effort to provide documentation and give people a good feel for the process they're about to perform. With that, the tests 'just worked'. Excellent!

But then the big question was... how to select the content (and language, and 'original created/modified time') to import from these HTML pages, and scrap the rest (header, footer, ugly partly hand-made HTML menus)? The module uses XSLT to do this kind of stuff. And I don't know XSLT!

Of course, this was a nice opportunity to learn what's behind even more buzzwords. And I like 'learning by practical example'. So I dove in... and discovered that this XSLT stuff wasn't so easy that I could see exactly what was going on just by reading the provided example XSL template. But after spending a night reading up and tracing the import process with my debugger to see why it wouldn't pick stuff up, I knew which selectors I could use in a modified template to extract content from my crappy HTML.
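import_html does this selection with an XSL template. XSLT isn't in Python's standard library, so as a rough analogy (not the module's template), here is the same kind of selection done with the limited XPath support of `xml.etree.ElementTree`. It assumes the old page has already been cleaned up into well-formed markup without an XHTML namespace, and the class names `menu`, `content` and `footer` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# A made-up, already well-formed version of an old table-layout page.
PAGE = """<html><body>
<table><tr>
<td class="menu"><a href="index.html">Home</a></td>
<td class="content"><h1>Old page</h1><p>Keep this.</p></td>
</tr></table>
<p class="footer">(c) 2001</p>
</body></html>"""


def extract_content(markup: str) -> Tuple[str, str]:
    """Return (title, body HTML) from the content cell, dropping menu and footer."""
    root = ET.fromstring(markup)
    # XPath-style selection, comparable to a select="..." expression in XSLT.
    cell = root.find(".//td[@class='content']")
    if cell is None:
        raise ValueError("no content cell found")
    heading = cell.find("h1")
    if heading is not None and heading.text:
        title = heading.text
    else:
        title = ""
    # Serialize every child except the heading; text placed directly inside
    # the cell, before its first child, is ignored in this sketch.
    body = "".join(
        ET.tostring(child, encoding="unicode") for child in cell if child is not heading
    )
    return title, body


print(extract_content(PAGE))
```

Everything outside the selected cell (the menu cell, the footer) simply never makes it into the result, which is the same effect the modified XSL template has.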

Knowing those selectors made me feel all good. But I wasn't there yet, because the import_html module would not let me keep the pages' modification times intact, or set the right language for them. I had to write code for that myself.

Soooo, I thought... let's whip out Python again and really use it for the second time ever! And it worked. I don't get cold feet or drown in documentation anymore. I still have to read up on every function call and operator I use... and I feel like I've forgotten everything from the Python tutorial I read so meticulously before producing something the first time... but still, I don't drown. I just

  • googled for different Python libraries doing something with XSL or XML/HTML import
  • decided BeautifulSoup was the one for me
  • opened up a browser window to the standard library documentation, and while coding, browsed around for a subsystem/function every time I needed to do anything (SQL connection, string parsing, date parsing, ...)

... and came out with a working script. I think I like this way of working. I think I can write scripts again :)
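The actual script used BeautifulSoup; to keep this sketch dependency-free it uses the standard library's `html.parser` instead, and an in-memory SQLite table stands in for the Drupal database. The `<meta name="date">` tag is a hypothetical place for the original date to live, the 'nl/' prefix rule comes from how the old Dutch pages were laid out, and the `node` columns (`nid`, `language`, `created`, `changed`) are assumed to match Drupal's node table.

```python
import sqlite3
from datetime import datetime, timezone
from html.parser import HTMLParser
from typing import Optional, Tuple


class MetaDateParser(HTMLParser):
    """Remembers the content of a (hypothetical) <meta name="date"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.date: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            if name == "date" and attributes.get("content"):
                self.date = attributes["content"]


def page_metadata(path: str, html: str) -> Tuple[str, Optional[int]]:
    """Return (language, Unix timestamp or None) for one old page."""
    # Old Dutch pages all lived under 'nl/'; everything else was English.
    language = "nl" if path.startswith("nl/") else "en"
    parser = MetaDateParser()
    parser.feed(html)
    parser.close()
    if parser.date is None:
        return language, None
    # Assumed date format YYYY-MM-DD, interpreted as midnight UTC.
    moment = datetime.strptime(parser.date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return language, int(moment.timestamp())


def update_node(conn: sqlite3.Connection, nid: int, language: str,
                timestamp: Optional[int]) -> None:
    """Write the language, and the original date if known, back to the node row."""
    if timestamp is None:
        conn.execute("UPDATE node SET language = ? WHERE nid = ?", (language, nid))
    else:
        conn.execute(
            "UPDATE node SET language = ?, created = ?, changed = ? WHERE nid = ?",
            (language, timestamp, timestamp, nid),
        )


# Usage against a throwaway table with the assumed columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, language TEXT, "
             "created INTEGER, changed INTEGER)")
conn.execute("INSERT INTO node VALUES (1, '', 0, 0)")
lang, ts = page_metadata(
    "nl/index.html", '<html><head><meta name="date" content="2001-05-03"></head></html>'
)
update_node(conn, 1, lang, ts)
print(conn.execute("SELECT language, created FROM node WHERE nid = 1").fetchone())
```

Pages without a recognizable date only get their language set, so nothing is overwritten with a guess.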

So then I could parse all files and update the Drupal database with dates & language. There was just one thing left: the language handling of aliases. The import_html module had nicely imported all old paths as aliases... but my old Dutch pages (luckily) all had the prefix 'nl/', and my English pages had... no prefix. Drupal (in my multilingual setup) can't really deal with that. (The aliases without a prefix work, as long as you don't fill in the alias's language field. But the next time you edit an English-language node, that field gets filled, and your old familiar path without 'en/' prepended stops working.) But I found the i18nredirect module, which is pretty much made for this purpose (i.e. keeping old aliases working on a site that has moved to multilingual prefixes). So all was well.
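To make the alias problem concrete, here's a toy model of it. This is not i18nredirect's actual code, only the idea of keeping old links alive. Aliases are stored per language, a request without a language prefix is treated as language-neutral (assumed here to be stored as an empty string), and a request whose alias only exists under another language is redirected to the prefixed path instead of failing. The language codes follow the post; the example aliases and `node/...` paths are made up.

```python
from typing import Dict, Optional, Tuple

LANGUAGES = ("nl", "en")
NEUTRAL = ""  # assumption: language-neutral aliases have an empty language code

Aliases = Dict[Tuple[str, str], str]  # (alias, language) -> system path


def split_prefix(path: str) -> Tuple[Optional[str], str]:
    """Split 'nl/foo' into ('nl', 'foo'); paths without a known prefix give (None, path)."""
    head, sep, rest = path.partition("/")
    if sep and head in LANGUAGES:
        return head, rest
    return None, path


def route(path: str, aliases: Aliases) -> Tuple[str, str]:
    """Return ('serve', system path), ('redirect', new path) or ('not_found', path)."""
    prefix, alias = split_prefix(path)
    language = prefix if prefix is not None else NEUTRAL
    if (alias, language) in aliases:
        return "serve", aliases[(alias, language)]
    # The alias exists, but under another language: send the visitor to that
    # prefix instead of a 404, so old links keep working.
    for other in LANGUAGES:
        if (alias, other) in aliases:
            return "redirect", f"{other}/{alias}"
    return "not_found", path


# An old English page whose alias got language 'en' after the node was edited.
aliases = {("about.html", "en"): "node/5", ("index.html", "nl"): "node/7"}
print(route("about.html", aliases))     # old unprefixed link
print(route("en/about.html", aliases))  # where the redirect points
```

Without the redirect loop, the first request would fall through to "not_found", which is exactly the breakage described above.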

So now I have...

  • the feeling I'm really starting to get the hang of using Python
  • made my first real acquaintance with XSLT
  • a little experience with importing HTML from old sites into Drupal, using both of the above and the import_html module
  • finally dismantled the old ro.muit.nl (and while I was busy anyway, restored muit.nl itself from a blank page to its 1999 look :) )
  • installed 5 Drupal modules
  • ...and, what this all started with: (partly) integrated my new Twitter account into my website, with my own shortURL service.
Phew! There's more to do before the Twitter integration on my site gets pretty, but that'll have to wait. At least I have the feeling I can now really start using Twitter.

Now let's see if I'll actually do it... ;)