Internet at paulcarvill.com, the home of Paul Carvill on the web

paulcarvill.com

Hi, I'm Paul Carvill, I'm a web developer. I'm currently working as Technical Lead at LBi, Europe's largest digital agency.

I also like walking, cooking, Bollywood and rock 'n' roll.

Archive for the ‘Internet’ Category

My India travel diary for 2007 and 2008, now online

Sunday, June 14th, 2009

I finally got around to typing up my handwritten diaries from my India trip back in 2008. It was a hugely enjoyable exercise reading through the diaries again 12 months later, which is probably the reason it took me so long to plough through them.

Why put them online? A couple of reasons. I wanted to be able to access them quickly wherever I was, as I often find myself talking to someone about a particular place or event in India and wanting to show them a more detailed description, or sometimes just to remember the name of a hotel to recommend. I also thought they might be useful for other travellers considering a trip to India. Before we went, other than IndiaMike I couldn’t really find any useful ‘on the ground’ reports of day-to-day travelling around India. In some ways this was a good thing, as we tend to travel extremely independently and this allowed us to travel without any preconceptions. But some people might feel they want to get a flavour of a place before they get there. Also, and probably the most pressing reason, I wanted another nice, simple idea to practise my Django development on.

So I put the new site here: http://www.indiadiary.co.uk. Please check it out and let me know what you think. I’ve included photos of the trip from Flickr and a recommended reading list of the books we went through as we travelled around.

Geocoding location data in a Google spreadsheet

Wednesday, June 3rd, 2009

The problem: I have a spreadsheet full of locations, addresses and place names that I want to publish, along with a map, for at least tens of thousands of people to view.

A solution: Easy — I can put it in a Google spreadsheet, publish it, add a Google map to a page, download the data, geocode the locations and display them on the map.

Another problem: While this is ok in most cases, with a large spreadsheet the geocoding can take a very long time, making my page appear unresponsive and slow. In addition, I have no way of checking that the location data is good enough to map with.

Another solution: Download the data, geocode it using Yahoo!’s Placemaker service, generate a new spreadsheet containing accurate latitude/longitude data and use that in place of the original. The client then does no geocoding on its side; it’s all supplied along with the data. Everybody’s happy!

— Go straight to the spreadsheet geocoder! —

I’ve done just that with this PHP script. It takes a Google spreadsheet key, and you must tell it what columns your location data is in. It will download the spreadsheet data, concatenate those location columns, make a request to Placemaker to geocode each location, and return a new CSV file with the geodata columns appended on the end.

I’ve detailed here the various bits that make up the script. The workflow is as follows:

Capture spreadsheet data from user > Load in spreadsheet from Google > For each line in spreadsheet make a Placemaker request > Append geolocation data columns to spreadsheet > Output all results into a CSV file

The script is set to not autodisambiguate, meaning that if it’s not sure what location you’ve supplied, it will return all likely candidates, in order of likelihood. I should mention that Yahoo!’s Placemaker is utterly awesome at finding out the ‘whereness’ of things.

To build your own version of my script you will need a Placemaker API key. Other than that, please feel free to copy and paste the code, fix it, amend it and let me know if it’s useful, or if it needs more commenting, or how I could improve it. I wrote this code to fix a particular problem I was encountering, but I’m sure it could work in a few more cases too.

Something to note before I start: the script doesn’t much like having commas in the location data in your spreadsheet. Because Google only outputs CSV with a comma delimiter, commas inside the data upset my CSV parsing. Any suggestions welcome.
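One possible way around the comma problem, sketched here on the assumption that the exported CSV wraps any cell containing a comma in double quotes (the usual CSV convention): let PHP's built-in fgetcsv() do the splitting instead of cutting each line on ',' by hand. The function name parseCsvText is made up for this example and is not part of the script.

```php
<?php
// Hypothetical helper: parse CSV text into an array of rows, respecting quoted fields.
function parseCsvText($csvText) {
    $rows = array();
    // Copy the text into a temporary stream so fgetcsv() can read it record by record.
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $csvText);
    rewind($stream);
    // Delimiter, enclosure and escape character are passed explicitly.
    while (($row = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        // fgetcsv() returns array(null) for a blank line; skip those.
        if ($row === array(null)) {
            continue;
        }
        $rows[] = $row;
    }
    fclose($stream);
    return $rows;
}

// Usage: the address cell contains commas but stays in one piece.
$sample = "name,address\n\"Guardian\",\"Kings Place, 90 York Way, London\"\n";
$rows = parseCsvText($sample);
echo $rows[1][1], "\n"; // Kings Place, 90 York Way, London
```

Reading through a stream rather than splitting on newlines first also copes with quoted cells that contain line breaks.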

This function gets some CSV data from a published Google spreadsheet using a supplied key:


<?php

function getCsvDataFromGoogle($spreadsheetKey) {
	// Build the URL of the published spreadsheet, asking for CSV output
	$output = 'csv';
	$apiendpoint = 'http://spreadsheets.google.com/pub?key='.urlencode($spreadsheetKey).'&output='.$output;
	$ch = curl_init();
	$options = array(CURLOPT_URL => $apiendpoint,
	                 CURLOPT_HEADER => false,
	                 CURLOPT_RETURNTRANSFER => true // return the body instead of printing it
	                );
	curl_setopt_array($ch, $options);
	$r = curl_exec($ch); // CSV text on success, false on failure
	curl_close($ch);
	return $r;
}

This function makes a Placemaker geocode request:


function getPlacemakerGeodata ($location) {
	$key = 'MY_PLACEMAKER_API_KEY';
	$apiendpoint = 'http://wherein.yahooapis.com/v1/document';
	$inputType = 'text/plain';
	$outputType = 'xml';
	$focus = '28298150'; // sets focus to Great Britain, not sure how effective this is yet
	$autoDisambiguate = 'false'; // 'true' returns only the most likely place; 'false' returns all likely places
	// http_build_query() URL-encodes every value, so spaces or ampersands in $location are safe
	$post = http_build_query(array(
		'appid' => $key,
		'documentContent' => $location,
		'documentType' => $inputType,
		'outputType' => $outputType,
		'focusWoeid' => $focus,
		'autoDisambiguate' => $autoDisambiguate
	));
	$ch = curl_init($apiendpoint);
	curl_setopt($ch, CURLOPT_POST, 1);
	curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	$results = curl_exec($ch); // XML text on success, false on failure
	curl_close($ch);
	return $results;
}

This function does the bulk of the work, and makes calls to all the other functions:


function parseCsvData($googleSpreadsheetKey) {
	// explode() replaces split(), which was removed in PHP 7
	$lines = explode("\n", getCsvDataFromGoogle($googleSpreadsheetKey));
	if($_POST['format'] == 'csv') {
		if($_POST['locationColumns'] == '' || $_POST['key'] == '') {
			echo "please go back and specify both your google spreadsheet key and which columns contain your location data (in comma separated format, zero-indexed e.g. 0,1,9)";
			exit();
		}
		else {
			// get location columns from the submitted form
			$locations = $_POST['locationColumns'];
			$splitLocations = explode(',', $locations);
			// set headers to 'csv'
			header("Content-type: application/csv;");
			header("Content-Disposition: attachment; filename=yourgeodata.csv");
			$out = fopen('php://output', 'w');
			for($i=1;$i
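The rest of the loop is not shown here. Going by the workflow described earlier (concatenate the location columns, geocode, append the geodata columns, write out CSV), a minimal sketch of the per-row step might look like the following. appendGeodata and its $geocode callback are hypothetical names for this illustration and are not the original code; in the real script the callback would presumably wrap getPlacemakerGeodata() and parsePlacemakerXML().

```php
<?php
// Hypothetical sketch: append geodata columns to every row of a spreadsheet.
// $rows: array of rows (each an array of cells), row 0 being the header.
// $locationColumns: zero-indexed column numbers that hold location text.
// $geocode: callable taking a location string and returning
//           array(name, latitude, longitude), or an empty array if nothing matched.
function appendGeodata(array $rows, array $locationColumns, $geocode) {
    $out = array();
    foreach ($rows as $i => $row) {
        if ($i === 0) {
            // Header row: add titles for the new columns.
            $out[] = array_merge($row, array('place', 'latitude', 'longitude'));
            continue;
        }
        // Concatenate the chosen location cells into one query string.
        $parts = array();
        foreach ($locationColumns as $col) {
            if (isset($row[$col]) && trim($row[$col]) !== '') {
                $parts[] = trim($row[$col]);
            }
        }
        $geo = $parts ? call_user_func($geocode, implode(' ', $parts)) : array();
        // Pad to three cells so every output row has the same width.
        $out[] = array_merge($row, array_pad($geo, 3, ''));
    }
    return $out;
}

// Usage with a stand-in geocoder (no network call); the coordinates are dummy values.
$fakeGeocoder = function ($location) {
    return array($location, '0.0', '0.0');
};
$rows = array(array('venue', 'town'), array('Kings Place', 'London'));
$fh = fopen('php://output', 'w');
foreach (appendGeodata($rows, array(0, 1), $fakeGeocoder) as $line) {
    fputcsv($fh, $line, ',', '"', '\\');
}
```

Passing the geocoder in as a callback keeps the row handling testable without calling Placemaker.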

This function parses the XML which gets returned from Yahoo! Placemaker:


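function parsePlacemakerXML($results, $delineator) {
	// 'comma' returns values for a CSV row; anything else echoes HTML table cells
	if($delineator == 'comma') { $delStart = ''; $delEnd = ','; }
	else { $delStart = '<td>'; $delEnd = '</td>'; }

	$places = simplexml_load_string($results, 'SimpleXMLElement', LIBXML_NOCDATA);
	$locarr = array();
	// $places is false if the XML could not be parsed
	if($places && $places->document->placeDetails) {
		foreach($places->document->placeDetails as $p) {
			if($delineator == 'comma') {
				// cast to string so the array holds plain values, not SimpleXMLElement objects
				$locarr[] = (string) $p->place->name;
				$locarr[] = (string) $p->place->centroid->latitude;
				$locarr[] = (string) $p->place->centroid->longitude;
				// returns inside the loop, so only the first candidate is used for CSV
				return $locarr;
			}
			else {
				echo $delStart.$p->place->name.$delEnd;
				echo $delStart.$p->place->centroid->latitude.$delEnd;
				echo $delStart.$p->place->centroid->longitude.$delEnd;
			}
		}
	}
	return $locarr; // empty if nothing matched, or after echoing table cells
}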
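Because the script runs with autoDisambiguate off, a response may contain several candidate places. Here is a small sketch of collecting all of them rather than only the first. The sample XML is hand-written, and its shape is inferred purely from the element names the function above reads (document, placeDetails, place, name, centroid); the root element name and the coordinates are assumptions, and a real Placemaker response may contain more elements. extractPlaces is a hypothetical name.

```php
<?php
// Hypothetical helper: return every candidate place found in the XML.
function extractPlaces($xmlString) {
    $places = array();
    $xml = simplexml_load_string($xmlString);
    if ($xml === false || !isset($xml->document->placeDetails)) {
        return $places; // unparseable XML or no matches
    }
    foreach ($xml->document->placeDetails as $p) {
        $places[] = array(
            'name'      => (string) $p->place->name,
            'latitude'  => (float) $p->place->centroid->latitude,
            'longitude' => (float) $p->place->centroid->longitude,
        );
    }
    return $places;
}

// Assumed response shape, with approximate illustrative coordinates.
$sample = <<<XML
<contentlocation>
  <document>
    <placeDetails>
      <place><name>London, England, GB</name>
        <centroid><latitude>51.5</latitude><longitude>-0.12</longitude></centroid></place>
    </placeDetails>
    <placeDetails>
      <place><name>London, Ontario, CA</name>
        <centroid><latitude>42.98</latitude><longitude>-81.25</longitude></centroid></place>
    </placeDetails>
  </document>
</contentlocation>
XML;

foreach (extractPlaces($sample) as $place) {
    echo $place['name'], ': ', $place['latitude'], ', ', $place['longitude'], "\n";
}
```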

This bit runs when you load the page and works out if you're submitting some data or just viewing the page. If you've submitted data, it runs the main function:

if(isset($_POST['submit'])) {
	parseCsvData($_POST['key']);
}

Or if you're viewing the page for the first time, you get a form to fill out:

else {
	echo "<html><head><title></title></head>";
	echo "<body>";
	echo "<p>Please enter your spreadsheet key and specify which columns contain your location data (use comma separated list e.g. 9,10,11):</p>";
	echo "<form method=POST><p><label>Key:<input type='text' name='key' /></label></p><p><label>Location columns: <input type='text' name='locationColumns' /></label></p><p><label>Format: <select name='format'><option value='csv'>csv</option><option value='table'>table</option></select></label></p><p><input type='submit' name='submit' /></p></form>";
	echo "</body></html>";
}
?>

HTML for humans

Saturday, April 18th, 2009

The most recent Road to HTML5 blogpost is on the subject of link relations. To sum up in one sentence, link relations ‘explain why you’re pointing to another page’. But that is to miss all the detail and nuance of the article, so go there and read it if you’re at all interested in web development.

The WHATWG blog continues to be a friendly, highly readable and discursive source of information on upcoming HTML5 developments. It also clarifies many misconceptions you may have had about previous HTML specs and the reasoning behind any changes, often in a satisfyingly wry style. On the subject of the ‘rev=made’ attribute it has this to say:

“The decision to drop the rev attribute [from HTML5] seems especially controversial. The same question flares up again and again on the working group’s mailing list: “what happened to the rev attribute?” But in the face of almost-universal misunderstanding (among people who try to use it) and apathy (among everyone else), no one has ever made a convincing case for keeping it that didn’t boil down to “I wish the world were different.” Hey, so do I, man. So do I.”

http://blog.whatwg.org/

Why people should build their own URL shorteners

Sunday, April 5th, 2009

Lots of blogposts regarding the evil that URL shortening services do appeared this weekend, from Jason Kottke (“URL shorteners suck”), Joshua Schachter, creator of Delicious (“on url shorteners”) and Dave Winer (“Josh is right, URL shorteners are risky”).

None of them are happy, with concerns, variously, about spam, speed, efficiency, transparency, longevity and what Joshua calls “the great linkrot apocalypse”.

I think the one idea that we should take from all of these arguments is that if you are using a URL shortening service you have no control over your links, now or in the future. One such service may get bought up by a nasty commercial entity who redirects all your existing short URLs to its own ends. Your URLs pointing to all your lovingly curated content will, effectively, become spam.

One solution to this quandary, which no one has mentioned, and one which large media organisations should pay attention to, is that building a URL shortener is really, really easy. I spoke to Simon Willison about it and he thinks a URL shortener will soon be the example app that someone builds to learn their way around a new language or web framework, like they currently do with creating a blog. That way you get to solve many problems at once: control over your own links’ destiny and complete consumer confidence that your own-brand short URLs (gu.com/abcde, nyt.com/vwxyz) won’t take them somewhere nasty.

And the ability to shorten your own URLs isn’t necessarily restricted to large companies with lots of resources. Many people who want to use this sort of service already have all the tools they need: their blogging software. All that Movable Type or WordPress, among others, need to do is add an extra database lookup table and pretty soon all their users can take care of their own URL shortening needs.

UPDATE: A useful comparison of existing URL shortening services (even the method of redirection matters: 301 is a permanent redirect, 302 a temporary one, with implications for SEO and link credit).
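To give a sense of how little code is involved, here is a rough sketch of the core of a shortener: turn a numeric row id from the lookup table into a short base-62 code and back again. The function names, the alphabet and the in-memory array standing in for the database table are all assumptions made for illustration; a real version would store the rows in a database and answer with a redirect header.

```php
<?php
// Assumed alphabet: digits, then lower case, then upper case (62 symbols).
const SHORT_ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Turn a non-negative integer id into a short code.
function encodeId($id) {
    $alphabet = SHORT_ALPHABET;
    $base = strlen($alphabet);
    $code = '';
    do {
        $code = $alphabet[$id % $base] . $code;
        $id = intdiv($id, $base);
    } while ($id > 0);
    return $code;
}

// Turn a short code back into its id, or null if the code is not valid.
function decodeCode($code) {
    $alphabet = SHORT_ALPHABET;
    $base = strlen($alphabet);
    if ($code === '') {
        return null;
    }
    $id = 0;
    foreach (str_split($code) as $char) {
        $pos = strpos($alphabet, $char);
        if ($pos === false) {
            return null;
        }
        $id = $id * $base + $pos;
    }
    return $id;
}

// Look a code up in the table (here a plain array of id => long URL).
function resolveShortUrl($code, array $table) {
    $id = decodeCode($code);
    return ($id !== null && isset($table[$id])) ? $table[$id] : null;
}

// Usage: row 125 of the lookup table becomes the code "21".
$table = array(125 => 'http://www.indiadiary.co.uk');
$code = encodeId(125);
echo $code, ' -> ', resolveShortUrl($code, $table), "\n"; // 21 -> http://www.indiadiary.co.uk
// On a web server the last step would be something like:
// header('Location: ' . $target, true, 301); // permanent redirect
```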

How to do music lists

Monday, March 16th, 2009

How to do a list of songs on a newspaper or magazine website:

And how not to:

  • The Telegraph’s 100 Greatest Songs Of All Time — in which the adjudicating panel of one – Neil McCormick – hilariously abandons grammar in favour of enigmatic SMS-length capsule reviews. Sample description of The Doors’ Light My Fire, “Provocative, sensual, slinky song weaving erotic desire.” And another one of U2’s Still Haven’t Found What I’m Looking For: “Gospel rock hymn of doubt and spiritual quest.”
  • Esquire’s 50 Songs Every Man Should Be Listening To — let me get this straight: you want me to click through at least 50 times, even more with ads, and if I haven’t heard the band you’re talking about the onus is on me to wade through the internet to find one of their songs to listen to? OK, thanks.

More coverage of the Guardian’s Open Platform API

Thursday, March 12th, 2009

Some more links to coverage of the Guardian’s Open Platform API announcement (some of these found via blogs.journalism.co.uk, thanks):

What people are saying about the Guardian’s Open Platform

Wednesday, March 11th, 2009

I’ve collected together some of the recent articles and blogposts about yesterday’s announcement of the Guardian’s Open Platform strategy and, below, some of the first apps that people have built:

And here are some of the first apps that people have built using the Open Platform API:

Guardian Open Platform

Wednesday, March 11th, 2009

There was exciting news yesterday morning, when we announced the next stage of the Guardian’s stated strategy to be the world’s leading liberal voice. The Guardian is opening out — making our content available for other people to use — and also opening in — allowing developers to build on our platform and deploy applications which extend its functionality.

So, the headlines from the announcement are:

  • Open Platform API
    • search, query, filter and discover content, keywords and tags from the Guardian’s archive
    • contains full textual content of all Guardian articles going back to 1999
    • currently in private beta (apply for a key)
    • free for the first 5000 queries per day
    • can be used for commercial purposes (you can make money by running ads with it)
    • it will at some point in the future be ad-supported on pages using the full content
  • Data Store
    • a curated collection of data sets
    • researched, verified and attributed to its source
    • hosted on Google Docs and free to use
    • covering subjects as diverse as US economic data, environmental statistics, crime figures and religious information
  • Data Blog
    • accompanies the Data Store
    • will provide information around the raw data: how we sourced it, why we use that particular data set, what the information might mean

This constitutes a wealth of information to announce in one go, and it may take people some time to digest it all. The really exciting thing about this move is that we’re putting the full content of our articles out there for people to use. The implications for data mining, linguistic research and deep textual comparisons are endless, and I’m really looking forward to seeing what people come up with. Having context to the data is really important, so people can do much, much more than just link back to our site using a headline or an excerpt.

The Data Store is also a really bold move. Simon Rogers, one of our News Editors, and the journalists here put amazing amounts of effort into research, and here we are returning the fruit of their labour into the community. Of course, we use that research to report and editorialise, but here we give you the opportunity to derive your own patterns and meaning from the same data. The fact that this stuff has been manually sourced, collated and published makes it mean so much more, and I’m sure people, including other journalists, will find it an increasingly useful source of information for years to come.

I’ve collected some useful links here which are specifically related to the Open Platform:

I’ve also collected some of the news coverage and blogposts about the announcement here:

Forking hell! OSX, PHP, GD, Freetype problems? Read this…

Sunday, November 16th, 2008

So, I was trying to make a set of Moo cards, using the MOO API, as part of The Guardian’s first ever Hack Day. It’s very easy and fun to use, and I enjoyed the learning process of formatting the images and data and submitting the constructed XML to MOO to print the cards. But…

But, the formats available from MOO are quite restrictive. This is understandable, as they want to retain some control over quality and their own branding, which is held in high esteem. For example, you can only ever put an image on the front of the card, and text on the back. I wanted image and text on the front.

I was using PHP to create the XML to post to MOO, so now I needed to learn how to use ImageMagick to merge some text into the image I was using. Unfortunately I’m not a command line geek, so I tend to get stuck when someone tells me to compile PHP. Luckily, someone was on hand to help me install GD, which is considerably easier to use.

I used GD to merge text into the image using imagestring. But I wasn’t able to successfully specify the fonts to use – every image was rendered with the default system font. GD wouldn’t work. Then I tried using imagettftext. This resulted in a blank page. I was using GD 2.3.5 on PHP 5. Eventually I found a link which explains a problem with Apple’s default implementation of Freetype in GD that crashes GD if you try and specify a font.

The result? I installed MacPorts, and updated PHP and Apache that way, resulting in a new install with GD 2.3.7.

And it’s all working now! Now I’ve gone over the problems, I’ll post a bit more about actually creating the cards next.
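For anyone trying the same thing, here is a minimal sketch of the two GD calls mentioned above. imagestring() can only draw with GD's built-in bitmap fonts, which may be why the font choice appeared to be ignored, while imagettftext() takes a path to a TrueType file. The helper name drawCaption, the font path and the image size are placeholders invented for this example, not the code used for the cards.

```php
<?php
// Hypothetical helper: draw a caption on a GD image.
// Returns 'ttf' if the TrueType font was used, 'builtin' if it fell back.
function drawCaption($image, $text, $fontPath, $size = 14) {
    $white = imagecolorallocate($image, 255, 255, 255);
    if (function_exists('imagettftext') && is_readable($fontPath)) {
        // x and y give the baseline of the first character, so y leaves room for the glyphs.
        if (imagettftext($image, $size, 0, 10, 10 + $size, $white, $fontPath, $text) !== false) {
            return 'ttf';
        }
    }
    // Fallback: built-in bitmap font number 5, the largest of the five.
    imagestring($image, 5, 10, 10, $text, $white);
    return 'builtin';
}

if (extension_loaded('gd')) {
    $img = imagecreatetruecolor(300, 100); // stand-in for the card photo
    $mode = drawCaption($img, 'Hello MOO', '/path/to/YourFont.ttf'); // placeholder font path
    imagepng($img, sys_get_temp_dir() . '/card-front.png');
    echo "Caption drawn with the $mode font\n";
}
```

Checking that the font file is readable before calling imagettftext() turns a missing font into a visible fallback rather than a blank page.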

The Internet Fridge delusion

Thursday, October 23rd, 2008

I sat through a debate on Tuesday night that was more interesting for its audience and positioning than it was for its addressing of the motion — ‘the internet needs magazines more than magazines need the internet.’

The debate was held at the London College of Fashion just off Regent Street, and was organised by the British Society of Magazine Editors and editorialdesign.org. The audience was evidently a non-technical one, keen on understanding how the whole magazine/internet crossover thing might work. My colleague, a designer, later commented that the room was full of fear, fear that everyone’s hard won Quark or InDesign layout skills would prove insufficient for the brave new technological world. And rightly so.

The panel was made up of a variety of figures from the print world, some of whom had made forays into the web. It became obvious that we were in a decidedly nontechnical audience for whom the internet was an unknown quantity and whose main concern was replicating the magazine experience in web form.

Straight-talking David Hepworth, the esteemed editor of The Word magazine and co-founder of such titles as Mixmag, Smash Hits, and The Face, nailed the argument with the night’s opening statement. Do not attempt to reproduce the magazine experience in web form. The printed word is glossy, definitive and final. The web is none of these things. To work on the web, said Hepworth, your offering must have humility, economy and personality. At least two of those things can be said to be absent from the UK’s magazine culture, with the third possibly endangered in the vast majority of the output.

Then at one point the discussion descended into one of gender politics, stemming from the unfortunately all-male panel.

The most glaring misjudgment, however, was uttered by Paul Kurzeja — creative director at Redwood, the world’s biggest customer publishing agency. Declaring the future to be one of technological revolution and infinitely diverse media, Kurzeja invoked that most misguided of delusions — YOU WILL HAVE THE INTERNET ON YOUR FRIDGE.

Why does this crazed obsession with the assumed permeation of technology into every area of our lives persist? I suggest it’s more an allusion to the quotidian nature of the fridge, the kitchen and its intrinsic presence in our life. It also assumes that we all have a huge, family-sized tank of a fridge, big enough to fit a screen on as well as an ice-crusher, smoothie-maker and little Rupert’s simply darling hamfisted scribbles supposedly meant to be Mummy and Daddy. But, really, I think the ‘empty bottle of milk’ is a perfectly adequate graphical model for the average consumer to reference when considering whether or not to buy more cow juice. Don’t you?