Corpus Sketcher


After completing the development of the Cambridge University Press Dictionary API site, we were asked by the Press to propose ideas for a ‘sketcher’ tool that would allow them to demonstrate the interrelationships between words in the Cambridge English Corpus – a multi-billion-word collection of written, spoken and learner texts, drawn from a huge range of sources including newspapers, the web, books, magazines, radio, exams, schools, universities and everyday conversation.

We decided on a simple word cloud representation: a familiar interface for users, and one flexible enough to respond to different visual and structural contexts. The result was the Corpus Sketcher, a client-side HTML/JS visualisation tool driven by the D3.js framework.

Because the D3 framework is web standards compliant and works across a range of platforms and devices, it allowed us to build the visualisation whilst retaining a large cross-platform user base – necessary for the huge reach of the Cambridge dictionaries site.
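
As a rough illustration of the word cloud idea, the sketch below maps each related word's score onto a font size, so that stronger associations appear larger. It is not the Corpus Sketcher's actual code: the `collocates` data, the scores and the `fontSizes` helper are all hypothetical, and the mapping is written in plain JavaScript rather than with D3 so that it runs on its own. It does the same job a linear scale would do in a D3-driven layout.

```javascript
// Hypothetical data: words related to a headword, with made-up association scores.
const collocates = [
  { word: "strong", score: 12 },
  { word: "black", score: 30 },
  { word: "instant", score: 21 },
];

// Map each score linearly onto a font-size range in pixels.
// (Assumption: a linear mapping; a real tool might prefer a log or sqrt scale.)
function fontSizes(items, minPx = 12, maxPx = 48) {
  const scores = items.map((d) => d.score);
  const lo = Math.min(...scores);
  const hi = Math.max(...scores);
  const span = hi - lo || 1; // avoid dividing by zero when all scores are equal
  return items.map((d) => ({
    word: d.word,
    size: Math.round(minPx + ((d.score - lo) / span) * (maxPx - minPx)),
  }));
}

const sized = fontSizes(collocates);
```

Each entry in `sized` could then be bound to a text element whose `font-size` is set from `size`, which is the kind of data-to-attribute binding D3 is typically used for.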