Wednesday, 16 February 2011

Changing the guard

This topic leads to stories about leadership and change:

personally began led figure including personal made work role long image strong life close state leader year political years

There are ten cables:

06HAVANA8633 2006-04-20
Castro has reduced his public profile
since February, but rumors of debilitating ill health are

06BARCELONA179 2006-12-01
Jose Montilla received the medal signifying his position as President
from out-going President Pascal Maragall

07HAVANA5 2007-01-03
Cuban media carried a New Years message from
Fidel Castro noting the passing of 48 years since he assumed

07ISLAMABAD5388 2007-12-28
The Pakistan Muslim League has unofficially selected former Punjab Chief Minister, Chaudhry Pervaiz Elahi, as its candidate for the Prime Ministership following the January 2008 national elections

08ASUNCION358 2008-06-02
President-elect Fernando Lugo will need to rely on his diverse background to govern Paraguay and hold together the varied interests in his political coalition

As Gordon Brown lurches from political disaster to disaster, Westminster is abuzz with speculation about whether he will be replaced as Prime Minister and Labour Party leader

09PRETORIA954 2009-05-12
Jacob Zuma, the President of the ruling African National Congress party (ANC), was inaugurated as the fourth post-apartheid president of South Africa

09AMMAN1689 2009-07-28
Anti-Palestinian hooliganism and slogans denigrating the Palestinian origins of both the Queen and the Crown Prince led to the cancellation ...

09STOCKHOLM679 2009-10-30
WHO IS SWEDISH PRIME MINISTER FREDRIK REINFELDT? we wanted to give you some background on this composed and reflective individual ...

09SANTIAGO919 2009-12-02
In his short political career, Enriquez-Ominami
has distinguished himself primarily by bucking the political
establishment and refusing to toe the party line while
simultaneously leveraging his establishment connections

To explore these and other topics in the wikileaks corpus try the browser: use your web browser's search function to find topics mentioning your chosen keyword, or just scroll through the list of topics to find one of interest, then explore the topics you uncover ...

Brazil's shootdown program

Did you know that Brazil has a program to shoot down planes suspected of smuggling drugs?

One of the 512 topics identified from the #cablegate corpus has the following keywords:

provide comdabra aircraft notes braf exchange machado safety shootdown information traffic gob ref abd procedures control air program force

This identifies three cables:

04BRASILIA1938 2004-08-02 20:08
GOB would provide information to the USG about the status of Brazil's shootdown program once that program begins

06BRASILIA2002 2006-09-20 19:07
Mission Brazil herewith recommends annual recertification by the President of Brazil's Air Bridge Denial ("shootdown") Program

09BRASILIA1142 2009-09-14 21:09
Mission is confident that there has been no deterioration in Brazilian safety standards over the last year and recommends that the Presidential Determination on the Brazilian Shootdown Law be renewed for 2009

This is one of five topics containing the keyword 'aircraft'. The others deal with rendition, Brazil's purchase of the FX fighter, Varig's problems and their implications for Boeing, and airline security.

To explore these and other topics in the wikileaks corpus try the browser: use your web browser's search function to find topics mentioning your chosen keyword ('aircraft' in the example above), and explore the topics you uncover ...
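
The keyword search described above amounts to filtering topics by their keyword lists. Here is a minimal Python sketch of that idea, assuming each topic is stored as a list of its top keywords. The topic IDs and the function name `topics_containing` are hypothetical; only the keyword lists themselves are taken from this blog.

```python
def topics_containing(keyword, topics):
    """Return the IDs of topics whose keyword list includes `keyword`."""
    keyword = keyword.lower()
    return sorted(tid for tid, words in topics.items() if keyword in words)


# Hypothetical topic IDs; the keyword lists are the ones quoted in this blog.
topics = {
    "shootdown": (
        "provide comdabra aircraft notes braf exchange machado safety shootdown "
        "information traffic gob ref abd procedures control air program force"
    ).split(),
    "succession": (
        "ruling shura egypt officer father son elegyptian seats reform presidency "
        "succession sadat pdas presidential ndp cairo mubarak gamal"
    ).split(),
}

print(topics_containing("aircraft", topics))
```

The match is on whole keywords, so searching for 'air' would find only topics that list 'air' itself, not every topic that lists 'aircraft'.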

Tuesday, 15 February 2011

Wikitopics browser

Please try the first release of our topic-based document browser for the #cablegate corpus. We welcome your comments.

Use a recent WebKit browser - Google Chrome, or a nightly build of Safari.

The browser shows topics as word clouds. Hover over a topic to highlight the documents in which it occurs. Hover over a document to see which topics it contains.

Example use case:

The browser launches with 512 topics covering over 2,000 documents. Click on any topic, then search for Mubarak using your web browser's search. You will find three topics containing this word. Each of these leads to a small cluster of cables.

The first group contains three general briefings on Egypt's strategic position:

muslim aboul rueheg chief gheit kahl sudan president smuggling general goe gaza arab egyptians cairo mubarak egyptian soliman egypt

09CAIRO231 Briefing for Secretary of State: Aboul Gheit will explain Egypt's ability to influence regional events ...
09CAIRO722 Mubarak sees Iran's attempts to exert influence throughout the region as Egypt's primary strategic threat
09CAIRO722 Omar Soliman explained that his overarching regional goal was combating radicalism

Two cables concern plans for Egyptian succession ... and make interesting reading in the light of recent events:

ruling shura egypt officer father son elegyptian seats reform presidency succession sadat pdas presidential ndp cairo mubarak gamal

05CAIRO7782 Gamal Mubarak reviewed his father's presidential election campaign and preparations for the upcoming parliamentary elections
06CAIRO2010 public profile of Gamal Mubarak has increased ... an effort to succeed his father is moving full speed ahead

The third topic yields a single cable that does not mention Mubarak – but mentions plenty of words that might be associated with him – an example of the indirect connections topic modelling may generate.

winograd survived acid nevsun infrastructure danish determination eliezer mubarak ben labor structure jones sneh gold war olmert peretz haslund

09REYKJAVIK204 the newly arrived Danish Ambassador to Iceland discusses his time spent in Iran

Exploring further ...

Topic modelling also reveals connections of a different kind. For example, the second most prevalent topic in 06CAIRO2010 is:

including long personally led life personal image figure role work made began state leader political years close strong year

This topic leads us to ten stories of impending or recent political succession around the globe.
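
Finding the "second most prevalent topic" in a cable comes down to sorting that cable's topic proportions. A small sketch, assuming the proportions for one cable are available as a mapping from topic ID to weight. The topic IDs and numbers below are invented for illustration and are not the model's actual output.

```python
def rank_topics(proportions):
    """Return (topic_id, weight) pairs sorted from most to least prevalent."""
    return sorted(proportions.items(), key=lambda item: item[1], reverse=True)


# Invented proportions for a single cable (they sum to 1).
cable = {"succession": 0.45, "leadership": 0.30, "economy": 0.15, "other": 0.10}

ranked = rank_topics(cable)
second_topic, second_weight = ranked[1]
print(second_topic, second_weight)
```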

Thursday, 30 December 2010


The index page at now provides the briefest of instructions, and access to the topic and document maps. This blog is at

I've made the links between topics and their nearest neighbours active. Hover over the link between two topics to see the ID of a cable that supports this link, and click on the link to visit this document.
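
One plausible way to pick the cable that "supports" a link between two topics is to choose the document in which both topics are strongly present. The sketch below scores each document by the smaller of its two proportions. That scoring rule, the function name, and all the data are assumptions made for illustration, not a description of what the site does.

```python
def supporting_document(topic_a, topic_b, doc_topics):
    """Return the document in which both topics are most strongly present.

    Scores each document by the smaller of its two proportions (an assumed
    rule), so a document needs a fair share of both topics to win.
    Returns None if no document contains both topics.
    """
    best_doc, best_score = None, 0.0
    for doc_id, proportions in doc_topics.items():
        score = min(proportions.get(topic_a, 0.0), proportions.get(topic_b, 0.0))
        if score > best_score:
            best_doc, best_score = doc_id, score
    return best_doc


# Invented document IDs and proportions, for illustration only.
doc_topics = {
    "doc1": {"t1": 0.7, "t2": 0.1, "t3": 0.2},
    "doc2": {"t1": 0.4, "t2": 0.5, "t3": 0.1},
    "doc3": {"t2": 0.6, "t3": 0.4},
}

print(supporting_document("t1", "t2", doc_topics))
```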

I'll add similar functionality to the document maps tomorrow - the links between documents are supported by common topics.

I also plan to provide local maps of the neighbourhood of each document in the document space, and tools for selecting documents by choosing combinations of topics.

Wednesday, 29 December 2010

Wikileaks Topics



This blog will document an experimental use of topic modelling to develop a site for browsing a multithreaded collection of documents. We start from a collection of documents (each viewed as a bag of words), and use Latent Dirichlet Allocation (LDA) to model each document as a mixture of a number of topics. A topic is a probability distribution over words. Once we choose a fixed number of topics, LDA provides a set of topics and the proportions in which they should be mixed in each document to best approximate our collection.
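
To make the mixture idea concrete: under this model, the probability of seeing a word in a document is the sum, over topics, of the topic's proportion in the document times the word's probability under that topic. The toy Python sketch below uses two invented topics over a four-word vocabulary. The numbers are made up for illustration and are not output from the model.

```python
def word_probability(word, doc_mixture, topics):
    """P(word | document) = sum over topics of P(topic | doc) * P(word | topic)."""
    return sum(
        weight * topics[topic_id].get(word, 0.0)
        for topic_id, weight in doc_mixture.items()
    )


# Each toy topic is a probability distribution over a four-word vocabulary.
topics = {
    "aviation": {"aircraft": 0.5, "air": 0.3, "election": 0.1, "president": 0.1},
    "politics": {"aircraft": 0.0, "air": 0.1, "election": 0.5, "president": 0.4},
}

# A toy document that is 75% "aviation" and 25% "politics".
doc_mixture = {"aviation": 0.75, "politics": 0.25}

p = word_probability("aircraft", doc_mixture, topics)
print(round(p, 3))
```

Fitting the model runs this in reverse: given only the word counts, LDA infers both the topics and the per-document mixtures.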

We use the MALLET toolkit from UMass Amherst to perform this analysis.

We use the Wikileaks #cablegate collection of cables as an example corpus.

This is a work in progress. We hope it will rapidly improve.

Two documents are similar if they are modelled by similar collections of topics, so we can present the structure of the collection of cables by linking each one to its nearest neighbour. You can access the source of each document by clicking on the node that represents it.
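
As a rough sketch of the nearest-neighbour idea, the Python below represents each document by its topic proportions and links it to the most similar other document. Cosine similarity is an assumption here (other measures would work too), and the document IDs and proportions are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbours(vectors):
    """Map each ID to the ID of the most similar other vector."""
    return {
        a: max(
            (b for b in vectors if b != a),
            key=lambda b: cosine(vectors[a], vectors[b]),
        )
        for a in vectors
    }


# Invented topic proportions for four documents.
doc_topics = {
    "doc1": {"t1": 0.8, "t2": 0.2},
    "doc2": {"t1": 0.7, "t2": 0.3},
    "doc3": {"t2": 0.1, "t3": 0.9},
    "doc4": {"t2": 0.2, "t3": 0.8},
}

links = nearest_neighbours(doc_topics)
print(links)
```

Each document gets exactly one outgoing link, so the resulting graph has as many links as documents.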

Just as two documents can be linked by a common topic, two topics can be linked by the documents they have in common. So we can also present the structure of the set of topics inferred by LDA in a similar (actually dual) fashion. You can inspect the most frequent words in each topic by hovering over the node that represents it.
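
The dual view can be sketched by transposing the document-by-topic table, so that each topic becomes a vector of weights over documents, and then linking each topic to the topic it shares the most weight with. The overlap measure below (summing the smaller weight over shared documents) is an assumption chosen for simplicity, and the data is invented.

```python
from collections import defaultdict


def transpose(doc_topics):
    """Turn {doc: {topic: weight}} into {topic: {doc: weight}}."""
    topic_docs = defaultdict(dict)
    for doc_id, proportions in doc_topics.items():
        for topic_id, weight in proportions.items():
            topic_docs[topic_id][doc_id] = weight
    return dict(topic_docs)


def overlap(u, v):
    """Shared weight of two topics: sum over common documents of the smaller weight."""
    return sum(min(weight, v[doc_id]) for doc_id, weight in u.items() if doc_id in v)


# Invented topic proportions for three documents.
doc_topics = {
    "doc1": {"t1": 0.6, "t2": 0.4},
    "doc2": {"t1": 0.5, "t2": 0.5},
    "doc3": {"t2": 0.1, "t3": 0.9},
}
topic_docs = transpose(doc_topics)

# Link each topic to the other topic it overlaps with most.
nearest = {
    a: max(
        (b for b in topic_docs if b != a),
        key=lambda b: overlap(topic_docs[a], topic_docs[b]),
    )
    for a in topic_docs
}
print(nearest)
```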

We are now working on linking these two views.

We are eager to have feedback on the interface and how it can help to discover information in a previously unconnected collection of documents.