Posts Tagged ‘international comparisons’
World Bank opens access to WDI
The World Bank has opened access to a flagship dataset that was mostly closed. Until recently, only a fraction of the thousand-plus data series that make up the World Development Indicators (WDI) was available to non-paying customers in advanced economies. As of April 10, 2010, the complete dataset is open to everyone.
Trade Cartograms at UseR! 2010
A bit of shameless self-promotion! I will be presenting my work on trade cartograms at UseR! 2010. I’ll update this with a link to the abstract when it is listed there.
Earlier this year I posted on the use of cartograms to visualize dyadic trade flows.
About UseR!
useR! 2010, the R user conference, will take place at the Gaithersburg, Maryland, USA campus of the National Institute of Standards and Technology (NIST) from 2010-07-21 to 2010-07-23. Pre-conference tutorials will take place on July 20.
The conference is organized by NIST and funded by the R Foundation for Statistical Computing.
Following the successful useR! 2004, useR! 2006, useR! 2007, useR! 2008, and useR! 2009 conferences, the conference is focused on:
- R as the ‘lingua franca’ of data analysis and statistical computing,
- providing a platform for R users to discuss and exchange ideas on how R can be used for statistical computation, data analysis, visualization, and exciting applications in various fields,
- giving an overview of the new features of the rapidly evolving R project.
As at the predecessor conferences, the program consists of two parts:
- invited lectures discussing new R developments and exciting applications of R,
- user-contributed presentations reflecting the wide range of fields in which R is used to analyze data.
A major goal of the useR! conference is to bring users from various fields together and provide a platform for discussion and exchange of ideas, both in the formal framework of presentations and in the informal part of the conference in Gaithersburg.
Prior to the conference, on 2010-07-20, there are tutorials offered at the conference site. Each tutorial has a length of 3 hours and takes place either in the morning or afternoon.
Export Trade Clusters
This post, as with the prior ones on trade clusters, aims to help visualize patterns of trade in the OECD from 50 years of partner trade statistics. The data is rich, meaning we should be able to develop rich intuition by exploring it visually.
These slides follow the method laid out in Jong-Eun Lee, “Two Maps for the World’s Trade Integration,” Applied Economics Letters, 11:4 (2004). All computations were performed in R.
Unilateral trade clusters using raw import flows
This set of dendrograms, again, is based on raw partner import flows from the OECD. The dendrograms show complete linkages (all countries in a cluster exceed the threshold value for mutual trade flows), but each dyad is measured by the greater of its two trade flows.
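As a rough illustration of the clustering described above, here is a minimal sketch in Python (the original computations were done in R, so this is not the code behind the slides). The flow values, the country list, and the helper names `dyad_distance` and `complete_linkage` are invented for illustration, and taking the reciprocal of the flow as the distance is an assumption.

```python
from itertools import combinations

# Hypothetical import flows (reporter, partner) in arbitrary units.
# These numbers are invented for illustration; they are not OECD data.
FLOWS = {
    ("USA", "CAN"): 300.0, ("CAN", "USA"): 250.0,
    ("USA", "MEX"): 200.0, ("MEX", "USA"): 180.0,
    ("CAN", "MEX"): 20.0,  ("MEX", "CAN"): 15.0,
    ("DEU", "FRA"): 120.0, ("FRA", "DEU"): 110.0,
    ("USA", "DEU"): 90.0,  ("DEU", "USA"): 70.0,
    ("USA", "FRA"): 40.0,  ("FRA", "USA"): 35.0,
    ("CAN", "DEU"): 10.0,  ("DEU", "CAN"): 8.0,
    ("CAN", "FRA"): 6.0,   ("FRA", "CAN"): 5.0,
    ("MEX", "DEU"): 12.0,  ("DEU", "MEX"): 9.0,
    ("MEX", "FRA"): 4.0,   ("FRA", "MEX"): 3.0,
}


def dyad_distance(flows, a, b, combine=max):
    """Reciprocal of the combined flow: a larger flow means a smaller distance."""
    strength = combine(flows.get((a, b), 0.0), flows.get((b, a), 0.0))
    return float("inf") if strength == 0 else 1.0 / strength


def complete_linkage(names, flows, combine=max):
    """Agglomerative clustering where the distance between two clusters
    is the largest distance between any pair of their members."""

    def gap(pair):
        left, right = pair
        return max(dyad_distance(flows, a, b, combine)
                   for a in left for b in right)

    clusters = [frozenset([name]) for name in names]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters whose worst-case member distance is smallest.
        left, right = min(combinations(clusters, 2), key=gap)
        merges.append((sorted(left), sorted(right), gap((left, right))))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left | right)
    return merges


MERGES = complete_linkage(["USA", "CAN", "MEX", "DEU", "FRA"], FLOWS)
for left, right, height in MERGES:
    print(left, "+", right, "at distance", round(height, 4))
```

With these invented numbers, USA and CAN merge first, then DEU and FRA, then MEX joins the USA/CAN pair, and the two blocs merge last at distance 0.25. The merge heights are what a dendrogram would draw on its vertical axis.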

This gallery shows an annual series of dendrograms using that dataset back to 1993.
Bilateral import clusters using raw trade flows
As promised, here is a new round of dendrograms using OECD trade data as a reciprocal distance measure among reporting countries. In trade, relationships matter, and these dendrograms show which relationships matter the most. Clusters are drawn by complete linkage, using the lesser of the two pairwise trade flows (and hence the greater notional distance).
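To make the difference between the two dyad measures concrete, here is a small Python sketch (again, not the original R code). The function name `reciprocal_distance` and the two example dyads are hypothetical, and the simple 1/flow form is an assumption about what "reciprocal distance" means here.

```python
def reciprocal_distance(flow_ab, flow_ba, combine=min):
    """Distance for a dyad as the reciprocal of the combined flow.

    combine=min gives the bilateral measure (both directions must be large);
    combine=max gives the unilateral measure (one large direction is enough).
    """
    strength = combine(flow_ab, flow_ba)
    return float("inf") if strength == 0 else 1.0 / strength


# Two hypothetical dyads in arbitrary units (invented numbers).
LOPSIDED = (100.0, 5.0)   # one country imports far more than the other
BALANCED = (40.0, 40.0)   # both countries import the same amount

for label, dyad in (("lopsided", LOPSIDED), ("balanced", BALANCED)):
    print(label,
          "bilateral:", reciprocal_distance(*dyad, combine=min),
          "unilateral:", reciprocal_distance(*dyad, combine=max))
```

Under the bilateral (lesser-flow) measure the lopsided dyad sits at distance 0.2 and the balanced dyad at 0.025, so the balanced pair is closer. Under the unilateral (greater-flow) measure the lopsided dyad drops to 0.01 and becomes the closer pair. Which pairs cluster first therefore depends on the choice of measure.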

The important thing about these dendrograms, relative to the ones posted over the last few days, is that they take the raw trade flows themselves (not normalized for population, total imports, or GDP) as the unit of analysis. This is actually a much more useful picture of trade than the normalized flows, because it suggests which relationships ought to draw the most water in trade politics.
The cluster algorithm isn’t a perfect way to capture the data; a few outliers can skew the presentation of the data somewhat. But it is the only good way I have seen to present cross sections of country-dyad data at a glance. It’s a very useful tool for presentation of descriptive statistics on international trade.
Unilateral import clusters in international trade
As with yesterday’s post, these graphics depict complete clusterings in international trade, treating the partner country’s share of total home country imports as a raw distance measure. The greater the share, the closer the two countries are. For visual clarity, I have used logarithmic scales, so the scale at left doesn’t have any concrete meaning.

The clusters in this dendrogram indicate complete linkages, meaning that all of the country dyads in each cluster share a unilateral import concentration greater than the threshold value for the cluster. At 100% concentration, no country has a partner providing 100% of imports; so all the countries are separate at the bottom of the scale. At 0% concentration, countries all have at least some trade with one another; so one giant supercluster exists at the top of the scale.
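The threshold logic can be sketched in a few lines of Python (not the original R code). The four countries and their import figures are invented, the helper names are hypothetical, and two choices are assumptions rather than the documented method: taking the greater of the two directional shares for each dyad, and using the negative logarithm of the share as the log-scale distance.

```python
import math
from itertools import combinations

# Hypothetical import values, IMPORTS[home][partner], in arbitrary units.
# These numbers are invented for illustration; they are not OECD data.
IMPORTS = {
    "A": {"B": 60.0, "C": 30.0, "D": 10.0},
    "B": {"A": 50.0, "C": 40.0, "D": 10.0},
    "C": {"A": 20.0, "B": 20.0, "D": 60.0},
    "D": {"A": 5.0, "B": 5.0, "C": 90.0},
}


def import_share(home, partner):
    """Partner's share of the home country's total imports."""
    total = sum(IMPORTS[home].values())
    return IMPORTS[home].get(partner, 0.0) / total


def unilateral_share(a, b):
    """Assumed dyad measure: the greater of the two directional shares."""
    return max(import_share(a, b), import_share(b, a))


def log_distance(a, b):
    """Assumed log-scale distance: a larger share gives a smaller distance."""
    share = unilateral_share(a, b)
    return math.inf if share == 0 else -math.log(share)


def is_complete_cluster(members, threshold):
    """True when every dyad in the cluster exceeds the threshold share."""
    return all(unilateral_share(a, b) > threshold
               for a, b in combinations(members, 2))


print(is_complete_cluster(["A", "B", "C"], 0.25))  # every dyad is above 25%
print(is_complete_cluster(["A", "B", "C"], 0.35))  # the A-C dyad is only 30%
print(is_complete_cluster(["A", "B", "C", "D"], 0.0))
```

In this toy data, A, B, and C form a complete cluster at a 25% threshold but not at 35%, because the A-C dyad only reaches 30%. At a threshold of 0 all four countries fall into one cluster, and at 100% no pair qualifies, which mirrors the two ends of the scale described above.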
Where do spam statistics come from?
Microsoft’s Security Intelligence Report seems to be the source of commonly quoted statistics about spam’s share of internet traffic. The ominous 97% figure is the fraction of email messages that are blocked by automated spam filters.
The point of the statistic is not that spammers have overwhelmed the Internet’s fragile bandwidth, but rather that using email without enterprise-class spam filters is all but impossible. Spam is generated in huge volumes to overwhelm spam filters, and it coevolves with spam filtering software.
According to a recent Cisco report, email and Web traffic account for somewhat less than one third of total IP traffic. (That report includes projections out to 2013 and annualized growth forecasts.) So spammers aren’t going to break the Internet; rather, the aggressive growth of video, gaming, mobile data usage, and file sharing is changing the way network administrators monitor and shape traffic.
Two caveats to the Microsoft 97% spam statistic:
- It is published by a vendor.
- Other spam filters are not included in the survey.
For more on the description of the filters and the methods, you can visit the site and download the whole report. http://www.microsoft.com/security/portal/Threat/SIR.aspx
Also, Telegeography has excellent free resources on international bandwidth and data traffic.
Numbers and Units in Political Risk Analysis
A major global vendor of political risk analytics gave a talk at the Fletcher School today. They rate political risks according to a 5-point scale, with copious documentation of how countries’ political life is compressed down into this one metric. As a colleague noted, nobody uses the political risk metric for anything, because it’s such a vague number. I disagree. At the end of the day, people would much rather use a simple, bad heuristic than complex information that doesn’t permit quantitative comparison. Many people will categorize countries on the basis of a 5-point scale largely because of the analyst’s good reputation.
Reputable academics use simple indices in quantitative studies all the time, precisely because someone has already standardized the observations. Social science variables are cost-prohibitive to measure otherwise. This is especially true for studies that compare countries’ performance on democracy, corruption, and many other socially defined, abstract categories. The big danger is that, due to competitive business dynamics, the leading provider will define political risk analysis, and that the whole industry will coordinate its expectations and analytic frames around that algorithm.
Worldmapper
This is another great site for cartograms. The Worldmapper site has a wealth of world cartograms based on country statistics. For those of you who don’t remember, a cartogram begins with a base map, e.g., the Mercator projection of the earth. The cartogram distorts the map such that country boundaries remain contiguous, but the area of each country reflects the size of the statistic in question.
What this enables us to do is to make sensible comparisons of a huge variety of data at a glance. Whereas it would be difficult to absorb a table of statistics for every country in the world normalized by country size or by population, the world map gives a very rough first cut in seconds.
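The area rule behind a cartogram can be shown with a little arithmetic. The Python sketch below only computes the target area for each region and the linear stretch needed to reach it. It does not perform the contiguous distortion itself, which takes a dedicated algorithm. The region names, areas, statistic values, and the function name `cartogram_targets` are all invented for illustration.

```python
import math


def cartogram_targets(land_area, statistic):
    """Return {region: (target_area, linear_scale)}.

    Total map area is preserved, and each region's share of that area
    is set equal to its share of the statistic.
    """
    total_area = sum(land_area.values())
    total_stat = sum(statistic.values())
    targets = {}
    for name, area in land_area.items():
        target = total_area * statistic[name] / total_stat
        # Area scales with the square of length, so take the square root.
        targets[name] = (target, math.sqrt(target / area))
    return targets


# Hypothetical regions (invented numbers, arbitrary units).
LAND_AREA = {"X": 100.0, "Y": 300.0, "Z": 600.0}
STATISTIC = {"X": 10.0, "Y": 30.0, "Z": 160.0}

TARGETS = cartogram_targets(LAND_AREA, STATISTIC)
for name, (target, scale) in TARGETS.items():
    print(name, "target area:", target, "linear scale:", round(scale, 3))
```

Here region Z holds 80% of the statistic but only 60% of the land, so it grows to a target area of 800, while X and Y both shrink to half their original area.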