Burma on the Blogs: Tools to Analyze the Burmosphere
If we want to monitor and analyze blogs with a focus on issues around Burma and the Burmese, we can resort to a basic set of tools that were introduced mainly for business (identifying trends and optimizing keywords), but partly also for academic research and charitable goals. One particular limitation that you encounter wherever Burma meets the Web applies here as well: the low relevance of the Burmese consumer market for online business, together with the multitude and complexity of Burma’s languages, has led to a situation where these languages are systematically neglected by content-related archiving and data mining services. Even computer fonts for Burma’s non-Latin writing systems are not yet fully unified, installed by default, or functional across all operating systems and applications such as word processors and web browsers.
Burma’s linguistic gateway to the outside world, and at the same time a lingua franca for Burmese in exile, is English, the language of IT and the most common foreign language in the world. Indeed, it is often overlooked that most people use it as a foreign language. One implication is a huge variety of spelling mistakes, in particular when one’s native language belongs to an entirely different language family. If you are searching for blogs about “Norway” written by Burmese, you should include the variants “Noway” and “Norwe”, or, in the case of “Canada”, also search for “Canayda”, “Canda” and so forth.
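The variant spellings can be folded into a single search expression, and candidate variants can be spotted by fuzzy matching. The following is only a minimal sketch: the function names and the word list are my own illustration, the similarity cutoff of 0.7 is an arbitrary assumption, and whether a given blog search engine accepts the resulting OR expression has to be checked for each service.

```python
import difflib


def build_or_query(variants):
    """Join spelling variants into one parenthesized OR expression."""
    return "(" + " OR ".join(variants) + ")"


def find_near_spellings(target, words, cutoff=0.7):
    """Return the words whose spelling is close to target (case-insensitive).

    The cutoff is an assumed threshold, not a recommended value.
    """
    lowered = {w.lower(): w for w in words}
    matches = difflib.get_close_matches(
        target.lower(), list(lowered), n=10, cutoff=cutoff
    )
    return [lowered[m] for m in matches]


# Variants taken from the text above.
norway_query = build_or_query(["Norway", "Noway", "Norwe"])

# Hypothetical list of words seen in blog posts.
seen_words = ["Canayda", "Canda", "China", "Kanada", "Burma"]
candidates = find_near_spellings("Canada", seen_words)

print(norway_query)
print(candidates)
```

Fuzzy matching only proposes candidates. A human still has to decide which of them are genuine misspellings and which are different words.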
Insufficient language competence, on the part of both Burmese writers and Western software developers, therefore constitutes the main problem when searching the Burmese blogosphere by content, even where the text is written in some variety of English. That being said, I will introduce some basic tools that might still come in handy for gathering rough, indicative data.
BlogPulse: Topics and Trends
According to their own definition, “BlogPulse is an automated trend discovery system for blogs”.
Click here to go to the trend search for the keywords that produced the graph above. The trend search accepts nested logical conditions, which makes it a powerful tool for analyzing blogs. A similar service is available at IceRocket, see below.
Imagine you produce two graphs for the following “Trend Search Term(s)” (see image on the right):
(Burma OR Myanmar) AND ASEAN
(Burma OR Myanmar) AND (Sanctions OR Investments)
What you will see is the possible correlation between two topics, ASEAN and the question of sanctions, both in connection with Burma. When I tried it today, I noticed matching patterns. (Note that the amplitudes are not normalized; if your search terms differ too much in popularity, the less popular one will look much smoother.) One could therefore tentatively conclude that if ASEAN is mentioned in blog posts about Burma, the post is quite likely about sanctions and investments. Go ahead and try for yourself to find a correlation between “ASEAN” and “Human Rights” in connection with Burma.
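If you can copy the daily counts behind two such graphs, the "matching patterns" impression can be checked with a few lines of code. This is a minimal sketch under assumptions: the numbers below are invented for illustration and are not BlogPulse data, and the function names are my own. It rescales both series to the range 0 to 1, so that curves of different popularity can be compared by eye, and computes the Pearson correlation coefficient.

```python
from statistics import mean


def min_max_normalize(series):
    """Rescale a series to the range 0..1 so amplitudes become comparable."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(x - lo) / (hi - lo) for x in series]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a flat series carries no correlation information
    return cov / (sx * sy)


# Invented daily post counts, for illustration only.
asean_counts = [2, 3, 10, 4, 2, 8, 3]
sanctions_counts = [20, 31, 95, 42, 19, 85, 30]

r = pearson(asean_counts, sanctions_counts)
r_normalized = pearson(
    min_max_normalize(asean_counts), min_max_normalize(sanctions_counts)
)
print(round(r, 3), round(r_normalized, 3))
```

The coefficient does not change under this rescaling, so normalization helps the visual comparison only. Note also that a high correlation between two frequency curves does not prove that the same posts mention both topics.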
Worth mentioning is also BlogPulse’s Conversation Tracker, which analyzes linkages between blogs. It must be amazing to watch the interconnections. However, I haven’t yet managed to dig out a conversation about Burma. (PS: Here is one for the NYT City Blog.)
A similar tool for monitoring the frequency of keywords is BlogScope. It lets you list the latest posts with an abstract and a mouse-over preview for a given search term and compares the popularity of topics over time.
BlogScope offers a Boolean Query Constructor, similar to what you know from Google’s advanced search. (I did not, however, manage to enter a query into the field. Maybe a problem with browser compatibility?)
I ran a basic search for the two notorious terms “Myanmar” and “Burma”. See for yourself (the graph will be updated when you view it). You can even generate the code to embed a live graph on your own website or blog (Popularity Curve, Comparison Curve and a Summary Cloud that looks much like what we know as a tag cloud).
As in the case of BlogPulse, the validity of the results depends considerably on the total sample. BlogScope tells us in the page footer: “Monitoring over 53.28 million blogs with 1307.00 million posts”. According to their own information they are “removing non-english content and spam posts”. This claim does not seem to be entirely accurate, but perhaps the language is determined by how the blogs identify themselves.
Herdict: Monitoring Accessibility
So much for looking into the frequency of keywords.
Another interesting approach is to monitor the accessibility of web content, and a clever way to tackle this is crowdsourcing, the approach chosen by Herdict. In their own words:
Herdict is a project of the Berkman Center for Internet & Society at Harvard University. Herdict is a portmanteau of ‘herd’ and ‘verdict’ and seeks to show the verdict of the users (the herd). Herdict Web seeks to gain insight into what users around the world are experiencing in terms of web accessibility; or in other words, determine the herdict.
Analyses generated by Herdict are great for getting a rough idea of what is going on in other countries. Their value for statistical evaluation, however, may be quite limited, since the sample of probed websites is certainly skewed by the users’ bias toward problematic cases. Also, you can often only guess at the reasons for a website’s inaccessibility, which can include censorship by your own government or its allies, blocking of your country’s IP addresses on the website’s end, or simply a technical problem.
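To make the crowdsourcing idea concrete, here is a minimal sketch of how individual user reports could be condensed into a per-country verdict. It is not Herdict’s actual method, data format or API: the report tuples, the function name and the threshold of three reports are all assumptions made for illustration.

```python
from collections import defaultdict


def summarize_reports(reports, min_reports=3):
    """Return the share of 'inaccessible' reports per (country, url).

    reports: iterable of (country_code, url, accessible) tuples
             (a hypothetical format, not Herdict's).
    min_reports: pairs with fewer reports are dropped as too thin
                 to say anything (assumed threshold).
    """
    counts = defaultdict(lambda: [0, 0])  # [inaccessible, total]
    for country, url, accessible in reports:
        entry = counts[(country, url)]
        entry[1] += 1
        if not accessible:
            entry[0] += 1
    return {
        key: bad / total
        for key, (bad, total) in counts.items()
        if total >= min_reports
    }


# Invented sample reports, for illustration only.
sample_reports = [
    ("MM", "example.org", False),
    ("MM", "example.org", False),
    ("MM", "example.org", False),
    ("MM", "example.org", True),
    ("NO", "example.org", True),
    ("NO", "example.org", True),
    ("NO", "example.org", True),
    ("MM", "example.net", False),  # a single report, dropped below
]

summary = summarize_reports(sample_reports)
for (country, url), share in sorted(summary.items()):
    print(country, url, share)
```

A summary like this inherits both caveats mentioned above. Users mostly report sites they already suspect of being blocked, and a high share of failed reports says nothing about why the site could not be reached.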
Just a Simple Search
Sometimes you just want to search the blogosphere by search terms in order to obtain a topic-related entry point.
To start with the indisputable giant on the Web, there is Google Blog Search. Unfortunately, the advanced settings don’t offer any of Burma’s languages, which makes this service less global than it tries to appear. Then there is Technorati, the trailblazer in blog search and, in its own words, the “leading blog search engine and directory”. Technorati provides a search engine for posts and blogs, a blog directory and its own original content. I have already mentioned IceRocket, which offers a trend analysis that is very similar to what I described above for BlogPulse.
Blog research tools appear to be limited to searching through content and analyzing the frequency of keywords and the accessibility of websites. Search engines offer a basic graphical presentation of popularity along a timeline.
Honestly, I had expected more. While you find numerous articles on how to promote your blog and identify market niches, on search engine optimization and on how to get the most money out of your (and other people’s) writings, for a broad analysis of blogs you will eventually fall back on the standard web tools. This is surprising because blogs offer loads of hidden metadata (and mechanisms of bilateral communication like pings and trackbacks that would be difficult, yet not impossible, to monitor; learn from Feedburner how to intercept news feeds). Maybe I have missed the wheat among the chaff, but I haven’t come across figures on multilingual or multi-author blogs, available languages (by code), frequency of tags and categories (apart from what’s site-internal), or usage of multimedia content and geo-tags. You can find blogs by searching for this information, but it doesn’t seem to be compiled for statistical evaluation.
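Some of this hidden metadata can be read straight from a blog’s news feed. The following minimal sketch assumes a plain RSS 2.0 feed that uses the standard language, category and author elements; the embedded sample feed is invented, the function name is my own, and real feeds (Atom, or RSS with namespaced extensions) would need additional handling.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented RSS 2.0 sample, for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <language>en</language>
    <item>
      <title>Post one</title>
      <author>a@example.org</author>
      <category>ASEAN</category>
      <category>Sanctions</category>
    </item>
    <item>
      <title>Post two</title>
      <author>b@example.org</author>
      <category>Sanctions</category>
    </item>
  </channel>
</rss>"""


def feed_metadata(xml_text):
    """Extract language code, tag frequencies and authors from an RSS 2.0 feed."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 feed: no <channel> element")
    tags = Counter(
        c.text.strip() for c in channel.findall("item/category") if c.text
    )
    authors = {a.text.strip() for a in channel.findall("item/author") if a.text}
    return {
        "language": channel.findtext("language"),
        "tag_counts": tags,
        "authors": authors,
        "multi_author": len(authors) > 1,
    }


meta = feed_metadata(SAMPLE_FEED)
print(meta["language"], dict(meta["tag_counts"]), meta["multi_author"])
```

Run over a large list of feeds, counters like these would yield exactly the kind of figures I was looking for, provided that blogs fill in these fields consistently, which is far from guaranteed.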