Oct 20, 2009

Wikipedia Page Analysis

Wikipedia has lots of scientific information; however, due to its nature, it is still not considered a research resource. This doesn’t mean it has to be ignored. I have checked some pages on various topics in the GIS field. Most of them are well written, the information is actually quite accurate, and several contributors are professionals in the field. In this post, I would like to check some metadata of the “Mathematica” page on Wikipedia; it may give us some idea of its quality.

Tools we need: the MediaWiki API and Mathematica. There are plenty of examples of how to use the MediaWiki API. The basic procedure is to call Import[queryurl, "XML"], then parse the XML to get the information we need.
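To make the parsing step concrete without touching the network, here is a minimal sketch that runs the same Cases pattern on a hand-written XML string. The sample text, the user names and the timestamps are made up for illustration; only the general shape (rev elements carrying user and timestamp attributes) is assumed to match what the API returns.

```mathematica
(* A small hand-written sample shaped like a revisions response;
   the user names and timestamps are invented for this sketch. *)
sample = "<api><query><pages><page title=\"Mathematica\"><revisions>
<rev user=\"ExampleUserA\" timestamp=\"2009-10-01T12:00:00Z\" />
<rev user=\"ExampleUserB\" timestamp=\"2009-09-15T08:30:00Z\" />
</revisions></page></pages></query></api>";

(* ImportString parses the text into nested
   XMLElement[tag, attributes, children] expressions *)
xmlSample = ImportString[sample, "XML"];

(* collect the attribute rules of every <rev> element at any depth *)
revs = Cases[xmlSample, XMLElement["rev", attrs_, _] :> attrs, Infinity];

(* turn the rules into {user, timestamp} pairs *)
pairs = {"user", "timestamp"} /. revs
```

Import[queryurl, "XML"] should produce the same kind of expression for a live query, so a pattern that works on the sample can be reused unchanged.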

Page revision history:

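(* import contributor and timestamp of up to 500 revisions *)

url = "http://en.wikipedia.org/w/api.php?action=query&prop=revisions&\
titles=Mathematica&rvprop=user|timestamp&rvlimit=500&redirects&\
format=xml";

(* fetch the response and parse it into nested XMLElement expressions *)
xml = Import[url, "XML"];

(* each rev element carries its values as a list of attribute rules *)
rawdata = Cases[xml, XMLElement["rev", w_, _] :> w, Infinity];

(* one {user, timestamp} pair per revision *)
data = {"user", "timestamp"} /. rawdata;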

 

[Figure 1]

[Figure 2]

This page is constantly revised, so we can probably assume the information on the “Mathematica” page is up to date.
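One way to see how often the page is revised is to count revisions per month and plot the counts over time. The sketch below is not necessarily how the original plots were made; the timestamps are hypothetical, and toMonth is a helper name introduced here.

```mathematica
(* hypothetical timestamps in the ISO 8601 form the API returns *)
stamps = {"2009-10-01T12:00:00Z", "2009-10-03T09:10:00Z",
   "2009-09-15T08:30:00Z"};

(* keep only "yyyy-mm" and turn it into a {year, month} date list *)
toMonth[s_String] := FromDigits /@ StringSplit[StringTake[s, 7], "-"];

(* count revisions per month; Sort puts the months in chronological order *)
counts = Sort[Tally[toMonth /@ stamps]];

(* each entry is {{year, month}, count}, a form DateListPlot accepts *)
DateListPlot[counts, Joined -> True, PlotRange -> All]
```

With the data imported above, stamps would be data[[All, 2]].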

The information on the contributors is also interesting.

[Figure 3]
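A ranking of contributors by number of revisions can be sketched with Tally and SortBy. The user names below are made up and stand in for the first column of data; the cutoff of two is chosen only to suit the tiny sample.

```mathematica
(* made-up user names standing in for data[[All, 1]] *)
users = {"ExampleUserA", "ExampleUserB", "ExampleUserA",
   "ExampleUserC", "ExampleUserA", "ExampleUserB"};

(* {user, number of revisions}, most active first *)
ranked = SortBy[Tally[users], -Last[#] &];

(* keep at most the top two for this tiny sample *)
top = Take[ranked, Min[2, Length[ranked]]];

(* bar heights are the counts, labels are the user names *)
BarChart[Last /@ top, ChartLabels -> First /@ top]
```

For the real page, raising the cutoff to five would give the top five contributors discussed below.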

We can dig out more information on the contributors:

(* import pages edited by each user *)

userpages[usr_] :=
  Module[{url, uxml, udata, unicase},
   url = "http://en.wikipedia.org/w/api.php?action=query&list=\
usercontribs&uclimit=500&format=xml&ucuser=" <> usr;
   uxml = Import[url, "XML"];
   (* attribute rules of every contribution item *)
   udata = Cases[uxml, XMLElement["item", w_, _] :> w, Infinity];
   (* unique page titles, without talk and user pages *)
   unicase = DeleteCases[Union["title" /. udata],
     x_ /; (StringMatchQ[x, "User talk:" ~~ __] ||
        StringMatchQ[x, "Talk:" ~~ __] ||
        StringMatchQ[x, "User:" ~~ __])];
   (* one user -> page rule per title *)
   Map[usr -> # &, unicase]];
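Note that userpages appends the user name to the URL unchanged, so a name containing spaces or other reserved characters yields a malformed query. A minimal sketch of escaping the name first follows; encodeUser is a hypothetical helper that covers only a handful of characters, and "Example User" is a made-up name.

```mathematica
(* hypothetical helper: percent-encode a few characters that are
   unsafe inside a query value; StringReplace makes a single pass,
   so the "%" produced by one rule is not encoded again *)
encodeUser[usr_String] := StringReplace[usr,
   {"%" -> "%25", " " -> "%20", "&" -> "%26", "+" -> "%2B",
    "#" -> "%23"}];

baseurl = "http://en.wikipedia.org/w/api.php?action=query" <>
   "&list=usercontribs&uclimit=500&format=xml&ucuser=";

(* "Example User" is used only to show the effect *)
queryurl = baseurl <> encodeUser["Example User"]
```

Inside userpages, replacing `<> usr` with `<> encodeUser[usr]` would apply the same escaping. Mathematica 10 and later also provide a built-in URLEncode, which should cover the general case.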

 

[Figure 4]

The pages edited in common by these top five contributors:

[Figure 5]

Judging from the pages they have edited, they have worked on several topics closely related to Mathematica. This looks good; we may say they probably know what they are doing.
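Given the user -> page rules that userpages returns, the pages shared by several contributors can be found by tallying page titles. This is a sketch on invented rules, not the exact computation behind the figure; the user names and the threshold of two contributors are assumptions.

```mathematica
(* invented user -> page rules in the form userpages returns *)
edits = {"ExampleUserA" -> "Mathematica",
   "ExampleUserA" -> "Wolfram Alpha",
   "ExampleUserB" -> "Mathematica",
   "ExampleUserB" -> "Stephen Wolfram",
   "ExampleUserC" -> "Wolfram Alpha",
   "ExampleUserC" -> "Mathematica"};

(* Union drops duplicate rules, so each user counts once per page *)
pages = Union[edits][[All, 2]];

(* keep pages edited by at least two different contributors *)
commonPages = Sort[First /@ Select[Tally[pages], Last[#] >= 2 &]];

(* restrict the rules to those pages and draw them as a graph *)
sharedEdits = Select[edits, MemberQ[commonPages, Last[#]] &];
GraphPlot[sharedEdits]
```

For the real data, edits would be Flatten[userpages /@ topusers], where topusers is a hypothetical list holding the names of the top contributors.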

Update:

Download Wikipedia Notebook for the details.

8 comments:

Travis said...

Could you explain a bit about how to make the graphs / plots?
ListPlot[data] doesn't do anything for me :-(

Anonymous said...

The notebook is available for download now.

Travis said...

Thanks mate - you're a champ! I didn't know about Tally or DateList...

Unknown said...

I just wanted to say I really enjoy your blog. Lots of useful stuff here, and I really like how you explain things point-by-point. (Not to mention posting notebooks!) Keep it up!

Do you take requests?

Anonymous said...

There is one bug...

Consider the user has got spaces in the name ;)

Kenan Sulayman said...

This fixes the code:
@ In[122]
=====================
userurl[usr_] :=
  Module[{url, uxml, udata, unicase},
   url = "http://en.wikipedia.org/w/api.php?action=query&list=\
usercontribs&uclimit=500&format=xml&ucuser=" <>
     StringReplace[usr, " " -> "%20"];
   uxml = Import[url, "XML"];
   udata = Cases[uxml, XMLElement["item", w_, _] :> w, Infinity];
   unicase =
    DeleteCases[Union["title" /. udata],
     x_ /; (StringMatchQ[x, "User talk:" ~~ __] ||
        StringMatchQ[x, "Talk:" ~~ __] ||
        StringMatchQ[x, "User:" ~~ __])];
   Map[usr -> # &, unicase]];
=====================

Anonymous said...

thanks

Luigi Assom said...

First of all thanks a lot for this very useful blog!
I'd like to get in contact with you.
With a friend, PhD University of Mainz, we are developing a network model applied to worldwide food recipes, and would like to make it an application.
Looking at your posts it looks like with Mathematica we could already structure databases and integrate infos with APIs as you did.
We'd like to ask you for suggestions, or eventually to involve you in the project if you are interested!!
I hope to get in touch with you soon, please write me an email at luigi.assom at gmail dot com
I will post you back with more details and visions about!!
Many thanks, Luigi