Feb 26, 2014

Fetching data from HTML source

Parsing html to get the data we need can be very frustratingLucky, Mathematica has a powerful hmtl import function, you can import raw html data into several different formats. In my experiences, import html as "XMLObject" is usually the best way to go. 
Here is an example: OSCAR Nominees:
xml = Import["http://oscar.go.com/nominees", "XMLObject"];   
We are interested in the list of nomineed films

body = Cases[xml, XMLElement["div", {"class" -> "nominee-by-film"}, ___], Infinity];
Extract titles:
title = Cases[body, XMLElement["span", {"class" -> "title"}, value_] :> value, Infinity] 
Extract the number of nominees:
nominee =
   XMLElement["h1", {"class" -> "numberOfNominations"}, value_] :>
    StringCases[value, x : NumberString :> ToExpression[x]], Infinity] ;
Put these two together:
result = Sort[Transpose[{title, Flatten@nominee}], #1[[2]] > #2[[2]] &]
Let's draw a graph to show the top 10 of the most nomineed films:
oscar = Import["http://www.oscars.org/awards/academyawards/about/awards/images/side_oscar.jpg"];
BarChart[result[[1 ;; 10, 2]],
 ChartLabels -> Placed[Flatten@result[[1 ;; 10, 1]], After],
 BarOrigin -> Left, Background -> LightBlue, ChartElements -> {oscar, {1, 1}},
  Axes -> None, LabelStyle -> {Bold, Darker@Blue, 14}] 

For this particular example, you can also try to get the same information directly from WolframAlpha.

Related post: A discussion on Mathmeatica Stackexchange

Feb 19, 2014

Simple Guide on Geospatial Coordinates Transformation with Mathematica

A few questions on geospatial coordinates transformation have shown up in Mathematica.Stackexchange. Here is a very brief summary.

In US, you probably likely deal with two projection systems: State Plane Coordinates System and UTM.

1. State Plane Coordinates System
In the U.S. State Plane Coordinate System (SPCS), the transverse Mercator projection is used for states that are long in the north-south direction, a Lambert conformal conic projection is used for states that extend in the east-west direction, and the oblique Mercator projection is used for Alaska.

In GeoProjectionData, SPCS83IN01 and SPCS83IN02 represent Indiana Steate Plane east zone and west zone. SPCS83TX01 ~ SPCS83TX05 represent 5 zones from north to south in Texas. Tennessee has only one zone: SPCS83TN00. Here is an online interactive map on SPCS.

There are also SPCS27 series, which are based on NAD27 datum, however, it is quite rare to get the data in the old coordinate system.

One common mistakes is usually caused by the unit: meters vs feet. In Mathematica, the coordinate is calculated in meters, the data you get is probably in feet.

Related posts on stack exchange convert spcs to (lat, lon)convert (lat, lon) to spcs

2. UTM
Universal Transverse Mercator (UTM) coordinate system divides the Earth into sixty zones: UTMZone01 ~ UTMZone60.

In Mathematica 9, there is a problem with UTMZone data:
{"TransverseMercator", {"Centering" -> {0, -87},  "CentralScaleFactor" -> 1, "GridOrigin" -> {0, 0}, "ReferenceModel" -> "WGS84"}}
The scale factor: 0.9996 and the grid origin: {500000,0} shall be specified for coordinate transformation: 
 GeoPosition[{39.162147, -86.529045}, "WGS84"], {"UTMZone33",
  "CentralScaleFactor" -> 0.9996, "GridOrigin" -> {500000, 0}}]
Related posts on stack exchange convert between (lat, lon) and UTM

This problem is fixed in next version of Mathematica.