Parsing HTML to get the data we need can be very frustrating. Luckily, Mathematica has a powerful HTML import function that can bring raw HTML in as several different formats. In my experience, importing HTML as "XMLObject" is usually the best way to go.
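To see why the "XMLObject" form is convenient, here is a minimal offline sketch. The HTML string and its class names are made up for illustration and are not taken from any real page. The idea is that the import gives a tree of XMLElement expressions, so ordinary pattern matching with Cases can pull out the pieces we want.

```wolfram
(* Hypothetical HTML snippet; the tag and class names are assumptions for illustration *)
html = "<html><body><div class=\"film\"><span class=\"title\">Film A</span></div><div class=\"film\"><span class=\"title\">Film B</span></div></body></html>";

(* Parse the string as HTML and return it as symbolic XML *)
xml = ImportString[html, {"HTML", "XMLObject"}];

(* Collect the text inside every <span class="title">, at any depth of the tree *)
titles = Cases[xml, XMLElement["span", {"class" -> "title"}, {t_String}] :> t, Infinity]
```

The pattern has the same shape as the ones used on the real page: tag name, attribute list, then the element's contents.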
Here is an example: OSCAR Nominees:
xml = Import["http://oscar.go.com/nominees", "XMLObject"];
We are interested in the list of nominated films:
body = Cases[xml, XMLElement["div", {"class" -> "nominee-by-film"}, ___], Infinity];
Extract titles:
title = Cases[body, XMLElement["span", {"class" -> "title"}, value_] :> value, Infinity]
Extract the number of nominations:
nominee = Cases[body,
XMLElement["h1", {"class" -> "numberOfNominations"}, value_] :>
StringCases[value, x : NumberString :> ToExpression[x]], Infinity];
Put these two together:
result = Sort[Transpose[{title, Flatten@nominee}], #1[[2]] > #2[[2]] &]
Let's draw a graph to show the top 10 most nominated films:
oscar = Import["http://www.oscars.org/awards/academyawards/about/awards/images/side_oscar.jpg"];
BarChart[result[[1 ;; 10, 2]],
ChartLabels -> Placed[Flatten@result[[1 ;; 10, 1]], After],
BarOrigin -> Left, Background -> LightBlue, ChartElements -> {oscar, {1, 1}},
Axes -> None, LabelStyle -> {Bold, Darker@Blue, 14}]
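If the page is unavailable, the pair-and-sort step that builds result can still be tried on made-up data. The sketch below is only an illustration: the titles and counts are invented, and it uses SortBy with a negated key instead of Sort with a custom ordering function, which is an alternative I am swapping in here, not what the code above does.

```wolfram
(* Invented data, nested the same way Cases returns it above *)
titles = {{"Film A"}, {"Film B"}, {"Film C"}};
counts = {{3}, {10}, {7}};

(* Pair each title with its count *)
pairs = Transpose[{Flatten@titles, Flatten@counts}];

(* Sort by count, largest first *)
ranked = SortBy[pairs, -Last[#] &]
(* {{"Film B", 10}, {"Film C", 7}, {"Film A", 3}} *)
```

Flattening the titles before pairing also means the chart labels would not need a separate Flatten later.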
For this particular example, you can also try to get the same information directly from WolframAlpha.
Related post: A discussion on Mathematica Stack Exchange