I discovered the American Presidency Project's collection of historical party platforms over the holiday weekend, and couldn't resist diving into some simple text analysis and visualization. For example, I could see how word usage by the Democratic and Republican parties changes over time:

Let's see how we can start building this visualization. Given a specific party platform page ID at presidency.ucsb.edu, import the page and pull out the main platform text:

partyPlatformImporter[pageID_] := Block[{raw, text},
  raw = Import["http://www.presidency.ucsb.edu/ws/index.php?pid=" <> pageID, "Source"];
  text = ImportString[
    StringCases[raw,
      "<span class=\"displaytext\">" ~~ x : ___ ~~
        "</span><hr noshade=\"noshade\" size=\"1\">" :> x][[1]],
    "HTML"]]
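
To see what the extraction step does without hitting the site, here is a minimal offline sketch. The sampleSource string is a made-up stand-in for a downloaded page (not real site content), and Shortest is my own addition so the match stops at the first closing marker; the importer above uses a plain ___ instead.

```wolfram
(* Hypothetical stand-in for the page source; not real site content *)
sampleSource = "<html><body><span class=\"displaytext\"><p>We pledge <b>peace</b>.</p></span><hr noshade=\"noshade\" size=\"1\"></body></html>";

(* Same delimiters as partyPlatformImporter; Shortest (an assumption,
   not in the original) stops at the first closing marker *)
inner = First@StringCases[sampleSource,
    "<span class=\"displaytext\">" ~~ Shortest[x___] ~~
      "</span><hr noshade=\"noshade\" size=\"1\">" :> x];

(* inner now holds the HTML fragment between the two markers *)
inner

(* Convert the fragment to plain text, as the importer does *)
ImportString[inner, "HTML"]
```

Note that partyPlatformImporter joins the page ID onto the URL with <>, so the ID has to be a string such as "29602" rather than an integer.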

Import the HTML source of the index page for party platforms:

In[106]:= platformsRaw = Import["http://www.presidency.ucsb.edu/platforms.php", "Source"];

Find all tables in the source:

In[107]:= rawCases = StringCases[platformsRaw, "<table" ~~ ___ ~~ "</table>", Overlaps -> True];
In[108]:= tables = StringSplit[rawCases[[9]], "</table>"];

The Democratic and Republican platforms happen to be in the first two tables in this list, so split each table by row and create an Association of years and page IDs for each party:

In[109]:= demRaw = StringSplit[tables[[1]], "<tr>"];
In[154]:= demPageIDs = "Democratic" -> <|
   StringCases[#,
      "<a href=\"http://www.presidency.ucsb.edu/ws/index.php?pid=" ~~
       url : RegularExpression["\\d+"] ~~ "\">" ~~
       text : RegularExpression["\\d+"] ~~ RegularExpression[" ?\\w*"] ~~
       "</a>" :> text -> url] & /@ demRaw|>
Out[154]= "Democratic" -> <|"2012" -> "101962", "2008" -> "78283", "2004" -> "29613", "2000" -> "29612", "1996" -> "29611", "1992" -> "29610", "1988" -> "29609", "1984" -> "29608", "1980" -> "29607", "1976" -> "29606", "1972" -> "29605", "1968" -> "29604", "1964" -> "29603", "1960" -> "29602", "1956" -> "29601", "1952" -> "29600", "1948" -> "29599", "1944" -> "29598", "1940" -> "29597", "1936" -> "29596", "1932" -> "29595", "1928" -> "29594", "1924" -> "29593", "1920" -> "29592", "1916" -> "29591", "1912" -> "29590", "1908" -> "29589", "1904" -> "29588", "1900" -> "29587", "1896" -> "29586", "1892" -> "29585", "1888" -> "29584", "1884" -> "29583", "1880" -> "29582", "1876" -> "29581", "1872" -> "29580", "1868" -> "29579", "1864" -> "29578", "1860" -> "29577", "1856" -> "29576", "1852" -> "29575", "1848" -> "29574", "1844" -> "29573", "1840" -> "29572"|>
In[111]:= repRaw = StringSplit[tables[[2]], "<tr>"];
In[113]:= repPageIDs = "Republican" -> <|
   StringCases[#,
      "<a href=\"http://www.presidency.ucsb.edu/ws/index.php?pid=" ~~
       url : RegularExpression["\\d+"] ~~ "\">" ~~
       text : RegularExpression["\\d+"] ~~ RegularExpression[" ?\\w*"] ~~
       "</a>" :> text -> url] & /@ repRaw|>
Out[113]= "Republican" -> <|"2012" -> "101961", "2008" -> "78545", "2004" -> "25850", "2000" -> "25849", "1996" -> "25848", "1992" -> "25847", "1988" -> "25846", "1984" -> "25845", "1980" -> "25844", "1976" -> "25843", "1972" -> "25842", "1968" -> "25841", "1964" -> "25840", "1960" -> "25839", "1956" -> "25838", "1952" -> "25837", "1948" -> "25836", "1944" -> "25835", "1940" -> "29640", "1936" -> "29639", "1932" -> "29638", "1928" -> "29637", "1924" -> "29636", "1920" -> "29635", "1916" -> "29634", "1912" -> "29633", "1908" -> "29632", "1904" -> "29631", "1900" -> "29630", "1896" -> "29629", "1892" -> "29628", "1888" -> "29627", "1884" -> "29626", "1880" -> "29625", "1876" -> "29624", "1872" -> "29623", "1868" -> "29622", "1864" -> "29621", "1860" -> "29620", "1856" -> "29619"|>

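The link-matching pattern is easier to follow on a single row. Here is a small offline sketch: sampleRow is a hypothetical row written in the shape the pattern expects (the year and page ID are borrowed from the Democratic output above), and the second row has no link, to show that rows without a match simply drop out.

```wolfram
(* Hypothetical table rows; only the first contains a platform link *)
sampleRow = "<td><a href=\"http://www.presidency.ucsb.edu/ws/index.php?pid=29602\">1960</a></td>";
emptyRow = "<td>no link here</td>";

(* The same pattern used for demPageIDs and repPageIDs: capture the
   digits after pid= as url and the leading digits of the link text as text *)
linkRule[row_String] := StringCases[row,
   "<a href=\"http://www.presidency.ucsb.edu/ws/index.php?pid=" ~~
     url : RegularExpression["\\d+"] ~~ "\">" ~~
     text : RegularExpression["\\d+"] ~~ RegularExpression[" ?\\w*"] ~~
     "</a>" :> text -> url]

linkRule[sampleRow]
(* {"1960" -> "29602"} *)

(* Mapping over rows gives a list of lists; Association flattens it *)
Association[linkRule /@ {sampleRow, emptyRow}]
(* <|"1960" -> "29602"|> *)
```
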
Then join them together:

In[353]:= IDset = <|demPageIDs, repPageIDs|>
Out[353]= <|"Democratic" -> <|"2012" -> "101962", "2008" -> "78283", "2004" -> "29613", "2000" -> "29612", "1996" -> "29611", "1992" -> "29610", "1988" -> "29609", "1984" -> "29608", "1980" -> "29607", "1976" -> "29606", "1972" -> "29605", "1968" -> "29604", "1964" -> "29603", "1960" -> "29602", "1956" -> "29601", "1952" -> "29600", "1948" -> "29599", "1944" -> "29598", "1940" -> "29597", "1936" -> "29596", "1932" -> "29595", "1928" -> "29594", "1924" -> "29593", "1920" -> "29592", "1916" -> "29591", "1912" -> "29590", "1908" -> "29589", "1904" -> "29588", "1900" -> "29587", "1896" -> "29586", "1892" -> "29585", "1888" -> "29584", "1884" -> "29583", "1880" -> "29582", "1876" -> "29581", "1872" -> "29580", "1868" -> "29579", "1864" -> "29578", "1860" -> "29577", "1856" -> "29576", "1852" -> "29575", "1848" -> "29574", "1844" -> "29573", "1840" -> "29572"|>,
  "Republican" -> <|"2012" -> "101961", "2008" -> "78545", "2004" -> "25850", "2000" -> "25849", "1996" -> "25848", "1992" -> "25847", "1988" -> "25846", "1984" -> "25845", "1980" -> "25844", "1976" -> "25843", "1972" -> "25842", "1968" -> "25841", "1964" -> "25840", "1960" -> "25839", "1956" -> "25838", "1952" -> "25837", "1948" -> "25836", "1944" -> "25835", "1940" -> "29640", "1936" -> "29639", "1932" -> "29638", "1928" -> "29637", "1924" -> "29636", "1920" -> "29635", "1916" -> "29634", "1912" -> "29633", "1908" -> "29632", "1904" -> "29631", "1900" -> "29630", "1896" -> "29629", "1892" -> "29628", "1888" -> "29627", "1884" -> "29626", "1880" -> "29625", "1876" -> "29624", "1872" -> "29623", "1868" -> "29622", "1864" -> "29621", "1860" -> "29620", "1856" -> "29619"|>|>

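Make a list of common words we want to exclude. Longer forms are listed first so that a word like "Americans" is removed whole rather than leaving "ns" behind:

In[356]:= commonwords = "Americans" | "American" | "America" | "Democratic" | "Republican" | "Administration" | "Federal" | "Government" | "government" | "programs";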

And generate a WordCloud for each party in a specific year (in this case, 1960):

Row[
 WordCloud[
    DeleteStopwords@
     StringDelete[partyPlatformImporter[IDset[#party]["1960"]], commonwords],
    IgnoreCase -> True,
    ColorFunction -> ColorData[#color]] & /@ {
   <|"party" -> "Democratic", "color" -> "AtlanticColors"|>,
   <|"party" -> "Republican", "color" -> "ValentineTones"|>}]
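
The #party and #color slots in that expression pull named keys out of each Association that the pure function is mapped over. A tiny sketch of just that mechanism, with no network access or plotting involved (specs and labels are names I made up for illustration):

```wolfram
(* The same two specs passed to the WordCloud function above *)
specs = {
   <|"party" -> "Democratic", "color" -> "AtlanticColors"|>,
   <|"party" -> "Republican", "color" -> "ValentineTones"|>};

(* #party and #color look up the keys "party" and "color"
   in whichever Association is the current argument *)
labels = StringJoin[#party, ": ", #color] & /@ specs
(* {"Democratic: AtlanticColors", "Republican: ValentineTones"} *)
```

Keeping the party name and color scheme together in one Association means adding a third party would only need one more entry in the list.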

I've attached a notebook with some more exploration. For instance, I wanted to see which words skew most strongly towards each party in a given year, so I got word counts by party for a single year. Then I selected words that appear in both platforms, where the difference in word counts is at least 10 and at least one party uses the word more than 20 times, merged those Associations, and plotted the result:
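
The notebook has the actual code; what follows is only a rough sketch of how that selection could be written, not necessarily how the notebook does it. The wordCounts helper and the names dem, rep, both, skewed and diffs are mine, and the counts are made-up toy numbers standing in for two real platforms.

```wolfram
(* Hypothetical helper: lowercase word counts with stopwords removed.
   With real data: dem = wordCounts[partyPlatformImporter[...]] *)
wordCounts[text_String] :=
  Counts[DeleteStopwords[TextWords[ToLowerCase[text]]]]

(* Toy counts; these numbers are invented for illustration *)
dem = <|"labor" -> 35, "peace" -> 22, "tax" -> 8, "farm" -> 12|>;
rep = <|"labor" -> 12, "peace" -> 25, "tax" -> 30, "trade" -> 40|>;

(* Keep only words present in both, pairing counts as {dem, rep} *)
both = Merge[KeyIntersection[{dem, rep}], Identity];

(* Difference of at least 10, and one party above 20 uses *)
skewed = Select[both, Abs[#[[1]] - #[[2]]] >= 10 && Max[#] > 20 &];

(* Signed difference: positive leans Democratic, negative Republican *)
diffs = Sort[Map[#[[1]] - #[[2]] &, skewed]];

BarChart[diffs, ChartLabels -> Keys[diffs], BarOrigin -> Left]
```

With these toy numbers, "peace" is dropped because the counts differ by only 3, while "farm" and "trade" are dropped because each appears in only one platform.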

See the attached notebook for further details including how to make the top figure in this post. Enjoy...