Yes, there is baseball in Austria. I play for a team in the far east of the country, and with a baseball team comes a homepage. A friend of mine used to spend a significant amount of time updating our website after our weekend games. That always bothered me, because the information is already out there, so why should this be a manual job? The up-to-date standings, results and schedules are available on baseballaustria.com, and everything we need for our slowpitch softball team can be found on abbqs.at. So I set out to automate the updating of our own website, that of the Crazy Geese Rohrbach.

(Note: if you are interested in all of the scripts and the code with full documentation, I uploaded a repository to GitHub. You can find it here: Update WordPress from R)

Post to WordPress directly from R

I don’t know a lot about web development, so I used WordPress to create our website, which now comes in handy. There is a great package for R called knitr, which is built for automating dynamic reports. Here I needed just one function from it: knit2wp makes it really easy to post R Markdown files to a WordPress blog or website. (knit2wp relies on the RWordPress package, so that needs to be installed as well.) So if you have your R Markdown file ready and it produces the (HTML) output you want, you need only two simple lines.

First you set your WordPress credentials as global options. This is as easy as this:

options(WordPressLogin = c(<yourusername> = '<yourpassword>'), WordPressURL = 'https://www.<yourwordpressblog.com>/xmlrpc.php')

After that we can already post the output of our R Markdown file with this line:

knit2wp("<your_r_markdown_file.Rmd>", title = '<your title>', publish = TRUE, action = "editPost", postid = <yourpostid>)

All done! This function takes your R Markdown file as the first argument (I provide the full path to the file, just to be sure). The "title" argument sets the title that the post or page should get. "publish = TRUE" makes sure that the post is published immediately; set it to FALSE if you want to edit it further in the WordPress interface first. action = "editPost" ensures that the existing post or page is edited instead of a new one being created, which is why you also have to pass the post id with the "postid" argument. You can find the post id when you edit a post: it shows up in the URL as something like "post=123", so 123 would be the post id.

Although the "action" argument has the value "editPost", this doesn’t mean that you have to be updating a blog post. It can also be a page, which confused me when I first tried it. Throughout my code I always edit pages, not posts.
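
If you have more than one page to update, the same call can be wrapped in a small loop. This is only a minimal sketch: the file names, titles and page ids are made up, and update_pages and dry_run are hypothetical helpers, not part of knitr. The post_fun argument defaults to knitr::knit2wp, but you can pass a stand-in function to test the loop without touching your website.

```r
# Hypothetical file names, titles and page ids; replace them with your own.
pages <- data.frame(
  file   = c("schedule.Rmd", "standings.Rmd"),
  title  = c("Schedule", "Standings"),
  postid = c(101, 102),
  stringsAsFactors = FALSE
)

# Update every page listed in `pages`. `post_fun` defaults to knitr::knit2wp;
# passing another function allows a dry run that never contacts WordPress.
update_pages <- function(pages, post_fun = knitr::knit2wp) {
  for (i in seq_len(nrow(pages))) {
    post_fun(
      pages$file[i],
      title   = pages$title[i],
      publish = TRUE,
      action  = "editPost",
      postid  = pages$postid[i]
    )
  }
  invisible(nrow(pages))
}

# Dry run: only report what would be posted.
dry_run <- function(input, title, ...) {
  message("Would update '", title, "' from ", input)
}
update_pages(pages, post_fun = dry_run)
```

For the real thing you would set the WordPress options first and then call update_pages(pages) without the second argument.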

Nice functions to work with data from the web

That’s easy enough. But in the R Markdown files I also use some cool functions I would like to share. I will focus on the schedule of the baseball league here. It is basically the same procedure with the standings and the softball league. The name of the baseball league is RLO. So I will name my variables accordingly.

To download the web page (or rather, to get the response from the web server) with the standings and schedule, I use the GET function from the httr package. After that I use the xml2 and rvest packages to extract the content I want. read_html turns the response into an XML document that html_nodes can search. With html_nodes I select all <table> elements (the "table" argument is a CSS selector that matches the tag name, not a class). There are two tables on the web page, and the schedule is the second one, so I add [2] to get exactly what I need. The object schedule now has the class xml_nodeset, which is still not very convenient to work with in R. So we use the really nice function html_table to turn it into a list of data frames. I sometimes get seriously confused by lists, but the data.table package has a really nice function called rbindlist, which turns a list of tables into a single data.table (and a data.table is always also a data.frame). And there it is: a nice data.table imported from a web page. This is all done with these 5 short lines:

rlo <- GET("http://baseballaustria.com/regionalliga-ost/")
rlo <- read_html(rlo)
schedule <- html_nodes(rlo, "table")[2]
schedule <- html_table(schedule)
schedule <- rbindlist(schedule)
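
Relying on [2] breaks silently if the page layout ever changes, so here is a slightly more defensive sketch of the same idea. The helper extract_table and the sample HTML are made up for illustration (they are not part of rvest or data.table); the helper stops with a clear message when the table you ask for isn't there. I use package prefixes instead of library() calls so that the function works without attaching anything.

```r
# Hypothetical helper: return the index-th <table> of an HTML document as a
# data.table, or stop with a clear message if the page has too few tables.
extract_table <- function(html, index = 1) {
  doc <- xml2::read_html(html)
  tables <- rvest::html_nodes(doc, "table")
  if (length(tables) < index) {
    stop("Expected at least ", index, " table(s), found ", length(tables))
  }
  data.table::as.data.table(rvest::html_table(tables[[index]]))
}

# Made-up page with two tables, mimicking standings followed by schedule.
sample_html <- paste0(
  "<html><body>",
  "<table><tr><th>Team</th><th>W</th></tr>",
  "<tr><td>Crazy Geese</td><td>3</td></tr></table>",
  "<table><tr><th>Heim</th><th>Gast</th></tr>",
  "<tr><td>Crazy Geese</td><td>Blue Bats</td></tr></table>",
  "</body></html>"
)

# schedule <- extract_table(sample_html, index = 2)
```

The commented line at the end picks the second table of the sample page; asking for a third table raises an error instead of returning an empty result.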

After that I use some common data wrangling functions from the data.table and dplyr packages. If you don’t know these packages I seriously suggest looking into them. There are many great tutorials out there for them.

mgsub: Multiple gsub = awesome

I also want to mention the mgsub package. gsub in itself is already awesome, but I often need to replace more than one string at a time. In our example of the baseball schedule I wanted to shorten some team names. Here I only replace two strings, which makes this seem like overkill, but let’s say we want to replace 5 strings at a time. Then I personally think this is easier to keep track of than 5 separate lines with gsub. (I’m happy about other suggestions!) Here it is:

schedule <- as.data.table(lapply(schedule, function(x) {
  mgsub(x,
        c("Rohrbach Crazy Geese", "Schwechat Blue Bats"),
        c("Crazy Geese", "Blue Bats"))
}))

At the heart of it is the mgsub function. It takes a character vector, the patterns to look for and their replacements. Notice that patterns and replacements are matched by position, so the first string in the pattern vector is replaced by the first string in the replacement vector. Because mgsub only works on vectors, I wrapped the whole thing in an lapply call so that it runs on every column. Finally I transform the result back into a data.table. (Because frankly I love data.table)
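
For comparison, here is a minimal base R sketch of the same idea that keeps every pattern next to its replacement in a named vector. It is not how mgsub works: the replacements are applied one after another with gsub, so a later pattern could match text produced by an earlier replacement. The helpers replace_all and shorten_names and the sample data (including the column names Heim and Runs) are made up, and shorten_names expects a plain data.frame, not a data.table.

```r
# Pattern = replacement pairs, kept side by side in a named vector.
replacements <- c(
  "Rohrbach Crazy Geese" = "Crazy Geese",
  "Schwechat Blue Bats"  = "Blue Bats"
)

# Apply each replacement in turn; fixed = TRUE treats patterns as plain text.
replace_all <- function(x, replacements) {
  for (pattern in names(replacements)) {
    x <- gsub(pattern, replacements[[pattern]], x, fixed = TRUE)
  }
  x
}

# Only touch character columns so numeric columns keep their type.
shorten_names <- function(df, replacements) {
  is_text <- vapply(df, is.character, logical(1))
  df[is_text] <- lapply(df[is_text], replace_all, replacements = replacements)
  df
}

# Made-up sample data in a plain data.frame.
games <- data.frame(
  Heim = c("Rohrbach Crazy Geese", "Schwechat Blue Bats"),
  Runs = c(5L, 3L),
  stringsAsFactors = FALSE
)
games <- shorten_names(games, replacements)
```

Adding a sixth or seventh team is then just one more line in the named vector.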

Pimp your tables in R Markdown with kable and kableExtra

With kable you can very easily create HTML tables in R (Markdown). kableExtra adds to this so you can style your HTML table. Here is the approach I went with:

blue_rows <- which(schedule$Spielort %in% "Rohrbach")
schedule_format <- kable(schedule, align = "c") %>%
  kable_styling(bootstrap_options = c("responsive")) %>%
  row_spec(blue_rows, color = "#01023C")

I wanted to do three simple things here: produce a responsive HTML table, center all columns, and color the text of every game played on our home field in blue. To accomplish this I first created a vector called blue_rows. I used the which function to find the rows of the games that are played at home (our home field is in Rohrbach, and Spielort is German for venue). After that I use the kable function to turn the table into an HTML table, and with the align argument I center all columns with "c". Here you could also pass a vector to align individual columns. Then I use the pipe (%>%), which comes from the magrittr package and is re-exported by dplyr, to pass the table on to kable_styling, which makes it responsive. Finally, I use row_spec. It takes the vector I created before and sets the text color for these rows.

In the R Markdown file I then close this chunk and set it to "include = FALSE", so the code doesn’t show up in the output. Then I write a new chunk that simply prints the table schedule_format. And here is how this looks: Schedule of the Rohrbach Crazy Geese
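
To illustrate the point about aligning individual columns, here is a minimal sketch with one alignment letter per column. The sample data and the helper make_table are made up (only the column name Spielort comes from my real table), and I call knitr::kable directly without any kableExtra styling.

```r
# Made-up sample schedule; Heim and Gast are hypothetical column names.
schedule_sample <- data.frame(
  Heim     = c("Crazy Geese", "Blue Bats"),
  Gast     = c("Blue Bats", "Crazy Geese"),
  Spielort = c("Rohrbach", "Schwechat"),
  stringsAsFactors = FALSE
)

# One letter per column: "l" = left, "c" = centered, "r" = right.
make_table <- function(df, align = c("l", "c", "r")) {
  knitr::kable(df, format = "html", align = align)
}

# schedule_table <- make_table(schedule_sample)
```

The result can be piped into kable_styling and row_spec in the same way as the fully centered table.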

I hope this was helpful. As I mentioned earlier, you can find all the code in my GitHub repository. If you have feedback or questions, contact me anytime: bernd [at] berndschmidl.com

Cheers

Bernd

Update

I got even lazier. With the scripts above I still had to remember to run the script that calls the R Markdown scripts, and I don’t trust myself to remember that. So I wrote a script that is executed every time I start my computer. I’m using Windows at the moment, so it is a little batch file that I placed in the startup folder. Simply take this code, put it in a .bat file and add it to the startup folder. You can find your startup folder by hitting Win+R and typing shell:startup. Here is the code:

set "your_path=<your_actual_path_to_R>"
set PATH=%PATH%;%your_path%
set PATH
R CMD BATCH <path_to_script>
del ".RData"

For the first line you need to find out which folder your R.exe is in. Then replace <your_actual_path_to_R> with the path where R is located on your machine. The third line, set PATH, just prints the resulting PATH so you can check it. Replace <path_to_script> with the path to the R script that posts to the WordPress website. The .RData file gets created every time the script runs, so I delete it at the end. That way it doesn’t stay in the startup folder.
