— In the News —

CERN makes public first data of LHC experiments
CERN recently launched its Open Data Portal, which makes data from real collision events produced by LHC experiments available to the public for the first time. With good documentation and an interactive web-based visualization tool, this high-profile dataset is now easy to explore.

Google, Stanford build hybrid neural networks that can explain photos
Two groups of scientists, working independently, have created artificial intelligence software capable of recognizing and describing the content of photographs and videos with far greater accuracy than ever before. Here's a high-level view of how it works.

How to publish better science through better data
Scientific Data and Nature recently hosted an event that explored how different stakeholders can collaborate with researchers to publish better science through better data management. Here's a great write-up of the event, with links to videos coming soon.

Building a complete Tweet index
"Today, we are pleased to announce that Twitter now indexes every public Tweet since 2006." That's a lot of data. This post by Twitter's engineering team describes how they built the service, which indexes half a trillion documents and serves queries with an average latency of under 100ms. If you like to build things, this is a good read with links to details.
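For readers wondering what "indexing" documents means in the Tweet index item above, here is a toy sketch of an inverted index, the general data structure behind most text search. It is an illustration only and makes no claim about Twitter's actual design; the function names (`tokenize`, `build_index`, `search`) and the sample documents are made up for this example.

```python
from collections import defaultdict

PUNCTUATION = ".,!?\"'"

def tokenize(text):
    """Lowercase the text, split on whitespace, and strip basic punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return sorted ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get avoids inserting empty entries for unknown terms
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))

# Hypothetical sample documents, keyed by id
docs = {
    1: "CERN opens LHC data",
    2: "Twitter indexes every public Tweet",
    3: "Open data from CERN",
}
index = build_index(docs)
print(search(index, "cern data"))  # [1, 3]
```

A lookup touches only the posting sets for the query terms rather than scanning every document, which is the basic reason an index can answer queries quickly. A production system adds much more on top, such as sharding, ranking, and compression.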

— Tools and Techniques —

D3 Deconstructor
D3 Deconstructor is a Google Chrome extension for extracting data from D3.js visualizations. Along with the data, D3 Deconstructor also extracts the visual attributes for each element in a D3 visualization, such as position, width, height, and color. From there, the data can be saved as JSON or CSV.

Synaptic - The JavaScript neural network library
Synaptic is a JavaScript neural net library. It includes a few built-in architectures, such as multilayer perceptrons and multilayer long short-term memory (LSTM) networks, and a trainer capable of training any given network. The project provides demos and a Neural Net 101 for people just getting started with neural nets. Code and documentation are available at https://github.com/cazala/synaptic.

Personalized Recommendations at Etsy
The Etsy engineering team maintains the Code As Craft blog, where they write about "Making a living with a craft we love: software." Their latest post is a detailed look into how they use data to create a recommendation system for ecommerce.
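For readers new to the multilayer perceptrons mentioned in the Synaptic item above, the following is a minimal sketch of the idea in plain Python. It does not use Synaptic or reproduce its API. The weights are hand-picked (not trained) so that a two-layer network computes XOR, and all names (`step`, `layer`, `xor_net`) are hypothetical.

```python
def step(x):
    """Threshold activation: fire (1) when the weighted input is positive."""
    return 1 if x > 0 else 0

def layer(inputs, weights, biases):
    """Compute one fully connected layer: one output per (weight row, bias) pair."""
    return [
        step(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hand-picked weights (an assumption for illustration, not learned values).
# Hidden unit 0 behaves like OR, hidden unit 1 behaves like AND.
HIDDEN_W = [[1, 1], [1, 1]]
HIDDEN_B = [-0.5, -1.5]
# The output fires when OR is true and AND is false, which is XOR.
OUT_W = [[1, -1]]
OUT_B = [-0.5]

def xor_net(a, b):
    """Run the inputs through the hidden layer, then the output layer."""
    hidden = layer([a, b], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUT_W, OUT_B)[0]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

A single layer cannot represent XOR, which is why the hidden layer is needed. Libraries like Synaptic typically learn such weights from examples with a trainer instead of having them set by hand.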

— Resources —

Data.TheThirdPole — Third Pole
A very well-organized resource for finding datasets related to water in Asia. While that, in itself, may only be interesting to a niche group of data users, this site is worth exploring for its beautiful organizational scheme, which makes data discovery and access super easy.