What is the Web

The World Wide Web (commonly abbreviated as "the Web") is a very large set of interlinked hypertext documents accessed via the Internet.

The World Wide Web enabled the spread of information over the Internet through an easy-to-use and flexible format. It thus played an important role in popularizing use of the Internet. The Internet consists of a worldwide collection of computers and sub-networks exchanging data using wires, cables and radio links, whereas the World Wide Web is a huge set of documents, images and other 'resources' linked by an abstract 'web' of hypertext links and URLs.

Fig.1-Versions of Web 1.0, Web 2.0 & Web 3.0

Web 1.0: In Web 1.0 it is the webmaster's responsibility to keep the website updated and to provide useful, informative content to the end user.

Examples of Web 1.0 are DoubleClick, Kodak Express, personal websites, etc.

Web 2.0 is about more than searching for information. Users search for information and for an experience; they want to share their opinions and thoughts on the topics that interest them. Web 2.0 contains more organized and categorized content, and signifies the phenomenal change that gradually turned the ‘web-as-information-source’ of Web 1.0 into the ‘web-as-participation-platform’ of Web 2.0. Flickr, Bloglines, Technorati, etc. are some Web 2.0 applications.

Web 3.0 is defined as the creation of high-quality content and services produced by gifted individuals using Web 2.0 technology as an enabling platform. Web 2.0 services like Digg and YouTube evolve into Web 3.0 services by adding a layer of individual excellence and focus.

Web 2.0 - A Trend That Emerged from Web Services

Mashups and Web 2.0 are trends that emerged from web services.

Web Services

Web services describe a standardized way of integrating Web-based applications using the XML, SOAP, WSDL and UDDI open standards over the Internet.

XML is used to tag the data, SOAP is used to transfer the data, WSDL is used for describing the services available and UDDI is used for listing what services are available. Used primarily as a means for businesses to communicate with each other and with clients, Web services allow organizations to communicate data without intimate knowledge of each other's IT systems behind the firewall.

XML

Extensible Markup Language (XML) is a specification developed by the W3C. XML is a pared-down version of SGML, designed especially for Web documents. It allows designers to create their own customized tags, enabling the definition, transmission, validation, and interpretation of data between applications and between organizations.
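As a sketch of how custom tags carry data between applications, the snippet below builds and parses a small XML document with Python's standard-library `xml.etree` module; the `order` document and its tag names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical purchase-order document using application-defined tags;
# the tag names here are made up for illustration.
doc = """<order id="1042">
  <customer>Acme Corp</customer>
  <item sku="BK-101" qty="2">XML Primer</item>
</order>"""

root = ET.fromstring(doc)              # parse the markup into an element tree
print(root.tag, root.get("id"))        # custom tags are ordinary data to the parser
print(root.find("customer").text)
for item in root.iter("item"):
    print(item.get("sku"), int(item.get("qty")), item.text)
```

Because the tags are self-describing, a receiving application that knows the agreed vocabulary can validate and interpret the data without sharing any code with the sender.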

SOAP

Simple Object Access Protocol (SOAP) is a lightweight XML-based messaging protocol used to encode the information in Web service request and response messages before sending them over a network. SOAP messages are independent of any operating system or protocol and may be transported using a variety of Internet protocols, including SMTP, MIME, and HTTP.
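A minimal sketch of a SOAP 1.1 request can be put together with the Python standard library alone; the `GetQuote` operation and the service namespace below are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def build_envelope(operation, params, ns="http://example.com/stockquote"):
    """Wrap an operation call in a SOAP 1.1 Envelope/Body (names are illustrative)."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{ns}}}{name}").text = str(value)
    return ET.tostring(env, encoding="unicode")

# This string would normally travel over HTTP (or SMTP) to the service.
message = build_envelope("GetQuote", {"symbol": "IBM"})
print(message)
```

The point of the envelope is exactly the protocol independence described above: the same XML message can be carried by whichever transport the two parties agree on.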

WSDL

Web Services Description Language (WSDL) is an XML-formatted language used to describe a Web service's capabilities as collections of communication endpoints capable of exchanging messages. WSDL is an integral part of UDDI, an XML-based worldwide business registry, and is the language UDDI uses. WSDL was developed jointly by Microsoft and IBM.
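To make the "collections of communication endpoints" idea concrete, the sketch below parses a stripped-down, illustrative WSDL fragment and lists the endpoints it declares; a real description also carries types, messages, operations and bindings.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
SOAP_NS = "http://schemas.xmlsoap.org/wsdl/soap/"

# A minimal, invented WSDL fragment: one service with one SOAP endpoint.
wsdl = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
                       xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/">
  <service name="StockQuoteService">
    <port name="StockQuotePort">
      <soap:address location="http://example.com/stockquote"/>
    </port>
  </service>
</definitions>"""

endpoints = []
root = ET.fromstring(wsdl)
for service in root.iter(f"{{{WSDL_NS}}}service"):
    for port in service.iter(f"{{{WSDL_NS}}}port"):
        addr = port.find(f"{{{SOAP_NS}}}address")
        endpoints.append((service.get("name"), port.get("name"), addr.get("location")))
print(endpoints)
```

A client toolkit performs essentially this step when it reads a WSDL file and generates stubs for calling the service.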

UDDI

Universal Description, Discovery and Integration (UDDI) is a Web-based distributed directory that enables businesses to list themselves on the Internet and discover each other, similar to a traditional phone book's yellow and white pages.

Mash-ups & Web 2.0

Mash-up

Mash-ups mix at least two different services from disparate, and even competing, Web sites.

A mash-up, for example, could overlay traffic data from one source on the Internet over maps from Yahoo, Microsoft, Google or any content provider. This capability to mix and match data and applications from multiple sources into one dynamic entity is considered by many to represent the promise of the Web service standard.
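The overlay idea can be sketched in a few lines of Python; the map points and traffic reports below are invented placeholders for two providers' feeds.

```python
# Two invented feeds: base map points from one provider and traffic
# incidents from another, joined on coordinates into one mashed-up view.
map_points = [
    {"name": "Main St & 1st Ave", "lat": 40.71, "lon": -74.00},
    {"name": "Main St & 5th Ave", "lat": 40.73, "lon": -73.99},
]
traffic_reports = {(40.71, -74.00): "heavy congestion"}

def overlay_traffic(points, incidents):
    """Annotate each map point with the other source's traffic data."""
    return [
        {**p, "traffic": incidents.get((p["lat"], p["lon"]), "clear")}
        for p in points
    ]

for point in overlay_traffic(map_points, traffic_reports):
    print(point["name"], "-", point["traffic"])
```

The mash-up owns neither data set; its value is entirely in the join.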

The term mash-up comes from the hip-hop music practice of mixing two or more songs.

With so many businesses and software companies building services on top of platforms, many expect to see the World Wide Web of today (called Web 1.0) transform into a full-fledged computing platform serving Web applications i.e., World Wide Web as a platform is Web 2.0.

Web 2.0 is a perceived second generation of web development and design that aims to facilitate communication, secure information sharing, interoperability, and collaboration on the World Wide Web.

Web 2.0

Web 2.0 includes Rich Internet Applications (RIAs), which let users share their experiences online. Service-Oriented Architecture (SOA) is also used in Web 2.0. Flickr, Bloglines, Technorati, etc. are some Web 2.0 applications that allow users to share their experiences, and that provide editing and drag-and-drop facilities.

RIA (Rich internet applications)

Ajax and Flash are the key technologies used to build rich internet applications.

Ajax (Asynchronous JavaScript and XML) is used mainly to improve overall application usability, whereas Flash is used for richer presentation and a better user experience.

With the help of these applications, users can drag and drop interesting and informative content, create links, edit and tag those links online, and share them at the same time.

Blogging is also one of the best examples of Web 2.0: users write about their interests, link to things they find interesting, and share via RSS (Really Simple Syndication) feeds.

Although most blogs were once updated manually, RIAs have since been developed to automate the maintenance of such sites, and the use of some sort of browser-based software is now a typical aspect of blogging.

RSS - Really Simple Syndication (also expanded as Rich Site Summary). RSS is an XML-based format for content distribution. Webmasters create an RSS file containing headlines and descriptions of specific information. While the majority of RSS feeds currently contain news headlines or breaking information, the long-term uses of RSS are broad.

RSS is a defined standard based on XML with the specific purpose of delivering updates to web-based content. Using this standard, webmasters provide headlines and fresh content in a succinct manner. Meanwhile, consumers use RSS readers and news aggregators to collect and monitor their favorite feeds in one centralized program or location. Content viewed in the RSS reader or news aggregator is known as an RSS feed.
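A minimal sketch of what an RSS reader does, using Python's standard-library XML parser on a made-up RSS 2.0 feed:

```python
import xml.etree.ElementTree as ET

# A minimal RSS 2.0 feed; the channel and items are invented.
feed = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

channel = ET.fromstring(feed).find("channel")
headlines = [(item.findtext("title"), item.findtext("link"))
             for item in channel.iter("item")]
print(channel.findtext("title"))
for title, link in headlines:
    print("-", title, link)
```

An aggregator simply repeats this fetch-and-parse loop for every subscribed feed and merges the resulting headline lists into one view.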

Web 3.0: Web Sites as Web Services

The Web has terabytes of information available to humans but hidden from computers, because it is difficult for machines to process. Web 3.0, which is likely to be a precursor of the real Semantic Web, is going to change this.



Fig.2-Web 3.0 will likely plug into your individual tastes and browsing habits.

Layers of Web 3.0

Web 3.0 can be divided into three (and a half) distinct layers.

API services form the foundation layer. These are the raw hosted services that have powered Web 2.0 and will become the engines of Web 3.0 — Google’s search and AdWords APIs, Amazon’s affiliate APIs, a seemingly infinite ocean of RSS feeds, a multitude of functional services, such as those included in the StrikeIron Web Services Marketplace, and many other examples. Some of the providers, like Google and Amazon, are important players. One of the most significant characteristics of this layer is that it is a commodity layer.

Aggregation services form the middle layer. These are the intermediaries that take some of the hassle out of locating all those raw API services by bundling them together in useful ways. Obvious examples today are the various RSS aggregators, and emerging web services marketplaces like the StrikeIron service. There will be some lucrative businesses operating in this layer, but it’s not where most of the big money will be made.

Application services form the top layer. This is where the biggest, most durable profits will be found. These will not be like the established application categories we are used to, such as CRM, ERP or office, but a new class of composite applications that bring together functionality from multiple services to help users achieve their objectives in a flexible, intuitive and self-evident way.

Serviced clients are the ‘and-a-half’ layer. There is a role for client-side logic in the Web 3.0 landscape, but users will expect it to be maintained and managed on their behalf, which is why these clients are called ’serviced’. Whether those clients are based on browser technology or on Windows technology is a moot point that we shall also be returning to. After all, everyone will want to know what role Microsoft might play in Web 3.0.

In 'Web 3.0', web sites will be transformed into web services and will expose their information to the world. The transformation will happen in one of two ways. Some web sites will follow the example of Amazon, Delicious and Flickr and will offer their information via a REST API. Others will try to keep their information proprietary, but it will be opened via mashups created using services like Dapper, Teqlo and Yahoo! Pipes.
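A toy sketch of the site-as-a-service idea: a REST-style GET handler that exposes a site's records as JSON. The resource path and the product data are invented for illustration.

```python
import json

# An in-memory stand-in for a site's catalogue; paths and fields are invented.
PRODUCTS = {"1": {"name": "Camera", "price": 199.0}}

def handle_get(path):
    """Map a REST-style GET path onto the site's data, returning (status, JSON body)."""
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "products" and parts[1] in PRODUCTS:
        return 200, json.dumps(PRODUCTS[parts[1]])
    return 404, json.dumps({"error": "not found"})

status, body = handle_get("/products/1")
print(status, body)
```

The same pages a browser renders as HTML are here addressable as structured records, which is precisely what makes them remixable by other software.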

The net effect will be that unstructured information will give way to structured information.

Unstructured to Structured - Web Scraping

Web Scraping is essentially reverse engineering of HTML pages. It can also be thought of as parsing out chunks of information from a page. Web pages are coded in HTML, which uses a tree-like structure to represent the information. The actual data is mingled with layout and rendering information and is not readily available to a computer. Scrapers are the programs that "know" how to get the data back from a given HTML page. They work by learning the details of the particular markup and figuring out where the actual data is.
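A minimal scraper in this spirit, written with Python's standard-library `html.parser`, assuming (for illustration) that the target page marks prices with `class="price"`:

```python
from html.parser import HTMLParser

class PriceScraper(HTMLParser):
    """Knows one page's markup: prices live in <span class="price"> elements."""
    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())
            self._in_price = False

# Layout markup mingled with the actual data, as on a real page.
page = """<html><body>
  <div class="product"><b>Camera</b> <span class="price">$199.00</span></div>
  <div class="product"><b>Tripod</b> <span class="price">$35.50</span></div>
</body></html>"""

scraper = PriceScraper()
scraper.feed(page)
print(scraper.prices)
```

Note how brittle this is: the scraper encodes the details of one particular markup, and a redesign of the page silently breaks it.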

Fig.3-Web Scraper

Dapper, Teqlo, Yahoo! Pipes - scraping technologies

Yahoo! Pipes is a new app from Yahoo! focused on remixing RSS feeds. Teqlo, which launched recently, focuses on letting people create mashups and widgets from web services and RSS. Before both of these, Dapper launched a generic scraping service for any web site. Dapper is an interesting technology that facilitates the scraping of web pages using a visual interface.

It works by letting the developer define a few sample pages and then helping her denote similar information using a marker. This looks simple, but behind the scenes Dapper uses a non-trivial tree-matching algorithm to accomplish this task. Once the user defines similar pieces of information on the page, Dapper allows the user to make it into a field. By repeating the process with other information on the page, the developer is able to effectively define a query that turns an unstructured page into a set of structured records.

The net effect - Web Sites become Web Services

Fig.4-The net effect of apps like Dapper and Teqlo

So bringing together open APIs (like the Amazon E-Commerce Service) and scraping/mashup technologies gives us a way to treat any web site as a web service that exposes its information. The information, or to be more exact the data, becomes open. In turn, this enables software to take advantage of this information collectively. With that, the Web truly becomes a database that can be queried and remixed.

Scraping technologies are actually fairly questionable. In a way, they can be perceived as stealing the information owned by a web site. The whole issue is complicated because it is unclear where copy/paste ends and scraping begins. It is okay for people to copy and save information from web pages, but it might not be legal to have software do this automatically. And scraping a page and then offering a service that leverages the information without crediting the original source is unlikely to be legal.

But it does not seem that scraping is going to stop. Just as the legal issues with Napster did not stop people from writing peer-to-peer sharing software, the more recent YouTube lawsuit is not likely to stop people from posting copyrighted videos. Information that seems to be free is perceived as being free. The opportunities that will come after the web has been turned into a database are just too exciting to pass up.

Web Sites as Web Services

There are several good reasons why Web Sites (online retailers in particular), should think about offering an API. The most important reason is control. Having an API will make scrapers unnecessary, but it will also allow tracking of who is using the data - as well as how and why. Like Amazon, sites can do this in a way that fosters affiliates and drives the traffic back to their sites.

The old perception is that closed data is a competitive advantage. The new reality is that open data is a competitive advantage. The likely solution then is to stop worrying about protecting information and instead start charging for it, by offering an API. Having a small fee per API call (think Amazon Web Services) is likely to be acceptable, since the cost for any given subscriber of the service is not going to be high.
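A per-call metering scheme like the one described can be sketched in a few lines; the fee and the subscriber key below are invented for illustration.

```python
from collections import Counter

class ApiMeter:
    """Track calls per API key and bill a hypothetical flat fee per call."""
    def __init__(self, fee_per_call):
        self.fee_per_call = fee_per_call
        self.calls = Counter()

    def record(self, api_key):
        self.calls[api_key] += 1

    def invoice(self, api_key):
        return self.calls[api_key] * self.fee_per_call

meter = ApiMeter(fee_per_call=0.0001)   # fee chosen only for illustration
for _ in range(5000):
    meter.record("subscriber-42")
print(meter.calls["subscriber-42"], meter.invoice("subscriber-42"))
```

Requiring a key per caller is what delivers the control argued for above: the same mechanism that bills the subscriber also records who is using the data, and how much.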

Fig.5- Web Sites as Web Services

As more and more of the Web becomes remixable, the entire system is turning into both a platform and a database. Yet such transformations are never smooth. For one, scalability is a big issue. And of course the legal aspects are never simple. But it is not a question of if web sites become web services, but when and how. APIs are a more controlled, cleaner and altogether preferred way of becoming a web service. However, when APIs are not available or sufficient, scraping is bound to continue and expand. As always, time will be the best judge.

Conclusion

Web 2.0 uses the Internet to make connections between people; Web 3.0 will use the Internet to make connections with information. As the Semantic Web, Web 3.0 will be able to interpret user input and tailor the Web surfing experience to make it more relevant and personal.

'Web 3.0' will see major web sites transformed into web services, effectively exposing their information to the world.

References