Aperture: a Java framework for getting data and metadata

Project name

From Merriam-Webster Online: Main Entry: ap·er·ture

Pronunciation: 'ap-&(r)-"chur, -ch&r, -"tyur, -"tur

Function: noun

Etymology: Middle English, from Latin apertura, from apertus, past participle of aperire to open

an opening or open space : HOLE

a : the opening in a photographic lens that admits the light

b : the diameter of the stop in an optical system that determines the diameter of the bundle of rays traversing the instrument

c : the diameter of the objective lens or mirror of a telescope

News

July 1, 2010 Aperture 1.5.0 released!

An evolutionary bugfix and improvement release. It brings new and improved functionality for getting data from OpenXML and mbox files, as well as initial support for crawling remote Samba shares directly, without having to mount them first. Moreover, intensive usage has revealed some loopholes in our email processing code that resulted in loss of email content and attachments; all have been fixed. This rounds up as the best Aperture release since the last one. Details in the release notes.

December 31, 2009 Aperture 1.4.0 released!

The 1.4.0 release is proof that the codebase is really starting to mature. Previous releases brought revolutions (RDF2Go, NIE, Maven). This one reaps the rewards of the work done in the past and concentrates on evolutionary improvements and bugfixes. Details in the release notes.

August 11, 2009 Aperture 1.3.0 released!

This release bears the mark of Maven. We adopted it because users wanted a way to cherry-pick the Aperture functionality and dependencies they need, either because they wanted a smaller footprint (Aperture 1.3 brings in more than 18 MB of jars) or for legal reasons (e.g. some dependencies use the LGPL license, which some people don't like). Now, thanks to Maven magic, you can reap direct benefits from the modular architecture of Aperture and choose exactly the mix of classes you need.

October 28, 2008 Aperture 1.2.0 released!

After three years of development, Aperture is stable enough to drop the .beta suffix from the release. 1.2.0 leverages architectural improvements made in 1.1.0.beta to bring support for compressed archives and to streamline email processing. A completely new service, the DataSourceDetector, allows applications to provide suggestions to users about the data sources on their desktops. A host of bugfixes and minor improvements rounds out the image of the leanest and meanest version of Aperture ever made. Enjoy.

May 18, 2008 Aperture 1.1.0.beta released!

This release aims to reduce the memory footprint and increase the range of data sources explorable by Aperture. It adds support for MP3 files, vCards and mbox mailboxes. The architecture has been extended with the concept of a SubCrawler, which can crawl DataObjects returned by other crawlers, thus opening possibilities for the development of new, even better exploration components.

November 12, 2007 Aperture 1.0.1-beta released!

This release bears the mark of the Nepomuk Social Semantic Desktop, a major research initiative that brings together research institutes and commercial companies from around Europe. Aperture is used as one of the pillars of a next-generation platform that will revolutionize the way people organize and use the data stored on their computers. The input from the Nepomuk community drove us to implement a host of new features that make Aperture more useful, more flexible and more powerful.

May 31, 2007 Aperture 2007.1 alpha 4 released!

The entire Aperture Framework has been rewritten to utilize the RDF2Go framework. It is now completely independent from the underlying RDF store. Aperture registries and factories can now be used in an OSGi environment as services. The infrastructure allows for on-the-fly deployment of new extraction components.

November 2, 2006 Aperture 2006.1 alpha 3 released!

This release adds support for crawling ical calendar files. The MIME type detection has been extended to support many more file formats. The tutorials have been extended. There are numerous bugfixes and small improvements.

March 6, 2006 Aperture 2006.1 alpha 2 released!

This release adds support for crawling file systems, websites, IMAP and Outlook mailboxes. Furthermore, the number of supported file formats has increased significantly.

Features

Crawl information systems such as file systems, websites, mailboxes and mail servers

Extract full-text and metadata from many common file formats

View files in their native applications

Ease of use: easy to learn, easy to code, easy to deploy in industrial projects

Flexible architecture: can be extended with custom file formats, data sources, etc., with support for deployment on OSGi platforms

Data exchange based on Semantic Web standards (e.g. RDF, SPARQL, ...)
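The extensibility described above can be pictured as a registry that maps MIME types to extractors, so that support for a custom file format is added by registering one more implementation. The following is only a minimal, self-contained sketch of that idea: the names TextExtractor and ExtractorRegistrySketch are hypothetical and are not Aperture's actual API (see the wiki for the real interfaces).

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical extractor contract (NOT Aperture's API): turns raw bytes into text.
interface TextExtractor {
    String extract(byte[] content);
}

// Hypothetical registry (NOT Aperture's API): maps a MIME type to its extractor.
class ExtractorRegistrySketch {
    private final Map<String, TextExtractor> byMimeType = new HashMap<>();

    // Register (or replace) the extractor responsible for a MIME type.
    public void register(String mimeType, TextExtractor extractor) {
        byMimeType.put(mimeType, extractor);
    }

    // Look up an extractor; the result is empty if the format is not supported.
    public Optional<TextExtractor> lookup(String mimeType) {
        return Optional.ofNullable(byMimeType.get(mimeType));
    }

    public static void main(String[] args) {
        ExtractorRegistrySketch registry = new ExtractorRegistrySketch();

        // A custom format is plugged in by registering one more extractor.
        registry.register("text/plain",
                content -> new String(content, StandardCharsets.UTF_8));

        String text = registry.lookup("text/plain")
                .map(e -> e.extract("hello".getBytes(StandardCharsets.UTF_8)))
                .orElse("");
        System.out.println(text);
    }
}
```

Keeping the lookup result optional lets a caller decide what to do with unsupported formats, for example fall back to storing only file-level metadata.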

Supported File Formats

Plain text

HTML, XHTML

XML

PDF (Portable Document Format)

RTF (Rich Text Format)

Microsoft Office: Word, Excel, PowerPoint, Visio, Publisher

Microsoft Works

OpenOffice 1.x: Writer, Calc, Impress, Draw

StarOffice 6.x - 7.x+: Writer, Calc, Impress, Draw

OpenDocument (OpenOffice 2.x, StarOffice 8.x)

Corel WordPerfect, Quattro, Presentations

Emails (.eml files)

iCal files

vCard files (.vcf)

Archives (zip, tar, gz, bz2)
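Before a file can be handed to the right extractor, its format has to be recognized. One common technique is to compare the first bytes of the file against well-known "magic" signatures. The sketch below illustrates this for a few of the formats listed above; the class name MagicSniffer is hypothetical, and this is an illustration of the general technique, not Aperture's own MIME type detection code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (NOT Aperture's API): guesses a MIME type from leading bytes.
class MagicSniffer {

    // Returns true if data begins with the given prefix bytes.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // head: the first few bytes of the file. Falls back to a generic binary type.
    public static String guessMimeType(byte[] head) {
        if (startsWith(head, ascii("%PDF-"))) {
            return "application/pdf";
        }
        if (startsWith(head, ascii("{\\rtf"))) {
            return "application/rtf";
        }
        // ZIP local file header. Zip-based containers such as OpenDocument
        // files start the same way and would need a look inside the archive.
        if (startsWith(head, new byte[] {'P', 'K', 3, 4})) {
            return "application/zip";
        }
        if (startsWith(head, new byte[] {(byte) 0x1f, (byte) 0x8b})) {
            return "application/gzip";
        }
        if (startsWith(head, ascii("BZh"))) {
            return "application/x-bzip2";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] head = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(guessMimeType(head));
    }
}
```

Signature checks alone cannot tell apart formats that share a container, so a fuller detector would usually combine them with file name extensions or further inspection.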

Crawlers

Crawlers support the extraction of information from heterogeneous data sources. At the moment we support the following source types:

File Systems (local, remote, removable media)

Websites and intranets

IMAP e-mail servers

Microsoft Outlook (alpha)

Internet Calendar (ical) files

Mbox mailboxes

Thunderbird address books

Apple Address Book
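To give a feel for what crawling a source type involves, here is a minimal sketch of a file system crawl that reports every regular file it finds to a callback. It uses only the standard java.nio.file API; the names CrawlHandlerSketch and FileSystemCrawlSketch are hypothetical and do not correspond to Aperture's crawler classes, which are documented in the wiki.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical callback (NOT Aperture's API): notified once per file found.
interface CrawlHandlerSketch {
    void objectFound(Path file, long sizeInBytes);
}

// Hypothetical crawler (NOT Aperture's API): walks a directory tree.
class FileSystemCrawlSketch {

    // Visits every regular file below root and returns how many were reported.
    public static int crawl(Path root, CrawlHandlerSketch handler) throws IOException {
        final int[] count = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    handler.objectFound(file, attrs.size());
                    count[0]++;
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip unreadable entries instead of aborting the whole crawl.
                return FileVisitResult.CONTINUE;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        int found = crawl(root,
                (file, size) -> System.out.println(file + " (" + size + " bytes)"));
        System.out.println(found + " files found");
    }
}
```

Separating the traversal from the handler means the same callback could, in principle, receive objects from other source types such as a mailbox or a website.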

Support

At this moment the project is still in the alpha stage and we provide only limited support. If you have any questions about the project, feel free to join the development mailing list and ask us.

Development

To use Aperture in your own projects, read the wiki for information about requirements and code examples.

If you are interested in contributing, feel free to contact the project admins or join the development mailing list. We are very interested in new extractors, crawlers and other contributions.