UPDATE: Projects that were successfully completed are listed in SoCResults2005.

The following projects are proposed as Open Source projects in Google's Summer of Code initiative. While other projects in the same initiative have, so to say, a well-defined "industrial" nature, those that we propose here have a substantial research component and will therefore require more than just implementation skills. Scientifically sound analysis, synthesis and design skills will be considered as important as high-quality implementation skills.

If you have notes, questions or proposals for either new projects or modifications of these ones, please write to g.tummarello@deit.univpm.it and this page should be updated as soon as possible to reflect your contribution.

Note 1: Although one of the core topics at Semedia is Multimedia Metadata (see for example the MPEG7AudioEnc and MPEG7AudioDB projects at SourceForge), no project proposals on this topic are listed here, as their "implementation to research" ratio seems excessively low for the spirit of the Google sponsorship. If you're really motivated in this field, however, don't hesitate to contact us to discuss a potential multimedia metadata project.

Note 2: We truly encourage anyone, not only interested students, to submit ideas and feedback that might make the projects more interesting and fruitful to the Semantic Web community. Mentorship positions for specific proposals might become available for individuals with specific expertise in the area (if it is possible to assign them outside our research institution; we'll have to check with Google about this).

Update 4th of June: Sesame suggested (and mentored) projects are available, see below.
Update 7th of June: more projects added.

Proposals:

Music URI

Shared identifiers for concepts and entities are key enablers of the Semantic Web. Once identifiers have been agreed upon, software agents and humans alike can exchange knowledge about resources in any conceptual domain, e.g. using RDF graphs.
For this purpose, the W3C Semantic Web initiative allows physical and conceptual resources to be identified via URIs. While there exist domains in which assigning unique identifiers is straightforward (e.g. published books map directly to the URN:ISBN: URI space), mapping music resources to unique identifiers proves to be a non-trivial task. The challenge lies in developing an infrastructure that collapses the variety of forms in which a music resource may be encoded into a unique identifier, thereafter making it possible to associate semantically rich annotations with it. Beyond the established music-related metadata such as artist name, track name, album name, etc., a URI will enable linking to meaningful, high-level semantic information, such as music genre, mood, or even pictures, reviews, comments, votes, sale offers, etc.

We propose that this problem can best be addressed by combining audio fingerprinting with standard text-matching techniques, in order to utilize all possible metadata that can be extracted from an audio file.

Semantic Web API abstraction project

While a lot of Semantic Web APIs are available (Jena, Sesame, Kowari etc.), especially for the Java language, there is no standard set of interfaces and wrappers that would allow middleware RDF toolkits or higher-level RDF-based APIs to be built regardless of the underlying API/RDF storage. The project will certainly not start from scratch, but will rather build upon earlier discussions and code (see jrdf, classes in the Simile projects etc.).

From Schema to Java Object model:reloaded

One of the fundamental issues in developing Semantic Web applications is the mapping of the model into objects native to the programming language. While Java has been a very popular language for implementation on the Semantic Web, such mapping appears to be non-trivial, given Java's single inheritance mechanism.
The proposal is to create a class generator and update system for Java that, given an RDFS schema or OWL ontology, will create the appropriate classes, using either Java reflection techniques or OO design patterns to emulate multiple inheritance and ontology constructs as needed. A goal of the project is to keep the generated code maintainable as the ontology partially changes. It might be useful to start from the design (or the code, licence permitting) of existing approaches (see for example Kazuki).

Semantic Web Newsgroups

Can we create a newsgroup-like system that serves Semantic Web annotations (RDF) instead of text messages? What would be the intelligent way of doing it, that is, what would it mean to "join a group"? How can a "post" be made to reach someone who is interested either in the concept described in the post itself or in something strongly related? Starting from the design and the tools of the RDFGrowth algorithm and the DBin.org project, we are ready to provide you with both design ideas for the protocol and an easy way to generate a client. Issues such as access control will have to be considered in the design and are expected to be of great practical interest.

RDF Cryptography

Once an RDF graph has been serialized to a binary file, it is clearly straightforward to apply existing cryptographic techniques in order to provide simple "on/off" access control. In this project we would like to explore a different approach, potentially leading to new and interesting applications: cryptography at the "model" level. Similar to the way RDF MSG theory allows digital signatures to cover just pieces of an RDF model, it might be conceivable that graphs could contain "parts" which are readable only by those with an appropriate access key (while the graph remains fully RDF compliant with any existing triple store software). How exactly would this work? Would such techniques implicitly exhibit insecure aspects (e.g. revealing too much of the encoded content, or providing a way to attack the key by statistical means)?

RDF Textual Encoding Framework 0.1

There are reasons to believe that it would be very interesting to use Semantic Web tools and languages like RDF and OWL for tasks traditionally performed with XML mark-up. Among these is the textual encoding of literary documents and/or manuscripts (see this explorative paper, ELPUB2005). While the idea of using RDF might be very fascinating indeed, there is a huge amount of previous work and legacy standards that would make a standardization effort unrealistic at the moment. With this project, however, the aim is to deliver a lightweight API and a simple GUI to allow researchers and interested individuals to experience the idea, features and possibilities of RDF textual encoding. Among the required first steps will be the definition of a number of interesting use cases and of a lightweight but sufficiently powerful "encoding ontology" to cover them.

FOAF Smusher
Mentor: Jeen Broekstra (www.openrdf.org)

A fundamental problem in distributed knowledge is the notion of identity: how do we decide that two objects, described in different locations, are the same or different objects? This problem is particularly pertinent in FOAF data: how does one decide that two described foaf:Person concepts are the same person? The Sesame JFoaf project (http://www.openrdf.org/issues/secure/BrowseProject.jspa?id=10020) aims to provide a set of tools and object models to make life easier for FOAF developers in aggregating, integrating, querying and manipulating FOAF data. A FOAF Smushing application would be a tremendous asset to this project. The fundamental question to be solved is how to decide that two entities are the same: many heuristics exist, but an application that integrates a consistent approach is missing. Apart from the theoretical problems, many scalability and performance issues will have to be dealt with.
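
One common smushing heuristic is to merge nodes that share a value for an inverse functional property. The sketch below illustrates that idea only; it is not the JFoaf design, and it is written in Python purely for brevity. It assumes triples are plain (subject, predicate, object) string tuples, and it hard-codes three FOAF properties as inverse functional instead of reading them from the schema. The function name `smush` and the blank-node labels are made up for the example.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"

# Properties treated as inverse functional in this sketch (an assumption:
# a real smusher would look up owl:InverseFunctionalProperty in the schema).
IFPS = {FOAF + "mbox", FOAF + "mbox_sha1sum", FOAF + "homepage"}


def smush(triples):
    """Group subjects that share a value for any property in IFPS.

    Uses a union-find structure, so merging is transitive: if a and b share
    a mailbox and b and c share a homepage, all three end up in one group.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller root as representative.
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # (property, value) -> first subject seen with it
    for s, p, o in triples:
        find(s)  # register every subject, even ones with no IFP values
        if p in IFPS:
            key = (p, o)
            if key in first_seen:
                union(s, first_seen[key])
            else:
                first_seen[key] = s

    clusters = defaultdict(set)
    for node in parent:
        clusters[find(node)].add(node)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    data = [
        ("_:a", FOAF + "name", "Alice"),
        ("_:a", FOAF + "mbox", "mailto:alice@example.org"),
        ("_:b", FOAF + "mbox", "mailto:alice@example.org"),
        ("_:b", FOAF + "homepage", "http://example.org/~alice"),
        ("_:c", FOAF + "homepage", "http://example.org/~alice"),
        ("_:d", FOAF + "name", "Alice"),
    ]
    print(smush(data))  # [['_:a', '_:b', '_:c'], ['_:d']]
```

Note that `_:d` is not merged even though it shares a name with `_:a`: a name is not an identifying property. Deciding when weaker evidence of this kind should count, and doing so at scale, is exactly the open part of the project.
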
SPARQL For Sesame
Mentor: Jeen Broekstra (www.openrdf.org)

The W3C DAWG is currently defining a set of protocols and a query language for the Semantic Web under the name SPARQL. An implementation of the current working drafts on top of Sesame would not only be an asset to the Sesame framework but would also yield valuable feedback to help the W3C working group fine-tune the current proposals.

Distributed Querying
Mentor: Jeen Broekstra (www.openrdf.org)

Sesame's SAIL API is an abstraction layer that lends itself well to hiding storage details from the higher functional layers. Implementation of a generic Mediator SAIL would enable distributed storage and querying for Sesame, vastly improving its potential scalability. Issues to be considered are:
a. query performance and query partitioning.
b. mediator genericity: can this be set up in such a way as to allow deployment in a wide variety of settings?
Existing work in this area consists of an early proof of concept by a student at the Vrije Universiteit Amsterdam (documentation available), but this left many questions regarding performance unanswered, and no stable and deployable implementation of the ideas exists.

Generic RDF Browser
Mentor: Jeen Broekstra (www.openrdf.org) or Semedia

A useful application on top of many RDF knowledge bases would be an RDF/RDFS/OWL-enabled browsing tool that gives a human-friendly overview of the contents of a Sesame repository, in a scalable way. This can be implemented as a JSP application but nevertheless requires significant insight into user interface design and a fair understanding of the scalability issues involved. An RDF browser can be as feature-rich as time and/or money allow, and as such this is not only a very useful addition to a Semantic Web toolkit, but also a fun tinkering project.

Model-View-Controller for the Semantic Web
Mentor: Daniel Schwabe

As the Semantic Web becomes more widespread, more applications are being built upon it.
A popular architecture for (traditional) web applications is Model-View-Controller (MVC), for which several frameworks are available on various platforms (e.g., Struts, Rails). It would be interesting to:
1. Have one such framework based on the Semantic Web, with persistence provided by a suitable repository, such as Sesame. The Model, the View, or both could be specified in RDFS or OWL.
2. Allow programming in the framework by extending some programming language (e.g., Java, Python, Ruby) in such a way that the model elements become objects in the language (similar to the "From Schema to Java Object model:reloaded" project).
3. Extend the Model with a pre-defined vocabulary that brings the specification of the navigation in the final application closer in abstraction level to the problem domain (i.e., provide a semantic specification of the navigation).
The resulting environment should allow both interactive and declarative specifications of the application, and generate a running application from these specifications.

Semantic Web Wiki in PHP: the Ontowiki system

In contrast to existing approaches, which try to combine the Wiki and Semantic Web paradigms by integrating RDF triples into Wiki texts in a special syntax, this project aims to be technologically independent of, and complementary to, existing Wiki systems. It tries to apply the main Wiki philosophy (making it easy to correct mistakes, rather than making it difficult to make them) to collaborative RDFS/OWL knowledge base editing. The main goal of Ontowiki is to rapidly simplify the acquisition and representation of instance data from and for end users (while ontology schema editing may still be done using conventional tools such as Protégé or Powl).
The Ontowiki system will consist of the following components:
- a permanently visible class tree
- clicking on a class leads to filterable lists of instances
- clicking on an instance results in rendering all triples related to the instance, using either an appropriate ontology class template (such as Foaf, vCal, BibTeX) or a simple table with a row for every property
- even though the rendering and editing should be highly configurable, reasonable defaults will be selected automatically (e.g. by considering the data type of a literal)
- links to related information (of the same type) for a distinct instance
- easy editing of triples in an inline-editing mode: a small editing icon is attached to every displayed information chunk, enabling users to edit the corresponding statement
- editing statements, i.e. finding and relating the right properties or resources, is supported by an interactive search for related information while the user types
- literals may be edited using different widgets, such as for plain text, HTML snippets, dates
- all changes are versioned and may be independently rolled back
- discussion about all triples will be enabled by making use of the RDF reification vocabulary
- easy-to-use full-text search on all literal data; search results will be filterable by classes or properties
- measurement of the popularity of resources
- display of recent changes grouped by users (making the changes) or resources
All these components will be highly integrated to allow a consistent user interface experience and to combine all relevant information at a given stage of the knowledge acquisition or review process.

Services for BuRST
Mentor: Jeen Broekstra

BuRST is a simple format for the distributed management of bibliographic information using the RDF-based RSS 1.0 format.
(See http://www.cs.vu.nl/~pmika/research/burst/BuRST.html ) The publications of an individual would be advertised as an RSS channel linked from the author's homepage, which can be subscribed to, read by newsreaders, referenced from blogs, tagged using Dublin Core metadata etc. Bibliographic metadata is attached to the items on the channel and can be processed by software that is aware of the schema. BuRST uses the FOAF vocabulary to uniquely identify authors of publications.

We are looking for developers to create a set of independent, web-accessible services for BuRST using Sesame and the JFoaf library ( http://www.openrdf.org/issues/secure/BrowseProject.jspa?id=10020 ). This includes services to convert existing bibliographic formats such as BibTeX into BuRST (partial tool support is already available), support for updating BuRST files, a crawler to collect such files, a smusher to detect duplicates in the collected data, and a search/browse interface for publication collections.

Created by: admin. Last modification: Tuesday 13 of September, 2005 01:34:02 UTC by admin