What is a Semantic Wiki?
The word "semantic" relates to the meaning carried by any communication, whether in words, code, drawings, or anything else. The idea of a Semantic Wiki (or the Semantic Web) is to write or create information in a way that makes it easy for software agents to process and use. Many people use the terms semantic data and metadata interchangeably; both mean information about information. In the case of a Semantic Wiki, it means giving richer meaning to data managed via a user-maintained website, or wiki. Semantic data can be expressed as statements, each made up of exactly three pieces of data: a subject, a predicate, and an object.
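To make that three-part shape concrete, here is a minimal Python sketch of one statement. The `Statement` class, the page name, and the phone number are illustrative assumptions. Sycamore does not define any of them.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One piece of semantic data, made of exactly three parts."""
    subject: str    # what the statement is about, e.g. a wiki page
    predicate: str  # the property or relationship, e.g. "phone"
    object: str     # the value, or the other thing being related to


# "Sam's Pizza has the phone number (530) 555-0100" (hypothetical data)
s = Statement("Sam's Pizza", "phone", "(530) 555-0100")
subject, predicate, obj = s  # a statement always unpacks into three parts
print(f"{subject} --{predicate}--> {obj}")
```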
A simple example of semantic data is HTML's 'bold' tag. A word, sentence, or paragraph can be put between an opening <b> and a closing </b>. In most cases this just means the text in between is drawn in bold when it's viewed. Some software uses the tag in other ways, though. Screen readers for the blind, for example, can change tone or inflection for text inside <b> tags. In short, the meaning of the tag isn't drawn directly to the screen, but the technology in between can still make that undrawn information useful.
Another example of semantic data is the "address" of a place. We can make a wiki page for a place, like Sam's Pizza, but then we'd like to give it an address. We can simply write the address on the page, or we can use the [[address]] macro to identify the address of the page in a way that's more meaningful to Sycamore. Because of this, the address of the page gets plotted on a map. The idea behind the semantic effort is to allow us to express other ideas, like "address" or "category" in a way that is understandable to computers, while being painless for people.
Importance of the Semantic Wiki
When data has meaning to both a machine and a person, wonderful things can happen. What social networking sites like MySpace have done to the dating scene and to personal relationships, the semantic web will do to all electronically stored information. You will be able to submit metadata queries like "show me all objects of type restaurant that are open after midnight and less than 5 miles from my home." And, seriously, don't you want a god damn hamburger at 2AM?
Requires no Extra Effort
Wikis are user-maintained. This means that users input data and maintain the relationships between data and pages on the site. The best place to give meaning to data is at the source. When a user adds a phone number for a restaurant, they give it a name (usually "phone number") and a value. So they are already providing semantic data about that information; it's just a matter of making it easy to store. The "subject" of the statement can be extracted from the current page.
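Here is a small sketch of "extracting the subject from the current page." The editor supplies only a name and a value, and the page being edited fills in the subject. The function name and the lower-case/underscore normalization of names are hypothetical choices for illustration.

```python
def statement_from_edit(current_page: str, name: str, value: str) -> tuple:
    """Turn a name/value pair typed on `current_page` into a
    (subject, predicate, object) triple. The editor never types the
    subject: it is simply the page being edited."""
    # Hypothetical normalization so "Phone Number" and "phone  number"
    # become the same predicate.
    predicate = "_".join(name.lower().split())
    return (current_page, predicate, value.strip())


print(statement_from_edit("Sam's Pizza", "Phone Number", " (530) 555-0100 "))
# ("Sam's Pizza", 'phone_number', '(530) 555-0100')
```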
Use cases
When talking about this, it's good to have an idea of what we'd like to use it for. Here are some potential uses that wiki users would like (and, in some cases, have devised workarounds for):
-
Single-element "tags" or "categories". "This page is a stub" or "This page needs a photo" or "This page is a restaurant".
-
The current way of achieving this is via a normal wiki link. Because we can, for a given page, figure out which pages link to it (using [[LinksHere]]), we can exploit this to create "tags" or "categories," in a way. For instance, on the Davis Wiki, pages that are stubs are identified by placing [[Include(Stub)]] on them. This includes the contents of the "Stub" page in the page we want to mark as a stub. The "Stub" page has a link to the page "Stub/Definition", and no non-stub pages link to "Stub/Definition". Thus we can produce a list of pages that are stubs by simply asking for [[LinksHere(Stub/Definition)]].
-
Such a strategy using links can be made more geek-palatable by choosing a fixed "namespace" to link into. We could, for instance, link to "Category/Name of the category" to ensure that nothing else linked to it. Then we would simply link to ["Category/Restaurant"] when we wanted to mark a page as such.
-
The problem with this approach is that it doesn't generalize for anything more than a single "key," and it isn't really clear what is going on — is this a link or is this something else? We use links to express simple links between pages that are part of the page content, not really to express ideas that are removed or abstracted from the page content.
-
To replicate this with the current proposal, we'd simply mark a page like so: Category := Stub and Category := Photo request. We can add more information in, doing something like Stub since := 2006-02-01 and PhotoRequest since := 2006-02-01.
-
Taking details that are sitting in "formatted tables" and making them searchable. Many pages on the wikis have tables at the top with similarly named data. These pages are usually created from something like the "Business Template." For instance, see Delta of Venus. The wikis tend to keep a convention for these pages' format, which means that adapting to a set convention for semantic data wouldn't be too hard for editors. In the Delta of Venus example, we have a set of attributes: Location, Hours, Phone, Email, Website, and Menu (which is a link to the menu). The goal of this semantic effort is to allow these things to be queried, and pages to be built based on them, in a straightforward way.
-
To achieve this using the current Sycamore system isn't really possible.
-
To replicate this using the current proposal, we could write:
-
Location:
-
Longitude/Latitude (attributes): [location:=-12.34,23.456]
-
Neighborhood (a relation): [district:=Downtown Davis] <- This could be calculated once we have the long/lat and the boundaries of "Downtown Davis"
-
Hours:
-
[hours:=Monday through Wednesday 8am to 10pm, Thursday through Friday 8am to midnight, Saturday 8am to 2pm] <- These would probably need to be generated by a special UI for usability reasons
-
[happyHours:=Monday through Friday 3pm to 6pm]
-
Phone: [phone:=(530) 753-8639]
-
Email: [email:=info@deltaofvenus.org]
-
Website: [website:=http://deltaofvenus.org]
-
Est: [established:=January 1st, 1990]
-
Menu (a relation): [menu:=Delta of Venus/Menu]
-
Tag, Label or Category: [tag:=restaurant]
-
Integration with mapping, à la the map for Apartments. For example, a map of all pages with a given tag/value.
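Here is a sketch of the mapping use case. It assumes statements are already stored as (page, predicate, value) triples, and that `location` values use the "longitude,latitude" order from the Delta of Venus example. Under those assumptions, a map of all pages with a given tag reduces to a filter plus a match on page name. All coordinates, and every page name other than Delta of Venus, are made up.

```python
# Hypothetical statements as (page, predicate, value) triples; locations
# are "longitude,latitude" strings.
STATEMENTS = [
    ("Delta of Venus", "tag", "restaurant"),
    ("Delta of Venus", "location", "-121.74,38.54"),
    ("Sam's Pizza", "tag", "restaurant"),
    ("Sam's Pizza", "location", "-121.75,38.55"),
    ("Example Apartments", "tag", "apartments"),
    ("Example Apartments", "location", "-121.73,38.56"),
]


def pages_with(statements, predicate, value):
    """Set of pages that have `predicate` equal to `value`."""
    return {page for page, p, v in statements if p == predicate and v == value}


def map_markers(statements, tag):
    """(page, lon, lat) markers for every located page carrying `tag`."""
    tagged = pages_with(statements, "tag", tag)
    markers = []
    for page, p, v in statements:
        if page in tagged and p == "location":
            lon, lat = (float(part) for part in v.split(","))
            markers.append((page, lon, lat))
    return sorted(markers)


print(map_markers(STATEMENTS, "restaurant"))
```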
Unified Syntax Examples
See /API for one possible implementation.
Possible formats for either a relation or attribute (the unified syntax):
Wiki Page Links
The following examples link a wiki page to another wiki page and give that link a specific meaning.
-
[business_type := Restaurant] (this uses the default namespace: class)
-
[class:business_type := Restaurant] (this explicitly sets the vocabulary namespace)
Literal Values and Offsite Links
The following examples link a wiki page to an off-site URL or describe something about the current page.
-
[phone := (916) 123-4567] (this uses the default namespace: class)
-
[class:phone := (916) 123-4567] (explicitly sets the namespace)
-
[tag := neighborhood]
-
[location := -123.456, 54.23]
With this form, we can explain the idea to newcomers as a way of "tagging the page with a value," in a sense. Since each predicate has a pre-defined type (managed via a special page) there is no need to explicitly state the type.
Another motivation behind this format is that on Flickr, people began tagging images with non-tag-like data, such as geocoordinates, using a technique known as machine tagging. People started using a very similar format (geo:long=123.456), and Flickr ended up supporting this format and allowing queries based on it. Check out this discussion for more. (In our case, we'd have [geo:long := 123.456].)
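Here is a minimal parser for the unified syntax. It assumes predicate and namespace names are single words (letters, digits, underscores) and that values never contain "]". A missing namespace falls back to `class`, as in the examples above. The regular expression is only an illustration and is not a finished grammar.

```python
import re

DEFAULT_NAMESPACE = "class"  # the default vocabulary in the examples above

# [name := value] or [namespace:name := value]; values may not contain "]".
UNIFIED = re.compile(r"\[\s*(?:(\w+):)?(\w+)\s*:=\s*([^\]]*?)\s*\]")


def parse_unified(text):
    """Yield (namespace, predicate, value) for each bracketed statement."""
    for match in UNIFIED.finditer(text):
        namespace, predicate, value = match.groups()
        yield (namespace or DEFAULT_NAMESPACE, predicate, value)


example = "[business_type := Restaurant] [class:phone := (916) 123-4567] [location:=-123.456, 54.23]"
for statement in parse_unified(example):
    print(statement)
```

Plain wiki links such as ["Category/Restaurant"] contain no `:=`, so the pattern leaves them alone.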
Sycamore Plugin
There is interest in a Sycamore Plugin that will bring Semantics to the local wiki world. This development effort is currently in the planning phase.
Database Needs
Copy the source XML into this editor to view the schema details.
The Semantic MediaWiki plugin is relatively complete and has been used as a reference for analyzing database requirements. We might consider changing from their standard a bit. I propose using the following database structure.
Source ERD file: metadata.xml
A demo of the ERD Editor is available also.
Tables
curPages
This table is already a part of the Sycamore schema. It is shown here for illustrative purposes. Here are some sample rows:
+-----------------+-----+
| pagename        | ... |
+-----------------+-----+
| East Sacramento | ... |
| Sacramento      | ... |
| California      | ... |
+-----------------+-----+
namespace
This table provides a way to store multiple namespaces (or vocabularies) in a single wiki. The most used entries in this table are class, literal, and wiki. It is a good idea to try to use other namespaces like foaf, dc, etc., but it should be up to the community to enforce these rules. These namespaces will likely be imported on initial installation (or upgrade). There doesn't necessarily need to be a way to add namespaces (yet).
+-----------+---------------------------------------------+
| alias     | uri                                         |
+-----------+---------------------------------------------+
| class     | http://wikispot.org/class/                  |
| literal   | http://wikispot.org/literal#                |
| wiki      | http://sacramento.wikispot.org/             |
| dc        | http://purl.org/dc/elements/1.1/            |
| dcterms   | http://purl.org/dc/terms/                   |
| wikipedia | http://en.wikipedia.org/wiki/               |
| rdf       | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
| rdfs      | http://www.w3.org/2000/01/rdf-schema#       |
+-----------+---------------------------------------------+
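Resolving an `alias:name` pair to a full URI is then a table lookup plus concatenation. This sketch hard-codes the rows above in a dict. Escaping names with `urllib.parse.quote` (so "East Sacramento" becomes `East%20Sacramento`) is an assumption about how page names with spaces would map to URIs.

```python
from urllib.parse import quote

# The rows of the namespace table above.
NAMESPACES = {
    "class": "http://wikispot.org/class/",
    "literal": "http://wikispot.org/literal#",
    "wiki": "http://sacramento.wikispot.org/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "wikipedia": "http://en.wikipedia.org/wiki/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(alias, name):
    """Full URI for `alias:name`; unknown aliases are an error."""
    try:
        base = NAMESPACES[alias]
    except KeyError:
        raise ValueError(f"unknown namespace alias: {alias!r}") from None
    return base + quote(name)  # escape spaces and other unsafe characters


print(expand("dc", "title"))  # http://purl.org/dc/elements/1.1/title
```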
type
This table associates a particular data type with a "name" or "predicate". An example predicate is a phone number. In the example below, the predicate class:phone is of type integer. Phone numbers can be stored as just numbers (e.g. 9161234567) and parsed into a readable format (e.g. (916) 123-4567) as needed. The following types are expected to be supported:
-
integer
-
string
-
float
-
geoPoint
-
geoLine
-
geoArea
-
currency
-
wikipage
-
url
-
datetime
-
datetime range
+-------+-----------------+----------+
| alias | name            | type     |
+-------+-----------------+----------+
| class | phone           | integer  |
| class | neighborhood_of | wikipage |
| class | city_in         | wikipage |
| class | tag             | string   |
+-------+-----------------+----------+
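Here is a sketch of the store-as-integer, display-as-text idea for `class:phone`. It assumes ten-digit North American numbers. The function names are hypothetical.

```python
import re


def phone_to_int(text):
    """Reduce a North American phone number to the integer that is stored."""
    digits = re.sub(r"\D", "", text)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the +1 country code
    if len(digits) != 10:
        raise ValueError(f"expected a 10-digit number, got {text!r}")
    return int(digits)


def int_to_phone(number):
    """Render the stored integer in the readable (XXX) XXX-XXXX form."""
    s = f"{number:010d}"  # pad back to 10 digits
    return f"({s[:3]}) {s[3:6]}-{s[6:]}"


print(int_to_phone(phone_to_int("916.123.4567")))  # (916) 123-4567
```

One design caveat applies. An integer cannot hold an extension such as "x12", and it loses leading zeros unless the length is fixed (as the 10-digit padding here assumes). A string-backed phone type may therefore be safer if numbers from outside North America need to be stored.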
metadata
This table defines metadata for pages stored in the wiki. Each row follows the subject-predicate-object pattern, with the pagename always serving as the subject.
+----+-----------------+-----------------+-----------------+--------------+--------------+
| id | pagename        | predicate_alias | predicate       | object_alias | object       |
+----+-----------------+-----------------+-----------------+--------------+--------------+
| 1  | sacramento      | class           | city_in         | wiki         | California   |
| 2  | east sacramento | class           | neighborhood_of | wiki         | Sacramento   |
| 3  | east sacramento | class           | tag             | literal      | neighborhood |
+----+-----------------+-----------------+-----------------+--------------+--------------+
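The three proposed tables can be tried out with a quick sketch. SQLite (through Python's built-in `sqlite3` module) stands in for MySQL/Postgres here. The column types, keys, and the index are assumptions, while the rows are the sample rows above. The query then asks which pages are tagged as neighborhoods, and what each is a neighborhood of.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE namespace (alias TEXT PRIMARY KEY, uri TEXT NOT NULL);
CREATE TABLE type (
    alias TEXT REFERENCES namespace(alias),
    name  TEXT,
    type  TEXT NOT NULL,
    PRIMARY KEY (alias, name)
);
CREATE TABLE metadata (
    id              INTEGER PRIMARY KEY,
    pagename        TEXT NOT NULL,  -- always the subject
    predicate_alias TEXT NOT NULL,
    predicate       TEXT NOT NULL,
    object_alias    TEXT NOT NULL,
    object          TEXT NOT NULL,
    FOREIGN KEY (predicate_alias, predicate) REFERENCES type(alias, name)
);
-- lets "all pages where predicate = value" avoid a full table scan
CREATE INDEX metadata_by_predicate ON metadata (predicate_alias, predicate, object);
""")
conn.executemany("INSERT INTO namespace VALUES (?, ?)", [
    ("class", "http://wikispot.org/class/"),
    ("literal", "http://wikispot.org/literal#"),
    ("wiki", "http://sacramento.wikispot.org/"),
])
conn.executemany("INSERT INTO type VALUES (?, ?, ?)", [
    ("class", "phone", "integer"),
    ("class", "neighborhood_of", "wikipage"),
    ("class", "city_in", "wikipage"),
    ("class", "tag", "string"),
])
conn.executemany(
    "INSERT INTO metadata (pagename, predicate_alias, predicate, object_alias, object)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        ("sacramento", "class", "city_in", "wiki", "California"),
        ("east sacramento", "class", "neighborhood_of", "wiki", "Sacramento"),
        ("east sacramento", "class", "tag", "literal", "neighborhood"),
    ],
)

# Pages tagged "neighborhood", joined with what they are a neighborhood of.
rows = conn.execute("""
    SELECT t.pagename, n.object
    FROM metadata AS t
    JOIN metadata AS n
      ON n.pagename = t.pagename
     AND n.predicate_alias = 'class' AND n.predicate = 'neighborhood_of'
    WHERE t.predicate_alias = 'class' AND t.predicate = 'tag'
      AND t.object = 'neighborhood'
""").fetchall()
print(rows)  # [('east sacramento', 'Sacramento')]
```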
Interface Needs
This feature requires some changes to the user interface.
Edit Interface
The edit interface might need to be modified for certain data types like location, areas, etc. This could just be css/dom javascript magic to insert the content into the edit textbox.
Can an example be given of an interface for editing that allows these relationships to be expressed?
Perhaps a popup window or iframe could show a Google map. When a location is clicked, the window closes and the lat/lon is entered as a point like so: "Point(-143.23,235.30)".
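Here is a sketch of the text the popup could write into the edit box. It assumes longitude comes first (matching the Longitude/Latitude order in the Delta of Venus example) and uses six decimal places (roughly 0.1 m). Both functions are hypothetical. The range check catches a mis-click or a swapped pair before it is saved.

```python
import re

NUMBER = r"(-?\d+(?:\.\d+)?)"
POINT = re.compile(r"^Point\(" + NUMBER + r",\s*" + NUMBER + r"\)$")


def to_point(lon, lat):
    """Text the map popup would insert for a clicked position."""
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError("longitude must be in [-180, 180], latitude in [-90, 90]")
    return f"Point({lon:.6f},{lat:.6f})"


def from_point(text):
    """Parse "Point(lon,lat)" back into two floats, e.g. when saving."""
    match = POINT.match(text.strip())
    if match is None:
        raise ValueError(f"not a point: {text!r}")
    return float(match.group(1)), float(match.group(2))


print(to_point(-121.74, 38.54))  # Point(-121.740000,38.540000)
```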
Search Interface
There needs to be an advanced search option that allows results to be returned based on relations and attributes. The search options could depend on the items being searched for. For instance, a "within x miles of..." operator makes sense when you're searching based on "location". We could tailor the interface to let you select a number of <types> to base your search on, with each type offering its own way of being queried.
The types could be pluggable, so each type could have two Python files associated with it. One would tell us how to query the database for the information we want. The other would tell us how to represent the information to the user once we've gotten the search results, and also how to represent it when it's on a page (e.g. an address should be linked to the map associated with the address point).
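The pluggable-type idea can be sketched as a registry that maps each type name to an object with two responsibilities, querying and rendering. Here the two roles live as two methods in one file rather than as two files. The class names, the `equals`/`after` query keys, and the sample data are all assumptions. The 1990 date for Delta of Venus comes from the example above, and the other page's date is invented.

```python
from datetime import datetime


class StringType:
    """Plain text: exact-match queries, displayed as-is."""

    def matches(self, stored, query):
        return stored == query["equals"]

    def render(self, stored):
        return stored


class DatetimeType:
    """ISO 8601 dates/times: supports an "after" query."""

    def matches(self, stored, query):
        return datetime.fromisoformat(stored) > datetime.fromisoformat(query["after"])

    def render(self, stored):
        return datetime.fromisoformat(stored).strftime("%B %d, %Y")


# Type name -> plugin, mirroring the type column of the type table.
TYPES = {"string": StringType(), "datetime": DatetimeType()}

# Hypothetical data; predicate -> type would come from the type table.
PREDICATE_TYPES = {"established": "datetime", "tag": "string"}
STATEMENTS = [
    ("Delta of Venus", "established", "1990-01-01"),
    ("Sam's Pizza", "established", "2001-06-15"),
    ("Delta of Venus", "tag", "restaurant"),
]


def search(statements, predicate, query):
    """Pages whose `predicate` value satisfies `query`, judged by the
    plugin registered for that predicate's type."""
    plugin = TYPES[PREDICATE_TYPES[predicate]]
    return sorted(page for page, p, v in statements
                  if p == predicate and plugin.matches(v, query))


print(search(STATEMENTS, "established", {"after": "1995-01-01"}))  # ["Sam's Pizza"]
```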
Code Organization
It should be possible to implement this feature as a plugin, but it does require database changes and extra search capabilities. There will need to be a few custom pages, including one that allows adding entries to the type table.
See /Caching for thoughts about making this fast.
Other Semantic Wiki Projects
There are other projects that are relatively far along in the development process. One to watch is Semantic MediaWiki, which already has some early adopters.
-
Semantic Wiki State Of The Art: contains a large list of existing semantic wiki prototypes
-
IkeWiki is a Semantic Wiki
Semantic Web Links
-
Far's Sycamore Tag sketchup. He's been playing with the idea of a tagging system for the wiki. Regardless of the outcome, the tag system should become a subset of the semantic efforts here. Perhaps we could even use a "tags" interface for adding metadata to a page, à la Flickr (a box: you click, type a value, and press return). More complex relationships could be expressed by typing in the "type key := value" form of a "tag." For better or for worse, people are accustomed to the idea of a "tag" now, and "tag" is more general than "category."
User interface
The UI for all of this is really important. We need to keep it just as easy as it is now for people to change addresses, phone numbers, and so forth.
One option would be to have a button that reads 'Metadata' in the edit area.
One option for editing metadata.
Or maybe we should show all the metadata stuff in the normal editor interface, keeping everything in the same place. We could also make it so that wherever we display the data we allow it to be clicked on and edited, inline-style.
There are a few different ways of dealing with the display of the metadata. One is to embed the data directly into the page content, in the same way things like links and macros are embedded in a page's text. All data entered would be displayed right where it was entered.
Another way is to have data entered into the page's body (or via a separate interface), but displayed only when signaled by some sort of [[get(value)]] macro. This has the advantage of ultimate control over the presentation of the information. Its drawback is that changing the data could become confusing: how does the average Joe know how to change the address of the page? What used to be just "300 Main Street" is now [[get(address)]]. This confusion could probably be mitigated by displaying the metadata fields right in the edit interface. Starting a quick edit on that area could then somehow open an edit for just that metadata field.
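Here is a rough sketch of how a `[[get(value)]]` macro could be expanded at render time. It assumes the page's metadata has already been loaded into a dict. Unknown names are left as-is so editors can see the mistake; that behavior is a design guess and has not been settled.

```python
import re

GET_MACRO = re.compile(r"\[\[get\((\w+)\)\]\]")


def expand_get(page_text, metadata):
    """Replace each [[get(name)]] with the page's stored value for `name`;
    unknown names are left untouched so the mistake stays visible."""
    def substitute(match):
        return metadata.get(match.group(1), match.group(0))
    return GET_MACRO.sub(substitute, page_text)


print(expand_get("Find us at [[get(address)]].", {"address": "300 Main Street"}))
# Find us at 300 Main Street.
```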
Another way is to have the data entered and displayed separately from the page's body (e.g. at the bottom of the page). This has the advantage of making it easy to see where to change the information, since you change it right where it's displayed. It has two disadvantages. It doesn't allow careful control over where the data is displayed, and it's redundant (the address will probably still be entered on the page). We could still allow for something like [[get(value)]] in this case.
I'm leaning toward thinking the last option here is the best, UI-wise. It may not be super pretty, but I think it's the most obvious. —PhilipNeustrom
Mockup
UI mockup goes here
Ultimate goals
Consistent with the goals of the Sycamore project, the aim of the semantic feature is not to produce a proof-of-concept system for semantic research, but rather a solid, easy-to-use semantic system that will help us access information more easily.
Questions
Can you explain the purpose of the semantic_datatype table? It seems like it's supposed to be some sort of dispatch. The regexp matches and then that tells us what the type of the statement is, and we use that how in this model? (Basically, why doesn't the semantic_datatype associate itself with the predicate?)
2007-03-20 13:12:33 semantic_datatype: I've added some notes about my thinking. —Sc0ttBeardsley
2007-03-20 17:56:15 Why is there no relationship between the semantic_attribute table and the semantic_relation table? After we create a datatype and place it into the semantic_attribute table, doesn't that datatype get associated with the predicate of the semantic_relation table? I can define a phone number's form and say that it is a phone number, but then when I encounter another phone number I'd like to use the same row from semantic_attribute, as the predicate is the same. (Though, I notice the semantic_attribute is tied to a specific page, too. I suppose I need more clarification as to its purpose. I know it's for identification of types, but I'm not sure how it's fitting in.) —75.31.44.27
2007-03-20 20:00:11 The semantic_attribute table is said to "[define] attributes for objects stored in the wiki", and has subject, predicate, and object. But so does the semantic_relation table? What's the purpose of that table? —75.31.44.27
2007-03-20 21:05:08 I think the attribute table needs to be changed. The predicate should be able to be one of the standard predicates in DC, FOAF, etc. I'd also like it to be able to be a custom datatype. The semantic_relation table is for showing relations between two real-world things. One of those things is represented as a page in the wiki; the other can be another page in the wiki or some other real-world thing. For example, in "Shakey's Pizza" "Is A" "Restaurant", "Shakey's Pizza" has the local wiki's namespace, so it represents the page in the wiki. —Sc0ttBeardsley
2007-03-20 21:14:27 So what you had in mind is that the relationship is always between a page and another object? —75.31.44.27
2007-03-20 21:18:20 Yes, generally a page and another object... the page can either be referenced in the subject or the object... The database structure would allow storing relationships between two non-page objects but that's not all that interesting for our purposes. —Sc0ttBeardsley
2007-03-20 23:32:38 To recap relation vs attribute: relations connect two objects and attributes connect an object to a literal (like a date/number/string). There is a good discussion about this on MediaWiki's blueprint page. —Sc0ttBeardsley
2007-04-20 03:06:49 I like the simpler schema. We should talk a bit about markup and UI: [Established:=Date("1990-01-01")] versus Date Established := 1990-01-01. If we say a type can only be one word, then the latter markup would work well, I think?
We could also opt to have this as a somewhat disjoint UI from the normal editing interface (a "Metadata" button?). I'm really not sure how the UI for all of this ought to work, but I think it's actually the most important part, as we have to make this easy to use (otherwise it won't be used). —PhilipNeustrom
2007-04-20 03:46:40 re: metadata button: I was thinking of just embedding it into the text of a page. The metadata wouldn't have to be displayed by default. Say you have a macro called Metadata that takes 4 args (predicate_alias,predicate,value,display_flag). This macro would add an entry to the metadata table, then if the display_flag is set it would display the metadata in a predefined format (based on the type of the predicate). For example I have a restaurant page with the following macro call: Metadata(class,tag,expensive). This would add the metadata about this restaurant being expensive but it would not display that information on the final page. As far as getting a list of valid predicate_aliases and predicates, yes it might be nice to have some sort of tool (ajax?) that will pull up the already defined vocabularies. It is important to have something like this because we want everyone speaking the same language. —Sc0ttBeardsley
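Here is a sketch of the proposed `Metadata` macro. It assumes the arguments arrive as one comma-separated string and that a missing fourth argument means "don't display". The function name and the in-memory `store` list (standing in for the metadata table) are hypothetical.

```python
def metadata_macro(store, pagename, args):
    """Handle [[Metadata(predicate_alias,predicate,value[,display_flag])]].
    The statement is always recorded; text is returned only if displayed."""
    parts = [a.strip() for a in args.split(",")]
    if len(parts) not in (3, 4):
        raise ValueError("expected 3 or 4 comma-separated arguments")
    predicate_alias, predicate, value = parts[:3]
    display = len(parts) == 4 and parts[3].lower() in ("1", "true", "yes")
    store.append((pagename, predicate_alias, predicate, value))
    return f"{predicate}: {value}" if display else ""


store = []  # stands in for the metadata table
print(repr(metadata_macro(store, "Example Restaurant", "class,tag,expensive")))  # ''
print(store)  # [('Example Restaurant', 'class', 'tag', 'expensive')]
```

Splitting on commas is the simplest reading of the proposal. It breaks, however, for values that themselves contain commas, such as some addresses.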
2007-04-20 04:17:34 A comment about types: I think we should offload the type of a metadata item onto another special page. Instead of having the page editor define the type of a metadata item inline, it should be a separate procedure. This makes it slightly more difficult (not impossible) to add new vocabulary words. The goal is to get people to use a small set of words to describe data. If we tie the type to the predicate (aka keyword aka name) elsewhere, then we can limit the syntax required of page editors while still knowing what type of data they should be entering. So the special page will allow page editors to add a new vocabulary word (and its type) on the fly. This will essentially be an interface to the type table. —Sc0ttBeardsley
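Here is a minimal in-memory stand-in for the special page described here. It accepts only the types listed for the type table and refuses to silently change a word's existing type. Page markup can then look up a type instead of stating it. The `Vocabulary` class is hypothetical.

```python
# The types listed for the type table above.
SUPPORTED_TYPES = {
    "integer", "string", "float", "geoPoint", "geoLine", "geoArea",
    "currency", "wikipage", "url", "datetime", "datetime range",
}


class Vocabulary:
    """In-memory stand-in for the type table behind the special page."""

    def __init__(self):
        self._types = {}  # (alias, name) -> type

    def add_word(self, alias, name, type_):
        """Register a vocabulary word; its type is fixed once chosen."""
        if type_ not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported type: {type_!r}")
        existing = self._types.get((alias, name))
        if existing is not None and existing != type_:
            # Retyping would invalidate values already entered on pages.
            raise ValueError(f"{alias}:{name} is already of type {existing}")
        self._types[(alias, name)] = type_

    def type_of(self, alias, name):
        """Page markup never states a type; it is looked up here."""
        return self._types[(alias, name)]


vocab = Vocabulary()
vocab.add_word("class", "phone", "integer")
print(vocab.type_of("class", "phone"))  # integer
```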
2007-04-21 01:11:51 Ahh yes, I see where you're going Philip. I kinda like the separate metadata interface. I think there would have to be a drop down menu for the "tag" field as it is labeled in the UI screenshot. We should talk more about this though. —Sc0ttBeardsley
We'll talk more, but just as an idea here's what flickr does to solve the "figure out what tag to use" issue:
2007-04-21 12:23:12 You could have the "metadata" entered directly on the page in some sort of metadata block. That block could have a bunch of display options, including "hide". This way the user would be able to use [[get(address)]] if they needed it and not be redundant. In the Confluence wiki system, the metadata "block" is simply a macro with a body:
[[metadata(hide)]]
name = Joe
phone = (555)555-5555
address = 123 Main Street
[[metadata]]
—StephenDay
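Here is a sketch of how such a block could be pulled out of page text. It assumes one `name = value` pair per line and that the block closes with a bare `[[metadata]]`. The regex and function name are illustrative only.

```python
import re

# [[metadata(options)]] ... [[metadata]]; the options part is optional.
BLOCK = re.compile(r"\[\[metadata(?:\(([^)]*)\))?\]\](.*?)\[\[metadata\]\]", re.DOTALL)


def parse_metadata_blocks(page_text):
    """Return [(options, {name: value}), ...] for each metadata block."""
    blocks = []
    for options, body in BLOCK.findall(page_text):
        fields = {}
        for line in body.splitlines():
            if "=" in line:
                name, value = line.split("=", 1)  # values may contain "="
                fields[name.strip()] = value.strip()
        blocks.append((options, fields))
    return blocks
```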
2007-04-21 23:27:53 ya, that should work, but I'm worried about how we'll be adding a new tag/name. I guess it'll be OK to just allow new tags and set their type to the default (string). Since every tag/name has a specific type, you'll want a way to define it when adding the tag/name. Also, just a note on why every tag/name should have a specific type: this enables a uniform display of that name. So phone numbers will always be displayed as: +1 (XXX) XXX-XXXX —Sc0ttBeardsley
Using the not-in-page-text form, we can have the "Tag" or "name" field (whatever we end up calling it) cause the second field to change once it's entered (JavaScript magic).
More ideas about using a macro with a body (or some kind of in-page metadata block):
[[metadata(display=False)]]
|| name || Joe || firstName ||
|| phone || (555)555-5555 || phoneNumber ||
|| address || 123 Main Street || streetAddress ||
|| number of pets || 14 || int ||
[[metadata]]
Most users will already know how to make tables. If they don't put in the last column, just use a default, which could be derived from the name in many cases. —StephenDay
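Here is a sketch of parsing the table-style rows. It assumes rows look exactly like `|| name || value || type ||` and falls back to a default when the type column is missing. The `DEFAULT_TYPES` entries (derived from the name, as suggested) are hypothetical, with `string` as the last resort.

```python
# Hypothetical defaults for rows that leave the type column out.
DEFAULT_TYPES = {"name": "firstName", "phone": "phoneNumber", "address": "streetAddress"}


def parse_metadata_table(body):
    """Parse '|| name || value || type ||' rows into (name, value, type)."""
    rows = []
    for line in body.splitlines():
        line = line.strip()
        if not (line.startswith("||") and line.endswith("||")):
            continue  # not a table row
        cells = [c.strip() for c in line[2:-2].split("||")]
        if len(cells) < 2:
            continue  # a row needs at least a name and a value
        name, value = cells[0], cells[1]
        type_ = cells[2] if len(cells) > 2 and cells[2] else DEFAULT_TYPES.get(name, "string")
        rows.append((name, value, type_))
    return rows
```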
2007-04-23 03:06:35 Given the schema here, how would we search for something based on location without getting all of the locations in the database and scanning through them? Will string sorts/indexes be sufficient for our efficiency purposes, given that we keep all values in a consistent format in the DB? In what cases would we run into problems with this approach? This is just an example problem. —PhilipNeustrom
2007-04-23 03:27:14 You bring up a good point Philip. It would be best to store spatial data using spatial extensions to MySQL/Postgres. Do you have any suggestions? Perhaps a separate table for each type? —Sc0ttBeardsley
How do other semantic projects deal with this issue? One possible solution is to not allow sorts of generic types. Instead, we allow sorts of types that have accompanying python extensions that tell us what to do. E.g. dates.py tells us how to sort dates and how to store dates (we create and use a separate table), times.py tells us how to input and query for times, and geo.py tells us how to input and query for latitude/longitude. We would still update and use the tables we have now, but when we're querying against or inputting a type with an associated set of rules we follow those. (We use both tables for input, and use the native table for queries). —PhilipNeustrom
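This small demonstration shows why per-type rules matter for sorting, which bears on the string-sort question above. Numbers stored as text sort incorrectly, while ISO 8601 dates already sort correctly as text. `sort_key` is a hypothetical stand-in for what a `dates.py` or similar module would provide.

```python
from datetime import datetime


def sort_key(type_name, stored):
    """Hypothetical per-type sort key, in the spirit of dates.py/geo.py."""
    if type_name in ("integer", "float"):
        return float(stored)
    if type_name == "datetime":
        return datetime.fromisoformat(stored)
    return stored  # plain strings simply sort as text


pets = ["14", "9", "120"]
print(sorted(pets))                                        # ['120', '14', '9']
print(sorted(pets, key=lambda v: sort_key("integer", v)))  # ['9', '14', '120']

# ISO 8601 dates, by contrast, already sort correctly as plain text.
dates = ["2006-02-01", "1990-01-01", "2001-06-15"]
print(sorted(dates) == sorted(dates, key=lambda v: sort_key("datetime", v)))  # True
```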
I'm not sure about other semantic projects. A lot of projects, including SemanticMediaWiki, use XSD to define their data types (defining and storing are different, though). As far as searching by geo location goes, I've not seen how it's done in the backend. We could use the casting functions to mold the data into what we wanted it to be. Of course, then we could use something like this to calculate the distance between objects.
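Here is one possible stand-in for the distance calculation mentioned above: a sketch using the haversine formula, a standard great-circle distance. It is paired with a latitude/longitude bounding box so that an ordinary index could rule out most rows before any exact distance is computed, which addresses the scanning concern raised earlier. It is plain Python and assumes a spherical Earth. Spatial extensions in MySQL or Postgres would be the more robust route.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance between two (lon, lat) points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def bounding_box(lon, lat, miles):
    """(min_lon, min_lat, max_lon, max_lat) enclosing every point within
    `miles` of (lon, lat). Not valid near the poles or across the 180th
    meridian."""
    r = miles / EARTH_RADIUS_MILES  # angular radius in radians
    dlat = math.degrees(r)
    dlon = math.degrees(math.asin(math.sin(r) / math.cos(math.radians(lat))))
    return lon - dlon, lat - dlat, lon + dlon, lat + dlat


def within(points, lon, lat, miles):
    """Names of points within `miles`: a cheap box test first (which an
    index on lon/lat columns could answer), then the exact distance."""
    min_lon, min_lat, max_lon, max_lat = bounding_box(lon, lat, miles)
    return sorted(
        name for name, plon, plat in points
        if min_lon <= plon <= max_lon and min_lat <= plat <= max_lat
        and haversine_miles(lon, lat, plon, plat) <= miles
    )
```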
2010-02-15 23:39:53 Notes on derived metadata sets...
Instead of single fields start with a base metadata form that consists of things that will need to be tracked about all pages:
-
Page Name — Changing this effects a page rename?
-
on Wiki — to allow moving pages between wikis. Links get rewritten as part of a move and as part of a revert?
Then from this create some metadata that uses this base, but adds groups of appropriate fields. At the first tier it is just separating pages out into general classes, but eventually you'd have (base -> business -> restaurant -> sushi) with each level adding appropriate new fields.
Fields should allow radio buttons, checkboxes, text, numeric, regex-validated, range (for prices), date/time, time matrices (for open hours), and photos. Each field should track whether it is required, and allow for a default value.
In the same way that people can now create templates, allow for the creation of metadata sets. Only admins will be able to remove fields from a set, because doing so would remove that data from all pages that use that metadata set. Alternatively, have a central library of metadata that any wiki can subscribe to.
It becomes very easy to produce reports off the information. (base -> business -> contractor -> painting) for instance would allow any page using that set to see at a glance all of the contractors, their license numbers, and their price ranges and ratings automatically. It should report for the local wiki first and then report separately for nearby wikis with an option to view the same information across all wikis. —JasonAller
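The tiered metadata sets can be sketched with plain dataclasses. Each set points at its parent and inherits its fields, then applies defaults and required-field checks when a page's values are filled in. The class names, field kinds, and the contractor example data are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Field:
    name: str
    kind: str               # e.g. "text", "numeric", "range", "time matrix"
    required: bool = False
    default: Any = None


@dataclass
class MetadataSet:
    name: str
    fields: List[Field]
    parent: Optional["MetadataSet"] = None

    def all_fields(self) -> List[Field]:
        """Fields from every ancestor (base first), then this level's own."""
        inherited = self.parent.all_fields() if self.parent is not None else []
        return inherited + self.fields

    def apply(self, values: dict) -> dict:
        """Build a page's full record, filling defaults and rejecting
        missing required fields."""
        record = {}
        for f in self.all_fields():
            value = values.get(f.name, f.default)
            if f.required and value in (None, ""):
                raise ValueError(f"{self.name}: '{f.name}' is required")
            record[f.name] = value
        return record


# A hypothetical chain: base -> business -> contractor -> painting
base = MetadataSet("base", [Field("page name", "text", required=True),
                            Field("on wiki", "text", required=True)])
business = MetadataSet("business", [Field("phone", "text"),
                                    Field("hours", "time matrix")], base)
contractor = MetadataSet("contractor", [Field("license number", "text", required=True),
                                        Field("price range", "range")], business)
painting = MetadataSet("painting", [], contractor)


def report(pages, mset, columns):
    """One row per page, e.g. all painting contractors at a glance."""
    return [tuple(mset.apply(values)[c] for c in columns) for values in pages]
```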