Database Development - Sycamore Wiki

Tables in alphabetical order

all_pages
current_pages
events
files
links
mapcategorydefinitions & mappointcategories
map_points
metadata
old_files
oldimageinfo
oldmappointcategories
oldmappoints
pagedependencies
thumbnails
user_favorites
users
user_sessions

Goals for first pass

create short, meaningless primary keys see my note :) —philip
identify foreign keys

A big issue w/ foreign keys: users id. We're packing "anon:ipaddr" into the userEdited/etc fields in some places and using that, so we'll need to separate this out so that we can use users' ids as a foreign key. Proposed solution: change sycamore code so that when there's no logged user id for a page revision we assume it's anon and look in the userIP field.

identify where more research in the code or database manual is needed with ** notation
standardize types to work with pg and mysql (for now) (btw does pg have a timestamp? yes it does, but there are issues with using the db-native time fields and dealing with different time zones and time conversions. we should stick with the unix time for now.) **'
identify hidden 1:many and many:many relations, and create intermediate tables to handle those relations if necessary
identify duplicate data and move to one single location
work on existing database - no new features
standardize names to lowercase with _ as separator naming convention - current db is some of this and some mixedCase. (note, not all changes of this type are listed in this document but will be in the new schema).

I am not convinced (i've been going back and forth) of the idea that we should create short and meaningless primary keys (auto increment) when we have meaningful natural primary keys and no visible performance issues related to using them. The rationale was that using integer primary keys will speed up joins, but I am unsure that this is an issue for our dataset (or if it is even true? i am not sure i can say what the db is really doing in these cases). Before I'd feel comfortable creating meaningless primary keys and dealing with the cruft associated with using them I'd want to see what the actual gain would be — meaning, a query that is using such natural keys and a query that's not and the speed difference (or the query plan associated with it that shows a clear difference¹). If there was a real sizable performance difference and it was worth the cruft then we'd certainly ought to go with it! I just don't want us to over-engineer something that may not need it.

Also, consider the fact that using auto incremented primary keys ends up being a scale issue on the other end of things — once you have more than one machine working together on the database you have to maintain these auto-incremented keys and can't partition your dataset without doing voodoo. So, I'd want this to fix performance issues we're seeing in the short term. It just seems like, looking at it, that it will end up making the whole database much more complex if we do use these meaningless auto-incremented fields. —PhilipNeustrom

Tables in alphabetical order

all_pages

add:

id bigint

change:

name -> curpages_id bigint

useredited -> users_id bigint

comment -> varchar(255) (might as well give user more space)

edittime -> edit_time timestamp

edittype -> edit_type enum (COMMENT_MACRO, DELETE, RENAME, SAVE, SAVENEW, SAVE/REVERT)

remove:

propercased_name - does this need to be in this table? does it change? **

keys:

pk: id

fk id -> curpages.id

fk users_id -> users.id

current_pages

add:

id bigint

change:

cachedtext -> cached_text why is it bytea? **

edittime -> edit_time timestamp

cachedtime -> cached_time timestamp

useredited -> users_id bigint

keys:

pk: id

fk: users_id -> users.id

note:
propercased_name is the name that the user intended for the page. this will be the single location for that name, since we don't track name revisions (right?) **

We do track this.

we have a cascade issue if a page is deleted - many other tables have page as fk. trigger to point those pages to page 0 "deleted page"? **

pass 2 should add fields to implement cache expiry

events

change:

uid -> id bigint

posted_by -> users_id bigint

keys

pk: id

fk: users_id -> users.id

note:

this needs a redesign contingent on better event functionality. this change is just to make it work for pass 1.

files

add:

id bigint

change:

uploaded_by -> users_id bigint

uploaded_time -> timestamp

keys:

pk: id

fk: uploaded_by -> users.id

note:

file to page is 1:many. Find out if the attached_to_pagename stuff is used. If so, we need to create an intermediate file/page table. **

imagecaptions/imageinfo

This needs a redesign. This should be a 1:many table relating one file to the many pages where it may be displayed (if that relationship is even necessary or used). All the info can be combined. Research image usage first before combining. **

Files can only be displayed on a single page.

links

add:

id bigint

change:

source_pagename -> source_pageid bigint

destination_pagename -> destination_pageid bigint

remove:

pagename_propercased

keys:

pk: id

fk: source_pageid -> curpages.id

fk: destination_pageid -> curpages.id

mapcategorydefinitions & mappointcategories

These aren't used - remove. We need to coordinate with Phil to see how he wants to handle legacy Daviswiki maps.

map_points

add:

id bigint (not the id that's there)

change:

pagename -> current_pages_id bigint

x -> lat decimal(7,2)

y -> long decimal(7,2)

created_time -> timestamp

created_by -> users_id bigint

id -> sequence_number (use to track which address this is on the page)

remove:

pagename_propercased

keys:

pk: id

fk: current_pages_id -> current_pages.id

fk: users_id -> users.id

note:
the id is not currently used so the current address is 1:1 with pages. need to update the address macro to update the sequence number and render multiple addresses on a page.

metadata

not used, delete. this looks like it was supposed to be tags once upon a time.

I just deleted all references to metadata (change #674 ) -Scott Beardsley 5/20/07 7:22 PM

old_files

same changes and keys as files plus:

change:

deleted_by -> bigint

keys:

fk: deleted_by -> users.id

oldimageinfo

probably a delete - aren't all the changes in images tracked because a text macro is used to embed them? **

oldmappointcategories

delete - not used

oldmappoints

not used and probably not necessary since changes in address are textual and tracked with the changes to the page. A redesign of the geo-features might want to track changes, but that's out of scope for the first pass.

pagedependencies

this is the dual of links, right? if a links to b, then b depends on a? delete if there's no other use for it **

This is used for included pages. Do not delete this.

thumbnails

need to research image relation more before fixing this. a thumbnail is just another image - might be better to use one image class for all **

user_favorites

add:

id bigint

change:

username -> users_id bigint

page -> current_pages_id bigint

viewtime -> timestamp

fk:

pk: id

users_id -> users.id

current_pages_id -> current_pages_id

note: find out what viewtime is used for **

Tracking the time the page was last viewed so we know if they need to be notified.

users

change:

id -> bigint

last_page_edited -> bigint

last_save -> datestamp

join_date -> datestamp

remove:

language - all null in rocwiki db

css_url - all null in rocwiki db

theme_name - seems to be ignored in rocwiki db **

tz_offset - looks like garbage in rocwiki db **

keys:

pk: id

fk: last_page_edited -> curpages.id

note:

rc_bookmark - doesn't that duplicate the max(date) in userfavorites where user_id = user.id?**

No, this is a different bookmark. This is for setting the "clear observed changes" on the rc page.

user_sessions

change:

user_id -> users_id bigint

expire_time -> timestamp

note:

what does secret do ? **

It's for authentication.

1PostgreSQL is actually rather nice with its query plan display. "EXPLAIN <query> tells you a lot in postgres that you don't see when you look at the MySQL plans