Goals for first pass
-
create short, meaningless primary keys see my note :) —philip
-
identify foreign keys
-
A big issue w/ foreign keys: users id. We're packing "anon:ipaddr" into the userEdited/etc fields in some places and using that, so we'll need to separate this out so that we can use users' ids as a foreign key. Proposed solution: change sycamore code so that when there's no logged user id for a page revision we assume it's anon and look in the userIP field.
-
identify where more research in the code or database manual is needed with ** notation
-
standardize types to work with pg and mysql (for now) (btw does pg have a timestamp? yes it does, but there are issues with using the db-native time fields and dealing with different time zones and time conversions. we should stick with the unix time for now.) **'
-
identify hidden 1:many and many:many relations, and create intermediate tables to handle those relations if necessary
-
identify duplicate data and move to one single location
-
work on existing database - no new features
-
standardize names to lowercase with _ as separator naming convention - current db is some of this and some mixedCase. (note, not all changes of this type are listed in this document but will be in the new schema).
I am not convinced (i've been going back and forth) of the idea that we should create short and meaningless primary keys (auto increment) when we have meaningful natural primary keys and no visible performance issues related to using them. The rationale was that using integer primary keys will speed up joins, but I am unsure that this is an issue for our dataset (or if it is even true? i am not sure i can say what the db is really doing in these cases). Before I'd feel comfortable creating meaningless primary keys and dealing with the cruft associated with using them I'd want to see what the actual gain would be — meaning, a query that is using such natural keys and a query that's not and the speed difference (or the query plan associated with it that shows a clear difference1). If there was a real sizable performance difference and it was worth the cruft then we'd certainly ought to go with it! I just don't want us to over-engineer something that may not need it.
Also, consider the fact that using auto incremented primary keys ends up being a scale issue on the other end of things — once you have more than one machine working together on the database you have to maintain these auto-incremented keys and can't partition your dataset without doing voodoo. So, I'd want this to fix performance issues we're seeing in the short term. It just seems like, looking at it, that it will end up making the whole database much more complex if we do use these meaningless auto-incremented fields. —PhilipNeustrom
Tables in alphabetical order
all_pages
add:
id bigint
change:
name -> curpages_id bigint
useredited -> users_id bigint
comment -> varchar(255) (might as well give user more space)
edittime -> edit_time timestamp
edittype -> edit_type enum (COMMENT_MACRO, DELETE, RENAME, SAVE, SAVENEW, SAVE/REVERT)
remove:
propercased_name - does this need to be in this table? does it change? **
keys:
pk: id
fk id -> curpages.id
fk users_id -> users.id
current_pages
add:
id bigint
change:
cachedtext -> cached_text why is it bytea? **
edittime -> edit_time timestamp
cachedtime -> cached_time timestamp
useredited -> users_id bigint
keys:
pk: id
fk: users_id -> users.id
note:
propercased_name is the name that the user intended for the page. this will be the single location for that name, since we don't track name revisions (right?) **
We do track this.
we have a cascade issue if a page is deleted - many other tables have page as fk. trigger to point those pages to page 0 "deleted page"? **
pass 2 should add fields to implement cache expiry
events
change:
uid -> id bigint
posted_by -> users_id bigint
keys
pk: id
fk: users_id -> users.id
note:
this needs a redesign contingent on better event functionality. this change is just to make it work for pass 1.
files
add:
id bigint
change:
uploaded_by -> users_id bigint
uploaded_time -> timestamp
keys:
pk: id
fk: uploaded_by -> users.id
note:
file to page is 1:many. Find out if the attached_to_pagename stuff is used. If so, we need to create an intermediate file/page table. **
imagecaptions/imageinfo
This needs a redesign. This should be a 1:many table relating one file to the many pages where it may be displayed (if that relationship is even necessary or used). All the info can be combined. Research image usage first before combining. **
Files can only be displayed on a single page.
links
add:
id bigint
change:
source_pagename -> source_pageid bigint
destination_pagename -> destination_pageid bigint
remove:
pagename_propercased
keys:
pk: id
fk: source_pageid -> curpages.id
fk: destination_pageid -> curpages.id
mapcategorydefinitions & mappointcategories
These aren't used - remove. We need to coordinate with Phil to see how he wants to handle legacy Daviswiki maps.
map_points
add:
id bigint (not the id that's there)
change:
pagename -> current_pages_id bigint
x -> lat decimal(7,2)
y -> long decimal(7,2)
created_time -> timestamp
created_by -> users_id bigint
id -> sequence_number (use to track which address this is on the page)
remove:
pagename_propercased
keys:
pk: id
fk: current_pages_id -> current_pages.id
fk: users_id -> users.id
note:
the id is not currently used so the current address is 1:1 with pages. need to update the address macro to update the sequence number and render multiple addresses on a page.
metadata
not used, delete. this looks like it was supposed to be tags once upon a time.
I just deleted all references to metadata (change #674 ) -Scott Beardsley 5/20/07 7:22 PM
old_files
same changes and keys as files plus:
change:
deleted_by -> bigint
keys:
fk: deleted_by -> users.id
oldimageinfo
probably a delete - aren't all the changes in images tracked because a text macro is used to embed them? **
oldmappointcategories
delete - not used
oldmappoints
not used and probably not necessary since changes in address are textual and tracked with the changes to the page. A redesign of the geo-features might want to track changes, but that's out of scope for the first pass.
pagedependencies
this is the dual of links, right? if a links to b, then b depends on a? delete if there's no other use for it **
This is used for included pages. Do not delete this.
thumbnails
need to research image relation more before fixing this. a thumbnail is just another image - might be better to use one image class for all **
user_favorites
add:
id bigint
change:
username -> users_id bigint
page -> current_pages_id bigint
viewtime -> timestamp
fk:
pk: id
users_id -> users.id
current_pages_id -> current_pages_id
note: find out what viewtime is used for **
Tracking the time the page was last viewed so we know if they need to be notified.
users
change:
id -> bigint
last_page_edited -> bigint
last_save -> datestamp
join_date -> datestamp
remove:
language - all null in rocwiki db
css_url - all null in rocwiki db
theme_name - seems to be ignored in rocwiki db **
tz_offset - looks like garbage in rocwiki db **
keys:
pk: id
fk: last_page_edited -> curpages.id
note:
rc_bookmark - doesn't that duplicate the max(date) in userfavorites where user_id = user.id?**
No, this is a different bookmark. This is for setting the "clear observed changes" on the rc page.
user_sessions
change:
user_id -> users_id bigint
expire_time -> timestamp
note:
what does secret do ? **
It's for authentication.
- 1PostgreSQL is actually rather nice with its query plan display. "EXPLAIN <query> tells you a lot in postgres that you don't see when you look at the MySQL plans