Friday, March 9, 2012

OO Database Design performance question

I am hoping someone has had a similar table design scenario and has a slick solution that i am not thinking of.

Basically I have the following tables (kind of OO based design in this case).

Create Table Entity (EntityID int, EntityTypeID)

Create Table Company (EntityID, CompanyName)

Create Table Contact (EntityID, FirstName, LastName)

Create Table Group (EntityID, GroupName)

Basically an Entity can be a Company, a Contact or a Group based on the EntityTypeID

The trick becomes when I get an EntityID is how to get the name of the Entity in an efficient manner.

You can make a view that does a Union to all three tables like such...

select CompanyName as Name from Company

union all

Select FirstName + LastName as Name from Contact

union all

Select GroupName as Name from Group

Perfomance wise this stinks. I have also tried a left join to all the tables and that performs better, but still not great when you are joining a bunch of records (such as Orders back to get the name of the Entity on the order).

I was thinking of an indexed view but it doesn't let you do a left join or a union. A trigger also comes to mind but I have other triggers on those tables and I really don't want to create any more. I could also add all the different name columns to the entity table but then I end up with a bunch of null values.

Any creative ideas on how to create a function, view, indexed view, etc that will make this efficient?

Thanks!

OO designs often fail to translate well to a high-performance transactional database. Subtypes are especially troublesome in this regard.

The problem here is one of abstraction. The more you abstract something, the more it becomes difficult to query in a practical way. Some package software takes this to the extreme with the concept of the "thing" table. works great on the whiteboard but kills performance as tables get large.

A function is probably going to be your best option. Write a T-SQL function that chooses the table for the lookup based on the type column. You can either pass in the type as a parameter or look it up in the function.

something like:

if @.type = 1 then
select @.retval = companyname from Company

if @.type = 2
select @.retval = firstname+lastname from Contact

if @.type = 3
select @.retval = groupname from group

|||

The key here is that if you are going to use subtypes, (and I am a big fan) you must keep everything that is common between them in the same table, otherwise you are just asking for trouble. The problem here is that you have to be careful with subtypes not to overdo them, and in ths case I think you may have done that. In SQL, if you are going to do a subtype, the supertype really needs be compellingly similar and have a similar purpose.

For example, a vehicle type might be proper for an auto shop, because all vehicles are tracked the same, then each subtype has some differences. Or you might have a patient supertype (records, names bills, etc) and then subtypes for male patients and female ones.

In your case, the subtypes have nothing to do with one another so there is no point and all you are doing is forcing this OO design into a database. You could fix your performance problems by adding a Name column to the Entity table and update it using a trigger on each of the supertypes, but unless there really is some compelling similarities that merit a supertype, you should rethink your design.

|||Take a look at the AdventureWorks sample database. IMHO, it has a very good normalized representaion of Contacts/People/Customers, etc.|||I agree that a function will work however when it comes to searching based on name the function is going to scan which obviously wouldn't be good.|||

i see your point however I argue that it does need an OO design. For example a Customer that orders can be a contact or a company. This really is only an issue when dealing with searching. A contact for name has a Salutation, FirstName, LastName, MiddleName, Suffix while a company might have CompanyName and DBAName. Both are customers however their "name" structure is entirely different. Would you then put all of those fields on the Entity table and leave some null depending on the EntityTypeID? I can see going with a Name1 and Name2 column on the Entity table and those fields have different meanings based on the EntityTypeID. Thoughts?

|||

It is not really an OO design, it is a common database thing. In your case, if you do mean a customer, which can be a person or a company, then yes, this is a valid thing to do. However, I am not so sure I wouldn't make the user choose the subtype before doing a search, since it is pretty unlikely that they wouldn't know if they were looking for one or the other.

I might also do the search in each table seperate from one another, perhaps in different connections from your UI. Either way I would certainly not create generic columns for names, as that would make the design harder to use, and I wouldn't put the names in the customer table with nulls where values don't quite work because then the searching issues are exactly the same (perhaps worse) than with two tables.

Otherwise, like I said before, I would add a name column to the customer table that was managed by a trigger on the child tables. So I would populate the customer.name column with person.firstName + ' ' + person.lastName (or something like it) for persons, and company.name for the companies, if you really need to search both in a single statement.

|||

Stark77 wrote:

I agree that a function will work however when it comes to searching based on name the function is going to scan which obviously wouldn't be good.

I don't think SQL Server has function-based indexes like Oracle, however you might be able to create an indexed view that includes the function as a column.

if you schemabind your function and your view (the view will have to be pretty simple) then this may allow you to create an indexed view, which is similar to a materialized view in Oracle, and that may help your performance.

worth a try if you haven't thought of this option.

Another option would be to use a trigger to denormalize the correct name into the parent table.

No comments:

Post a Comment