Sunday, April 3, 2011

Best approach for a table with lots of responsibilties

I'm currently working on creating electronic version of various request forms. Each form will more than likely require some specific data about the user. For sake of argument, here's a simplified version.

      Form                                 Required Info
 ¯¯¯¯¯¯¯¯¯¯¯¯¯¯              ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯             
 Mileage Claim               Does user have a lease car?
 Overtime Claim              Contract type (permanent/temp). Salary
 Procurement Request         User role (supervisor, technician)

I currently don't know how many forms in total will end up on this system, or what other extra information about the user they may need. But it's safe to assume they'll all require standard stuff like name, email, payroll number etc.

What approach would you take to model this? Throw it all into the users table or create lots of tables that reference User?

From stackoverflow
  • Throwing it all into the users table is a recipe for a growing "Ball of Mud" pattern. Eventually you'll have a table with 300 fields - first name, last name, address, work address, summer address, ownsaleasecar, likespizza, ....

    I'd have them all share a user object. The user would reflect what they all held in common. If they used different kinds of users (e.g. some where people who had personal information, and some were, say, employees with entirely different datasets) it might be reasonable to have users and employees, or some such. The main point is that I wouldn't try to shoehorn different types into the same table.

    edit- an additional point is that glomming different datatypes together makes enforcing integrity impossible - suppose you have "RentalCarUser" and "EmployeeUser" grouped into the same table, and you have the field "DoesUserHaveLeaseCar" for the RentalCarUsers. Well, then, it's going to be null or have a meaningless default for the Employees, and if you actually want to enforce that every rental car user has to have that information, you can't enforce it on the database level (field !=null) because you've got additional users for whom that value isn't applicable. Adding a trigger to fill in "NA" for the Employees doesn't help, because you've then got a whole bunch of records with "NA" and you can't easily tell if that's correct or missing data.

  • you got your 3 tables.

    users_tbl
    ---------
    id
    name
    ...
    
    
    form_tbl
    --------
    id
    name
    ...
    
    
    formRequirement_tbl
    -------------------
    form_requirement_id
    form_id
    and/or Flag
    

    Now you got your super-complicated logic.

    bool CheckRequirements($formid, $userid)
    {
        $arrayOfFormRequirementIds = goGetFormRequirementIds($formid);
        $result = false;
        foreach($requirementId,$andOrFlag in $arrayOfFormRequirementIds)
        {
           switch($requirementId)
           {
              case 1:
                   $sql = "Select 1 from leases where userid = $userid and active = 1";
                   $result |= executeSQLQueryScalar($sql);
                   if($andOrFlag == AND && !$result) 
                        return false;
                   break;
              ...
            }
         }
      return $result;
    }
    

    Now is the entire thing being driven off a giant switch statement? Yes. But by doing so, you can make the logic about what forms the user can fill out extremely complicated:

    • User cannot have a null phone number
    • User must have an active car lease (which could be difficult to figure out)
    • User must be between 20 and 50 years old, have an active e-mail, and live in Montana, unless he's over 60 and lives in Florida

    If you simply stored column names to check against nulls, you wouldn't be able to do the complicated logic. If you stored single bits in the user table like "canFillOutMarriageForm" you would have to wait until the job that updates those is run before the user can go fill out the marriageForm.

    Gary Willoughby : But by doing so, you can make the logic extremely complicated? Is that an anti-pattern?
    Tom Ritter : No, that comment refers to the form the user may or may not fill out: business logic. I updated it to be clear(er).
  • I would design the database in a modular fashion using the User's name (if uniqueness is enforced) or a User ID as a key to reference all the related information. You will do it better and learn more from modularizing the information into several tables.

  • It seems like you are not familiar with database normalization, so I suggest reading some tutorials on that. Here is but one example: http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

    Where you want to start is with the paper forms that you are being asked to model in the database. Each field on the form becomes a field in your (unnormalized) database. Since you are dealing with multiple forms, you will probably have some fields that are in common. That is a good clue that you should abstract those into a separate table. For example, it sounds like all three forms will have a place for the employee to write their name to identify who is filling out the form, so you'll need an Employee table.

    From there, just follow one of the many normalization tutorials on the Internet. Third normal form is as far as most people go, but even second normal form are improvements. You may often find that your design will already be in first normal form.

0 comments:

Post a Comment