Fun SQL Tricks

So I recently started a new project that has a few requirements that are a little more complex than a simple web site. Exactly why these requirements are part of the project doesn’t really matter, nor does the specific nature of the project. However, along the way to implementing it, I learned a few things that will probably make the overall coding for the project simpler. These include the use of SQL triggers, stored routines, and foreign key constraints. Read on for a discussion of the first level of the fun.

Obviously, before I can start with the details, I need to enumerate the specific requirements that I had for this particular portion of the project. These requirements are:

  • User accounts with authentication that can be activated and deactivated, with unique login IDs for active accounts but multiple inactive accounts can have the same login IDs as any other active or inactive user accounts.
  • Unique object identifiers for all first class objects in the system. A first class object is an object that might be referenced generically by other parts of the system. User accounts are first class objects for reaons that will be clear later. Object ID allocation should happen automatically similar to how a MySQL AUTO_INCREMENT primary key would.
  • A reasonable audit trail must be kept for all changes to the system. That includes modifications to user accounts. It would be ideal if most of this audit trail could be maintained automatically without having to implement code everywhere to record the audit records for everything.
  • Once created, no first class object can ever be deleted from the record since it may be referenced by anything else in the system.

The other critical aspect of the process is which technologies I will be using to build the project. In this case, I am using PHP 8.1+ and MariaDB 10.6+, the pluses meaning later versions are acceptable but those are the versions I currently have installed on my development system. For today’s discussion, however, the PHP aspect is not relevant.

Unique Object IDs

This is the first requirement which proves to be somewhat interesting. While I could simply create a generic key/value scheme on top of the relational database model, that is far from ideal. Indeed, it prevents using most of the useful features modern RDBMS (Relational DataBase Management Systems) provide. Instead, I will maintain the usual separate tables with their own unique structure for each first class object type. Instead, I need a way to generate unique identifiers for each record that are unique across all first class objects along with a way to associate a generic first class object identifier with the type of object it references. For normal operations like table joins and the like, this imposes no extra restrictions since the identifiers will be unique within each table. Instead, I am imposing an extra level of uniqueness on the identifiers.

Obviously, I could use a sequence to generate the unique identifiers, then make an entry in an object type reference table and then create the relevant object in the object specific table. That, however, requires three queries for every insert, and it requires the application code to know how to manage the object identifiers. I also wish to avoid having predictable object identifiers, which a monotonically increasing sequence would create, but that isn’t a strict requirement. After thinking about it for a while, I decided I would use a table called “object_as follows:

>DESCRIBE object_ids;
+----------------+-------------+------+-----+----------------------+-------+
| Field          | Type        | Null | Key | Default              | Extra |
+----------------+-------------+------+-----+----------------------+-------+
| object_id      | varchar(50) | NO   | PRI | NULL                 |       |
| object_type    | varchar(25) | NO   |     | UNKNOWN              |       |
| object_created | datetime(6) | NO   |     | current_timestamp(6) |       |
+----------------+-------------+------+-----+----------------------+-------+

As you can see, I’m using an object ID that can be up to 50 characters long which is the primary key . Another column allows for an object type name of up to 25 characters. And, for audit purposes, I’m recording the timestamp of when the object ID is created.

But that leaves the problem of actually generating an object ID. This is where a stored function comes into play:

DELIMITER //
CREATE OR REPLACE FUNCTION lwas_object_id(otype VARCHAR(25))
    RETURNS VARCHAR(255)
    NOT DETERMINISTIC
    MODIFIES SQL DATA
    SQL SECURITY INVOKER
    BEGIN
        DECLARE id VARCHAR(50) CHARACTER SET binary DEFAULT NULL;
        DECLARE CONTINUE HANDLER FOR SQLSTATE '23000' SET id = NULL;
        WHILE id IS NULL DO
            SET id = '';
            FOR i IN 1 .. 16 DO
                SET id = CONCAT(id, CHAR(FLOOR(RAND()*256)));
            END FOR;
            SET id = REPLACE(REPLACE(REPLACE(TO_BASE64(id), '/', ''), '+', ''), '=', '');
            INSERT INTO object_ids (object_id, object_type) VALUES (id, otype);
        END WHILE;
        RETURN id;
    END;//
DELIMITER ;

This function takes an object type name as a parameter and generates a sequence of 16 random bytes, encodes it as base 64, and removes the plus sign, slash, and equals sign from the result. This gives a mixed case alphanumeric result around 20 characters long. There’s no need to be able to reverse the base 64 encoding since this is just an opaque ID. Once the value is generated, it is inserted into the object_ids table. If there is a duplicate key collison, the generation is repeated. Once it succeeds, the generated ID is returned.

The astute among you will have realized that this is a case sensitive identifier and that may cause some excitement. Obviously, the duplicate key check would cover that, but why artificial limit the identifier space? Observe that DECLARE statement. Note that “CHARACTER SET binary” part? That makes it a binary string which a case sensitive collation. Now let’s take a closer look at the definition of the object_ids table:

CREATE TABLE `object_ids` (
  `object_id` varchar(50) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
  `object_type` varchar(25) NOT NULL DEFAULT 'UNKNOWN',
  `object_created` datetime(6) NOT NULL DEFAULT current_timestamp(6),
  PRIMARY KEY (`object_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Note that the object_id column is declared with the “ascii” character set with the “ascii_bin” collation. Why not binary? Well, I wanted to restrict IDs to the 7 bit ASCII range. The “ascii_bin” collation gives the case sensitive comparision result desired for the object IDs.

Using the IDs

Okay. Now we have a way to get unique IDs, how do we use them? Well, the naïve solution would be to simply reference the above function in the INSERT statement when creating a user. So let’s look at the users table and what such an INSERT statement would look like:

>DESCRIBE users;
+---------------+---------------------------+------+-----+---------+-------+
| Field         | Type                      | Null | Key | Default | Extra |
+---------------+---------------------------+------+-----+---------+-------+
| user_id       | varchar(50)               | NO   | PRI | NULL    |       |
| user_login    | varchar(250)              | NO   | MUL | NULL    |       |
| user_password | varchar(250)              | NO   |     | NULL    |       |
| user_name     | varchar(250)              | NO   |     | NULL    |       |
| user_status   | enum('active','inactive') | NO   |     | active  |       |
+---------------+---------------------------+------+-----+---------+-------+
>INSERT INTO users VALUES (lwas_object_id('user'), 'john', 'password', 'John Smith', 'active') RETURNING user_id;

Observe the RETURNING clause for the INSERT statement. That allows INSERT to return values just inserted into the table. This is particularly useful for obtaining the values of calculated columns like the user_id column in this case.

The problem here, though, is that it requires everything that inserts into the users table to know about the lwas_object_id() function and actually use it. Surely there’s a better way? And it turns out there is. This is where TRIGGERS enter. But just before we do that, let’s see the full definition of users:

CREATE TABLE `users` (
  `user_id` varchar(50) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
  `user_login` varchar(250) NOT NULL,
  `user_password` varchar(250) NOT NULL,
  `user_name` varchar(250) NOT NULL,
  `user_status` enum('active','inactive') NOT NULL DEFAULT 'active',
  PRIMARY KEY (`user_id`),
  KEY `user_login` (`user_login`),
  CONSTRAINT `users_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `object_ids` (`object_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

The foreign key constraint ensures that the user_id will be a validly allocated objectd_id. However, it doesn’t remove the need for users of the table to actually allocate that ID manually. Perhaps we can specify a “DEFAULT” for the user_id column? Alas, that doesn’t work with MariaDB because default values cannot reference stored functions. But we can do it with a “BEFORE INSERT” trigger:

CREATE OR REPLACE TRIGGER user_create1
    BEFORE INSERT ON users FOR EACH ROW
    SET NEW.user_id = lwas_object_id('user');

With that trigger, it doesn’t matter what we specify for the user_id column when inserting into the users table. In all cases, it will be replaced with a generated object ID of type “user”. Basically, the INSERT statement will cause two rows to be inserted. One in the object_ids table and one in the users table.

There is one more detail to cover here. What happens if the object ID is successfully allocated but some other aspect of the insert fails? That is where the “ENGINE=InnoDB” part of the table definitions comes into play. InnoDB is a transactional engine so it means a failure will cause the transaction to roll back. In the default mode where each statement is its own transaction, it will simply be as though the insert never happened with no new object ID allocated. As part of a larger transaction, the whole transaction will roll back, including the object ID allocation.

We may also want to have a “BEFORE UPDATE” trigger that raise errors to prevent messing with the records for allocated object IDs. The DELETE case is covered by having a foreign key constraint on all tables using the object IDs.

Conclusion

So what is the practical upshot of all this? Basically, with a stored function, a trigger, and an extra object_ids table, we can create a system where unique IDs can be generated and used similarly to an AUTO_INCREMENT column in MariaDB without the various users of the regular tables having to know anything about how the object IDs are generated.

Next time, I’ll look at the other requirements for users.

Leave a Reply

Your email address will not be published. Required fields are marked *