Fun SQL Tricks III

I’ve been spending a bit of time detailing my recent adventures with MariaDB for a project. You can see those articles here: part 1, part 2. I will assume you have read or at least perused those articles. This time, on to that other requirement: audit logging.

What to Store

Before audit logging can even be implemented, I need to define what information I need to store, and which component of the system is responsible for storing it. That turns out to be less obvious than it would seem. Based on my tinkering around with this project and decades of experience operating servers, I have determined the following:

  • What. As much information as can reasonably be stored and which might reasonably be useful should be. The key here is the word “reasonable”. That is very situational and can be limited by everything from the available storage space to whether the information is available in the first place. Since this is a greenfield project, however, I can pretty much define whatever I want here. That leaves the practical questions: whether it is practical to store a given piece of information, and whether it is worth storing in the first place. In this case, I can also disregard storage requirements because the project has plenty available.
  • Who. Information should be stored by whichever aspect of the system is responsible for managing the underlying objects. So if the canonical storage location for user information is the database server, then the database server should be responsible for generating audit logs for it. In cases where the thing to be recorded is the actions of a process, that is better done by the process itself, though where it touches objects owned by another part of the system, that owner should do its own logging.
  • Where. The logs should be stored in a location that is readily accessible to the part of the system doing the storing and that has as few dependencies as possible. For the database server, that is probably a database table. For application logic on the web server, that is probably a log file on the server. Naturally, these logs should be replicated to another location on some appropriate schedule, but the generation of the logs in the first place needs to be as reliable as possible with as few external dependencies as possible.

Okay, so what does that mean for this project? Well, I’m concerned with user records in the users table here. That is clearly owned by the database server, so I should be recording the audit information there. And that is what I am going to do.

Storage Structure

After much tinkering and iterating on the scheme, I landed on an arrangement that uses a central audit_meta table that records various details about a particular event. This is combined with a sort of shadow table for each table that needs audit logs. These audit tables will have an identical structure to the main table except that one additional column will be included – the timestamp at which the entry was created. In the case of the users table, the audit table is called, creatively, audit_users. Here are the two table definitions:

CREATE TABLE IF NOT EXISTS audit_meta (
    audit_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    object_id VARCHAR(50) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
    audit_type ENUM('insert', 'update', 'delete', 'unknown') NOT NULL DEFAULT 'unknown',
    audit_time DATETIME(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
    audit_dbuser VARCHAR(250) NOT NULL DEFAULT CURRENT_USER(),
    audit_lwasuser VARCHAR(50) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '<system>',
    PRIMARY KEY (audit_id),
    KEY object_id (object_id),
    FOREIGN KEY (audit_lwasuser) REFERENCES object_ids(object_id),
    FOREIGN KEY (object_id) REFERENCES object_ids(object_id)
) CHARSET=utf8mb4;
CREATE TABLE IF NOT EXISTS audit_users (
    user_id VARCHAR(50) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
    user_login VARCHAR(250) NOT NULL,
    user_password VARCHAR(250) NOT NULL,
    user_name VARCHAR(250) NOT NULL,
    user_status ENUM('active', 'inactive') NOT NULL DEFAULT 'active',
    audit_timestamp DATETIME(6) NOT NULL DEFAULT current_timestamp(6),
    PRIMARY KEY (user_id, audit_timestamp),
    KEY user_id (user_id),
    KEY audit_timestamp (audit_timestamp),
    FOREIGN KEY (user_id) REFERENCES object_ids(object_id),
    FOREIGN KEY (user_id) REFERENCES users(user_id)
) CHARSET=utf8mb4;

You can see the audit_meta table has a few obvious columns, including the timestamp of the event, the ID of the object the event is related to, and the type of the event. The ID column is there to give a guaranteed unique key and is using BIGINT just in case the logging gets out of hand. The audit_dbuser column will be the user connected to the database server. The audit_lwasuser column is intended to be the actual system user who triggered the event. Where that is unknown, <system> will be used. Note that both the object_id and audit_lwasuser columns must reference a valid key in the object_ids table, which means “<system>” must be a recorded object ID. That entry will be created with the installation of the system. The real question is where the value of audit_lwasuser comes from. Unfortunately, this requires whatever connects to the database server to set a variable on the connection to define it, but the mechanics of that are beyond the scope of this particular discussion.
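For the sake of illustration, the connecting side would do something roughly like this right after connecting, using the @lwas_user variable that the logging function shown later reads. The ID values here are made up:

-- Hypothetical object ID: in practice this is the object_ids entry for the
-- system user who is actually performing the action.
SET @lwas_user = '0123456789abcdef';
-- Any later modification of users on this connection is then attributed to that ID.
UPDATE users SET user_name = 'New Name' WHERE user_id = 'fedcba9876543210';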

There isn’t really anything in particular to see about the audit_users table except that the user_id column is defined with a foreign key constraint on both the object_ids and users tables. That ensures that the user_id always references an actual user and not an ID associated with something else. It also serves as an additional lock on entries in the users table giving assurance that those entries cannot be deleted.
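To make the relationship between the two tables concrete, here is the sort of query one might use to pull the full audit history for a single user, joining on the object ID and the shared timestamp. This is just a sketch and not part of the schema; the ID is a placeholder:

SELECT m.audit_time, m.audit_type, m.audit_dbuser, m.audit_lwasuser,
       a.user_login, a.user_name, a.user_status
    FROM audit_meta m
    JOIN audit_users a
        ON a.user_id = m.object_id
        AND a.audit_timestamp = m.audit_time
    WHERE m.object_id = '0123456789abcdef'
    ORDER BY m.audit_time;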

Doing the Logging

Now, on to doing the actual audit logging. Since deleting users isn’t allowed, I just need to concern myself with inserts and updates. I’m sure you’ve seen where this is going. To the triggers!

First, for updates, I used the “after update” event and it looks like this:

DELIMITER //
CREATE OR REPLACE TRIGGER user_update2
    AFTER UPDATE ON users FOR EACH ROW
    BEGIN
        DECLARE dt DATETIME(6);
        SET dt = log_event('update', NEW.user_id);
        INSERT INTO audit_users SELECT *, dt FROM users WHERE user_id = NEW.user_id;
    END;//
DELIMITER ;

I’ll get to that log_event function in a bit. It adds the entry to the audit_meta table and returns the timestamp for that entry. That is actually important because matching an audit_meta entry with an audit_users entry requires both the user_id and the timestamp, which means that timestamp absolutely has to match. Then the INSERT statement simply copies the row from the users table, with the extra timestamp column, into the audit_users table. Note that this isn’t necessarily the best way to do this since it requires that the columns are all in the correct order and none are missing. However, the other way would require enumerating all the columns and explicitly setting them to the value from the new row.
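For reference, that explicit version, sitting in the same place inside the trigger body where NEW and dt are in scope, would look roughly like this. It is a sketch only; the SELECT * form above is what I am actually using:

-- Explicit-column variant of the audit copy, avoiding dependence on column order.
INSERT INTO audit_users
    (user_id, user_login, user_password, user_name, user_status, audit_timestamp)
    VALUES (NEW.user_id, NEW.user_login, NEW.user_password, NEW.user_name, NEW.user_status, dt);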

There is one other aspect of this which is worth pointing out. This will yield an audit entry for every update, even if no changes occur. There isn’t a nice way to check whether the NEW and OLD rows differ under MariaDB currently, meaning it requires comparing every column with proper handling of NULLs. I may add that to the project at some point. For now, it is left as an exercise for the interested reader.
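As a starting point for that exercise, an untested sketch using MariaDB’s NULL-safe equality operator <=> could wrap the logging inside the update trigger like this:

-- Only log when at least one column actually changed (NULL-safe comparison).
IF NOT (OLD.user_id <=> NEW.user_id
        AND OLD.user_login <=> NEW.user_login
        AND OLD.user_password <=> NEW.user_password
        AND OLD.user_name <=> NEW.user_name
        AND OLD.user_status <=> NEW.user_status) THEN
    SET dt = log_event('update', NEW.user_id);
    INSERT INTO audit_users SELECT *, dt FROM users WHERE user_id = NEW.user_id;
END IF;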

Now, for insertions, I needed to modify the original trigger on the “after insert” event. Here is the new insertion trigger:

DELIMITER //
CREATE OR REPLACE TRIGGER user_create2
    AFTER INSERT ON users FOR EACH ROW
    BEGIN
        DECLARE c INT;
        DECLARE dt DATETIME(6);
        IF NEW.user_status = 'active' THEN
            SELECT COUNT(user_id) FROM users WHERE
                user_login = NEW.user_login
                AND user_id <> NEW.user_id
                AND user_status = 'active'
                INTO c;
            IF c > 0 THEN
                SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Attempt to add active user login name that already exists';
            END IF;
        END IF;
        SET dt = log_event('insert', NEW.user_id);
        INSERT INTO audit_users SELECT *, dt FROM users WHERE user_id = NEW.user_id;
    END;//
DELIMITER ;

It should be fairly obvious what was added here since it looks basically identical to the update trigger. Unlike for the update case, there will only ever be one entry for insertion since you cannot insert the same row twice.

That sorts out the audit_users table handling. Now for the log_event function, which handles the audit_meta table:

DELIMITER //
CREATE OR REPLACE FUNCTION log_event(
        etype VARCHAR(25) CHARACTER SET ascii COLLATE ascii_bin,
        oid VARCHAR(50) CHARACTER SET ascii COLLATE ascii_bin
    ) RETURNS DATETIME(6)
    NOT DETERMINISTIC MODIFIES SQL DATA SQL SECURITY INVOKER
    BEGIN
        DECLARE dt DATETIME(6);
        DECLARE un VARCHAR(250);
        SET un = IF(ISNULL(@lwas_user), '<system>', @lwas_user);
        SET dt = CURRENT_TIMESTAMP(6);
        INSERT INTO audit_meta (audit_type, object_id, audit_time, audit_lwasuser) VALUES (etype, oid, dt, un);
        RETURN dt;
    END;//
DELIMITER ;

This function takes an event type and an object ID. It first obtains the actual system user, if it has been set. Note that it is stored in the @lwas_user variable for the connection, which will be NULL if it has not been set. Then it gets the current timestamp with microseconds. Once it has both of those, it inserts the relevant information into the meta table and then returns the timestamp.

It’s worth observing that if @lwas_user does not contain a valid object ID, or if the type or object ID are invalid, this will raise a constraint violation which will cause the entire transaction to roll back. In this particular project, that feature is desirable, but it may not be for yours.

It seems sensible that a stored routine might be used to set the @lwas_user variable in a manner that checks it against the users table and raises an error if it doesn’t match.
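A minimal sketch of such a routine might look like the following; the procedure name is my own invention and none of this is part of the schema above:

DELIMITER //
CREATE OR REPLACE PROCEDURE set_lwas_user(
        uid VARCHAR(50) CHARACTER SET ascii COLLATE ascii_bin
    )
    READS SQL DATA SQL SECURITY INVOKER
    BEGIN
        DECLARE c INT;
        SELECT COUNT(user_id) FROM users WHERE user_id = uid INTO c;
        IF c = 0 THEN
            SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Attempt to set audit user to an unknown user ID';
        END IF;
        SET @lwas_user = uid;
    END;//
DELIMITER ;

The connecting application would then CALL this right after connecting instead of setting the variable directly.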

Conclusion

All of the bits above and in the previous two parts have been tested on MariaDB 10.6 and work as expected. I am not, however, under any illusions that this is a good way to do this, nor that this implementation is particularly good. Take that as a “buyer beware” caution. I will be using this structure, or something evolved from it, in my particular project, but you probably want something different.

That said, I thought I would share this in the event that anyone finds it useful.

Finally, before you post comments saying I should use this or that SQL construction instead, and I can hear you typing furiously already, make sure your suggestions are valid for MariaDB because that is the database management system I am using. I am aware that certain other DBMS have constructions that can accomplish some of this better. Feel free to share things that could be done better on another DBMS, but make sure to indicate that to avoid causing confusion.

Anyway, that’s all for this little adventure for the moment. I may share more of the development process for this project down the road.
