August 30, 2016

Automate Deleting Taskflow Log

We use HFM taskflows.  A lot.  When a taskflow runs it generates log information so you can check on the status of the execution and examine any errors to effect remediation.

But like the EPMA logs, these log entries stay around forever until you delete them.  On the Task Flow Status page you see each task flow status on a line with a check box.  To delete a log entry you click the check box to select the entry and click the Delete button.  Child's play.


Since these are check boxes you would think, as a standard GUI practice, you could check multiple boxes and click the delete button to delete multiple log records.  You would be wrong.  You have to select the log records one at a time to delete them.  Isn't that convenient?

So if you don't constantly do maintenance you wind up with a bunch of useless information in your log viewer.  Searches take longer and if you want to delete entries it takes a lot of time to complete.  What's a mother to do?

Analysis

We use the taskflows to monkey with our HFM applications.  Taskflows are found in the Consolidation Administration tab so you might think the information for taskflows is in the HFM table.  But we can also do EPMA related things like run data synchronizations and redploy applications.  So where do we find the tables that manage the taskflows and logs?  In the Shared Services database, of course.

It took me some digging to figure that out, but the first clue was finding a table named [ces_wf_instances].  SELECTing from that tables shows the field process_name with the names of my taskflows.  Other useful fields include starttime, a datetime field that we will use to find all the old log entries, and workflow_id and taskid which are used as references to other tables.  workflow_id is the taskid with the string "wf-" as a prefix.  workflow_id is also part of the primary key so it is unique across the tables.



The nice thing is that taskid values are always in ascending order.  I don't have any way to prove this but I suspect that taskid is based at least in part the datetime of the activity.  This means that when we process the other tables we can define a cutoff date, find the first taskid in the [ces_wf_instances] for that date, then delete anything that has a taskid less than that.  The not so nice thing is that in all the other tables the taskid field is named task_id.



What other tables will we process?  There are 13 tables with the ces_ prefix but some of them are empty and others are unrelated to logging.  The [ces_tasks] table lists tasks within the taskflow, [ces_messages] contains more detailed messages about the task.  Both tables contain a field named objectvalue which is an image blob.  This is a large hex string that gets processed to show its part of the workflow status.  Both of these tables are referenced by task_id which, like in [ces_wf_instances], is sequential, so we can still use the strategy of deleting records less than the target task_id.



There are also the [ces_participant] and [ces_participant_events] tables which have details on the steps in the taskflow.  Like the [ces_tasks] and [ces_messages] tables these tables have the objectvalue image blob.  The [ces_participant] is referenced by workflow_id which matches the field in the [ces_wf_instances] table.  There is also a field named participant_id which contains the workflow_id, the user, and the name of the step in the taskflow.  This participant_id is a foreign key in the [ces_participant_events] log.  Because there is a foreign key constraint between the tables when we delete records from the [ces_participant] table the records with the matching participant_id in the [ces_participant_events] table also get deleted.

Solution

So now that we know all the players, let's draw up a play to get us to the end zone.  We will build a SQL script that defines our cutoff date, find the last taskid associated with that date, then delete records from the other tables where the taskid is less than the one for the target date.

There are different tactics to define the cut off date.  If we are running this process manually maybe we just want to define the date with a static string:
DECLARE @PriorDate DateTime;
SET     @PriorDate = '2015-01-01';
But if we want to create a scheduled process we can calculate the desired cut off date.  For example, the following code will use the DATEADD() function to set the cut off date as 3 months prior to the current date:
DECLARE @PriorDate DateTime;
SET     @PriorDate = DATEADD(M,-3,GetDate());

Now that we have the date, we need to find the target taskid.  The taskid is sorted sequentially so we take the TOP 1 taskid where the starttime is greater than or equal the cut off date.  Note that we use greater than or equal to (>=) rather than just equal to (=) in case there was no taskflow execution on the cut off date.  This gives us a taskid where anything less will be prior to our cutoff date.  Also note that we make the @LastID variable nvarchar(100) which matches the data type used in the [ces_wf_instances] table.
DECLARE @LastID nvarchar(100);
SET     @LastID = (
  SELECT TOP 1 taskid
FROM [ces_wf_instances]
WHERE starttime >= @PriorDate
)

For the [ces_messages] and [ces_tasks] tables we can do a simple delete where task_id is less than our target taskid.  But for the [ces_participant] table we need to get the workflow_id for all of the records prior to our cut off date.  One way to do this is to use the IN clause and build a list of workflow_id from the [ces_wf_instances] table where the starttime is less than the cut off date.
DELETE FROM [ces_participant]
WHERE workflow_id IN (
select workflow_id
from [ces_wf_instances]
where starttime < @PriorDate)

Remember that because of the foreign key constraint deleting records from [ces_participant] will also delete the associated records from the [ces_participant_events] table.  Also note that because we need the [ces_wf_instances] table to do this delete we have to make that the last table from which we delete records.

The entire SQL script is:

-- Cut off date
DECLARE @PriorDate DateTime;
SET     @PriorDate = DATEADD(M,-3,GetDate());
--SET     @PriorDate = '2015-01-01'
PRINT 'Cut off date = '+convert(VarChar(32),@PriorDate)

-- Get the first datskid of the cutoff date
DECLARE @LastID nvarchar(100);
SET     @LastID = (
select top 1 taskid
from [ces_wf_instances]
where starttime >= @PriorDate
  )
PRINT 'Last TaskID = '+@LastID

-- These tables are simple deletes
DELETE from [ces_messages] where task_id < @LastID
DELETE from [ces_tasks]    where task_id < @LastID

-- [ces_participant] table has a foreign key constraint against [ces_participant_event]
-- table.  Deleting records from [ces_participant] deletes records with the same
-- particpant_id from the [ces_participant_event].  So to process these two tables we
-- get a list of workflow_ids where starttime is less than @PriorDate
DELETE FROM [ces_participant]
WHERE workflow_id IN (
select workflow_id
from [ces_wf_instances]
where starttime < @PriorDate
   )

-- We could also have deleted based on task_id
DELETE FROM [ces_wf_instances]
WHERE starttime < @PriorDate

This gives us a fairly simple SQL script we can schedule as a SQL agent job or any other means appropriate for the environment.  No manual steps, no muss, no fuss, and we keep the taskflow log and the underlying tables lean and manageable.

No comments:

Post a Comment