February 14, 2017

More Trouble with Temp Files

The title of this post kind of gives away the plot before I even start the story.  But it does give me a chance to use this picture in my blog.


We update our HFM application metadata monthly.  We first deploy the changes in our development environment and test any required changes to member lists, rules, etc.  After the metadata is loaded into EPMA the HFM application has to be redeployed.  In this case the redeploy failed.  The message displayed was almost as useless as a tribble:

The custom error module does not recognize this error.
This is like something you would find on the DailyWTF, an error that tells you nothing more than it's an error.

The Consolidation Administration tab showed an error message that was a little more descriptive, but equally unenlightening.

Could not determine wsdl ports.
Searching the Oracle knowledge base and binggoolging turned up nothing useful.  When dealing with Workspace we sometimes resolve unusual issues by clearing the browser cache and reopening the browser, but that didn't help.

The deployment failed at 6%, which normally indicates a problem with EPMA.  Everything on the EPMA server looked normal and there were no obvious errors in the logs.  We tried restarting just the EPMA services, which did not help.  We then restarted all the EPM services to close all connections, flush Java caches, and try to clear up whatever was causing the error, but the redeploy still failed at the same step.

Then I checked the HFM server, which in our case is a different host than EPMA.  The C: drive was almost full.  Because this is a dev environment I don't have any alerts configured, so I had no advance warning of the problem.
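Even without a monitoring tool, a scheduled script can give some warning before a drive fills up.  What follows is only a minimal sketch, not something we had in place: the function name and the 10% threshold are my own assumptions, and it relies on nothing but Python's standard shutil.disk_usage call.

```python
import shutil


def free_space_report(path, min_free_fraction=0.10):
    """Return (free_fraction, is_low) for the volume that contains path.

    min_free_fraction is an arbitrary example threshold, not a recommendation.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total
    return free_fraction, free_fraction < min_free_fraction


# Example use on the HFM server (path is an assumption for illustration):
#   fraction, low = free_space_report("C:\\")
#   if low:
#       print(f"WARNING: only {fraction:.0%} free on C:")
```

Run from a scheduled task, the warning could be written to a log or mailed to whoever looks after the dev environment.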

Analysis

A handy tool for finding disk hogs is the Sysinternals du (Disk Usage) utility.  It runs from the command line, but the syntax is easy and you can use keyboard shortcuts to quickly drill down to the problem folders.

While du can scan a whole drive and find the hogging subfolders, this can take a bit of time.  My strategy is to check just one level using the -L 1 parameter, then drill down from there one level at a time.  This is usually a quicker way of finding the offending folder.

Here I look in the C:\Users folder because I suspect the problem is one of the profiles.  Clearly the culprit is the profile taking over 5 GB, which belongs to an account that starts with r and ends with v.

Using DOS shortcuts I hit the up arrow key to repeat the last command, add the backslash, type an [r] to start the username, use the [Tab] key to auto-fill the rest of the name, and press [Enter] to see results for the next level down.  Repeat this technique until you find the problem folder.
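If du isn't available on a server, the same one-level-at-a-time idea can be approximated with a few lines of Python.  This is a rough sketch under my own assumptions (the function names are made up, and it makes no attempt to match du's output or speed): it totals each immediate subfolder and lists the largest first, so you can pick the next folder to drill into.

```python
import os


def tree_size(path):
    """Total size in bytes of all files under path, skipping unreadable ones."""
    total = 0
    # os.walk does not follow symlinked directories by default.
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or access denied; ignore it
    return total


def one_level_sizes(path):
    """Return [(size_bytes, subfolder_name)] for immediate subfolders, largest first."""
    results = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                results.append((tree_size(entry.path), entry.name))
    return sorted(results, reverse=True)


# Example use (path is an assumption for illustration):
#   for size, name in one_level_sizes(r"C:\Users"):
#       print(f"{size / 1024**2:10.1f} MB  {name}")
```

Call it again on the biggest subfolder and repeat, just like pressing the up arrow and adding the next folder name.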

In this case it was the user profile of the service account that runs the HFM and other EPM services.  In the AppData\Local\Temp folder there were a bunch of temp files, many of them tens of megabytes in size.  After deleting all the temp files from previous years we freed up 5 GB of drive space and the deployment succeeded.
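The cleanup itself was done by hand, but it could be scripted.  The sketch below is one possible approach, with assumptions called out: the function names are hypothetical, it only looks at files ending in .tmp directly inside the folder, and it defaults to a dry run so nothing is deleted until you ask for it.

```python
import os
from datetime import datetime


def start_of_year(year):
    """Timestamp for midnight on January 1 of the given year (local time)."""
    return datetime(year, 1, 1).timestamp()


def old_temp_files(folder, cutoff_timestamp, suffix=".tmp"):
    """Yield (path, size) for files directly in folder last modified before the cutoff.

    The .tmp suffix filter is an assumption; adjust it to match what you find.
    """
    with os.scandir(folder) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            if not entry.name.lower().endswith(suffix):
                continue
            info = entry.stat(follow_symlinks=False)
            if info.st_mtime < cutoff_timestamp:
                yield entry.path, info.st_size


def purge(folder, cutoff_timestamp, suffix=".tmp", dry_run=True):
    """Delete old temp files and return the bytes freed (or that would be freed)."""
    freed = 0
    # Build the list first so we are not deleting while still scanning.
    for path, size in list(old_temp_files(folder, cutoff_timestamp, suffix)):
        if not dry_run:
            try:
                os.remove(path)
            except OSError:
                continue  # file in use or access denied; leave it alone
        freed += size
    return freed
```

A dry run such as purge(temp_folder, start_of_year(2017)) reports how much space the files from previous years are holding; passing dry_run=False actually removes them.  Stopping the EPM services first is probably wise so no file is deleted while in use.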


Even though this stage of the deployment is handled by EPMA, data is still being written to the HFM server.  Presumably this is so HFM has something to process once its turn in the deployment comes.

Conclusion

This is another episode where temp files don't get cleaned up after execution.  Also note that the .tmp files use the convention of four hex digits as part of the name.  Had we not run out of drive space, it is conceivable we would have run into a name collision like we did with the Permanent Temporary Files.
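To put a rough number on the collision risk: four hex digits allow 16^4 = 65,536 distinct values.  I don't know how these names are actually generated, so the sketch below makes a loud assumption that each name is picked uniformly at random, which may well be wrong.  Under that assumption it is the classic birthday calculation.

```python
import math

NAME_SPACE = 16 ** 4  # four hex digits give 65,536 possible values


def collision_probability(file_count, name_space=NAME_SPACE):
    """Chance that at least two of file_count names collide.

    ASSUMPTION: names are drawn uniformly at random, which is a guess
    about the generator, not something documented.
    """
    if file_count > name_space:
        return 1.0  # more files than names: a collision is certain
    # Sum logs of the "no collision so far" factors to avoid underflow.
    log_no_collision = sum(
        math.log1p(-i / name_space) for i in range(file_count)
    )
    return 1.0 - math.exp(log_no_collision)


# Example use:
#   for n in (100, 300, 1000):
#       print(n, round(collision_probability(n), 3))
```

If the generator instead counts upward or retries until it finds a free name, collisions would only show up once the name space is nearly exhausted, so treat this as an illustration rather than a prediction.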

This has become another task during the monthly maintenance where logs and other items get purged or truncated to keep the logs manageable and drive space clear.

What is curious in all of this is why these temp files aren't cleaned up as a matter of course.  While it makes sense to keep temp files around for troubleshooting if a process fails, surely whatever process generates these things could have a final step of cleaning up its droppings after it receives a success notice.  Even children know enough to clean up after their dog.