If you look at BTS performance blog (can be found here) then you know that you have to monitor the size of the spool table. And this is true.
Some long time ago (sorry, didn’t had time to blog about it that time) we have faced situation, when problems were caused not by multiple messages in the spool table, but rather a lot of service instances associated with BizTalk Isolated host.
We had a situation, when the application that was bound to BTS isolated service host stopped working. After long, long, long, long, long time of research together with MS PSS we have found out that it was due to >15 billion of service instances hanging there . Yes, it is true, not million - Billion (with 9 zeros).
When the SOAP receive port was moved to another isolated host - it started working again, but still there were more than 15 billion service instances in the DB. As you can imagine, DB size was also pretty impressive.
And then … another “project of the year” with MS PSS was started. Goal was to remove the service instances from DB in a supported way.
All utilities (like BtsMsgTerm) were failing with “Out of memory exception” (I would expect that) and scripts were failing by not removing instances any way. At the end I took the initiative.
First - the SQL script that verifies the situation (to see if the instances are getting terminated, when the script runs):
DECLARE @cActive bigint
DECLARE @cSuspended bigint
SELECT @cActive = count(*) FROM dbo.BizTalkserverIsolatedHostQ WITH (NOLOCK);
SELECT @cSuspended = count(*) FROM dbo.BizTalkserverIsolatedHostQ_Suspended WITH (NOLOCK);
SELECT @cActive+@cSuspended
After some research with BizTalk help I have end up with some VB script that terminates the instances in batches, but there were two issues:
a) “SELECT * FROM MSBTS_ServiceInstance WHERE HostName = ‘BizTalkServerIsolatedHost’” was not returning all instances (I would not expect someone returning 15 billion objects either).
b) Sometimes when looping through collection the same instance at the end was returned zillion of times (strange one)
Therefore I had to restart the script again and verify if the next instance received from collection is not the same as previous one.
So, after these modifications, with the help of VBScripts famous On Error Resume Next and simple command line script - I’ve got rid of those billions of messages in a few days in a supported way.
Of course, the issue was resolved step by step: first by the SP1. If You have not applied for some reason - you definitelly should do it NOW.
Later, there was another KB: http://support.microsoft.com/kb/908157
What is the morale of this post:
a) Check the database sizes
b) Monitor important tables (e.g. spool)
c) Add service instance counter to your monitored objects
If you are experiencing something like that and want to get the script, just give me short e-mail, but in general you should do the job by taking the sample and modifying it appropriatelly.