Dynamics 365 Finance and Operations workload is typically memory intensive, stateful, and SQL-bound. Work involves fetching data from the database, applying business transformations, and then committing it back to the database. A transaction block can include each work unit, starting with ttsbegin and ending with ttscommit.
If a SQL connection loss happens in the middle of a transaction, it leads to transaction rollback (ttsabort) causing it to fail and hence loss of work. This transaction can be initiated from:
- An interactive browser session, say when a user clicks on button on a form.
- O-Data/custom service call to an end point in F&O or an external service call.
- Batch processes trigger a transaction/unit of work, including DMF jobs.
What is TransientSqlConnectionError X++ exception?
A brief connection loss to the Azure SQL database (0.01%) can last up to few milliseconds or seconds is known as Transient connection loss to SQL database and it leads to TransientSqlConnectionError exception.
This could happen due to unplanned outages from the SQL side, which can be a failover, DNS issues, or a planned outage on the SQL side.
- Interactive session: If the connection loss happens in an interactive UI session, it can be easily recovered by refreshing the page/form and starting the process again.
- Batch mode: This connection loss, unless specifically handled leads to Batch job ending in ‘Error’ state.
Error messages during TransientSqlConnectionError:
- [Microsoft] [ODBC Driver 17 for SQL Server] Unspecified error occurred on SQL Server. Connection may have been terminated by the server.
- Cannot select a record in [TableName]. There is a temporary problem with the connection to the database. Please retry later.
- Cannot execute a stored procedure. There is a temporary problem with the connection to the database. Please retry later.
- Batch task failed: Cannot execute the required database operation. There is a temporary problem with the connection to the database. Please retry later.
Note: Not all SQL connection errors reported in F&O telemetry are SQL availability issues.
Sometimes long running SQL queries/blocking transactions may be killed by Microsoft auto healing and monitoring tools like DAMS resulting in similar signature errors as above. This leads to operations not being suspended.
How to handle Transient SQL connection loss?
Ways to manage:
- Use Try-catch block to catch TransientSqlConnectionError exception and then retry the operation.
- Implement BatchRetryable interface with isRetryable = TRUE.
1. Try – Catch & retry the TransientSqlConnectionError exception –> Traditional way
Wrap the transaction in a try-catch block and then retry the outermost transaction or retry application code logic in a new session. This approach has existed since PU 21. See code snippet below:
Fig 1: Code snippet depicting traditional approach
a. Example of the traditional approach in standard application
Checkout standard method: DataMaintenanceActionDataPopulation::filleTempDB()
Fig 2: Standard code depicting traditional approach
b. Challenges with the try-catch retry approach
- Write the same boiler plate code every time such exception needs to be handled. This includes:
- Explicit catch block to catch the exception.
- Evaluate if retryCount < MaxRetries.
- Pause the execution by adding sleep (no. of milliseconds) logic.
- And then retry the process.
- Less uptake of this approach by Developers because of the above code.
- As a best practice, moving to cloud leads to increased responsibility to handle such exceptions. TabularGrid (Grid)
2. BatchRetryable interface
To enable automatic retries of batch jobs during Transient SQL connections loss, Microsoft has introduced isRetryable flag as a part of BatchRetryable interface in v10.0.18.
- This flag is used by Batch framework to determine if the individual batch job should be automatically retried. Hence, the batch task to be retried upon Transient SQL connection error should be marked with isRetryable = TRUE.
- Evaluation should be done beforehand to check if the implemented batch task/job is safe to retry, before implementing this interface and setting the flag isRetryable = TRUE.
- This action is perfored to check if the operation being performed in the batch task/job is idempotent, i.e., it would give the same result even after multiple retries.
Fig 3: Transition Timeline of standard batch jobs to BatchRetryableInterface performed by MS over PU updates
Fig 4: Standard BatchRetryable interface class
a. How to implement BatchRetryable interface?
To enable automatic retries, follow the below steps:
- Implement ‘BatchRetryable’ interface in your custom batch class.
- Override isRetryable() method to return true.
This interface can be implemented at the following locations:
- SysOperationServiceController extended classes in Suspiration framework.
- RunBaseBatch extended batch classes.
- SrsReportRunController extended report controller classes.
b. Example of BatchReryable interface implementation in standard application
Checkout standard class: VendInvoiceAutomationMatchingController below.
After adding all the missing components, the form design will look like this:
Fig 5: Example BatchRetryable implemented standard class
c. Parameters added in Batch task for retries as part of BatchRetryableFramework
Two parameters are available to track the retries for Batch job/task.
- Actual retries: To get the current value of the retry count for Batch job/task.
- Maximum retries: To get the maximum value of the retry count for Batch job/task.
Fig 6: Parameters added in Batch task/job for retries
3. How to test Batch retries in Cloud hosted Dev environments?
a. Stop/Start, Pause/Resume OR Restart the SQL Server (MSSQLSERVER) service.
Fig 7: Options available on Windows Services app
b. Use following commands to stop/start SQL server service:
net stop mssqlserver
net start mssqlserver
Fig 8: Commands in Command Prompt
c. Stop/Start, Pause/Resume OR Restart the SQL Server from SSMS.
Fig 9: Options available in SQL server management studio (SSMS)
Batch class configuration overrides
To unregister our batch class from automatic retries, in case of Transient SQL connection loss, at runtime without making any code changes, we can make use of the form: ‘Batch class configuration overrides.’
Path: System administration > Setup > Batch class configuration overrides.
A new record can be created for the desired batch class and the ‘isRetryable’ check box should be unchecked.
By default, this form is blank for newly deployed environment. This form is only supported for marking batch class ‘isRetryable’ = FALSE and not vice-versa. It is only used in the case where we implement Batch retries via ‘BatchRetryable’ interface.
1. Conditions for batch class to appear in lookup on ‘Batch class configuration overrides’ form
a. RunBaseBatch extended batch classes
i. Must implement ‘Batchable’ interface. RunBaseBatch implements ‘Batchable.’
ii. Either override method: canGoBatchJournal() & return true OR batch class must be decorated with [SysOperationJournaledParametersAttribute(true)] attribute.
b. SysOperationServiceController extended classes OR SrsReportRunController extended report controller classes
i. Must implement ‘Batchable’ interface. Both SysOperationController & SrsReportRunController implement ‘Batchable.’
ii. Batch class must be decorated with [SysOperationJournaledParametersAttribute(true)] attribute.
Technical Demonstrations
Let us now check how Batch retries can help us in retrying Batch job/task in case of ‘TransientSqlConnectionError’ & ‘Error’ exception, with the help of some examples.
We will be using the Sys-Operation framework in the following demonstrations.
We have the following four classes:
Controller: BatchRetryableDemoControllerWithIsReryableFlagTrue,
BatchRetryableDemoControllerWithNoRetry
Contract: BatchRetryableDemoContract
Service: BatchRetryableDemoService
1. Controller
a. BatchRetryableDemoControllerWithIsReryableFlagTrue
This class implements ‘BatchRetryable’ interface and overrides isRetryable() method to return true.
Fig 10: BatchRetryableDemoControllerWithIsReryableFlagTrue
b. BatchRetryableDemoControllerWithNoRetry
This is a normal controller class not implementing ‘BatchRetryable’ interface.
Fig 11: BatchRetryableDemoControllerWithNoRetry
2. Contract
a. BatchRetryableDemoContract
This class is used to declare the following input parameters for our batch class:
i. Input parameter A
ii. Input parameter B
iii. Iterations – Count of number of iterations to perform.
iv. Delay in milliseconds – No. of milliseconds to pause the execution for, via sleep logic.
v. Maximum retries – The max. no. of times it is desired to retry the batch job/task.
Fig 12: BatchRetryableDemoContract
3. Service
a. BatchRetryableDemoService
i. Method: performOperationWithTransientSQLErrorExcption()
Fig 13: BatchRetryableDemoService::performOperationWithTransientSQLErrorExcption()
The above method performs a calculation on the input parameter A & B and assigns the calculated values to the respective output variables. In ideal scenario, the output values are inserted and insert logic is executed.
But in case of Batch retry, if the output values already exist in the table: ‘BatchRetryableDemoTable,’ then the update logic is executed. After every iteration, the execution is paused for the desired number of milliseconds to introduce the Transient connection losses to the SQL database for demonstration purpose.
i. Method: performOperationWithThrowErrorException()
Fig 14: BatchRetryableDemoService::performOperationWithThrowErrorException()
iii. Method: performNonRetryableOperationWithTransientSQLErrorExcption()
Again, this method performs the same operation as the performOperationWithTransientSQLErrorExcption() method. It differs in the update operation performed for Batch retries.
In case of Batch retry, if the output values already exist in the table: ‘BatchRetryableDemoTable,’ then the update logic is executed. In this case, the calculated value for output parameters A & B are added to the original values for these parameters.
This makes the update operation non-qualified for Batch retries as the final value for output parameters -> A & B change in every retry and hence, this operation is not idempotent.
Like earlier, after every iteration, the execution is paused for the desired no. of milliseconds to introduce the Transient connection losses to the SQL database for demonstration purpose.
Fig 15: BatchRetryableDemoService::performNonRetryableOperationWithTransientSQLErrorExcption()
Results for Technical Demonstrations
Please refer to the document below for the complete results of all the possible scenarios implemented for demonstrations.
Results for Technical Demonstrations
Important Observations
Post implementation, we found some interesting results. Below are the important observations:
1. BatchRetryable implemented and isRetryable = TRUE
- Here, the operation gets retried multiple times with proper logging of error messages in Infolog for each retry for TransientSqlConnectionError but not for ‘Error’ exception (error message is only logged for last retry when all retries have failed).
- ‘Actual retries’ & ‘Maximum retries’ parameters in Batch task do not behave as per expectation in case of ‘TransientSqlConnectionError’ exception.
- For ‘Error’ exception, they behave as per expectation.
- Make sure batch job is safe to retry before implementing this interface and marking the flag isRetryable = TRUE.
2. BatchRetryable NOT implemented
- If parameter – ‘Maximum retries’! = 0 in Batch task, batch job/task is retried (Max – 5 times) irrespective of the type of exception.
- However, there is no logging of intermittent error messages for each retry. Error message is only logged for last retry when all retries have failed.
- Also ‘Actual retries’ & ‘Maximum retries’ parameters in Batch task behave as per expectation.
3. Batch class configuration overrides
a. Setup examples
i. CustPaymentJournalController
This report controller class implements BatchRetryable interface & overrides isRetryable() method to return False. Review the screenshots below to understand the results when adding a record for this class in the ‘BatchConfigurationOverrides’ form and marking or unmarking the isRetryable flag.
Fig 16: ‘Batch class configuration overrides’ setup form configs for CustPaymentJournalController
ii. VendOnHolUpdate
This RunBaseBatch class implements BatchRetryable interface & overrides isRetryable() method to return True. Check the below screenshots to understand the results if you add the record for this class in ‘BatchConfigurationOverrides’ form and either mark/unmark isRetryable flag.
Fig 17: ‘Batch class configuration overrides’ setup form configs for VendOnHolUpdate
b. Comparison of Error or Info log messages for TransientSqlConnectionError exception
i. First Error message
Fig 18: TransientSqlConnectionError – First Error/ Infolog message
The system generates the above error message in the following cases:
- The system implements BatchRetryable with isRetryable = TRUE and applies Batch class configuration overrides with ‘Maximum retries’ set to 0.
- BatchRetryable NOT implemented with ‘Maximum retries’ = 0.
ii. Second error message
Fig 19: TransientSqlConnectionError – Second Error/ Infolog message
The system generates the above error/Infolog message when it implements BatchRetryable with isRetryable = TRUE.
Useful code snippets
1. RunBaseBatch extended batch classes
Fig 20: Useful code snippets – RunBaseBatch extended batch classes
2. SysOperationServiceController extended batch classes
Fig 21: Useful code snippets – SysOperationServiceController extended batch classes
Key Learnings
- Do not use catch block to handle the type of exceptions for which you want to retry the Batch job/task execution. For e.g. – do not use catch blocks to catch System. Exception and TransientSqlConnectionError exception if you want to retry execution in case of Transient SQL connection loss.
- During a batch job/task retry, the system retries it from the very beginning. For e.g. – Say during insertion of 20 records, if the TransientSqlConnectionError occurs at the 15th record then the retry starts from the 1st record and not from the record on which the error occurred (15th). Please make sure batch job is safe to retry beforehand, it must be idempotent.
- The system can set the maximum value for the ‘Maximum retries’ parameter in the Batch task to five.
- The system retries the batch job/task in the following cases, regardless of the type of exception (TransientSQLConnectionError or Error exception):
- BatchRetryable implemented with isRetryable = TRUE.
- BatchRetryable implemented with isRetryable = TRUE, batch class unregistered in Batch class config overrides but ‘Maximum retries’! = 0.
- BatchRetryable not implemented and ‘Maximum retries’! = 0.
- Similarly, the system does not retry the batch job or task in the following cases, regardless of the type of exception:
- BatchRetryable implemented with isRetryable = TRUE, batch class unregistered in Batch class config overrides and ‘Maximum retries’ = 0.
- BatchRetryable not implemented and ‘Maximum retries’ = 0.
- If the Batch job/task is designed to run in multi-threading, the system must implement the Batch Retryable interface with isRetryable = TRUE for both the main and task controllers.
- If the system implements the Batch Retryable interface only at the main controller and not at the task controller, the batch task will fail in case of a TransientSqlConnectionError exception.
- Similarly, if the system implements the interface at the task controller but not at the main controller, the task will retry upon encountering a TransientSqlConnectionError and execute successfully, causing the batch job to end in ‘Ended’ status. However, if the exception occurs before the main controller creates the task(s), the batch job will fail and end in ‘Error’ status.
- If the system retries the batch job or task, it will take more time to complete execution. Developers can optimize the process by designing the outer query to filter the data appropriately, ensuring that the system does not process records again during retries.
- A smaller Transaction block reduces the work lost in case of a transient failure. Together with the optimized query, this would not increase the total execution time drastically in case of retry.
- The Batch Retryable framework controls the interval between retries, and users cannot change it.
- Developers should carefully analyze the batch class that makes external service calls before implementing batch retries.
- Developers must maintain a database log to track what has already been sent or received from the service; otherwise, enabling batch retries may result in duplicate data being sent or received.
DMF behavior during Batch processing
Transient SQL connection loss can also occur for Data management framework jobs (imports, exports etc.). DMF handles the different operation stages in separate ways.
1. Two operation stages
a. Source to Staging
If the exception occurs during ‘Source to Staging’ operation of DMF job either:
- Before ‘Staging status’ changes to executing or
- Before ‘Target status’ changes to executing,
then DMF execution fails, and Batch job ends in ‘Error’ status with error messages logged in Batch job Infolog.
b. Staging to Target
If the system encounters an exception during the ‘Staging to Target’ operation of a DMF job, it retries the operation. The system creates and updates records (in case of retries), generating multiple ‘Target stage’ logs in the DMF Execution log.
The system creates and updates records (in case of retries) and generates multiple ‘Target stage’ logs in the DMF Execution log.
However, there is no logging of error messages in DMF execution logs/batch job Infolog for when TransientSqlConnectionError occurred.
‘Actual retries’ & ‘Maximum retries’ parameters in Batch task do not behave as per expectation.
Also, DMF job takes longer to execute.
Implementations
Please refer to the embedded document for details about the error messages/info logs observed during both the stages.
Conclusion
With this we have come to the end of this blog. In this blog we have learnt:
- Transient SQL connection loss and how to handle it.
- Comparison of retries performed when implementing the BatchRetryable interface versus not implementing it, along with the comparison of Infolog/error messages.
- First, check if the batch task/job is idempotent before implementing retries. If it is idempotent, proceed with configuring the retries.
- ‘Batch class configuration overrides’ form and how to effectively use it.
- Track the retries with the parameters: ‘Actual retries’ and ‘Maximum retries.’
- DMF behavior during Batch processing.
About the expert
![]() |
Adeeshwar Jain, Senior Technical Consultant, OnActuate
Adeeshwar is a D365 certified professional with an experience of 5 years in the areas of upgrade & development of D365 F&O. He has experience working on AX 2012 & D365 implementation and upgrade projects. Adeeshwar has good knowledge of development of SSRS reports, Data Entities, workflows, implementing Dual write infrastructure in D365 F&O. He also has experience working with the Microsoft prescribed best practices and AI-powered no-code visual test automation tool: ‘Leapwork’. Adeeshwar has been successfully leading the public sector project as the sole Technical Consultant for the past 3 years. The “OneByte” blog series are technical how-to articles by OnActuate’s expert consultants covering technologies like Dayforce and Microsoft Business Applications (Microsoft Dynamics, Power Platform). Have a topic you’d like to learn more about? Email us at info@onactuate.com! |
Additionally, you can also read…
![]() OneByte: Custom Workspace in Dynamics 365 Finance & Operations |