Background
Components and Interactions¶
The workflow components and their interactions are illustrated in the figure below.
The following explanation will focus on Use Case 1 in the flow chart, where a web page performs a request for execution.
- The user requests execution of a workflow in the Browser, which issues a request to the web server, IIS, where the Web API fetches the connection from the Connection Repository. The repository has information on where job requests are stored in the Jobs Repository and information on where the workflows are stored in the Workflow Repository.
- The request for a job is stored in the Jobs Repository.
- The Job Runner Service runs continuously, and based on its own Worker Connection Repository it knows to monitor the same Jobs Repository where the Web API stored the request for a workflow execution. When the request arrives, it loads its Hosts Repository to identify which servers it can use to execute the workflow and issues the request to the Workflow Service on that server. The Workflow Service uses a separate executable to execute the workflow in order to contain any memory leaks that may arise. The Workflow Service Executer uses the Windows Workflow Foundation to execute the workflow.
- Throughout the execution, progress and messages are emitted back through the chain up to the Web API where logs are stored.
The use explained above is one use case. When workflows are scheduled to run automatically, the Browser box is replaced with a Workflow Executer, which is a simple executable capable of performing the same Web API request as the browser. This Executer is different from the Workflow Service Executer, which is an internal part of the service. The Workflow Executer is also able to execute workflows locally in-process, which is typically used when workflows need to be executed immediately without potentially being queued.
The designer¶
The Designer used to build workflows allows executing workflows locally inside the Designer, in a similar way to the executer's local execution. The Designer also allows executing workflows remotely on hosts. This execution accesses the host directly, bypassing the queuing mechanism.
The Jobs Web API¶
The Jobs Web API serves as a RESTful access point to the Domain Services Job implementation. It allows managing the job repositories, typically by inserting new job requests or querying the status of a specific job.
The Job Runner Service¶
The Job Runner service acts as a broker between job repositories and hosts for executing the jobs. When set up, it will automatically monitor multiple job repositories and react when new job requests appear. When a new job request appears, it will identify the host that the workflow should be executed on and maintain communication with the Workflow Service while the execution is performed. The Job Runner service is installed/uninstalled using .bat files located where it is deployed.
The Workflow Service¶
The Workflow Service is installed on the host and is responsible for accepting requests for executing workflows and issuing these to an internal execution mechanism, the DHI.Service.Workflow.Executer. The execution is handled by this separate application to harden the system against memory leaks introduced by the workflows.
The Workflow Executer¶
The executer is used either to do local in-memory execution or to schedule workflows to be run through the Web API. The latter is often used with the Windows Task Scheduler to perform scheduled executions.
The Repositories¶
There is a series of storages/repositories in the system that hold data and bind the components together. Some repositories are used for configuration, for example the Connections repository and the Worker Connections repository. The Workflow repository stores workflow definitions, while the Jobs repository serves partly as a message queue and partly as a status and logging mechanism for job/workflow execution.
Below, the individual repository types are described in more detail.
Connections Repository¶
The connections repository is the general connections repository that is part of the Web API, configuring access to various data. When a JobServiceConnection is added, it couples a connection ID, below called "MyExecutionConnection", with several other sources of information.
Example:
- JobRepositoryType + JobRepositoryConnectionString: This is the repository of job entries. By default, Domain Services comes with a JSON repository, which is suited for testing purposes but less suited for production. The PostgreSQL provider includes a job repository that will automatically create its data model when pointed to a database. The standard PostgreSQL connection string can take an optional argument Table=public.SomeOtherTable to change the table from the default Jobs.
- TaskRepositoryType + TaskRepositoryConnectionString: This is the workflow repository. Currently, only a JSON-based task repository exists.
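Putting the two pairs together, a JobServiceConnection entry might look like the following sketch. This is illustrative only: the type names, paths and connection string values are assumed placeholders based on the description above, not documented values.

```json
{
  "MyExecutionConnection": {
    "ConnectionType": "JobServiceConnection",
    "JobRepositoryType": "JobRepository, DHI.Services.Provider.PostgreSQL",
    "JobRepositoryConnectionString": "Host=dbserver;Database=ds;Username=user;Password=secret;Table=public.Jobs",
    "TaskRepositoryType": "WorkflowRepository, DHI.Services.Jobs",
    "TaskRepositoryConnectionString": "c:\\data\\workflows.json"
  }
}
```

The optional Table argument shown in the job repository connection string is only needed when the default table name Jobs is not wanted.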
When a request for a job comes in to the Web API, it uses the TaskRepository to check that the requested workflow exists and, if it does, inserts the job request into the JobRepository.
Worker Connections Repository¶
The worker connections repository has the same structure as the connections repository on the web server and serves the same purpose of referencing other sources of information. The Job Runner will monitor all connections and their respective JobsRepositories for new requests. When a new request for a job execution is detected, it will use the task repository to get the workflow to be executed and send it to the host. Compared to the Web API connections file, the worker connections repository references several additional repositories.
Example:
- WorkerType + WorkerConnectionString: This is the actual functionality for executing workflows. Although a Provider.WF.Worker exists that allows local in-process execution of workflows, this is typically not desirable in the Job Runner, which is why the RemoteWorker is used. The RemoteWorker communicates with the Workflow Service on the hosts using OWIN, a self-hosted web service: outbound to the hosts on port 7777, while the hosts respond back to the built-in web server in the RemoteWorker on port 7778. These are the default ports, but they can be changed through the connection string.
- LoggerType + LoggerConnectionString: The Job Runner injects a logger into the worker so that the worker can perform detailed logging. Any progress in CodeActivities, including internal messages emitted, is sent back from the hosts and logged. The repository used here is a text-file-based logger implementation that results in a log file with the job ID as file name.
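A worker connection entry covering the pairs described above could then look like this sketch. All type names and values are illustrative assumptions; only the key names are taken from the description.

```json
{
  "MyWorkerConnection": {
    "JobRepositoryType": "JobRepository, DHI.Services.Provider.PostgreSQL",
    "JobRepositoryConnectionString": "Host=dbserver;Database=ds;Username=user;Password=secret",
    "TaskRepositoryType": "WorkflowRepository, DHI.Services.Jobs",
    "TaskRepositoryConnectionString": "c:\\data\\workflows.json",
    "WorkerType": "RemoteWorker, DHI.Services.Jobs",
    "WorkerConnectionString": "http://myhost:7777",
    "LoggerType": "TextLogger, DHI.Services",
    "LoggerConnectionString": "c:\\logs\\jobs"
  }
}
```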
Jobs Repository¶
As stated above, the jobs repository typically used is the PostgreSQL implementation. The structure of the job entity is shown below using the JSON repository.
Example:
- AccountId: The ID of the user account that requested the execution
- HostId: The host the job is executed on. When the job has just been requested, this will not yet be populated, as it depends on the availability of execution hosts
- Status: The statuses are:
- Pending: This is the status a job is in after being inserted from e.g. the Web API
- In Progress: Once the Job Runner sends it to execution, the status changes to this
- Completed: When the execution is successfully completed
- Failed: When the execution failed
- Requested, Started and Finished: These are all date-times in UTC; Requested is set when the request comes in, Started when the workflow starts, and Finished when it finishes.
- TaskId: This is the id of the actual workflow being executed.
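A job entity with the fields described above might look roughly like this; all values are fabricated for illustration, and the Id field is an assumed unique identifier of the job entry itself.

```json
{
  "Id": "0f8fad5b-d9cb-469f-a165-70867728950e",
  "TaskId": "MyWorkflow",
  "AccountId": "john.doe",
  "HostId": "MYHOST01",
  "Status": "Completed",
  "Requested": "2021-05-01T06:00:00Z",
  "Started": "2021-05-01T06:00:04Z",
  "Finished": "2021-05-01T06:02:31Z"
}
```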
Workflow Repository¶
A workflow (task) repository is created and maintained through the Designer and embeds the actual XAML workflow itself.
Example:
- Parameters: Any variable that is defined in the workflow is exposed as a parameter in the task so that it can be accessed and set from outside.
- Definition: The XAML workflow
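A task entry might accordingly look like this sketch; the workflow name, parameter names and the abbreviated XAML string are illustrative assumptions.

```json
{
  "Id": "MyWorkflow",
  "Parameters": {
    "InputFolder": "System.String",
    "SimulationDate": "System.DateTime"
  },
  "Definition": "<Activity x:Class=\"MyWorkflow\">...</Activity>"
}
```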
Hosts Repository¶
The hosts repository serves as a list of servers that are available for remote execution.
Example:
- Localhost: The Id of the host entity is the server name
- RunningJobsLimit: The number of simultaneous jobs that can be run on this particular host
- Priority: The priority of the host
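A hosts repository with two entries might look like this sketch (host names, limits and priorities are illustrative values):

```json
[
  { "Id": "Localhost", "RunningJobsLimit": 4, "Priority": 1 },
  { "Id": "MYHOST01", "RunningJobsLimit": 8, "Priority": 2 }
]
```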
When the Job Runner receives a request for job execution, it will cross-reference the jobs already running on each host with the hosts' available slots and priorities. This means a new execution ends up on the highest-priority host that has spare capacity. If no host has capacity, the job remains pending until a running job completes and frees up a slot.
Executable configuration¶
Two of the applications in the system require configuration, both concerning logging.
Several built-in logger repositories exist in Domain Services for logging, with the default implementation writing to a JSON file. In production, though, the PostgreSQL logger implementation is more suitable. Like the PostgreSQL Jobs Repository, the logger repository will create its data model in the database upon access, and if the default table name of public.logging is not preferred, it can be changed through the optional Table parameter in the connection string.
DHI.Services.JobRunner.exe.config: The LoggerType and LoggerConnectionString allow the Job Runner to notify of the state of the execution. Also, the CleaningTimerIntervalInMinutes should be set to zero to maintain the history of job instances.
Example:
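The relevant appSettings section of the Job Runner configuration might look like the following sketch. Only the three key names are taken from the description above; the logger type and connection string values are assumptions.

```xml
<configuration>
  <appSettings>
    <!-- Illustrative values; actual types and connection strings depend on the deployment -->
    <add key="LoggerType" value="Logger, DHI.Services.Provider.PostgreSQL" />
    <add key="LoggerConnectionString" value="Host=dbserver;Database=ds;Username=user;Password=secret" />
    <!-- Zero disables cleaning so that the history of job instances is kept -->
    <add key="CleaningTimerIntervalInMinutes" value="0" />
  </appSettings>
</configuration>
```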
DHI.WorkflowExecuter.exe.config: If the Workflow Executer is used for local execution, as in Use Case 3, the Job Runner is not involved, which means its notification of the execution state is absent. The WorkflowExecuter fulfils this role by applying the same logging settings as the Job Runner.
Example:
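For the WorkflowExecuter, the corresponding settings might be as in this sketch, mirroring the Job Runner configuration; the values are again illustrative assumptions.

```xml
<configuration>
  <appSettings>
    <!-- Same logging keys as in DHI.Services.JobRunner.exe.config; values are illustrative -->
    <add key="LoggerType" value="Logger, DHI.Services.Provider.PostgreSQL" />
    <add key="LoggerConnectionString" value="Host=dbserver;Database=ds;Username=user;Password=secret" />
  </appSettings>
</configuration>
```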
Logging¶
Two types of logging are used in the system. The overall progress and state are handled through the system: the PostgreSQL logging repository is used for the overall state, whose structure DHI Polymer Elements natively knows and can display. Additionally, detailed logging is performed for the actual workflow execution.
- Job Runner Service: Overall logging is done by the Job Runner Service, which informs when jobs start and finish either successfully or unsuccessfully. These log entries are inserted into the PostgreSQL logging repository.
- Workflow Service: Detailed logging is done in the folder where the Workflow Service is installed, where a subfolder named log is created automatically. In there, a sub-folder for each day (yyyy-MM-dd) is created, in which a log file with the JobId as name is created. This includes all log information produced by CodeActivities during the execution and serves as a valuable resource for troubleshooting.
- Web API: The same detailed log information is transmitted back to the Web API from where the workflow execution was initiated. Here it ends up in a folder as defined in the LoggerConnectionString in the Worker Connections Repository. There are two reasons for the identical log information ending up on the web server: first, it allows inspecting the log information from the web, and second, the host may no longer be available.
- Workflow Executer: In the same way as with the Workflow Service, detailed logging is done to a sub-folder for each day (yyyy-MM-dd), in which a log file named with the JobId contains the detailed log information. In addition, the Workflow Executer provides overall logging to the PostgreSQL repository when jobs start and finish, either successfully or unsuccessfully.
Workflow Executer¶
For executing workflows in Use Cases 2 and 3, the Workflow Executer is used. It can run in either local mode or remote mode. The former causes immediate in-process execution and is often used for time-critical execution of workflows. The latter performs a Web API request to a web server, placing the workflow in a job queue. The arguments for the WorkflowExecuter can be recalled by running the application without any arguments, as shown in the figure below.
The main difference in the arguments for the two modes, indicated by the -run argument, is that a local run points to a workflow repository on disk, whereas a remote run points to a web server.