Dispatcher
Dispatcher for workflows
This module is the dispatcher module. It helps outside callers get the workflow they need to run in the cluster.
In SmoothCrawler-Cluster, every role has its own responsibilities. The workflow module lets each role’s processes be managed and maintained individually, and the dispatcher module helps callers get the process (in general, a BaseRoleWorkflow type object) from the workflow module. Callers can then use the workflow as in the following code:
workflow = WorkflowDispatcher.dispatch(<crawler's current role>)
workflow.run(timer=<CrawlerTimer object>)
Therefore, the objects in this dispatcher module also control which running strategy a crawler uses in the cluster.
New in version 0.2.0.
Workflow Dispatcher
- class smoothcrawler_cluster.crawler.dispatcher.WorkflowDispatcher(name: CrawlerName, path: MetaDataPath, metadata_opts_callback: MetaDataOpt, lock: DistributedLock, crawler_process_callback: Callable)
Dispatcher for each SmoothCrawler-Cluster role to get the role’s workflow
This object dispatches by instantiating a different workflow object for each role. Each role in SmoothCrawler-Cluster has its own responsibilities, and the BaseRunStrategyByCrawlerRole family integrates each role’s jobs into a separate object as a single workflow. CrawlerRoleDispatcher dispatches to generate the workflow object for a given role.
- Parameters:
name (CrawlerName) – The CrawlerName data object, which provides attributes such as the crawler instance’s name and ID.
path (Type[MetaDataPath]) – The object that holds the path property of every meta-data object.
metadata_opts_callback (MetaDataOpt) – The MetaDataOpt data object, which provides callback functions for getting and setting meta-data.
lock (DistributedLock) – The adapter of distributed lock.
crawler_process_callback (Callable) – The callback function that runs the crawler’s core processes.
- dispatch(role: str | CrawlerRole) → BaseRoleWorkflow | None
Generate the workflow object that matches the given role.
- Parameters:
role (Union[str, CrawlerRole]) – The crawler instance’s role in SmoothCrawler-Cluster.
- Returns:
Returns a BaseRoleWorkflow type object. The table below maps each role to its workflow object:
Role                      Workflow Object
Runner                    RunnerWorkflow
Primary Backup Runner     PrimaryBackupRunnerWorkflow
Secondary Backup Runner   SecondaryBackupRunnerWorkflow
- Raises:
NotImplementedError – The role is not CrawlerStateRole type.
CrawlerIsDeadError – The current crawler instance is dead.
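The role-to-workflow mapping and the two error cases can be sketched as a plain lookup table. This is a self-contained illustration, not the library's implementation: every class below is a stand-in that only borrows the documented names, the enum values and the `DEAD_RUNNER` member are assumptions, and `SketchDispatcher` is a hypothetical name.

```python
from enum import Enum
from typing import Callable, Dict, Optional, Type, Union


class CrawlerRole(Enum):
    """Stand-in for the library's role enum; the string values are assumed."""

    RUNNER = "runner"
    PRIMARY_BACKUP_RUNNER = "primary-backup-runner"
    SECONDARY_BACKUP_RUNNER = "secondary-backup-runner"
    DEAD_RUNNER = "dead-runner"  # assumed marker for a dead instance


class CrawlerIsDeadError(Exception):
    """Stand-in for the library's error of the same name."""


class BaseRoleWorkflow:
    """Minimal workflow base that holds the crawling callback."""

    def __init__(self, crawler_process_callback: Callable[[], object]):
        self._crawler_process_callback = crawler_process_callback

    def run(self, timer: Optional[object] = None) -> object:
        # In this sketch every role just runs the callback; the timer is unused.
        return self._crawler_process_callback()


class RunnerWorkflow(BaseRoleWorkflow):
    """Stand-in workflow for the Runner role."""


class PrimaryBackupRunnerWorkflow(BaseRoleWorkflow):
    """Stand-in workflow for the Primary Backup Runner role."""


class SecondaryBackupRunnerWorkflow(BaseRoleWorkflow):
    """Stand-in workflow for the Secondary Backup Runner role."""


class SketchDispatcher:
    """Hypothetical dispatcher that maps a role to a workflow class."""

    _WORKFLOWS: Dict[CrawlerRole, Type[BaseRoleWorkflow]] = {
        CrawlerRole.RUNNER: RunnerWorkflow,
        CrawlerRole.PRIMARY_BACKUP_RUNNER: PrimaryBackupRunnerWorkflow,
        CrawlerRole.SECONDARY_BACKUP_RUNNER: SecondaryBackupRunnerWorkflow,
    }

    def __init__(self, crawler_process_callback: Callable[[], object]):
        self._crawler_process_callback = crawler_process_callback

    def dispatch(self, role: Union[str, CrawlerRole]) -> BaseRoleWorkflow:
        if isinstance(role, str):
            try:
                role = CrawlerRole(role)  # accept the enum's string value
            except ValueError:
                raise NotImplementedError(f"Unsupported role: {role!r}") from None
        if role is CrawlerRole.DEAD_RUNNER:
            raise CrawlerIsDeadError("The current crawler instance is dead.")
        workflow_cls = self._WORKFLOWS.get(role)
        if workflow_cls is None:
            raise NotImplementedError(f"Unsupported role: {role!r}")
        return workflow_cls(self._crawler_process_callback)


dispatcher = SketchDispatcher(crawler_process_callback=lambda: "crawled")
workflow = dispatcher.dispatch("primary-backup-runner")
print(type(workflow).__name__)  # prints: PrimaryBackupRunnerWorkflow
print(workflow.run())           # prints: crawled
```

A dictionary lookup keeps the mapping in one place, so supporting another role in this sketch would mean adding one entry rather than another branch.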
- heartbeat() → HeartbeatUpdatingWorkflow
Dispatch to the workflow that updates the heartbeat.
- Returns:
Returns a HeartbeatUpdatingWorkflow type instance.
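As a rough illustration of what a heartbeat-updating workflow might do, the sketch below writes a fresh timestamp through a setter callback on each round. The classes are stand-ins that only reuse the documented names; the `set_heartbeat` callback, the `rounds` and `interval` parameters, and `SketchDispatcher` are all assumptions, and the real workflow's `run` signature may differ.

```python
import time
from typing import Callable, List


class HeartbeatUpdatingWorkflow:
    """Stand-in workflow: records a fresh timestamp through a setter callback."""

    def __init__(self, set_heartbeat: Callable[[float], None]):
        self._set_heartbeat = set_heartbeat

    def run(self, rounds: int = 1, interval: float = 0.0) -> None:
        # Assumed parameters: update the heartbeat `rounds` times,
        # sleeping `interval` seconds after each update.
        for _ in range(rounds):
            self._set_heartbeat(time.time())
            time.sleep(interval)


class SketchDispatcher:
    """Hypothetical dispatcher that exposes only the heartbeat workflow."""

    def __init__(self, set_heartbeat: Callable[[float], None]):
        self._set_heartbeat = set_heartbeat

    def heartbeat(self) -> HeartbeatUpdatingWorkflow:
        return HeartbeatUpdatingWorkflow(self._set_heartbeat)


beats: List[float] = []  # stands in for the heartbeat meta-data store
SketchDispatcher(beats.append).heartbeat().run(rounds=3)
print(len(beats))  # prints: 3
```

In a real cluster this kind of workflow would typically run alongside the role workflow, for example in its own thread, so that other instances can detect when this one stops responding.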