CrawlerRole

class smoothcrawler_cluster.model.metadata_enum.CrawlerRole(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

The role of a crawler in the crawler cluster system.

This role is NOT the role used in SmoothCrawler-AppIntegration; the two are very different. In SmoothCrawler-AppIntegration, a role identifies either the source site (the producer) or the processor site (the consumer) of an application. In the metadata of SmoothCrawler-Cluster, a role says whether a crawler is an active runner that runs tasks or a backup of such a runner.

In SmoothCrawler-Cluster, a crawler has one of 5 roles:

  • Initial

  • Runner

  • Backup Runner

  • Dead Runner

  • Dead Backup Runner
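As a rough illustration of how these role values behave as a Python enum, the sketch below defines a local stand-in class with the same member names and string values as documented on this page. It is not the library's source code, and `parse_role` is a hypothetical helper introduced only for this example.

```python
from enum import Enum


class CrawlerRole(Enum):
    """Local stand-in mirroring the documented member values (not the library source)."""

    INITIAL = "initial"
    RUNNER = "runner"
    BACKUP_RUNNER = "backup-runner"
    DEAD_RUNNER = "dead-runner"
    DEAD_BACKUP_RUNNER = "dead-backup-runner"


def parse_role(raw: str) -> CrawlerRole:
    """Hypothetical helper: map a stored string value back to its role member."""
    return CrawlerRole(raw.strip().lower())


role = parse_role("backup-runner")
print(role.name, role.value)  # prints: BACKUP_RUNNER backup-runner
```

Looking a member up by value raises `ValueError` for an unknown string, which is standard `enum.Enum` behaviour and can help catch corrupted metadata early.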

INITIAL = 'initial'

A crawler holds this role only while it is being instantiated and before the runner election. Once the runner election is done, its role changes to Runner or Backup Runner.

RUNNER = 'runner'

As the name suggests, the Runner role is the primary element that runs the web spider tasks.

BACKUP_RUNNER = 'backup-runner'

The Backup Runner role is the backup of a Runner. It activates (depending on the settings, possibly immediately) and runs the web spider task if that task has not finished yet.

A Backup Runner keeps checking the heartbeat info of the Runners and stands by, ready to run at any time if any Runner stops updating its own heartbeat info (that Runner turns into a Dead Runner at that point).
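The heartbeat check described above can be sketched as a simple staleness test. This is only a minimal illustration under stated assumptions: the function name `is_heartbeat_stale`, the use of `datetime` timestamps, and the `timeout` parameter are all hypothetical and do not reflect how SmoothCrawler-Cluster actually stores or evaluates heartbeat info.

```python
from datetime import datetime, timedelta


def is_heartbeat_stale(last_heartbeat: datetime, now: datetime, timeout: timedelta) -> bool:
    """Hypothetical check: True if the last heartbeat is older than the allowed timeout."""
    return now - last_heartbeat > timeout


# Assumed example values: the Runner last reported 30 seconds ago, the timeout is 10 seconds.
last_seen = datetime(2023, 1, 1, 12, 0, 0)
current_time = last_seen + timedelta(seconds=30)

if is_heartbeat_stale(last_seen, current_time, timeout=timedelta(seconds=10)):
    print("Runner heartbeat is stale; a Backup Runner could take over here.")
```

Passing `now` explicitly instead of calling `datetime.now()` inside the function keeps the check deterministic and easy to test.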

DEAD_RUNNER = 'dead-runner'

If a Runner can no longer work properly, for example because the entire VM hosting the crawler's runtime environment has been shut down, it turns from Runner into Dead Runner. In other words, a Runner must become a Dead Runner once it can no longer keep updating its own heartbeat info.

DEAD_BACKUP_RUNNER = 'dead-backup-runner'

Dead Backup Runner is the same as Dead Runner, but it applies to a Backup Runner.
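The role changes described on this page can be summarised as a small transition table. The sketch below is an illustration only: `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical names, the enum is a local stand-in for the documented class, and treating the two dead roles as terminal is an assumption. The page does not state which role a Backup Runner holds after it takes over a task, so that case is deliberately left out.

```python
from enum import Enum


class CrawlerRole(Enum):
    """Local stand-in mirroring the documented member values (not the library source)."""

    INITIAL = "initial"
    RUNNER = "runner"
    BACKUP_RUNNER = "backup-runner"
    DEAD_RUNNER = "dead-runner"
    DEAD_BACKUP_RUNNER = "dead-backup-runner"


# Hypothetical table containing only the transitions described in the text above.
ALLOWED_TRANSITIONS = {
    CrawlerRole.INITIAL: {CrawlerRole.RUNNER, CrawlerRole.BACKUP_RUNNER},
    CrawlerRole.RUNNER: {CrawlerRole.DEAD_RUNNER},
    CrawlerRole.BACKUP_RUNNER: {CrawlerRole.DEAD_BACKUP_RUNNER},
    # Assumption: dead roles are terminal in this sketch.
    CrawlerRole.DEAD_RUNNER: set(),
    CrawlerRole.DEAD_BACKUP_RUNNER: set(),
}


def can_transition(current: CrawlerRole, target: CrawlerRole) -> bool:
    """Return True if the sketch's table allows moving from current to target."""
    return target in ALLOWED_TRANSITIONS[current]


print(can_transition(CrawlerRole.INITIAL, CrawlerRole.RUNNER))      # prints: True
print(can_transition(CrawlerRole.RUNNER, CrawlerRole.BACKUP_RUNNER))  # prints: False
```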