General Registration

class smoothcrawler_cluster.register.Register(name: CrawlerName, path: MetaDataPath, metadata_opts_callback: MetaDataOpt, lock: DistributedLock)

General registration

This registration strategy registers all required meta-data objects directly to the crawler cluster.

Parameters:
  • name (CrawlerName) – The CrawlerName data object, which provides attributes such as the crawler instance’s name or ID.

  • path (Type[MetaDataPath]) – The object that holds the path properties of all meta-data objects.

  • metadata_opts_callback (MetaDataOpt) – The MetaDataOpt data object, which provides the callback functions for getting and setting meta-data.

  • lock (DistributedLock) – The distributed lock adapter.
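The four constructor arguments can be pictured as cooperating pieces: a name to register, a path where meta-data lives, callbacks to get and set that meta-data, and a lock that protects the read-modify-write. The sketch below models that idea with plain Python stand-ins. `InMemoryStore`, `register_name`, the path string, and the use of `threading.Lock` in place of `DistributedLock` are all hypothetical and are not the smoothcrawler_cluster API.

```python
import threading
from typing import Callable, Dict, List


# Hypothetical stand-in for the cluster's meta-data storage (not the library's API).
class InMemoryStore:
    def __init__(self) -> None:
        self._data: Dict[str, List[str]] = {}

    def get(self, path: str) -> List[str]:
        # Return a copy so callers must write changes back explicitly.
        return list(self._data.get(path, []))

    def set(self, path: str, value: List[str]) -> None:
        self._data[path] = list(value)


def register_name(
    name: str,
    path: str,
    get_metadata: Callable[[str], List[str]],
    set_metadata: Callable[[str, List[str]], None],
    lock: threading.Lock,
) -> None:
    """Append `name` to the list stored at `path` (hypothetical sketch).

    The lock makes the read-modify-write atomic, so concurrent crawlers
    do not overwrite each other's registration.
    """
    with lock:
        crawlers = get_metadata(path)
        if name not in crawlers:
            crawlers.append(name)
            set_metadata(path, crawlers)


if __name__ == "__main__":
    store = InMemoryStore()
    lock = threading.Lock()
    path = "/group/state/current_crawler"  # hypothetical path
    threads = [
        threading.Thread(
            target=register_name,
            args=(f"crawler_{i}", path, store.get, store.set, lock),
        )
        for i in range(5)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sorted(store.get(path)))
    # ['crawler_0', 'crawler_1', 'crawler_2', 'crawler_3', 'crawler_4']
```

In a real cluster the lock would have to be shared across processes and machines, which is presumably why the class takes a distributed lock adapter rather than a local lock.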

metadata(runner: int, backup: int, ensure: bool = False, ensure_timeout: int = 3, ensure_wait: float = 0.5, update_time: float = None, update_timeout: float = None, heart_rhythm_timeout: int = None, time_format: str = None) None
Parameters:
  • runner (int) – The number of crawlers with the RUNNER role.

  • backup (int) – The number of crawlers with the BACKUP RUNNER role.

  • ensure (bool) – If True, confirm that the meta-data registration has taken effect: the size of GroupState.current_crawler must equal the total of runner and backup, and this crawler’s name must be in it.

  • ensure_timeout (int) – The number of checks allowed before confirming the registration times out. Default value is 3.

  • ensure_wait (float) – How long to wait between checks, in seconds. Default value is 0.5.

  • update_time (float) – The interval between heartbeat updates, in seconds; e.g., if the value is 2, the heartbeat info is updated every 2 seconds.

  • update_timeout (float) – The update timeout, in seconds; e.g., if the value is 3, an update is considered timed out when the heartbeat info has not been updated for more than 3 seconds.

  • heart_rhythm_timeout (int) – The number of update timeouts after which the crawler is judged dead; e.g., if the value is 3 and the update times out more than 3 times, the crawler is marked as ‘Dead_<Role>’ (such as ‘Dead_Runner’ or ‘Dead_Backup’).

  • time_format (str) – The time format string. It follows the same format rules as Python’s datetime module.

Returns:

None.
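Because time_format follows Python’s datetime format rules, a heartbeat timestamp can be written with `strftime` and read back with `strptime`. The sketch below shows how such a timestamp could be compared against update_timeout. The format string and the helper `is_update_timed_out` are hypothetical illustrations, not part of the library.

```python
from datetime import datetime, timedelta

# Hypothetical format string; any datetime-compatible format works.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def is_update_timed_out(last_heartbeat: str, now: datetime,
                        update_timeout: float,
                        time_format: str = TIME_FORMAT) -> bool:
    """Return True if more than `update_timeout` seconds have passed
    since the heartbeat timestamp `last_heartbeat` (hypothetical helper)."""
    last = datetime.strptime(last_heartbeat, time_format)
    return (now - last).total_seconds() > update_timeout


if __name__ == "__main__":
    now = datetime(2023, 1, 1, 12, 0, 10)
    stamp = (now - timedelta(seconds=2)).strftime(TIME_FORMAT)
    print(stamp)                                               # 2023-01-01 12:00:08
    print(is_update_timed_out(stamp, now, 3))                  # False: 2 s elapsed
    print(is_update_timed_out("2023-01-01 12:00:05", now, 3))  # True: 5 s elapsed
```

Whichever format is chosen, every crawler that reads a heartbeat must parse it with the same format that was used to write it.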

group_state(runner: int, backup: int, ensure: bool = False, ensure_timeout: int = 3, ensure_wait: float = 0.5) None

Register the meta-data GroupState to the crawler cluster.

Parameters:
  • runner (int) – The number of crawlers that run tasks. This value equals the attribute GroupState.total_runner.

  • backup (int) – The number of backup crawlers, which check whether every runner is alive and stand by to take over as a runner if any runner dies. This value equals the attribute GroupState.total_backup.

  • ensure (bool) – If True, confirm that the meta-data registration has taken effect: the size of GroupState.current_crawler must equal the total of runner and backup, and this crawler’s name must be in it.

  • ensure_timeout (int) – The number of checks allowed before confirming the registration times out. Default value is 3.

  • ensure_wait (float) – How long to wait between checks, in seconds. Default value is 0.5.

Returns:

None.
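The ensure behavior described above amounts to polling: re-read GroupState.current_crawler up to ensure_timeout times, sleeping ensure_wait seconds between reads, until it holds exactly runner + backup names including this crawler’s own. A minimal sketch, assuming a hypothetical `read_current_crawler` callable and raising a plain `TimeoutError` on failure (the library’s actual error handling may differ):

```python
import time
from typing import Callable, List


def ensure_registered(
    name: str,
    runner: int,
    backup: int,
    read_current_crawler: Callable[[], List[str]],
    ensure_timeout: int = 3,
    ensure_wait: float = 0.5,
) -> None:
    """Poll until the registration condition holds (hypothetical sketch).

    Condition: len(current_crawler) == runner + backup and name is in it.
    """
    for attempt in range(ensure_timeout):
        current = read_current_crawler()
        if len(current) == runner + backup and name in current:
            return
        if attempt < ensure_timeout - 1:
            time.sleep(ensure_wait)  # wait before the next check
    raise TimeoutError(
        f"{name} not confirmed in current_crawler after {ensure_timeout} checks"
    )


if __name__ == "__main__":
    # Simulated snapshots of current_crawler as other crawlers register.
    snapshots = iter([["a"], ["a", "b"], ["a", "b", "c"]])
    ensure_registered("c", runner=2, backup=1,
                      read_current_crawler=lambda: next(snapshots),
                      ensure_timeout=3, ensure_wait=0.01)
    print("registered")
```

With the defaults, this sketch gives up after 3 checks spaced 0.5 seconds apart, i.e. roughly one second of waiting in total.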

node_state() None

Register the meta-data NodeState to the crawler cluster.

Returns:

None

task() None

Register the meta-data Task to the crawler cluster.

Returns:

None

heartbeat(update_time: float = None, update_timeout: float = None, heart_rhythm_timeout: int = None, time_format: str = None) None

Register the meta-data Heartbeat to the crawler cluster.

Parameters:
  • update_time (float) – The interval between heartbeat updates, in seconds; e.g., if the value is 2, the heartbeat info is updated every 2 seconds.

  • update_timeout (float) – The update timeout, in seconds; e.g., if the value is 3, an update is considered timed out when the heartbeat info has not been updated for more than 3 seconds.

  • heart_rhythm_timeout (int) – The number of update timeouts after which the crawler is judged dead; e.g., if the value is 3 and the update times out more than 3 times, the crawler is marked as ‘Dead_<Role>’ (such as ‘Dead_Runner’ or ‘Dead_Backup’).

  • time_format (str) – The time format string. It follows the same format rules as Python’s datetime module.

Returns:

None
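The heart_rhythm_timeout rule can be read as a counter of update timeouts: once the count exceeds the threshold, the crawler’s state becomes ‘Dead_<Role>’. A minimal sketch of that reading follows. The `HeartbeatMonitor` class, the "Alive" label, resetting the counter on a successful update, and the strict greater-than comparison are all assumptions, not the library’s implementation.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Hypothetical tracker of one crawler's heartbeat health."""
    role: str                   # e.g. "Runner" or "Backup"
    heart_rhythm_timeout: int   # allowed number of update timeouts
    timeout_count: int = 0

    def record(self, timed_out: bool) -> str:
        """Record one check and return the resulting state.

        "Alive" is a hypothetical label for a healthy crawler.
        """
        if timed_out:
            self.timeout_count += 1
        else:
            self.timeout_count = 0  # assumption: a fresh update resets the count
        if self.timeout_count > self.heart_rhythm_timeout:
            return f"Dead_{self.role}"
        return "Alive"


if __name__ == "__main__":
    monitor = HeartbeatMonitor(role="Runner", heart_rhythm_timeout=3)
    states = [monitor.record(True) for _ in range(4)]
    print(states)  # ['Alive', 'Alive', 'Alive', 'Dead_Runner']
```

Under this reading, with heart_rhythm_timeout set to 3, a crawler is only marked dead on the fourth consecutive timeout, matching "exceeds 3 times" in the parameter description.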