Model

Model: data classes and enum objects for the package

In a crawler cluster, which is a decentralized or distributed system, instances must exchange many meta-data objects with one another so that each instance knows the state of the entire system: what is happening and what it should do in response. This module provides the data classes and enum objects for recording, serializing, and deserializing that information.
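As a rough illustration of how such meta-data can be serialized and deserialized between instances, the sketch below round-trips a small object through UTF-8 JSON bytes, the form in which a Zookeeper node stores its data. `NodeStateSketch`, its fields, and the choice of JSON are assumptions for illustration, not the library's actual format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class NodeStateSketch:
    """Hypothetical stand-in for a meta-data object, not the library's NodeState."""

    group: str = ""
    role: str = ""


def serialize(state: NodeStateSketch) -> bytes:
    # Encode the object as UTF-8 JSON bytes, the form a Zookeeper node stores.
    return json.dumps(asdict(state)).encode("utf-8")


def deserialize(data: bytes) -> NodeStateSketch:
    # Decode the bytes and rebuild the object on the receiving instance.
    return NodeStateSketch(**json.loads(data.decode("utf-8")))


raw = serialize(NodeStateSketch(group="sc-crawler-cluster", role="runner"))
restored = deserialize(raw)
```

Because both sides share the same class definition, the receiving instance gets back an equal object rather than a raw dict.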

class smoothcrawler_cluster.model.Empty

Empty meta-data objects

Generate empty meta-data objects without any values.

static group_state() → GroupState

Generate an empty meta-data object GroupState.

Returns:

An empty GroupState meta-data object.

Return type:

GroupState

static node_state() → NodeState

Generate an empty meta-data object NodeState.

Returns:

An empty NodeState meta-data object.

Return type:

NodeState

static task() → Task

Generate an empty meta-data object Task.

Returns:

An empty Task meta-data object.

Return type:

Task

static heartbeat() → Heartbeat

Generate an empty meta-data object Heartbeat.

Returns:

An empty Heartbeat meta-data object.

Return type:

Heartbeat
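A minimal sketch of the Empty factory pattern, using a hypothetical `GroupStateSketch` dataclass in place of the real GroupState (its fields and default values are assumptions). Each call returns a fresh object, so filling in one empty object does not affect another.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GroupStateSketch:
    """Hypothetical stand-in for GroupState with a few illustrative fields."""

    total_crawler: int = 0
    total_runner: int = 0
    total_backup: int = 0
    standby_id: str = "0"
    # default_factory builds a new list for every instance.
    current_crawler: List[str] = field(default_factory=list)


class EmptySketch:
    """Factory of empty meta-data objects, modelled on the Empty class above."""

    @staticmethod
    def group_state() -> GroupStateSketch:
        # Every call returns a brand-new object holding only default values.
        return GroupStateSketch()


a = EmptySketch.group_state()
b = EmptySketch.group_state()
a.current_crawler.append("sc-crawler_1")
```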

class smoothcrawler_cluster.model.Initial

Initialize a meta-data object with values

Initialize meta-data objects with values given through one or more specific options.

static group_state(crawler_name: str, total_crawler: int, total_runner: int, total_backup: int, standby_id: str = '0', current_crawler: List[str] = [], current_runner: List[str] = [], current_backup: List[str] = [], fail_crawler: List[str] = [], fail_runner: List[str] = [], fail_backup: List[str] = []) → GroupState

Initialize a meta-data object GroupState with values.

Parameters:
  • crawler_name (str) – Crawler instance’s name.

  • total_crawler (int) – Total number of crawlers, including every role.

  • total_runner (int) – Total number of crawlers whose role is Runner.

  • total_backup (int) – Total number of crawlers whose role is Backup_Runner.

  • standby_id (str) – The standby ID. It should be the index of the crawler name.

  • current_crawler (list of str) – Names of all crawler instances, including every role.

  • current_runner (list of str) – Names of all crawler instances whose role is Runner.

  • current_backup (list of str) – Names of all crawler instances whose role is Backup_Runner.

  • fail_crawler (list of str) – Names of all crawler instances in a dead state.

  • fail_runner (list of str) – Names of all crawler instances whose role is Dead_Runner.

  • fail_backup (list of str) – Names of all crawler instances whose role is Dead_Backup_Runner.

Returns:

A GroupState meta-data object with value(s).

Return type:

GroupState
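The signature above declares list defaults such as `current_crawler: List[str] = []`. In Python a default value is evaluated once, when the function is defined, so any code that mutates such a default leaks state between calls. The sketch below illustrates that general Python behavior with two hypothetical helpers, `shared_default` and `fresh_default`; whether `Initial.group_state` copies its list arguments is not stated in this documentation.

```python
from typing import List, Optional


def shared_default(names: List[str] = []) -> List[str]:
    # The default list is created once, at definition time, so every call
    # that omits `names` receives and mutates the same list object.
    names.append("sc-crawler_1")
    return names


def fresh_default(names: Optional[List[str]] = None) -> List[str]:
    # Using None as the sentinel creates a new list on each call.
    names = [] if names is None else list(names)
    names.append("sc-crawler_1")
    return names


first, second = shared_default(), shared_default()
third, fourth = fresh_default(), fresh_default()
```

After two calls, `first` and `second` are the same list with two entries, while `third` and `fourth` are independent one-element lists.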

static node_state(group: str = None, role: CrawlerRole = None) → NodeState

Initialize a meta-data object NodeState with values.

Parameters:
  • group (str) – The name of the group that the current crawler instance belongs to.

  • role (CrawlerRole) – The role of the current crawler instance.

Returns:

A NodeState meta-data object with value(s).

Return type:

NodeState
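The sketch below shows how a role enum and a NodeState-like object might fit together. `CrawlerRoleSketch` and its string values are hypothetical stand-ins for CrawlerRole (the member names only echo the roles mentioned in this page), and `initial_node_state` is a hypothetical helper mirroring the documented signature. Storing an enum's `.value` and rebuilding the member with `CrawlerRoleSketch(value)` is a common way to serialize enums.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CrawlerRoleSketch(Enum):
    """Hypothetical stand-in for CrawlerRole; the values are illustrative."""

    RUNNER = "runner"
    BACKUP_RUNNER = "backup-runner"
    DEAD_RUNNER = "dead-runner"
    DEAD_BACKUP_RUNNER = "dead-backup-runner"


@dataclass
class NodeStateSketch:
    group: Optional[str] = None
    role: Optional[CrawlerRoleSketch] = None


def initial_node_state(group: Optional[str] = None,
                       role: Optional[CrawlerRoleSketch] = None) -> NodeStateSketch:
    # Mirrors the documented signature: both options default to None.
    return NodeStateSketch(group=group, role=role)


node = initial_node_state(group="sc-crawler-cluster", role=CrawlerRoleSketch.RUNNER)
```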

static task(running_content: List[dict | RunningContent] = [], cookie: dict = {}, authorization: dict = {}, in_progressing_id: str = '-1', running_result: dict | RunningResult = None, running_state: TaskState = None, result_detail: List[dict | ResultDetail] = []) → Task

Initialize a meta-data object Task with values.

Parameters:
  • running_content (List[Union[dict, RunningContent]]) – The details of task content.

  • cookie (dict) – Cookie.

  • authorization (dict) – Authorization settings of the HTTP request.

  • in_progressing_id (str) – The ID of the task that is currently being processed.

  • running_result (Union[dict, RunningResult]) – Statistics of the running result: the numbers of successfully done and failed tasks.

  • running_state (TaskState) – The running status of the task.

  • result_detail (List[Union[dict, ResultDetail]]) – The details of the running result.

Returns:

A Task meta-data object with value(s).

Return type:

Task
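Because running_content accepts either dicts or RunningContent objects, one plausible way to handle such a `List[Union[dict, RunningContent]]` argument is to convert every dict into an object, as sketched below. `RunningContentSketch` and its fields (`task_id`, `url`, `method`) are assumptions, not the real RunningContent definition, and `to_running_contents` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class RunningContentSketch:
    """Hypothetical stand-in for RunningContent; the field names are assumptions."""

    task_id: int
    url: str
    method: str = "GET"


def to_running_contents(
    items: List[Union[Dict[str, Any], RunningContentSketch]],
) -> List[RunningContentSketch]:
    # Accept plain dicts (e.g. freshly deserialized JSON) or objects,
    # and return a list that holds objects only.
    return [
        item if isinstance(item, RunningContentSketch) else RunningContentSketch(**item)
        for item in items
    ]


contents = to_running_contents([
    {"task_id": 0, "url": "https://example.com"},
    RunningContentSketch(task_id=1, url="https://example.org", method="POST"),
])
```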

static heartbeat(time_format: str = None, update_time: str = None, update_timeout: str = None, heart_rhythm_timeout: str = None, healthy_state: HeartState = None, task_state: TaskState = None) → Heartbeat

Initialize a meta-data object Heartbeat with values.

Parameters:
  • time_format (str) – The format of the datetime value.

  • update_time (str) – The timer for updating the heartbeat.

  • update_timeout (str) – The timeout threshold for an update.

  • heart_rhythm_timeout (str) – The timeout threshold of the entire updating process.

  • healthy_state (HeartState) – Heartbeat status.

  • task_state (TaskState) – Task running status.

Returns:

A Heartbeat meta-data object with value(s).

Return type:

Heartbeat

class smoothcrawler_cluster.model.Update

Update a meta-data object with values

Update an existing meta-data object through one or more options.

static group_state(state: GroupState, total_crawler: int = None, total_runner: int = None, total_backup: int = None, standby_id: str = None, append_current_crawler: List[str] = [], append_current_runner: List[str] = [], append_current_backup: List[str] = [], append_fail_crawler: List[str] = [], append_fail_runner: List[str] = [], append_fail_backup: List[str] = []) → GroupState

Update a meta-data object GroupState with values.

Note

Options of list type are updated by appending the given element(s) to the current list value stored in Zookeeper and assigning the resulting list to the target option.

Parameters:
  • state (GroupState) – Current GroupState meta-data object.

  • total_crawler (int) – Total number of crawlers, including every role.

  • total_runner (int) – Total number of crawlers whose role is Runner.

  • total_backup (int) – Total number of crawlers whose role is Backup_Runner.

  • standby_id (str) – The standby ID. It should be the index of the crawler name.

  • append_current_crawler (list of str) – Names of crawler instances to append to the list of all crawler instances, including every role.

  • append_current_runner (list of str) – Names of crawler instances to append to the list of crawler instances whose role is Runner.

  • append_current_backup (list of str) – Names of crawler instances to append to the list of crawler instances whose role is Backup_Runner.

  • append_fail_crawler (list of str) – Names of crawler instances to append to the list of crawler instances in a dead state.

  • append_fail_runner (list of str) – Names of crawler instances to append to the list of crawler instances whose role is Dead_Runner.

  • append_fail_backup (list of str) – Names of crawler instances to append to the list of crawler instances whose role is Dead_Backup_Runner.

Returns:

A GroupState meta-data object with value(s).

Return type:

GroupState
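To illustrate the append behavior described in the Note for Update.group_state, the sketch below updates a hypothetical `GroupStateSketch` that carries only a subset of the GroupState options. Scalar options are overwritten only when a value is given, and list options are extended. The Zookeeper interaction is omitted: `update_group_state` is a hypothetical helper that works on a local object and returns a new one via `dataclasses.replace`, which may differ from what the library does internally.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass
class GroupStateSketch:
    """Hypothetical stand-in for GroupState with a subset of its options."""

    total_crawler: int = 0
    standby_id: str = "0"
    current_crawler: List[str] = field(default_factory=list)
    fail_crawler: List[str] = field(default_factory=list)


def update_group_state(state: GroupStateSketch,
                       total_crawler: Optional[int] = None,
                       standby_id: Optional[str] = None,
                       append_current_crawler: Optional[List[str]] = None,
                       append_fail_crawler: Optional[List[str]] = None) -> GroupStateSketch:
    # Scalar options overwrite the current value only when given (not None).
    # List options are appended to the current list; `+` builds a new list,
    # so the original `state` is left unchanged.
    return replace(
        state,
        total_crawler=state.total_crawler if total_crawler is None else total_crawler,
        standby_id=state.standby_id if standby_id is None else standby_id,
        current_crawler=state.current_crawler + (append_current_crawler or []),
        fail_crawler=state.fail_crawler + (append_fail_crawler or []),
    )


current = GroupStateSketch(total_crawler=3, current_crawler=["sc-crawler_1"])
updated = update_group_state(current, append_current_crawler=["sc-crawler_2"])
```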

static node_state(node_state: NodeState, group: str = None, role: CrawlerRole = None) → NodeState

Update a meta-data object NodeState with values.

Parameters:
  • node_state (NodeState) – Current NodeState meta-data object.

  • group (str) – The name of the group that the current crawler instance belongs to.

  • role (CrawlerRole) – The role of the current crawler instance.

Returns:

A NodeState meta-data object with value(s).

Return type:

NodeState

static task(task: Task, running_content: List[dict | RunningContent] = None, cookie: dict = None, authorization: dict = None, in_progressing_id: str = None, running_result: dict | RunningResult = None, running_status: TaskState = None, result_detail: List[dict | ResultDetail] = None) → Task

Update a meta-data object Task with values.

Parameters:
  • task (Task) – Current Task meta-data object.

  • running_content (List[Union[dict, RunningContent]]) – The details of task content.

  • cookie (dict) – Cookie.

  • authorization (dict) – Authorization settings of the HTTP request.

  • in_progressing_id (str) – The ID of the task that is currently being processed.

  • running_result (Union[dict, RunningResult]) – Statistics of the running result: the numbers of successfully done and failed tasks.

  • running_status (TaskState) – The running status of the task.

  • result_detail (List[Union[dict, ResultDetail]]) – The details of the running result.

Returns:

A Task meta-data object with value(s).

Return type:

Task
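running_result holds statistics about how many tasks succeeded and failed. The sketch below shows one way such counters could be bumped after a task finishes while still accepting either the dict or the object form. `RunningResultSketch`, `record_task_result`, and the field names `success_count` and `fail_count` are hypothetical, not the real RunningResult definition.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass
class RunningResultSketch:
    """Hypothetical stand-in for RunningResult; the field names are assumptions."""

    success_count: int = 0
    fail_count: int = 0


def record_task_result(result: Union[Dict[str, Any], RunningResultSketch],
                       succeeded: bool) -> RunningResultSketch:
    # Normalize the dict form into an object first.
    if isinstance(result, dict):
        result = RunningResultSketch(**result)
    # Return a new object with the matching counter incremented.
    if succeeded:
        return RunningResultSketch(result.success_count + 1, result.fail_count)
    return RunningResultSketch(result.success_count, result.fail_count + 1)


result = record_task_result({"success_count": 2, "fail_count": 0}, succeeded=False)
```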

static heartbeat(heartbeat: Heartbeat, heart_rhythm_time: datetime = None, time_format: str = None, update_time: str = None, update_timeout: str = None, heart_rhythm_timeout: str = None, healthy_state: HeartState = None, task_state: str | TaskState = None) → Heartbeat

Update a meta-data object Heartbeat with values.

Parameters:
  • heartbeat (Heartbeat) – Current Heartbeat meta-data object.

  • heart_rhythm_time (datetime) – The heart rhythm time. It should be a datetime.datetime object.

  • time_format (str) – The format of the datetime value.

  • update_time (str) – The timer for updating the heartbeat.

  • update_timeout (str) – The timeout threshold for an update.

  • heart_rhythm_timeout (str) – The timeout threshold of the entire updating process.

  • healthy_state (HeartState) – Heartbeat status.

  • task_state (str or TaskState) – Task running status.

Returns:

A Heartbeat meta-data object with value(s).

Return type:

Heartbeat
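heart_rhythm_time is a datetime.datetime object, while time_format describes how datetime values are formatted. Assuming time_format is a `strftime`-style format string (an assumption; the library's default format is not given here), the sketch below converts such a datetime to text and parses it back with the same format, as two instances sharing one format would have to.

```python
from datetime import datetime

# Hypothetical format string; the library's actual time_format may differ.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

heart_rhythm_time = datetime(2023, 1, 2, 3, 4, 5)

# Serialize the datetime with the shared format so it can be stored as text...
stored = heart_rhythm_time.strftime(TIME_FORMAT)

# ...and parse it back on another instance using the same format.
parsed = datetime.strptime(stored, TIME_FORMAT)
```

Here `stored` is `"2023-01-02 03:04:05"`, and `parsed` equals the original datetime because the format keeps second-level precision.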

Below are the details of the two sub-packages: the meta-data objects and their enum objects.

Here are the data objects for internal use: