Heartbeat

class smoothcrawler_cluster.model.metadata.Heartbeat

Meta-Data for one specific crawler’s heartbeat

A Backup Runner member of the cluster uses this info to determine whether a Runner member is healthy. The key value in this section is a datetime: the stamp of the Runner's heartbeat, which shows when the Runner was last alive and last updated the stamp. A Backup Runner keeps checking this info to determine whether the current Runner member is alive. If the current Runner member does not update the stamp before the timeout and a Backup Runner discovers this, the cluster rule is that each Backup Runner member checks the index that follows its web spider name. The member with the smallest index activates itself to run first. Members with larger indexes must NOT activate; they keep waiting and checking the heartbeat stamp until the next time they discover a heartbeat timeout.
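The activation rule can be sketched as follows. This is a minimal illustration rather than the library's implementation: the helper names `crawler_index` and `should_activate`, the sample crawler names, and the assumption that the index is a trailing integer in the name are all made up for the example.

```python
import re
from typing import List


def crawler_index(name: str) -> int:
    """Return the trailing integer index of a crawler name (assumed naming scheme)."""
    match = re.search(r"([0-9]+)$", name)
    if match is None:
        raise ValueError(f"Crawler name {name!r} has no trailing index.")
    return int(match.group(1))


def should_activate(my_name: str, backup_names: List[str]) -> bool:
    """A backup activates only if it holds the smallest index among all backups."""
    smallest = min(crawler_index(name) for name in backup_names)
    return crawler_index(my_name) == smallest


# Hypothetical backup members that all discovered the heartbeat timeout.
backups = ["sc-crawler_3", "sc-crawler_2", "sc-crawler_4"]
for name in backups:
    action = "activate" if should_activate(name, backups) else "keep waiting"
    print(name, "->", action)
```

Only the member with the smallest index reports `activate`; the others go back to watching the heartbeat stamp.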

  • Zookeeper node path:

/smoothcrawler/node/<crawler name>/heartbeat/

  • Example data:

{
    "heart_rhythm_time": "2022-07-15 08:42:59",
    "time_format": "%Y-%m-%d %H:%M:%S",
    "update_time": "2s",
    "update_timeout": "4s",
    "heart_rhythm_timeout": "3",
    "healthy_state": "Healthy",
    "task_state": "processing"
}
to_readable_object() -> dict

Converts the instance's current data to a dict value, so that the data can easily be expressed as a JSON-format value.

Returns:

A dict holding the current instance's data.

Return type:

dict
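As a rough illustration of why a plain dict is convenient here, the sketch below round-trips such a dict through JSON with the standard library. The `readable` dict is a hand-written stand-in for the return value of `to_readable_object()`, with values copied from the example data above; it is not produced by the library.

```python
import json

# Stand-in for the dict returned by to_readable_object().
readable = {
    "heart_rhythm_time": "2022-07-15 08:42:59",
    "time_format": "%Y-%m-%d %H:%M:%S",
    "update_time": "2s",
    "update_timeout": "4s",
    "heart_rhythm_timeout": "3",
    "healthy_state": "Healthy",
    "task_state": "processing",
}

payload = json.dumps(readable)   # serialize to a JSON string
restored = json.loads(payload)   # deserialize back to a dict

print(restored == readable)
```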

property heart_rhythm_time: str | None

Property with both a getter and a setter for the datetime value of the current heartbeat of one specific Runner crawler.

The getter converts the datetime value to a string and returns it. The setter raises ValueError in the following two scenarios:

  • Value is NOT str or datetime.datetime type.

  • The datetime value does not match the format that the meta-data object currently holds.

Note

The setter of property heart_rhythm_time accepts two types: _string_ or _datetime.datetime_. To pre-check whether the value it receives is valid, it tries to parse the datetime value before setting it. However, a _string_ value can follow many datetime formats, so this property uses a simple Python regex to check it:

import re

checksum = re.search(
    r"[0-9]{2,4}[\-\/:][0-9]{2,4}[\-\/:][0-9]{2,4}.[0-9]{2,4}[\-\/:][0-9]{2,4}[\-\/:][0-9]{2,4}",
    str(heart_rhythm_time)
)

The regex tries to cover the common datetime formats, e.g., _yyyy-mm-dd hh:MM:SS_, _yy-mm-dd hh:MM:SS_, _dd-mm-yy hh:MM:SS_, etc., checking as many values as possible. However, if the property time_format already has a value, the datetime value can be pre-checked against it more easily.
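The two checks described in this note can be sketched as below. The regex is copied from the snippet above; the helper names `looks_like_datetime` and `matches_format` are hypothetical, and using `datetime.strptime` for the time_format check is an assumption about how such a pre-check could work, not a description of the library's code.

```python
import re
from datetime import datetime

# Pattern copied from the note above.
DATETIME_PATTERN = (
    r"[0-9]{2,4}[\-\/:][0-9]{2,4}[\-\/:][0-9]{2,4}."
    r"[0-9]{2,4}[\-\/:][0-9]{2,4}[\-\/:][0-9]{2,4}"
)


def looks_like_datetime(value) -> bool:
    """Loose check: does the value resemble a date part plus a time part?"""
    return re.search(DATETIME_PATTERN, str(value)) is not None


def matches_format(value: str, time_format: str) -> bool:
    """Strict check: can the value be parsed with the given format string?"""
    try:
        datetime.strptime(value, time_format)
    except ValueError:
        return False
    return True


print(looks_like_datetime("2022-07-15 08:42:59"))
print(looks_like_datetime("not a datetime"))
print(matches_format("2022-07-15 08:42:59", "%Y-%m-%d %H:%M:%S"))
print(matches_format("15/07/22 08:42:59", "%Y-%m-%d %H:%M:%S"))
```

The loose regex accepts both sample datetimes, while the strict check rejects the one that does not follow the given format.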

Type:

str

property time_format: str

Property with both a getter and a setter for the format string that smoothcrawler uses to parse the value of heart_rhythm_time.

The setter raises ValueError in two scenarios:

  • Value data type is NOT str.

  • The value is an invalid format that cannot parse a datetime value, e.g., datetime.datetime.now().

Type:

str

property update_time: str

Property with both a getter and a setter for how often the crawler should update the value of the heartbeat property heart_rhythm_time. smoothcrawler-cluster relies on this value when it checks crawlers and discovers that one or more of them are dead, so that the backup ones can be activated as soon as possible.

The setter raises ValueError in two scenarios:

  • Value data type is NOT str.

  • Value format is invalid. The format is <int><string in (s,m,h)>, e.g., 3s, 1m, where s is seconds, m is minutes and h is hours.

Type:

str
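A value in the `<int><s|m|h>` format can be validated and converted to seconds with a few lines of Python. The sketch below is illustrative only; `parse_duration` is a hypothetical helper and not part of smoothcrawler-cluster.

```python
import re

# Seconds represented by each supported unit suffix.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(value: str) -> int:
    """Convert a string such as '3s' or '1m' to a number of seconds."""
    if not isinstance(value, str):
        raise ValueError("The value must be a str.")
    match = re.fullmatch(r"([0-9]+)([smh])", value)
    if match is None:
        raise ValueError(f"Invalid format {value!r}; expected <int><s|m|h>.")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


print(parse_duration("3s"), parse_duration("1m"), parse_duration("2h"))
```

Raising ValueError for both the wrong type and the wrong format mirrors the two setter scenarios listed above.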

property update_timeout: str

Property with both a getter and a setter for the timeout threshold that other crawlers use to judge whether the current crawler instance is alive or dead. If the crawler instance does not update the timestamp value of property heart_rhythm_time for longer than this property's value, it is marked as HeartState.Arrhythmia by the other crawler instances that are still alive.

The setter raises ValueError in two scenarios:

  • Value data type is NOT str.

  • Value format is invalid. The format is <int><string in (s,m,h)>, e.g., 3s, 1m, where s is seconds, m is minutes and h is hours.

Type:

str
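The timeout comparison can be sketched with the standard library as below. The function name `is_heartbeat_late` is hypothetical, and the timeout is passed as a `timedelta` to keep the example short; how the library performs this check internally is not shown in this document.

```python
from datetime import datetime, timedelta


def is_heartbeat_late(
    heart_rhythm_time: str,
    time_format: str,
    update_timeout: timedelta,
    now: datetime,
) -> bool:
    """Return True if the last heartbeat is older than the timeout threshold."""
    last_beat = datetime.strptime(heart_rhythm_time, time_format)
    return (now - last_beat) > update_timeout


last_seen = "2022-07-15 08:42:59"
fmt = "%Y-%m-%d %H:%M:%S"
timeout = timedelta(seconds=4)  # corresponds to an update_timeout of "4s"

# 3 seconds after the last heartbeat: still within the threshold.
print(is_heartbeat_late(last_seen, fmt, timeout, datetime(2022, 7, 15, 8, 43, 2)))
# 5 seconds after the last heartbeat: the threshold has been exceeded.
print(is_heartbeat_late(last_seen, fmt, timeout, datetime(2022, 7, 15, 8, 43, 4)))
```

Passing `now` explicitly instead of calling `datetime.now()` inside the function keeps the check deterministic and easy to test.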

property heart_rhythm_timeout: str

Property with both a getter and a setter for how many late updates are tolerated. Exceeding update_timeout means that one specific crawler instance did not update the value on time, which may be caused by a network issue rather than by the crawler instance being dead. The property heart_rhythm_timeout is the number of times an update may be late. Once this threshold is reached, the other crawler instances judge the crawler to be truly dead, mark it as HeartState.Asystole, and list it in fail_crawler.

The setter raises ValueError in two scenarios:

  • Value data type is NOT str.

  • Value is NOT in integer format, i.e., it cannot be converted by int.

Type:

str
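One way to picture the escalation from a late heartbeat to a dead crawler is a simple counter compared against this threshold. The sketch below is an assumption-laden illustration: `judge_heart_state` and the `late_count` argument are hypothetical, and the returned strings reuse the state names mentioned in this document rather than the library's actual enum values.

```python
def judge_heart_state(late_count: int, heart_rhythm_timeout: str) -> str:
    """Map the number of late heartbeat updates to a health state name."""
    threshold = int(heart_rhythm_timeout)  # stored as a string, e.g. "3"
    if late_count <= 0:
        return "Healthy"
    if late_count >= threshold:
        # Late too many times: treated as truly dead.
        return "Asystole"
    # Late, but possibly only because of a network issue.
    return "Arrhythmia"


for count in range(4):
    print(count, judge_heart_state(count, "3"))
```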

property healthy_state: str

Property with both a getter and a setter for the health state of the current crawler instance. It is updated by the crawler itself or by other crawlers, depending on the scenario.

The setter raises ValueError in two scenarios:

  • Value data type is NOT str or HeartState.

  • Value is a str but is NOT one of the values in HeartState.

Type:

str
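The two setter checks follow a common pattern for enum-backed string properties, sketched below. `HeartStateSketch` is a hypothetical stand-in that lists only the state names appearing in this document; the real HeartState enum and its values may differ, and `normalize_state` is not a library function.

```python
from enum import Enum
from typing import Union


class HeartStateSketch(Enum):
    """Hypothetical stand-in for HeartState (members assumed from this page)."""

    HEALTHY = "Healthy"
    ARRHYTHMIA = "Arrhythmia"
    ASYSTOLE = "Asystole"


def normalize_state(value: Union[str, HeartStateSketch]) -> str:
    """Accept an enum member or one of its string values; reject anything else."""
    if isinstance(value, HeartStateSketch):
        return value.value
    if isinstance(value, str):
        try:
            return HeartStateSketch(value).value
        except ValueError:
            raise ValueError(f"{value!r} is not a known heart state.") from None
    raise ValueError("The value must be a str or a HeartStateSketch member.")


print(normalize_state("Healthy"))
print(normalize_state(HeartStateSketch.ASYSTOLE))
```

The same shape would apply to task_state with TaskResult, whose members are not listed here.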

property task_state: str

Property with both a getter and a setter for the running state of the task that the crawler is currently handling.

The setter raises ValueError in two scenarios:

  • Value data type is NOT str or TaskResult.

  • Value is a str but is NOT one of the values in TaskResult.

Type:

str