Utils - Zookeeper

Utility functions for operating with Zookeeper

ZookeeperCrawler does not manage its meta-data objects by itself; it delegates that to a third-party application, Zookeeper. This module provides the utility functions for performing those Zookeeper operations.

ZookeeperPath

class smoothcrawler_cluster._utils.zookeeper.ZookeeperPath(name: str, group: str)

All paths of Zookeeper

Zookeeper saves data as nodes under specific paths. This object provides all the Zookeeper paths under which SmoothCrawler-Cluster saves its meta-data.

Parameters:
  • name (str) – The name of the current crawler instance.

  • group (str) – The group that the current crawler instance belongs to.

classmethod generate_parent_node(parent_name: str, is_group: bool = False) → str

Generate node path of Zookeeper with fixed format.

Parameters:
  • parent_name (str) – The crawler name.

  • is_group (bool) – If True, generate the node path for group-type meta-data.

Returns:

A Zookeeper node path.

Return type:

str
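As a rough illustration of what a fixed-format path builder looks like, the sketch below mirrors the signature of generate_parent_node. The function name build_parent_node and the prefixes smoothcrawler/node and smoothcrawler/group are assumptions made for this example only; the documentation above states just that the format is fixed, so check the library source for the actual layout.

```python
def build_parent_node(parent_name: str, is_group: bool = False) -> str:
    """Hypothetical stand-in for ZookeeperPath.generate_parent_node."""
    # Assumed prefixes, chosen only for illustration; not confirmed by the docs.
    kind = "group" if is_group else "node"
    return f"smoothcrawler/{kind}/{parent_name}"


# Example names are placeholders for a crawler name and a group name.
print(build_parent_node("crawler_1"))
print(build_parent_node("crawler_group", is_group=True))
```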

ZookeeperNode

class smoothcrawler_cluster._utils.zookeeper.ZookeeperNode

Zookeeper node object

Every utility function that gets a value converts the data it reads from Zookeeper into this object.

property path: str | None

Property with both a getter and a setter for the path of the node in Zookeeper.

Type:

str

property value: str | None

Property with both a getter and a setter for the value stored at the path. The data may need to be deserialized before use.

Type:

str
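Because value is held as a plain string, structured data has to be serialized before it is stored and deserialized after it is read. The sketch below shows that round trip with JSON. NodeSketch is a hypothetical stand-in that only copies the two documented properties; the path and payload are illustrative values, not the library's real layout.

```python
import json
from typing import Optional


class NodeSketch:
    """Hypothetical stand-in exposing the same two properties as ZookeeperNode."""

    def __init__(self) -> None:
        self._path: Optional[str] = None
        self._value: Optional[str] = None

    @property
    def path(self) -> Optional[str]:
        return self._path

    @path.setter
    def path(self, new_path: str) -> None:
        self._path = new_path

    @property
    def value(self) -> Optional[str]:
        return self._value

    @value.setter
    def value(self, new_value: str) -> None:
        self._value = new_value


node = NodeSketch()
node.path = "/example/state"                 # illustrative path
node.value = json.dumps({"role": "runner"})  # illustrative payload, stored as text

# The stored value is a string; decode it only when the structure is needed.
state = json.loads(node.value)
print(state["role"])
```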

ZookeeperRecipe

class smoothcrawler_cluster._utils.zookeeper.ZookeeperRecipe(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Distributed Lock features

Each enum value is the name of an object that can be found in the module kazoo.recipe.lock.

READ_LOCK: str = 'ReadLock'

The kazoo.recipe.lock.ReadLock object.

WRITE_LOCK: str = 'WriteLock'

The kazoo.recipe.lock.WriteLock object.

SEMAPHORE: str = 'Semaphore'

The kazoo.recipe.lock.Semaphore object.
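To make the name-based mapping concrete, here is a small sketch of an enum whose values are class names. RecipeSketch is a stand-in defined only for this example, with the three members copied from the listing above; it does not import kazoo or the real ZookeeperRecipe.

```python
from enum import Enum


class RecipeSketch(Enum):
    """Stand-in mirroring the three documented ZookeeperRecipe members."""

    READ_LOCK = "ReadLock"
    WRITE_LOCK = "WriteLock"
    SEMAPHORE = "Semaphore"


# Each member's value is the class name to look up in kazoo.recipe.lock.
class_names = [recipe.value for recipe in RecipeSketch]
print(class_names)

# A member can also be recovered from its class name.
print(RecipeSketch("WriteLock") is RecipeSketch.WRITE_LOCK)
```

Keeping the class name as the enum value means a caller could resolve the lock class with a single attribute lookup on the module, instead of maintaining a separate mapping table.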

ZookeeperClient

class smoothcrawler_cluster._utils.zookeeper.ZookeeperClient(hosts: str)

The Zookeeper client object, implemented with the Python library kazoo.

This object is the default Zookeeper client used in this package.

restrict(path: str, restrict: ZookeeperRecipe, identifier: str, max_leases: int = None) → ReadLock | WriteLock | Semaphore

Limit Zookeeper operations in concurrent scenarios with a distributed lock.

Parameters:
  • path (str) – The node path.

  • restrict (ZookeeperRecipe) – Which type of distributed lock to instantiate and use.

  • identifier (str) – The identifier of the distributed lock.

  • max_leases (Optional[int]) – Only used for the Semaphore distributed lock: the maximum number of leases available for the semaphore. It is the same as the argument of kazoo.recipe.lock.Semaphore.__init__.

Returns:

The distributed lock instantiated from kazoo.recipe.lock.ReadLock, kazoo.recipe.lock.WriteLock or kazoo.recipe.lock.Semaphore.

The return type depends on the arguments restrict and max_leases. In general, the object is chosen by the name that restrict maps to. However, it tries to instantiate Semaphore whenever max_leases is not None, so do NOT pass a value for max_leases unless you want a Semaphore.

Return type:

Union[ReadLock, WriteLock, Semaphore]
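The selection rule described for the return value can be sketched as a small pure function. This is one reading of the description above, not the library's code: RecipeSketch and pick_recipe_name are hypothetical names, and the function returns only the class name rather than creating a real lock.

```python
from enum import Enum
from typing import Optional


class RecipeSketch(Enum):
    """Stand-in for the documented ZookeeperRecipe members."""

    READ_LOCK = "ReadLock"
    WRITE_LOCK = "WriteLock"
    SEMAPHORE = "Semaphore"


def pick_recipe_name(restrict: RecipeSketch, max_leases: Optional[int] = None) -> str:
    """Return the name of the lock class that would be instantiated."""
    # A non-None max_leases takes priority over the requested recipe.
    if max_leases is not None:
        return RecipeSketch.SEMAPHORE.value
    return restrict.value


plain_choice = pick_recipe_name(RecipeSketch.READ_LOCK)
overridden_choice = pick_recipe_name(RecipeSketch.READ_LOCK, max_leases=2)
print(plain_choice, overridden_choice)
```

The second call shows why passing max_leases alongside READ_LOCK or WRITE_LOCK is discouraged: under this reading, the requested recipe is silently replaced by Semaphore.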

Note

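The instance it returns can also be used with the Python keyword with.

from smoothcrawler_cluster._utils.zookeeper import ZookeeperClient, ZookeeperRecipe

# The host address is an example value; this assumes a reachable Zookeeper server.
client = ZookeeperClient(hosts="localhost:2181")
lock = client.restrict(path="/test",
                       restrict=ZookeeperRecipe.READ_LOCK,
                       identifier="test_id")
with lock:
    # Do something with the lock
    ...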

See kazoo.recipe.lock.Semaphore.__init__: https://kazoo.readthedocs.io/en/latest/api/recipe/lock.html#kazoo.recipe.lock.Semaphore.__init__

exist_node(path: str) → Any | None

Check whether the target node exists.

Parameters:

path (str) – The node path.

Returns:

True if the target path exists, otherwise False.

Return type:

bool

get_node(path: str) → Generic[_BaseZookeeperNodeType]

Get one specific node by path in Zookeeper.

Parameters:

path (str) – The node path.

Returns:

An object of type _BaseZookeeperNodeType.

Return type:

Generic[_BaseZookeeperNodeType]

create_node(path: str, value: str | bytes = None) → str

Create a node with the path and value in Zookeeper.

Parameters:
  • path (str) – The node path.

  • value (Union[str, bytes]) – The value to assign to the node when creating it.

Returns:

None

delete_node(path: str) → bool

Delete the node by path in Zookeeper.

Parameters:

path (str) – The node path.

Returns:

True if it deletes the node successfully.

Return type:

bool

get_value_from_node(path: str) → str

Get the value directly from the Zookeeper path.

Parameters:

path (str) – The node path.

Returns:

The value from the node in Zookeeper. It is always a string, but it may be in a specific format, e.g., JSON, so the data may need to be deserialized.

Return type:

str

set_value_to_node(path: str, value: str | bytes) → None

Set a value on one specific Zookeeper path.

Parameters:
  • path (str) – The node path.

  • value (str) – The data to set, which must be a string-type value.

Returns:

True if it runs without any issue, otherwise False.

Return type:

bool
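The node operations above are typically combined into a check, create, read, update, delete sequence. The sketch below walks through that sequence against InMemoryClientSketch, a hypothetical dict-backed stand-in that borrows the documented method names so the example runs without a Zookeeper server. Its error handling and return values are assumptions of this sketch, not the behaviour of ZookeeperClient.

```python
import json
from typing import Dict, Optional, Union


class InMemoryClientSketch:
    """Hypothetical dict-backed stand-in; it talks to no Zookeeper server."""

    def __init__(self) -> None:
        self._nodes: Dict[str, str] = {}

    @staticmethod
    def _to_text(value: Optional[Union[str, bytes]]) -> str:
        # Normalize None and bytes so every stored value is a string.
        if value is None:
            return ""
        return value.decode("utf-8") if isinstance(value, bytes) else value

    def exist_node(self, path: str) -> bool:
        return path in self._nodes

    def create_node(self, path: str, value: Optional[Union[str, bytes]] = None) -> None:
        if path in self._nodes:
            raise ValueError(f"node already exists: {path}")
        self._nodes[path] = self._to_text(value)

    def get_value_from_node(self, path: str) -> str:
        return self._nodes[path]

    def set_value_to_node(self, path: str, value: Union[str, bytes]) -> None:
        if path not in self._nodes:
            raise KeyError(path)
        self._nodes[path] = self._to_text(value)

    def delete_node(self, path: str) -> bool:
        return self._nodes.pop(path, None) is not None


client = InMemoryClientSketch()
path = "/example/state"  # illustrative path

# Create the node only if it is missing, storing structured data as JSON text.
if not client.exist_node(path):
    client.create_node(path, value=json.dumps({"role": "runner"}))

# Read, deserialize, modify, then write the serialized value back.
state = json.loads(client.get_value_from_node(path))
state["role"] = "backup"
client.set_value_to_node(path, json.dumps(state))
print(client.get_value_from_node(path))

deleted = client.delete_node(path)
print(deleted, client.exist_node(path))
```

With a real client, the read-modify-write step in the middle is the part that would normally be wrapped in a distributed lock from restrict, since another crawler instance could change the node between the read and the write.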