How to write new Converter#
In SmoothCrawler-Cluster package, it would process serialization features through JSON format data in default. Absolutely, it also could customize your own serialization feature and apply it in ZookeeperCrawler via option zk_converter.
Before demonstrate, there are 3 things you need to know:
Apart from serialize/deserialize feature, it must to implement the detail how to convert string type value into target meta-data objects.
Although it needs to implement each one meta-data deserialization, it doesn’t for serialization.
For serialization, it has only one common function for processing all objects to target data format.
From above 3 points, that’s clear let us to customized your own converter.
No matter which converters of meta-data, it must needs to extend the base class and implement all functions it rules:
from smoothcrawler_cluster._utils.converter import BaseConverter
class ListConverter(BaseConverter):
# all function implementations of converter
Let’s start to learn how to implement new converter with list <-> str example!
Implement serialization/deserialization#
About serialization and deserialization features, the first we need to do must be how to convert an object to a string type value and convert a string type value back to an object.
Serialization#
Format objects as target data format#
Before running serialization, it should let the data to be satisfied the format we want it to be. So we need to format it before serialize:
def _convert_to_readable_object(self, obj: Generic[_BaseMetaDataType]) -> Any:
dict_obj: Dict[str, Any] = obj.to_readable_object()
return list(dict_obj.values())
obj.to_readable_object() would return a dict type value which format like as JSON. And it gets all values of the dict object and convert
to list directly. It has done the format part so that we could go ahead to implement serialization feature.
Serialize to string value#
About serialization, it means that convert an object to a string type which could save all needed data clearly. So for converting list
to str, we just need to use native function str():
def _convert_to_str(self, data: Any) -> str:
data = str(data)
return data
Now, we have done the whole feature of serialization. Let’s keep implementing the deserialization part!
Deserialization#
Deserialize back to object#
About deserialization, it should convert a string type value back to Python object, e.g., list in this demonstration. So we do it through
Python native library ast:
import ast
def _convert_from_str(self, data: str) -> Any:
parsed_data: List[Any] = ast.literal_eval(ini_list)
return parsed_data
However, it doesn’t finish the all tasks you should implement in deserialization. Above code only convert a string type value back to
a Python object — list object, but it doesn’t convert it to meta-data object yet. We still should implement the details of
how to convert it again to each one meta-data objects so that crawler could work finely with these meta-data objects with Zookeeper
through the customized converter you define.
Therefore, let’s keep doing for converting of each one meta-data objects!
Detail of how to convert each one meta-data objects#
About implementing the converting feature of each meta-data objects, it’s the same mostly, the only different is the index or key to get target value. So we only demonstrate NodeState part implementation here.
def _convert_to_node_state(self, state: NodeState, data: Any) -> NodeState:
data: Dict[str, Any] = data
state.group = data[0]
state.role = data[1]
return state
Hint
It has different functions for implementing deserialization of different meta-data objects.
GroupState -> _convert_to_group_state
NodeState -> _convert_to_node_state
Task -> _convert_to_task
Heartbeat -> _convert_to_heartbeat
Congratulation! You finish a customized converter for list object. Let’s try to use it and verify the running result!
Verify the converting features#
Finish all tasks we should do of implementing your own customized converter. Let’s try to run it and verify whether the running result is expected for us or not.
We could create a NodeState object for example through smoothcrawler_cluster.model.Initial. It provides initialization function for
every meta-data objects. And we could test the customized converter features by functions serialize_meta_data and deserialize_meta_data
as below demonstration:
from smoothcrawler_cluster.model import NodeState, CrawlerStateRole, Initial
node_state = Initial.node_state(group="test", role=CrawlerStateRole.RUNNER)
# Instantiate your customized converter
converter = ListConverter()
# Test for serialization
value = converter.serialize_meta_data(obj=node_state)
print(f"value: {value}")
# Test for deserialization
metadata_obj = converter.deserialize_meta_data(data=value, as_obj=NodeState)
print(f"metadata_obj: {metadata_obj}")
print(f"readable metadata_obj: {metadata_obj.to_readable_object()}")
The running result would be like below:
>>> python3 example_converter.py
value: '["test", "Runner"]'
metadata_obj: <class 'NodeState'>
readable metadata_obj: {"role": "runner", "group": "sc-crawler-cluster"}
That’s great! Therefore, you finish your own customized converter and verify its features could work finely, you also could apply it as option zk_converter of ZookeeperCrawler to replace default one.
from smoothcrawler_cluster import ZookeeperCrawler
zk_crawler = ZookeeperCrawler(runner=2,
backup=1,
ensure_initial=True,
zk_hosts=_ZK_HOSTS,
zk_converter=ListConverter())
That’s all of how to write a customized converter by yourself. Have fun with it!