Attributes
The basic attributes of a crawler in SmoothCrawler-Cluster
A crawler object in SmoothCrawler-Cluster must have some basic attributes, e.g., name and id_separation. Although every crawler in the cluster needs these attributes, the corresponding crawler arguments are all optional. That raises a question: what value should an attribute take when none is given? We could always set it manually to any value we want, but sometimes a more convenient way is needed. For example, if we run multiple crawler instances in Docker containers, do we have to set attributes such as the name for every single container? This module exists to resolve that issue.
New in version 0.2.0.
Base Attributes
- class smoothcrawler_cluster.crawler.attributes.BaseCrawlerAttribute
The base class of all crawler attributes
Definition of the crawler’s attribute objects. Currently, every crawler must have one property: name. The other property, id_separation, can be set automatically along with name.
- abstract property name: str
Property with both a getter and a setter. The name of this crawler instance. It MUST be unique within the cluster (the same group) so that the whole crawler cluster can tell every instance apart; for example, the properties current_crawler, current_runner and current_backup in the GroupState meta-data record crawlers by name. This value can be modified through the name option of the Zookeeper object.
- Type: str
- abstract property id_separation: str
Property with both a getter and a setter. The string used to split the value of the name attribute in order to obtain the identity of each crawler instance.
- Type: str
- abstract property current_id: str
Property with only a getter. The current identity of this crawler instance. It MUST be unique.
- Type: str
- property has_default: bool
Property with both a getter and a setter. Whether the properties name and id_separation may take default values.
- Type: bool
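The relationship between name, id_separation and current_id described above can be sketched as follows. This is a standalone illustration and does not import SmoothCrawler-Cluster; the helper `extract_identity`, the sample names, and the rule that the identity is whatever follows the last separator are all assumptions made for the example, not the library's documented behaviour.

```python
def extract_identity(name: str, id_separation: str) -> str:
    """Return the part of `name` after the last separator (assumed rule)."""
    _base, sep, identity = name.rpartition(id_separation)
    if not sep or not identity:
        # Either the separator never occurs or nothing follows it.
        raise ValueError(f"{name!r} has no identity after {id_separation!r}")
    return identity


# Hypothetical crawler names, chosen only for illustration.
print(extract_identity("sc-crawler_1", "_"))   # 1
print(extract_identity("my-crawler-12", "-"))  # 12
```

Splitting on the last occurrence keeps the sketch working when the base name itself contains the separator, as in the second call.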
- class smoothcrawler_cluster.crawler.attributes.NextableAttribute
A type of base crawler attribute whose crawler identity can be expected in advance
This crawler attribute base class means that the crawler instance’s identity is predictable. In other words, it can use some specific rule or logic to obtain the next identity for a new crawler instance when one needs to be created in the cluster.
- abstract property next_id: str
Property with only a getter. The next identity for a crawler instance. This identity MUST be new and unique, i.e. never used before. This property only tells you what the next identity is; it does not actually advance to it.
- Type: str
- abstract property iter_to_next_id: str
Property with only a getter. The next identity for a crawler instance. This identity MUST be new and unique, i.e. never used before. Reading this property does advance to the next identity: afterwards, the value of the name property is equal to the value this property returned.
- Type: str
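The difference between looking at the next identity and moving to it can be sketched with a small abstract class. The names `NextableSketch` and `CounterIds` are hypothetical and exist only in this example; the real NextableAttribute may be implemented differently, and the counter logic here is an assumption.

```python
from abc import ABC, abstractmethod


class NextableSketch(ABC):
    """Hypothetical stand-in for an attribute with a predictable next identity."""

    @property
    @abstractmethod
    def next_id(self) -> str:
        """Peek at the next identity without changing state."""

    @property
    @abstractmethod
    def iter_to_next_id(self) -> str:
        """Advance to the next identity and return it."""


class CounterIds(NextableSketch):
    """Toy implementation backed by an integer counter (an assumption)."""

    def __init__(self, start: int = 1) -> None:
        self._current = start

    @property
    def current_id(self) -> str:
        return str(self._current)

    @property
    def next_id(self) -> str:
        # Read-only: repeated access always gives the same answer.
        return str(self._current + 1)

    @property
    def iter_to_next_id(self) -> str:
        # Mutating: every access moves the counter forward by one.
        self._current += 1
        return str(self._current)


ids = CounterIds()
print(ids.next_id, ids.next_id)     # 2 2
print(ids.iter_to_next_id)          # 2
print(ids.current_id, ids.next_id)  # 2 3
```

Because iter_to_next_id changes state on every read, it should be accessed only when a new identity is really needed.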
Serial Attributes
- class smoothcrawler_cluster.crawler.attributes.SerialCrawlerAttribute
The attribute that makes the crawler’s identity serial
This crawler attribute generates crawler identities as serial numbers such as 1, 2, 3, and so on. It is the default crawler attribute when the crawler runs directly on a local machine.
- property name: str
Property with both a getter and a setter. The name of this crawler instance. It MUST be unique within the cluster (the same group) so that the whole crawler cluster can tell every instance apart; for example, the properties current_crawler, current_runner and current_backup in the GroupState meta-data record crawlers by name. This value can be modified through the name option of the Zookeeper object.
- Type: str
- property id_separation: str
Property with both a getter and a setter. The string used to split the value of the name attribute in order to obtain the identity of each crawler instance.
- Type: str
- property current_id: str
Property with only a getter. The current identity of this crawler instance. It MUST be unique.
- Type: str
- property next_id: str
Property with only a getter. The next identity for a crawler instance. This identity MUST be new and unique, i.e. never used before. This property only tells you what the next identity is; it does not actually advance to it.
- Type: str
- property iter_to_next_id: str
Property with only a getter. The next identity for a crawler instance. This identity MUST be new and unique, i.e. never used before. Reading this property does advance to the next identity: afterwards, the value of the name property is equal to the value this property returned.
- Type: str
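As a rough illustration of serial identities, the generator below produces names whose identity part counts up from 1. It is a standalone sketch, not SerialCrawlerAttribute itself: the function `serial_names`, the base name "crawler", and the "_" separator are assumptions chosen for the example.

```python
from itertools import count, islice
from typing import Iterator


def serial_names(base: str, id_separation: str = "_", start: int = 1) -> Iterator[str]:
    """Yield names like base_1, base_2, ... (hypothetical naming scheme)."""
    for number in count(start):
        yield f"{base}{id_separation}{number}"


# Names for three crawler instances, e.g. one per container.
names = list(islice(serial_names("crawler"), 3))
print(names)  # ['crawler_1', 'crawler_2', 'crawler_3']
```

A scheme like this gives each instance a unique name without anyone having to assign names by hand.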