Spark
toksearch.backend.spark.ToksearchSparkConfig
dataclass
Configuration for the Spark backend
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sc | Optional[SparkContext] | SparkContext to use. If not provided, a default SparkContext will be created. | None |
numparts | Optional[int] | Number of partitions to use. If not provided, defaults to the number of records. | None |
cache | bool | Whether to cache the RDD. Default is False. | False |
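The defaults in the table above can be illustrated with a small stand-in. This is a minimal sketch, not toksearch code: `SparkConfigSketch` and `effective_numparts` are hypothetical names that only mirror the documented fields and the documented fallback of `numparts` to the number of records, so the example runs without Spark installed.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SparkConfigSketch:
    """Hypothetical stand-in mirroring the documented fields (not the real class)."""

    sc: Optional[Any] = None        # a SparkContext in the real configuration
    numparts: Optional[int] = None  # None means "use the number of records"
    cache: bool = False


def effective_numparts(config: SparkConfigSketch, num_records: int) -> int:
    """Apply the documented fallback: one partition per record when numparts is unset."""
    if config.numparts is not None:
        return config.numparts
    return num_records


default_config = SparkConfigSketch()
tuned_config = SparkConfigSketch(numparts=4, cache=True)

default_parts = effective_numparts(default_config, num_records=10)
tuned_parts = effective_numparts(tuned_config, num_records=10)
```

With the real class, the same three keyword arguments (`sc`, `numparts`, `cache`) would be passed to `ToksearchSparkConfig`.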
toksearch.backend.spark.SparkRecordSet
Bases: RecordSet
do_cache = cache
instance-attribute
rdd = rdd
instance-attribute
__getitem__(index)
__init__(rdd, cache=False)
__iter__()
__len__()
cache()
cleanup(immediate=False)
Shut down the SparkContext.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
immediate | | Whether to shut down the SparkContext immediately. If false (default), the data will be collected before shutting down the SparkContext and remain accessible after the SparkContext is stopped. | False |
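The difference between the two `immediate` settings can be modelled in plain Python. The sketch below is an assumption-laden illustration, not the library's implementation: `FakeContext` and `RecordSetSketch` are hypothetical stand-ins, and raising `RuntimeError` after an immediate shutdown is a choice made for this example only, since the documentation does not say what the real class does in that case.

```python
class FakeContext:
    """Hypothetical stand-in for a SparkContext; it only tracks whether it was stopped."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


class RecordSetSketch:
    """Hypothetical model of the documented cleanup() behaviour."""

    def __init__(self, context, records):
        self._context = context
        self._distributed = list(records)  # pretend this data lives on the cluster
        self._local = None                 # filled in when the data is collected

    def cleanup(self, immediate=False):
        if not immediate:
            # Collect first, so the data outlives the context.
            self._local = list(self._distributed)
        self._distributed = None
        self._context.stop()

    def records(self):
        if self._local is None:
            # Assumption for this sketch: uncollected data is simply unavailable.
            raise RuntimeError("data was not collected before the context stopped")
        return self._local


kept = RecordSetSketch(FakeContext(), [{"shot": 1}, {"shot": 2}])
kept.cleanup()                    # default: collect, then stop
kept_records = kept.records()     # still accessible after shutdown

dropped = RecordSetSketch(FakeContext(), [{"shot": 3}])
dropped.cleanup(immediate=True)   # stop without collecting
```

The default is the safer choice when results are still needed after the context is released; `immediate=True` suits cases where the results have already been consumed.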
from_records(records, config=None)
classmethod
Create a SparkRecordSet from a list of records.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
records | List[Record] | List of records to create the RecordSet from. | required |
config | Optional[ToksearchSparkConfig] | Configuration for the Spark backend. | None |
Returns:
Name | Type | Description |
---|---|---|
SparkRecordSet | SparkRecordSet | The record set |