doFolder.hashing.calculate module
Core cryptographic hash calculation functions for files and byte content.
This module provides the fundamental hash calculation functionality with support for multiple algorithms, chunked processing, and both memory and streaming I/O modes.
Added in version 2.3.0.
- doFolder.hashing.calculate.calc(content: BinaryIO | bytes, algorithm: str | Iterable[str] = 'sha256', chunkSize: int = 16384, progress: ProgressController | None = None) → dict[str, str]
Calculate the hash of arbitrary content (bytes or file-like object).
This is the main public interface for hashing content that is not necessarily a file. It handles both in-memory content (bytes) and streaming content (file-like objects) uniformly.
- Parameters:
content (Union[BinaryIO, bytes]) – The content to hash. Either a bytes, bytearray, or memoryview object, or any file-like object with a read() method (e.g., a file opened in binary mode, or BytesIO).
algorithm (Union[str, Iterable[str]], optional) – Hash algorithm name(s). Must be supported by hashlib. Common options: ‘sha256’, ‘sha1’, ‘md5’, ‘sha512’. Defaults to ‘sha256’.
chunkSize (int, optional) – Size in bytes of each chunk read from a file-like object. Larger chunks may be more efficient for large files but use more memory. Defaults to 16384 (16 KiB).
progress (ProgressController, optional) – Progress controller for tracking calculation progress. Updates progress based on bytes processed.
Added in version 2.3.0.
- Returns:
Mapping of algorithm names to calculated hashes as lowercase hexadecimal strings.
- Return type:
Dict[str, str]
- Raises:
ValueError – If any specified hash algorithm is not supported.
IOError – If reading from a file-like object fails.
Example
>>> from doFolder.hashing.calculate import calc
>>> calc(b"hello world")
{'sha256': 'b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9'}
>>> with open("file.txt", "rb") as f:
...     hash_values = calc(f, algorithm=["sha256", "md5"])
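The uniform handling of bytes and file-like input can be pictured with a minimal sketch built only on hashlib. The names calc_sketch and chunk_size are hypothetical and chosen for illustration; this is not the doFolder implementation, and it leaves out progress reporting entirely.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Iterable, Union


def calc_sketch(
    content: Union[BinaryIO, bytes, bytearray, memoryview],
    algorithm: Union[str, Iterable[str]] = "sha256",
    chunk_size: int = 16384,
) -> Dict[str, str]:
    """Hypothetical stand-in for calc(); not the doFolder implementation."""
    names = [algorithm] if isinstance(algorithm, str) else list(algorithm)
    # hashlib.new raises ValueError for an unknown algorithm name.
    hashers = {name: hashlib.new(name) for name in names}

    # Wrap in-memory content so both input kinds share one read loop.
    if isinstance(content, (bytes, bytearray, memoryview)):
        stream = io.BytesIO(content)
    else:
        stream = content

    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        for hasher in hashers.values():
            hasher.update(chunk)

    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# The same content hashed from memory and from a stream read 4 bytes at a time.
from_bytes = calc_sketch(b"hello world", ["sha256", "sha1"])
from_stream = calc_sketch(io.BytesIO(b"hello world"), ["sha256", "sha1"], chunk_size=4)
```

Because every chunk is fed to each hasher in turn, the digests do not depend on the chunk size; only memory use and the number of read calls change.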
- doFolder.hashing.calculate.fileHash(file: File, algorithm: str = 'sha256', chunkSize: int = 16384, fileIOMinSize: int = 65536, progress: ProgressController | None = None) → FileHashResult
Calculate the cryptographic hash of a file with comprehensive metadata.
This is the primary function for hashing files, providing a complete FileHashResult with hash value, algorithm info, file path, and timing data. The function automatically optimizes I/O based on file size and handles both small and large files efficiently.
- Parameters:
file (File) – The file object to hash. Must be a valid file from the fileSystem module with accessible content and metadata.
algorithm (str, optional) – Cryptographic hash algorithm to use. Must be supported by Python’s hashlib (e.g., ‘sha256’, ‘md5’, ‘sha1’, ‘sha512’, ‘blake2b’). Defaults to ‘sha256’.
chunkSize (int, optional) – Size in bytes of each chunk read from large files. Larger chunks may improve I/O performance but use more memory. Defaults to 16384 (16 KiB).
fileIOMinSize (int, optional) – File size threshold in bytes for I/O optimization. Files larger than this are read with streaming I/O; smaller files are read entirely into memory. Defaults to 65536 (64 KiB).
progress (ProgressController, optional) – Progress controller for tracking calculation progress. Updates progress based on bytes processed. Allows monitoring progress and potentially canceling the operation.
Added in version 2.3.0.
- Returns:
A complete result object containing:
- hash: The calculated hash as a hexadecimal string
- algorithm: The algorithm used
- path: The file’s path
- mtime: File modification time when hashed
- calcTime: Timestamp of hash calculation
- Return type:
FileHashResult
- Raises:
ValueError – If the hash algorithm is not supported.
IOError – If the file cannot be read.
OSError – If file metadata cannot be accessed.
Note
This function is an interface designed specifically for File.hash; it replaces the hash calculation code originally implemented within File.hash.
The returned FileHashResult can be used with caching systems to avoid recalculating hashes for unchanged files.
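As a rough illustration of the caching idea, the sketch below reuses a stored result while a file's modification time is unchanged. HashRecord, cached_sha256, and the module-level _cache are hypothetical names; HashRecord only mirrors the documented FileHashResult fields and is not the real class. The hash is computed with hashlib directly rather than with fileHash.

```python
import hashlib
import os
import tempfile
import time
from dataclasses import dataclass
from typing import Dict


@dataclass
class HashRecord:
    """Hypothetical stand-in mirroring the documented FileHashResult fields."""
    hash: str
    algorithm: str
    path: str
    mtime: float
    calcTime: float


_cache: Dict[str, HashRecord] = {}


def cached_sha256(path: str) -> HashRecord:
    """Return the cached record unless the file's mtime has changed."""
    mtime = os.stat(path).st_mtime
    record = _cache.get(path)
    if record is not None and record.mtime == mtime:
        return record  # file looks unchanged, so skip rehashing
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    record = HashRecord(digest, "sha256", path, mtime, time.time())
    _cache[path] = record
    return record


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(b"hello world")
    first = cached_sha256(path)
    second = cached_sha256(path)  # served from the cache
    reused = first is second
    # Simulate a modification by moving the mtime forward.
    os.utime(path, (first.mtime + 10, first.mtime + 10))
    third = cached_sha256(path)  # mtime differs, so the hash is recomputed
    recomputed = third is not first
```

Comparing mtime is a cheap heuristic rather than a guarantee: a file rewritten within the timestamp resolution, or with its mtime restored, would be treated as unchanged.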
- doFolder.hashing.calculate.multipleFileHash(file: File, algorithms: str | Iterable[str] = 'sha256', chunkSize: int = 16384, fileIOMinSize: int = 65536, progress: ProgressController | None = None) → dict[str, FileHashResult]
Calculate multiple cryptographic hashes of a file efficiently in a single pass.
This function computes multiple hash algorithms for a single file in one I/O pass, making it more efficient than calling fileHash() multiple times. It reads the file content once and applies all specified algorithms to the same data, returning a dictionary that maps each algorithm name to its FileHashResult.
This approach is particularly beneficial when you need the same file hashed with multiple algorithms (e.g., for security verification, compatibility with different systems, or comprehensive file integrity checking).
- Parameters:
file (File) – The file object to hash. Must be a valid file from the fileSystem module with accessible content and metadata.
algorithms (Union[str, Iterable[str]], optional) – Hash algorithm(s) to use. Can be a single algorithm name (str) or an iterable of algorithm names. Each algorithm must be supported by Python’s hashlib (e.g., ‘sha256’, ‘md5’, ‘sha1’, ‘sha512’, ‘blake2b’). Defaults to ‘sha256’.
chunkSize (int, optional) – Size in bytes of each chunk read from large files. Larger chunks may improve I/O performance but use more memory. Defaults to 16384 (16 KiB).
fileIOMinSize (int, optional) – File size threshold in bytes for I/O optimization. Files larger than this are read with streaming I/O; smaller files are read entirely into memory. Defaults to 65536 (64 KiB).
progress (ProgressController, optional) – Progress controller for tracking calculation progress. Updates progress based on bytes processed. Allows monitoring progress and potentially canceling the operation.
Added in version 2.3.0.
- Returns:
A mapping of algorithm names to FileHashResult objects. Each result contains:
- hash: The calculated hash as a hexadecimal string
- algorithm: The specific algorithm used for this result
- path: The file’s path
- mtime: File modification time when hashed
- calcTime: Timestamp of hash calculation (same for all results)
- Return type:
Dict[str, FileHashResult]
- Raises:
ValueError – If any specified hash algorithm is not supported by hashlib.
IOError – If the file cannot be read.
OSError – If file metadata cannot be accessed.
Example
Calculate multiple hashes for a single file:
# `file` is an existing File object from the fileSystem module.

# Single algorithm (equivalent to fileHash)
results = multipleFileHash(file, "sha256")
sha256_result = results["sha256"]

# Multiple algorithms in one pass
results = multipleFileHash(file, ["sha256", "md5", "sha1"])
for algorithm, result in results.items():
    print(f"{algorithm}: {result.hash}")

# Using with different algorithms
algorithms = ["sha256", "blake2b", "sha512"]
results = multipleFileHash(file, algorithms)
- Performance Notes:
More efficient than multiple calls to fileHash() for the same file
All algorithms process the same data stream simultaneously
File is read only once regardless of the number of algorithms
Memory usage scales with the number of algorithms (one hasher per algorithm)
Calculation time is roughly the sum of individual algorithm times
Note
While this function accepts a single algorithm string for compatibility, if you only need one hash, consider using fileHash() instead as it returns a single FileHashResult rather than a dictionary.
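The size threshold and the single-pass behaviour described for this function can be sketched with plain hashlib and a filesystem path. hash_file_sketch and its snake_case parameters are hypothetical names for illustration, not the doFolder implementation. How a file exactly at the threshold is treated is an assumption here.

```python
import hashlib
import os
import tempfile
from typing import Dict, Iterable


def hash_file_sketch(
    path: str,
    algorithms: Iterable[str],
    chunk_size: int = 16384,
    file_io_min_size: int = 65536,
) -> Dict[str, str]:
    """Hypothetical illustration; not the doFolder implementation."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # Assumption: a file exactly at the threshold counts as "small".
        if os.path.getsize(path) <= file_io_min_size:
            data = f.read()  # small file: one read into memory
            for hasher in hashers.values():
                hasher.update(data)
        else:
            # Large file: stream fixed-size chunks; each chunk feeds every hasher.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                for hasher in hashers.values():
                    hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


payload = b"x" * 100_000  # larger than the default 64 KiB threshold
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as f:
        f.write(payload)
    # Default threshold: the file is streamed in chunks.
    streamed = hash_file_sketch(path, ["sha256", "sha512"])
    # Raised threshold: the same file is read into memory in one call.
    in_memory = hash_file_sketch(path, ["sha256", "sha512"], file_io_min_size=1_000_000)
```

Both paths read the file once and produce identical digests, so the threshold changes only the memory and I/O pattern and never the result.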
- doFolder.hashing.calculate.unsupport(algorithms: Iterable[str])
Check for unsupported hash algorithms.
- Parameters:
algorithms – Iterable of algorithm names to check.
- Returns:
Tuple of unsupported algorithm names.
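A check of this kind might look like the following sketch. It assumes that "supported" means the name appears in hashlib.algorithms_available; the source does not state how unsupport() decides this, so unsupported_sketch is a hypothetical illustration only.

```python
import hashlib
from typing import Iterable, Tuple


def unsupported_sketch(algorithms: Iterable[str]) -> Tuple[str, ...]:
    """Hypothetical illustration; not the doFolder implementation."""
    # Assumption: "supported" means listed in hashlib.algorithms_available.
    return tuple(
        name for name in algorithms if name not in hashlib.algorithms_available
    )


missing = unsupported_sketch(["sha256", "not-a-real-hash"])
if missing:
    print(f"Unsupported algorithms: {', '.join(missing)}")
```

Validating names up front like this lets a caller report every unsupported algorithm at once before any file is read.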