doFolder.hashing.cache module¶
File hash cache management systems for persistent and in-memory result storage.
This module provides abstract and concrete implementations of cache managers that can store and retrieve FileHashResult objects to avoid redundant hash calculations.
Added in version 2.3.0.
- class doFolder.hashing.cache.FileHashCacheManagerBase¶
Bases:
ABCAbstract base class for file hash cache management systems.
This class defines the interface for implementing custom cache managers that can store and retrieve FileHashResult objects. It provides the foundation for building persistent or specialized caching solutions beyond the default in-memory cache used by FileHashCalculator.
Implementing classes can provide various storage backends such as: - Database-backed caching for persistence across program runs - Redis or other external cache systems for distributed applications - File-based caching for single-machine persistence - Memory-mapped file caching for large datasets
The interface is designed to be simple and focused, requiring only two core operations: retrieving cached results and storing new results.
Note
This is an abstract base class and cannot be instantiated directly. Subclasses must implement the get() and set() methods to provide actual caching functionality.
Added in version 2.2.4.
- get(file: File, algorithm: str) FileHashResult | None¶
Retrieve a cached hash result for the specified file.
This method should look up and return any previously cached hash result for the given file. The implementation should handle cache validation logic, such as checking if the cached result is still valid based on file modification times or other criteria.
- Parameters:
file (File) – The file object to look up in the cache. The implementation can use any file attributes (path, modification time, size, etc.) as cache keys or validation criteria.
- Returns:
- The cached hash result if found and valid,
None if no valid cached result exists. Returning None indicates that a new hash calculation should be performed.
- Return type:
FileHashResult or None
- Implementation Guidelines:
Use file.path as the primary cache key
Consider file.state.st_mtime for cache validation
Handle cases where cached data may be corrupted or incomplete
Return None for any uncertain cases to trigger recalculation
Optimize for fast lookup since this method may be called frequently
Note
This is an abstract method that must be implemented by subclasses. The base class implementation raises NotImplementedError.
- getKey(target, algorithm=None)¶
Generate a cache key tuple from target object and algorithm.
This method creates a standardized cache key tuple that uniquely identifies a file hash entry based on the target object and algorithm. The key is used internally by cache managers to store and retrieve hash results efficiently.
- Parameters:
target (Union[FileHashResult, FileSystemItem, Any]) – The target object to generate a key for. Can be: - FileHashResult: Uses the result’s path and algorithm - FileSystemItem (File, Path, etc.): Uses the item’s path with provided algorithm - Any other object: Converts to string and uses with provided algorithm
algorithm (str, optional) – Hash algorithm name. Required when target is not a FileHashResult. Ignored when target is a FileHashResult (uses result’s algorithm instead).
- Returns:
- A tuple containing (path_string, algorithm_string) that
serves as a unique cache key. The path is always converted to string for consistent key generation across different path-like objects.
- Return type:
Tuple[str, str]
- Raises:
AssertionError – If algorithm is not provided when target is not a FileHashResult.
Note
This method is primarily intended for internal use by cache managers to maintain consistent key generation across different input types. The returned tuple format ensures stable dictionary keys for caching.
- set(file: File, result: FileHashResult) None¶
Store a hash result in the cache for future retrieval.
This method should persistently store the provided hash result so it can be retrieved later via the get() method. The implementation should handle any necessary serialization, storage optimization, and error handling.
- Parameters:
file (File) – The file object that was hashed. This provides context for the cache entry, including path and metadata that may be useful for cache organization or validation.
result (FileHashResult) – The complete hash result to store. This includes the hash value, algorithm used, file path, modification time, and calculation timestamp.
- Implementation Guidelines:
Use file.path or result.path as the cache key
Store all relevant fields from the FileHashResult
Handle storage errors gracefully (logging, fallback strategies)
Consider implementing cache size limits or expiration policies
Ensure thread-safety if the cache will be used concurrently
Optimize for write performance since caching should not significantly slow down hash calculations
- Raises:
The method should handle storage errors internally when possible, –
but may raise exceptions for critical failures that should halt –
the application (e.g., disk full, permission denied). –
Note
This is an abstract method that must be implemented by subclasses. The base class implementation raises NotImplementedError.
- class doFolder.hashing.cache.LfuMemoryFileHashManager(maxSize: int)¶
Bases:
FileHashCacheManagerBaseLRU (Least Recently Used) memory cache manager with size limits.
This implementation provides memory-bounded caching using an OrderedDict to track access patterns. When the cache exceeds the specified maximum size, the least recently used entries are automatically evicted to maintain the size limit.
The cache uses LRU eviction policy: - Recently accessed items are moved to the end of the cache - When size limit is exceeded, items from the beginning are removed - Both get() and set() operations update the access order
This manager is ideal for long-running applications that process many files where memory usage needs to be controlled while maintaining good cache hit rates.
- Parameters:
maxSize (int) – Maximum number of hash results to keep in cache. When exceeded, least recently used entries are evicted.
Note
Despite the class name “LfuMemoryFileHashManager”, this implementation actually uses LRU (Least Recently Used) eviction policy. The name is kept for backward compatibility but may be misleading - consider it as an LRU cache manager.
Added in version 2.2.4.
- get(file: File, algorithm: str)¶
Retrieve a cached hash result for the specified file.
This method should look up and return any previously cached hash result for the given file. The implementation should handle cache validation logic, such as checking if the cached result is still valid based on file modification times or other criteria.
- Parameters:
file (File) – The file object to look up in the cache. The implementation can use any file attributes (path, modification time, size, etc.) as cache keys or validation criteria.
- Returns:
- The cached hash result if found and valid,
None if no valid cached result exists. Returning None indicates that a new hash calculation should be performed.
- Return type:
FileHashResult or None
- Implementation Guidelines:
Use file.path as the primary cache key
Consider file.state.st_mtime for cache validation
Handle cases where cached data may be corrupted or incomplete
Return None for any uncertain cases to trigger recalculation
Optimize for fast lookup since this method may be called frequently
Note
This is an abstract method that must be implemented by subclasses. The base class implementation raises NotImplementedError.
- maxSize: int¶
- set(file: File, result: FileHashResult)¶
Store a hash result in the cache for future retrieval.
This method should persistently store the provided hash result so it can be retrieved later via the get() method. The implementation should handle any necessary serialization, storage optimization, and error handling.
- Parameters:
file (File) – The file object that was hashed. This provides context for the cache entry, including path and metadata that may be useful for cache organization or validation.
result (FileHashResult) – The complete hash result to store. This includes the hash value, algorithm used, file path, modification time, and calculation timestamp.
- Implementation Guidelines:
Use file.path or result.path as the cache key
Store all relevant fields from the FileHashResult
Handle storage errors gracefully (logging, fallback strategies)
Consider implementing cache size limits or expiration policies
Ensure thread-safety if the cache will be used concurrently
Optimize for write performance since caching should not significantly slow down hash calculations
- Raises:
The method should handle storage errors internally when possible, –
but may raise exceptions for critical failures that should halt –
the application (e.g., disk full, permission denied). –
Note
This is an abstract method that must be implemented by subclasses. The base class implementation raises NotImplementedError.
- class doFolder.hashing.cache.MemoryFileHashManager¶
Bases:
FileHashCacheManagerBaseSimple in-memory cache manager for file hash results.
This implementation provides basic caching functionality using a Python dictionary to store hash results in memory. The cache persists for the lifetime of the MemoryFileHashManager instance but is not persistent across program runs.
This is the default cache manager when caching is enabled and provides good performance for most use cases where memory usage is not a primary concern.
Note
The cache grows without bounds, so for applications processing many files, consider using LfuMemoryFileHashManager instead to limit memory usage.
Added in version 2.2.4.
- get(file: File, algorithm: str)¶
Retrieve a cached hash result for the specified file.
This method should look up and return any previously cached hash result for the given file. The implementation should handle cache validation logic, such as checking if the cached result is still valid based on file modification times or other criteria.
- Parameters:
file (File) – The file object to look up in the cache. The implementation can use any file attributes (path, modification time, size, etc.) as cache keys or validation criteria.
- Returns:
- The cached hash result if found and valid,
None if no valid cached result exists. Returning None indicates that a new hash calculation should be performed.
- Return type:
FileHashResult or None
- Implementation Guidelines:
Use file.path as the primary cache key
Consider file.state.st_mtime for cache validation
Handle cases where cached data may be corrupted or incomplete
Return None for any uncertain cases to trigger recalculation
Optimize for fast lookup since this method may be called frequently
Note
This is an abstract method that must be implemented by subclasses. The base class implementation raises NotImplementedError.
- set(file: File, result: FileHashResult)¶
Store a hash result in the cache for future retrieval.
This method should persistently store the provided hash result so it can be retrieved later via the get() method. The implementation should handle any necessary serialization, storage optimization, and error handling.
- Parameters:
file (File) – The file object that was hashed. This provides context for the cache entry, including path and metadata that may be useful for cache organization or validation.
result (FileHashResult) – The complete hash result to store. This includes the hash value, algorithm used, file path, modification time, and calculation timestamp.
- Implementation Guidelines:
Use file.path or result.path as the cache key
Store all relevant fields from the FileHashResult
Handle storage errors gracefully (logging, fallback strategies)
Consider implementing cache size limits or expiration policies
Ensure thread-safety if the cache will be used concurrently
Optimize for write performance since caching should not significantly slow down hash calculations
- Raises:
The method should handle storage errors internally when possible, –
but may raise exceptions for critical failures that should halt –
the application (e.g., disk full, permission denied). –
Note
This is an abstract method that must be implemented by subclasses. The base class implementation raises NotImplementedError.
- class doFolder.hashing.cache.NullFileHashManager¶
Bases:
FileHashCacheManagerBaseNo-operation cache manager that disables caching entirely.
This implementation provides a null object pattern for situations where caching is not desired. All cache operations are no-ops: - get() always returns None (cache miss) - set() discards the provided result
Use this manager when you want to disable caching but still use the FileHashCalculator interface, or for testing scenarios where cache behavior should be eliminated.
This manager has zero memory overhead and provides consistent performance characteristics since no cache lookups or storage operations are performed.
Added in version 2.2.4.
- get(file: File, algorithm: str)¶
Retrieve a cached hash result for the specified file.
This method should look up and return any previously cached hash result for the given file. The implementation should handle cache validation logic, such as checking if the cached result is still valid based on file modification times or other criteria.
- Parameters:
file (File) – The file object to look up in the cache. The implementation can use any file attributes (path, modification time, size, etc.) as cache keys or validation criteria.
- Returns:
- The cached hash result if found and valid,
None if no valid cached result exists. Returning None indicates that a new hash calculation should be performed.
- Return type:
FileHashResult or None
- Implementation Guidelines:
Use file.path as the primary cache key
Consider file.state.st_mtime for cache validation
Handle cases where cached data may be corrupted or incomplete
Return None for any uncertain cases to trigger recalculation
Optimize for fast lookup since this method may be called frequently
Note
This is an abstract method that must be implemented by subclasses. The base class implementation raises NotImplementedError.
- set(file: File, result: FileHashResult)¶
Store a hash result in the cache for future retrieval.
This method should persistently store the provided hash result so it can be retrieved later via the get() method. The implementation should handle any necessary serialization, storage optimization, and error handling.
- Parameters:
file (File) – The file object that was hashed. This provides context for the cache entry, including path and metadata that may be useful for cache organization or validation.
result (FileHashResult) – The complete hash result to store. This includes the hash value, algorithm used, file path, modification time, and calculation timestamp.
- Implementation Guidelines:
Use file.path or result.path as the cache key
Store all relevant fields from the FileHashResult
Handle storage errors gracefully (logging, fallback strategies)
Consider implementing cache size limits or expiration policies
Ensure thread-safety if the cache will be used concurrently
Optimize for write performance since caching should not significantly slow down hash calculations
- Raises:
The method should handle storage errors internally when possible, –
but may raise exceptions for critical failures that should halt –
the application (e.g., disk full, permission denied). –
Note
This is an abstract method that must be implemented by subclasses. The base class implementation raises NotImplementedError.