Public API for libffuzzy.
#include <stdbool.h>
#include <stddef.h>
Data Structures | |
struct | ffuzzy_digest |
The type that stores an ssdeep digest after parsing. | |
Macros | |
#define | FFUZZY_SPAMSUM_LENGTH 64 |
Maximum length for the digest block. | |
#define | FFUZZY_MIN_BLOCKSIZE 3 |
Minimum block size to start in ssdeep implementation. | |
#define | FFUZZY_MIN_MATCH 7 |
The minimal match (length of common substring) required for (at least) one of the block digests. | |
Functions | |
Comparison and Parsing | |
bool | ffuzzy_read_digest (ffuzzy_digest *digest, const char *s) |
Read an ssdeep digest from a string. | |
int | ffuzzy_compare_digest (const ffuzzy_digest *d1, const ffuzzy_digest *d2) |
Compare two fuzzy hashes and compute a similarity score. | |
int | ffuzzy_compare (const char *str1, const char *str2) |
Compute a similarity score for the given ssdeep hash strings. | |
Block Size Utilities | |
bool | ffuzzy_blocksize_is_valid (unsigned long block_size) |
Determines whether the given block size is valid for use in libffuzzy. | |
bool | ffuzzy_blocksize_is_natural (unsigned long block_size) |
Determines whether the given block size is "natural". | |
bool | ffuzzy_blocksize_is_near (unsigned long block_size1, unsigned long block_size2) |
Determines whether the given block sizes are "near". | |
bool | ffuzzy_blocksize_is_far_le (unsigned long block_size1, unsigned long block_size2) |
Determines whether the given ordered block sizes are "far" enough. | |
Digest Utilities | |
bool | ffuzzy_digest_is_valid_lengths (const ffuzzy_digest *digest) |
Determines whether the block lengths of the given digest are valid. | |
bool | ffuzzy_digest_is_valid_buffer (const ffuzzy_digest *digest) |
Determines whether the digest blocks are valid. | |
bool | ffuzzy_digest_is_natural_buffer (const ffuzzy_digest *digest) |
Determines whether the digest blocks are valid and "natural". | |
bool | ffuzzy_digest_is_valid (const ffuzzy_digest *digest) |
Determines whether the given digest is valid. | |
bool | ffuzzy_digest_is_natural (const ffuzzy_digest *digest) |
Determines whether the given digest is valid and "natural". | |
int | ffuzzy_digestcmp (const ffuzzy_digest *d1, const ffuzzy_digest *d2) |
Compare two ffuzzy_digest values. | |
int | ffuzzy_digestcmp_blocksize (const ffuzzy_digest *d1, const ffuzzy_digest *d2) |
Compare two ffuzzy_digest values by block size. | |
int | ffuzzy_digestcmp_blocksize_n (const ffuzzy_digest *d1, const ffuzzy_digest *d2) |
Compare two ffuzzy_digest values by whether their block sizes are "natural" and by block size value. | |
bool | ffuzzy_pretty_digest (char *buf, size_t buflen, const ffuzzy_digest *digest) |
Convert an ffuzzy_digest to a string. | |
Optimized / Specialized Comparison | |
int | ffuzzy_compare_digest_near (const ffuzzy_digest *d1, const ffuzzy_digest *d2) |
Compare two fuzzy hashes, assuming that the block sizes of the given hashes are "near". | |
int | ffuzzy_compare_digest_near_eq (const ffuzzy_digest *d1, const ffuzzy_digest *d2) |
Compare two fuzzy hashes, assuming that the two block sizes are the same. | |
int | ffuzzy_compare_digest_near_lt (const ffuzzy_digest *d1, const ffuzzy_digest *d2) |
Compare two fuzzy hashes, assuming that the second block size is double the first. | |
Internal Comparison Utilities | |
int | ffuzzy_score_cap (int s1len, int s2len, unsigned long block_size) |
Retrieve the score cap for the given block lengths and block size. | |
int | ffuzzy_score_cap_1 (int minslen, unsigned long block_size) |
Retrieve the score cap for the given block length and block size. | |
int | ffuzzy_score_strings (const char *s1, size_t s1len, const char *s2, size_t s2len, unsigned long block_size) |
Compute a partial similarity score for two given block strings and a block size. | |
bool ffuzzy_blocksize_is_far_le | ( | unsigned long | block_size1, |
unsigned long | block_size2 | ||
) |
Determines whether the given ordered block sizes are "far" enough.
In this context, "far" means that the second block size is greater than double the first block size.
For digests sorted by block size, "far" means that no subsequent entries will match.
This function is simple enough that you may want to inline or reimplement it; nothing prevents you from doing so.
block_size1 | Valid block size 1 |
block_size2 | Valid block size 2 (must be equal to or greater than block_size1) |
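Because the text above invites reimplementation, here is a minimal sketch of the "far" test and of the early exit it allows when scanning block sizes in ascending order. The names `is_far_le_sketch` and `count_candidates_sketch` are hypothetical and are not part of libffuzzy; the sketch assumes both block sizes are valid and that the first is not greater than the second.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reimplementation of the "far" test: true when bs2 is
   greater than double bs1. Assumes bs1 <= bs2. Written as a
   subtraction so that 2 * bs1 can never overflow. */
bool is_far_le_sketch(unsigned long bs1, unsigned long bs2)
{
    return bs2 - bs1 > bs1;
}

/* Given block sizes sorted in ascending order, count the entries after
   index i that are not yet ruled out for sizes[i]. The scan stops at
   the first "far" entry, because every later entry is "far" as well. */
size_t count_candidates_sketch(const unsigned long *sizes, size_t n, size_t i)
{
    size_t count = 0;
    for (size_t j = i + 1; j < n; j++) {
        if (is_far_le_sketch(sizes[i], sizes[j]))
            break;
        count++;
    }
    return count;
}
```

Note that an entry that is not "far" is only a candidate: whether it can actually be compared still depends on the "near" relation.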
bool ffuzzy_blocksize_is_natural | ( | unsigned long | block_size | ) |
Determines whether the given block size is "natural".
In this context, "natural" means that a fuzzy hash with the given parameter could have been generated by ssdeep or its backend, libfuzzy. Depending on the job, handling only "natural" digests may make your program more efficient.
This function checks not only that the block size is valid but also that it is the product of FFUZZY_MIN_BLOCKSIZE and a power of two.
block_size | Block size (which may not be valid or "natural") |
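The "product of FFUZZY_MIN_BLOCKSIZE and a power of two" rule can be sketched as follows. The function name `blocksize_is_natural_sketch` and the local macro `MIN_BLOCKSIZE` are hypothetical; the sketch covers only the power-of-two rule and omits the separate validity (overflow) check that the library function also performs.

```c
#include <stdbool.h>

#define MIN_BLOCKSIZE 3UL /* mirrors FFUZZY_MIN_BLOCKSIZE */

/* Hypothetical check: is bs equal to MIN_BLOCKSIZE * 2^k for some k >= 0?
   The upper bound that libffuzzy imposes on valid block sizes is NOT
   checked here. */
bool blocksize_is_natural_sketch(unsigned long bs)
{
    if (bs < MIN_BLOCKSIZE || bs % MIN_BLOCKSIZE != 0)
        return false;
    unsigned long q = bs / MIN_BLOCKSIZE;
    /* q >= 1 here; a power of two has exactly one bit set. */
    return (q & (q - 1)) == 0;
}
```

Under this rule, 3, 6, 12, 24 and 48 are "natural", while 4 and 9 are not.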
bool ffuzzy_blocksize_is_near | ( | unsigned long | block_size1, |
unsigned long | block_size2 | ||
) |
Determines whether the given block sizes are "near".
In this context, "near" means that the two block sizes are equal or that one block size is twice the other.
If this function returns true, it is safe to use the ffuzzy_compare_digest_near function on two digests that have the given block sizes.
block_size1 | Valid block size 1 |
block_size2 | Valid block size 2 |
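The "near" relation described above can be sketched in a few lines. The function name `blocksize_is_near_sketch` is hypothetical, and the sketch assumes both arguments are valid block sizes.

```c
#include <stdbool.h>

/* Hypothetical check: true when the two block sizes are equal or when
   one is exactly twice the other. The argument order does not matter.
   The doubling test is written as a subtraction to avoid overflow. */
bool blocksize_is_near_sketch(unsigned long a, unsigned long b)
{
    if (a == b)
        return true;
    unsigned long lo = a < b ? a : b;
    unsigned long hi = a < b ? b : a;
    return hi - lo == lo;
}
```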
bool ffuzzy_blocksize_is_valid | ( | unsigned long | block_size | ) |
Determines whether the given block size is valid for use in libffuzzy.
To prevent arithmetic overflow, not all unsigned long values are valid in libffuzzy. This function determines whether the given block size is valid and safe to use in libffuzzy.
You do not need this function if you use the ffuzzy_read_digest function, because it always returns a valid digest on success.
Note that this is a restriction of the implementation, not of the ssdeep digest format.
block_size | Block size (which may not be valid) |
int ffuzzy_compare | ( | const char * | str1, |
const char * | str2 | ||
) |
Compute a similarity score for the given ssdeep hash strings.
[in] | str1 | ssdeep hash 1 |
[in] | str2 | ssdeep hash 2 |
int ffuzzy_compare_digest | ( | const ffuzzy_digest * | d1, |
const ffuzzy_digest * | d2 | ||
) |
Compare two fuzzy hashes and compute a similarity score.
[in] | d1 | Valid digest 1 |
[in] | d2 | Valid digest 2 |
int ffuzzy_compare_digest_near | ( | const ffuzzy_digest * | d1, |
const ffuzzy_digest * | d2 | ||
) |
Compare two fuzzy hashes, assuming that the block sizes of the given hashes are "near".
In this context, "near" means that the two block sizes are equal or that one block size is twice the other.
This function assumes that the two block sizes are "near" (ffuzzy_blocksize_is_near returns true for them), which makes the computation slightly faster.
[in] | d1 | Valid digest 1 |
[in] | d2 | Valid digest 2 |
int ffuzzy_compare_digest_near_eq | ( | const ffuzzy_digest * | d1, |
const ffuzzy_digest * | d2 | ||
) |
Compare two fuzzy hashes, assuming that the two block sizes are the same.
This function assumes that d1 and d2 have the same block size.
[in] | d1 | Valid digest 1 (with same block size as d2) |
[in] | d2 | Valid digest 2 (with same block size as d1) |
int ffuzzy_compare_digest_near_lt | ( | const ffuzzy_digest * | d1, |
const ffuzzy_digest * | d2 | ||
) |
Compare two fuzzy hashes, assuming that the second block size is double the first.
This function assumes that the block size of d2 is double the block size of d1.
[in] | d1 | Valid digest 1 |
[in] | d2 | Valid digest 2 (with double the block size of d1) |
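The three specialized comparison functions differ only in what they assume about the two block sizes. As a rough sketch of how a caller might pick between them, the following classifies a pair of block sizes. The enum `bs_relation` and the function `classify_block_sizes` are hypothetical names and are not part of libffuzzy.

```c
/* Hypothetical classification of a pair of valid block sizes. */
typedef enum {
    REL_EQ,       /* same block size: ffuzzy_compare_digest_near_eq applies */
    REL_LT,       /* bs2 == 2 * bs1: ffuzzy_compare_digest_near_lt applies  */
    REL_GT,       /* bs1 == 2 * bs2: the mirrored case of REL_LT            */
    REL_NOT_NEAR  /* none of the "near" variants applies                    */
} bs_relation;

bs_relation classify_block_sizes(unsigned long bs1, unsigned long bs2)
{
    if (bs1 == bs2)
        return REL_EQ;
    if (bs1 < bs2 && bs2 - bs1 == bs1)  /* bs2 is double bs1, overflow-safe */
        return REL_LT;
    if (bs2 < bs1 && bs1 - bs2 == bs2)  /* bs1 is double bs2, overflow-safe */
        return REL_GT;
    return REL_NOT_NEAR;
}
```

In the `REL_GT` case, a caller would presumably swap the two digests before calling ffuzzy_compare_digest_near_lt; that relies on the score being symmetric, which is an assumption here rather than something this page states.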
bool ffuzzy_digest_is_natural | ( | const ffuzzy_digest * | digest | ) |
Determines whether the given digest is valid and "natural".
[in] | digest | Digest (which may not be valid or natural) |
bool ffuzzy_digest_is_natural_buffer | ( | const ffuzzy_digest * | digest | ) |
Determines whether the digest blocks are valid and "natural".
This function determines whether the valid range of the ffuzzy_digest::digest values consists of base64 characters (in other words, whether it is "natural").
This function requires valid digest block lengths. If the digest block lengths are not guaranteed to be valid, call ffuzzy_digest_is_valid_lengths first.
You may need this function even after a successful call to ffuzzy_read_digest, because ffuzzy_read_digest is not guaranteed to produce digests with "natural" digest blocks.
However, if you are only comparing digests, this check is unnecessary because fuzzy hash comparison does not decode base64 characters (it just compares them).
You need this function ONLY if you must verify that the given digest is truly "natural".
[in] | digest | Digest (which may not be valid or natural but block lengths are valid) |
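The base64 part of this check can be sketched on a single digest block. The function name `block_is_base64_sketch` is hypothetical; the sketch assumes the standard base64 alphabet (`A-Z`, `a-z`, `0-9`, `+`, `/`) and works on a raw character buffer rather than on an ffuzzy_digest, whose layout is not described here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: does every one of the first len characters of
   block belong to the standard base64 alphabet? */
bool block_is_base64_sketch(const char *block, size_t len)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789+/";
    for (size_t i = 0; i < len; i++) {
        /* strchr would also match the terminating NUL, so reject it first. */
        if (block[i] == '\0' || strchr(alphabet, block[i]) == NULL)
            return false;
    }
    return true;
}
```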
bool ffuzzy_digest_is_valid | ( | const ffuzzy_digest * | digest | ) |
Determines whether given digest is valid.
[in] | digest | Digest (which may not be valid) |
bool ffuzzy_digest_is_valid_buffer | ( | const ffuzzy_digest * | digest | ) |
Determines whether the digest blocks are valid.
This function determines whether the digest blocks contain no sequences of four or more identical characters.
This function requires valid digest block lengths. If the digest block lengths are not guaranteed to be valid, call ffuzzy_digest_is_valid_lengths first.
You do not need this function if you use the ffuzzy_read_digest function, because it always returns valid digests on success.
[in] | digest | Digest (which may not be valid but block lengths are valid) |
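The "no four or more identical characters in a row" rule can be sketched on a single digest block. The function name `block_has_no_long_runs_sketch` is hypothetical, and the sketch operates on a raw character buffer rather than on an ffuzzy_digest.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: true when the first len characters of block
   contain no run of four or more identical consecutive characters. */
bool block_has_no_long_runs_sketch(const char *block, size_t len)
{
    size_t run = 1; /* length of the current run of equal characters */
    for (size_t i = 1; i < len; i++) {
        run = (block[i] == block[i - 1]) ? run + 1 : 1;
        if (run >= 4)
            return false;
    }
    return true;
}
```

For example, `"AAAB"` passes (the run has length three), while `"AAAAB"` does not.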
bool ffuzzy_digest_is_valid_lengths | ( | const ffuzzy_digest * | digest | ) |
Determines whether block lengths of given digest are valid.
[in] | digest | Digest (which may not be valid) |
int ffuzzy_digestcmp | ( | const ffuzzy_digest * | d1, |
const ffuzzy_digest * | d2 | ||
) |
Compare two ffuzzy_digest values.
This comparison has priorities.
[in] | d1 | Valid digest 1 |
[in] | d2 | Valid digest 2 |
int ffuzzy_digestcmp_blocksize | ( | const ffuzzy_digest * | d1, |
const ffuzzy_digest * | d2 | ||
) |
Compare two ffuzzy_digest values by block sizes.
[in] | d1 | Valid digest 1 |
[in] | d2 | Valid digest 2 |
int ffuzzy_digestcmp_blocksize_n | ( | const ffuzzy_digest * | d1, |
const ffuzzy_digest * | d2 | ||
) |
Compare two ffuzzy_digest values by whether block sizes are "natural" and block size values.
This comparison has priorities.
[in] | d1 | Valid digest 1 |
[in] | d2 | Valid digest 2 |
bool ffuzzy_pretty_digest | ( | char * | buf, |
size_t | buflen, | ||
const ffuzzy_digest * | digest | ||
) |
Convert an ffuzzy_digest to a string.
[out] | buf | Buffer to store string |
buflen | Size of buf | |
[in] | digest | A valid digest to convert |
bool ffuzzy_read_digest | ( | ffuzzy_digest * | digest, |
const char * | s | ||
) |
Read an ssdeep digest from a string.
On success, this function always stores a valid digest.
[out] | digest | Pointer to the buffer that receives the valid digest after parsing. |
[in] | s | The string that contains an ssdeep digest. |
int ffuzzy_score_cap | ( | int | s1len, |
int | s2len, | ||
unsigned long | block_size | ||
) |
Retrieve the score cap for the given block lengths and block size.
The (partial) similarity score is capped when the blocks are short and the block size is small, to avoid exaggerating the match. This function returns that score cap for the given block lengths and block size.
s1len | Length of block 1 |
s2len | Length of block 2 |
block_size | Block size |
If s1len or s2len is outside the range [0, FFUZZY_SPAMSUM_LENGTH], the return value is undefined.
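To give a feel for how such a cap behaves, the following sketch is modeled on the capping rule in ssdeep's reference implementation, where the score is limited to `block_size / MIN_BLOCKSIZE * min(len1, len2)` for small block sizes. It is an assumption that libffuzzy follows the same rule; the function name `score_cap_sketch`, the `ROLLING_WINDOW` value of 7, and the choice to return 100 when no cap applies are all illustrative and are not taken from this page.

```c
#define MIN_BLOCKSIZE  3UL /* mirrors FFUZZY_MIN_BLOCKSIZE */
#define ROLLING_WINDOW 7UL /* window size in ssdeep's reference code (assumption) */

/* Hypothetical score cap. Both lengths are assumed to lie in
   [0, FFUZZY_SPAMSUM_LENGTH]; the result is clamped to 100, which is
   an assumption about the maximum score. */
int score_cap_sketch(int s1len, int s2len, unsigned long block_size)
{
    int minlen = s1len < s2len ? s1len : s2len;

    /* Large block sizes are not capped in the reference rule. */
    if (block_size >= (99 + ROLLING_WINDOW) / ROLLING_WINDOW * MIN_BLOCKSIZE)
        return 100;

    unsigned long cap = block_size / MIN_BLOCKSIZE * (unsigned long)minlen;
    return cap > 100 ? 100 : (int)cap;
}
```

With these assumptions, two 10-character blocks at block size 3 can score at most 10, while blocks at block size 48 or above are not capped at all.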
int ffuzzy_score_cap_1 | ( | int | minslen, |
unsigned long | block_size | ||
) |
Retrieve the score cap for the given block length and block size.
The ffuzzy_score_cap function computes the score cap from the block size and the minimum of the two given block lengths. This function exposes the internal interface of ffuzzy_score_cap.
minslen | Minimum length of the blocks |
block_size | Block size |
If minslen is outside the range [0, FFUZZY_SPAMSUM_LENGTH], the return value is undefined.
int ffuzzy_score_strings | ( | const char * | s1, |
size_t | s1len, | ||
const char * | s2, | ||
size_t | s2len, | ||
unsigned long | block_size | ||
) |
Compute a partial similarity score for two given block strings and a block size.
In fuzzy hash comparison, the digest blocks that share the same block size are selected and compared. This is the internal interface for ffuzzy_compare and ffuzzy_compare_digest.
[in] | s1 | Digest block 1 |
s1len | Length of s1 | |
[in] | s2 | Digest block 2 |
s2len | Length of s2 | |
block_size | Block size for two digest blocks |