libffuzzy  2.1
Fast ssdeep comparison library
ffuzzy.h File Reference

Public API for libffuzzy.

#include <stdbool.h>
#include <stddef.h>


Data Structures

struct  ffuzzy_digest
 Type that stores a parsed (normalized) ssdeep digest.
 
struct  ffuzzy_udigest
 Type that stores a parsed, unnormalized ssdeep digest.
 

Macros

#define FFUZZY_SPAMSUM_LENGTH   64
 Maximum length of a digest block.
 
#define FFUZZY_MIN_BLOCKSIZE   3
 Minimum block size in the ssdeep implementation (the smallest block size it starts from).
 
#define FFUZZY_MIN_MATCH   7
 Minimum match (length of a common substring) required for at least one of the block digest comparisons.
 

Functions

Comparison and Parsing
bool ffuzzy_read_digest (ffuzzy_digest *digest, const char *s)
 Read an ssdeep digest from a string.
 
int ffuzzy_compare_digest (const ffuzzy_digest *d1, const ffuzzy_digest *d2)
 Compare two fuzzy hashes and compute their similarity score.
 
int ffuzzy_compare (const char *str1, const char *str2)
 Compute the similarity score of two ssdeep hash strings.
 
Optimized / Specialized Comparison
int ffuzzy_compare_digest_near (const ffuzzy_digest *d1, const ffuzzy_digest *d2)
 Compare two fuzzy hashes, assuming that their block sizes are "near".
 
int ffuzzy_compare_digest_near_eq (const ffuzzy_digest *d1, const ffuzzy_digest *d2)
 Compare two fuzzy hashes, assuming that their block sizes are equal.
 
int ffuzzy_compare_digest_near_lt (const ffuzzy_digest *d1, const ffuzzy_digest *d2)
 Compare two fuzzy hashes, assuming that the second block size is twice the first.
 
Block Size Utilities
bool ffuzzy_blocksize_is_valid (unsigned long block_size)
 Determines whether the given block size is valid for use in libffuzzy.
 
bool ffuzzy_blocksize_is_natural (unsigned long block_size)
 Determines whether the given block size is "natural".
 
bool ffuzzy_blocksize_is_near (unsigned long block_size1, unsigned long block_size2)
 Determines whether the given block sizes are "near".
 
bool ffuzzy_blocksize_is_far_le (unsigned long block_size1, unsigned long block_size2)
 Determines whether the given ordered block sizes are "far" enough apart.
 
Digest Utilities
bool ffuzzy_digest_is_valid_lengths (const ffuzzy_digest *digest)
 Determines whether the block lengths of the given digest are valid.
 
bool ffuzzy_digest_is_valid_buffer (const ffuzzy_digest *digest)
 Determines whether the digest blocks are valid.
 
bool ffuzzy_digest_is_natural_buffer (const ffuzzy_digest *digest)
 Determines whether the digest blocks are valid and "natural".
 
bool ffuzzy_digest_is_valid (const ffuzzy_digest *digest)
 Determines whether the given digest is valid.
 
bool ffuzzy_digest_is_natural (const ffuzzy_digest *digest)
 Determines whether the given digest is valid and "natural".
 
int ffuzzy_digestcmp (const ffuzzy_digest *d1, const ffuzzy_digest *d2)
 Compare two ffuzzy_digest values.
 
int ffuzzy_digestcmp_blocksize (const ffuzzy_digest *d1, const ffuzzy_digest *d2)
 Compare two ffuzzy_digest values by block size.
 
int ffuzzy_digestcmp_blocksize_n (const ffuzzy_digest *d1, const ffuzzy_digest *d2)
 Compare two ffuzzy_digest values by whether their block sizes are "natural", then by block size.
 
bool ffuzzy_pretty_digest (char *buf, size_t buflen, const ffuzzy_digest *digest)
 Convert an ffuzzy_digest to a string.
 
Unnormalized Digests
bool ffuzzy_read_udigest (ffuzzy_udigest *udigest, const char *s)
 Read an unnormalized ssdeep digest from a string.
 
bool ffuzzy_udigest_is_valid_lengths (const ffuzzy_udigest *udigest)
 Determines whether the block lengths of the given digest are valid.
 
bool ffuzzy_udigest_is_natural_buffer (const ffuzzy_udigest *udigest)
 Determines whether the digest blocks are "natural".
 
bool ffuzzy_udigest_is_valid (const ffuzzy_udigest *udigest)
 Determines whether the given digest is valid.
 
bool ffuzzy_udigest_is_natural (const ffuzzy_udigest *udigest)
 Determines whether the given digest is valid and "natural".
 
int ffuzzy_udigestcmp (const ffuzzy_udigest *d1, const ffuzzy_udigest *d2)
 Compare two ffuzzy_udigest values.
 
int ffuzzy_udigestcmp_blocksize (const ffuzzy_udigest *d1, const ffuzzy_udigest *d2)
 Compare two ffuzzy_udigest values by block size.
 
int ffuzzy_udigestcmp_blocksize_n (const ffuzzy_udigest *d1, const ffuzzy_udigest *d2)
 Compare two ffuzzy_udigest values by whether their block sizes are "natural", then by block size.
 
bool ffuzzy_pretty_udigest (char *buf, size_t buflen, const ffuzzy_udigest *udigest)
 Convert an ffuzzy_udigest to a string.
 
void ffuzzy_convert_digest_to_udigest (ffuzzy_udigest *udigest, const ffuzzy_digest *digest)
 Convert an ffuzzy_digest to an ffuzzy_udigest.
 
void ffuzzy_convert_udigest_to_digest (ffuzzy_digest *digest, const ffuzzy_udigest *udigest)
 Convert an ffuzzy_udigest to an ffuzzy_digest.
 
Internal Comparison Utilities
int ffuzzy_score_cap (int s1len, int s2len, unsigned long block_size)
 Retrieve the score cap for the given block lengths and block size.
 
int ffuzzy_score_cap_1 (int minslen, unsigned long block_size)
 Retrieve the score cap for the given minimum block length and block size.
 
int ffuzzy_score_strings (const char *s1, size_t s1len, const char *s2, size_t s2len, unsigned long block_size)
 Compute the partial similarity score of two block strings for the given block size.
 

Detailed Description

Public API for libffuzzy.

Function Documentation

bool ffuzzy_blocksize_is_far_le ( unsigned long  block_size1,
unsigned long  block_size2 
)

Determines whether the given ordered block sizes are "far" enough apart.

In this context, "far" means that the second block size is greater than twice the first block size.

For digests sorted by block size, "far" means that no subsequent entry can match.

This function determines whether the given block sizes are "far".

Because this function is trivial, you may want to inline or reimplement it; nothing prevents you from doing so.

Parameters
block_size1  Valid block size 1
block_size2  Valid block size 2 (must be greater than or equal to block_size1)
Returns
true if the given block sizes are "far"; false otherwise.
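Since the test is this simple, reimplementing it takes only a few lines. The following C sketch is illustrative, not libffuzzy's code: sketch_blocksize_is_far_le and sketch_count_not_far are hypothetical names, and the inputs are assumed to be valid block sizes with block_size1 <= block_size2. The second function shows how a "far" result lets a scan over an ascending, block size-sorted list stop early.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical re-implementation of the "far" test described above.
   Precondition: block_size1 <= block_size2. The condition
   block_size2 > 2 * block_size1 is written as a subtraction so that it
   cannot overflow. */
static bool sketch_blocksize_is_far_le(unsigned long block_size1,
                                       unsigned long block_size2)
{
    return block_size2 - block_size1 > block_size1;
}

/* Count the entries after index i in an ascending array of block sizes
   that are not "far" from sizes[i]. The loop stops at the first "far"
   entry because every later entry is at least as large. */
static size_t sketch_count_not_far(const unsigned long *sizes, size_t n,
                                   size_t i)
{
    size_t count = 0;
    for (size_t j = i + 1; j < n; j++) {
        if (sketch_blocksize_is_far_le(sizes[i], sizes[j]))
            break;
        count++;
    }
    return count;
}
```

For the sorted sizes {3, 3, 6, 12, 24}, scanning from the first entry examines 3 and 6 and stops at 12, because 12 is more than twice 3.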
bool ffuzzy_blocksize_is_natural ( unsigned long  block_size)

Determines whether the given block size is "natural".

In this context, "natural" means that a fuzzy hash with this parameter can be generated by ssdeep or its backend, libfuzzy. Depending on the task, handling only "natural" digests may make your program more efficient.

This function checks not only whether the block size is valid but also whether it is the product of FFUZZY_MIN_BLOCKSIZE and a power of two.

Parameters
block_size  Block size (which may not be valid or "natural")
Returns
true if the given block size is valid and "natural"; false otherwise.
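The power-of-two condition can be checked with the usual q & (q - 1) trick. The sketch below is a hypothetical helper, not libffuzzy's implementation; in particular it omits the validity (overflow) limit that ffuzzy_blocksize_is_natural also applies, because that limit is not documented in this reference.

```c
#include <stdbool.h>

#define FFUZZY_MIN_BLOCKSIZE 3 /* value taken from ffuzzy.h */

/* Hypothetical sketch: true if block_size == FFUZZY_MIN_BLOCKSIZE * 2^k
   for some k >= 0. Unlike ffuzzy_blocksize_is_natural, it does NOT check
   libffuzzy's validity (overflow) limit. */
static bool sketch_blocksize_is_natural(unsigned long block_size)
{
    if (block_size < FFUZZY_MIN_BLOCKSIZE ||
        block_size % FFUZZY_MIN_BLOCKSIZE != 0)
        return false;
    unsigned long q = block_size / FFUZZY_MIN_BLOCKSIZE; /* q >= 1 here */
    return (q & (q - 1)) == 0; /* true only when q is a power of two */
}
```

Under this sketch, 3, 6, 12 and 96 are natural, while 9 (3 * 3) and 18 (3 * 6) are not.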
bool ffuzzy_blocksize_is_near ( unsigned long  block_size1,
unsigned long  block_size2 
)

Determines whether the given block sizes are "near".

In this context, "near" means that the two block sizes are equal or that one is twice the other.

This function determines whether the given block sizes are "near". If it returns true, it is safe to call ffuzzy_compare_digest_near on two digests with these block sizes.

Parameters
block_size1  Valid block size 1
block_size2  Valid block size 2
Returns
true if the given block sizes are "near"; false otherwise.
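The "near" relation can be computed without risking overflow by dividing instead of multiplying. This hypothetical C sketch also reports which of the two block sizes is the larger one, which is the information needed to choose between the specialized comparison functions; all names in it are illustrative, not part of libffuzzy.

```c
#include <stdbool.h>

/* Hypothetical classification of two block sizes. */
enum sketch_near_kind {
    SKETCH_NOT_NEAR, /* neither equal nor a factor of two apart */
    SKETCH_NEAR_EQ,  /* block_size1 == block_size2 */
    SKETCH_NEAR_LT,  /* block_size2 == 2 * block_size1 */
    SKETCH_NEAR_GT   /* block_size1 == 2 * block_size2 */
};

/* Uses division instead of multiplication so large values cannot overflow. */
static enum sketch_near_kind sketch_classify_near(unsigned long block_size1,
                                                  unsigned long block_size2)
{
    if (block_size1 == block_size2)
        return SKETCH_NEAR_EQ;
    if (block_size2 % 2 == 0 && block_size2 / 2 == block_size1)
        return SKETCH_NEAR_LT;
    if (block_size1 % 2 == 0 && block_size1 / 2 == block_size2)
        return SKETCH_NEAR_GT;
    return SKETCH_NOT_NEAR;
}

static bool sketch_blocksize_is_near(unsigned long block_size1,
                                     unsigned long block_size2)
{
    return sketch_classify_near(block_size1, block_size2) != SKETCH_NOT_NEAR;
}
```

A caller could route SKETCH_NEAR_EQ to ffuzzy_compare_digest_near_eq(d1, d2) and SKETCH_NEAR_LT to ffuzzy_compare_digest_near_lt(d1, d2). Handling SKETCH_NEAR_GT by calling ffuzzy_compare_digest_near_lt(d2, d1) assumes that the score does not depend on argument order, which this reference does not state.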
bool ffuzzy_blocksize_is_valid ( unsigned long  block_size)

Determines whether the given block size is valid for use in libffuzzy.

To prevent arithmetic overflow, not every unsigned long value is a valid block size in libffuzzy. This function determines whether the given block size is valid and safe to use in libffuzzy.

You do not need this function if you use ffuzzy_read_digest, because it always produces a valid digest on success.

Note that this is a restriction of the implementation, not of the ssdeep digest format.

Parameters
block_size  Block size (which may not be valid)
Returns
true if the given block size is valid; false otherwise.
int ffuzzy_compare ( const char *  str1,
const char *  str2 
)

Compute the similarity score of two ssdeep hash strings.

Parameters
[in] str1  ssdeep hash 1
[in] str2  ssdeep hash 2
Returns
A similarity score in the range [0,100], or a negative value on failure.
int ffuzzy_compare_digest ( const ffuzzy_digest *  d1,
const ffuzzy_digest *  d2 
)

Compare two fuzzy hashes and compute their similarity score.

Parameters
[in] d1  Valid digest 1
[in] d2  Valid digest 2
Returns
A similarity score in the range [0,100], or a negative value on failure.
int ffuzzy_compare_digest_near ( const ffuzzy_digest *  d1,
const ffuzzy_digest *  d2 
)

Compare two fuzzy hashes, assuming that their block sizes are "near".

In this context, "near" means that the two block sizes are equal or that one is twice the other.

This function assumes that the two block sizes are "near" (that is, ffuzzy_blocksize_is_near returns true for them), which makes the computation slightly faster.

Parameters
[in] d1  Valid digest 1
[in] d2  Valid digest 2
Returns
A similarity score in the range [0,100], or a negative value on failure.
int ffuzzy_compare_digest_near_eq ( const ffuzzy_digest *  d1,
const ffuzzy_digest *  d2 
)

Compare two fuzzy hashes, assuming that their block sizes are equal.

This function assumes that the two block sizes are equal.

Parameters
[in] d1  Valid digest 1 (with the same block size as d2)
[in] d2  Valid digest 2 (with the same block size as d1)
Returns
A similarity score in the range [0,100], or a negative value on failure.
See also
int ffuzzy_compare_digest_near(const ffuzzy_digest*, const ffuzzy_digest*)
int ffuzzy_compare_digest_near_lt ( const ffuzzy_digest *  d1,
const ffuzzy_digest *  d2 
)

Compare two fuzzy hashes, assuming that the second block size is twice the first.

This function assumes that the block size of d2 is twice that of d1.

Parameters
[in] d1  Valid digest 1
[in] d2  Valid digest 2 (with twice the block size of d1)
Returns
A similarity score in the range [0,100], or a negative value on failure.
See also
int ffuzzy_compare_digest_near(const ffuzzy_digest*, const ffuzzy_digest*)
void ffuzzy_convert_digest_to_udigest ( ffuzzy_udigest *  udigest,
const ffuzzy_digest *  digest 
)

Convert an ffuzzy_digest to an ffuzzy_udigest.

Parameters
[out] udigest  Pointer to the buffer that receives the unnormalized digest
[in] digest  Pointer to a valid digest
void ffuzzy_convert_udigest_to_digest ( ffuzzy_digest *  digest,
const ffuzzy_udigest *  udigest 
)

Convert an ffuzzy_udigest to an ffuzzy_digest.

Parameters
[out] digest  Pointer to the buffer that receives the normalized digest
[in] udigest  Pointer to a valid, unnormalized digest
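This reference does not spell out what normalization does. Assuming it follows ssdeep's convention of shortening every run of identical characters to at most three (which is consistent with the "no run of four or more identical characters" rule of ffuzzy_digest_is_valid_buffer), the per-block step could be sketched as follows; sketch_normalize_block is a hypothetical name, not a libffuzzy function.

```c
#include <stddef.h>

/* Hypothetical sketch (assumption: normalization keeps at most three
   consecutive identical characters, as ssdeep does). Copies src[0..len)
   to dst, dropping every character past the third in a run, and returns
   the new length. dst must have room for len characters. */
static size_t sketch_normalize_block(char *dst, const char *src, size_t len)
{
    size_t out = 0;
    size_t run = 0; /* length of the current run of identical characters */
    for (size_t i = 0; i < len; i++) {
        run = (i > 0 && src[i] == src[i - 1]) ? run + 1 : 1;
        if (run <= 3)
            dst[out++] = src[i];
    }
    return out;
}
```

Under this assumption, the block AAAAAB normalizes to AAAB.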
bool ffuzzy_digest_is_natural ( const ffuzzy_digest *  digest)

Determines whether the given digest is valid and "natural".

Parameters
[in] digest  Digest (which may not be valid or "natural")
Returns
true if the digest is valid and "natural"; false otherwise.
bool ffuzzy_digest_is_natural_buffer ( const ffuzzy_digest *  digest)

Determines whether the digest blocks are valid and "natural".

This function determines whether the valid range of ffuzzy_digest::digest consists only of base64 characters (in other words, whether it is "natural").

This function requires valid digest block lengths. If the block lengths are not guaranteed to be valid, call ffuzzy_digest_is_valid_lengths first.

You may need this function even after a successful call to ffuzzy_read_digest, because ffuzzy_read_digest does not guarantee that the resulting digest blocks are "natural".

However, if you are only comparing digests, this check is unnecessary, because fuzzy hash comparison does not decode the base64 characters (it only compares them).

You need this function ONLY if you must verify that the given digest is truly "natural".

Parameters
[in] digest  Digest (which may not be valid or "natural", but whose block lengths are valid)
Returns
true if the digest blocks are valid and "natural"; false otherwise.
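A per-block version of the base64 test could look like the following hypothetical sketch. It assumes the standard base64 alphabet (A-Z, a-z, 0-9, + and /) and an ASCII-compatible character set; the function names are illustrative, not part of libffuzzy.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if c belongs to the standard base64 alphabet (assumes ASCII). */
static bool sketch_is_base64_char(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '+' || c == '/';
}

/* Hypothetical per-block "natural" check: every one of the len
   characters must be a base64 character. */
static bool sketch_block_is_base64(const char *block, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (!sketch_is_base64_char(block[i]))
            return false;
    return true;
}
```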
bool ffuzzy_digest_is_valid ( const ffuzzy_digest *  digest)

Determines whether the given digest is valid.

Parameters
[in] digest  Digest (which may not be valid)
Returns
true if the digest is valid; false otherwise.
bool ffuzzy_digest_is_valid_buffer ( const ffuzzy_digest *  digest)

Determines whether the digest blocks are valid.

This function determines whether the digest blocks contain no run of four or more identical characters.

This function requires valid digest block lengths. If the block lengths are not guaranteed to be valid, call ffuzzy_digest_is_valid_lengths first.

You do not need this function if you use ffuzzy_read_digest, because it always produces valid digests on success.

Parameters
[in] digest  Digest (which may not be valid, but whose block lengths are valid)
Returns
true if the digest blocks are valid; false otherwise.
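The run-length rule can be checked in a single pass over each block. This is a hypothetical sketch of that rule for one block, not libffuzzy's implementation; the caller is assumed to pass a block length that is already known to be valid.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: true if block[0..len) contains no run of four or
   more identical characters. */
static bool sketch_block_has_no_long_runs(const char *block, size_t len)
{
    size_t run = 1; /* length of the run ending at block[i] */
    for (size_t i = 1; i < len; i++) {
        run = (block[i] == block[i - 1]) ? run + 1 : 1;
        if (run >= 4)
            return false;
    }
    return true;
}
```

AAAB passes because it contains only three consecutive A characters; AAAAB fails.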
bool ffuzzy_digest_is_valid_lengths ( const ffuzzy_digest *  digest)

Determines whether the block lengths of the given digest are valid.

Parameters
[in] digest  Digest (which may not be valid)
Returns
true if the values of ffuzzy_digest::len1 and ffuzzy_digest::len2 are valid; false otherwise.
int ffuzzy_digestcmp ( const ffuzzy_digest *  d1,
const ffuzzy_digest *  d2 
)

Compare two ffuzzy_digest values.

The comparison uses the following priorities:

  1. Compare block sizes.
  2. Compare the lengths of the first block.
  3. Compare the lengths of the second block.
  4. Compare the block buffer contents (first block, then second block).
Parameters
[in] d1  Valid digest 1
[in] d2  Valid digest 2
Returns
A negative value if d1 < d2, a positive value if d1 > d2, and 0 if d1 is equal to d2.
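The priority order above amounts to a lexicographic comparison. The C sketch below uses a hypothetical struct sketch_digest in place of ffuzzy_digest, because this reference does not show that structure's full layout; the field names, the 128-byte buffer holding block 1 followed by block 2, and the sign convention (negative when the first argument sorts first, as qsort expects) are assumptions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ffuzzy_digest (real layout not shown here). */
struct sketch_digest {
    unsigned long block_size;
    size_t len1;      /* length of block 1 */
    size_t len2;      /* length of block 2 */
    char digest[128]; /* block 1 immediately followed by block 2 */
};

/* Three-way comparison yielding -1, 0 or 1 without overflow. */
#define SKETCH_CMP3(a, b) (((a) > (b)) - ((a) < (b)))

static int sketch_digestcmp(const struct sketch_digest *d1,
                            const struct sketch_digest *d2)
{
    int r;
    if ((r = SKETCH_CMP3(d1->block_size, d2->block_size)) != 0)
        return r;
    if ((r = SKETCH_CMP3(d1->len1, d2->len1)) != 0)
        return r;
    if ((r = SKETCH_CMP3(d1->len2, d2->len2)) != 0)
        return r;
    /* Both lengths are equal here, so one memcmp covers block 1, then block 2. */
    return memcmp(d1->digest, d2->digest, d1->len1 + d1->len2);
}

/* Adapter with the signature that qsort expects. */
static int sketch_digestcmp_qsort(const void *a, const void *b)
{
    return sketch_digestcmp(a, b);
}
```

Calling qsort(array, count, sizeof array[0], sketch_digestcmp_qsort) would then order the digests by ascending block size first, which is the block size-sorted order that ffuzzy_blocksize_is_far_le is described in terms of.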
int ffuzzy_digestcmp_blocksize ( const ffuzzy_digest *  d1,
const ffuzzy_digest *  d2 
)

Compare two ffuzzy_digest values by block size.

Parameters
[in] d1  Valid digest 1
[in] d2  Valid digest 2
Returns
A negative value if the block size of d1 is less than that of d2, a positive value if it is greater, and 0 if the block sizes are equal.
See also
int ffuzzy_digestcmp(const ffuzzy_digest*, const ffuzzy_digest*)
int ffuzzy_digestcmp_blocksize_n ( const ffuzzy_digest *  d1,
const ffuzzy_digest *  d2 
)

Compare two ffuzzy_digest values by whether their block sizes are "natural", then by block size.

The comparison uses the following priorities:

  1. Compare whether the block sizes are "natural" (digests for which ffuzzy_blocksize_is_natural returns true come first).
  2. Compare block sizes.
Parameters
[in] d1  Valid digest 1
[in] d2  Valid digest 2
Returns
A negative value if d1 sorts before d2, a positive value if d1 sorts after d2, and 0 if the block sizes of d1 and d2 are equal.
See also
bool ffuzzy_blocksize_is_natural(unsigned long)
int ffuzzy_digestcmp(const ffuzzy_digest*, const ffuzzy_digest*)
bool ffuzzy_pretty_digest ( char *  buf,
size_t  buflen,
const ffuzzy_digest *  digest 
)

Convert an ffuzzy_digest to a string.

Parameters
[out] buf  Buffer to store the string
buflen  Size of buf
[in] digest  A valid digest to convert
Returns
true on success; false otherwise.
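Assuming the output uses ssdeep's textual form blocksize:block1:block2 (this reference does not state the exact format, nor whether a file name is ever appended), the formatting step could be sketched with snprintf, treating truncation as failure. sketch_pretty and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch: writes "block_size:block1:block2" into buf.
   Returns false if snprintf fails or if the result would not fit in
   buflen bytes including the terminating NUL. */
static bool sketch_pretty(char *buf, size_t buflen, unsigned long block_size,
                          const char *b1, size_t len1,
                          const char *b2, size_t len2)
{
    int n = snprintf(buf, buflen, "%lu:%.*s:%.*s",
                     block_size, (int)len1, b1, (int)len2, b2);
    return n >= 0 && (size_t)n < buflen;
}
```

Checking the snprintf return value against buflen is what turns a silently truncated string into a false result, mirroring the true/false contract documented above.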
bool ffuzzy_pretty_udigest ( char *  buf,
size_t  buflen,
const ffuzzy_udigest *  udigest 
)

Convert an ffuzzy_udigest to a string.

Parameters
[out] buf  Buffer to store the string
buflen  Size of buf
[in] udigest  A valid digest to convert
Returns
true on success; false otherwise.
bool ffuzzy_read_digest ( ffuzzy_digest *  digest,
const char *  s 
)

Read an ssdeep digest from a string.

On success, this function always stores a valid digest.

Parameters
[out] digest  Pointer to the buffer that receives the valid digest after parsing
[in] s  String containing an ssdeep digest
Returns
true on success; false otherwise.
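ffuzzy_read_digest validates much more than this, but the first step of reading the textual form blocksize:block1:block2 is splitting it into fields. The hypothetical sketch below shows only that splitting; struct sketch_fields and sketch_split_digest are illustrative names, and the sketch deliberately skips the block size limit, block length limits and normalization that the real function is documented to enforce.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of splitting "block_size:block1:block2". */
struct sketch_fields {
    unsigned long block_size;
    const char *b1; /* points into the input; use len1, not a NUL */
    size_t len1;
    const char *b2;
    size_t len2;
};

/* Hypothetical sketch: splits s into its three fields without any of
   libffuzzy's validity checks. Everything after the second ':' is taken
   as block 2, and a third ':' is rejected. */
static bool sketch_split_digest(struct sketch_fields *f, const char *s)
{
    char *end;
    if (*s < '0' || *s > '9') /* strtoul would accept spaces and signs */
        return false;
    errno = 0;
    unsigned long block_size = strtoul(s, &end, 10);
    if (errno == ERANGE || *end != ':')
        return false;
    const char *b1 = end + 1;
    const char *sep = strchr(b1, ':');
    if (sep == NULL)
        return false;
    const char *b2 = sep + 1;
    if (strchr(b2, ':') != NULL)
        return false;
    f->block_size = block_size;
    f->b1 = b1;
    f->len1 = (size_t)(sep - b1);
    f->b2 = b2;
    f->len2 = strlen(b2);
    return true;
}
```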
bool ffuzzy_read_udigest ( ffuzzy_udigest *  udigest,
const char *  s 
)

Read an unnormalized ssdeep digest from a string.

On success, this function always stores a valid, unnormalized digest.

Parameters
[out] udigest  Pointer to the buffer that receives the valid, unnormalized digest after parsing
[in] s  String containing an ssdeep digest
Returns
true on success; false otherwise.
See also
ffuzzy_udigest
int ffuzzy_score_cap ( int  s1len,
int  s2len,
unsigned long  block_size 
)

Retrieve the score cap for the given block lengths and block size.

The (partial) similarity score is capped when the blocks are short and the block size is small, to avoid exaggerating the match. This function returns that score cap for the given block lengths and block size.

Parameters
s1len  Length of block 1
s2len  Length of block 2
block_size  Block size
Returns
Maximum (partial) similarity score. A return value greater than 100 means that the effective cap is 100.

If s1len or s2len is outside the range [0,FFUZZY_SPAMSUM_LENGTH], the return value is undefined.

int ffuzzy_score_cap_1 ( int  minslen,
unsigned long  block_size 
)

Retrieve the score cap for the given minimum block length and block size.

ffuzzy_score_cap computes the score cap from the block size and the "minimum" length of the given blocks. This function exposes that internal interface of ffuzzy_score_cap.

Parameters
minslen  Minimum length of the blocks
block_size  Block size
Returns
Maximum (partial) similarity score. A return value greater than 100 means that the effective cap is 100.

If minslen is outside the range [0,FFUZZY_SPAMSUM_LENGTH], the return value is undefined.
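This reference does not give the formula. Assuming libffuzzy mirrors ssdeep's capping rule, in which the cap is block_size / FFUZZY_MIN_BLOCKSIZE multiplied by the shorter block length, the two functions could be sketched as below. The names are hypothetical, and the early return of 100 is an added overflow guard rather than documented behavior.

```c
#define FFUZZY_MIN_BLOCKSIZE 3 /* value taken from ffuzzy.h */

/* Hypothetical sketch; minslen is assumed to be in [0, 64].
   Assumption: cap = block_size / FFUZZY_MIN_BLOCKSIZE * minslen, as in
   ssdeep. Any value of 100 or more means "no effective cap". */
static int sketch_score_cap_1(int minslen, unsigned long block_size)
{
    unsigned long factor = block_size / FFUZZY_MIN_BLOCKSIZE;
    if (minslen > 0 && factor >= 100)
        return 100; /* already at the ceiling; avoids int overflow */
    return (int)factor * minslen;
}

/* The two-length form just passes the shorter length on. */
static int sketch_score_cap(int s1len, int s2len, unsigned long block_size)
{
    return sketch_score_cap_1(s1len < s2len ? s1len : s2len, block_size);
}
```

For example, under this assumption a block size of 3 and a shortest block of 7 characters give a cap of 7, while a block size of 6 with blocks of 10 and 20 characters gives a cap of 20.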

int ffuzzy_score_strings ( const char *  s1,
size_t  s1len,
const char *  s2,
size_t  s2len,
unsigned long  block_size 
)

Compute the partial similarity score of two block strings for the given block size.

In fuzzy hash comparison, digest blocks with the same block size are selected and compared. This is the internal interface used by ffuzzy_compare and ffuzzy_compare_digest.

Parameters
[in] s1  Digest block 1
s1len  Length of s1
[in] s2  Digest block 2
s2len  Length of s2
block_size  Block size of both digest blocks
Returns
A partial similarity score in the range [0,100], or a negative value on failure.
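In ssdeep, a pair of blocks scores 0 unless they share a common substring of at least FFUZZY_MIN_MATCH (7) characters. Assuming libffuzzy keeps that rule, the check could be sketched by brute force as below; ssdeep itself uses a rolling hash for this step, libffuzzy's actual method is not described here, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FFUZZY_MIN_MATCH 7 /* value taken from ffuzzy.h */

/* Hypothetical brute-force check: true if s1 and s2 share a substring of
   at least FFUZZY_MIN_MATCH characters. Any such substring contains a
   window of exactly FFUZZY_MIN_MATCH characters, so comparing all
   windows is sufficient. Cost: O(s1len * s2len) window comparisons. */
static bool sketch_has_common_substring(const char *s1, size_t s1len,
                                        const char *s2, size_t s2len)
{
    if (s1len < FFUZZY_MIN_MATCH || s2len < FFUZZY_MIN_MATCH)
        return false;
    for (size_t i = 0; i + FFUZZY_MIN_MATCH <= s1len; i++)
        for (size_t j = 0; j + FFUZZY_MIN_MATCH <= s2len; j++)
            if (memcmp(s1 + i, s2 + j, FFUZZY_MIN_MATCH) == 0)
                return true;
    return false;
}
```

Brute force is acceptable here only because blocks are at most FFUZZY_SPAMSUM_LENGTH characters long; a rolling hash avoids most of the memcmp calls.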
bool ffuzzy_udigest_is_natural ( const ffuzzy_udigest *  udigest)

Determines whether the given digest is valid and "natural".

Parameters
[in] udigest  Unnormalized digest (which may not be valid or "natural")
Returns
true if the digest is valid and "natural"; false otherwise.
bool ffuzzy_udigest_is_natural_buffer ( const ffuzzy_udigest *  udigest)

Determines whether the digest blocks are "natural".

This function determines whether the valid range of ffuzzy_udigest::digest consists only of base64 characters (in other words, whether it is "natural").

This function requires valid digest block lengths. If the block lengths are not guaranteed to be valid, call ffuzzy_udigest_is_valid_lengths first.

You need this function ONLY if you must verify that the given digest is truly "natural".

Parameters
[in] udigest  Unnormalized digest (which may not be "natural", but whose block lengths are valid)
Returns
true if the digest blocks are "natural"; false otherwise.
bool ffuzzy_udigest_is_valid ( const ffuzzy_udigest *  udigest)

Determines whether the given digest is valid.

Parameters
[in] udigest  Unnormalized digest (which may not be valid)
Returns
true if the digest is valid; false otherwise.
bool ffuzzy_udigest_is_valid_lengths ( const ffuzzy_udigest *  udigest)

Determines whether the block lengths of the given digest are valid.

Parameters
[in] udigest  Unnormalized digest (which may not be valid)
Returns
true if the values of ffuzzy_udigest::len1 and ffuzzy_udigest::len2 are valid; false otherwise.
int ffuzzy_udigestcmp ( const ffuzzy_udigest *  d1,
const ffuzzy_udigest *  d2 
)

Compare two ffuzzy_udigest values.

The comparison uses the following priorities:

  1. Compare block sizes.
  2. Compare the lengths of the first block.
  3. Compare the lengths of the second block.
  4. Compare the block buffer contents (first block, then second block).
Parameters
[in] d1  Valid digest 1
[in] d2  Valid digest 2
Returns
A negative value if d1 < d2, a positive value if d1 > d2, and 0 if d1 is equal to d2.
int ffuzzy_udigestcmp_blocksize ( const ffuzzy_udigest *  d1,
const ffuzzy_udigest *  d2 
)

Compare two ffuzzy_udigest values by block size.

Parameters
[in] d1  Valid digest 1
[in] d2  Valid digest 2
Returns
A negative value if the block size of d1 is less than that of d2, a positive value if it is greater, and 0 if the block sizes are equal.
See also
int ffuzzy_udigestcmp(const ffuzzy_udigest*, const ffuzzy_udigest*)
int ffuzzy_udigestcmp_blocksize_n ( const ffuzzy_udigest *  d1,
const ffuzzy_udigest *  d2 
)

Compare two ffuzzy_udigest values by whether their block sizes are "natural", then by block size.

The comparison uses the following priorities:

  1. Compare whether the block sizes are "natural" (digests for which ffuzzy_blocksize_is_natural returns true come first).
  2. Compare block sizes.
Parameters
[in] d1  Valid digest 1
[in] d2  Valid digest 2
Returns
A negative value if d1 sorts before d2, a positive value if d1 sorts after d2, and 0 if the block sizes of d1 and d2 are equal.
See also
bool ffuzzy_blocksize_is_natural(unsigned long)
int ffuzzy_udigestcmp(const ffuzzy_udigest*, const ffuzzy_udigest*)