Draft program for the Rust class of semester 2. The goal was to write a search engine for images.

Imsearch

Extensible library for creating an image-based search engine. The library lets you create databases that index images stored as PNG files.

Images can be compared for similarity using either built-in features or custom ones. The basic workflow is as follows:

  • Create a new database
  • Add some features to the database
  • Add some images to the database
  • Search for some images in the database by a certain feature
  • Save the database to disk

or:

  • Load a database from disk
  • Supply the feature generator functions
  • Search for some images in the database by a certain feature
  • Add some images to the database
  • Save the database to disk

Examples:

Define a new feature

/// Compute the average value of the three color channels of a given image
fn average_rgb_value(image: Arc<Image<f32>>) -> (String, FeatureResult) {
    let bright = image
        .pixels()
        .iter()
        .map(|(r, g, b, _)| (r + g + b) / 3.0 / 255.0)
        .sum::<f32>();

    (
        String::from("average_brightness"),
        FeatureResult::Percent(bright / image.pixels().len() as f32),
    )
}

Create a new database

let files: Vec<PathBuf> = std::fs::read_dir("image/folder/")
    .unwrap()
    .map(|f| f.unwrap().path())
    .collect();

let feats: Vec<FeatureGenerator> = vec![average_rgb_value];

let db = Database::new(&files, feats).unwrap();

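// `json` is assumed to be the target path of the serialized database,
// e.g. Path::new("db.json")
db.write_to_file(json);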

Read an existing database and search for similar images

let db = Database::from_file(Path::new("db.json"));

for results in db
    .search(
        std::path::Path::new("path/to/image.png"),
        // feature to compare by; presumably the one named "average_brightness" above
        average_brightness,
    )
    .unwrap()
{
    println!(
        "path: {} similarity: {}",
        results.0.as_os_str().to_str().unwrap(),
        results.1
    );
}

Details

Feature computation for images is multithreaded. Only the results of the computed features are stored; the generator functions used to calculate them are not serialized. As a consequence, the generator functions have to be passed to the database again after it has been read from a file before features can be computed for further images.
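
The following is a minimal sketch of why the generators have to be supplied again after loading. It is not the imsearch implementation: `MiniDb`, `Generator`, `mean` and the line-based text format (used here instead of JSON to avoid external crates) are all hypothetical names and choices made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a feature generator: takes raw pixel values
/// and returns the feature name together with its value.
type Generator = fn(&[f32]) -> (String, f32);

/// Hypothetical miniature database, NOT the imsearch `Database` type.
struct MiniDb {
    /// Computed results per image; this is the only part that is persisted.
    results: BTreeMap<String, BTreeMap<String, f32>>,
    /// Function pointers cannot be written to disk, so they live only in memory.
    generators: Vec<Generator>,
}

impl MiniDb {
    fn new(generators: Vec<Generator>) -> MiniDb {
        MiniDb { results: BTreeMap::new(), generators }
    }

    /// Run every known generator on the image and store the results.
    fn add_image(&mut self, image: &str, pixels: &[f32]) {
        let feats = self.results.entry(image.to_string()).or_default();
        for generator in &self.generators {
            let (name, value) = generator(pixels);
            feats.insert(name, value);
        }
    }

    /// Serialize only the stored results as `image;feature;value` lines.
    fn save(&self) -> String {
        let mut out = String::new();
        for (image, feats) in &self.results {
            for (name, value) in feats {
                out.push_str(&format!("{image};{name};{value}\n"));
            }
        }
        out
    }

    /// Restore the results; the generator list starts out empty.
    fn load(text: &str) -> MiniDb {
        let mut db = MiniDb::new(Vec::new());
        for line in text.lines() {
            let mut parts = line.split(';');
            if let (Some(image), Some(name), Some(value)) =
                (parts.next(), parts.next(), parts.next())
            {
                if let Ok(value) = value.parse::<f32>() {
                    db.results
                        .entry(image.to_string())
                        .or_default()
                        .insert(name.to_string(), value);
                }
            }
        }
        db
    }
}

/// Example generator: arithmetic mean of all pixel values.
fn mean(pixels: &[f32]) -> (String, f32) {
    let avg = pixels.iter().sum::<f32>() / pixels.len() as f32;
    ("mean".to_string(), avg)
}
```

In this sketch, a database restored with `MiniDb::load` still answers queries on stored results, but `add_image` computes nothing until a generator such as `mean` is pushed onto `generators` again.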

Limiting thread usage

You can limit the number of threads to be used by calling set_limit() on the database. By default, the thread pool tries to detect the optimal number of threads automatically. Unless an edge case applies, such as running in an overcommitted virtual machine, the default is good enough for most cases.
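
As a rough illustration of detecting a thread count and capping it, the sketch below uses only the standard library. How imsearch's thread pool and set_limit() work internally is not shown in this README, so `worker_count` and `sum_of_squares` are hypothetical helpers, not part of the library.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Number of worker threads to use: the parallelism detected by the standard
/// library, optionally capped by `limit`. Never returns less than 1.
fn worker_count(limit: Option<usize>) -> usize {
    let detected = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1); // fall back to one thread if detection fails
    match limit {
        Some(max) => detected.min(max).max(1),
        None => detected,
    }
}

/// Hypothetical workload: split `values` into one chunk per worker and
/// sum the squares of each chunk on its own scoped thread.
fn sum_of_squares(values: &[u64], limit: Option<usize>) -> u64 {
    let workers = worker_count(limit);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_len = ((values.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = values
            .chunks(chunk_len)
            .map(|part| scope.spawn(move || part.iter().map(|v| v * v).sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum::<u64>()
    })
}
```

Calling `sum_of_squares(&values, Some(2))` spawns at most two threads in this sketch, while `None` uses whatever `std::thread::available_parallelism` reports.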

Image formats

The library can only handle PNG files, which it reads through the png crate. Note that not all color types are supported. Due to limitations of the crate, PNGs with indexed palettes are not supported and will cause functions to return error values.
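
To filter out indexed-palette files before adding them to a database, the color type can be read directly from the PNG header. The sketch below relies only on the PNG file layout (8-byte signature followed by the IHDR chunk); `png_color_type` and the `ColorType` enum are hypothetical helpers written for this example and are independent of both imsearch and the png crate.

```rust
/// The eight bytes every PNG file starts with.
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Color types defined by the PNG specification (hypothetical local enum).
#[derive(Debug, PartialEq)]
enum ColorType {
    Grayscale,      // 0
    Rgb,            // 2
    Indexed,        // 3, palette based
    GrayscaleAlpha, // 4
    Rgba,           // 6
}

/// Read the color type from the first bytes of a PNG file.
/// Returns `None` if the data is too short, is not a PNG, or uses an unknown type.
fn png_color_type(bytes: &[u8]) -> Option<ColorType> {
    // Layout: signature (8) + chunk length (4) + "IHDR" (4)
    //         + width (4) + height (4) + bit depth (1) + color type (1)
    if bytes.len() < 26 || bytes[..8] != PNG_SIGNATURE || bytes[12..16] != *b"IHDR" {
        return None;
    }
    match bytes[25] {
        0 => Some(ColorType::Grayscale),
        2 => Some(ColorType::Rgb),
        3 => Some(ColorType::Indexed),
        4 => Some(ColorType::GrayscaleAlpha),
        6 => Some(ColorType::Rgba),
        _ => None,
    }
}
```

Reading the first 26 bytes of each file and skipping those for which `png_color_type` returns `Some(ColorType::Indexed)` avoids the error path described above. Which of the remaining color types the library accepts is not stated here, so that part remains an open assumption.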

Memory usage

The database won't hold all images in RAM at the same time. They are loaded on demand when features are calculated for them. This may cause increased disk access but prevents RAM overcommitment.
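
The on-demand idea can be sketched as a loop that reads one file, computes its feature and drops the data before moving on. This is an illustration under simplifying assumptions, not the library's code: `features_on_demand` and `mean_byte` are hypothetical, the loop is single-threaded, and raw file bytes stand in for decoded pixels.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Compute one value per file while holding at most one file's bytes in memory.
fn features_on_demand<F>(paths: &[PathBuf], feature: F) -> io::Result<Vec<(PathBuf, f32)>>
where
    F: Fn(&[u8]) -> f32,
{
    let mut results = Vec::with_capacity(paths.len());
    for path in paths {
        // The file is read only now, right before it is needed.
        let bytes = fs::read(path)?;
        results.push((path.clone(), feature(&bytes)));
        // `bytes` is dropped at the end of each iteration, freeing the memory.
    }
    Ok(results)
}

/// Hypothetical feature: mean byte value scaled to the range 0.0 to 1.0.
fn mean_byte(bytes: &[u8]) -> f32 {
    if bytes.is_empty() {
        return 0.0;
    }
    let sum: f32 = bytes.iter().map(|b| *b as f32).sum();
    sum / (bytes.len() as f32 * 255.0)
}
```

Only the small result vector grows with the number of files; the trade-off is that each file has to be read from disk again whenever another feature is computed for it later.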

Benchmark results for a system with 12 threads (i5-10400) can be found here: Benchmarks. If you encounter issues with this link, contact admin@teridax.de.