Compare commits

3 Commits

Author SHA1 Message Date
Sven Vogel 1f5a530c60 implemented proper linear loop for dot product 2023-05-03 08:46:13 +02:00
Sven Vogel 9b7d91ad5b updated README.md 2023-05-01 15:32:47 +02:00
Sven Vogel 6c0ec16a9c added README.md 2023-05-01 15:13:16 +02:00
4 changed files with 107 additions and 23 deletions


@ -1,3 +1,3 @@
# Rust-Programming # Rust-Programming
Repository hosting code of the excercises made during class This repository contains several cargo projects that implement various tasks from the lesson rust programming at DHBW.


@ -11,3 +11,6 @@ futures = "0.3.28"
jemalloc-ctl = "0.5.0" jemalloc-ctl = "0.5.0"
jemallocator = "0.5.0" jemallocator = "0.5.0"
bytesize = "1.2.0" bytesize = "1.2.0"
[features]
binary_search = []

60
sparse_vector/README.md Normal file

@ -0,0 +1,60 @@
# Sparse Vector Implementations
This repository aims at comparing various implementations of sparse vectors.
## What is a sparse vector?
A sparse vector is a vector in which most elements are zero.
That makes it cheaper to store, because the many zero elements need not be stored.
This comes at a cost, though: we may have to trade memory savings against computation time.
## Implementations overview
* Hashmap
* Index Array
* Compressed Index Array
### Index Array
We can omit all zero elements by storing an index array alongside the non-zero values. Each value is associated with the index at the same position in the index array. This layout only saves memory when at least 50% of the elements are zero, because every stored element costs an index in addition to its value. Since I used `usize` to store the indices, which is as wide as a `u64` on 64-bit architectures, the required memory is:
```
mem(N) = non_zero_elements * (8 Bytes + 8 Bytes)
```
One significant downside is the cost of finding the corresponding entry in the other vector when performing calculations such as the dot product. For this I used a binary search (which requires the index array to be sorted), and that gives a nice speedup.
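
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the type name `IndexArrayVec` and the method `get` are hypothetical, and the sketch assumes the indices are sorted in ascending order without duplicates.

```rust
/// Sparse vector stored as two parallel arrays (hypothetical type name).
/// Assumption: `indices` is sorted ascending and contains no duplicates.
pub struct IndexArrayVec {
    pub indices: Vec<usize>,
    pub values: Vec<f64>,
}

impl IndexArrayVec {
    /// Returns the value at `index`, or 0.0 if no entry is stored for it.
    pub fn get(&self, index: usize) -> f64 {
        match self.indices.binary_search(&index) {
            Ok(position) => self.values[position],
            Err(_) => 0.0,
        }
    }

    /// Dot product using one binary search in `other` per stored element of `self`.
    pub fn dot(&self, other: &IndexArrayVec) -> f64 {
        self.indices
            .iter()
            .zip(&self.values)
            .map(|(&index, &value)| value * other.get(index))
            .sum()
    }
}
```

With `n` stored elements in `self` and `m` in `other`, this performs `n` searches of `O(log m)` each.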
### Hashmap Implementation
This implementation uses a hashmap to associate each value with its index in the vector. In theory this should be as memory-efficient as the index array method above.
In comparison, however, this method requires significantly more memory, since a hashmap allocates more memory than it can fill in order to reduce collisions.
It has one significant benefit: speed in calculations. Looking up values in a hashmap is generally faster than performing a binary search. Inserting and deleting are also O(1) operations on average.
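
A hashmap-backed variant might look like the sketch below. The type name `HashMapVec` and its field are hypothetical, and iterating over the smaller of the two maps is a choice made for this illustration, not something the repository is known to do.

```rust
use std::collections::HashMap;

/// Sparse vector backed by a hashmap from index to value (hypothetical type name).
pub struct HashMapVec {
    pub entries: HashMap<usize, f64>,
}

impl HashMapVec {
    /// Returns the value at `index`, or 0.0 if no entry is stored for it.
    pub fn get(&self, index: usize) -> f64 {
        self.entries.get(&index).copied().unwrap_or(0.0)
    }

    /// Dot product: walk the smaller map and look each index up in the larger one.
    pub fn dot(&self, other: &HashMapVec) -> f64 {
        let (small, large) = if self.entries.len() <= other.entries.len() {
            (self, other)
        } else {
            (other, self)
        };
        small
            .entries
            .iter()
            .map(|(&index, &value)| value * large.get(index))
            .sum()
    }
}
```

Each lookup takes constant time on average, so the dot product is linear in the size of the smaller map.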
### Compressed Index Array
To reduce the space needed for the indices, we can compress them by storing only the offset of each index relative to the previous one:
| Uncompressed Index | 0 | 7 | 13 | 33 | 45 | 47 | 48 | 57 | ... | 34567 |
| -------------------- | --- | --- | ---- | ---- | ---- | ---- | ---- | ---- | ----- | ------- |
| Compressed Index | 0 | 7 | 6 | 20 | 12 | 2 | 1 | 9 | ... | 23 |
This yields smaller values, so we can safely reduce the number of bits used to store each index.
In this implementation I reduced the index size from 64 to 16 bits. This makes memory usage a lot smaller, but computation gets a lot heavier, since all indices have to be decompressed on the fly. A possible improvement would be to cache uncompressed values. This may be worth investigating further.
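
The delta encoding shown in the table can be sketched as below. The function names `compress` and `decompress` are hypothetical, and returning `None` when a gap does not fit into 16 bits is an assumption of this sketch; the README does not say how the implementation handles that case.

```rust
/// Delta-encodes ascending indices into 16-bit gaps.
/// Assumption of this sketch: returns `None` if the input is not ascending
/// or if a gap between neighbouring indices does not fit into a `u16`.
pub fn compress(indices: &[usize]) -> Option<Vec<u16>> {
    let mut previous = 0usize;
    let mut deltas = Vec::with_capacity(indices.len());
    for &index in indices {
        let gap = index.checked_sub(previous)?;
        if gap > u16::MAX as usize {
            return None;
        }
        deltas.push(gap as u16);
        previous = index;
    }
    Some(deltas)
}

/// Rebuilds the absolute indices by summing the gaps from left to right.
pub fn decompress(deltas: &[u16]) -> Vec<usize> {
    let mut current = 0usize;
    deltas
        .iter()
        .map(|&delta| {
            current += delta as usize;
            current
        })
        .collect()
}
```

Because every absolute index depends on all gaps before it, random access requires a running sum, which is why decompression on the fly is expensive.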
## Comparison
The following values were measured using a randomly initialized vector with a length of 10^10 elements, of which 2% were non-zero. The dot product implementation was single-threaded.
| Implementation | Size on Heap (GB) | Runtime of dot product (s) |
| :----------------------- | ------------------- | ---------------------------- |
| Naive | 80 | N/A |
| Index Array | 3.6 | 6.254261896 |
| Hashmap | 5.4 | 0.732189927 |
| Compressed Index Array | 2.0 | > 120 |
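
As a rough cross-check of the sizes in the table, the theoretical payload follows from the per-element costs given in the sections above: 8 bytes per `f64` value, 8 bytes per `usize` index on a 64-bit target, and 2 bytes per compressed index. The function names below are hypothetical, and the sketch ignores any allocator or container overhead, which the measured heap sizes may include.

```rust
/// Bytes needed to store every element as an `f64`, zeros included.
pub fn naive_bytes(elements: u64) -> u64 {
    elements * 8
}

/// Bytes for the index array layout: 8-byte value plus 8-byte index per non-zero element.
pub fn index_array_bytes(non_zero_elements: u64) -> u64 {
    non_zero_elements * (8 + 8)
}

/// Bytes for the compressed layout: 8-byte value plus 2-byte gap per non-zero element.
pub fn compressed_index_array_bytes(non_zero_elements: u64) -> u64 {
    non_zero_elements * (8 + 2)
}
```

For 10^10 elements with 2% non-zero entries (2 * 10^8 stored elements), this gives 80 GB, 3.2 GB and 2.0 GB respectively. No estimate is given for the hashmap, since its spare capacity depends on the implementation.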


@ -1,11 +1,7 @@
use std::ops::{Add, Mul, Sub};
use std::thread;
use std::time::Instant;
use bytesize::ByteSize; use bytesize::ByteSize;
use futures::executor::block_on; use jemalloc_ctl::{epoch, stats};
use rand::Rng; use rand::Rng;
use futures::future::{join_all}; use std::time::Instant;
use jemalloc_ctl::{stats, epoch};
#[global_allocator] #[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc; static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;
@ -17,13 +13,35 @@ pub struct SparseVec {
} }
impl SparseVec { impl SparseVec {
pub fn dot(&self, other: &SparseVec) -> f64 { pub fn dot(&self, other: &SparseVec) -> f64 {
let mut sum = 0.0; let mut sum = 0.0;
for index in 0..other.indices.len() { #[cfg(not(feature="binary_search"))]
// exponential search for an element in the second vector to have the same index {
sum += binary_search(self.indices[index], &other.indices, &other.values) * self.values[index]; let mut x = 0;
let mut y = 0;
while x < self.indices.len() && y < other.indices.len() {
if self.indices[x] == other.indices[y] {
sum += self.values[x] * other.values[y];
x += 1;
y += 1;
} else if self.indices[x] > other.indices[y] {
y += 1;
} else {
x += 1;
}
}
}
#[cfg(feature="binary_search")]
{
for index in 0..other.indices.len() {
// binary search for an element in the second vector to have the same index
sum += binary_search(self.indices[index], &other.indices, &other.values)
* self.values[index];
}
} }
sum sum
@ -40,14 +58,12 @@ impl SparseVec {
for i in 0..non_zero_elements { for i in 0..non_zero_elements {
values.push(0.5); values.push(0.5);
let idx = i as f32 / non_zero_elements as f32 * (elements as f32 - 4.0) + rng.gen_range(0.0..3.0); let idx = i as f32 / non_zero_elements as f32 * (elements as f32 - 4.0)
+ rng.gen_range(0.0..3.0);
indices.push(idx as usize); indices.push(idx as usize);
} }
Self { Self { values, indices }
values,
indices
}
} }
} }
@ -80,11 +96,10 @@ macro_rules! time {
let start = Instant::now(); let start = Instant::now();
$block; $block;
println!("{} took {}s", $name, start.elapsed().as_secs_f64()); println!("{} took {}s", $name, start.elapsed().as_secs_f64());
}} }};
} }
fn main() { fn main() {
/// Theoretical size of the vector in elements /// Theoretical size of the vector in elements
/// This would mean that we would require 80 GB of memory to store a single dense vector of f64 /// This would mean that we would require 80 GB of memory to store a single dense vector of f64
const VECTOR_SIZE: usize = 10_000_000_000; const VECTOR_SIZE: usize = 10_000_000_000;
@ -96,10 +111,13 @@ fn main() {
let non_zero_elements = (VECTOR_SIZE as f64 * NULL_NON_NULL_RATIO) as usize; let non_zero_elements = (VECTOR_SIZE as f64 * NULL_NON_NULL_RATIO) as usize;
let heap_element_size = std::mem::size_of::<f64>() + std::mem::size_of::<usize>(); let heap_element_size = std::mem::size_of::<f64>() + std::mem::size_of::<usize>();
println!("Estimated size on heap: {}", ByteSize::b((non_zero_elements * heap_element_size) as u64)); println!(
"Estimated size on heap: {}",
ByteSize::b((non_zero_elements * heap_element_size) as u64)
);
println!("Size on stack: {} B", std::mem::size_of::<SparseVec>()); println!("Size on stack: {} B", std::mem::size_of::<SparseVec>());
let mut vec: SparseVec; let vec: SparseVec;
time!("Sparse vector creation", { time!("Sparse vector creation", {
// generate a vector // generate a vector
@ -108,7 +126,10 @@ fn main() {
// many statistics are cached and only updated when the epoch is advanced. // many statistics are cached and only updated when the epoch is advanced.
epoch::advance().unwrap(); epoch::advance().unwrap();
println!("Heap allocated bytes (total): {}", ByteSize::b(stats::allocated::read().unwrap() as u64)); println!(
"Heap allocated bytes (total): {}",
ByteSize::b(stats::allocated::read().unwrap() as u64)
);
time!("Sparse vector dot product", { time!("Sparse vector dot product", {
vec.dot(&vec); vec.dot(&vec);