pgera/efg


About

This project provides an efficient implementation for traversing large compressed graphs on GPUs. Graphs are compressed with Elias-Fano encoding (on the CPU), and the breadth-first search (BFS) implementation traverses the compressed graphs directly on the GPU. Since GPU memory capacity is limited, this approach can accommodate larger graphs in device memory. Graphs compress by a factor of 1.5x on average relative to the Compressed Sparse Row (CSR) format. If a graph still does not fit in memory, unified virtual memory (UVM) is used to transfer data from the host at runtime.
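To give an intuition for why sorted adjacency lists compress well, here is a minimal Elias-Fano sketch in Python. This is purely illustrative and is not the project's CUDA implementation: each value is split into `l` low bits (stored verbatim) and high bits (stored as a unary-coded bit vector), for roughly `n * (2 + log2(universe / n))` bits total.

```python
from math import floor, log2

def ef_encode(values, universe):
    """Minimal Elias-Fano encoder for a sorted list of non-negative ints.

    Splits each value into l low bits (stored verbatim) and high bits
    (stored in a unary-coded bit vector). Space is roughly
    n * (2 + log2(universe / n)) bits, which is why sorted neighbour
    lists compress well.
    """
    n = len(values)
    l = max(0, floor(log2(universe / n))) if n else 0
    low = [v & ((1 << l) - 1) for v in values]
    high = 0
    for i, v in enumerate(values):
        # For the i-th value, set bit (v >> l) + i of the upper bit vector.
        high |= 1 << ((v >> l) + i)
    return l, low, high

def ef_decode(l, low, high):
    """Recover the original list from the two-part encoding."""
    out, i, pos = [], 0, 0
    while i < len(low):
        if high & (1 << pos):
            out.append(((pos - i) << l) | low[i])
            i += 1
        pos += 1
    return out

neighbours = [3, 4, 7, 13, 14, 15]          # one sorted neighbour list
encoded = ef_encode(neighbours, universe=16)
assert ef_decode(*encoded) == neighbours
```

The real implementation operates on packed bit arrays with rank/select-style queries rather than Python integers, but the split into low and unary-coded high bits is the same idea.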

Build Requirements

Validated on:

Slightly newer or older versions should work as well. Please open an issue if you run into compilation problems.

Build

  • sudo apt install g++ libboost-all-dev libssl-dev libgoogle-glog-dev libdouble-conversion-dev
  • Build and install folly at <scratch_path> (e.g., /tmp/folly_install):
    git clone https://github.com/facebook/folly.git
    cd folly
    git checkout v2023.05.22.00
    sudo ./build/fbcode_builder/getdeps.py install-system-deps --recursive
    python3 ./build/fbcode_builder/getdeps.py --allow-system-packages --scratch-path /tmp/folly_install build --no-tests
    
  • Clone this project. In the project's Makefile:
    • Set FOLLY_INSTALL_DIR to <scratch_path>/installed/folly
    • Set FMT_INSTALL_DIR to <scratch_path>/installed/fmt-XXX, where XXX is a string generated during the folly build.
    • Add -arch=sm_XX to the NVCC flags for your GPU architecture for better performance.
  • make -j

Input Graph Format

Input graphs need to be in the Compressed Sparse Row (CSR) format, consisting of two binary files, vertex.CSR and edge.CSR. vertex.CSR contains the row offsets and has length |V| + 1. edge.CSR contains the column indices and has length |E|. Each neighbour list in edge.CSR should be in sorted order. The data type for the input graphs should be 64 bits. This is merely done to simplify input handling and has no bearing on the compression. The reported compression ratio is relative to the optimal CSR size.
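As a sketch of the format, the following Python snippet writes a tiny graph as vertex.CSR / edge.CSR. The README only specifies 64-bit values; little-endian signed integers are an assumption here, so adjust if your toolchain expects otherwise.

```python
import struct

def write_csr(adjacency, out_dir="."):
    """Write a graph as vertex.CSR / edge.CSR binary files of 64-bit ints.

    adjacency: one neighbour list per vertex.
    vertex.CSR holds |V| + 1 row offsets; edge.CSR holds |E| column indices.
    Byte order/signedness (little-endian, signed) is an assumption; the
    README only states that values are 64-bit.
    """
    offsets = [0]
    edges = []
    for nbrs in adjacency:
        edges.extend(sorted(nbrs))       # neighbour lists must be sorted
        offsets.append(len(edges))
    with open(f"{out_dir}/vertex.CSR", "wb") as f:
        f.write(struct.pack(f"<{len(offsets)}q", *offsets))
    with open(f"{out_dir}/edge.CSR", "wb") as f:
        f.write(struct.pack(f"<{len(edges)}q", *edges))

# A 4-vertex example: 0 -> {1, 2}, 1 -> {2}, 2 -> {0, 3}, 3 -> {}
write_csr([[1, 2], [2], [0, 3], []])
```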

Small sample graphs are included in the repository under sample_graphs. These can be used for basic tests, but are meaningless for any performance measurements. Larger graphs are available at https://p.gera.io/public/graphs/.

Run

./ef_bfs sample_graphs/tiny

Usage:

./ef_bfs [OPTIONS] INPUT_DIR
OPTIONS:
-h [ --help ] Print help
-n [ --num ] arg (=100) Number of traversals
-m [ --mapfile ] arg Mapfile for remapping vertex ids. If provided, map[x]
will be used instead of x for starting a traversal
-u [ --uvm ] Use UVM for memory allocations
-d [ --nosort ] Disable the frontier sorting optimisation
-r [ --root ] arg Root for the traversal. Random roots will be used if
unspecified
  • The mapfile option is useful for comparing performance between reordered versions of the same graph. For example, the dataset above includes graphs in their natural order as well as graphs reordered with HALO, and the reordered graphs include a mapfile. When running the traversal on reordered graphs, the mapfile should be passed as an argument to get the same traversals.
  • The -d flag disables the partial frontier sorting optimisation. This should generally not be required, but it saves memory. It can be useful if the compressed graph barely fits in the GPU. For example, this is required for the uk-2007-05 graph on a 12 GiB GPU.
  • The -u flag enables UVM allocations, which is useful for massive graphs that do not fit even after compression. DO NOT use -d along with -u, as performance will be very poor without the sorting optimisation in the UVM regime.
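For validating traversal results on the host, a level-synchronous BFS over plain CSR can serve as a reference. This is a hedged sketch, not part of this repository; the offsets/edges arrays below are the in-memory form of the vertex.CSR / edge.CSR files:

```python
from collections import deque

def bfs_csr(offsets, edges, root):
    """Level-synchronous BFS over a CSR graph.

    Returns per-vertex depth (-1 for unreachable vertices). Useful as a
    slow host-side check of GPU traversal output.
    """
    depth = [-1] * (len(offsets) - 1)
    depth[root] = 0
    frontier = deque([root])
    while frontier:
        u = frontier.popleft()
        # Neighbours of u occupy edges[offsets[u] : offsets[u + 1]].
        for v in edges[offsets[u]:offsets[u + 1]]:
            if depth[v] == -1:
                depth[v] = depth[u] + 1
                frontier.append(v)
    return depth

# 4-vertex example: 0 -> {1, 2}, 1 -> {2}, 2 -> {0, 3}, 3 -> {}
print(bfs_csr([0, 2, 3, 5, 5], [1, 2, 2, 0, 3], root=0))  # [0, 1, 1, 2]
```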

Performance

BFS results averaged over 100 traversals from random sources:

| Graph | \|V\| (\|E\|) | CSR Size (GiB) | EFG Size (GiB) | Titan Xp (12 GiB Mem) | V100 (32 GiB Mem) |
|-------|---------------|----------------|----------------|-----------------------|-------------------|
| twitter (d) | 41.6 M (1.47 B) | 5.63 | 3.33 | 238 ms, 5.58 GTEPS | 127 ms, 10.47 GTEPS |
| gsh-h-15 (d) | 68.66 M (1.8 B) | 6.97 | 4.73 | 174 ms, 7.57 GTEPS | 120 ms, 10.94 GTEPS |
| frndster (u) | 65.61 M (3.61 B) | 13.7 | 9.15 | 1006 ms, 3.59 GTEPS | 349 ms, 10.35 GTEPS |
| uk-07-05 (d) | 105.22 M (3.74 B) | 14.32 | 10.31 | 212 ms, 11.06 GTEPS | 117 ms, 20.02 GTEPS |
| kron27_s (u) | 63.07 M (4.22 B) | 15.97 | 9.23 | 997 ms, 4.27 GTEPS | 370 ms, 11.43 GTEPS |

The last three graphs would not have fit in memory on the Titan Xp GPU without compression.
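GTEPS here is giga traversed edges per second: the number of edges traversed during a search divided by the runtime. As a small arithmetic sketch (the traversed-edge count depends on reachability from each source, so it can be below |E|):

```python
def gteps(edges_traversed, runtime_ms):
    """Giga traversed edges per second: edges / seconds / 1e9."""
    return edges_traversed / (runtime_ms * 1e-3) / 1e9

# e.g. traversing 1 billion edges in 500 ms gives 2.0 GTEPS
print(gteps(1.0e9, 500))  # 2.0
```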

UVM

Graphs that do not fit even after compression use UVM. Compression is still beneficial since it reduces the data transferred over the interconnect. The following graphs use UVM on the Titan Xp GPU:

| Graph | \|V\| (\|E\|) | CSR Size (GiB) | EFG Size (GiB) | Titan Xp (12 GiB Mem) | V100 (32 GiB Mem) |
|-------|---------------|----------------|----------------|-----------------------|-------------------|
| molr16 (u) | 30.22 M (6.68 B) | 25.1 | 14.5 | 2148 ms (UVM), 3.07 GTEPS | 296 ms, 22.32 GTEPS |
| uk-75_s (u) | 105.22 M (6.62 B) | 25.47 | 15.43 | 2825 ms (UVM), 2.34 GTEPS | 284 ms, 23.25 GTEPS |

Documentation

Please refer to the IPDPS '23 paper for more details. If you find this useful, please cite it as:

@inproceedings{gera2023traversing,
title={Traversing Large Compressed Graphs on GPUs},
author={Gera, Prasun and Kim, Hyesoon},
booktitle={2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)},
pages={25--35},
year={2023},
organization={IEEE}
}