Faiss: A library for efficient similarity search and clustering of dense vectors.

Tags: | Published: 2018-10-12 11:31 | Author:
Source: https://github.com

Faiss

Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy. Some of the most useful algorithms are implemented on the GPU. It is developed by Facebook AI Research.

NEWS

NEW: version 1.4.0 (2018-08-30) no more crashes in pure Python code

NEW: version 1.3.0 (2018-07-12) support for binary indexes

NEW: latest commit (2018-02-22) supports on-disk storage of inverted indexes, see demos/demo_ondisk_ivf.py

NEW: latest commit (2018-01-09) includes an implementation of the HNSW indexing method, see benchs/bench_hnsw.py

NEW: there is now a Facebook public discussion group for Faiss users at https://www.facebook.com/groups/faissusers/

NEW: on 2017-07-30, the license on Faiss was relaxed to BSD from CC-BY-NC. Read LICENSE for details.

Introduction

Faiss contains several methods for similarity search. It assumes that the instances are represented as vectors and are identified by an integer, and that the vectors can be compared with L2 distances or dot products. Vectors that are similar to a query vector are those that have the lowest L2 distance or the highest dot product with the query vector. It also supports cosine similarity, since this is a dot product on normalized vectors.
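The three comparison modes described above can be sketched in plain Python. This is only an illustration of the definitions, not Faiss code: the helper names (`l2_sq`, `dot`, `normalize`, `cosine`) and the toy integer ids are assumptions made for this example.

```python
import math

def l2_sq(a, b):
    """Squared L2 (Euclidean) distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity computed directly from its definition."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy database: each vector is identified by an integer id (ids are made up).
database = {10: [1.0, 0.0], 11: [3.0, 4.0], 12: [0.0, 2.0]}
query = [2.0, 2.0]

# Most similar = lowest L2 distance, or highest dot product.
nearest_l2 = min(database, key=lambda i: l2_sq(query, database[i]))
best_dot = max(database, key=lambda i: dot(query, database[i]))

# Cosine similarity is a dot product on normalized vectors.
unit_query = normalize(query)
best_cos = max(database, key=lambda i: dot(unit_query, normalize(database[i])))
```

Note that the L2 and dot-product rankings need not agree: the dot product favors vectors with a large norm, which is why normalizing first is what turns it into cosine similarity.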

Most of the methods, like those based on binary vectors and compact quantization codes, solely use a compressed representation of the vectors and do not require keeping the original vectors. This generally comes at the cost of a less precise search, but these methods can scale to billions of vectors in main memory on a single server.
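To make the idea of searching on a compressed representation concrete, here is a minimal sketch that keeps only one bit per dimension and compares codes with the Hamming distance. The sign-based binarization and the function names are assumptions chosen for illustration; this is not how Faiss derives its binary or quantization codes.

```python
def to_binary_code(vec):
    """Pack the sign of each component into one integer (1 bit per dimension)."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code

def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")

vectors = [
    [0.9, -0.2, 0.4, -0.7],
    [0.8, -0.1, 0.5, 0.6],
    [-0.5, 0.3, -0.9, 0.2],
]

# Only the codes are stored; the original float vectors could be discarded.
codes = [to_binary_code(v) for v in vectors]

# The query is encoded the same way and compared in the compressed domain.
query_code = to_binary_code([1.0, -0.3, 0.2, -0.5])
ranked = sorted(range(len(codes)), key=lambda i: hamming(query_code, codes[i]))
```

Each vector shrinks from four floats to four bits here, and the ranking is only approximate because all magnitude information is lost. That is the precision-for-memory trade-off described above, in its crudest form.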

The GPU implementation can accept input from either CPU or GPU memory. On a server with GPUs, the GPU indexes can be used as a drop-in replacement for the CPU indexes (e.g., replace IndexFlatL2 with GpuIndexFlatL2), and copies to/from GPU memory are handled automatically. Results will be faster, however, if both input and output remain resident on the GPU. Both single- and multi-GPU usage is supported.

Building

The library is mostly implemented in C++, with optional GPU support provided via CUDA, and an optional Python interface. The CPU version requires a BLAS library. It compiles with a Makefile and can be packaged in a docker image. See INSTALL.md for details.

How Faiss works

Faiss is built around an index type that stores a set of vectors, and provides a function to search in them with L2 and/or dot product vector comparison. Some index types are simple baselines, such as exact search. Most of the available indexing structures correspond to various trade-offs with respect to

  • search time
  • search quality
  • memory used per index vector
  • training time
  • need for external data for unsupervised training
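The "index that stores vectors and searches them" pattern can be sketched with a tiny exact-search baseline. The class `TinyFlatL2` is hypothetical and written for this example only; it mimics the add-then-search workflow in pure Python and says nothing about how the real index types are implemented.

```python
import heapq

class TinyFlatL2:
    """Hypothetical exact-search index: stores raw vectors, scans all of them."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []  # position in this list serves as the integer id

    def add(self, vectors):
        """Append vectors to the index; ids are assigned sequentially."""
        for v in vectors:
            if len(v) != self.dim:
                raise ValueError("expected dimension %d, got %d" % (self.dim, len(v)))
            self.vectors.append(list(v))

    def search(self, query, k):
        """Return (distances, ids) of the k stored vectors closest to query."""
        scored = (
            (sum((a - b) ** 2 for a, b in zip(query, v)), i)
            for i, v in enumerate(self.vectors)
        )
        top = heapq.nsmallest(k, scored)  # k-selection on squared L2 distance
        return [d for d, _ in top], [i for _, i in top]

index = TinyFlatL2(dim=2)
index.add([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [1.0, 0.0]])
distances, ids = index.search([0.9, 0.2], k=2)
```

In terms of the trade-offs listed above, this baseline sits at one extreme: perfect search quality and no training, paid for with search time linear in the number of vectors and full-size storage per vector.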

The optional GPU implementation provides what is likely (as of March 2017) the fastest exact and approximate (compressed-domain) nearest neighbor search implementation for high-dimensional vectors, the fastest Lloyd's k-means, and the fastest small k-selection algorithm known. The implementation is detailed in the paper listed under Reference (arXiv:1702.08734).
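For readers unfamiliar with Lloyd's k-means, the algorithm alternates an assignment step and a centroid update step. The sketch below is a plain CPU version in pure Python with a deliberately naive initialization (the first k points); the function name and the toy data are assumptions for this example, and it is unrelated to the GPU implementation mentioned above.

```python
def kmeans(points, k, iterations=10):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid (squared L2).
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

points = [
    [0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
    [10.0, 11.0], [1.0, 0.0], [11.0, 10.0],
]
centroids, assignment = kmeans(points, k=2)
```

The assignment step is itself a nearest neighbor search of every point against the centroids, which is why a fast search implementation also yields a fast k-means.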

Full documentation of Faiss

The following are entry points for documentation:

Authors

The main authors of Faiss are:

Reference

Reference to cite when you use Faiss in a research paper:

    @article{JDH17,
      title={Billion-scale similarity search with GPUs},
      author={Johnson, Jeff and Douze, Matthijs and J{\'e}gou, Herv{\'e}},
      journal={arXiv preprint arXiv:1702.08734},
      year={2017}
    }

Join the Faiss community

For public discussion of Faiss or for questions, there is a Facebook public discussion group at https://www.facebook.com/groups/faissusers/

We monitor the issues page of the repository. You can report bugs, ask questions, etc.

License

Faiss is BSD-licensed. We also provide an additional patent grant.
