datasketch: MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble

Uploader: 42097450 | Uploaded: 2023-03-26 14:13:18 | File size: 776KB | File type: ZIP
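datasketch: Big Data Looks Small

datasketch gives you probabilistic data structures that can process and search very large amounts of data very quickly, with little loss of accuracy.

This package contains the following data sketches:

Data sketch | Usage
MinHash | Estimate Jaccard similarity and cardinality
Weighted MinHash | Estimate weighted Jaccard similarity
HyperLogLog | Estimate cardinality
HyperLogLog++ | Estimate cardinality

The following indexes for data sketches are provided to support sub-linear query time:

Index | For data sketch | Supported query type
LSH | MinHash, Weighted MinHash | Jaccard threshold
LSH Forest | MinHash, Weighted MinHash | Jaccard top-K
LSH Ensemble | MinHash | Containment threshold

datasketch must be used with Python 2.7 or above and NumPy 1.11 or above. SciPy is optional, but with it LSH initialization can be faster.

Note that some of these indexes also support Redis and Cassandra storage layers.

Installation

To install datasketch using pip: pip insta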
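To make the MinHash row of the table above more concrete, the following is a minimal sketch of how a MinHash signature can estimate Jaccard similarity. It uses only the Python standard library and is an illustration of the general technique, not datasketch's own implementation. The function names (`make_permutations`, `minhash_signature`, `estimate_jaccard`, `exact_jaccard`), the SHA-1 based token hash, and the choice of 256 hash functions are all assumptions made for this example.

```python
import hashlib
import random

# Assumed constants for this illustration: a Mersenne prime modulus and a
# 32-bit mask for the final hash values.
_MERSENNE_PRIME = (1 << 61) - 1
_MAX_HASH = (1 << 32) - 1


def _hash32(token):
    """Map a string token to a 32-bit integer using the first 4 bytes of SHA-1."""
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


def make_permutations(num_perm, seed=1):
    """Draw (a, b) pairs defining h(x) = ((a * x + b) mod p) & mask."""
    rng = random.Random(seed)
    return [
        (rng.randrange(1, _MERSENNE_PRIME), rng.randrange(0, _MERSENNE_PRIME))
        for _ in range(num_perm)
    ]


def minhash_signature(tokens, perms):
    """Return, for each hash function, the minimum hash over the token set.

    The token collection must be non-empty.
    """
    hashed = [_hash32(t) for t in set(tokens)]
    return [
        min(((a * h + b) % _MERSENNE_PRIME) & _MAX_HASH for h in hashed)
        for a, b in perms
    ]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|, for comparison."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    perms = make_permutations(256)
    doc1 = [str(i) for i in range(1000)]
    doc2 = [str(i) for i in range(500, 1500)]
    sig1 = minhash_signature(doc1, perms)
    sig2 = minhash_signature(doc2, perms)
    print("exact:   ", exact_jaccard(doc1, doc2))
    print("estimate:", estimate_jaccard(sig1, sig2))
```

Both sets are compared through fixed-size signatures, so the comparison cost depends on the number of hash functions rather than on the set sizes. More hash functions reduce the estimation error at the cost of larger signatures. In the package itself, the corresponding functionality is provided by its `MinHash` class; the bundled `examples/minhash_examples.py` shows the actual usage.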


Resource details

(90 files, 776KB) datasketch: MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble

datasketch-master/
  README.rst (3.42KB)
  .flake8 (188B)
  .github/
    workflows/
      python-package.yml (2.63KB)
  examples/
    lshensemble_example.py (1.01KB)
    lshforest_example.py (1.11KB)
    hyperloglog_examples.py (1.03KB)
    lsh_examples.py (1.73KB)
    minhash_examples.py (854B)
    weighted_minhash_examples.py (436B)
  benchmark/
    sketches/
      hyperloglog_benchmark.py (1.71KB)
      weighted_minhash_benchmark.py (2.36KB)
      minhash_benchmark.py (2.01KB)
      inclusion_benchmark.py (3.92KB)
      cardinality_benchmark.py (2.54KB)
      b_bit_minhash_benchmark.py (2.71KB)
      similarity_benchmark.py (4.20KB)
    indexes/
      jaccard/
        plot_topk_benchmark.py (2.50KB)
        utils.py (8.49KB)
        lshforest.py (1.50KB)
        lsh.py (1.47KB)
        requirements.txt (48B)
        exact.py (1.82KB)
        topk_benchmark.py (5.77KB)
        hnsw.py (1.34KB)
      containment/
        utils.py (827B)
        requirements.txt (52B)
        lshensemble_benchmark.py (10.31KB)
        lshensemble_benchmark_plot.py (6.44KB)
  .travis.yml (1.39KB)
  LICENSE (1.05KB)
  test/
    test_weighted_minhash.py (1.09KB)
    aio/
      test_lsh_mongo.py (19.70KB)
      __init__.py (0B)
    utils.py (68B)
    test_lean_minhash.py (5.87KB)
    __init__.py (0B)
    test_lsh_cassandra.py (9.45KB)
    test_lshensemble.py (2.64KB)
    test_lshforest.py (5.51KB)
    test_lsh.py (9.70KB)
    test_hyperloglog.py (5.96KB)
    test_minhash.py (6.52KB)
  setup.py (2.58KB)
  Makefile (224B)
  travis/
    wait_for_cassandra.sh (1.26KB)
  datasketch/
    minhash.py (12.69KB)
    b_bit_minhash.py (6.35KB)
    experimental/
      aio/
        lsh.py (15.18KB)
        __init__.py (0B)
        storage.py (10.60KB)
      __init__.py (458B)
    lshensemble.py (10.01KB)
    lshforest.py (5.76KB)
    hyperloglog.py (12.09KB)
    lsh.py (12.55KB)
    __init__.py (611B)
    storage.py (35.17KB)
    hyperloglog_const.py (70.84KB)
    weighted_minhash.py (4.93KB)
    hashfunc.py (646B)
    version.py (20B)
    lean_minhash.py (9.42KB)
    lshensemble_partition.py (7.19KB)
  docs/
    documentation.rst (1.11KB)
    minhash.rst (4.23KB)
    .nojekyll (0B)
    lsh.rst (14.64KB)
    conf.py (10.20KB)
    lshforest.rst (5.29KB)
    weightedminhash.rst (3.29KB)
    index.rst (85B)
    Makefile (7.44KB)
    hyperloglog.rst (2.18KB)
    lshensemble.rst (5.34KB)
    _static/
      weighted_minhash_benchmark.png (56.22KB)
      hyperloglog_benchmark.png (51.66KB)
      lshensemble_benchmark_1k/
        lshensemble_num_perm_256_recall.png (25.03KB)
        lshensemble_num_perm_256_precision.png (29.65KB)
        lshensemble_num_perm_256_fscore.png (29.37KB)
        lshensemble_num_perm_256_query_time.png (14.50KB)
      lshforest_benchmark.png (48.55KB)
      containment.png (22.75KB)
      hashfunc/
        minhash_benchmark_mmh3.png (54.33KB)
        minhash_benchmark_farmhash.png (53.84KB)
        minhash_benchmark_sha1.png (52.65KB)
        minhash_benchmark_xxh.png (53.01KB)
      b_bit_minhash_benchmark.png (61.04KB)
      lsh_benchmark.png (84.85KB)
      minhash_benchmark.png (54.23KB)
  .gitignore (944B)

