NEO4J

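Note: to use the Neo4j algorithm library, you must install the algorithm library plugin that matches your Neo4j database version, or a custom algorithm library.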

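1. Introduction

The overlap similarity algorithm first represents the two vectors as two one-dimensional coordinates of equal length, that is, it maps them into one-dimensional space, and then computes a weighted sum of their overlap. It considers neither the angle between the two vectors nor the length of their difference.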

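Its vector (set) formula, which matches what the function source in section 4 computes, is:

$$O(A, B) = \frac{|A \cap B|}{\min(|A|, |B|)}$$

In the element-wise mathematical formula, the denominator serves as a normalization factor, one term is the common-dimension function, and $O(x_{i}, y_{i})$ is the overlap function.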
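
The set form of the measure, the size of the intersection divided by the size of the smaller set, can be sketched in plain Java as follows. This is only an illustration under that definition: the class name `OverlapCoefficient` and the method `overlap` are made up for this example and are not part of the Neo4j library.

```java
import java.util.HashSet;
import java.util.Set;

final class OverlapCoefficient {
    // |A ∩ B| / min(|A|, |B|); returns 0 when either set is empty.
    static <T> double overlap(Set<T> a, Set<T> b) {
        int denominator = Math.min(a.size(), b.size());
        if (denominator == 0) {
            return 0.0;
        }
        Set<T> intersection = new HashSet<>(a); // copy so the input is not modified
        intersection.retainAll(b);
        return (double) intersection.size() / denominator;
    }

    public static void main(String[] args) {
        // Two shared elements, smaller set has three elements: 2 / 3
        System.out.println(overlap(Set.of(1, 2, 3), Set.of(1, 2, 4, 5)));
    }
}
```

Because the denominator is the size of the smaller set, the value stays between 0 and 1 regardless of how different the two set sizes are.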

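2. Use cases

We can use the overlap similarity algorithm to work out which things are subsets of other things. The algorithm does not require the two things being compared to have the same number of associated items. We can then use the computed subsets to learn a taxonomy from tagged data, for example in common data-mining analyses.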
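
The subset idea can be made concrete with a small Java sketch: when the overlap of two sets is exactly 1, every element of the smaller set is contained in the larger one. The class `SubsetCheck` and its methods are hypothetical names for this illustration; the book titles are taken from the sample data used in section 3.

```java
import java.util.HashSet;
import java.util.Set;

final class SubsetCheck {
    // Overlap coefficient: shared elements divided by the size of the smaller set.
    static <T> double overlap(Set<T> a, Set<T> b) {
        int smaller = Math.min(a.size(), b.size());
        if (smaller == 0) {
            return 0.0;
        }
        Set<T> shared = new HashSet<>(a);
        shared.retainAll(b);
        return (double) shared.size() / smaller;
    }

    // An overlap of exactly 1 means the smaller set lies entirely inside the larger one.
    static <T> boolean isSubsetOf(Set<T> candidate, Set<T> container) {
        return !candidate.isEmpty()
                && candidate.size() <= container.size()
                && overlap(candidate, container) == 1.0;
    }

    public static void main(String[] args) {
        Set<String> dystopia = Set.of("Fahrenheit 451", "1984");
        Set<String> classics = Set.of("Fahrenheit 451", "1984", "Dune", "The Great Gatsby");
        Set<String> fantasy = Set.of("Fahrenheit 451", "The Hunger Games", "Dune");
        System.out.println(isSubsetOf(dystopia, classics)); // true
        System.out.println(isSubsetOf(fantasy, classics));  // false
    }
}
```

The two sets being compared have different sizes, which is exactly the situation the measure is designed for.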

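3. Overlap similarity in Neo4j: usage examples

Neo4j provides the following function and procedures. As the source code analysis in section 4 shows, the function is suited to comparing the overlap of two items, while the procedures are suited to comparing the overlap of many items.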

algo.similarity.overlap function: takes two List<Number> arguments
algo.similarity.overlap.stream procedure: takes (List<Map<String,Object>> data, Map<String, Object> config)
algo.similarity.overlap procedure: takes (List<Map<String,Object>> data, Map<String, Object> config)

1. Compute the overlap similarity of two hard-coded lists

RETURN algo.similarity.overlap([1,2,3], [1,2,4,5]) AS similarity

Result: 0.6666666666666666
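
As a check on this value, the overlap similarity divides the number of shared elements by the size of the smaller list. The lists share the elements 1 and 2, and the smaller list has three elements:

$$\frac{|\{1,2,3\} \cap \{1,2,4,5\}|}{\min(3, 4)} = \frac{|\{1,2\}|}{3} = \frac{2}{3} \approx 0.6667$$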

2. Initialize the node data

MERGE (fahrenheit451:Book {title:'Fahrenheit 451'})
MERGE (dune:Book {title:'Dune'})
MERGE (hungerGames:Book {title:'The Hunger Games'})
MERGE (nineteen84:Book {title:'1984'})
MERGE (gatsby:Book {title:'The Great Gatsby'})

MERGE (scienceFiction:Genre {name: "Science Fiction"})
MERGE (fantasy:Genre {name: "Fantasy"})
MERGE (dystopia:Genre {name: "Dystopia"})
MERGE (classics:Genre {name: "Classics"})
// romance must be bound to a named Genre node before it is used below
MERGE (romance:Genre {name: "Romance"})

MERGE (fahrenheit451)-[:HAS_GENRE]->(dystopia)
MERGE (fahrenheit451)-[:HAS_GENRE]->(scienceFiction)
MERGE (fahrenheit451)-[:HAS_GENRE]->(fantasy)
MERGE (fahrenheit451)-[:HAS_GENRE]->(classics)

MERGE (hungerGames)-[:HAS_GENRE]->(scienceFiction)
MERGE (hungerGames)-[:HAS_GENRE]->(fantasy)
MERGE (hungerGames)-[:HAS_GENRE]->(romance)

MERGE (nineteen84)-[:HAS_GENRE]->(scienceFiction)
MERGE (nineteen84)-[:HAS_GENRE]->(dystopia)
MERGE (nineteen84)-[:HAS_GENRE]->(classics)

MERGE (dune)-[:HAS_GENRE]->(scienceFiction)
MERGE (dune)-[:HAS_GENRE]->(fantasy)
MERGE (dune)-[:HAS_GENRE]->(classics)

MERGE (gatsby)-[:HAS_GENRE]->(classics)

3. Compute the intersection and overlap similarity between nodes

MATCH (book:Book)-[:HAS_GENRE]->(genre)
WITH {item:id(genre), categories: collect(id(book))} as userData
WITH collect(userData) as data
CALL algo.similarity.overlap.stream(data)
YIELD item1, item2, count1, count2, intersection, similarity
RETURN algo.asNode(item1).name AS from, algo.asNode(item2).name AS to,count1, count2, intersection, similarity
ORDER BY similarity DESC

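Result: count1 is the number of nodes related to from, count2 is the number of nodes related to to, intersection is the number of nodes they share, and similarity is the overlap similarity.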
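
To see where these columns come from, the following Java sketch recomputes them in memory for the genres Science Fiction, Fantasy, Dystopia and Classics, using the books attached to them in step 2. It is an approximation of the idea, not the library's implementation: the class `PairwiseOverlap`, its `Row` type and the choice to report the smaller set as `from` are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PairwiseOverlap {
    /** One output row, mirroring the columns yielded by the stream procedure. */
    static final class Row {
        final String from;
        final String to;
        final int count1;
        final int count2;
        final int intersection;
        final double similarity;

        Row(String from, String to, int count1, int count2, int intersection, double similarity) {
            this.from = from;
            this.to = to;
            this.count1 = count1;
            this.count2 = count2;
            this.intersection = intersection;
            this.similarity = similarity;
        }
    }

    // Compares every unordered pair of items once and sorts by similarity, highest first.
    static List<Row> allPairs(Map<String, Set<String>> categories) {
        List<String> items = new ArrayList<>(categories.keySet());
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                String a = items.get(i);
                String b = items.get(j);
                // Assumption of this sketch: the smaller set is reported as "from".
                if (categories.get(a).size() > categories.get(b).size()) {
                    String tmp = a;
                    a = b;
                    b = tmp;
                }
                Set<String> shared = new HashSet<>(categories.get(a));
                shared.retainAll(categories.get(b));
                int count1 = categories.get(a).size(); // size of the smaller set
                int count2 = categories.get(b).size();
                double similarity = count1 == 0 ? 0.0 : (double) shared.size() / count1;
                rows.add(new Row(a, b, count1, count2, shared.size(), similarity));
            }
        }
        rows.sort(Comparator.comparingDouble((Row r) -> r.similarity).reversed());
        return rows;
    }

    // Genre -> books, copied from the HAS_GENRE relationships in step 2.
    static Map<String, Set<String>> sampleData() {
        Map<String, Set<String>> data = new LinkedHashMap<>();
        data.put("Science Fiction", Set.of("Fahrenheit 451", "The Hunger Games", "1984", "Dune"));
        data.put("Fantasy", Set.of("Fahrenheit 451", "The Hunger Games", "Dune"));
        data.put("Dystopia", Set.of("Fahrenheit 451", "1984"));
        data.put("Classics", Set.of("Fahrenheit 451", "1984", "Dune", "The Great Gatsby"));
        return data;
    }

    public static void main(String[] args) {
        for (Row r : allPairs(sampleData())) {
            System.out.printf("%s -> %s: count1=%d count2=%d intersection=%d similarity=%.2f%n",
                    r.from, r.to, r.count1, r.count2, r.intersection, r.similarity);
        }
    }
}
```

Note that, as in the Cypher query, each item is a genre and its categories are the books attached to it, so two genres are similar when they share books.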

4. Filter the similarity results, keeping only pairs whose similarity is greater than or equal to 0.75

MATCH (book:Book)-[:HAS_GENRE]->(genre)
WITH {item:id(genre), categories: collect(id(book))} as userData
WITH collect(userData) as data
CALL algo.similarity.overlap.stream(data, {similarityCutoff: 0.75})
YIELD item1, item2, count1, count2, intersection, similarity
RETURN algo.asNode(item1).name AS from, algo.asNode(item2).name AS to,count1, count2, intersection, similarity
ORDER BY similarity DESC
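
The effect of similarityCutoff can be sketched as a simple filter that keeps pairs scoring at or above the threshold. The class `CutoffSketch` is hypothetical, and the similarity values in `main` were worked out by hand from the step 2 data for the genres Science Fiction, Fantasy, Dystopia and Classics; they are not output copied from Neo4j.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class CutoffSketch {
    // Keeps only the pairs whose similarity is greater than or equal to the cutoff.
    static Map<String, Double> applyCutoff(Map<String, Double> similarities, double cutoff) {
        return similarities.entrySet().stream()
                .filter(e -> e.getValue() >= cutoff)
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (first, second) -> first, // keys are unique, so this is never used
                        LinkedHashMap::new));     // preserve the input order
    }

    public static void main(String[] args) {
        Map<String, Double> similarities = new LinkedHashMap<>();
        similarities.put("Fantasy -> Science Fiction", 1.0);
        similarities.put("Dystopia -> Science Fiction", 1.0);
        similarities.put("Dystopia -> Classics", 1.0);
        similarities.put("Science Fiction -> Classics", 0.75);
        similarities.put("Fantasy -> Classics", 2.0 / 3.0);
        similarities.put("Dystopia -> Fantasy", 0.5);

        // The last two pairs fall below 0.75 and are dropped.
        applyCutoff(similarities, 0.75).forEach((pair, score) -> System.out.println(pair + " " + score));
    }
}
```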

5. Find the most similar nodes for each node and store the relationships between them

MATCH (book:Book)-[:HAS_GENRE]->(genre)
WITH {item:id(genre), categories: collect(id(book))} as userData
WITH collect(userData) as data
CALL algo.similarity.overlap(data, {topK: 2, similarityCutoff: 0.5, write:true})
YIELD nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, stdDev, p25, p50, p75, p90, p95, p99, p999, p100
RETURN nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, p95
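
The topK setting limits how many neighbours are kept per item. A rough Java sketch of that selection step is shown below, assuming the pairs have already been scored; `TopKSketch`, `Scored` and `topKPerItem` are names invented for this example, and grouping by the `from` side is an assumption rather than a statement about the library's internals.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TopKSketch {
    /** A scored pair of items. */
    static final class Scored {
        final String from;
        final String to;
        final double similarity;

        Scored(String from, String to, double similarity) {
            this.from = from;
            this.to = to;
            this.similarity = similarity;
        }
    }

    // Groups pairs by their "from" item and keeps the k highest-scoring pairs of each group.
    static Map<String, List<Scored>> topKPerItem(List<Scored> pairs, int k) {
        Map<String, List<Scored>> byFrom = new LinkedHashMap<>();
        for (Scored p : pairs) {
            byFrom.computeIfAbsent(p.from, key -> new ArrayList<>()).add(p);
        }
        for (Map.Entry<String, List<Scored>> e : byFrom.entrySet()) {
            List<Scored> sorted = new ArrayList<>(e.getValue());
            sorted.sort(Comparator.comparingDouble((Scored s) -> s.similarity).reversed());
            e.setValue(new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size()))));
        }
        return byFrom;
    }

    public static void main(String[] args) {
        List<Scored> pairs = List.of(
                new Scored("Dystopia", "Science Fiction", 1.0),
                new Scored("Dystopia", "Fantasy", 0.5),
                new Scored("Dystopia", "Classics", 1.0),
                new Scored("Fantasy", "Classics", 2.0 / 3.0));
        // With k = 2, Dystopia keeps Science Fiction and Classics and drops Fantasy.
        topKPerItem(pairs, 2).forEach((item, kept) ->
                kept.forEach(s -> System.out.println(item + " -> " + s.to + " " + s.similarity)));
    }
}
```

According to the procedure source in section 4, results are only written when the write flag is set and similarityCutoff is greater than 0, and the relationship type defaults to NARROWER_THAN with the property score.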

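6. Specify source and target IDs: sometimes we do not want to compute the similarity of every pair, but instead want to compare a chosen subset of items against each other. We do this with the sourceIds and targetIds keys in the config.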

MATCH (book:Book)-[:HAS_GENRE]->(genre)
WITH {item:id(genre), name: genre.name, categories: collect(id(book))} as userData
WITH collect(userData) as data
WITH data,[value in data WHERE value.name IN ["Fantasy", "Classics"] | value.item ] AS sourceIds
CALL algo.similarity.overlap.stream(data, {sourceIds: sourceIds})
YIELD item1, item2, count1, count2, intersection, similarity
RETURN algo.getNodeById(item1).name AS from, algo.getNodeById(item2).name AS to, similarity
ORDER BY similarity DESC

4. Source code walkthrough

The algo.similarity.overlap function:

    public double overlapSimilarity(@Name("vector1") List<Number> vector1, @Name("vector2") List<Number> vector2) {
        if (vector1 == null || vector2 == null) return 0;

        HashSet<Number> intersectionSet = new HashSet<>(vector1);
        intersectionSet.retainAll(vector2);
        int intersection = intersectionSet.size();

        long denominator = Math.min(vector1.size(), vector2.size());
        return denominator == 0 ? 0 : (double) intersection / denominator;
    }
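
Two consequences of this function are easy to miss: the intersection is deduplicated through a HashSet while the denominator uses the raw list sizes, and elements are compared with Java's equals, so a Long never equals a Double. The sketch below repeats the same steps without the Neo4j annotations to show both effects; the class name `OverlapFunctionSketch` is made up, and how Cypher numbers arrive as Java types is an assumption here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

final class OverlapFunctionSketch {
    // Same steps as the function listed above, minus the Neo4j annotations.
    static double overlapSimilarity(List<Number> vector1, List<Number> vector2) {
        if (vector1 == null || vector2 == null) return 0;
        HashSet<Number> intersectionSet = new HashSet<>(vector1);
        intersectionSet.retainAll(vector2);
        long denominator = Math.min(vector1.size(), vector2.size());
        return denominator == 0 ? 0 : (double) intersectionSet.size() / denominator;
    }

    public static void main(String[] args) {
        // Duplicates: the intersection is {1}, but the denominator is min(2, 3) = 2.
        List<Number> withDuplicates = Arrays.asList(1L, 1L);
        List<Number> other = Arrays.asList(1L, 2L, 3L);
        System.out.println(overlapSimilarity(withDuplicates, other)); // 0.5

        // Mixed types: Long 1 and Double 1.0 are not equal, so nothing is shared.
        List<Number> longs = Arrays.asList(1L, 2L);
        List<Number> doubles = Arrays.asList(1.0, 2.0);
        System.out.println(overlapSimilarity(longs, doubles)); // 0.0
    }
}
```

Passing lists without duplicates and with a single numeric type avoids both surprises.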

The algo.similarity.overlap procedure:

    public Stream<SimilaritySummaryResult> overlap(
            @Name(value = "data", defaultValue = "null") List<Map<String, Object>> data,
            @Name(value = "config", defaultValue = "{}") Map<String, Object> config) {
        ProcedureConfiguration configuration = ProcedureConfiguration.create(config);
        CategoricalInput[] inputs = prepareCategories(data, getDegreeCutoff(configuration));

        String writeRelationshipType = configuration.get("writeRelationshipType", "NARROWER_THAN");
        String writeProperty = configuration.getWriteProperty("score");
        if (inputs.length == 0) {
            return emptyStream(writeRelationshipType, writeProperty);
        }

        long[] inputIds = SimilarityInput.extractInputIds(inputs);
        int[] sourceIndexIds = indexesFor(inputIds, configuration, "sourceIds");
        int[] targetIndexIds = indexesFor(inputIds, configuration, "targetIds");

        SimilarityComputer<CategoricalInput> computer = similarityComputer(sourceIndexIds, targetIndexIds);
        SimilarityRecorder<CategoricalInput> recorder = categoricalSimilarityRecorder(computer, configuration);

        double similarityCutoff = getSimilarityCutoff(configuration);
        Stream<SimilarityResult> stream = topN(
                similarityStream(inputs, sourceIndexIds, targetIndexIds, recorder, configuration,
                        () -> null, similarityCutoff, getTopK(configuration)),
                getTopN(configuration));

        boolean write = configuration.isWriteFlag(false) && similarityCutoff > 0.0;
        return writeAndAggregateResults(stream, inputs.length, sourceIndexIds.length, targetIndexIds.length,
                configuration, write, writeRelationshipType, writeProperty, recorder);
    }

The algo.similarity.overlap.stream procedure:

    public Stream<SimilarityResult> similarityStream(
            @Name(value = "data", defaultValue = "null") List<Map<String,Object>> data,
            @Name(value = "config", defaultValue = "{}") Map<String, Object> config) {
        ProcedureConfiguration configuration = ProcedureConfiguration.create(config);
        CategoricalInput[] inputs = prepareCategories(data, getDegreeCutoff(configuration));
        if (inputs.length == 0) {
            return Stream.empty();
        }

        long[] inputIds = SimilarityInput.extractInputIds(inputs);
        int[] sourceIndexIds = indexesFor(inputIds, configuration, "sourceIds");
        int[] targetIndexIds = indexesFor(inputIds, configuration, "targetIds");

        SimilarityComputer<CategoricalInput> computer = similarityComputer(sourceIndexIds, targetIndexIds);
        return topN(
                similarityStream(inputs, sourceIndexIds, targetIndexIds, computer, configuration,
                        () -> null, getSimilarityCutoff(configuration), getTopK(configuration)),
                getTopN(configuration));
    }

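Copyright notice: this article, titled NEO4J, was contributed voluntarily by Deputy Director 林淑君, and its views are the author's own. To reprint it, please contact the author and cite the source: http://www.xiehuijuan.com/baike/1697212159a339588.html. This site only provides information storage services, holds no ownership, and bears no related legal liability. If you find content on this site that is suspected of plagiarism, infringement, or violating laws or regulations, it will be removed immediately once verified.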
