HPCC
{{SeeWikipedia}}
HPCC
[[File:hpcc-systems-logo.png|right]]
==News==
*[https://wiki.hpccsystems.com/display/hpcc/10+Year+Anniversary+Podcast+Series Celebrating 10 years of serving the HPCC open-source community] (June 15, 2021)
==Introduction==
HPCC (High-Performance Computing Cluster), also known as DAS (Data Analytics Supercomputer), is an open-source (Apache v2) platform for big data processing and analytics, developed with [[C++]] and [https://en.wikipedia.org/wiki/ECL_%28data-centric_programming_language%29 ECL].

* A big data processing and analytics platform written in [[C++]];
* ECL (Enterprise Control Language) and KEL (Knowledge Engineering Language) are its two high-level scripting languages;
* [https://hpccsystems.com/about/hpcc-hadoop-comparison HPCC predates Apache Hadoop and takes its own approach to parallel architecture: Data Parallelism, Pipeline Parallelism, System Parallelism]
==Versions==
[https://hpccsystems.com/download/release-notes HPCC Community Edition]
*[https://wiki.hpccsystems.com/display/hpcc/HPCC+Systems+9.4.x+Releases 9.x]
*[https://wiki.hpccsystems.com/display/hpcc/HPCC+Systems+8.0.x+Releases 8.x Cloud Native]
*[https://wiki.hpccsystems.com/display/hpcc/HPCC+Systems+7.12.x+Releases 7.x]
*[https://wiki.hpccsystems.com/display/hpcc/HPCC+Systems+6.4.x+Releases 6.x]
*[https://wiki.hpccsystems.com/display/hpcc/HPCC+Systems+5.6.x+Releases 5.x]
==Components==
HPCC Systems includes the following core components:
* Thor (the Data Refinery Cluster)
* Roxie (Rapid Online XML Inquiry Engine, the Query Cluster)
* ECL (Enterprise Control Language)
* ECL IDE
* ESP (Enterprise Services Platform)
==Parallelism==
Parallelism architecture:
*Data Parallelism
*Component Parallelism
*Pipeline Parallelism
*System Parallelism
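Two of these forms, data parallelism and pipeline parallelism, can be illustrated with a minimal Python sketch. This is not HPCC code and makes no claim about how Thor or Roxie implement them. The function names (`data_parallel_sum_of_squares`, `pipeline`) are hypothetical, and threads stand in for cluster nodes. In data parallelism, every worker runs the same operation on its own partition of the data. In pipeline parallelism, successive processing stages run at the same time and stream records to each other through queues.

```python
# Hypothetical sketch (not HPCC code): data parallelism vs. pipeline parallelism,
# illustrated with Python threads.
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_END = object()  # sentinel marking the end of a pipeline stream


def data_parallel_sum_of_squares(records, n_workers=4):
    """Data parallelism: each worker applies the same operation to its own partition."""
    partitions = [records[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(lambda part: sum(x * x for x in part), partitions))
    return sum(partial_sums)  # combine the per-partition results


def _stage(func, inbox, outbox):
    """One pipeline stage: transform each item and pass it downstream."""
    while True:
        item = inbox.get()
        if item is _END:
            outbox.put(_END)  # propagate end-of-stream to the next stage
            return
        outbox.put(func(item))


def pipeline(records, *stages):
    """Pipeline parallelism: all stages run at the same time, linked by queues."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=_stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(stages)]
    for t in threads:
        t.start()
    for r in records:
        queues[0].put(r)  # stage 1 can start before all records are queued
    queues[0].put(_END)
    results = []
    while (item := queues[-1].get()) is not _END:
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    data = list(range(10))
    print(data_parallel_sum_of_squares(data))                # 285
    print(pipeline(data, lambda x: x + 1, lambda x: x * 2))  # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
```

The threads only show the structure. For CPU-bound Python code the GIL limits real speed-up, whereas a cluster spreads the same work across separate machines.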
==Guide==
Download the [https://hpccsystems.com/download/virtual-machine-image HPCC virtual machine] for a quick start. Once it is running, ECL Watch is available at:
 http://127.0.0.1:8010/
To use the HPCC Configuration Manager, start it and then open it in a browser:
 sudo /opt/HPCCSystems/sbin/configmgr
 http://localhost:8015
==Projects==
*[https://github.com/hpcc-systems HPCC @ GitHub]
==Machine Learning==
*[https://github.com/hpcc-systems/ML_Core Core ECL Machine Learning library], written in ECL
*[http://docs.huihoo.com/hpcc/Machine-Learning-Library-Reference.pdf Machine Learning Library Reference]
*[http://docs.huihoo.com/hpcc/developing-machine-learning-algorithms-on-hpcc-ecl-platfrom.pdf Developing Machine Learning Algorithms on HPCC/ECL Platform]
*[http://docs.huihoo.com/hpcc/unsupervised-learning-and-image-classification-in-high-performance-computing-cluster.pdf Unsupervised Learning and Image Classification in High Performance Computing Cluster]
*[http://docs.huihoo.com/hpcc/optimizing-supervised-and-implementing-unsupervised-machine-learning-algorithms-in-hpcc-systems.pdf Optimizing Supervised Machine Learning Algorithms and Implementing Deep Learning in HPCC Systems]
*[https://hpccsystems.com/resources/blog/richardkchapman/embedding-tensorflow-operations-ecl Embedding TensorFlow Operations in ECL]
==Identity&Risk==
*[http://docs.huihoo.com/hpcc/2016-North-American-Healthcare-Identity-Management-Technology-Innovation-Award.pdf 2016 North American Healthcare Identity Management Technology Innovation Award]
*[http://docs.huihoo.com/hpcc/security-and-privacy-in-a-big-data-world.pdf Security and Privacy in a Big Data World]
*[http://docs.huihoo.com/hpcc/crowdsourcing-large-scale-identity-theft-and-fraud-to-make-bucket-loads-of-easy-money.pdf Crowdsourcing large scale identity theft and fraud to make bucket loads of easy money]
*[http://docs.huihoo.com/hpcc/data-analytics-governance-and-ethics-lexisnexis-risk-solutions.pdf Data Analytics Governance and Ethics]
*[http://docs.huihoo.com/hpcc/data-analytics-in-cyber-security-and-threat-intelligence.pdf Data Analytics in Cyber-Security and Threat Intelligence]
==Visualization==
[https://github.com/hpcc-systems/Visualization HPCC Visualization Framework], written in [[JavaScript]]
==ECL==
ECL (Enterprise Control Language) is a declarative, modular, and extensible language designed specifically for processing big data. ECL code is compiled into optimized C++ and can easily be extended with C++ libraries.

"I can write 4 lines of ECL code to replace 200 lines of SQL. This makes the code very easy to read, understand and maintain." - Adwait Joshi, CEO of DataSeers
*[https://github.com/hpcc-systems/HPCC-Platform/tree/master/ecl ECL @ GitHub]
*[https://github.com/hpcc-systems/ecl-bundles ECL bundles]
*[https://github.com/infosys-hpcc/eclbuilder ECLBuilder] [http://docs.huihoo.com/hpcc/ecl-builder-an-ecl-web-interface-for-analytics.pdf HPCC Systems ECL Builder]
==UDF==
Users can write their own [https://wiki.hpccsystems.com/pages/viewpage.action?pageId=1802272 User Defined Functions (UDF)] in [[Java]], [[Python]], [[C++]] and [[R Project|R]].
==ECL IDE==
*[https://hpccsystems.com/resources/faq/what-ecl-ide What is the ECL IDE?]
*[https://github.com/hpcc-systems/eclide eclide], written in [[C++]]
*[https://github.com/GordonSmith/vscode-ecl ECL for Visual Studio Code]
==ECL Watch==
ECL Watch is a service that runs on the Enterprise Services Platform (ESP), a middleware component of the HPCC platform.

[https://github.com/hpcc-systems/HPCC-Platform/tree/candidate-6.4.0/esp/src/eclwatch ECL Watch candidate-6.4.0 source code], written in [[JavaScript]].
==ESDL==
ESDL (Enterprise Service Description Language)

[http://docs.huihoo.com/hpcc/Dynamic-ESDL-6.2.0-1.pdf Dynamic ESDL]
==SALT==
[https://hpccsystems.com/enterprise-services/purchase-required-modules/SALT SALT: Scalable Automated Linking Technology] provides:
*Record linking and clustering (MDM)
*Data profiling, cleansing, normalization, and standardization
*Complex attributes, and relationships derived from linking and clustering
[[image:hpcc-salt.jpg]]
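The following is a rough, hypothetical illustration of what "standardize, then link and cluster" can mean. It is a common generic record-linkage approach, not SALT's actual algorithm. The steps are:
* Standardize the field values.
* Weight each agreeing field by how rare its value is, so that a shared rare value counts for more than a shared common one.
* Link every record pair whose total weight reaches a threshold.
* Merge the linked pairs into clusters with union-find.

The log-frequency weighting, the threshold, the field names and the sample records are all assumptions made for the example.

```python
# Hypothetical record-linkage sketch: standardize -> weight -> link -> cluster.
# A generic technique for illustration only; it is NOT SALT's implementation.
import math
from collections import Counter
from itertools import combinations


def standardize(record):
    """Cleanse and standardize: upper-case, trim, keep only letters, digits and spaces."""
    return {field: "".join(ch for ch in value.upper().strip() if ch.isalnum() or ch == " ")
            for field, value in record.items()}


def specificity(records, field):
    """Weight each value by log(N / frequency): rare values are stronger evidence."""
    counts = Counter(r[field] for r in records)
    return {value: math.log(len(records) / c) for value, c in counts.items()}


def link_and_cluster(raw_records, fields, threshold):
    """Link record pairs whose agreement score reaches the threshold; return clusters."""
    records = [standardize(r) for r in raw_records]
    weights = {f: specificity(records, f) for f in fields}
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        score = sum(weights[f][records[i][f]] for f in fields
                    if records[i][f] and records[i][f] == records[j][f])
        if score >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Made-up example records (hypothetical data).
SAMPLE = [
    {"name": "John Smith", "dob": "1980-01-02", "city": "Boca Raton"},
    {"name": "john smith ", "dob": "19800102", "city": "Boca Raton"},
    {"name": "Jane Doe", "dob": "1975-05-06", "city": "Boca Raton"},
    {"name": "J. Doe", "dob": "1975-05-06", "city": "Atlanta"},
]

if __name__ == "__main__":
    print(link_and_cluster(SAMPLE, ["name", "dob", "city"], threshold=1.0))  # [[0, 1], [2], [3]]
    print(link_and_cluster(SAMPLE, ["name", "dob", "city"], threshold=0.5))  # [[0, 1], [2, 3]]
```

The threshold trades precision against recall. At 1.0 only the two John Smith records merge. At 0.5 the shared date of birth alone is enough to link Jane Doe with J. Doe.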
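==Thor==
Thor (the Data Refinery Cluster): the Thor cluster is responsible for complex data processing.

Thor, the data refinery engine, extracts and enriches data.
* Thor uses a master/slave topology. The slaves provide localized data storage and processing, while the master monitors and coordinates the slaves' activities and relays job status information.
* Middleware components provide naming and other services that support the execution of distributed jobs.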
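The master/slave pattern described above can be sketched in Python for a simple "count records by field" job. This is not Thor's implementation. The class and method names are hypothetical, and threads stand in for cluster nodes. Each slave aggregates only its local partition. The master dispatches the task, tracks per-slave status, and merges the partial results.

```python
# Hypothetical master/slave sketch in the spirit of a Thor job (not Thor code).
# Each "slave" owns one local partition; the master dispatches, tracks status, merges.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class Slave:
    def __init__(self, slave_id, local_records):
        self.slave_id = slave_id
        self.local_records = local_records  # data stored on this node only

    def count_by(self, field):
        """Process only local data and return a partial aggregate."""
        return Counter(rec[field] for rec in self.local_records)


class Master:
    def __init__(self, slaves):
        self.slaves = slaves
        self.status = {s.slave_id: "idle" for s in slaves}

    def run_count_by(self, field):
        """Send the same task to every slave, record status, merge partial results."""
        total = Counter()
        with ThreadPoolExecutor(max_workers=max(1, len(self.slaves))) as pool:
            futures = {}
            for s in self.slaves:
                self.status[s.slave_id] = "running"
                futures[s.slave_id] = pool.submit(s.count_by, field)
            for slave_id, future in futures.items():
                total.update(future.result())  # Counter.update adds counts
                self.status[slave_id] = "done"
        return dict(total)


if __name__ == "__main__":
    cluster = Master([
        Slave(0, [{"state": "FL"}, {"state": "GA"}]),
        Slave(1, [{"state": "FL"}]),
        Slave(2, [{"state": "NY"}, {"state": "FL"}]),
    ])
    print(cluster.run_count_by("state"))  # {'FL': 3, 'GA': 1, 'NY': 1}
    print(cluster.status)                 # {0: 'done', 1: 'done', 2: 'done'}
```

Aggregating locally before merging keeps the master's input small. It receives one partial count per distinct value from each slave, not the records themselves.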
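==Roxie==
Roxie (Rapid Online XML Inquiry Engine): the ROXIE cluster is responsible for data queries and reporting.

ROXIE, the data delivery engine, provides high-performance online query processing and data warehouse capabilities.
* Each ROXIE node runs a server process and an agent process. The server process handles incoming query requests, distributes the query work to the appropriate agents in the ROXIE cluster, collates the results, and returns the payload to the client.
* Queries may include joins and other complex data transformations, and the payload can contain structured or unstructured data.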
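The server/agent flow above is a scatter-gather pattern. The Python sketch below makes three assumptions:
* Keys are assigned to agents by a crc32-based partition function.
* A query is a batch of exact-match keys.
* Threads stand in for agent processes.

The names (`Server`, `Agent`, `partition_of`) are hypothetical and say nothing about Roxie's actual protocol.

```python
# Hypothetical scatter-gather sketch of a Roxie-style query (not Roxie code).
# A server routes each key to the agent that owns its partition, then collates results.
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition_of(key, n_parts):
    """Deterministic partition function (crc32; Python's str hash() is salted per process)."""
    return zlib.crc32(key.encode("utf-8")) % n_parts


class Agent:
    def __init__(self):
        self.index = {}  # key -> records held by this agent

    def load(self, key, record):
        self.index.setdefault(key, []).append(record)

    def lookup(self, key):
        return self.index.get(key, [])


class Server:
    def __init__(self, n_agents):
        self.agents = [Agent() for _ in range(n_agents)]

    def _owner(self, key):
        return self.agents[partition_of(key, len(self.agents))]

    def load(self, key, record):
        self._owner(key).load(key, record)

    def handle_query(self, keys):
        """Dispatch each key to its owning agent in parallel, then collate the payload."""
        with ThreadPoolExecutor(max_workers=len(self.agents)) as pool:
            futures = [pool.submit(self._owner(k).lookup, k) for k in keys]
            rows = [row for f in futures for row in f.result()]
        return {"keys": list(keys), "count": len(rows), "rows": rows}


if __name__ == "__main__":
    server = Server(n_agents=3)
    server.load("alice", {"name": "alice", "score": 1})
    server.load("bob", {"name": "bob", "score": 2})
    server.load("alice", {"name": "alice", "score": 3})
    print(server.handle_query(["alice", "bob", "carol"])["count"])  # 3
```

Results are collated in the order the keys were requested, regardless of which agent answers first. A key that no agent holds, such as `"carol"`, contributes no rows.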
==Interlok==
Interlok: Seamless Data Integration
==KEL==
[https://hpccsystems.com/download/free-modules/kel-lite KEL: Knowledge Engineering Language]

Social graph:

[[File:hpcc-kel-social-graph.jpg]]
==ESP==
ESP (Enterprise Services Platform)
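==DFS==
Distributed File System (DFS)

* Thor DFS is record-oriented and optimized for big data ETL (extract, transform, load). Records reside in large input files, in standard or custom formats, and may be fixed- or variable-length. Each input file is partitioned across the cluster's DFS so that every node receives roughly the same number of records, and no individual record is ever split.
* ROXIE DFS is index-based and optimized for concurrent query processing. It is built on a custom B+ tree structure that enables fast, efficient data retrieval.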
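Both DFS behaviours can be sketched in a few lines of Python. This illustrates the ideas under assumptions and does not reproduce HPCC's file format or code. The sketch assumes:
* Records are newline-terminated.
* Each node receives one contiguous part.
* A sorted list searched with `bisect` stands in for the custom B+ tree. Both give logarithmic lookups and ordered range scans.

All function and class names are hypothetical.

```python
# Hypothetical sketch of the two DFS ideas above (not HPCC code).
import bisect


def split_records(data, delimiter=b"\n"):
    """Cut a byte buffer into whole variable-length records at the delimiter."""
    return [r for r in data.split(delimiter) if r]


def partition(records, n_nodes):
    """Thor-style spread: contiguous parts whose sizes differ by at most one record.

    Records are kept whole; a record is never split between two nodes.
    """
    base, extra = divmod(len(records), n_nodes)
    parts, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        parts.append(records[start:start + size])
        start += size
    return parts


class SortedIndex:
    """Roxie-style key lookup; a sorted list + bisect stands in for the B+ tree."""

    def __init__(self, pairs):
        pairs = sorted(pairs, key=lambda kv: kv[0])  # stable: equal keys keep input order
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]

    def get(self, key):
        """All values stored under exactly this key."""
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return self.values[lo:hi]

    def scan(self, low, high):
        """Values whose key satisfies low <= key < high, like a B+ tree leaf scan."""
        lo = bisect.bisect_left(self.keys, low)
        hi = bisect.bisect_left(self.keys, high)
        return self.values[lo:hi]


if __name__ == "__main__":
    recs = split_records(b"a\nbb\nccc\ndddd\neeeee\n")
    print([len(p) for p in partition(recs, 3)])  # [2, 2, 1]
    idx = SortedIndex([("smith", 1), ("doe", 2), ("smith", 3), ("jones", 4)])
    print(idx.get("smith"), idx.scan("d", "k"))  # [1, 3] [2, 4]
```

A real B+ tree also supports cheap inserts and disk-page-sized nodes, which a flat sorted list does not. The sorted list suffices here only because the index is built once and then queried.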
==Nagios==
HPCC uses [[Nagios]] for system monitoring.

==Ganglia==
HPCC uses [[Ganglia]] for monitoring and reporting.
==Hadoop==
[https://hpccsystems.com/why-hpcc-systems/hpcc-hadoop-comparison HPCC vs. Hadoop comparison]
==[[Apache Cassandra|Cassandra]]==
*[https://github.com/hpcc-systems/HPCC-Platform/tree/master/plugins/cassandra Cassandra Plugin]
*[https://github.com/datastax/cpp-driver DataStax C/C++ Driver for Apache Cassandra]

==[[Apache Kafka|Kafka]]==
*[https://github.com/hpcc-systems/HPCC-Platform/tree/master/plugins/kafka Kafka Plugin]
*[https://github.com/edenhill/librdkafka Apache Kafka C/C++ library]

==AWS==
*[https://aws.hpccsystems.com HPCC on AWS]
*[http://docs.huihoo.com/hpcc/Instant-Cloud-for-AWS-6.2.0-1.pdf HPCC Cluster on AWS]
==Users==
[https://hpccsystems.com/resources/case-studies Case Studies]
==Documentation==
*Introduction to HPCC
*ECL Programmer's Guide (Chinese edition)
*ECL Programmers Guide
*ECL Best Practices
*ECL Language Reference (400+ pages)
*ECL Standard Library Reference
*[http://docs.huihoo.com/hpcc/Dynamic-ESDL-5.4.2.pdf Dynamic ESDL (Enterprise Service Description Language)]
*[http://docs.huihoo.com/hpcc/Installing-and-Running-the-HPCC-Platform-5.4.2.pdf Installing & Running the HPCC Platform]
[http://docs.huihoo.com/hpcc/ More documentation >>>]
==Gallery==
<gallery>
image:Pig-Program-Translation-to-MapReduce.png|Pig
image:HPCC-Environment-System-Component-Relationships.png|HPCC components
image:hpcc-configuration-manager.png|Configuration Manager
image:hpcc-ecl-watch.png|ECL Watch
image:hpcc-ecl-watch-playground.png|ECL Playground
image:HPCC-Systems-ETL-Platform.png|ETL platform
image:ECL-ML-Machine-Learning-Module.png|Machine learning module
image:LexisNexis-Risk-Solutions.png|Risk Solutions
image:hpcc-use-cases.png|HPCC use cases
image:SALT-Scalable-Automated-Linking-Technology.png|SALT approach
image:LexisNexis-linking.png|Linking approach
</gallery>
==Links==
*[https://hpccsystems.com/ HPCC official website]
*[https://hpccsystems.com/resources/blog HPCC Systems Blog]
*[https://wiki.hpccsystems.com/ HPCC Systems Wiki]
*[https://hpccsystems.com/bb/ HPCC forum]
*[http://sourceforge.net/projects/hpccsystems/ HPCC VM download]
*[http://docs.huihoo.com/hpcc HPCC documentation]
*[https://github.com/hpcc-systems/HPCC-Platform/tree/master/docs HPCC documentation DocBook repository]
*[https://aws.hpccsystems.com/ HPCC Systems on Amazon Web Services]
[[category:big data]]
[[category:data analysis]]
[[category:machine learning]]
[[category:c++]]
[[category:Huihoo Foundation]]
Latest revision as of 02:40, 14 January 2024 (Sunday).