Monday, December 30, 2013

New version of HCE 1.1 available

HCE project aim and main idea

This project is the successor to Associative Search Machine (ASM), a full-text web search engine project that was developed from 2006 to 2012 by IOIX Ukraine.

The main idea of this new project is to implement a solution that can be used to:
  • construct a custom network mesh or distributed network cluster structure with several types of relations between nodes;
  • formalize data flow processing from a central source point at the upper node level down to the lower nodes and back;
  • formalize the handling of management requests from multiple source points;
  • support native reducing of results from multiple nodes (aggregation, duplicate elimination, sorting and so on);
  • internally support a powerful full-text search engine and data storage;
  • provide both transactionless and transactional request processing;
  • support flexible run-time changes of the cluster infrastructure;
  • offer client-side integration API bindings for many languages, all in one product built in C++.

HCE application area

  • As a network infrastructure and message transport layer provider – HCE can be used in any big-data solution that needs a custom network structure to build a distributed, high-performance data processing or data-mining architecture that scales easily both vertically and horizontally.
  • As a provider of a natively supported full-text search engine interface – HCE can be used in web or corporate network solutions that need fast, powerful full-text search and distributed NoSQL data storage, integrated smoothly through the target project's own programming languages. Currently, the Sphinx (c) search engine with an extended data model is supported internally.
  • As a Distributed Remote Command Execution service provider – HCE can be used to automate the administration of many host servers in ensemble mode for OS and service deployment, maintenance and support tasks.

 Hierarchical Cluster as engine

  • Provides the hierarchical cluster infrastructure – node connection schema, relations between nodes, node roles, request typing and data processing sequence algorithms, data sharding modes, and so on.
  • Provides the network transport layer for client application data and administration management messages.
  • Manages natively supported, integrated NoSQL data storage (Sphinx (c) search index and Distributed Remote Command Execution).
  • Collects, reduces and sorts the results of native and custom data processing.
  • Ready to support transactional message processing.
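
The "collect, reduce and sort" step can be illustrated with a minimal Python sketch. It is not taken from the HCE source code: the function name `reduce_results` and the `id` and `weight` fields are assumptions made for this example only. It merges result lists collected from several nodes, keeps the best-scoring copy of each duplicate, and sorts the merged list.

```python
from typing import Dict, Iterable, List, Optional


def reduce_results(node_results: Iterable[List[Dict]],
                   limit: Optional[int] = None) -> List[Dict]:
    """Merge per-node result lists into one deduplicated, sorted list.

    Each result is assumed (hypothetically) to be a dict with an "id" key
    identifying the document and a numeric "weight" key used for ranking.
    """
    best: Dict[object, Dict] = {}
    for results in node_results:
        for item in results:
            doc_id = item["id"]
            # Duplicate elimination: keep the copy with the highest weight.
            if doc_id not in best or item["weight"] > best[doc_id]["weight"]:
                best[doc_id] = item
    # Sorting: highest weight first, ties broken by id for a stable order.
    merged = sorted(best.values(), key=lambda r: (-r["weight"], r["id"]))
    return merged if limit is None else merged[:limit]
```

A node that receives responses from two lower nodes could call `reduce_results([results_a, results_b], limit=10)` before passing a single list upward.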

 HCE key functional principles

  • Free network cluster structure architecture. A target project can construct a specific schema of network relations that suits its hardware, load-balancing, fail-over, fault-tolerance and other requirements on the basis of one simple Lego-like engine.
  • Stable reversed client-server networking protocol based on ZMQ sockets, with connection heartbeating, automatic reconnection and message buffering.
  • Easy asynchronous connection handling with a NUMA-oriented architecture of message handlers.
  • Unified I/O messages based on the JSON format.
  • Ready for client API bindings in the many programming languages covered by the ZMQ library. Can be easily integrated and deployed.
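
To make the idea of unified JSON-based messages more concrete, here is a minimal Python sketch of encoding and decoding a message envelope. The actual hce-node message schema is not described in this post, so the field names `id`, `type` and `data` and both function names are hypothetical and serve only as an illustration.

```python
import json

# Hypothetical envelope fields; the real hce-node schema may differ.
REQUIRED_FIELDS = {"id", "type", "data"}


def encode_message(msg_id: str, msg_type: str, payload: dict) -> bytes:
    """Serialize one message into UTF-8 encoded JSON bytes."""
    envelope = {"id": msg_id, "type": msg_type, "data": payload}
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


def decode_message(raw: bytes) -> dict:
    """Parse JSON bytes and check that the envelope fields are present."""
    message = json.loads(raw.decode("utf-8"))
    if not isinstance(message, dict):
        raise ValueError("message must be a JSON object")
    missing = REQUIRED_FIELDS - message.keys()
    if missing:
        raise ValueError("missing fields: " + ", ".join(sorted(missing)))
    return message
```

Because every message is plain JSON bytes, a client in any language with a ZMQ binding and a JSON library could produce and consume the same frames.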

HCE-node application

The heart and main component of the HCE project is the hce-node application. It integrates the complete set of base functionality needed to support the network infrastructure, hierarchical cluster construction, full-text search system integration and so on; see the main points under “Hierarchical Cluster as engine”.
  • Implemented for the Linux OS environment and distributed as a source code tarball and as a Debian Linux binary package with dependency packages.
  • Supports a configuration-less single-instance start, or takes a set of options used to build the corresponding network cluster architecture.
  • Intended for use with client-side applications or an integrated API.
  • The first implementation of the client-side API and CLI utilities is bound to PHP.

Hce-node roles in the cluster structure

Internally, the hce-node application contains seven basic handler threads. Each handler acts as a special black-box message processor/dispatcher and is used in combination with the others to work in one of five different node roles:
  • Router – the upper end-point of the cluster hierarchy. Has three server-type connections, which handle the client API, instances of any other node role (typically shard or replica managers) and admin connections.
  • Shard manager – an intermediate point of the cluster hierarchy. Routes messages between the upper and lower layers using data sharding and multicast message dispatching algorithms. Has two server-type connections and one client connection.
  • Replica manager – the same as the shard manager, except that it routes messages between the upper and lower layers using data balancing and round-robin message dispatching algorithms.
  • Replica – the lower end-point of the cluster hierarchy. A data node that interacts with data storage and/or processes data with the target algorithm(s), provides the interface to the full-text search engine, and is the target host for Distributed Remote Command Execution. Has one server-side and one client-side connection used for the cluster infrastructure. It can also have several data storage-dependent connections.
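
The difference between the two manager roles can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the hce-node implementation: the class names, the `dispatch` method and the node labels are hypothetical, "multicast" is taken to mean that every lower node receives the message, and "round-robin" that exactly one node receives it, chosen in rotation.

```python
from itertools import cycle
from typing import List, Sequence, Tuple


class ShardManager:
    """Multicast dispatch: every lower node gets a copy of the message."""

    def __init__(self, nodes: Sequence[str]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self.nodes = list(nodes)

    def dispatch(self, message: str) -> List[Tuple[str, str]]:
        # Each shard holds a different part of the data, so all are asked.
        return [(node, message) for node in self.nodes]


class ReplicaManager:
    """Round-robin dispatch: one lower node per message, in rotation."""

    def __init__(self, nodes: Sequence[str]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self._rotation = cycle(list(nodes))

    def dispatch(self, message: str) -> List[Tuple[str, str]]:
        # Replicas hold the same data, so any single one can answer.
        return [(next(self._rotation), message)]
```

With two shards, `ShardManager(["s1", "s2"]).dispatch("q")` yields one delivery per shard, and the responses then need to be reduced into one result. A `ReplicaManager` over two replicas alternates between them on successive calls, which spreads the load.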

 hce-node typical connection points

Hce-node internal architecture

Relations between the seven handler objects

 hce-node in cluster structure relations

Simple cluster structures comparison