Probabilistic digital system for the efficient exploration of huge datasets

An ultra-fast digital circuit for data mining has been developed and patented, using novel probabilistic computing techniques inspired by neurobiological behavior. The circuit implements a basic similarity search, which is the core of most data mining algorithms and the traditional bottleneck for virtually all of them.

The invention is the result of a Spanish research project entitled "Development and Implementation of High-Velocity Computation Systems using Pulsed Networks and its Application to Drug Discovery (DISCOVERY)", financed by the Spanish Ministry of Economy and Competitiveness.

Description

The system achieves high performance when exploring large datasets thanks to the massive parallelism that probabilistic computing methodologies make possible. The logic structures can be replicated hundreds of times in a single FPGA, giving a high degree of parallelism at minimum cost. The invention can be applied to the many scientific disciplines in which useful information has to be extracted from huge databases.

The techniques used to implement the digital comparators are inspired by neural behavior: in biological neural networks, multiple action potentials interact and, following basic probabilistic laws, compare and process large sets of information in parallel.
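As an illustration of how pulse streams can compare two values, the following minimal sketch uses a textbook stochastic-computing construction, not the patented circuit, whose internals are not described here. It assumes that two values in [0, 1] are encoded as bitstreams driven by one shared random source, so that an XOR gate outputs 1 with probability |a - b|. The function names, the stream length, and the seed are hypothetical choices made for the example.

```python
import random

def correlated_streams(a, b, length, rng):
    """Encode a and b (both in [0, 1]) as bitstreams driven by ONE shared random source."""
    stream_a, stream_b = [], []
    for _ in range(length):
        r = rng.random()          # the shared sample is what makes the streams correlated
        stream_a.append(r < a)    # pulse present with probability a
        stream_b.append(r < b)    # pulse present with probability b
    return stream_a, stream_b

def stochastic_abs_diff(a, b, length=100_000, seed=0):
    """Estimate |a - b| as the fraction of 1s at the output of an XOR gate."""
    rng = random.Random(seed)
    stream_a, stream_b = correlated_streams(a, b, length, rng)
    ones = sum(x != y for x, y in zip(stream_a, stream_b))  # XOR, one bit per clock cycle
    return ones / length

# Software stand-in for a single-gate comparator: the result approximates 0.3.
estimate = stochastic_abs_diff(0.7, 0.4)
```

In hardware, such a comparator would amount to a single logic gate per value pair, which is consistent with the idea of replicating the structure many times on one FPGA. The software loop above only mimics that behavior one clock cycle at a time.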

Main advantages

The higher processing speed of the proposed techniques compared with traditional processor-based techniques translates into lower hardware and power requirements (and therefore lower costs) for a given processing speed.

The technology can be applied to different disciplines in which useful information must be extracted from large datasets.

Innovations

This system is a non-conventional design based on probabilistic logic rather than traditional, deterministic binary logic. The key point of probabilistic computing is that it provides the most probable result as its output instead of the exact result. The difference can be orders of magnitude in computation time (or in the financial resources needed to achieve a specific performance). As for precision, when dealing with large datasets (where trillions of iterations have to be performed) the difference between the exact and the most probable results is minimal.

The principal innovative aspect is the use of probabilistic techniques that increase processing speed with respect to traditional deterministic techniques. These techniques are more appropriate for processing huge amounts of information in a reasonable time.
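The trade-off between exactness and speed described above can be made concrete with a small experiment. The sketch below again assumes a generic stochastic-computing element rather than the patented design: the product of two probabilities is estimated by ANDing two independent random bitstreams, and the average error is measured for a short and a long stream. The function names, trial count, and stream lengths are illustrative assumptions.

```python
import random

def stochastic_product(p, q, length, rng):
    """Estimate p*q by ANDing two independent random bitstreams."""
    ones = 0
    for _ in range(length):
        bit_p = rng.random() < p
        bit_q = rng.random() < q
        ones += bit_p and bit_q   # AND gate output for this clock cycle
    return ones / length

def mean_abs_error(p, q, length, trials=100, seed=1):
    """Average absolute deviation of the estimate from the exact product."""
    rng = random.Random(seed)
    exact = p * q
    total = sum(abs(stochastic_product(p, q, length, rng) - exact)
                for _ in range(trials))
    return total / trials

error_short = mean_abs_error(0.5, 0.4, length=100)
error_long = mean_abs_error(0.5, 0.4, length=10_000)
```

Under these assumptions the error shrinks roughly with the square root of the stream length, so `error_long` comes out about ten times smaller than `error_short`. This is one way to see why an approximate answer can be acceptable when many pulses are accumulated.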

Current state

A prototype has been implemented on a PCIe-based board containing four large-scale FPGAs, achieving more than 100 million comparisons per second. Each comparison processes 16 shape descriptors per compound (128 bits per compound in total). The system and its use are protected by a patent and are available for licensing.
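To put these figures in perspective, the sketch below packs one compound into a single 128-bit word and estimates how long a full scan would take at the stated rate. It assumes the 16 descriptors have equal width (128 / 16 = 8 bits each), which the text does not state explicitly. The bit layout and the helper names `pack`, `unpack`, and `scan_seconds` are hypothetical.

```python
DESCRIPTORS = 16
TOTAL_BITS = 128
BITS_EACH = TOTAL_BITS // DESCRIPTORS  # 8 bits, ASSUMING equal-width descriptors

def pack(descriptors):
    """Pack 16 small integers into one 128-bit word (first value most significant)."""
    if len(descriptors) != DESCRIPTORS:
        raise ValueError("expected exactly 16 descriptors")
    word = 0
    for value in descriptors:
        if not 0 <= value < (1 << BITS_EACH):
            raise ValueError("descriptor does not fit in 8 bits")
        word = (word << BITS_EACH) | value
    return word

def unpack(word):
    """Recover the 16 descriptors from a packed 128-bit word."""
    mask = (1 << BITS_EACH) - 1
    return [(word >> (BITS_EACH * i)) & mask
            for i in reversed(range(DESCRIPTORS))]

def scan_seconds(n_compounds, comparisons_per_second=100_000_000):
    """Time to compare one query against every compound at the stated rate."""
    return n_compounds / comparisons_per_second

compound = pack(list(range(16)))
seconds_for_a_billion = scan_seconds(1_000_000_000)  # 10.0 seconds at 100 million/s
```

At 100 million comparisons per second, comparing one query against a library of one billion compounds would take about 10 seconds, leaving aside data-transfer overheads, which are not quantified in the text.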

The system can be adapted to any kind of descriptor. The objects to be recognized can be of any nature; the only requirement is that each one is described by an n-dimensional vector, where n is any natural number.
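In software terms, the task the circuit accelerates is a brute-force similarity search over such vectors. The following minimal sketch uses the Manhattan distance as the similarity measure, which is an assumption made for illustration; the metric the circuit actually uses is not specified here. The function names and the toy dataset are likewise hypothetical.

```python
def manhattan(u, v):
    """Sum of absolute coordinate differences between two n-dimensional vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(abs(x - y) for x, y in zip(u, v))

def most_similar(query, dataset, k=1):
    """Return the indices of the k dataset vectors closest to the query."""
    ranked = sorted(range(len(dataset)),
                    key=lambda i: manhattan(query, dataset[i]))
    return ranked[:k]

# Toy dataset of 3-dimensional descriptors; n can be any natural number.
dataset = [[0, 0, 0], [10, 10, 10], [1, 2, 1], [9, 9, 8]]
query = [1, 1, 1]
nearest = most_similar(query, dataset, k=2)  # indices 2 and 0, in that order
```

Every query has to be compared against every stored vector, which is why this step dominates the running time on a conventional processor and why many comparators working in parallel pay off.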
