Project 10

Developing Machine Learning Algorithms on HPCC/ECL Platform

Taghi M. Khoshgoftaar, PI

Student: Victor Herrera

From bioinformatics to social computing to document mining, whole new research areas exist today which were not possible even 20 years ago. This research demands large-scale systems to both manage and process huge quantities of data. Many traditional approaches fail when dealing with multi-gigabyte datasets, preventing researchers and practitioners from fully benefiting from the data.

The High Performance Cluster Computing (HPCC) architecture, which was developed in conjunction with the ECL programming language, is LexisNexis’s answer to this challenge. This system has two essential functions for working with Big Data: HPCC is a cluster backend which stores and manages large quantities of data, making it accessible to the user in a timely manner, and ECL is the language which allows the user to perform queries on the data in question. One area where the HPCC platform is not yet fully mature, however, is the domain of machine learning (ML). Although HPCC includes some basic ML modules, many of the most commonly-used approaches in the field have yet to be implemented. The project objective is to extend ECL/HPCC to perform classification and regression using a wider range of ML algorithms. Furthermore, we will implement our own algorithms in ECL, to make them widely available for a larger user base. With these additions, the HPCC/ECL platform will be fully prepared to take on the challenges posed by Big Data and permit a new scale of research.

Industry partner interested in this project: LexisNexis