

Project 10

Developing Machine Learning Algorithms on HPCC/ECL Platform
Taghi M. Khoshgoftaar, PI
Student: Victor Herrera

From bioinformatics to social computing to document mining, whole new research areas exist today which were not possible even 20 years ago. This research demands large-scale systems to both manage and process huge quantities of data. Many traditional approaches fail when dealing with multi-gigabyte datasets, preventing researchers and practitioners from fully benefiting from the data.

The High Performance Cluster Computing (HPCC) architecture, which was developed in conjunction with the ECL programming language, is LexisNexis’s answer to this challenge. This system has two essential functions for working with Big Data: HPCC is a cluster backend which stores and manages large quantities of data, making it accessible to the user in a timely manner, and ECL is the language which allows the user to perform queries on the data in question. One area where the HPCC platform is not yet fully mature, however, is the domain of machine learning (ML). Although HPCC includes some basic ML modules, many of the most commonly used approaches in the field have yet to be implemented. The project objective is to extend ECL/HPCC to perform classification and regression using a wider range of ML algorithms. Furthermore, we will implement our own algorithms in ECL, to make them widely available for a larger user base. With these additions, the HPCC/ECL platform will be fully prepared to take on the challenges posed by Big Data and permit a new scale of research.

Industry partner interested in this project: LexisNexis