Strategic Research Program in Data Science


Data Science is the study of the generalizable extraction of knowledge from data; the key word in this definition is science.

It draws on techniques and theories from many fields, including signal processing, mathematics, probability models, machine learning, statistical learning, computer programming, data engineering, pattern recognition and learning, visualization, uncertainty modeling, data warehousing, and high performance computing, with the goal of extracting meaning from data and creating data products.

Data Science is not restricted to big data, although the growing scale of data makes big data an important aspect of the field.

At the same time this new field of scientific study has to address different sensitivities, including:

  • Social sensitivity: as the recent international controversy surrounding the NSA in the USA showed, data privacy is a socially sensitive issue. Every citizen has the right to keep her or his data private and, at the same time, to understand the implications of big data analytics.
  • Strategic sensitivity: big data is like big oil. More and more companies’ competitive edge will depend on a thorough understanding of the relevant methodologies and their implications.

The above are complemented by different kinds of data analytics, including predictive analytics: market penetration and sustainable company strategies depend on making the right decisions at the right time. As decision makers face increasing amounts of market and competitor behaviour data to analyse, predictions become vulnerable to error without the right kind of know-how.

The program aims to:


  • Carry out interdisciplinary research at the intersection of high performance computing (HPC), data analytics, information visualization, and machine learning;
  • Combine solution development for companies with theory development for academia;
  • Generate output for curriculum development and science education as feedback to society.

Major application domains of the above research components include, but are not limited to, biology, drug discovery, digital preservation, cultural studies, automation, and business logistics.



Research leader: Sándor Darányi, professor in Information Science.