i-Space name: IOP4HPDA – Italian Open Platform for High Performance Data Analysis
Main Organization: Cineca, via Magnanelli 6/3, 40033, Casalecchio di Reno (BO), Italy
Partners: UniMORE DBGroup
Description: Cineca is a non-profit organization founded in 1969 as a consortium of universities. With its High Performance Computing (HPC) facility and its scientific expertise, Cineca supports public and private research: it is the most powerful supercomputing center in Italy devoted to scientific and industrial research, and one of the most important worldwide. Cineca also develops IT systems for university administration offices, for the MIUR, and for companies, health care institutions, and public administration.
In particular, the mission of the HPC department of Cineca is to accelerate scientific discovery by providing HPC resources, data management and storage systems, and tools. It also provides expertise on numerical simulation and data science in an Open Innovation paradigm. Cineca is a member of BDVA, ETP4HPC and PRACE, a core partner in EUDAT and Elixir, and a partner in Human Brain, Fortissimo2 and I4MS. In 2016 the HPC department of Cineca supported 1140 research projects and directly participated in 31 EU research projects, 40 research agreements and 12 industry projects.
The IOP4HPDA is made available by the HPC department of Cineca, as an environment for Big Data for research and innovation.
The DBGroup is the research database group at the Department of Engineering “Enzo Ferrari” of the University of Modena and Reggio Emilia and it contributes to the IOP4HPDA with researchers and tools.
Contact person: firstname.lastname@example.org
Hardware: 20 Pflops performance; 360,000 cores (7,500 servers with 48 cores per server); 80 Nvidia Tesla K80; 200 Nvidia Tesla P100; 4 FPGA Intel DLIA; storage: 30 PB online and 30 PB offline; total RAM 1 PB; network: internal 100 Gb/s OPA and 25 Gb/s Ethernet, external 10 Gb/s.
Software: Big Data Apache suite AAS; Deep Learning: Caffe, Theano, TensorFlow optimized for platform hardware characteristics in collaboration with Intel and Nvidia; Data Analytics: R, H2O, Octave, math libraries, I/O libraries; organized data repository for data deposit, retrieval and preservation; high-performance databases: PostgreSQL, MySQL, Neo4j; specific tools for bioinformatics (NGS pipelines) and for visualization.
The IOP4HPDA enables, supports and promotes the exploitation and development of Big Data applications by providing the following services:
• Exploitation of the Infrastructure
open access to HPC/HPDA storage and computing resources, Cloud Computing, Computing in batch, interactive and streaming modes
• Advanced middleware and software tools
• Data management
collection, preparation, annotation, curation, linking, security, access control, long-term preservation, post-processing
• Data analytics
Predictive modeling, Supervised and unsupervised learning, Association rules, Sequential patterns, Link analysis, Recommenders, Natural Language Processing, Named Entities Recognition, Information Extraction, Automatic classification, Sentiment Analysis, Semantic metadata generation, Automatic annotation, Speaker segmentation, Automatic Speech Recognition, Video segmentation, Keyframes extraction, Semantic metadata generation from video items, Image recognition
Remote visualization, Computer vision and visual computing, Computer Graphics, 3D modeling and rendering, Immersive device programming, Render farm service, Virtual Reality, Augmented Reality, Virtual museum and exhibition design
• User support and Specialist support covering different scientific fields, technologies, programming languages, and techniques
• Training and Education
Specialized training (workshops on massive data analysis and international summer schools on parallel computing, data analytics and computer graphics), Cooperation with universities (lab activity for master programs, post-doc programs), Knowledge transfer during the project life cycle
• Technology transfer and consulting
Development of proof of concept and innovation projects for businesses to demonstrate the added value and ROIs
Risk Management Code Optimization for a Large Insurance Company
The risk assessment in the life insurance field may require considerable computing power.
The algorithm that the Large Insurance Company was using took many hours and did not allow the risk measure to be calculated with a nested Monte Carlo approach. Nested Monte Carlo involves two stages, scenario generation (outer stage) and portfolio re-valuation (inner stage), which together produce millions of Monte Carlo trajectories to be executed for each of the millions of life policies.
The simulation quickly becomes a computational challenge. The Large Insurance Company contacted the HPC department of Cineca – IOP4HPDA for a PoC to demonstrate the gain in efficiency that could be obtained through code parallelization and optimization. As a result, a nested Monte Carlo run with parameters 100000x100 could be completed for all 12M policies. The Large Insurance Company then decided to establish a commercial contract with Cineca for the provision of the service.
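The two-stage structure described above can be illustrated with a minimal sketch. It is not the insurer's model or the code optimized in the PoC: the toy policy (a single payment of 1.0 after one year), the rate distributions, the quantile level and all function names are assumptions made for illustration only. The point it shows is that the cost grows as outer scenarios times inner trajectories times policies; with the 100000x100 parameters and 12M policies quoted above, that product is about 1.2 x 10^14 inner trajectories.

```python
import random
import statistics


def inner_value(scenario_rate, n_inner, rng):
    """Inner stage: re-value one toy policy under one outer scenario.

    Assumption: the policy pays 1.0 in one year, discounted at the scenario
    rate plus Gaussian noise. A real portfolio model is far more complex.
    """
    total = 0.0
    for _ in range(n_inner):
        rate = scenario_rate + rng.gauss(0.0, 0.01)
        total += 1.0 / (1.0 + rate)
    return total / n_inner


def nested_monte_carlo(n_outer, n_inner, seed=0):
    """Outer stage: generate scenarios, then run the inner stage on each one."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_outer):
        scenario_rate = rng.gauss(0.02, 0.005)  # hypothetical rate scenario
        values.append(inner_value(scenario_rate, n_inner, rng))
    return values


def value_at_risk(values, level=0.995):
    """Empirical loss quantile relative to the mean value (no interpolation)."""
    mean = statistics.fmean(values)
    losses = sorted(mean - v for v in values)
    index = min(len(losses) - 1, int(level * len(losses)))
    return losses[index]


# Small sizes so the sketch runs in well under a second on one core.
values = nested_monte_carlo(n_outer=1000, n_inner=100)
var = value_at_risk(values)
```

Each outer scenario is independent of the others, which is why this kind of computation lends itself to parallelization across cores and nodes.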
Sequential patterns of errors from on-board diagnostic devices for TEXA, a leading European company in electronic diagnostics
In the PRESERVE project, funded within the Fortissimo EU project, sensor data from TEXA on-board diagnostic tools were analyzed to identify driving habits on the one hand, and patterns of operating parameters that are predictive of failures and damage on the other.
The result is a portfolio of service prototypes that can predict failures, mechanical problems or damage at the component level and offer the manufacturer very detailed information to better redesign or upgrade spare parts or vehicles. The Return on Innovation Investment (ROI2) for TEXA from this project has been estimated as 2.72.
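As a rough illustration of what mining sequential patterns from diagnostic logs can look like, the sketch below counts ordered pairs of events (A observed before B) across per-vehicle sequences and keeps those above a support threshold. It is a simplified stand-in, not the method used in PRESERVE: the event codes, the sample sequences, the threshold and the function name are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_ordered_pairs(sequences, min_support):
    """Return ordered pairs (a, b), with a seen before b, and their support.

    Support is the fraction of sequences containing the pair at least once.
    """
    counts = Counter()
    for seq in sequences:
        pairs = set()
        # combinations() over positions preserves order: i always precedes j.
        for i, j in combinations(range(len(seq)), 2):
            if seq[i] != seq[j]:
                pairs.add((seq[i], seq[j]))
        counts.update(pairs)  # each pair counted once per sequence
    total = len(sequences)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }


# Hypothetical per-vehicle event sequences in time order.
sequences = [
    ["E1", "E2", "FAILURE"],
    ["E2", "E3", "FAILURE"],
    ["E1", "E3"],
    ["E2", "FAILURE"],
]
patterns = frequent_ordered_pairs(sequences, min_support=0.5)
```

In this toy data only the pattern "E2 before FAILURE" reaches the threshold, which is the kind of signal a predictive maintenance service would then examine further.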
Internet as a data source for ISTAT, the Italian National Statistical Office
The HPC department of Cineca – IOP4HPDA started a collaboration with ISTAT to investigate the use of web scraping techniques, associated with text mining algorithms, in order to replace traditional tools of data collection and estimation, and/or to combine them into an integrated strategy.
In the first experiment, a sample of 8,600 enterprises’ websites was “scraped” and the acquired texts (more than 200 M textual records) were processed in order to extract the same information that is provided by the standard questionnaire “Survey on ICT Usage and e-Commerce in Enterprises”. The results were encouraging, with a satisfactory predictive capability of the fitted models. Within this experiment, the HPC department of Cineca – IOP4HPDA developed novel methods for extracting information from unstructured data, and the use of supercomputers significantly reduced the computation time. As a result, a more extensive application of the approach in a “census-like manner”, covering all Italian enterprises, has been performed, and ISTAT is now evaluating the production phase.
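The text-processing step can be sketched as follows, under loud assumptions: the pages have already been downloaded (fetching is omitted), and the indicator terms are hypothetical, not the features used in the ISTAT experiment. The sketch extracts the visible text of a page with the Python standard library and counts term occurrences, which could then feed a fitted model.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_text(html):
    """Return the lower-cased visible text of an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks).lower()


# Hypothetical indicator terms for an "online sales" questionnaire item.
ECOMMERCE_TERMS = ("add to cart", "checkout", "carrello", "acquista")


def ecommerce_features(html):
    """Count occurrences of each indicator term in the visible text."""
    text = page_text(html)
    return {term: text.count(term) for term in ECOMMERCE_TERMS}


sample = (
    "<html><head><style>p{}</style></head><body><h1>Shop</h1>"
    "<p>Add to cart</p><script>var checkout=1;</script></body></html>"
)
features = ecommerce_features(sample)
```

Skipping script and style contents matters here: in the sample page the word "checkout" appears only inside JavaScript, so it is not counted as visible text.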
Understanding visitor experience at the Caserta Royal Palace
Aiming to promote its cultural heritage, the Caserta Royal Palace has become a “playground” for experimenting with innovative technologies using 3D reconstruction, big data and sentiment analysis. Beginning with the modeling of the “Terrae Motus” exhibition, the project aims to create a “sentiment room” where it will be possible to browse digital content, and sentiments, about the Royal Palace.
In this context, data from social media were collected and analyzed to decipher emotions, opinions and judgments and to provide the Caserta Royal Palace with a real-time reputation monitoring system that is also interactive on historical data and past events. The system tracks the topics being discussed and the sentiments being expressed and can be used to assess the impact of events and communication strategies.
Tax Fraud Detection for SOGEI, the Italian Revenue Agency Computing Centre
Cineca, with its IOP4HPDA data scientists, developed predictive models of the fraudulent behavior of companies in the entailment of tax credit and provided methodological solutions for impact and compliance assessment, in particular relating to training sample bias and to model estimation and evaluation. The fraudulent behavior model made it possible to increase the auditing success rate (precision) from 39% to 65%.
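To make the metric concrete: precision here is the share of audited companies that turn out to be fraudulent. The short sketch below computes it and the relative improvement implied by the two percentages; the counts per 100 audits are illustrative values chosen only to reproduce the stated rates, not actual audit figures.

```python
def precision(true_positives, false_positives):
    """Fraction of audited cases that were confirmed as fraudulent."""
    audited = true_positives + false_positives
    if audited == 0:
        raise ValueError("no audits were performed")
    return true_positives / audited


# Illustrative counts per 100 audits (assumed, not real data).
before = precision(true_positives=39, false_positives=61)
after = precision(true_positives=65, false_positives=35)

# Relative improvement in auditing success rate: roughly two thirds.
relative_gain = after / before - 1.0
```

Under these assumptions, the same number of audits yields about two thirds more confirmed cases.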
Managing scientific data for various scientific communities
Among the scientific research projects that the HPC department of Cineca supports, many are both very successful and data intensive, e.g. EMODnet (European Marine Observation and Data Network) and SPHINX (Data Storage and Preservation of High resolution climate experiments).