Focus on Data Analytics: AOSP reflections from the “Big Data Analytics: Bridging the Gap between Theory and Practice” Conference, 12-14 Nov. 2018, Cairo, Egypt

Ancient Egypt has been the source of many innovations. It was therefore fitting to explore how the Egypt of today can innovate with data in order to embrace the Fourth Industrial Revolution (Artificial Intelligence, Robotics and the Internet of Things (IoT)). During the three-day “Big Data Analytics: Bridging the Gap between Theory and Practice” conference in Cairo, Egypt, the impact of data on the world we live in became tangible through the many examples shared by Egyptian leaders and industry leaders from across the globe. This international conference, which took place from 12-14 November 2018 at the Steigenberger Hotel El Tahrir, Cairo, was hosted by the Bibliotheca Alexandrina, and organised and funded by the Serageldin Institute for Multi-disciplinary Advanced Research (SIMAR). Dr Serageldin is a member of the AOSP Advisory Council, and a passionate advocate for Open Access, Open Source, Open Data, Open Science, the African Open Science Platform, and more.

Although data acquisition, data cleansing, data storage and data management are all crucial activities in the data curation lifecycle, data analytics appears to be the most challenging stage, and it is rapidly becoming the heart of the digital revolution. In his keynote, Michael Keller, Stanford University’s Vice Provost and University Librarian, shared examples of advanced analytical techniques such as Artificial Intelligence, Machine Learning and Neural Networks applied in the humanities and social sciences.

Prof Amr El Abbadi, Professor of Computer Science at the University of California, Santa Barbara, gave a fascinating presentation on the basic protocols used in Blockchain. A blockchain consists of blocks of records that are linked using cryptography, with each block carrying a timestamp and transaction data. Blockchain is expected to benefit many kinds of “transactions”, including data transactions, making the transfer of data and associated information more secure. It is still early days for Blockchain, but it looks promising.
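To make the idea of cryptographically linked blocks concrete, here is a minimal sketch in Python. It is not taken from the presentation: the field names, the helper functions `make_block`, `block_hash` and `chain_is_valid`, and the sample transaction are all hypothetical. Consensus, networking and signatures, which real blockchain protocols depend on, are left out entirely.

```python
import hashlib
import json
import time


def block_hash(block):
    """SHA-256 over the block's fields, excluding the stored hash itself."""
    payload = {k: v for k, v in block.items() if k != "hash"}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(index, transactions, prev_hash, timestamp=None):
    """Build a block whose hash covers its data and the previous block's hash."""
    block = {
        "index": index,
        "timestamp": time.time() if timestamp is None else timestamp,
        "transactions": transactions,
        "prev_hash": prev_hash,
    }
    block["hash"] = block_hash(block)
    return block


def chain_is_valid(chain):
    """Check each block's own hash and its link to the preceding block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


# Hypothetical two-block chain recording one data transfer.
genesis = make_block(0, [], "0" * 64, timestamp=0)
second = make_block(
    1,
    [{"from": "a", "to": "b", "dataset": "survey-2018"}],
    genesis["hash"],
    timestamp=1,
)
chain = [genesis, second]
```

Because each hash covers the previous block's hash, changing any recorded transaction after the fact makes `chain_is_valid` return `False` unless every later block is recomputed as well.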

Ahmed Ossama, Innovation Senior Manager at Dell EMC, explained that until recently, data scientists designed algorithms on the assumption that the data being analysed would be moved to a single, centralised repository such as a data lake or a cloud data centre. However, data are not only growing at incredible rates because of the Internet of Things (IoT), social media and more; they are also highly distributed, scattered across geographical regions that span the globe. This imposes severe constraints on how data can be analysed using Artificial Intelligence in the form of Machine Learning and Deep Learning. Edge-native analytics was presented as a way to analyse such data in a cost-effective and viable manner. An example of data that can benefit from Edge Computing is the data collected through sensors to be deployed in the highly populated Smart Cities of the future. With Edge Computing, Cloud Computing-like services are brought to the network edge, and Edge Analytics is expected to complement Cloud Analytics in handling Big Data.
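One simple way to picture analysis at the edge is for each site to reduce its raw readings to a small summary and send only that summary onward. The sketch below illustrates this pattern under stated assumptions; it is not the approach described in the talk. The `Summary` class, the function names and the sample readings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Compact statistics an edge site sends instead of its raw readings."""
    count: int
    total: float
    total_sq: float


def summarise_at_edge(readings):
    """Runs at the edge site: reduce raw readings to three numbers."""
    return Summary(len(readings), sum(readings), sum(x * x for x in readings))


def combine(summaries):
    """Runs centrally: global mean and population variance from summaries."""
    n = sum(s.count for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = total_sq / n - mean ** 2
    return mean, variance


# Hypothetical temperature readings held at two separate sites.
site_a = [30.0, 32.0]
site_b = [26.0, 28.0, 30.0]

mean, variance = combine([summarise_at_edge(site_a), summarise_at_edge(site_b)])
```

Only three numbers per site cross the network, yet the combined mean and variance are the same as if all readings had first been copied to one central repository.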

The Internet of Things (IoT) envisions a world where everyday objects are transformed into smart entities using sensors/actuators and other computing technologies. These smart objects are expected to generate Big Sensed Data (BSD). AI techniques can be used in the different parts of the data pipeline, from the sensor network to the analysis of the data to the application. Machine Learning can improve data collection efficiency and reduce network overhead. Real-time data processing, such as complex event processing, and a set of event detection techniques were discussed. Application domains that can benefit include air pollution monitoring, precision agriculture and water quality monitoring.
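As a rough illustration of event detection on a sensor stream, the sketch below flags the points at which a moving average crosses a threshold. It is an assumption-laden example, not one of the techniques presented. The function `detect_events`, the window size, the threshold and the sample air-quality readings are all hypothetical.

```python
from collections import deque


def detect_events(readings, window=3, threshold=100.0):
    """Yield each index where the moving average of the last `window`
    readings exceeds `threshold`."""
    recent = deque(maxlen=window)  # keeps only the most recent readings
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window and sum(recent) / window > threshold:
            yield i


# Hypothetical particulate readings from one air-quality sensor.
pm25 = [40, 60, 90, 120, 150, 80, 30]
alerts = list(detect_events(pm25))
print(alerts)  # [4, 5]
```

Averaging over a window rather than reacting to single readings reduces false alarms from sensor noise, at the cost of a short delay before an event is reported.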

Many examples of how data analysis is used in Egypt and for the Arabic language (through word embeddings) were shared. Khaled AlAttar, Vice Minister for Digital Transformation, Automation and Administrative Development (MICT), demonstrated how Big Data is used to transform the policy-making process (data-driven policies) in Egypt, in support of the Egypt National Development Plan towards the implementation of the UN Sustainable Development Goals and the AU Agenda 2063. Data serves as evidence to inform policymaking in social equity programmes, healthcare and universal coverage, as well as policies related to economic activities, including country comparative analysis.

The power of data truly came to life when speakers showed how emoticons on social media are used to adapt systems to users’ needs. Sensitivity to emotions increasingly drives advertisements, learning processes, software coding, productivity and more.

In the biomedical sciences, the genomes of thousands of individuals need to be studied in order to draw a reasonable biological conclusion. Biologists and computer scientists must therefore get on board with Big Data. The recent rapid advancement in next-generation sequencing (NGS) technologies generates enormous amounts of DNA data. Data analysis is becoming very expensive and challenging compared to data generation (generating data costs 5x less than analysing it). Storage is also a challenge, e.g. the size of a single human genome is 140GB. The National Center for Biotechnology Information (NCBI), a public repository for biological data, contains petabytes of data and grows by 15 petabytes annually. It is estimated that by the year 2020, annual growth will be around 44 trillion zettabytes. Challenges in terms of Big Data concern Volume, Velocity, and Variety. Biological data can be classified mainly into DNA and protein sequences, gene expression data, protein-protein interaction data, and pathway data. As current bioinformatics algorithms and techniques will not be able to handle such large, varied and rapidly growing data, Big Data platforms can be a key solution to overcome these challenges.
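The storage challenge becomes clearer with a quick back-of-the-envelope calculation. The sketch below uses the 140GB per genome figure quoted above. The cohort sizes and the function name `cohort_storage_tb` are hypothetical, and decimal units (1 TB = 1000 GB) are assumed.

```python
GENOME_SIZE_GB = 140  # per-genome size quoted in the text


def cohort_storage_tb(n_genomes, genome_size_gb=GENOME_SIZE_GB):
    """Raw storage for a cohort in terabytes, assuming 1 TB = 1000 GB."""
    return n_genomes * genome_size_gb / 1000


# Hypothetical cohort sizes in the "thousands of individuals" range.
for n in (1_000, 10_000):
    print(f"{n} genomes: {cohort_storage_tb(n)} TB")
```

Under these assumptions a study of 1,000 genomes needs about 140 TB of raw storage and a study of 10,000 genomes about 1.4 petabytes, before any copies, intermediate files or analysis results are counted.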

The above is just a selection of the many highlights of the conference. The presentations are to be uploaded to the conference website soon. See https://www.bibalex.org/quantitativeanalysisconference/Home/StaticPage.aspx?page=14

The paper presented on behalf of the African Open Science Platform is available online: Research Data towards a Sustainable World.

Thank you to the Bibliotheca Alexandrina and the Serageldin Institute for Multi-disciplinary Advanced Research (SIMAR) for the arrangements, support and more. The conference was an important step towards a better understanding of the needs to be addressed through a future African Open Science Platform (AOSP), aligning policy, capacity building, incentives and infrastructure to allow for advanced data computing, including analysis.

—————————————–

Useful Links:

ProgrammableWeb https://www.programmableweb.com/

IBM data technologies https://www.ibm.com/thought-leadership/smart/za-en/

DeepDive http://deepdive.stanford.edu/

GoodTables http://goodtables.io/

Kaggle https://www.kaggle.com/

P300-based Brain Image Viewer https://www.youtube.com/watch?v=a28Abkw7A7Y&list=PLQGWPROIVZ3ij1Fh5aZFOSbFmy5StBzgy

TOBi https://www.vodafone.co.uk/chatbot/

Egyptian Knowledge Bank http://www.asrt.sci.eg/index.php/ekb

HoloClean https://holoclean.github.io/gh-pages/index.html

Tamr https://www.tamr.com/

Trifacta https://www.trifacta.com/

CAPMAS https://en.wikipedia.org/wiki/Central_Agency_for_Public_Mobilization_and_Statistics