More and more, the term „big data“ is heard; it seems to have become the talk of the town. In this post I want to write a bit about what big data is, and I also want to go back to the times of BusinessObjects, which has now been replaced by SAP’s new apps. I also want to show that the cards which display data in a Fiori Overview Page are also used in native (non-SAP) web interfaces.
Big data is not new. In fact, data has been collected and analyzed for centuries. I have found a great article about the history of big data; if you are interested in exploring the evolution of data, click the bar below.
What is Big Data?
When doing research about big data, one discovers that there are many definitions. It reminds me a bit of Fiori and its apps: there are SAPUI5 apps and there are Fiori apps, yet people often mix up these terms, and the differences between them are not widely known. The same applies to the term big data.
Some experts say that it is about large quantities of data and the challenge of extracting the important information in order to gain knowledge about (just some examples):
- client behaviour
- client preferences
- process data
And some people say that it is not about the quantity of the data, but the quality of the analytical results. Which theory is true? In my opinion, it is both: large amounts of data are needed to get results that are really relevant and detailed.
Big data can have many sources:
- Social networking sites: Facebook, Google, and LinkedIn all generate huge amounts of data on a day-to-day basis, as they have billions of users worldwide.
- E-commerce sites: Sites like Amazon, Flipkart, and Alibaba generate huge amounts of logs from which users’ buying trends can be traced.
- Weather stations: Weather stations and satellites deliver very large volumes of data, which are stored and processed to forecast the weather.
- Telecom companies: Telecom giants like Airtel and Vodafone study user trends and publish their plans accordingly; to do this, they store the data of millions of users.
- Share markets: Stock exchanges across the world generate huge amounts of data through their daily transactions.
To keep it simple, big data provides the information that is important for operating a successful business. Consider an example of what analyzing such data makes possible:
An e-commerce site XYZ (having 100 million users) wants to offer a gift voucher of 100$ to its top 10 customers who spent the most in the previous year. Moreover, it wants to find the buying trend of these customers so that the company can suggest more related items to them.
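Stripped of its scale, that campaign is a grouped aggregation followed by a top-N selection. Here is a minimal single-process sketch in Python (the order records and customer IDs are invented; at 100 million users this would of course run as a distributed job rather than in one process):

```python
from collections import Counter, defaultdict

# Hypothetical order records (customer_id, item, amount_spent) --
# in practice these would come from last year's transaction logs.
orders = [
    ("c1", "jeans", 80), ("c1", "sneakers", 120),
    ("c2", "laptop", 900), ("c3", "book", 15),
    ("c2", "monitor", 250), ("c3", "jeans", 60),
]

# Step 1: total spend and purchased items per customer.
spend = Counter()
items = defaultdict(list)
for customer, item, amount in orders:
    spend[customer] += amount
    items[customer].append(item)

# Step 2: the top customers by total spend (top 10 in the article's example).
top_customers = [customer for customer, _ in spend.most_common(10)]

# Step 3: their buying trend, i.e. what each of them purchased.
trend = {customer: items[customer] for customer in top_customers}

print(top_customers)  # ['c2', 'c1', 'c3'] for the toy data above
```

The same shape of computation, scaled up, is exactly what the Hadoop tooling described below is built for.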
Another case is online advertising. After buying a pair of jeans, I see banners with offers for jeans on every site I visit. At the moment, a team is testing a new algorithm to offer related articles that complement the jeans I bought (e.g. sneakers, t-shirts, jackets). To „know“ that I bought a pair of jeans, Google stores my data. Artificial intelligence (AI) and the Internet of Things (IoT) will be important in the future for making optimal use of big data.
The challenge: huge amounts of unstructured data need to be stored, processed, and analyzed.
- Storage: To store this huge amount of data, Hadoop uses HDFS (Hadoop Distributed File System), which uses commodity hardware to form clusters and stores data in a distributed fashion. It works on a write-once, read-many-times principle.
- Processing: The MapReduce paradigm is applied to the data distributed over the network to compute the required output.
- Analysis: Tools such as Pig and Hive can be used to analyze the data.
- Cost: Hadoop is open source, so cost is no longer an issue.
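The MapReduce paradigm mentioned above can be illustrated with the classic word-count example. This is a single-process Python sketch of the three phases (map, shuffle, reduce) that Hadoop runs distributed over a cluster; the input lines are invented:

```python
from collections import defaultdict
from itertools import chain

# Toy input: in Hadoop, these lines would live as HDFS blocks on many nodes.
lines = ["big data is not new", "big data needs big storage"]

# Map phase: each mapper emits (key, value) pairs -- here (word, 1).
def mapper(line):
    return [(word, 1) for word in line.split()]

mapped = list(chain.from_iterable(mapper(line) for line in lines))

# Shuffle phase: group all values by key (Hadoop does this across the network).
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: each reducer aggregates the values for one key.
counts = {word: sum(values) for word, values in groups.items()}

print(counts["big"])  # 3
```

In a real cluster, mappers and reducers run on different machines and never see the whole data set; only the grouped (key, values) streams travel over the network.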
Hadoop (also known as Apache Hadoop) is an open source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.
It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software’s ability to detect and handle failures at the application layer.
Hadoop is designed to be robust, in that your Big Data applications will continue to run even when individual servers or clusters fail.
History of Hadoop
As the World Wide Web grew in the late 1990s and early 2000s, search engines and indexes were created to help locate relevant information amid the text-based content. In the early years, search results were returned by humans. But as the web grew from dozens to millions of pages, automation was needed. Web crawlers were created, many as university-led research projects, and search engine start-ups took off (Yahoo, AltaVista, etc.).
Importance of Hadoop
- Ability to store and process huge amounts of any kind of data, quickly. With data volumes and varieties constantly increasing, especially from social media and the Internet of Things (IoT), that’s a key consideration.
- Computing power. Hadoop’s distributed computing model processes big data fast. The more computing nodes you use, the more processing power you have.
- Fault tolerance. Data and application processing are protected against hardware failure. If a node goes down, jobs are automatically redirected to other nodes to make sure the distributed computing does not fail. Multiple copies of all data are stored automatically.
- Flexibility. Unlike traditional relational databases, you don’t have to preprocess data before storing it. You can store as much data as you want and decide how to use it later. That includes unstructured data like text, images and videos.
- Low cost. The open-source framework is free and uses commodity hardware to store large quantities of data.
- Scalability. You can easily grow your system to handle more data simply by adding nodes. Little administration is required.
Hadoop is not a database:
Hadoop is an efficient distributed file system, not a database. It is designed specifically for information that comes in many forms, such as server log files or personal productivity documents. Anything that can be stored as a file can be placed in a Hadoop repository.
Hadoop is designed to run on a large number of machines that don’t share any memory or disks. That means you can buy a whole bunch of commodity servers, slap them in a rack, and run the Hadoop software on each one. When you want to load all of your organization’s data into Hadoop, the software busts that data into pieces, which it then spreads across your different servers. There’s no one place where you go to talk to all of your data; Hadoop keeps track of where the data resides. And because multiple copies of everything are stored, data on a server that goes offline or dies can be automatically replicated from a known good copy.
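The „bust the data into pieces and spread it across servers“ idea can be sketched in a few lines. The block size and node names below are toy values (real HDFS defaults to 128 MB blocks, and a NameNode, not a dict, tracks placement and also considers racks and free space):

```python
# A toy sketch of HDFS-style block placement: split a file into fixed-size
# blocks and place each block on several nodes (the replication factor).
BLOCK_SIZE = 8          # bytes here; real HDFS defaults to 128 MB
REPLICATION = 3         # HDFS's default replication factor

nodes = ["node1", "node2", "node3", "node4"]
data = b"all of your organization's data"

# Split the file into blocks.
blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

# Place each block on REPLICATION distinct nodes (simple round-robin here).
placement = {}
for i, block in enumerate(blocks):
    placement[i] = [nodes[(i + r) % len(nodes)] for r in range(REPLICATION)]

# If one node dies, every block still has live copies elsewhere, from which
# the lost replicas can be re-created on healthy nodes.
dead = "node2"
survivors = {i: [n for n in targets if n != dead]
             for i, targets in placement.items()}
assert all(survivors.values())  # each block still has at least one live copy
```

This is why losing a server is routine rather than catastrophic: recovery is just copying blocks that have fallen below their replication target.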
Architecturally, the reason you’re able to deal with lots of data is because Hadoop spreads it out. And the reason you’re able to ask complicated computational questions is because you’ve got all of these processors, working in parallel, harnessed together.
„Hadoop” was the name of a yellow toy elephant owned by the son of one of its inventors 😉
Hadoop with SAP HANA Database
Hadoop can be very helpful when working with big data. So the combination of Hadoop with a HANA database builds a strong team.
What is a HANA database?
SAP HANA is an in-memory database
- It is a combination of hardware and software made to process massive real time data using In-Memory computing.
- It combines row-based and column-based database technology.
- Data now resides in main-memory (RAM) and no longer on a hard disk.
- It’s best suited for performing real-time analytics, and developing and deploying real-time applications.
The speed advantages offered by this RAM storage system are further accelerated by the use of multi-core CPUs, multiple CPUs per board, and multiple boards per server appliance.
Complex calculations on data are not carried out in the application layer, but are moved to the database.
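Moving the calculation into the database instead of the application layer can be shown with any SQL engine; in this sketch, Python’s built-in sqlite3 stands in for HANA, and the sales table is invented. The pattern, not the engine, is the point:

```python
import sqlite3

# sqlite3 stands in for the HANA database here: aggregate inside the
# database, fetch only the (small) result.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 100.0), ("EMEA", 250.0), ("APAC", 300.0)],
)

# Application-layer style (what we want to avoid): pull every row over the
# wire, then sum in the application.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
app_totals = {}
for region, amount in rows:
    app_totals[region] = app_totals.get(region, 0.0) + amount

# Database-pushdown style: the database computes the aggregate, and only
# the tiny result set leaves the database.
db_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

assert app_totals == db_totals  # same answer, far less data transferred
```

With three rows the difference is invisible; with billions of rows in an in-memory, column-oriented store, pushing the aggregation down is what makes real-time analytics feasible.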
SAP HANA is equipped with a multi-engine query processing environment that supports relational as well as graph and text data within the same system. It provides features that deliver significant processing speed, handle huge data volumes, and offer text-mining capabilities.
But the main challenge with Hadoop is getting information out of this huge amount of data in real time.
We also have SAP HANA, and as we all know, HANA is well suited for processing data in real time. Hence, SAP HANA and Hadoop are a perfect match: to get real-time information out of massive storage such as Hadoop, we can use HANA, which can be integrated directly with Hadoop.
With the help of the SAP HANA Hadoop integration, we can also combine structured and unstructured data. Both are combined and transferred to SAP HANA via a Hadoop/HANA connector.
In which ways is Hadoop beneficial to HANA?
- Cost-efficient data storage and processing for large volumes of structured, semi-structured, and unstructured data such as web logs, machine data, text data, call data records (CDRs), and audio and video data.
- Batch processing, where fast response times are less critical than reliability and scalability.
- Complex information processing: enables heavily recursive algorithms, machine learning, and queries that cannot easily be expressed in SQL.
- Low-value data archive: data stays available, though access is slower.
- Post-hoc analysis: mine raw data that is either schema-less or whose schema changes over time.
Integrating SAP HANA & Hadoop
Hadoop is considered one of the best options for storing structured, semi-structured, and unstructured data.
The combined structured and unstructured data are transferred to SAP HANA via a Hadoop/HANA connector. SAP BusinessObjects Data Services (BODS) is one of the main ways to pull data into HANA.
SAP has also set up a „big-data“ partner council, which will work to provide products that make use of HANA and Hadoop. One of the key partners is Cloudera. SAP wants it to be easy to connect to data, whether it’s in SAP software or software from another vendor.
Big Data, IoT, and AI
Big data, as defined in Wikipedia, is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Since the data sets are so huge, the challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating, and information privacy. However, the term is more often used to refer to predictive analytics, user behavior analytics, and advanced data methods (including artificial intelligence) rather than simply to the size of the data set.
In 2017, expect the emergence of blockchain technology applications, especially smart contracts, which are contracts written in code in a ledger system. These are typically more secure and irreversible than traditional contracts, and they also create efficiencies in referencing and executing these contracts.
Separately, the rise of data-as-a-self-service solutions will also enable organizations to analyze their data without the need to build a data science department. This will be extremely valuable to SMEs who don’t have the budget to hire a data scientist, a profession in high demand in 2016.
There was also a rapid decline in the use of Hadoop, a framework that allows for the distributed processing of large data sets, as hiring the requisite talent to support this framework in house has proved to be challenging. There is also a preference for using applications in the cloud to reduce spending on data centers, which makes the data-as-a-self-service model popular.
As research firm Gartner Inc. noted in its Magic Quadrant for Data Management Solutions for Analytics, “Expectations are now turning to the cloud as an alternative deployment option, because of its flexibility, agility and operational pricing models.”
As a result of this, expect insight to become more accessible to people below C-level executives as more companies empower employees with the right knowledge gained from structured and unstructured data.
This is a double-edged sword, though, as the evolution of big data technology creates an expectation among executives to have their data immediately rather than waiting for batch analytics reports. So there’s pressure to deliver actionable analytics faster on near real-time data.
Internet of Things (IoT):
Forbes describes the Internet of Things as the concept of connecting any device with an on/off switch to the internet (and/or to each other). If a device has an on/off switch, it can probably be configured to be a part of IoT.
Think “smart home” devices like a lock that unlocks when it detects your phone near it, or maybe a light that only switches on when it detects movement.
In 2016, we saw noise from a lot of vendors with similar solutions. In 2017, we can expect some of these vendors to emerge victorious, which will lead to fewer vendors in the market. With the reduction in vendors, we can also expect regulation and standardization to come into play, moving us toward simpler and more cohesive solutions. Security concerns come with it, too: an IoT cyber attack took down a power grid in western Ukraine last year, and research on the hacking of self-driving cars also causes concern, so 2017 will likely bring about security measures for IoT.
Right now, we’re experiencing a lot of fragmentation in the IoT market, but hopefully the picture will become clearer as the rest of 2017 unfolds and IoT solutions will become more integrated and part of open ecosystems and platforms that foster interoperability and offer services based on combined data coming from multiple devices and sources.
Two main application areas will probably be a focus for IoT, namely smart cities and smart homes. However, in the smart home department, since bandwidth is a prerequisite for any IoT technology to work, expect a surge in mesh or mesh-like products with simpler network management this year.
That’s exactly what Errett Kroeter, vice president of brand and developer marketing for not-for-profit Bluetooth Special Interest Group, is hoping for. “Some other standards for meshing right now are notoriously difficult to set up. Our goal is to keep mesh networks simple so that people will actually want to use them.”
Finally, growth in IoT – in combination with other devices and systems generating vast amounts of data – is accelerating the need for artificial intelligence to create meaning out of this information.
Artificial Intelligence (AI):
The dictionary definition of artificial intelligence is the capability of a machine to imitate intelligent human behavior. While we have seen much growth with AI back in 2016, we’ll see further growth in 2017. Back in 2016, we learned that Amazon’s Alexa, which manifests artificial intelligence in the form of being able to speak human language, is now present in over five million homes. You can ask Alexa about the weather or tell her to order you a taxi, and she’ll respond. This means that last year, AI hit mainstream adoption.
However, there are many more developments of artificial intelligence in the healthcare sector. Healthcare-focused AI startups grew from 20 in 2012 to almost 70 in 2016. Apparently, the big ones to watch out for are iCarbonX, which aims to build an ecosystem of digital life to bring about a personalized health management system, and Flatiron Health, which aims to fight cancer with organized data, helping oncologists enhance care.
At health technology giant Philips, about sixty percent of the researchers, developers, and software engineers today are working on innovations in healthcare informatics and a big part of them are looking into the application of artificial intelligence in current and future healthcare innovations.
Trends in applications for healthcare artificial intelligence focus on imaging and diagnostics, where AI can help find subtle details and changes in images that people cannot see. This is increasingly becoming a crowded sector. But helping to prevent health deterioration, both for healthy people and for people at risk of or living with a chronic condition, using large data sets is also an area of focus.
Jeroen Tas, Chief Innovation and Strategy Officer at Philips “sees a valuable role for AI to support radiologists in preparing relevant information for the case and identifying subtle changes in a patient’s condition. Another domain is the intensive care units where AI can help identify early signs of deterioration or onset of acute events, like cardiac arrest or scepsis.”
Tas also claims that “richer patient pictures can be created with combining genetic information with pathology, medical images, lab results, family history data, other conditions and previous treatments that did or did not work. This data can be organized with the help of AI to add important additional context helpful to aid clinicians make more precise diagnosis and support personalized treatment choices”.
A multidisciplinary team of software engineers, designers, and other experts, it seems, has created and introduced the first validated application for radiologists. In remote patient monitoring, AI can enable virtual care and include virtual nursing assistants.
SAP Business Intelligence (BI)
SAP Business Intelligence (SAP BI) is the data warehouse solution from SAP. It is a central application for the entire enterprise where relevant data for reporting can be extracted from various sources and displayed in a meaningful format for evaluation.
SAP BI is a system- and enterprise-wide evaluation tool for reporting and takes on the role of the „single point of truth“ for the company. Reporting capability is nearly endless, with the ability to extract data from a variety of sources, including source systems and data providers in both SAP and non-SAP systems, databases, local files, and web services; in special ‘tailor-made’ solutions (e.g. planning), even manual input by the user is possible.
Although it is not the primary purpose of a BI system, SAP BI also has the ability to export processed and consolidated data to external applications and tools, other BI systems, government agencies, etc.
A data mining environment for modelling of analytical processes is also an integral part of SAP BI.
A suite of front-end tools is also available to meet various reporting requirements from a data and graphical perspective. These tools enable the preparation of reports ranging from simple tabular representations to highly complex graphical cockpits and mobile applications.
In the BI environment, a variety of applications are available (most notably, the tools for enterprise planning and consolidation), which are based on BI technologies and utilise the centralised data of BI.
SAP HANA, with its additional BI capabilities and high performance database, can also be used with a BI system.
Alternatively, ‘Sybase IQ’, through a certified interface („CBW NLS IQ“), can be used as near-line storage for archiving and providing BI data for external applications and tools. It can also enhance the performance of your BI reports.
SAP Business Objects (BO)
Let’s go back in time a bit
SAP BusinessObjects (BO or BOBJ) was an enterprise software company specializing in business intelligence (BI). BusinessObjects was acquired in 2007 by the German company SAP AG. The company claimed more than 46,000 customers in its final earnings release prior to being acquired by SAP. Its flagship product is BusinessObjects XI, with components that provide performance management, planning, reporting, query and analysis, and enterprise information management. BusinessObjects also offered consulting and education services to help customers deploy its business intelligence projects. Other toolsets enable universes (the BusinessObjects name for a semantic layer between the physical data store and the front-end reporting tool) and ready-written reports to be stored centrally and made selectively available to communities of users.
BusinessObjects offered features to visualize data.
I’m so bold as to say that the days of BO have been over since the arrival of the new HANA database. This new database is extremely fast thanks to its dedicated server hardware and its in-memory architecture. Therefore, data can be gathered and displayed extremely quickly, and changes are shown in real time on cards in a Fiori Overview Page.
Fiori Overview Pages
The Fiori launchpad is the new working environment for SAP users. According to the tasks and the role of an employee, he or she will only see those apps which are relevant to his or her tasks. Those apps are represented by tiles. By clicking a tile, the user is forwarded to the interface in which the tasks can be performed.
Some of those apps, called Overview Pages, are mainly developed for displaying information (it is possible to add interaction functionality, but that would go too deep into the subject). In my opinion, these OVPs will be among the most wanted and most important apps. By clicking the tile for an OVP, a view appears that displays cards with multiple pieces of information about one or more related subjects.
Cards in websites
Cards can display pies, donuts, columns, bars, geo maps, lists, and tables with information. Those cards are an invention of Google: when Material Design was introduced, one of its main goals was to achieve a simple, convenient, and clear information flow. This was realised by using cards, and many websites took over this approach.
Here’s an example of an elegant card layout.
But those cards can also be used to display business information, especially since SAP’s new database was introduced. The speed of the HANA database is extremely high thanks to its dedicated server and in-memory architecture. The data is presented in no time and is even updated in real time. With these features available, analysing and processing big data won’t be a problem in the future.
Our daily life will become more and more influenced by the big data gathered by multinationals like Facebook, Amazon, Telekom, and Vodafone, by the Internet of Things, and by the artificial intelligence that comes with it.