Posts

Exploring the SAP Data Lake – Architecture and Features

SAP launched the HANA Data Lake in April 2020 to strengthen its cloud-based ecosystem and give customers a cost-effective, highly optimized storage system. The release included a native storage extension and a relational SAP data lake. Thanks to its powerful data processing capabilities, the data lake was soon considered to be in the same league as the leaders in the niche, namely Amazon S3 and Microsoft Azure. The architecture of the SAP data lake resembles a pyramid. The top tier stores business-critical data that is in constant use, so the cost of storing data here is the highest in the SAP data lake. The middle tier stores data that is not used regularly but is important enough not to be deleted. This tier offers lower performance than the top tier, its access requirements are also quite low, and its storage cost is significantly lower. The bottom of the pyramid holds data that is hardly e
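The pyramid described in the excerpt above can be pictured as a simple routing rule: the more often data is read, the higher (and more expensive) the tier it belongs in. The snippet below is only a minimal sketch of that idea. The `Dataset` class, the `assign_tier` function, and both thresholds are hypothetical names and values chosen for illustration; they are not part of any SAP API.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a data set and how often it is read."""
    name: str
    reads_per_month: int


def assign_tier(dataset: Dataset, hot_threshold: int = 100, warm_threshold: int = 1) -> str:
    """Map a data set to a pyramid tier by access frequency.

    The thresholds are illustrative assumptions, not SAP defaults.
    """
    if dataset.reads_per_month >= hot_threshold:
        return "top"      # constantly used, highest storage cost
    if dataset.reads_per_month >= warm_threshold:
        return "middle"   # used occasionally, cheaper storage
    return "bottom"       # everything else


if __name__ == "__main__":
    for ds in (Dataset("orders", 500), Dataset("last_year_invoices", 3), Dataset("old_logs", 0)):
        print(ds.name, "->", assign_tier(ds))
```

In practice the cut-off points would depend on workload and pricing, so they are left as parameters here rather than fixed values.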

Exploring the Functions of the Amazon S3 Data Lake

Amazon S3 (Simple Storage Service) is a cloud-based storage service that stores unstructured, semi-structured, and structured data in its native format. S3 is designed for 99.999999999% (11 nines) data durability, with data protected in a highly optimized and secure environment. Data is stored in buckets as objects, each consisting of a file and its metadata. To store a file or its metadata, the object is first uploaded to Amazon S3, after which permissions can be granted on the object or its metadata in the bucket. Several capabilities can be used to build a data lake on Amazon S3, including media data processing applications, Artificial Intelligence (AI), Machine Learning (ML), big data analytics, and high-performance computing (HPC). Together, these help businesses derive vital business intelligence and analytics from the S3 data lake and its unstructured data sets. A benefit of an S3 data lake is that compute and storage are kept in separate silos. Hence, all da
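The upload-then-grant-permissions flow mentioned in the excerpt above can be sketched in Python. This is a hedged illustration, not a definitive implementation: `upload_with_metadata` and `RecordingClient` are hypothetical helpers, and the bucket and key names are made up. The two client calls, `put_object` and `put_object_acl`, mirror the boto3 S3 client methods of the same names, so a real `boto3.client("s3")` could be passed in place of the stub. Note that ACLs are disabled by default on newer buckets, in which case the second step would be handled through bucket policies instead.

```python
class RecordingClient:
    """Stand-in for an S3 client that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def put_object(self, **kwargs):
        self.calls.append(("put_object", kwargs))

    def put_object_acl(self, **kwargs):
        self.calls.append(("put_object_acl", kwargs))


def upload_with_metadata(s3_client, bucket, key, body, metadata, acl=None):
    """Upload an object with user metadata, then optionally set a canned ACL."""
    # Step 1: the object (file plus metadata) is uploaded to the bucket.
    s3_client.put_object(Bucket=bucket, Key=key, Body=body, Metadata=metadata)
    # Step 2: permissions are granted only after the object exists.
    if acl is not None:
        s3_client.put_object_acl(Bucket=bucket, Key=key, ACL=acl)
    return f"s3://{bucket}/{key}"


if __name__ == "__main__":
    client = RecordingClient()  # swap for boto3.client("s3") against a real account
    uri = upload_with_metadata(
        client, "example-bucket", "raw/customers.csv",
        b"id,name\n1,Ada\n", {"source": "crm"}, acl="private",
    )
    print(uri)
    for name, kwargs in client.calls:
        print(name, sorted(kwargs))
```

Passing the client in as a parameter keeps the sketch runnable without AWS credentials and makes the order of the two steps easy to inspect.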