Microsoft Azure offers services for a wide variety of data-related needs, including ones you would expect, like file storage and relational databases, but also more specialized services, such as those for text search and time-series data. In this course, you will learn how to design a data implementation using the appropriate Azure services.

Two services that are especially important are Azure SQL Database and Azure Cosmos DB. Azure SQL Database is a managed service for hosting SQL Server databases (although it’s not 100% compatible with SQL Server). Even though Microsoft takes care of the maintenance, you still need to choose the right options to scale it and make sure it can survive failures. Azure Cosmos DB is the first multi-model database that’s offered as a global cloud service. It can store and query documents, NoSQL tables, graphs, and columnar data. To get the most out of Cosmos DB, you need to know which consistency and performance guarantees to choose, as well as how to make it globally reliable.

Azure also offers services that don’t fall into the usual categories of basic storage, transactional databases, or NoSQL datastores. These services help you work with data in other ways, such as finding, transforming, and analyzing it.

First up is Azure Data Catalog. Organizations typically have so much data in so many different places that it’s hard to find what you’re looking for. The purpose of Data Catalog is to act as an index to all of those data sources, so you can discover them. Of course, in order for this to work, your employees need to register their data sources in the catalog. The data itself stays where it is, but its location and the metadata about it get added to the catalog. The metadata includes things like column names and data types. Users can also add more information about a data source, such as a description or some tags. Once various data sources are registered, people can search the catalog to find what they’re looking for.
They still need to open the data using another tool, though, since this is just a catalog.

Another way to deal with pockets of data is to collect it all in either a data lake or a data warehouse. These serve two different, but related, needs. Data warehouses store data in structured, relational tables, while data lakes store any kind of data, whether it’s structured or not. For example, you could store everything from documents to images to social media streams. Data warehouses are generally used for business reporting, while data lakes are more often used for data analytics and exploration. In fact, one common setup is to process data in a data lake and then export it to a data warehouse. Both types of services are designed for performing massive queries at high speed.

Azure Data Lake Storage is built on top of Azure Blob Storage, and it provides the additional capabilities needed for a modern data lake. Its most important feature is that it’s compatible with Hadoop and Spark, which are the most popular open-source software systems for doing data analytics.

Azure Synapse Analytics (formerly known as SQL Data Warehouse) offers an interesting mix of data warehouse and data lake capabilities. If you need a data warehouse, you can create a SQL pool, which lets you run SQL queries on structured, relational tables. If you want a data lake, then you can create a Spark pool, which lets you use Spark to query both structured and unstructured data.

Spark has become so popular that Microsoft has many services that let you use Spark for data analytics. In addition to Data Lake Storage and Synapse Analytics, you can also use Azure Databricks and Azure HDInsight. Databricks is a managed Spark implementation that was developed by the people who created Apache Spark. HDInsight supports a wide variety of open-source big data frameworks, including Hadoop, Spark, Hive, Storm, and many others.
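
The common setup mentioned above, processing data in a data lake and then exporting it to a data warehouse, can be illustrated with a small local sketch. This is not Azure code: the mixed JSON records stand in for a data lake, an in-memory SQLite table stands in for a data warehouse, and the names RAW_LAKE_RECORDS, extract_orders, load_warehouse, and revenue_by_customer are all hypothetical, chosen only for illustration.

```python
import json
import sqlite3

# Hypothetical "data lake" contents: mixed, loosely structured records.
RAW_LAKE_RECORDS = [
    '{"type": "order", "id": 1, "customer": "alice", "amount": 30.0}',
    '{"type": "tweet", "text": "loving the new release"}',
    '{"type": "order", "id": 2, "customer": "bob", "amount": 12.5}',
    '{"type": "order", "id": 3, "customer": "alice", "amount": 7.5}',
]


def extract_orders(raw_lines):
    """Process the lake: keep only order records, shaped as table rows."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if record.get("type") == "order":
            rows.append((record["id"], record["customer"], record["amount"]))
    return rows


def load_warehouse(rows):
    """Export rows into a structured table (SQLite stands in for a warehouse)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def revenue_by_customer(conn):
    """Run a reporting-style SQL query against the structured table."""
    cursor = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    warehouse = load_warehouse(extract_orders(RAW_LAKE_RECORDS))
    print(revenue_by_customer(warehouse))
```

The unstructured record (the tweet) stays in the lake, while only the records that fit the relational schema are exported for reporting. In Azure, the processing step would typically run in Spark and the reporting query in a SQL pool, but the division of labor is the same.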