Welcome to the Delta Lake Series, where we delve into the fascinating world of delta tables and explore how to effortlessly create them using SQL commands
Before we delve into practical examples, let's first understand the different types of tables that can be created in Delta Lake:
- Managed Tables:
- They are defined without specific location details, and their data files reside within the location configured for the Metastore.
- Beware, dropping these tables is a challenging feat! It erases the metadata (schema information) from the catalog and vanquishes the underlying data and its file folders, leaving no trace behind.
- External Tables:
- External tables, on the other hand, are defined with a specified location to which the data should be saved.
- Spark, the wise guardian, handles only the metadata information of these tables. Thus, when one decides to part ways with external tables, it's only the metadata that bids farewell, while the precious data and its loyal file folders stand firm.
In this series, we will walk you through the step-by-step process of creating and managing both managed and external delta tables using SQL. Whether you're a beginner or an experienced data enthusiast, these comprehensive guides will equip you with the knowledge and skills to efficiently work with Delta Lake and leverage its full potential.
Behold, I present to you a captivating visual journey through the snapshots, where managed and external tables stand side by side. Not only will you gain insights into the various ways to create these tables, but you shall also unravel the subtle yet significant differences in their code.
<<please click on each image to see its text clearly>>
To illuminate the table drop and its actions on underlying data, I have thoughtfully inserted sample records into both tables as follows.
We see the data is stored in different locations for each table such as for Managed tables it is stored under Hive metastore and for External it is stored under the location which we defined while creating the table.
Upon dropping Managed tables, both the table information from the Spark catalog and the associated data files from the storage locations are permanently removed. In contrast, when an External table is dropped, only the table information is erased from the Spark catalog, while the cherished data files remain intact in their designated storage locations. This distinction highlights the resilience and data-preserving capabilities of External tables, granting us peace of mind 😉 when managing our valuable data assets.
Please share the blog if you like the content 🙏
No comments:
Post a Comment