DeltaTable Constraints : NOT NULL

When creating any table, one of the crucial aspects is maintaining data integrity. Each column should adhere to specific rules and constraints to ensure accurate and reliable data. For instance, a company might provide employee IDs in number format and require that basic salary be at least 15K. In such cases, it becomes essential to enforce constraints on these columns, allowing only data that complies with the defined rules.

To accomplish this Delta lake offers four different constraints
  1. NOT NULL
  2. CHECK
  3. PRIMARY KEY 
  4. FOREIGN KEY
So, before delving into the details, it's essential to become familiar with the Metastore in Databricks/Azure Databricks.

In the Lake House architecture, the MetaStore plays a pivotal role by serving as a repository for metadata. As the name suggests, it contains information about all the objects defined within the lake house environment. Further, the Metastore categorized into two types

Built-in Hive Metastore:
  • Each Databricks or Azure Databricks resource contains one built-in Hive Metastore.
  • An instance of this metastore is attached to each cluster created in Databricks.
  • This legacy Hive Metastore provides less centralized access to the tables, and by default, all users connected to the cluster have access to the underlying objects.
  • Unless we have user access control set up, it is very difficult to manage user access.
  • Due to this, the legacy Hive Metastore supports only NOT NULL and CHECK constraints.
Unity Catalog:
  • This provides centralized user access control and audit features.
  • In this, the data is secured by default, which means users will not have access initially; either the catalog admin or owner of the object has to provide the user access.
  • Unlike the legacy metastore, this supports all constraints on the tables.

NOT NULL: In a database, the NOT NULL constraint ensures that the system rejects or throws an error when user attempt to insert new records with either NULL values or when you omit values for these columns.

In Delta tables, this constraint is rigorously enforced to maintain data integrity. Whenever NULL values are passed to a column defined with NOT NULL, the notebook will raise a DeltaInvariantViolationException error, preventing any invalid data from being stored.

Let us see this in a practical way in both SQL format and Python DeltaTable format.

<<please click on each image for enlarged view>>


























Please leave a comment if you have any doubts or issues while working with NOT NULL Constraints.


Thanks for reading 🙏



No comments:

Post a Comment