Views in Pyspark

Similar to SQL Views, the views in PySpark are also virtual tables. This means that they do not store the data physically; instead, they display the result set of a custom SQL SELECT statement.

The PySpark supports three different types of views such as,
  1. Standard Views
  2. Temporary Views
  3. Global Temporary Views
Standard Views: Similar to Standard tables, these views persist on the server without the data and are accessible to all even after the spark cluster restarts.

Let us understand it with examples. For this demonstration, we have created a delta table using Store DataSet used in Tableau.
















Create the view using SQL syntax. (Unlike temporary and global temporary views, standard views cannot be directly created from DataFrames.)










Based on the SHOW VIEW command, it is evident that the view is not a temporary view and is stored in the default database, indicating it's a permanent view. Let's proceed to query the view:










Temporary Views: These views are confined to the scope of the created notebook, which means they are neither visible nor accessible to other notebooks, even if they are connected to the same Spark pool. The lifespan of these views concludes with the closure of the notebook.









Let us proceed to query the view:










Understand these views using the SHOW VIEWS command:

















Given that the scope of these views is limited to the created notebooks, they will not be stored in any database, and the value of the 'Temporary' column is set to True.

Global Temporary Views: These views are scoped to the cluster. This means that they are visible and accessible to all users connected to the same Spark pool or cluster. However, these views are automatically dropped when the notebook that created them is closed or when the cluster itself is closed

Unlike Standard and Temporary views, Global Temporary views are always associated with a system-persisted database called 'global_temp.' This is the reason why Global Temporary views will not be displayed when we use the 'SHOW VIEWS' statement alone. The 'SHOW VIEWS' statement is bound to the current database. To view Global Temporary views, you need to include the 'global_temp' database name in the 'SHOW VIEWS' command.

Similarly, when querying or using these Global Temporary views, it's essential to specify the 'global_temp' database name, as omitting it will result in an 'object not available' error.






























The Complete code used in the article can be found below.

#Sample Stores DF Creation
from pyspark.sql.types import *
from pyspark.sql.functions import *
#DataFrame Creation
stores_DF = spark.read.format("csv")\
                      .option("header","true")\
                      .option("delimiter","\t")\
                      .option("inferSchema","true")\
                      .load("dbfs:/FileStore/Sample_Superstore.txt")
#Replacing the spaces in the column names with "_"
stores_DF = stores_DF.select([col(c).alias(c.replace(' ','_')) for c in stores_DF.columns])
#loading the DF data to a Delta Table
stores_DF.write.mode("append").saveAsTable("storeData")
display(spark.sql("SELECT * FROM StoreData LIMIT 100"))

#View Creation Using SQL
%sql  
CREATE OR REPLACE VIEW vw_Stores
AS
SELECT * FROM storedata LIMIT 100

#Show the View Details
%sql
SHOW VIEWS

#Query the View Using SQL
%sql
SELECT * FROM vw_Stores
#Query the View Using Spark SQL
display(spark.sql("SELECT * FROM vw_Stores"))

###Temporary Views
stores_DF.createOrReplaceTempView("temp_vw_DF_stores")
%sql
SELECT * FROM temp_vw_DF_stores LIMIT 100
%sql
CREATE TEMPORARY VIEW temp_vw_SQL_stores
AS 
SELECT * FROM storedata LIMIT 100
%sql
SELECT * FROM temp_vw_SQL_stores
%sql
SHOW VIEWS

###Global Temporary Views
stores_DF.createGlobalTempView("global_temp_vw_df_stores")
display(spark.sql("SELECT * FROM global_temp.global_temp_vw_df_stores LIMIT 10"))

%sql
CREATE GLOBAL TEMP VIEW global_temp_vw_sql_stores
AS
SELECT * FROM storedata LIMIT 10

%sql
SELECT * FROM global_temp.global_temp_vw_sql_stores

%sql
SHOW VIEWS IN global_temp 

Thanks for reading 🙏. Comment your doubts if any.



1 comment: