PySpark: sort() vs orderBy()

Just like the 'ORDER BY' clause in SQL Server, PySpark provides the 'orderBy()' and 'sort()' methods to sort the rows of a DataFrame. (RDDs have their own sorting methods, such as 'sortBy()' and 'sortByKey()'.) The two DataFrame methods are interchangeable: in PySpark, 'orderBy()' is simply an alias for 'sort()', so both accept the same arguments and return the same result.

The syntax for these functions is as follows:

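'orderBy()'
                    Syntax: DataFrame.orderBy(*cols, **kwargs)

'sort()'
                    Syntax: DataFrame.sort(*cols, **kwargs)

As you can see, both functions share the same syntax. 'cols' is one or more column names or Column expressions (or a single list of them), and the 'ascending' keyword argument takes a boolean or a list of booleans, one per column (default True).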

Now, let's explore the different ways to use these functions, using a small sample DataFrame of Ids and languages (it is built in the complete code at the end of this article).

Sorting based on a single column:

Sorting based on multiple columns:

Using other variations:

Conclusion:


In this article, we looked at the sort() and orderBy() methods of PySpark DataFrames and saw that they are interchangeable: both take the same arguments and produce the same result.

The complete code used in this article is given below.

from pyspark.sql import SparkSession
from pyspark.sql.functions import asc, desc

# Reuse the active session if there is one (for example, in a notebook); otherwise start one.
spark = SparkSession.builder.getOrCreate()

# Sample data: (Id, Language)
data = [(1, "English"), (2, "Hindi"), (3, "Urdu"), (4, "Tamil"), (5, "English"),
        (6, "Hindi"), (7, "Urdu"), (8, "Marathi"), (9, "Gujarati"), (10, "Panjabi")]
schema = ["Id", "Language"]
df = spark.createDataFrame(data, schema)
df.show()

# Sort by a single column in descending order
df.orderBy("Language", ascending=False).show()

df.sort("Language", ascending=False).show()

# Sort by multiple columns: Language ascending, then Id descending
df.sort("Language", "Id", ascending=[True, False]).show()
# You can also pass the columns as a list
df.sort(["Language", "Id"], ascending=[True, False]).show()
# You can also use 1 and 0 in place of True and False
df.sort(["Language", "Id"], ascending=[1, 0]).show()

# The same three calls with orderBy()
df.orderBy("Language", "Id", ascending=[True, False]).show()
# You can also pass the columns as a list
df.orderBy(["Language", "Id"], ascending=[True, False]).show()
# You can also use 1 and 0 in place of True and False
df.orderBy(["Language", "Id"], ascending=[1, 0]).show()

# Sort using the desc() and asc() methods of Column objects
df.sort(df.Language.desc(), df.Id.asc()).show()

df.orderBy(df.Language.desc(), df.Id.asc()).show()

# Sort using the asc() and desc() functions imported at the top
df.sort(asc("Language"), desc("Id")).show()

df.orderBy(asc("Language"), desc("Id")).show()


