Course curriculum

  • 2

    Download Resources

    • Download Resources

  • 3

    Introduction to Spark and Spark Architecture Components

  • 4

    Spark Execution

    • Spark Jobs

    • Spark Stages

    • Spark Tasks

    • Practical Demonstration of Jobs, Tasks and Stages

  • 5

    Spark SQL, DataFrames and Datasets

    • Spark RDD (Create and Display Practical)

    • Spark DataFrame (Create and Display Practical)

      FREE PREVIEW
    • Anonymous Functions in Scala

    • Extra (Optional on Spark DataFrame)

    • Extra (Optional on Spark DataFrame) in Detail

    • Spark Datasets (Create and Display Practical)

    • Caching

    • Notes on reading files with Spark

    • Data Source CSV File

      FREE PREVIEW
    • Data Source JSON File

    • Data Source LIBSVM File

    • Data Source Image File

      FREE PREVIEW
    • Data Source Avro File

    • Data Source Parquet File

    • Untyped Dataset Operations (aka DataFrame Operations)

    • Running SQL Queries Programmatically

    • Global Temporary View

    • Creating Datasets

    • Scalar Functions (Built-in Scalar Functions) Part 1

    • Scalar Functions (Built-in Scalar Functions) Part 2

    • Scalar Functions (Built-in Scalar Functions) Part 3

    • User Defined Scalar Functions

  • 6

    Spark RDD

    • Operations in Apache Spark

    • Transformations

    • map(func)

    • filter(func)

    • flatMap(func)

    • mapPartitions(func)

    • mapPartitionsWithIndex(func)

    • sample(withReplacement, fraction, seed)

    • union(otherDataset)

    • intersection(otherDataset)

    • distinct([numPartitions])

    • groupBy(func)

    • groupByKey([numPartitions])

    • reduceByKey(func, [numPartitions])

    • aggregateByKey(zeroValue)(seqOp, combOp, [numPartitions])

    • sortByKey([ascending], [numPartitions])

    • join(otherDataset, [numPartitions])

    • cogroup(otherDataset, [numPartitions])

    • cartesian(otherDataset)

    • coalesce(numPartitions)

    • repartition(numPartitions)

    • repartitionAndSortWithinPartitions(partitioner)

    • Wide vs. Narrow Transformations

    • Actions

    • reduce(func)

    • collect()

    • count()

    • first()

    • take(n)

    • takeSample(withReplacement, num, [seed])

    • takeOrdered(n, [ordering])

    • countByKey()

    • foreach(func)

    • Shuffling

    • Persistence (Cache)

    • Unpersist

    • Broadcast Variables

    • Accumulators

    • Important Lecture