Data Science with Python and Spark

Instructor: Jeroen Janssens

Data Science Workshops B.V.
Provider rating: 9.5 (average based on 50 reviews)
Award: "Best Learning Provider of the Netherlands 2020", #2 overall trainer.

Start dates and locations

There are no known start dates for this product.

Description

Introduction

Apache Spark is an open-source distributed engine for querying and processing data. In this three-day hands-on workshop, you will learn how to leverage Spark from Python to process large amounts of data.

After a presentation of the Spark architecture, we'll begin manipulating Resilient Distributed Datasets (RDDs) and work our way up to Spark DataFrames. We discuss lazy execution in detail and demonstrate various transformations and actions specific to RDDs and DataFrames. You'll also learn how to manipulate DataFrames using SQL queries.
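
The lazy-execution idea described above can be illustrated without a Spark installation. The sketch below is a plain-Python analogy, not Spark's API: the hypothetical LazyDataset class only records map and filter steps (the "transformations") and runs them when collect (the "action") is called. In PySpark, RDD methods with the same names behave analogously but distribute the work across a cluster.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, for illustration only).

    Transformations are recorded; nothing runs until an action is called.
    """

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, func):
        # Transformation: return a new dataset with one more recorded step.
        return LazyDataset(self._source, self._steps + (("map", func),))

    def filter(self, predicate):
        # Transformation: also only recorded, not executed.
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def collect(self):
        # Action: replay the recorded steps and materialise the result.
        items = iter(self._source)
        for kind, func in self._steps:
            items = map(func, items) if kind == "map" else filter(func, items)
        return list(items)


calls = []  # records every element that square() actually processes


def square(x):
    calls.append(x)
    return x * x


numbers = LazyDataset(range(1, 6))
pipeline = numbers.map(square).filter(lambda x: x % 2 == 1)
calls_before_action = len(calls)  # still 0: no work has been done yet
result = pipeline.collect()       # the action triggers the computation
```

Building the pipeline does no work; square is only called once collect runs, which is the behaviour the workshop refers to as lazy execution.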

We'll show you how to apply supervised machine learning models such as linear regression, logistic regression, decision trees, and random forests. You'll also see unsupervised machine learning models such as PCA and K-means clustering.

By the end of this workshop, you will have a solid understanding of how to process data using PySpark and you will understand how to use Spark's machine learning library to build and train various machine learning models.

What you'll learn

  • Understand Apache Spark, its architecture, and its components
  • Work with RDDs and lazy evaluation
  • Build and interact with Spark DataFrames using Spark SQL
  • Process data in DataFrames with traditional SQL queries via Spark SQL
  • Apply a spectrum of supervised and unsupervised machine learning algorithms
  • Handle issues related to feature engineering, class imbalance, bias and variance, and cross-validation when building a model

This workshop is for you because

  • You work with data regularly and want to be able to scale up the quantity of data processed
  • You want to understand the methods specific to Spark for wrangling data
  • You want to learn how to apply machine learning algorithms to large amounts of data

Schedule

Day 1:

  • Introduction to Apache Spark
    • Setting up Spark
    • Spark fundamentals
    • Spark architecture
  • Resilient Distributed Datasets (RDDs)
    • Getting data into Spark
    • Actions
    • Transformations

Day 2:

  • Spark DataFrames
    • Speeding up Spark with DataFrames
    • Creating DataFrames
    • Interoperating with RDDs
    • Working with the DataFrame API
    • Applying SQL to Spark DataFrames

Day 3:

  • ML and MLlib packages
    • API Overview
    • Transformers
    • Estimators
    • Pipelines
  • Applying Machine Learning
    • Model selection
    • Cross-validation
    • Tuning
    • Classification
    • Regression
    • Recommender system
  • Where to go from here

Prerequisites

Participants are expected to be familiar with the following Python syntax and concepts:

  • assignment, arithmetic, boolean expression, tuple unpacking
  • bool, int, float, list, tuple, dict, str, type casting
  • in operator, indexing, slicing
  • if, elif, else, for, while
  • range(), len(), zip()
  • def, (keyword) arguments, default values
  • import, import ... as, from ... import ...
  • lambda functions, list comprehension
  • JupyterLab or Jupyter Notebook

Some experience with Pandas and SQL is useful, but not required.
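
As a rough self-check, the following sketch touches most of the items listed above in a few lines of plain Python. The names (grade, lookup, and so on) and the sample data are made up for illustration; if every line reads naturally, the syntax prerequisites are likely covered.

```python
from math import sqrt as square_root  # "from ... import ... as"


def grade(score, threshold=9):
    """Classify a score; 'threshold' is a keyword argument with a default."""
    if score > threshold:
        return "high"
    elif score == threshold:
        return "medium"
    else:
        return "low"


names = ["spark", "pandas", "sql"]           # list of str
scores = (8, 10, 9)                          # tuple of int
lookup = dict(zip(names, scores))            # zip() into a dict
first, *rest = names                         # tuple unpacking
lengths = [len(name) for name in names]      # list comprehension
top = max(names, key=lambda n: lookup[n])    # lambda function
has_sql = "sql" in lookup                    # in operator -> bool
head = names[:2]                             # slicing
as_text = str(int("42") + 1)                 # type casting
root = square_root(float(16))                # arithmetic on a float

grades = {}
for index in range(len(names)):              # for loop with range() and len()
    grades[names[index]] = grade(scores[index])

countdown = 3
while countdown > 0:                         # while loop
    countdown -= 1
```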

Recommended preparation

Participants are kindly requested to have the following items installed prior to the start of the workshop:

  • Docker Desktop for Windows, Mac, or Ubuntu
  • The Docker image, obtained by running: docker pull jupyter/pyspark-notebook

More detailed installation instructions will be provided by email after signup.

Clients

I’ve previously delivered this workshop at:

  • KPN ICT Consulting
  • ProRail
  • Textkernel

Testimonials

"Our DataLab team enjoyed a three-day PySpark course from Jeroen. Jeroen's approach is personal and professional. I recommend Data Science Workshops to anyone in the field of data science."

--Laurens Koppenol, Lead Data Scientist, ProRail

Average rating for Data Science with Python and Spark: 9.3 (based on 7 reviews)
Mateusz Wiacek, Head of Training
Rating: 10

"Jeroen delivered this as a 3-day training to Textkernel in May 2019. No doubt -- this is a 10 out of 10! He is very knowledgeable about the subject matter, has a great interactive teaching style and a great balance between explaining and practising, includes a lot of hands-on exercises, and covers everything from low-level to high-level APIs, which helps to understand the logic behind it. Very clear, structured explanations. Highly recommended for beginners and advanced!" - 07/12/2020 14:58

Davey Witter, IT Consultant
Rating: 8

"Very enthusiastic in delivering the workshop. The workshop was well structured. It moved smoothly from simple to complex material, explained in a simple way that made it easy to follow. The combination of practical examples and theoretical explanation made the material concrete, so that the complex material was easy to follow. The knowledge and the workshop were up to date, which made the workshop very interesting." - 14/11/2020 08:34

Kellner
Rating: 9

"I really enjoyed Jeroen's workshop, he explained the Spark basics (RDDs, DataFrames, transformers, estimators, etc.) very well and the class included a lot of hands-on exercises (including building ML models). The training was given in-house in our company's office on three separate days." - 06/11/2020 15:41

Marissa Helmich, Senior Data Scientist
Rating: 10

"I took the Data Science with Python and Spark course with Jeroen Janssens in 2019. Jeroen knows how to convey complex information in an understandable way and combines theory with hands-on exercises in his trainings. That way, you can really apply new skills by the end of the day. I can also highly recommend the other courses from datascienceworkshops.com, by the way!" - 05/11/2020 10:22

Eike Dehling, Research Engineer
Rating: 8

"Good, professional training; I learned many useful things in this workshop. The workshop was really aimed at beginners, people who do not yet know Spark. There were many practical exercises, truly learning by doing. I found Jeroen to be a pleasant trainer who knows how to hold everyone's attention. I recommend taking this training; you will certainly learn something from it." - 03/11/2020 16:39

Anne-Marie Dekkers, Data Scientist
Rating: 10

"Even experienced data scientists need to keep working on their skills and knowledge. For the past half a year, Data Science Workshops has come to our office once a month, to teach us about a variety of topics, ranging from NoSQL to t-SNE. This is a great way to stay fresh and look beyond the tools and techniques that you’re already familiar with." - 31/10/2020 12:53

Laurens Koppenol, Lead Data Scientist, ProRail
Rating: 10

"Our DataLab team at ProRail enjoyed a three-day hands-on PySpark course from Jeroen. Jeroen’s approach is personal and professional. I recommend Data Science Workshops to anyone in the field of data science." - 21/07/2020 13:25

Jeroen Janssens, Principal Instructor
Rating: 9.5
