Creating a TFX Pipeline in Vertex AI

What is Vertex AI?

Vertex AI is a managed machine learning (ML) service on Google Cloud Platform (GCP). One of its standout features is Vertex AI Pipelines, a tool for automating, monitoring, and governing ML systems. It orchestrates the ML workflow in a serverless manner and stores the workflow's artifacts using Vertex ML Metadata. Because every artifact is recorded, you can analyze the lineage of workflow artifacts, which provides valuable insight into the system's performance and makes Pipelines a useful tool for improving ML workflows on GCP.

TensorFlow Extended (TFX)

TFX is a platform for creating and managing ML workflows in a production setting. It is a Google-production-scale ML platform built on TensorFlow, and it offers a configuration framework and shared libraries for integrating the standard components needed to define, launch, and monitor your machine learning system.
A TFX pipeline is a sequence of components that implement the ML pipeline. It is designed specifically for high-performance, scalable machine learning tasks, encompassing modeling, training, serving inference, and managing deployments across multiple targets.

TFX Pipeline Components

ExampleGen is the first component of a TFX pipeline. It ingests the data and optionally splits it
StatisticsGen takes the output of ExampleGen and calculates statistics for the dataset
SchemaGen examines the statistics from StatisticsGen and creates a data schema
ExampleValidator looks for anomalies and missing values in the dataset
Transform performs feature engineering on the dataset from ExampleGen, using the schema from SchemaGen
Trainer trains the model using the dataset produced by Transform
Tuner tunes the model's hyperparameters
Evaluator performs a deep analysis of the training results before a model is pushed to production, and helps you validate the exported model
InfraValidator checks whether the model can actually be served from the target infrastructure, and prevents bad models from being pushed
Pusher deploys the model
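
To make the ordering of these components concrete, here is a minimal sketch that models them as a dependency graph and derives one valid execution order using only the Python standard library. The dependency map is a simplified assumption based on the descriptions above (optional inputs are left out), and it is not TFX code: in a real pipeline, TFX works out the order from the artifacts each component consumes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each component maps to the components whose outputs it consumes.
# Simplified, assumed dependencies for illustration only.
COMPONENT_INPUTS = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "ExampleValidator": {"StatisticsGen", "SchemaGen"},
    "Transform": {"ExampleGen", "SchemaGen"},
    "Tuner": {"Transform"},
    "Trainer": {"Transform", "SchemaGen", "Tuner"},
    "Evaluator": {"ExampleGen", "Trainer"},
    "InfraValidator": {"ExampleGen", "Trainer"},
    "Pusher": {"Trainer", "Evaluator", "InfraValidator"},
}


def execution_order(inputs):
    """Return one order in which every component runs after its inputs."""
    return list(TopologicalSorter(inputs).static_order())


order = execution_order(COMPONENT_INPUTS)
print(" -> ".join(order))
```

Several orders are valid here (ExampleValidator, for instance, can run at any point after SchemaGen), which is why the components form a graph rather than a strict list.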

TFX Libraries

TensorFlow Data Validation (TFDV) is used for analyzing and validating machine learning data. It is highly scalable and works well with TensorFlow and TensorFlow Extended (TFX)
TensorFlow Transform (TFT) is used for preprocessing data
TensorFlow is used for training models
TensorFlow Model Analysis (TFMA) is used for evaluating TensorFlow models
TensorFlow Metadata (TFMD) provides standard representations for metadata that are useful when training machine learning models with TensorFlow
ML Metadata (MLMD) is used to record and retrieve metadata from ML developer and data scientist workflows. The metadata mostly uses TFMD representations
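
As a rough illustration of the analyze-and-validate idea behind TFDV, the sketch below computes simple statistics, infers a schema from them, and checks a second dataset against that schema. It is plain Python written for this post, not the TFDV API; the function names, the toy records, and the two anomaly rules (missing value, unexpected type) are all assumptions chosen to keep the example small.

```python
def compute_statistics(rows):
    """Count present values and collect observed type names per feature."""
    stats = {}
    for row in rows:
        for name, value in row.items():
            entry = stats.setdefault(name, {"present": 0, "types": set()})
            if value is not None:
                entry["present"] += 1
                entry["types"].add(type(value).__name__)
    return stats


def infer_schema(stats, num_rows):
    """A feature is required if it had a value in every row."""
    return {
        name: {"types": set(entry["types"]),
               "required": entry["present"] == num_rows}
        for name, entry in stats.items()
    }


def find_anomalies(rows, schema):
    """Report (row index, feature, reason) for values that break the schema."""
    anomalies = []
    for index, row in enumerate(rows):
        for name, spec in schema.items():
            value = row.get(name)
            if value is None:
                if spec["required"]:
                    anomalies.append((index, name, "missing value"))
            elif type(value).__name__ not in spec["types"]:
                anomalies.append((index, name, "unexpected type"))
    return anomalies


# Hypothetical toy data: the schema is inferred from the training rows.
train = [{"age": 34, "city": "Pune"}, {"age": 51, "city": "Austin"}]
serving = [{"age": None, "city": "Delhi"}, {"age": "40", "city": "Lima"}]

schema = infer_schema(compute_statistics(train), len(train))
anomalies = find_anomalies(serving, schema)
print(anomalies)
# prints [(0, 'age', 'missing value'), (1, 'age', 'unexpected type')]
```

The same three stages map loosely onto StatisticsGen, SchemaGen, and ExampleValidator in a TFX pipeline, which rely on TFDV to do this at scale.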

TFX Libraries Architecture

Overview

Create a new project in Google Cloud Platform, enable the Vertex AI and Cloud Storage APIs, and install the Python packages that TFX needs to integrate with ML pipelines in Vertex AI. Then set up the variables that customize the pipeline: the GCP project ID, the GCP region in which to run pipelines, and the name of the Cloud Storage bucket that stores the pipeline outputs.
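
The variables described above could be grouped as in the following sketch. The values are placeholders, and the folder layout under the bucket (pipeline_root, pipeline_module, data, serving_model) is an assumed convention rather than something Vertex AI requires, so adjust both to your own project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Settings that customize the pipeline (all values are placeholders)."""
    project_id: str      # GCP project ID
    region: str          # GCP region in which pipelines run
    bucket: str          # Cloud Storage bucket name, without the gs:// prefix
    pipeline_name: str   # hypothetical name used to namespace the outputs

    def _path(self, folder: str) -> str:
        return f"gs://{self.bucket}/{folder}/{self.pipeline_name}"

    @property
    def pipeline_root(self) -> str:
        """Where pipeline artifacts are written."""
        return self._path("pipeline_root")

    @property
    def module_root(self) -> str:
        """Where user-provided Python modules are uploaded."""
        return self._path("pipeline_module")

    @property
    def data_root(self) -> str:
        """Where the input data lives."""
        return self._path("data")

    @property
    def serving_model_dir(self) -> str:
        """Where trained models are exported."""
        return self._path("serving_model")


config = PipelineConfig(
    project_id="my-gcp-project",
    region="us-central1",
    bucket="my-tfx-bucket",
    pipeline_name="demo-pipeline",
)
print(config.pipeline_root)
# prints gs://my-tfx-bucket/pipeline_root/demo-pipeline
```

Keeping the derived paths as properties means only the four base values need to change when the pipeline moves to another project or bucket.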

Pre-requisites

Google Cloud Storage Bucket
Vertex AI and Cloud Storage API

Steps to Create a Pipeline

Create a Google Cloud project using https://console.cloud.google.com/vertex-ai?project
Configure the new project for Vertex AI Pipelines
Enable both the Vertex AI API and the Cloud Storage API
Configure a Cloud Storage bucket for the pipeline artifacts
Go to Vertex AI Workbench and enable the Compute Engine API using https://console.cloud.google.com/vertex-ai/workbench/legacy-instances?project
Launch a new Colab notebook, or select an existing one
Install the required Python packages and set up the variables for the project and region
Create an ML pipeline and set up the paths for the pipeline artifacts, Python modules, user data, and Vertex AI endpoint
Prepare or import the dataset
Create a TFX pipeline with the data components (ExampleGen, StatisticsGen, SchemaGen, and ExampleValidator) in a Python script in the same notebook
Write the pipeline definition and job, adding Transform, Trainer, and Pusher, in a Python script in the same notebook
Run the pipeline on Vertex AI Pipelines from a Python script in the same notebook, using the runner for your chosen orchestrator (Kubeflow or Apache Airflow, or LocalDagRunner for local testing)
Test the deployed model by setting the endpoint ID and sending a prediction request
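
The pipeline-related steps above can be sketched roughly as follows. This is a minimal outline under several assumptions, not a complete implementation: it assumes the tfx (with its kfp extra) and google-cloud-aiplatform packages are installed, that the data is in CSV form, and that module_file points to a user-written training module defining run_fn. Transform, Tuner, Evaluator, and InfraValidator are omitted for brevity, and the function names and step counts are illustrative. The imports sit inside the functions so the file can be loaded before the packages are installed.

```python
def create_pipeline(pipeline_name, pipeline_root, data_root,
                    module_file, serving_model_dir):
    """Build a small TFX pipeline: ingest, validate, train, and push."""
    from tfx import v1 as tfx  # requires the tfx package

    example_gen = tfx.components.CsvExampleGen(input_base=data_root)
    statistics_gen = tfx.components.StatisticsGen(
        examples=example_gen.outputs["examples"])
    schema_gen = tfx.components.SchemaGen(
        statistics=statistics_gen.outputs["statistics"])
    example_validator = tfx.components.ExampleValidator(
        statistics=statistics_gen.outputs["statistics"],
        schema=schema_gen.outputs["schema"])
    trainer = tfx.components.Trainer(
        module_file=module_file,  # user-supplied training code (assumed)
        examples=example_gen.outputs["examples"],
        train_args=tfx.proto.TrainArgs(num_steps=100),  # illustrative
        eval_args=tfx.proto.EvalArgs(num_steps=5))      # illustrative
    # Simplification: export to a directory instead of a Vertex AI endpoint.
    pusher = tfx.components.Pusher(
        model=trainer.outputs["model"],
        push_destination=tfx.proto.PushDestination(
            filesystem=tfx.proto.PushDestination.Filesystem(
                base_directory=serving_model_dir)))

    return tfx.dsl.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[example_gen, statistics_gen, schema_gen,
                    example_validator, trainer, pusher])


def compile_for_vertex(pipeline, definition_file):
    """Compile the pipeline into a JSON definition for Vertex AI Pipelines."""
    from tfx import v1 as tfx

    runner = tfx.orchestration.experimental.KubeflowV2DagRunner(
        config=tfx.orchestration.experimental.KubeflowV2DagRunnerConfig(),
        output_filename=definition_file)
    runner.run(pipeline)


def submit_to_vertex(definition_file, project_id, region, display_name):
    """Submit the compiled definition as a Vertex AI pipeline job."""
    from google.cloud import aiplatform  # requires google-cloud-aiplatform

    aiplatform.init(project=project_id, location=region)
    job = aiplatform.PipelineJob(
        template_path=definition_file, display_name=display_name)
    job.submit()
    return job
```

In a notebook, you would call create_pipeline with your own paths, pass the result to compile_for_vertex, and then call submit_to_vertex with your project ID and region. Serving predictions from a Vertex AI endpoint, as in the last step, would additionally need a Pusher that deploys to Vertex AI instead of the filesystem Pusher used in this sketch.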

Architecture

Vertex AI with TFX Pipeline

Together, Vertex AI and TFX support scalable, high-performance machine learning for enterprises, helping them plan and execute production deployments. AutoML's internal components streamline the automation and modeling of datasets, resulting in better models. The notebook also supports deployment to various target environments, while Vertex ML Metadata stores the workflow artifacts.

About the author

Sandeep Kumar Rongali
