Creating a TFX Pipeline in Vertex AI

What is Vertex AI?

Vertex AI is a machine learning (ML) service on Google Cloud Platform (GCP). One of its standout features is Vertex AI Pipelines, a tool for automating, monitoring, and governing ML systems. Pipelines orchestrates the ML workflow in a serverless manner and stores the workflow's artifacts in Vertex ML Metadata. This makes it possible to analyze the lineage of those artifacts, for example to trace which data and parameters produced a given model, which in turn gives insight into how the system is performing.

TensorFlow Extended (TFX)

  • TFX is a platform for building and managing ML workflows in production. It is a Google-production-scale ML platform based on TensorFlow, and it provides a configuration framework and shared libraries for integrating the common components needed to define, launch, and monitor a machine learning system
  • A TFX pipeline is a sequence of components that implement an ML pipeline. It is designed for scalable, high-performance machine learning tasks, including modeling, training, serving inference, and managing deployments to multiple targets

TFX Pipeline Components

  • ExampleGen is the first component of a TFX pipeline. It ingests the input data and optionally splits it
  • StatisticsGen takes the output of ExampleGen and calculates statistics for the dataset
  • SchemaGen examines the statistics from StatisticsGen and creates a data schema
  • ExampleValidator looks for anomalies and missing values in the dataset
  • Transform performs feature engineering on the dataset from ExampleGen, using the schema from SchemaGen
  • Trainer trains the model using the dataset produced by Transform
  • Tuner tunes the model's hyperparameters
  • Evaluator performs a deep analysis of the training results and helps you validate the exported model before it is pushed to production
  • InfraValidator checks whether the model can actually be served from the infrastructure and prevents bad models from being pushed
  • Pusher deploys the model
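
The way these components feed each other can be sketched as a small dependency graph. The snippet below is only an illustration in plain Python (standard library, Python 3.9 or later), not TFX code: the `UPSTREAM` mapping and the `execution_order` helper are hypothetical names, and the edges are a simplified reading of the descriptions above plus the usual TFX wiring, where Pusher waits for the approvals from Evaluator and InfraValidator.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified map: each component -> the components whose
# outputs it consumes. This is an illustration, not something TFX exposes.
UPSTREAM = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "ExampleValidator": {"StatisticsGen", "SchemaGen"},
    "Transform": {"ExampleGen", "SchemaGen"},
    "Tuner": {"Transform"},
    "Trainer": {"Transform", "SchemaGen", "Tuner"},
    "Evaluator": {"ExampleGen", "Trainer"},
    "InfraValidator": {"ExampleGen", "Trainer"},
    "Pusher": {"Trainer", "Evaluator", "InfraValidator"},
}


def execution_order(upstream):
    """Return one valid order in which the components could run."""
    # static_order() yields every node after all of its predecessors.
    return list(TopologicalSorter(upstream).static_order())


ORDER = execution_order(UPSTREAM)
print(" -> ".join(ORDER))
```

Several orders are valid, because independent components such as ExampleValidator and Transform can run in either order. An orchestrator derives the same kind of ordering from the inputs and outputs that connect the components.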


TFX Libraries

  • TensorFlow Data Validation (TFDV) is used for analyzing and validating machine learning data. It is highly scalable and works well with TensorFlow and TensorFlow Extended (TFX)
  • TensorFlow Transform (TFT) is used for preprocessing data
  • TensorFlow is used for training models
  • TensorFlow Model Analysis (TFMA) is used for evaluating TensorFlow models
  • TensorFlow Metadata (TFMD) provides standard representations for metadata that are useful when training machine learning models with TensorFlow
  • ML Metadata (MLMD) is used for recording and retrieving metadata from ML developer and data scientist workflows. The metadata mostly uses TFMD representations

TFX Libraries Architecture

[Figure: TFX libraries architecture diagram]

Overview

Create a new project on Google Cloud Platform, enable the Vertex AI and Cloud Storage APIs, and install the Python packages that let TFX pipelines run on Vertex AI Pipelines. Then set the variables that customize the pipeline: the GCP project ID, the GCP region in which the pipeline runs, and the name of the Cloud Storage bucket that stores the pipeline outputs.
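
As a rough sketch of that variable setup, the snippet below groups the three values into one object and derives the Cloud Storage paths from them. The `PipelineConfig` class, the `pipeline_name` field, the folder names (`pipeline_root`, `pipeline_module`, `data`), and all of the example values are assumptions made for illustration; none of them are required by Vertex AI or TFX.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical container for the values that customize a pipeline."""
    project_id: str     # GCP project ID
    region: str         # GCP region in which the pipeline runs
    bucket_name: str    # Cloud Storage bucket for the pipeline outputs
    pipeline_name: str  # assumed extra field, used to namespace the paths

    def __post_init__(self):
        # Fail early on empty values instead of building a broken gs:// path.
        for name, value in vars(self).items():
            if not value:
                raise ValueError(f"{name} must not be empty")

    def _gcs_path(self, folder):
        return f"gs://{self.bucket_name}/{folder}/{self.pipeline_name}"

    @property
    def pipeline_root(self):
        """Where the pipeline artifacts would be written."""
        return self._gcs_path("pipeline_root")

    @property
    def module_root(self):
        """Where user-provided Python modules would be uploaded."""
        return self._gcs_path("pipeline_module")

    @property
    def data_root(self):
        """Where the input dataset would be stored."""
        return self._gcs_path("data")


# Placeholder values; replace them with your own project, region, and bucket.
CONFIG = PipelineConfig(
    project_id="my-gcp-project",
    region="us-central1",
    bucket_name="my-pipeline-bucket",
    pipeline_name="my-tfx-pipeline",
)
print(CONFIG.pipeline_root)
```

Keeping the derived paths next to the three input variables means that changing the bucket or the pipeline name updates every location consistently.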

Pre-requisites

  • A Google Cloud Storage bucket
  • The Vertex AI and Cloud Storage APIs, enabled for the project

Steps to Create a Pipeline

  • Create a Google Cloud project using https://console.cloud.google.com/vertex-ai?project
  • Configure the new project for Vertex AI Pipelines
  • Enable both the Vertex AI and Cloud Storage APIs
  • Configure a Cloud Storage bucket for the pipeline artifacts
  • Go to Vertex AI Workbench and enable the Compute Engine API using https://console.cloud.google.com/vertex-ai/workbench/legacy-instances?project
  • Launch a new Colab notebook or select an existing one
  • Install the required Python packages and set up the project and region variables
  • Create an ML pipeline and set up the paths for the pipeline artifacts, Python modules, user data, and Vertex AI endpoint
  • Prepare or import the dataset
  • Create a TFX pipeline with ExampleGen, ExampleValidator, StatisticsGen, and SchemaGen in a Python script in the same notebook
  • Write the pipeline definition and job, covering Transform, Trainer, and Pusher, in a Python script in the same notebook
  • Run the pipeline on Vertex AI Pipelines with the chosen orchestrator (Kubeflow, Apache Beam or Airflow, or LocalDagRunner) from a Python script in the same notebook
  • Test with a prediction request by setting the endpoint ID
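
The pipeline-related steps above could look roughly like the following. This is a minimal sketch, not the author's code: it assumes the `tfx` package (v1 API) and `google-cloud-aiplatform` are installed, that `module_file` points to a hypothetical user-written training module defining `run_fn`, and that the step counts are arbitrary. To stay short it leaves out Transform, Tuner, Evaluator, and InfraValidator, and the Pusher writes to a storage directory rather than to a Vertex AI endpoint.

```python
def create_pipeline(pipeline_name, pipeline_root, data_root,
                    module_file, serving_model_dir):
    """Wire a small TFX pipeline and return it as a tfx.dsl.Pipeline."""
    # Imported here so this cell can be defined before TFX is installed.
    from tfx import v1 as tfx

    # Ingest the CSV files under data_root and split them into train/eval.
    example_gen = tfx.components.CsvExampleGen(input_base=data_root)
    # Compute statistics, infer a schema, and look for anomalies.
    statistics_gen = tfx.components.StatisticsGen(
        examples=example_gen.outputs["examples"])
    schema_gen = tfx.components.SchemaGen(
        statistics=statistics_gen.outputs["statistics"])
    example_validator = tfx.components.ExampleValidator(
        statistics=statistics_gen.outputs["statistics"],
        schema=schema_gen.outputs["schema"])
    # Train with the user-provided module (assumed to define run_fn).
    trainer = tfx.components.Trainer(
        module_file=module_file,
        examples=example_gen.outputs["examples"],
        schema=schema_gen.outputs["schema"],
        train_args=tfx.proto.TrainArgs(num_steps=100),
        eval_args=tfx.proto.EvalArgs(num_steps=5))
    # Copy the trained model to a serving directory.
    pusher = tfx.components.Pusher(
        model=trainer.outputs["model"],
        push_destination=tfx.proto.PushDestination(
            filesystem=tfx.proto.PushDestination.Filesystem(
                base_directory=serving_model_dir)))

    return tfx.dsl.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[example_gen, statistics_gen, schema_gen,
                    example_validator, trainer, pusher])


def compile_and_submit(pipeline, project_id, region, display_name,
                       definition_file="pipeline.json"):
    """Compile the pipeline to a JSON definition and submit it to Vertex AI."""
    from tfx import v1 as tfx
    from google.cloud import aiplatform

    # The Kubeflow V2 runner only writes the definition file; it does not
    # execute the pipeline itself.
    runner = tfx.orchestration.experimental.KubeflowV2DagRunner(
        config=tfx.orchestration.experimental.KubeflowV2DagRunnerConfig(),
        output_filename=definition_file)
    runner.run(pipeline)

    # Submit the compiled definition as a Vertex AI pipeline job.
    aiplatform.init(project=project_id, location=region)
    job = aiplatform.PipelineJob(
        template_path=definition_file, display_name=display_name)
    job.submit()
    return job
```

In a notebook you would call `create_pipeline(...)` with your own Cloud Storage paths and pass the result to `compile_and_submit(...)`. Splitting the work into two functions keeps the pipeline definition independent of the orchestrator, so the same definition could be handed to a local runner while you are developing.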

Architecture

[Figure: architecture diagram]

Vertex AI with TFX Pipeline

The combination of Vertex AI and TFX is designed to support scalable, high-performance machine learning tasks for enterprises and to help them plan and run better production deployments. AutoML's internal components automate dataset handling and modeling, resulting in improved models. In addition, the notebook supports deployment to multiple target environments, and Vertex ML Metadata stores the workflow artifacts.

About the author

Sandeep Kumar Rongali

I'm a tech enthusiast with a passion for Artificial Intelligence, specifically Machine Learning, Deep Learning, and Natural Language Processing (NLP) technologies. I have experience working with platforms such as Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure, giving me a practical understanding of the Machine Learning lifecycle. I am also a regular reader who keeps up to date with the latest trends, insights, and developments in my field through various blogs and publications.
