A story of a Feature Store

Sindhu Murugavel
May 4, 2021

A feature store is the data powerhouse of any machine learning ecosystem.

Photo by Tobias Fischer on Unsplash

Definitions:

First, let’s establish some common terms in the ML world.

  1. Feature: A query that generates a subset of the data, with the transformations embedded in it. E.g., “Select all the employees who are over 50 years old” is a feature by itself.
  2. Feature Set: A group of related feature queries.
  3. Feature Data: The output produced by executing a feature query.
  4. Feature Store: A data management product that stores and manages all the feature data.
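
To make the four terms concrete, here is a minimal sketch in plain Python. The `Feature` and `FeatureSet` classes, the `employees` records, and the field names are all hypothetical, chosen only to mirror the example above. A real feature store would run such queries against a database or data lake rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]

# Hypothetical source data; the records and field names are illustrative only.
employees: Rows = [
    {"id": 1, "name": "Asha", "age": 52},
    {"id": 2, "name": "Ben", "age": 34},
    {"id": 3, "name": "Chen", "age": 61},
]


@dataclass(frozen=True)
class Feature:
    """A named query (filter plus transformation) over the source data."""
    name: str
    query: Callable[[Rows], Rows]

    def execute(self, rows: Rows) -> Rows:
        # The executed output of the query is the "feature data".
        return self.query(rows)


@dataclass
class FeatureSet:
    """A group of related feature queries."""
    name: str
    features: List[Feature]

    def execute(self, rows: Rows) -> Dict[str, Rows]:
        return {f.name: f.execute(rows) for f in self.features}


# Feature 1: "Select all the employees who are over 50 years old".
over_50 = Feature(
    "employees_over_50",
    lambda rows: [r for r in rows if r["age"] > 50],
)
# Feature 2: a transformation that buckets each employee's age by decade.
age_decade = Feature(
    "age_decade",
    lambda rows: [{"id": r["id"], "decade": r["age"] // 10 * 10} for r in rows],
)

employee_features = FeatureSet("employee_demographics", [over_50, age_decade])
feature_data = employee_features.execute(employees)

# The "feature store" is reduced here to a dict keyed by feature name.
feature_store = dict(feature_data)
print([r["name"] for r in feature_store["employees_over_50"]])  # ['Asha', 'Chen']
```

The query is kept separate from its output on purpose: the feature is the reusable definition, while the feature data is whatever that definition produces for a given snapshot of the source.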

Capabilities:

So what does an ideal feature store do? What is it good at?

  1. Data Manager: It should accept and store both batch and streaming data.
  2. Data Transformer: It should perform data transformations on demand so that the resulting data can be used to train models.
  3. Registry: It should track audit information about which features were created and when.
  4. Statistics Manager: It should compute and report statistics on how the feature data changes over time.
  5. Feature Server: It should be able to serve the feature data to the models in batch or real-time modes.
  6. Speedy Worker: It should perform all of these operations at high speed.
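
The capabilities above can be sketched as a toy in-memory class. This is an illustration under stated assumptions, not how any particular product works: `ToyFeatureStore`, its method names, and the `tenure_years` feature are all hypothetical, and a production store would back these operations with durable storage and a stream processor.

```python
from datetime import datetime, timezone
from statistics import mean


class ToyFeatureStore:
    """In-memory illustration only; every name here is hypothetical."""

    def __init__(self):
        self._data = {}     # feature name -> {entity_id: value}
        self.registry = {}  # feature name -> audit metadata

    def register(self, name, transform, owner):
        # Registry: record what was created, by whom, and when.
        self.registry[name] = {
            "transform": transform,
            "owner": owner,
            "created_at": datetime.now(timezone.utc),
        }
        self._data[name] = {}

    def ingest_event(self, name, entity_id, raw):
        # Data manager (streaming path) + transformer: one record at a time,
        # with the registered transformation applied on the way in.
        transform = self.registry[name]["transform"]
        self._data[name][entity_id] = transform(raw)

    def ingest_batch(self, name, rows):
        # Data manager (batch path): many (entity_id, raw_value) pairs at once.
        for entity_id, raw in rows:
            self.ingest_event(name, entity_id, raw)

    def stats(self, name):
        # Statistics manager: summary statistics over the current values.
        values = list(self._data[name].values())
        return {
            "count": len(values),
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
        }

    def get_online(self, name, entity_id):
        # Feature server (real-time mode): a single-key lookup.
        return self._data[name][entity_id]

    def get_batch(self, name, entity_ids):
        # Feature server (batch mode): values for many entities.
        return {e: self._data[name][e] for e in entity_ids}


store = ToyFeatureStore()
store.register("tenure_years", transform=lambda months: months / 12, owner="hr-analytics")
store.ingest_batch("tenure_years", [("e1", 24), ("e2", 60)])
store.ingest_event("tenure_years", "e3", 36)

print(store.get_online("tenure_years", "e3"))  # 3.0
print(store.get_batch("tenure_years", ["e1", "e2"]))
print(store.stats("tenure_years"))
```

Speed is the one capability a toy cannot demonstrate. The dictionary lookup in `get_online` only hints at why real-time serving is usually built on a key-value store.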

Opportunities:

Below are some nice-to-have capabilities for a feature store:

  1. Unstructured Data Handling: Computer vision ML applications draw all of their data from images, which are unstructured, so the store should be able to handle them.
  2. Data Discovery: Provide a zone where data scientists can play with features and perform discovery to help identify valuable ML use cases.
  3. Data Preparation and Validation: Data in any store is not fully usable until it’s prepped and validated to become what we want it to be.
  4. Data Governance: Set up proper governance on the data layer so that only the right users can access it.
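
As a rough illustration of the validation and governance items, the sketch below checks rows for missing fields and out-of-range values, and gates reads with a simple role lookup. The function names, the `READ_POLICY` table, the roles, and the age range are all assumptions made for this example. Real products typically offer richer schema checks and integrate with an existing access-control system.

```python
def validate_rows(rows, required_fields, ranges):
    """Split rows into (valid, rejected) using presence and range checks."""
    valid, rejected = [], []
    for row in rows:
        missing = [f for f in required_fields if row.get(f) is None]
        out_of_range = [
            f for f, (low, high) in ranges.items()
            if row.get(f) is not None and not (low <= row[f] <= high)
        ]
        if missing or out_of_range:
            # Keep the reasons so rejected rows can be reported and repaired.
            rejected.append({"row": row, "missing": missing, "out_of_range": out_of_range})
        else:
            valid.append(row)
    return valid, rejected


# Hypothetical governance policy: which roles may read which feature sets.
READ_POLICY = {"employee_demographics": {"hr_analyst", "ml_engineer"}}


def can_read(role, feature_set):
    # Unknown feature sets default to "no access".
    return role in READ_POLICY.get(feature_set, set())


rows = [
    {"id": 1, "age": 52},
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": 230},   # implausible value
]
valid, rejected = validate_rows(rows, required_fields=["id", "age"], ranges={"age": (16, 100)})

print(len(valid), len(rejected))  # 1 2
print(can_read("ml_engineer", "employee_demographics"))  # True
print(can_read("intern", "employee_demographics"))       # False
```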

Many companies out there are building the ideal feature store. Evaluating each product against your use cases and comparing its detailed capabilities with your requirements will help you select the right one.
