StreamSets Debuts First Solution to Discover, Secure and Govern Personal Data in Motion

StreamSets Data Protector™ Catches Sensitive Data “In Flight” to Reduce Risk and Help Companies Comply with GDPR and Other Privacy Requirements

Source: StreamSets

[Photo: StreamSets Data Protector. Caption: StreamSets Data Protector detects and obfuscates sensitive data at the point of ingestion, before data is stored.]


SAN FRANCISCO, March 06, 2018 (GLOBE NEWSWIRE) -- StreamSets Inc., provider of the industry’s only enterprise DataOps platform, today announced the immediate availability of the industry’s first solution to discover, secure and govern personally identifiable information (PII) while it is “in flight,” as it arrives from a batch or streaming data source or moves between compute platforms. Designed with data privacy regulations in mind, StreamSets Data Protector reduces the risk of expensive and embarrassing violations by helping companies meet requirements for GDPR, HIPAA and other compliance regimes.

Until now, solutions for handling personal data have relied on “after the fact” scanning of data stores, which, while valuable, can discover sensitive data only after it has landed and has potentially already been shared. Companies are missing the opportunity to encrypt, mask, generalize or discard personal data as it arrives rather than storing it in the clear.

StreamSets Data Protector extends protection to the point of initial data ingestion, leveraging unique Dataflow Sensors that are part of StreamSets Data Collector. These sensors discover PII by comparing incoming data to built-in patterns, such as national ID, tax ID, driver’s license, bank account and credit card numbers or IP addresses, or to additional patterns created by the customer.
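To make the idea of pattern-based discovery concrete, here is a minimal sketch in Python. It is not StreamSets code and does not use any StreamSets API: the pattern set, the `detect_pii` function and the validation helpers are hypothetical names chosen for illustration, and real identifiers would be far more extensive. The sketch scans every field of a record by value rather than by column name, which is one simple way to stay robust when field names change.

```python
import re

# Hypothetical patterns for illustration only; not StreamSets' built-in identifiers.
PII_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def ipv4_valid(candidate: str) -> bool:
    """Return True if every dotted octet is in the range 0-255."""
    return all(int(part) <= 255 for part in candidate.split("."))


def detect_pii(record: dict) -> dict:
    """Map each field name to the list of PII kinds found in its value."""
    findings = {}
    for field, value in record.items():
        text = str(value)
        kinds = []
        for kind, pattern in PII_PATTERNS.items():
            for match in pattern.finditer(text):
                if kind == "credit_card" and not luhn_valid(match.group()):
                    continue  # looks like a card number but fails the checksum
                if kind == "ipv4" and not ipv4_valid(match.group()):
                    continue  # an octet is out of range
                kinds.append(kind)
                break  # one hit per kind is enough to flag the field
        if kinds:
            findings[field] = kinds
    return findings
```

A customer-defined pattern would, in this sketch, simply be another entry added to `PII_PATTERNS`.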

Without the automation StreamSets provides, laborious hand-coding is required to continuously check each data source against dozens or hundreds of PII patterns. This approach becomes impossible, especially as unstructured data and data drift — unexpected changes to the structure and semantics of the incoming data — come to the fore.

“Data protection is crucial in today’s increasingly regulated environment, where numerous rulesets apply and violations bring heavy fines and the potential of brand damage,” said Girish Pancha, CEO of StreamSets. “Current solutions are insufficient, as they only deal with data after it has already landed, and are blind to data drift that can add new PII to the mix. StreamSets Data Protector closes this compliance gap by extending policy-based control over sensitive data out to the point of ingestion, while gracefully handling data drift whenever it occurs.”

StreamSets Data Protector gives enterprises an automatic, centralized and data drift-resistant way to implement data protection policies across all inbound pipelines. The key capabilities of StreamSets Data Protector are to discover sensitive data, secure it “in flight” and provide centralized governance to ensure continuous policy compliance:

  • Discover — Dataflow Sensors detect sensitive data as it arrives. Incoming data is checked against hundreds of built-in identifiers or patterns defined in enterprise data catalogs. Enterprises can also customize protection by designing their own identifiers.
  • Secure — Once sensitive data is detected, processors can perform a number of standardized operations such as the application of reversible or irreversible obfuscation algorithms, and also take actions such as route, filter, quarantine or alert.
  • Govern — Enterprise-wide policies are centrally managed and applied to pipelines, while audit reports trace where personal data came from and how it has been handled. Governance includes the concept of Security Zones, which allow security architects to design defense-in-depth strategies around data. Data Protector complements data governance solutions for data at rest, integrating with catalogs such as Alation, Apache Atlas, Cloudera Navigator, IBM Information Governance and Waterline Data.
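The secure and govern steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the product's implementation: `TokenVault`, `irreversible_hash`, `mask` and `apply_policy` are hypothetical names, tokenization stands in for reversible obfuscation, a keyed SHA-256 hash stands in for irreversible obfuscation, and the policy is a plain dictionary mapping field names to actions.

```python
import hashlib
import hmac
import secrets


class TokenVault:
    """Reversible obfuscation (assumed approach): tokens map back to originals."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token.
        if value not in self._token_by_value:
            token = "tok_" + secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token: str) -> str:
        return self._value_by_token[token]


def irreversible_hash(value: str, key: bytes) -> str:
    """Irreversible obfuscation: keyed SHA-256 digest of the value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    keep = min(max(visible, 0), len(value))
    return "*" * (len(value) - keep) + value[len(value) - keep:]


def apply_policy(record: dict, policy: dict, vault: TokenVault, key: bytes):
    """Apply per-field actions; return the protected record and an audit trail."""
    protected, audit = {}, []
    for field, value in record.items():
        action = policy.get(field, "pass")
        if action == "drop":
            audit.append((field, action))  # field is discarded, not stored
            continue
        if action == "mask":
            value = mask(str(value))
        elif action == "hash":
            value = irreversible_hash(str(value), key)
        elif action == "tokenize":
            value = vault.tokenize(str(value))
        elif action != "pass":
            raise ValueError(f"unknown action: {action}")
        if action != "pass":
            audit.append((field, action))
        protected[field] = value
    return protected, audit
```

In this sketch the audit trail is a list of `(field, action)` pairs, a stand-in for the audit reports described above. A centrally managed policy would be the single `policy` dictionary shared by every pipeline that calls `apply_policy`.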

“We’re excited to continue to work with StreamSets to deliver Cloudera’s industry-leading modern platform for machine learning and analytics optimized for the cloud,” said Eddie Garcia, chief security officer at Cloudera. “StreamSets Data Protector is yet another layer of defense, helping companies build robust dataflow pipelines that immediately detect and secure sensitive information to ensure it doesn’t get into the wrong hands. StreamSets’ direct integration with Cloudera Navigator uniquely enables us to deliver comprehensive, secure and compliant architectures required for meeting a wide range of regulations, including GDPR.”

About StreamSets
StreamSets provides the industry’s only DataOps platform that enables companies to build, execute, operate and protect the dataflows that drive pervasive analytics. It combines award-winning open source software featuring Dataflow Sensors that uniquely handle data drift with a cloud-native control plane that helps enterprises manage their data movement as a continuous ingestion practice. Founded by Girish Pancha, former chief product officer of Informatica, and Arvind Prabhakar, a former engineering leader at Cloudera, StreamSets is backed by top-tier Silicon Valley venture capital firms, including Battery Ventures, New Enterprise Associates (NEA), and Accel Partners. For more information, visit

StreamSets and associated marks and trademarks are registered trademarks of StreamSets Inc. All other company and product names may be trademarks of their respective owners.

Media Contact:
BOCA Communications

A photo accompanying this announcement is available at