Are you considering the Informatica Data Catalog for your organization and wondering what it will cost? This article walks through the total cost of ownership of a data catalog and the specifics of Informatica Data Catalog pricing.
All Informatica products are priced using Informatica Processing Units (IPUs).
What an IPU actually represents, including the amount and type of compute and storage, is not clearly defined. The AWS Marketplace listing for the Informatica Intelligent Data Management Cloud shows that 120 IPUs per month cost USD 129,600 per year. Beyond that, the best approach is to piece together information from various sources to form a consistent view of pricing.
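As a rough sanity check, the AWS Marketplace figures imply a price per IPU. The sketch below assumes IPUs are billed linearly with no volume discounts, which the listing does not confirm; the constant and function names are illustrative.

```python
# Figures taken from the AWS Marketplace listing cited above.
ANNUAL_PRICE_USD = 129_600
IPUS_PER_MONTH = 120


def implied_price_per_ipu(annual_price_usd: float, ipus_per_month: float) -> float:
    """Implied USD cost of one IPU, assuming linear pricing (no volume discounts)."""
    ipus_per_year = ipus_per_month * 12
    return annual_price_usd / ipus_per_year


if __name__ == "__main__":
    price = implied_price_per_ipu(ANNUAL_PRICE_USD, IPUS_PER_MONTH)
    print(f"Implied price: ${price:,.2f} per IPU")
```

This works out to 1,440 IPUs per year at about USD 90 each. Treat it as a ballpark only, since actual contracts may be discounted or structured differently.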
On the AWS Marketplace, the cost of spinning up Informatica for up to 50 metadata resources is USD 100,000 per year. Another source mentions that the subscription for 100 users can start at USD 150,000 annually.
CDW, a Fortune 500 full-stack technology implementation company, lists Informatica’s licensing fees for up to 1,800 metadata resources at USD 531,149.99. This figure covers licensing only; it excludes cloud compute and infrastructure components.
None of these figures come directly from Informatica’s official site, and they range from about USD 100,000 to over USD 500,000.
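Because each quote covers a different scope, normalizing them to a per-unit figure makes the spread easier to read. This sketch uses only the three figures above. Two caveats apply: the CDW figure is licensing-only while the others may bundle more, and the CDW listing does not state a billing period, so the per-unit numbers are indicative rather than strictly comparable.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    source: str
    price_usd: float
    units: int        # number of metadata resources or users the quote covers
    unit_name: str


QUOTES = [
    Quote("AWS Marketplace (annual)", 100_000, 50, "metadata resource"),
    Quote("Other source (annual)", 150_000, 100, "user"),
    Quote("CDW (licensing only, term not stated)", 531_149.99, 1_800, "metadata resource"),
]


def per_unit_cost(quote: Quote) -> float:
    """Quoted price divided evenly across the units the quote covers."""
    return quote.price_usd / quote.units


for q in QUOTES:
    print(f"{q.source}: ${per_unit_cost(q):,.2f} per {q.unit_name}")
```

Per metadata resource, the cost falls from about USD 2,000 at 50 resources to about USD 295 at 1,800, which hints at steep volume discounts, although the scope differences limit how far that comparison can be pushed.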
In contrast, MetaFoundry DataHub offers a different approach and pricing model, which may provide more clarity and cost-effectiveness for your data catalog needs.
Although licensing is the most obvious cost, we need to zoom out a little to understand the total cost of a data catalog.
Implementing a data catalog means that users across your organization will use the catalog to find, trust, and share data assets daily. This requires the catalog to interface with all data-related components of your data architecture. Additionally, all data users must be onboarded to ensure a consistent experience for accessing and using data.
Several aspects of pricing need to be considered when evaluating a data catalog:
- Base licensing and hosting costs
- Implementation costs
- Training costs
- Ongoing support and maintenance costs
Let's explore each aspect further.
1. Base Licensing and Hosting Costs
The base pricing of data catalogs usually involves a mix of variables, such as the number of active users, administrators, connectors, APIs, and storage. Informatica Data Catalog also charges separately for different services it provides, such as:
- Storage of data assets in the catalog
- Metadata record consumption
- Scanner usage for profiling, discovery, and classification
- Scanner usage with an advanced serverless option
Informatica’s data catalog meters usage through metrics such as daily assets stored, compute units, and the number of API calls. For example, stored assets are metered as follows:
- Scaler: Daily Assets Stored
- Metric: Per One Hundred Thousand Assets
- IPU Per Metric Unit: 0.83 for the first 500,000 assets and 0.067 for assets beyond 500,000
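The tiered rate above can be turned into a small calculator. This is a sketch under two assumptions that the metric as quoted here does not settle: tiers are graduated (only assets beyond 500,000 get the lower rate) and partial metric units are prorated rather than rounded up. The constant and function names are illustrative, and the billing period the result applies to should be checked against Informatica’s own documentation.

```python
METRIC_UNIT = 100_000     # assets per metric unit ("Per One Hundred Thousand Assets")
TIER_BREAK = 500_000      # assets billed at the first-tier rate
FIRST_TIER_IPU = 0.83     # IPU per metric unit for the first 500,000 assets
OVERFLOW_IPU = 0.067      # IPU per metric unit for assets beyond 500,000


def daily_assets_ipu(assets: int) -> float:
    """IPUs consumed for a count of daily assets stored.

    Assumes graduated tiers and prorated (not rounded-up) metric units.
    """
    if assets < 0:
        raise ValueError("asset count cannot be negative")
    first_tier = min(assets, TIER_BREAK)
    overflow = max(assets - TIER_BREAK, 0)
    return (first_tier / METRIC_UNIT) * FIRST_TIER_IPU + (overflow / METRIC_UNIT) * OVERFLOW_IPU


for n in (200_000, 500_000, 1_000_000):
    print(f"{n:>9,} assets -> {daily_assets_ipu(n):.3f} IPU")
```

Under these assumptions, 1,000,000 assets consume 5 × 0.83 + 5 × 0.067 = 4.485 IPUs, so the second half-million assets add less than a tenth of the IPUs consumed by the first.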
Each function and service has a separate pricing structure. You can find the finer details in Informatica’s CDS & PDS document.
One user who had a positive experience with the data catalog mentioned that Informatica "... is a resource-intensive tool and costly too." Another user noted that "the pricing is a bit on the higher end." Additionally, users have faced challenges with data lineage and the product’s overall performance.
2. Implementation Costs
Setting up a data catalog can be time-consuming, taking anywhere from weeks to months of engineering effort, depending on the integrations and customizations needed for your business.
Informatica doesn’t offer a DIY implementation option, so you might need experienced professionals (typically system integrators) to do it for you. This not only increases the initial cost but also creates a dependency on third parties for implementation.
3. Training Costs
Once the tool is in place, training all data users across the organization is necessary. Training costs vary greatly depending on the complexity of the tool and how long it takes for a typical data user to get onboarded and start benefiting from the data catalog.
Tools without clear documentation and guides may require ongoing training as new issues arise.
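Training costs can be estimated from a handful of inputs. The sketch below is a hypothetical model: the class and parameter names and the example figures (500 users, 8 onboarding hours, 4 refresher hours per year, USD 75 per hour) are illustrative assumptions, not benchmarks. The refresher term stands in for the ongoing training that poorly documented tools tend to require.

```python
from dataclasses import dataclass


@dataclass
class TrainingEstimate:
    users: int                       # data users to onboard
    onboarding_hours: float          # hours for a typical user to become productive
    refresher_hours_per_year: float  # ongoing training as new issues come up
    loaded_hourly_cost: float        # fully loaded cost of one employee hour, in USD

    def first_year_cost(self) -> float:
        hours = self.onboarding_hours + self.refresher_hours_per_year
        return self.users * hours * self.loaded_hourly_cost

    def later_year_cost(self) -> float:
        return self.users * self.refresher_hours_per_year * self.loaded_hourly_cost


# Hypothetical inputs for illustration only.
estimate = TrainingEstimate(users=500, onboarding_hours=8,
                            refresher_hours_per_year=4, loaded_hourly_cost=75)
print(f"Year 1: ${estimate.first_year_cost():,.0f}")
print(f"Later years: ${estimate.later_year_cost():,.0f}")
```

With these inputs, the refresher training alone keeps costing USD 150,000 a year after onboarding, which is why documentation quality shows up in the total cost of ownership.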
4. Ongoing Support and Maintenance Costs
Data catalog implementations can become unmanageable, especially when DIY setup is difficult or documentation is insufficient. At that point, assistance from the vendor becomes necessary.
One user described their problem with the Informatica data catalog, stating it "crashes often, resulting in prolonged downtimes. Searching the catalog can be slow, even if cluster recommendations comply with Informatica’s guidelines. Scanner jobs often fail even with container sizes specified per Informatica’s recommendations."
These issues increase dependence on the support team, costing more time, effort, and money. The complexity escalates quickly as the number of integrations and resources managed in Informatica grows.
In contrast, MetaFoundry.io offers a different approach and pricing model, which may provide more clarity and cost-effectiveness for your data catalog needs.
Collibra is a data intelligence platform that offers a data catalog, data governance tools, and other features. While these tools are valuable, Collibra's price point can be prohibitive.
Pricing a data catalog involves several factors. Costs can vary significantly based on business needs, which is why straightforward pricing sheets are rare. Moreover, as with any major software purchase, the final price includes more than just the cost of licenses.
So, how do you gauge Collibra pricing? Let’s break it down step by step.
How Much Does Collibra Cost Per Year?
Although data catalog pricing can be hard to find, some baseline information is available for Collibra: a subscription to the Collibra data intelligence platform is listed at $170,000 per year on AWS Marketplace.
Collibra Pricing: Examples of Hidden Costs
Once personnel and implementation costs are factored in, the total operating cost rises considerably. Here’s how:
- Personnel Costs: A significant portion of the cost of implementing a data catalog comes from connecting a variety of complex data sources and enriching the metadata. For a large organization with a diverse data estate, these costs can easily amount to six times the cost of the tooling itself.
- Implementation Costs: Some users have reported difficulty getting Collibra up and running because few system integration (SI) partners are familiar with its extensive tool suite. Collibra requires an SI expert familiar with the platform for a successful setup. For large organizations, this cost can exceed $1 million.
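Putting the figures above together gives a rough first-year estimate for a large organization. This sketch takes the $170,000 AWS Marketplace price, the six-times personnel multiplier, and the $1 million implementation figure at face value; all three are approximate, upper-end figures rather than quotes. Hosting is left out on the assumption that it is bundled into Collibra’s SaaS licensing. The function name is illustrative.

```python
LICENSE_USD = 170_000            # AWS Marketplace annual subscription price
PERSONNEL_MULTIPLIER = 6         # personnel "can easily amount to six times" the tooling cost
IMPLEMENTATION_USD = 1_000_000   # SI implementation cost reported for large organizations


def first_year_estimate(license_usd: float, personnel_multiplier: float,
                        implementation_usd: float) -> dict[str, float]:
    """Rough first-year cost breakdown; hosting is assumed to be in the license."""
    personnel_usd = license_usd * personnel_multiplier
    return {
        "license": license_usd,
        "personnel": personnel_usd,
        "implementation": implementation_usd,
        "total": license_usd + personnel_usd + implementation_usd,
    }


breakdown = first_year_estimate(LICENSE_USD, PERSONNEL_MULTIPLIER, IMPLEMENTATION_USD)
for item, usd in breakdown.items():
    print(f"{item:>15}: ${usd:,.0f}")
```

In this scenario the first-year total comes to about $2.19 million, and the license itself is under 8% of it.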
Several factors need to be considered when evaluating the total cost of implementing a data catalog like Collibra. Let’s explore the most significant factors in the next section.
The cost of a data catalog is more than just a price tag. Data catalogs are systems that interface with an organization’s entire data ecosystem.
To understand the cost of a data catalog, consider the following factors together:
- Upfront licensing costs
- Hosting costs
- Implementation costs
- Personnel costs
- Ongoing maintenance and troubleshooting costs
Only with this complete picture of the total cost of ownership can you judge whether the catalog delivers enough value to justify the expense.
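One way to keep the five factors together is to separate one-time costs from recurring ones and sum them over a planning horizon. The sketch below is illustrative: apart from the $170,000 Collibra list price and zero separate hosting for a SaaS deployment, every figure is a hypothetical placeholder, and the model ignores discounting and price increases. The names `CostLine` and `total_cost_of_ownership` are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class CostLine:
    name: str
    one_time: float = 0.0   # incurred once, in the first year
    annual: float = 0.0     # incurred every year, including the first


def total_cost_of_ownership(lines: list[CostLine], years: int) -> float:
    """Undiscounted TCO over `years`, with no inflation or price increases."""
    if years < 1:
        raise ValueError("years must be at least 1")
    return sum(line.one_time + line.annual * years for line in lines)


catalog = [
    CostLine("licensing", annual=170_000),
    CostLine("hosting", annual=0),                 # bundled into SaaS licensing
    CostLine("implementation", one_time=250_000),  # hypothetical
    CostLine("personnel", annual=300_000),         # hypothetical
    CostLine("maintenance", annual=50_000),        # hypothetical
]

for horizon in (1, 3):
    print(f"{horizon}-year TCO: ${total_cost_of_ownership(catalog, horizon):,.0f}")
```

In this example the one-time implementation cost shrinks in relative terms as the horizon lengthens, while recurring personnel and maintenance costs come to dominate.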
Let's get into the specifics:
1. Upfront Licensing Costs
Licensing fee structures vary by vendor. They typically depend on several factors, including but not limited to:
- Number of ordinary users
- Number of administrative users
- Number of data connectors used to connect to your data sources
- Additional features considered "add-ons" by the vendor
For a tool like Collibra, the base licensing cost typically represents around 30-50% of the total licensing cost, with data connector and user fees comprising the remainder.
Collibra's licensing costs can add up as different features are priced separately. For example, separate licensing fees may be required for governance, lineage, and data quality. Additionally, connectors in Collibra vary in price based on the type of connector, and obtaining lineage from a data source with a lineage harvester may incur an extra cost.
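To see how separately priced items shift the base license’s share of the total, the line items can be modeled explicitly. Every price below is a hypothetical placeholder chosen for illustration, not a Collibra list price; only the structure (a base platform fee plus per-connector, per-user, and add-on fees such as lineage or data quality) follows the factors listed above.

```python
from dataclasses import dataclass, field


@dataclass
class LicenseQuote:
    base_platform: float
    # Each entry maps a type or role to (count, unit price), since connector
    # prices can vary by connector type.
    connectors: dict[str, tuple[int, float]] = field(default_factory=dict)
    users: dict[str, tuple[int, float]] = field(default_factory=dict)
    add_ons: dict[str, float] = field(default_factory=dict)

    def total(self) -> float:
        connector_fees = sum(n * price for n, price in self.connectors.values())
        user_fees = sum(n * price for n, price in self.users.values())
        return self.base_platform + connector_fees + user_fees + sum(self.add_ons.values())

    def base_share(self) -> float:
        return self.base_platform / self.total()


# Hypothetical prices for illustration only.
quote = LicenseQuote(
    base_platform=80_000,
    connectors={"database": (10, 3_000), "bi_tool": (4, 5_000)},
    users={"admin": (5, 2_000), "standard": (200, 100)},
    add_ons={"lineage_harvester": 15_000, "data_quality": 5_000},
)
print(f"Total licensing: ${quote.total():,.0f}")
print(f"Base share: {quote.base_share():.0%}")
```

Adding another add-on or a few more connectors pushes the base share down quickly, which is why the base license alone understates the bill.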
2. Hosting Costs
There are three basic ways to consume a data catalog:
- As a Software as a Service (SaaS) offering
- As a cloud-based catalog deployed on general cloud service providers like Amazon Web Services (AWS) or Google Cloud Platform (GCP)
- As an on-premises catalog, hosted in your own data centers
For cloud or on-premises deployments, a data catalog requires dedicated storage and compute. Depending on your hosting provider, transferring data between systems may incur egress costs. On-premises installations also require capital expenditure on server racks and other hardware.
For SaaS offerings, hosting costs are generally included in the licensing fee, though additional charges may apply when usage exceeds capacity limits (e.g., a set number of GB of data egress or metadata storage).
Collibra runs as a SaaS product, with hosting costs built into the core licensing fees. While cloud support seems solid, some reviewers have noted that Collibra's support for on-premises use cases needs improvement.
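The overage charges that SaaS offerings may add on top of the license are simple to model: only usage above the bundled allowance is billed. The allowances and per-GB rates below are hypothetical placeholders, not any vendor’s actual terms, and `MeteredLimit` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class MeteredLimit:
    name: str
    included: float        # allowance bundled into the license
    rate_per_unit: float   # hypothetical price per unit above the allowance

    def overage(self, used: float) -> float:
        """Bill only the usage that exceeds the included allowance."""
        return max(used - self.included, 0.0) * self.rate_per_unit


limits = [
    MeteredLimit("data egress (GB)", included=1_000, rate_per_unit=0.09),
    MeteredLimit("metadata storage (GB)", included=50, rate_per_unit=2.00),
]
usage = {"data egress (GB)": 1_200, "metadata storage (GB)": 40}

overage_total = sum(limit.overage(usage[limit.name]) for limit in limits)
print(f"Overage charges this period: ${overage_total:,.2f}")
```

Here only egress exceeds its allowance, so the 200 GB of excess egress accounts for the entire charge.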
3. Implementation Costs
Many legacy data catalogs are notoriously difficult to install, often requiring specialized system integrators (SIs) who don't come cheap.
If a catalog takes months to set up or doesn't integrate properly with your data systems, the lost time and effort quickly become expensive. Therefore, it's crucial to deploy a data catalog that supports easy, DIY installation.
4. Personnel Costs
The personnel costs associated with implementing a data catalog can be as much as six times the base licensing costs. This includes:
- Fully integrating the data catalog into an organization’s existing architecture
- Connecting the organization’s numerous complex data sources to the data catalog
- Assembling a team of data stewards to enrich the metadata (the largest of the three costs)
Some public reviews indicate that Collibra makes this more difficult. One user mentioned that "setting up the tool and seeing results takes a significant amount of time and manual effort." The lack of available consultants to assist with setup also increases the difficulty of adoption.
Once the tool is in place, training all data users across the organization is necessary. Training costs vary greatly depending on the complexity of the tool and how long it takes for a typical data user to get onboarded and start benefiting from the data catalog. Tools without clear documentation and guides may require continuous training, occupying a significant portion of an employee’s time.
5. Ongoing Maintenance and Troubleshooting Costs
Every software system encounters issues, and a data catalog is no different. Your data engineering team needs a stable, low-defect data catalog with clear, easy-to-use tools for diagnosing problems such as system integration or data source connection issues.
If documentation is unclear or support isn’t available when needed, valuable uptime can be lost. And if the system is hard to troubleshoot and debug (e.g., it doesn't emit clear, detailed error messages, lacks dashboards for tracking metrics, or makes logs hard to access), engineers may face a growing backlog of support tickets that take days to resolve.
By considering all these factors, you can make a more informed decision about whether Collibra or another data catalog solution is the right fit for your organization.
MetaFoundry
Copyright © 2024 MetaFoundry - All Rights Reserved.