Helping a Large Pharmaceutical Company Establish the Optimal Combination of Teradata and Hadoop Architecture for Data Management and Analytics


The client uses a Teradata-based infrastructure. It currently takes three days to process data for 10 products and five business units. By 2016, the client expects to have 14 to 16 products and a significant rise in data volumes. The client wanted to explore ways to add products without increasing infrastructure costs.


ZS proposed a four-step approach for moving to a Hadoop-based platform. We then conducted a proof of concept (POC) in two phases, covering both structured and unstructured data analysis on Hadoop.

POC—1, Part A: Hadoop for unstructured data processing; capture Twitter feeds and derive insights

POC—1, Part B: Re-create data warehouse processing on a 10-node in-house Hadoop cluster
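
As a rough illustration of the kind of processing Part A describes, the sketch below counts hashtags across a handful of tweets with a map step and a reduce step. It runs in plain Python rather than on a Hadoop cluster. The sample tweets, the function names, and the choice of hashtag counting as the "insight" are assumptions made for illustration; the case study does not say which analysis the POC actually ran.

```python
import re
from collections import Counter
from itertools import chain

# Matches a hashtag and captures the word after the '#'.
HASHTAG = re.compile(r"#(\w+)")


def map_hashtags(tweet):
    """Map step: emit a (hashtag, 1) pair for each hashtag in one tweet."""
    return [(tag.lower(), 1) for tag in HASHTAG.findall(tweet)]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each hashtag."""
    totals = Counter()
    for tag, count in pairs:
        totals[tag] += count
    return dict(totals)


# Hypothetical sample tweets, made up for this sketch.
sample_tweets = [
    "Started the new treatment today #health #pharma",
    "Reading about #Pharma pipelines",
    "No hashtags in this one",
]

# Run the map step over every tweet, then reduce all emitted pairs together.
mapped_pairs = chain.from_iterable(map_hashtags(t) for t in sample_tweets)
hashtag_counts = reduce_counts(mapped_pairs)
print(hashtag_counts)
```

On a real cluster the map and reduce steps would run in parallel across nodes; here they run in sequence only to show the shape of the computation.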

Key Findings

  • The POC demonstrated that Big Data technologies can handle structured and unstructured data equally well.
  • Traditional data warehousing applications can be successfully migrated to Hadoop.
  • Performance with Hadoop was 3 to 15 times better than with the existing Teradata infrastructure, depending on the functionality being considered.
  • Amazon EMR (Elastic MapReduce) provides an easy-to-create, scalable platform that can meet the processing requirements.
  • Organizations typically start by using AWS for their processing needs. In the long term, they set up in-house clusters once they acquire the skills to maintain them.
  • A wide range of Big Data technologies is available. They need to be selected judiciously based on the requirements.
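
To put the 3x to 15x range in context, the short calculation below applies it to the three-day processing time mentioned earlier. Applying one speed-up uniformly to the whole run, and treating three days as 72 continuous hours, are assumptions made only for illustration: the case study reports that gains vary by functionality and does not give an end-to-end Hadoop run time. The names used are hypothetical.

```python
# Three days of processing, taken from the case study and assumed to be
# 72 continuous hours for the purposes of this sketch.
CURRENT_RUN_HOURS = 3 * 24


def projected_hours(current_hours, speedup):
    """Return the run time if the whole job sped up by the given factor."""
    return current_hours / speedup


# Bounds of the reported range, applied uniformly (an assumption).
low_gain_hours = projected_hours(CURRENT_RUN_HOURS, 3)
high_gain_hours = projected_hours(CURRENT_RUN_HOURS, 15)

print(f"At 3x:  {low_gain_hours:.1f} hours")
print(f"At 15x: {high_gain_hours:.1f} hours")
```

Under those assumptions the run would take between about 4.8 and 24 hours. The actual figure would depend on how the workload is split across the functions that saw different gains.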