Problem: Huge amounts of data are produced and accumulated daily, but large-scale processing of that data on commodity computers is difficult → Big Data is difficult
- Commodity Hardware: We have lots of resources (1000s of cheap PCs), but they are very hard to utilize
- Parallel Programming: We have clusters with over 10k cores, but it is hard to program 10k concurrent threads
- Fault Tolerance: We have 1000s of storage devices, but some break every day. Failure is the norm rather than the exception.
- Scalability: Scaling up (buying ever-bigger machines) quickly becomes expensive and hits hardware limits, so Big Data systems must scale out across many commodity machines.
- Expensive: There are many technologies available in the market for Big Data processing, but most are proprietary and costly.
Solution:
- Hadoop: Runs on commodity hardware, supports scale out, and is free, open-source software
- HDFS (Hadoop Distributed File System): Supports data replication, thereby providing high availability (see the replication sketch after this list)
- MapReduce: Supports parallel execution of tasks across the cluster (see the word-count sketch after this list)
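To illustrate the HDFS replication point, here is a minimal sketch using Hadoop's Java `FileSystem` API to inspect and change a file's replication factor. It assumes a running HDFS cluster reachable through the default configuration, and the path `/data/input.txt` is a hypothetical example file.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: read and change the replication factor of a file in HDFS.
public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);       // handle to the distributed file system
        Path file = new Path("/data/input.txt");    // hypothetical example file

        FileStatus status = fs.getFileStatus(file);
        System.out.println("Current replication: " + status.getReplication());

        // Store each block of the file on three different DataNodes,
        // so the data remains available if one or two nodes fail.
        fs.setReplication(file, (short) 3);

        fs.close();
    }
}
```

Because every block lives on several machines, losing a disk or a whole node does not lose data; HDFS simply re-replicates the affected blocks elsewhere.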
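To make the MapReduce bullet concrete, below is the classic word-count job (essentially the standard Hadoop tutorial example): mappers run in parallel on each input split and emit (word, 1) pairs, and reducers sum the counts per word. Input and output paths are assumed to be passed as command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: tokenize each line and emit (word, 1) for every token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);   // emit (word, 1)
            }
        }
    }

    // Reducer: sum all counts for a given word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);     // emit (word, total count)
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation on each mapper
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The framework handles the hard parts from the problem list automatically: it schedules map and reduce tasks across thousands of cores, moves computation to where the data blocks live, and re-runs tasks whose nodes fail.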