The term “Big Data” is almost self-explanatory: it refers to very large amounts of data. It is used mostly in the IT industry, and more specifically in web technologies. The most famous users of “big data” technologies are easy to guess. That’s right: search engines like Google and Bing. It is good to know that these two dominate much of the search market, and that other famous websites that offer search (like yahoo.com or msn.com) simply use one of these two engines. In this article, I want to talk about the commercial technologies used in the “Big Data” industry.
One of the most famous of these technologies is “Hadoop”. Hadoop is a large ecosystem that includes many sub-projects and related tools, such as Apache Hive, Apache Pig, Apache Impala, Apache Mahout, Apache ZooKeeper and Apache HBase.
In big data systems, the data is stored across distributed storage units, unlike the usual SQL-based databases, where the data is typically stored in one place. Hadoop uses the “MapReduce programming model” to process this data in parallel across the machines that hold it. Hadoop is written in Java and, like other Java applications, it is cross-platform software that runs on any operating system with a Java runtime.
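To make the MapReduce idea more concrete, here is a minimal single-machine sketch of the classic word-count example in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real cluster would run the map and reduce steps on many machines at once.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as a framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on one machine (illustration only)."""
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Each string stands in for one block of a much larger distributed file.
    splits = ["big data is big", "hadoop processes big data"]
    print(word_count(splits))
```

In this sketch each string plays the role of one chunk of a distributed file. The map step looks at a single chunk only, which is the property that lets a framework like Hadoop run it on the machine where that chunk is stored.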
Hadoop grew out of ideas that Google published about its own data processing systems, and it became an Apache project in 2006. After a while it was widely used in many other big companies worldwide. Hadoop is released under the Apache License, and it is good to know that many modern open-source technologies in the software and IT industry are released under either the Apache License or the MIT License.
Other technologies in Big Data include Apache Cassandra, NoSQL databases like MongoDB, RapidMiner, Elasticsearch, Apache Kafka, Apache Spark, Splunk, RainStor, Hunk and Presto. But Hadoop is widely used and is available from cloud and platform providers such as AWS, Azure and Cloudera, where you can provision and deploy a Hadoop cluster in a couple of minutes.