Big Data Tutorial for Beginners - MetaTutorials

What is Data?

Data is any number, picture, audio clip, video, or text that a computer system can process and store, either in memory or on a storage device such as a hard disk.

What is Big data?

Big data is simply data that is very large in size. The term covers large and complex data, both structured and unstructured.
"Large and complex data sets that cannot be managed and processed by traditional data management systems are called Big data"
According to Gartner, the definition of Big data is as follows:
"Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making".
Today, Big data technologies manage and process the data that grows every day through activities such as e-learning, social networking, online shopping, and e-banking.

5 V's or Characteristics of Big Data 

Big data is classically described by three fundamental V's (volume, velocity, and variety), which every user should know. As data has kept growing, two more V's (veracity and value) are now commonly added to the original three. Together, these 5 V's are known as the characteristics of Big data:
1. Volume
2. Velocity
3. Variety
4. Veracity
5. Value

Volume

Volume refers to the amount of data, which grows rapidly every day. The data generated by machines and by people on social media applications is measured in terabytes, petabytes, and beyond. One widely cited estimate is that 2.5 quintillion bytes of data are created per day, and that 90% of the data existing at the time of the estimate had been generated in the previous two years.
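To put the per-day figure quoted above into more familiar units, here is a small Python sketch. It assumes decimal (SI) units, and the helper name `to_unit` is only illustrative.

```python
# Decimal (SI) storage units, expressed in bytes.
UNITS = {
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
}


def to_unit(num_bytes: int, unit: str) -> float:
    """Convert a byte count to the given decimal unit."""
    return num_bytes / UNITS[unit]


# 2.5 quintillion bytes = 2.5 * 10**18 bytes (the estimate quoted in the text).
daily_bytes = 25 * 10**17

for unit in UNITS:
    print(f"{to_unit(daily_bytes, unit):,.1f} {unit}s per day")
```

In these units, the estimate works out to 2.5 exabytes per day.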

Velocity

Velocity is the speed at which data is generated and processed. Two kinds of velocity are relevant to big data:
1. Frequency of generating the data
2. Frequency of handling, recording, and publishing the data
Data is growing so fast that its volume is estimated to double every two years. The flow of generated data is massive and continuous.
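The "doubles every two years" estimate can be turned into a simple projection. The sketch below assumes steady exponential growth, which is a simplification, and the function name `projected_volume` is hypothetical.

```python
def projected_volume(initial: float, years: float, doubling_period: float = 2.0) -> float:
    """Project data volume assuming it doubles once every `doubling_period` years."""
    return initial * 2 ** (years / doubling_period)


# Starting from 1 unit of data (any unit), project the volume over a decade.
for year in (2, 4, 10):
    print(year, "years:", projected_volume(1.0, year))
```

Under this assumption, ten years means five doublings, so the volume grows 32-fold.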

Variety

Variety refers to the structured, semi-structured, and unstructured data generated by humans and machines from multiple sources. Much human-generated data is unstructured or only loosely structured: tweets, texts, videos, pictures, voicemails, e-mails, audio recordings, PDFs, etc. Variety is also described as data coming from heterogeneous sources.

Veracity

Veracity refers to the abnormality or uncertainty in data generated by humans and machines, so it is closely tied to the quality of data. In simple words, the more incomplete and inconsistent the data, the lower its veracity.
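Incompleteness and inconsistency can both be measured. The following is a minimal sketch on made-up records; the field names and the two helper functions are assumptions for illustration, not a standard data-quality API.

```python
# Made-up sample records: one missing age, one missing country,
# and the same country code written in two different ways.
records = [
    {"id": 1, "age": 34, "country": "IN"},
    {"id": 2, "age": None, "country": "in"},
    {"id": 3, "age": 29, "country": None},
]


def completeness(rows, field):
    """Fraction of rows in which `field` is present and not None."""
    filled = sum(1 for row in rows if row.get(field) is not None)
    return filled / len(rows)


def inconsistent_spellings(rows, field):
    """Find values of `field` that differ only by letter case."""
    groups = {}
    for row in rows:
        value = row.get(field)
        if value is not None:
            groups.setdefault(value.upper(), set()).add(value)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


print(completeness(records, "age"))
print(inconsistent_spellings(records, "country"))
```

Real data-quality checks cover far more cases (duplicates, outliers, stale values), but the idea is the same: measure the problems before trusting the data.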

Value

Generated data is of no use in itself unless we convert it into something valuable by extracting information from it. Hence, Value is often regarded as the most important of the 5 V's of Big Data.

Types of Big Data

Big Data comes in the following three types:
1. Structured Data
2. Semi-Structured Data
3. Unstructured Data

Structured Data

Data that is stored, processed, and accessed in a fixed format or organised manner is called Structured Data. Data stored and managed in an RDBMS is an example of this type. Because it has a fixed format, users can easily retrieve it from the database with simple queries.
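As a small illustration of structured data, the sketch below uses Python's built-in `sqlite3` module as a stand-in for an RDBMS. The `customers` table and its rows are invented for the example.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The fixed format: every row has the same three typed columns.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ben", "Leeds"), (3, "Chen", "Pune")],
)

# A simple query is enough because the structure is known in advance.
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Pune",)
).fetchall()
names = [name for (name,) in rows]
print(names)

conn.close()
```

Because every row follows the same schema, the query never has to guess where the city or the name is stored.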

Unstructured Data

Data whose form or structure is not known in advance is called Unstructured data. This type of data does not fit well into the fixed tables of a relational DBMS, so NoSQL databases such as CouchDB and MongoDB are often used to store and manage it. Processing and analyzing unstructured data is difficult and time-consuming.
Examples of Unstructured data are as follows:
  • Text files, 
  • Audio files, 
  • Video files, 
  • E-mails, 
  • Reports, 
  • Images, etc. 
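To see why unstructured data takes more work, consider free text: there are no columns to query, so the program must first impose some structure itself. This is a minimal sketch that counts words in an invented customer comment; real text analysis usually needs much more than this.

```python
import re
from collections import Counter

# An invented piece of free text, standing in for a review or an e-mail body.
text = "The delivery was late. Late again! Support was helpful, but the delivery was late."


def word_counts(document: str) -> Counter:
    """Lower-case the text, split it into words, and count each word."""
    words = re.findall(r"[a-z']+", document.lower())
    return Counter(words)


counts = word_counts(text)
print(counts.most_common(3))
```

Even this tiny example has to decide how to handle capital letters and punctuation, decisions that a fixed schema would have made in advance.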
In Big Data, Unstructured data is of two types:
a. Machine-Generated Data 
b. User-Generated Data

Machine-Generated Data 
Data that is produced automatically by machines, sensors, and instruments rather than entered by a person. This type of data is usually unstructured and can be internal or external.
The machine-generated data includes the following examples:
  • Surveillance photos and video,
  • Geographical Data,
  • Meteorological Data,
  • Weather Data,
  • Oceanographic Data, etc.
User-Generated Data
Data that is generated when a person posts or uploads content to social media or other Internet websites.
The user-generated data includes the following examples:
  • Data from Twitter, Facebook, LinkedIn,
  • Mobile data from locations and text messages,
  • Website data from YouTube and Instagram,
  • Media data from chat, phone recordings, IM,
  • etc.

Semi-Structured Data

It is the third type of Big data. Semi-structured data combines features of both structured and unstructured data. It looks organised, because it carries tags, keys, or delimiters that mark out its fields, but that organisation is not defined by a fixed schema.
In other words, this type of data lacks a rigid schema.
Examples of Semi-Structured Data are as follows:
  • XML files,
  • JSON files,
  • E-mails, 
  • CSV files, etc.
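The lack of a rigid schema is easy to see in JSON. In the sketch below (with invented records), every record is labelled with keys, yet no two records have exactly the same fields, so the code must check what is present.

```python
import json

# Invented records: each one is self-describing, but the fields vary.
raw = """
[
  {"name": "Asha", "email": "asha@example.com"},
  {"name": "Ben", "phones": ["555-0100", "555-0101"]},
  {"name": "Chen", "email": "chen@example.com", "address": {"city": "Pune"}}
]
"""

people = json.loads(raw)


def field_names(records):
    """Collect every key that appears in at least one record."""
    seen = set()
    for record in records:
        seen.update(record.keys())
    return sorted(seen)


# .get() returns None when a record has no "email" key.
emails = [person.get("email") for person in people]

print(field_names(people))
print(emails)
```

A relational table would force every record into the same columns; here each record only carries the fields it actually has.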

Applications of Big Data

Big Data is a technology that grows more popular every day. It is widely used in many areas to extract valuable information from the data generated by humans, by machines, and by the interaction of both.
Today, Big Data is widely used in the following applications:

1. Healthcare 

Nowadays, Big Data technology is used extensively in healthcare. It improves healthcare by enabling personalized medicine and prescriptive analytics. Big data is also used to support the growth of hospitals by improving their efficiency and the effectiveness of care.
Over the past decades, hospitals and clinics around the world have adopted the Electronic Health Record (EHR). The EHR is a major application of big data in healthcare and helps to improve patient care.
This technology helps hospital staff work more efficiently.
Sensors placed beside patient beds monitor heartbeat, blood pressure, and respiratory rate. If there is any problem, the sensors alert the doctors and healthcare administrators.
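The bedside alerting idea can be sketched as a simple range check. The limits below are placeholders chosen for illustration only and are not clinical guidance; the names `LIMITS` and `out_of_range` are hypothetical.

```python
# Illustrative limits only, NOT clinical guidance: (lowest allowed, highest allowed).
LIMITS = {
    "heart_rate": (50, 120),        # beats per minute
    "systolic_bp": (90, 160),       # mmHg
    "respiratory_rate": (10, 25),   # breaths per minute
}


def out_of_range(reading: dict) -> list:
    """Return the names of the measurements that fall outside their limits."""
    alerts = []
    for name, (low, high) in LIMITS.items():
        value = reading.get(name)
        if value is not None and not (low <= value <= high):
            alerts.append(name)
    return alerts


reading = {"heart_rate": 135, "systolic_bp": 118, "respiratory_rate": 16}
print(out_of_range(reading))
```

A real monitoring system would also look at trends over time rather than single readings, which is where large volumes of sensor data come in.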

2. Banking

The volume of data generated in the banking sector is enormous. Nowadays, banks let us withdraw cash from ATMs and submit documents to our branch online. Big Data analytics is also used in banking to enhance cybersecurity and reduce risk.
One of the biggest problems in the banking sector is fraud. Big Data technology helps banks detect and block unauthorized transactions.
Basically, this technology reduces the number of risks that arise in the banking sector.
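One very simple way to spot a suspicious transaction is to compare it with the customer's usual spending. The sketch below flags any amount far above the median of past amounts. This is a toy rule chosen for illustration, not how any particular bank works, and the factor of 5 is an arbitrary assumption.

```python
from statistics import median


def flag_unusual(past_amounts, new_amount, factor=5.0):
    """Flag a transaction that is more than `factor` times the customer's median spend."""
    if not past_amounts:
        return False  # no history to compare against
    return new_amount > factor * median(past_amounts)


history = [20.0, 35.0, 25.0, 40.0, 30.0]  # invented past transactions
print(flag_unusual(history, 900.0))
print(flag_unusual(history, 45.0))
```

Production fraud systems combine many such signals (location, device, time of day) and typically learn them from very large transaction histories.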


3. Media & Entertainment

In Media and Entertainment, Big Data helps to predict what audiences and consumers want. Companies like Netflix and Spotify analyze the data collected from their users around the globe to learn what kinds of music and video people listen to and watch, and then predict what they will want next.
Big Data provides new ways of analyzing public demand.
Big Data also helps companies and advertisers understand what type of ads consumers or viewers are willing to watch. Advertisers can then create personalized ads, which helps to increase click-through rates (CTRs).
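The click-through rate (CTR) of an ad is the number of clicks divided by the number of times the ad was shown (impressions). A minimal sketch, with invented numbers for two ads:

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """CTR = clicks / impressions; defined as 0.0 when the ad was never shown."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


# Invented figures comparing a generic ad with a personalized one.
generic = click_through_rate(clicks=12, impressions=1000)
personalized = click_through_rate(clicks=30, impressions=1000)
print(f"generic: {generic:.1%}, personalized: {personalized:.1%}")
```

Comparing CTRs like this across many ads and audiences is one way advertisers judge whether personalization is working.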


4. Agriculture

Big Data plays an important role in the agriculture sector by enhancing the performance of farms. Its goal is to minimise farm losses.
This technology can take data from past years and suggest which pesticides work best under specific conditions.
It helps farm owners use the same land for various purposes. It also supports automating the farms' watering systems.
After years of event tracking, Big Data technology can now help to predict weather events and even pest problems.


5. Education Sector

Big Data offers many benefits in education to both teachers and students. It helps the education sector maintain security and helps to improve students' performance. Study plans can be built by analyzing a student's grades in different subjects.
This technology also points out the factors that affect the performance of an individual student and suggests solutions. It helps students choose a suitable career option, and Big Data analytics allows institutions to build different learning plans.
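As a tiny illustration of grade analysis, the sketch below picks out the subjects in which a student scored below a chosen mark, weakest first. The grades, the pass mark of 60, and the function name are all assumptions made for the example.

```python
def subjects_needing_support(grades: dict, pass_mark: float = 60.0) -> list:
    """Return the subjects scored below `pass_mark`, ordered from weakest to strongest."""
    weak = [(score, subject) for subject, score in grades.items() if score < pass_mark]
    return [subject for score, subject in sorted(weak)]


# Invented grades for one student.
grades = {"Maths": 48, "Physics": 72, "History": 55, "English": 81}
print(subjects_needing_support(grades))
```

A learning plan could then give the listed subjects more study time, starting with the weakest.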

6. Government

Governments handle complex local, national, and global issues every day. Big Data is an important technology in government because it allows officials to analyze the impact of a decision and public opinion about it.
With this technology, the government sector can more easily learn what the public demands and act accordingly. It also identifies the areas that need attention.

7. Disaster Management

Before Big Data technology, scientists found it much harder to predict the possibility of a disaster. Now, Big Data helps to identify a likely disaster by evaluating temperature, wind pressure, and other related factors.
Using this technology, governments take the necessary actions to minimize the effects of natural calamities such as floods, hurricanes, and earthquakes.
It also analyzes weather conditions from satellite and radar data every twelve hours. Big Data can also identify the possibility of a flood in a particular area at a specific time.

Advantages of Big Data

Following are the various benefits or advantages of Big Data:
1. One of the biggest advantages of Big Data is fraud detection. It helps banks and other financial institutions detect fraud at the moment it happens.
2. Another advantage of Big Data is cost saving. Real-time applications may be expensive to run, but they save a lot of time in analyzing the data.
3. It allows us to anticipate particular situations and prepare for them.
4. It helps to improve science and research.
5. It helps to optimize business processes and helps businesses make more effective decisions.
6. Big Data provides useful insights.
7. It can provide information that is used to help customers.
8. Big Data tools help analysts analyze information faster, which increases productivity.


Disadvantages of Big Data

Following are the various limitations or disadvantages of Big Data:
1. Big data raises privacy and security concerns.
2. Sometimes, the results of big data analysis are misleading.
3. A lot of the data is unstructured.
4. Many businesses face a shortage of talent in the Big Data sector.