INTRODUCTION

A relatively large percentage of the data we have and can access today has been generated over the years on a daily, and presumably hourly, basis. We generate many millions of bytes of data across multiple platforms: activity on social media platforms, transactional records captured in banks, high-quality digital images, motion pictures and much more.

As a result, it is only fitting that companies develop more sophisticated big data solutions that enable a robust, affordable, scalable and flexible way of capturing, processing and storing the data they generate. It is important to move from the simple traditional way of capturing data to one that allows organizations to capture the complex, unstructured data that has evolved with time and technological advancement. Organizations therefore need to know which architecture to use and which resources to employ, since data today is no longer confined to figures stored in relational tables: photos, voice notes, documents in their various forms and videos all require storage as well.

DEFINITION OF BIG DATA

Big data refers to massive quantities of data that are impossible to store and process within a reasonable time frame using a conventional database management system.


The term covers data on the order of terabytes, petabytes, exabytes, zettabytes and beyond, sizes that cause drawbacks in storing, analysing and visualising the data. Its volume outweighs the resources used to store or even process it.
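
To give a rough sense of these scales, the following minimal Python sketch (illustrative only) prints each unit's size in bytes, assuming decimal (SI) prefixes:

    # Byte-scale units using decimal (SI) prefixes: each step is a factor of 1000.
    units = ["kilobyte", "megabyte", "gigabyte", "terabyte",
             "petabyte", "exabyte", "zettabyte"]
    for power, name in enumerate(units, start=1):
        print(f"1 {name} = 10^{3 * power} bytes = {1000 ** power:,} bytes")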

This type of data is not transactional; it has evolved to be either user generated or generated by machines, including systems driven by artificial intelligence.

BIG DATA ARCHITECTURE

Big data architecture, the basis for big data analytics, is the outcome of the intercommunication of big data application resources. These resources and database technologies are put together to achieve high performance, high fault tolerance and scalability. The architecture depends on the resources an organization has and on its data environment. A big data architecture is devised to handle the ingestion, processing and analysis of data that is too large and complex for simple traditional database systems. Solutions normally involve batch processing of big data sources at rest, real-time processing of big data in motion, interactive exploration of the data, and predictive analytics and machine learning. Most big data architectures include some or all of the following components:

Data source: There may be a single stand-alone data source or many sources used together, depending on the amount of data the organization creates.

These range from mounted data store databases to files produced by implementations such as web server logs.

Data storage: Operational data that results from bulk processing is written to a distributed file store that can hold immense quantities of data in various forms, commonly referred to as a data lake.

Batch processing: The solution must systematically digest data using reliable tasks to select, organize and prepare it for analysis. This process involves reading source files, processing them and writing the output to new files; a minimal sketch of such a job follows this list of components.

Real-time message ingestion: When the solution involves real-time sources, the architecture should include a way to record and store real-time messages for stream processing.

Analytical data store: The solution should prepare data for inspection and serve the processed data in a structured form that can be queried using analytical tools.

Orchestration: Orchestration technology can be employed to coordinate the repeated operations that digest data, move it into a data store and assemble the output into a report.
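
To make the batch processing component concrete, here is a minimal, hypothetical Python sketch of such a job: it reads raw source files, filters the records, and writes the output to a new file. The file paths and record layout are assumptions for illustration, not part of any specific product.

    import csv
    import json
    from pathlib import Path

    # Hypothetical batch job: read raw CSV logs, keep valid records,
    # and write a cleaned JSON-lines file for the analytical data store.
    RAW_DIR = Path("data/raw")          # assumed landing zone (data lake)
    OUT_FILE = Path("data/curated/events.jsonl")

    def run_batch_job():
        OUT_FILE.parent.mkdir(parents=True, exist_ok=True)
        with OUT_FILE.open("w") as out:
            for source in RAW_DIR.glob("*.csv"):
                with source.open() as f:
                    for row in csv.DictReader(f):
                        # Select and prepare: drop rows missing a user id.
                        if row.get("user_id"):
                            out.write(json.dumps(row) + "\n")

    if __name__ == "__main__":
        run_batch_job()
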
EXAMPLES OF BIG DATA ARCHITECTURE

Internet of Things (IoT) architecture: There is no precise, universally agreed consensus on the architecture of the Internet of Things, so multiple architectures have been proposed. These include three- and five-layer architectures, cloud- and fog-based architectures, social IoT, representative architectures and many others. A basic IoT architecture has sensor/device, edge, data intelligence and application layers stacked one over the other, each carrying out distinct tasks and each containing sub-layers.
Source: https://www.researchgate.net/figure/A-generic-Internet-of-Things-IoT-network-architecture_fig1_282853869
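
As a toy illustration of those stacked layers, the following Python sketch (with purely hypothetical readings and thresholds) passes sensor data through an edge filter before the application layer acts on it:

    # Hypothetical three-stage flow: sensor -> edge -> application.
    def sensor_layer():
        # Pretend readings from a temperature sensor, in degrees Celsius.
        return [21.5, 22.0, 85.0, 21.8]

    def edge_layer(readings, threshold=80.0):
        # The edge layer filters locally so only anomalies travel upstream.
        return [r for r in readings if r >= threshold]

    def application_layer(anomalies):
        for value in anomalies:
            print(f"ALERT: abnormal reading {value} degrees C")

    application_layer(edge_layer(sensor_layer()))
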
Lambda architecture: A data processing architecture designed to handle large quantities of data by using both batch and stream processing methods. This approach tries to balance latency, throughput and fault tolerance: batch processing provides comprehensive, accurate views of historical data, while real-time stream processing provides views of the most recent data. The rise of lambda architecture corresponds with the growth of big data and real-time analytics and with the drive to mitigate the latencies of MapReduce.
Lambda architecture is dependent on a data model with an append-only, immutable data source that serves as the system of record. It is intended for ingesting and processing timestamped events that are appended to existing events rather than overwriting them. It has three layers:

1. The batch layer manages the master dataset and precomputes the batch views.
2. The speed layer serves recent data only and increments the real-time views.

3. The serving layer is responsible for indexing and exposing the views so that they can be queried.

The three layers are outlined in the diagram below, along with a sample choice of technology stacks. Source: https://dzone.com/articles/lambda-architecture-big-data
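
A minimal Python sketch of the lambda pattern, assuming a toy in-memory event list rather than a real ingestion system: the batch view is precomputed over the master dataset, the speed view covers only events that arrived after the last batch run, and the serving layer merges the two at query time.

    from collections import Counter

    # Immutable, append-only master dataset of (user, event) records.
    master_dataset = [("alice", "click"), ("bob", "click"), ("alice", "view")]

    # Batch layer: precompute a complete view over the master dataset.
    batch_view = Counter(user for user, _ in master_dataset)

    # Speed layer: incremental view over events newer than the batch run.
    recent_events = [("alice", "click"), ("carol", "view")]
    realtime_view = Counter(user for user, _ in recent_events)

    # Serving layer: merge both views to answer queries.
    def query(user):
        return batch_view[user] + realtime_view[user]

    print(query("alice"))  # 3: two batch events plus one recent event
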
Hadoop architecture: A Hadoop skill set requires considerable knowledge of every process in the Hadoop stack, from understanding the various components of the architecture to devising a Hadoop cluster, tuning its performance and setting up the chain responsible for processing the data. Hadoop follows a master-slave architecture for storing data and processing distributed data using HDFS and MapReduce respectively. The NameNode is the master node for data storage in HDFS, and the JobTracker is the master node for parallel processing of data with MapReduce. The slave nodes are the other machines in the Hadoop cluster, which store data and carry out the complex computations.
Each slave node runs a TaskTracker daemon and a DataNode, which link its processes with the JobTracker and NameNode respectively. In Hadoop architecture implementations, the master and slave systems can be established in the cloud or on premise. Image credit: OpenSource.com

BIG DATA TECHNOLOGIES

Big data technologies are the means through which drawbacks in data analytics, visualization and storage are tackled. The problems brought about by big data's volume, variety and velocity call for new technology solutions. The most prominent and widely used big data technology is the open source Hadoop project, developed under the Apache Software Foundation. This open source library was created with a focus on scalable, reliable, distributed and flexible computing systems that can handle big data.

Hadoop is made up of two components that work hand in hand. The first is the Hadoop Distributed File System (HDFS), which provides the high-bandwidth storage necessary for big data computing. The second is a data processing framework known as MapReduce. It distributes huge data sets (for example, those behind Google-style search technology) across many servers, each of which processes its portion of the overall data set and creates a summary before more traditional analysis tools are used. The distribution step and the summary step are the "map" and "reduce" phases respectively.
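
The classic illustration of the map and reduce phases is word counting. Below is a minimal single-process Python sketch of the idea, not Hadoop's actual Java API: map emits (word, 1) pairs, the pairs are grouped by key, and reduce sums each group.

    from itertools import groupby
    from operator import itemgetter

    documents = ["big data is big", "data needs processing"]

    # Map phase: emit a (word, 1) pair for every word in every document.
    pairs = [(word, 1) for doc in documents for word in doc.split()]

    # Shuffle phase: group pairs by key (word).
    pairs.sort(key=itemgetter(0))
    grouped = groupby(pairs, key=itemgetter(0))

    # Reduce phase: sum the counts for each word.
    counts = {word: sum(n for _, n in group) for word, group in grouped}
    print(counts)  # {'big': 2, 'data': 2, 'is': 1, 'needs': 1, 'processing': 1}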

Hadoop and various other big data resources have evolved to solve the challenges of the big data environment. These resources can be classified into the following categories:

1. Data storage and management: Examples include NoSQL databases such as MongoDB, CouchDB, Cassandra, HBase and Neo4j, as well as Talend, Apache Hadoop, Apache ZooKeeper etc.

2. Data cleaning: Examples include MS Excel, OpenRefine etc.

3. Data mining: The process of discovering insights within a database. Examples include RapidMiner, Teradata etc.

NOSQL

NoSQL is a collection of concepts that enable efficient, effective and rapid processing of data sets, characterised by reliability, scalability, flexibility, agility and performance. Although the name 'NoSQL', short for "not SQL" or rather "not only SQL", might suggest otherwise, it does not mean that these systems reject SQL in favour of some other language.

They can utilise SQL as well as other query languages. NoSQL is an advancement in databases that represents a drift away from the familiar relational database management systems (RDBMS). To explain NoSQL it helps to first explain SQL, the structured query language employed by RDBMS. These databases depend on relations (tables), rows, columns and schemas to organize and retrieve data. NoSQL, in comparison, does not rely on these structures.

It uses more flexible data models instead. Because relational database management systems have largely been unable to meet the flexibility, performance and scalability needs of data-intensive next-generation applications, NoSQL databases have been embraced by many mainstream organizations to fill the gaps left by RDBMS. NoSQL is specifically used to store data that is unstructured, grows much more rapidly than structured data and does not fit into RDBMS tables.

Common examples of unstructured data include: large objects such as videos and images; chat, messaging and log data; user-entered and session-generated data; and time-series (real-time) data such as IoT and device data.

TYPES OF NOSQL DATABASES

A few distinct variations of NoSQL databases have been created to serve particular needs and use cases. These fall into four principal categories:

Key-value data stores: Key-value NoSQL databases emphasise simplicity and are very helpful in accelerating an application to support fast read and write processing of non-transactional data.

Stored values can be any type of binary object (text, video, JSON document and so on) and are accessed by means of a key. The application has complete control over what is stored in the value, making this the most flexible NoSQL model. Data is partitioned and replicated across a cluster to gain scalability and availability. For this reason, key-value stores often do not support transactions. They are, however, highly effective at scaling applications that deal with high-velocity, non-transactional data.
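
A minimal in-memory sketch of the key-value model in Python, with hypothetical keys, shows how opaque the value is to the store itself:

    # Toy key-value store: the store only understands keys;
    # values are opaque blobs the application interprets itself.
    store = {}

    def put(key, value):
        store[key] = value

    def get(key):
        return store.get(key)

    put("user:42:avatar", b"\x89PNG...")          # binary object
    put("user:42:profile", '{"name": "Amina"}')   # JSON text
    print(get("user:42:profile"))
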
Document stores: Document databases typically store self-describing JSON, XML and BSON documents.
They are similar to key-value stores, but in this case the value is a single document that stores all data related to a particular key. Frequently used fields in the document can be indexed to provide fast retrieval without knowing the key. Each document can have the same structure as the others or a different one.
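
For illustration, two hypothetical user documents written as Python dictionaries show how documents in the same collection may differ in structure:

    # Two documents in the same (hypothetical) "users" collection;
    # the second has fields the first lacks, and that is allowed.
    users = [
        {"_id": 1, "name": "Amina", "email": "amina@example.com"},
        {"_id": 2, "name": "Bob", "phones": ["+1-555-0100"],
         "preferences": {"newsletter": True}},
    ]

    # An index on a popular field (here, "name") enables lookup
    # without knowing the document's key.
    by_name = {doc["name"]: doc for doc in users}
    print(by_name["Bob"]["preferences"])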

Wide-column stores: Wide-column NoSQL databases store data in tables with rows and columns much like an RDBMS; however, the names and layout of the columns can vary from row to row across the table. Wide-column databases group columns of related data together. A query can retrieve related data in a single operation because only the columns associated with the query are read. In an RDBMS, the same data would sit in different rows stored in different places on disk, requiring multiple disk operations for retrieval.
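
A toy Python model of the wide-column idea, with hypothetical column families: each row key maps to its own set of columns, and a query touches only one family rather than whole rows.

    # Toy wide-column layout: row key -> column family -> columns.
    # Each row may carry different columns within a family.
    table = {
        "user#1": {
            "profile": {"name": "Amina", "city": "Lagos"},
            "activity": {"last_login": "2024-05-01"},
        },
        "user#2": {
            "profile": {"name": "Bob"},  # fewer columns is fine
            "activity": {"last_login": "2024-05-03", "clicks": 17},
        },
    }

    # The query reads only the 'profile' family, not whole rows.
    profiles = {row: fams["profile"] for row, fams in table.items()}
    print(profiles["user#2"])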

Graph stores: A graph database uses graph structures to store, map and query relationships. Graph stores provide index-free adjacency, so neighbouring elements are linked together directly without using an index. Multi-model databases use a mix of the four types described above and can therefore support a wider range of uses.
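
The following small Python sketch (hypothetical data) mimics index-free adjacency: each node holds direct references to its neighbours, so traversal never consults a separate index.

    # Each node stores direct references to its neighbours,
    # so following a relationship is a pointer hop, not an index lookup.
    graph = {
        "alice": ["bob", "carol"],
        "bob": ["carol"],
        "carol": [],
    }

    def friends_of_friends(person):
        return {fof for friend in graph[person] for fof in graph[friend]}

    print(friends_of_friends("alice"))  # {'carol'}
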
BENEFITS OF NOSQL

NoSQL characteristics can address the challenges of big data in the following ways:

Scalability: NoSQL databases make it quicker and simpler to add or remove capacity using commodity hardware, without disruption. This eliminates the high cost and complexity of the manual sharding required when attempting to scale an RDBMS; a simple sharding sketch follows this list.

Performance: Adding commodity resources improves performance in NoSQL databases, making it easy for companies to consistently deliver a fast user experience without the overhead of manual sharding.

High availability: NoSQL distributes data across multiple servers and replicas, so the application can continue to support read and write operations even when individual nodes fail.
These databases avoid the complexities that come with RDBMS architecture.

Global availability: NoSQL can reduce the latency users experience by automatically replicating data between servers, cloud instances and data centres, ensuring a consistent experience with the application regardless of geographical location. It also reduces the burden of assigning personnel to manage replication manually.

Flexible data modelling: With a NoSQL database, application and system developers can work with a fluid, expandable model and use data manipulation techniques that are entirely natural for a particular use of the system.

The result is quicker and more reliable interaction between the system and the database.
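
To give a sense of what automatic sharding replaces, here is a deliberately simplified Python sketch of hash-based sharding, assuming a fixed set of hypothetical shard names; real NoSQL systems handle placement, rebalancing and replication automatically.

    import hashlib

    # Hypothetical shard names; a real system manages these itself.
    SHARDS = ["shard-a", "shard-b", "shard-c"]

    def shard_for(key):
        # Hash the key so records spread evenly across shards.
        digest = hashlib.sha256(key.encode()).hexdigest()
        return SHARDS[int(digest, 16) % len(SHARDS)]

    for key in ["user:1", "user:2", "user:3"]:
        print(key, "->", shard_for(key))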
