Data Science & Big Data Research Paper – PhD (No Plagiarism)

We discussed the PPT with the professor, and he raised the following points:

1. We need to compare our PPT with a similar scholarly paper and state why our PPT is unique compared with that paper. (Add a slide on a topic that is unique or represents additional research compared with the other similar scholarly paper.)

2. Take a scenario and explain, using at least 4 tools as examples, how each would be used in that scenario.

3. We need a research paper on the same topic as the PPT above, in the template format below (just elaborate a bit on the same details).

Research Paper Format

– Minimum 12 pages with APA format

– Cover Page

– Abstract

– Table of Contents

– Discussion – Main Content

– Justification and explanation

– Conclusions

– Citations/ References

Note: As discussed, please provide a new PPT as per the requirements.

Running head: BIG DATA PROCESSING OF SOFTWARE AND TOOLS

University of the Cumberlands

Big Data Processing of Software and Tools

Data Science & Big Data Analytics

ITS 836-21 Group-1

Prof: Gamini Bulumulle

Date submitted: 02/23/2020

Submitted By:

Table of contents

Abstract

Executive summary

Big data analytics software

Apache Hadoop

· CDH

· Cassandra

Knime

Datawrapper

MongoDB

Lumify

HPCC

Storm

Apache SAMOA

Talend

RapidMiner

Analyzing the data sets using R language

Conclusion

References

Abstract

Big data analytics has been in use for years, and most companies have embraced it to harness the data generated by their day-to-day operations. Companies that apply analytics can benefit considerably. As early as the 1950s, businesses were performing basic analytics in the form of manual, spreadsheet-style analysis, a crude precursor of big data analytics that revealed small pieces of information and simple patterns. Today, companies use big data analytics software to handle very large volumes of data because of the benefits it brings to the business, including speed in handling data, efficiency, and productivity. Many businesses prefer to accumulate large amounts of data and run analytics on it later to inform future decisions. Big data analytics helps businesses make sound choices about how they handle data. The ability to work faster while remaining efficient gives companies an advantage they did not previously have. This research paper focuses on big data analytics software and its benefits to an organization.

Keywords: Big data, analysis, spreadsheet, efficiency, organization

Executive summary

Big data analysis software gives organizations the ability to derive new ideas from the results of the analysis. This in turn encourages more effective and efficient business practices, increased benefits, increased proficiency, and satisfied clients. In research by Tom Davenport, more than fifty companies were analysed to see how they used big data analysis software (Chandarana & Vijayalakshmi, 2014). The research concluded that these tools reduce the cost of data analysis. Companies using big data analytics software such as Apache Hadoop and cloud-based analytics had lower costs for storing and analysing data, and they also had an upper hand in making business decisions. The research also found that companies using big data analysis software were quicker and more agile when analysing data. With in-memory analytics and Hadoop, combined with the ability to analyse new sources of data, companies can analyse data at considerable speed and reach conclusions based on the results.

Big data analytics software also improves an organization's ability to gauge what its customers need. Davenport's research emphasizes that big data analytics brings a better understanding of clients' needs and better ways to address them. Nowadays, many organizations use big data analytics widely to differentiate themselves in the market. With open-source big data analytics software, the most valuable parts of the organization are secured and expenses are reduced. Hadoop is one of the most widely used big data analytics platforms among businesses, and many vendors build their services on it.

Hypothetically, a company may need to perform a market analysis to ascertain market trends. This scenario calls for big data tools. Software such as Hadoop, Apache SAMOA, Cassandra, and Datawrapper can be used together to analyse the data and build a picture of the market, and each of these tools plays a role in market trend analysis. For example, Hadoop can process the huge data sets involved and surface information about future trends in that line of business. Cassandra can store the large volumes of market data across many servers, and Apache SAMOA can mine the streams of data as they arrive. Datawrapper can then help the organization visualize the information being analysed for market trends.

Big data analytics software

Several questions arise when big data analytics is put to use in the modern world: which analysis software to use, how large the data sets are, what volume of data the organization normally generates, and so on (Bhosale & Gadekar, 2014). Big data tools can be broadly classified as development platforms, development tools, analysis tools, and other data analytics tools. Some of the software used for big data analytics is described below.

Apache Hadoop

Apache Hadoop is used in big data analytics to process huge volumes of data on clustered file systems. It handles big data by means of the MapReduce programming model. It is an open-source framework written in Java that provides cross-platform support, and it is one of the most widely used analytics platforms. Research has it that more than fifty Fortune companies use Hadoop in their data analysis systems. Some of the noteworthy companies that use Hadoop include Facebook, Intel, Amazon Web Services, Hortonworks, IBM, Microsoft, and many more.

There are many benefits to using Hadoop. Its core is a distributed file system (HDFS) that can hold all kinds of data, such as pictures, XML, and JSON. Hadoop is also very valuable for R&D purposes, provides quick access to data, and is highly scalable and highly available when run on a cluster of computers. However, Hadoop also has disadvantages. These include the disk space consumed by its data redundancy and I/O operations that could be better optimized.
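The MapReduce model mentioned above can be illustrated with a minimal single-process sketch. This is plain Python, not the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. In a real cluster, the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling word_count on a list of text lines returns a dictionary of word frequencies. The same three-step shape applies to other aggregations, such as totals per product or per region.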

CDH (Cloudera Distribution for Hadoop) software

CDH targets enterprise-class deployments of Hadoop technology. It is entirely open source and offers a free platform distribution that includes Apache Hadoop, Apache Spark, Apache Impala, and more. It allows you to collect, process, manage, discover, model, and distribute large amounts of data. Benefits of using CDH software: comprehensive distribution; Cloudera Manager administers the Hadoop cluster very well; easy implementation; less complex administration; high security and governance. Disadvantages of using CDH software: a few complicated UI features, such as the charts on the CM service; multiple recommended approaches to installation, which can be confusing; and licensing priced on a per-node basis, which is quite expensive.

Cassandra

Apache Cassandra is a free and open-source distributed NoSQL DBMS built to manage huge volumes of data spread across numerous commodity servers while delivering high availability. It uses CQL (Cassandra Query Language) to interact with the database. Some of the prominent organizations using Cassandra include Accenture, American Express, Facebook, General Electric, Honeywell, and Yahoo. Benefits of using Apache Cassandra: no single point of failure; handles massive data very quickly; log-structured storage; automated replication; linear scalability; simple ring architecture. Disadvantages of using Cassandra: it requires some extra effort in troubleshooting and maintenance; clustering could be improved; and there is no row-level locking feature.

Knime

KNIME stands for Konstanz Information Miner, an open-source tool used for enterprise reporting, integration, research, CRM, data mining, data analytics, text mining, and business intelligence. It supports the Linux, OS X, and Windows operating systems. It can be considered a good alternative to SAS. Some of the top organizations using KNIME include Comcast, Johnson & Johnson, and Canadian Tire. Benefits of using KNIME: simple ETL operations; integrates very well with other technologies and languages; rich algorithm set; highly usable and organized workflows; automates a great deal of manual work; no stability issues; easy to set up. Disadvantages of using KNIME for data analytics: data handling capacity could be improved; it occupies nearly the whole RAM; and it could have allowed integration with graph databases.

Datawrapper

Datawrapper is an open-source platform for data visualization that helps its users generate simple, precise, and embeddable charts quickly. Its major customers are newsrooms spread all over the world, including The Times, Fortune, Mother Jones, Bloomberg, and Twitter. Benefits of using Datawrapper for big data analytics: it is device friendly and works very well on all kinds of devices (mobile, tablet, or desktop); fully responsive; fast; interactive; brings all the charts together in one place; great customization and export options; requires zero coding. Disadvantages: limited colour palettes.

MongoDB

MongoDB is a NoSQL, document-oriented database written in C, C++, and JavaScript. It is free to use and is an open-source tool that supports multiple operating systems, including Windows Vista (and later versions), OS X (10.7 and later versions), Linux, Solaris, and FreeBSD. Its main features include aggregation, ad hoc queries, the BSON format, sharding, indexing, replication, server-side execution of JavaScript, a schemaless design, capped collections, MongoDB Management Service (MMS), load balancing, and file storage. Some of the major customers using MongoDB include Facebook, eBay, MetLife, and Google. Benefits of using MongoDB for big data analytics: easy to learn; supports multiple technologies and platforms; no hiccups in installation and maintenance; reliable and low cost. Disadvantages of using MongoDB for big data analytics: limited analytics, and slow for certain use cases.

Lumify

Lumify is a free and open-source tool for big data fusion/integration, analytics, and visualization. Its primary features include full-text search, 2D and 3D graph visualizations, automatic layouts, link analysis between graph entities, integration with mapping systems, geospatial analysis, multimedia analysis, and real-time collaboration through a set of projects or workspaces. Benefits of using Lumify for big data analytics: scalable; secure; supported by a dedicated full-time development team; supports cloud-based environments and works well with Amazon's AWS.

HPCC

HPCC stands for High-Performance Computing Cluster. It is a complete big data solution built on a highly scalable supercomputing platform. HPCC is also referred to as DAS (Data Analytics Supercomputer). The tool was developed by LexisNexis Risk Solutions. It is written in C++ and a data-centric programming language known as ECL (Enterprise Control Language). It is based on the Thor architecture, which supports data parallelism, pipeline parallelism, and system parallelism. It is an open-source tool and a good substitute for Hadoop and some other big data platforms. Benefits of using HPCC for big data analytics: the architecture is based on commodity computing clusters that deliver high performance; parallel data processing; fast, powerful, and highly scalable; supports high-performance online query applications; cost-effective and comprehensive.

Storm

Apache Storm is a cross-platform, distributed, fault-tolerant, real-time stream processing framework. It is free and open source. Its developers include BackType and Twitter. It is written in Clojure and Java. Its architecture is based on customized spouts and bolts, which describe sources of information and the manipulations applied to them, to permit batch and distributed processing of unbounded streams of data. Groupon, Yahoo, Alibaba, and The Weather Channel are some of the well-known organizations that use Apache Storm. Benefits of using Apache Storm for big data analytics: reliable at scale; very fast and fault tolerant; guarantees the processing of data; and it has numerous use cases, including real-time analytics, log processing, ETL (Extract-Transform-Load), continuous computation, distributed RPC, and machine learning. Disadvantages of using Apache Storm for big data analytics: difficult to learn and use; difficulties with debugging; and the native scheduler and Nimbus can become bottlenecks.
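The spout-and-bolt idea can be sketched with ordinary Python generators. This is an illustration of the concept only, not the Storm API: the names sentence_spout, split_bolt, count_bolt, and run_topology are hypothetical, and a finite list stands in for an unbounded stream.

```python
def sentence_spout(sentences):
    """Spout: the source of the stream, emitting one sentence at a time."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream):
    """Bolt: transform each incoming sentence into lower-cased words."""
    for sentence in stream:
        for word in sentence.split():
            yield word.lower()


def count_bolt(stream):
    """Bolt: keep a running count per word and emit the updated count."""
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]


def run_topology(sentences):
    """Wire spout -> split bolt -> count bolt and collect the emitted tuples."""
    return list(count_bolt(split_bolt(sentence_spout(sentences))))
```

Each stage consumes items as they arrive instead of waiting for the whole data set, which is the property that distinguishes stream processing from batch processing.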

Apache SAMOA

SAMOA stands for Scalable Advanced Massive Online Analysis. It is an open-source platform for big data stream mining and machine learning. It allows you to create distributed streaming machine learning (ML) algorithms and run them on multiple DSPEs (distributed stream processing engines). Apache SAMOA's closest alternative is the BigML tool. Benefits of using Apache SAMOA for big data analytics: simple and fun to use; fast and scalable; true real-time streaming; and a Write Once Run Anywhere (WORA) architecture.
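Streaming machine learning of this kind updates a model one example at a time instead of training on a stored data set. The sketch below shows that idea with a simple online perceptron evaluated in test-then-train fashion. It is written in plain Python and does not use SAMOA; the class and function names are hypothetical.

```python
class OnlinePerceptron:
    """A binary classifier that learns from one example at a time."""

    def __init__(self, n_features, learning_rate=1.0):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        """Return 1 if the weighted sum is positive, otherwise 0."""
        score = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1 if score > 0 else 0

    def learn_one(self, x, y):
        """Adjust the weights only when the current prediction is wrong."""
        error = y - self.predict(x)
        if error != 0:
            step = self.learning_rate * error
            self.weights = [w + step * xi for w, xi in zip(self.weights, x)]
            self.bias += step


def prequential_accuracy(model, stream):
    """Test-then-train: score each example before the model learns from it."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        model.learn_one(x, y)
        total += 1
    return correct / total if total else 0.0
```

Because every example is used for testing before it is used for training, the accuracy reflects how the model performs on data it has not yet seen, without holding a separate test set in memory.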

Talend

Talend's big data integration products include the following. Open Studio for Big Data: it comes under a free and open-source license, its components and connectors are Hadoop and NoSQL, and it provides community support only. Big Data Platform: it comes with a user-based subscription license, its components and connectors are MapReduce and Spark, and it provides web, email, and phone support. Real-Time Big Data Platform: it comes under a user-based subscription license, its components and connectors include Spark streaming, machine learning, and IoT, and it provides web, email, and phone support. Benefits of using Talend for big data analytics: streamlines ETL and ELT for big data; achieves the speed and scale of Spark; accelerates the move to real time; handles multiple data sources; and provides numerous connectors under one roof, which in turn allows you to customize the solution to your needs. Disadvantages of using Talend for big data analytics: community support could be better; the interface could be improved and made easier to use; and it is difficult to add a custom component to the palette.

RapidMiner

RapidMiner is a cross-platform tool that offers an integrated environment for data science, machine learning, and predictive analytics. It comes under various licenses that offer small, medium, and large proprietary editions, as well as a free edition that allows for 1 logical processor and up to 10,000 data rows. Organizations such as Hitachi, BMW, Samsung, and Airbus have been using RapidMiner. Benefits of using RapidMiner for big data analytics: open-source Java core; the convenience of front-line data science tools and algorithms; a code-optional GUI; integrates well with APIs and the cloud; and superb customer service and technical support. However, RapidMiner's online data services ought to be improved.

Analyzing the data sets using R language

Data simulation is a crucial stage in processing raw data: it helps identify and trace patterns and generate reports that enhance productivity. We took a sample data set from a computer store and ran a simulation to show the different types of RAM available in the store and how they relate to hard disk prices.
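The kind of summary described above can be sketched as follows. The paper's own analysis used R and its data set is not reproduced here, so this sketch uses Python, and the field names (ram_gb, hard_disk_price) and function names are assumptions made up for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean


def ram_counts(records):
    """Count how many store items are listed for each RAM size."""
    return Counter(record["ram_gb"] for record in records)


def mean_disk_price_by_ram(records):
    """Average the hard disk price within each RAM size group."""
    prices = defaultdict(list)
    for record in records:
        prices[record["ram_gb"]].append(record["hard_disk_price"])
    return {ram: mean(values) for ram, values in prices.items()}
```

Given a list of dictionaries with those two fields, ram_counts shows which RAM types the store stocks, and mean_disk_price_by_ram relates each RAM type to the average hard disk price.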

Conclusion

The digital age has made it easier for professionals to access the data that allows them to improve business performance (Manikandan & Ravi, 2014). To use this data, however, you need data analytics software that provides tools for data mining, organization, analysis, and visualization. It should also be equipped with machine learning and advanced algorithms that turn raw data into valuable insights quickly. In this way, you can keep up with business trends and even find ways to further improve your overall operations. There are, however, many factors involved in finding the right analytics tool for a particular business. From examining its performance to working out how well it integrates with other systems, the research process can be overwhelming. To help with this, we have gathered the leading products available and surveyed their functionality and ease of use. Big data tools help us store huge volumes of data and transform them into analytics that track, explain, and predict patterns and improve the productivity of the organization. With this survey, it should be easier to determine the most suitable data analytics platform for your operations.

References

Bhosale, H. S., & Gadekar, D. P. (2014). A review paper on big data and hadoop. International Journal of Scientific and Research Publications, 4(10), 1-7.

Chandarana, P., & Vijayalakshmi, M. (2014, April). Big data analytics frameworks. In 2014 International Conference on Circuits, Systems, Communication and Information Technology Applications (CSCITA) (pp. 430-434). IEEE.

Manikandan, S. G., & Ravi, S. (2014, October). Big data analysis using Apache Hadoop. In 2014 International Conference on IT Convergence and Security (ICITCS) (pp. 1-4). IEEE.

Talia, D. (2013). Clouds for scalable big data analytics. Computer, (5), 98-101.

Allen, G., Campbell, F., & Hu, Y. (2015). Comments on “visualizing statistical models”: Visualizing modern statistical methods for Big Data. Statistical Analysis And Data Mining: The ASA Data Science Journal, 8(4), 226-228. doi: 10.1002/sam.11272

Griffith, D. (1993). Advanced spatial statistics for analysing and visualizing geo-referenced data. International Journal Of Geographical Information Systems, 7(2), 107-123. doi: 10.1080/02693799308901945
