Showing posts with the label Article

Data Science with Spark

DATA SCIENCE AND APACHE SPARK

Data science has transformed the world. It has contributed both to the rapid growth of data and to the development of intelligent systems. Data scientists have a variety of tools for analyzing large amounts of data, and among them Apache Spark has had a major impact on the data science industry.

Apache Spark

Spark is an open-source engine that can process huge amounts of data efficiently and at very high speed. Its data streaming capability has helped it pull ahead of other existing big data platforms. It also supports machine learning and SQL workloads for querying datasets. Spark applications can be written in several languages, including Python, Java, R, and Scala.

Components of Apache Spark for Data Science

The main components of Spark are Spark Core, Spark SQL, Spark Streaming, Spark MLlib, SparkR, and Spark GraphX.

S.No.  Component  Description
1  Spark Core  This i...

Hack-o-Problem

Starting this month, we are launching a series called “Hack-o-Problem” built around hackathon problems. The initiative aims to spark young people's interest in real-world problems and in the approaches used to solve them. We hope it leads to tangible solutions and brings out the hidden problem-solving talent of today's youth. To take part, submit your solution either as a document describing the whole plan or as a video explaining it. We have attached a Google Form for submissions.

Problem

Build an online system for monitoring water quality, leaks, and contamination, and for managing the pipeline network. Design a simulation of the hardware that would provide the solution.

Organization: Andhra Pradesh Innovation Society, ITE&C Department
Dataset source: NEERI Dataset
Year of study: 2005
Cities covered: 1. Allahabad  2. Bangalore  3. Bhopal  4. Bhubaneshwar  5. Chandigarh  6. Coimbatore  7. Dehradu...

DOCUMENTATION ON LINEAR REGRESSION

Linear Regression

Linear regression is one of the simplest machine learning algorithms. It models the relationship between variables with a straight line and is used to predict the value of a continuous variable. It is described by the equation:

y = a0 + a1x + ε

This equation expresses the relationship between the two variables x and y, where y depends on the value of x. Here a0 is the intercept, a1 is the linear regression coefficient (the slope of the line), and ε is the random error.

Types

Positive and negative linear relationship: In a positive linear relationship, y increases as x increases. In a negative linear relationship, y decreases as x increases.

Our main aim in regression is to find the best-fitting line. Three common evaluation metrics are used to judge the fit:

Mean Absolute Error
Mean Squared Error
Root Mean Squared Error

Linear regression is further categorized into two types of algorithm:

Simple Linear Regression: If only one variable(...
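To make the equation and the three metrics concrete, here is a minimal sketch in plain Python. It assumes the usual least-squares estimates for a0 and a1, which the excerpt does not spell out; the function names and the sample data are invented for illustration only.

```python
from math import sqrt


def fit_simple_linear_regression(xs, ys):
    """Return (a0, a1) for the least-squares line y = a0 + a1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a0 = mean_y - a1 * mean_x
    return a0, a1


def evaluate(xs, ys, a0, a1):
    """Compute MAE, MSE and RMSE of the line's predictions."""
    errors = [y - (a0 + a1 * x) for x, y in zip(xs, ys)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return {"mae": mae, "mse": mse, "rmse": sqrt(mse)}


# Hypothetical sample data with a positive linear relationship.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a0, a1 = fit_simple_linear_regression(xs, ys)
metrics = evaluate(xs, ys, a0, a1)
print(f"intercept a0 = {a0:.3f}, slope a1 = {a1:.3f}")
print(metrics)
```

A positive a1 corresponds to a positive linear relationship and a negative a1 to a negative one. Lower values of all three metrics indicate a better fit.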

Ubuntu on Windows

This tutorial is for Windows users who want to install Ubuntu on their system. It covers the following points:

How to install a virtual machine manager (VirtualBox) on your system
How to install Linux (Ubuntu) in your virtual machine

Let's start the process.

1. Download VirtualBox from the VirtualBox website, virtualbox.org.
2. After the download finishes, install the software by double-clicking the installer.
3. Download the Ubuntu ISO image from the Ubuntu website, https://ubuntu.com/download/desktop.
4. With VirtualBox installed, create a virtual machine so that Linux can be installed on it. (For Linux users, all the further steps are the same.) Click the New icon in the toolbar.
5. Type Ubuntu in the Name field and click Next.
6. Set the memory size to 512 MB.
7. ...

Can I make my own search engine from scratch-2

Design a Crawler – 1

I wrote this article to help people understand how to design a crawler that performs its basic task of fetching documents from the web. It covers the first phase of a search engine, the crawler. My last article gave an overview of the search engine and how it works; here I will define the various functions of the crawler and how they work.

Crawler Introduction

A web crawler reads the content of a web page and follows its links to reach all the pages linked from it. It is also known as a spider or a bot. While scanning a site, the crawler learns information such as the site's domain, its URLs, and its links. It takes the first page of the site as a seed page, which directs the crawler to all the other pages linked to it. The following algorithm defines the process a web crawler follows:

Algorithm Basic-Crawler:
Input: URL
Output: links stored in storage
Inpu...
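The seed-page idea described above can be sketched in Python as a breadth-first traversal. This is a generic illustration and not a reconstruction of the Basic-Crawler algorithm, whose steps are cut off in this excerpt. The names `crawl`, `extract_links`, and `FAKE_WEB` are hypothetical, and pages are served from an in-memory dictionary so the example runs without network access; a real crawler would fetch pages over HTTP and respect robots.txt.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute, fragment-free URLs for the links found in html."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl from seed_url; fetch(url) returns HTML or None."""
    frontier = deque([seed_url])  # URLs waiting to be fetched
    seen = {seed_url}             # URLs already queued, to avoid repeats
    visited = []                  # URLs fetched successfully, in order
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:          # page could not be fetched; skip it
            continue
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory "web" standing in for real HTTP requests.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/c#top">C</a>',
}

pages = crawl("http://example.test/", FAKE_WEB.get)
print(pages)
```

The `seen` set keeps the crawler from looping when pages link back to each other, and `max_pages` bounds the crawl so it always terminates.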