Ravikumar-3064/Exploratory-Data-Analysis-Project

EXPLORATORY DATA ANALYSIS

Exploratory Data Analysis (EDA) is the initial step in data analysis where we explore datasets to understand their structure, patterns, and key characteristics. It helps uncover relationships, anomalies, and trends before applying machine learning or statistical models.

✨ Key Objectives of EDA:

🔍 Understand the dataset – shape, size, data types, missing values.

📈 Summarize data distributions – mean, median, variance, outliers.

🧩 Identify patterns & correlations between features.

🚨 Detect anomalies or unusual observations.

🎨 Visualize data for deeper insights using plots and graphs.

🛠️ Tools & Techniques:

Descriptive Statistics → mean, median, mode, standard deviation.

Data Visualization → histograms, scatter plots, boxplots, heatmaps.

Data Cleaning → handling null values, duplicates, and outliers.

Feature Understanding → identifying categorical vs numerical variables.
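As a minimal sketch of the techniques listed above, the following applies them with pandas to a small made-up DataFrame. The column names `price` and `category` and all values are hypothetical, and the 1.5 × IQR rule is just one common convention for flagging outliers, not necessarily the one used in this project.

```python
import pandas as pd

# Hypothetical toy data: one missing value, one duplicated row, one outlier.
df = pd.DataFrame(
    {
        "price": [10.0, 12.0, 11.0, None, 12.0, 500.0],
        "category": ["a", "b", "a", "b", "b", "a"],
    }
)

# Descriptive statistics for the numerical column (NaN is skipped).
mean_price = df["price"].mean()
median_price = df["price"].median()

# Data cleaning: count nulls, then drop rows with nulls and exact duplicates.
null_counts = df.isna().sum()
clean = df.dropna().drop_duplicates()

# Outlier detection with the 1.5 * IQR rule (an assumed convention).
q1, q3 = clean["price"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (clean["price"] < q1 - 1.5 * iqr) | (clean["price"] > q3 + 1.5 * iqr)
outliers = clean[is_outlier]

# Feature understanding: split columns into numerical and categorical.
numerical_cols = clean.select_dtypes(include="number").columns.tolist()
categorical_cols = clean.select_dtypes(exclude="number").columns.tolist()
```

The mean is pulled far above the median by the single extreme value, which is the kind of discrepancy that usually prompts a closer look at outliers.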

🧩 PART 1 – Web Scraping & Data Collection

📚 Libraries Used

🔗 requests → Fetches the website’s HTML content.

🥣 BeautifulSoup → Parses the HTML to extract structured data.

📂 csv → Saves the extracted data into a CSV file.

πŸ—οΈ Script Structure

πŸ›‘οΈ Uses a try...except block for error handling.

πŸ“‹ Data stored in a list of dictionaries (author, quote, tags).

πŸ”„ while loop iterates through pages to fetch quotes.

🔍 Scraping Process

Starts from page 1, continues until all pages are processed.

Constructs the page URL dynamically.

Uses requests.get() → retrieves HTML.

BeautifulSoup extracts:

✍️ Author

💬 Quote text

🏷️ Tags

Each entry is stored as a dictionary → appended to the list.
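The extraction step described above might look roughly like the sketch below. The README does not show the site's markup, so the CSS selectors (`div.quote`, `span.text`, `small.author`, `a.tag`) and the helper names `parse_quotes` and `fetch_page` are assumptions for illustration. The sample HTML is made up so the parsing can run without a network request.

```python
from bs4 import BeautifulSoup


def parse_quotes(html):
    """Extract author, quote text, and tags from one page of HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Assumed markup: each quote sits in <div class="quote">.
    for block in soup.select("div.quote"):
        rows.append(
            {
                "author": block.select_one("small.author").get_text(strip=True),
                "quote": block.select_one("span.text").get_text(strip=True),
                "tags": [tag.get_text(strip=True) for tag in block.select("a.tag")],
            }
        )
    return rows


def fetch_page(url):
    """Download one page; requests is imported lazily so parsing works offline."""
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # turn HTTP errors into exceptions
    return response.text


# Hypothetical sample page used instead of a live request.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">Sample quote.</span>
  <small class="author">Jane Doe</small>
  <a class="tag" href="#">life</a>
  <a class="tag" href="#">humor</a>
</div>
"""

quotes = parse_quotes(SAMPLE_HTML)
```

Keeping the download and the parsing in separate functions makes the parser easy to check against saved HTML before pointing it at the live site.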

📑 Handling Multiple Pages

🔄 Script continues as long as there’s a "next" button.

📌 Current setup → stops after at most 10 pages, even if a "next" button is still present.

✅ Collects all available quotes as long as the site has no more than 10 pages.
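The paging logic can be sketched as a while loop with two stop conditions: no "next" button, or the page cap. This is an illustration rather than the project's actual script. The function name `scrape_all`, the `/page/N/` URL pattern, and the `example.invalid` address are assumptions, and the fetch and parse steps are passed in as callables so the loop can be exercised with stand-ins instead of a live site.

```python
def scrape_all(fetch_page, parse_page, base_url, max_pages=10):
    """Collect rows page by page until there is no "next" link or the cap is hit."""
    all_rows = []
    page = 1
    try:
        while page <= max_pages:
            url = f"{base_url}/page/{page}/"  # assumed URL pattern
            rows, has_next = parse_page(fetch_page(url))
            all_rows.extend(rows)
            if not has_next:
                break  # last page reached
            page += 1
    except Exception as exc:  # e.g. a network or parsing failure
        print(f"Stopped at page {page}: {exc}")
    return all_rows


# Stand-ins for the real network and parsing code, so the loop runs offline.
FAKE_SITE = {
    "https://example.invalid/page/1/": (["q1", "q2"], True),
    "https://example.invalid/page/2/": (["q3"], True),
    "https://example.invalid/page/3/": (["q4"], False),
}


def fake_fetch(url):
    return url  # in this sketch the "HTML" is just the URL


def fake_parse(html):
    return FAKE_SITE[html]


collected = scrape_all(fake_fetch, fake_parse, "https://example.invalid")
capped = scrape_all(fake_fetch, fake_parse, "https://example.invalid", max_pages=2)
```

With the cap set to 2, the third page is never requested, which shows how a fixed page limit can cut the collection short on a larger site.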

💾 Saving to CSV

Exports results to quotes.csv.

Defines columns: author, quote, tag_name.

Uses csv.DictWriter to store rows.

📊 Structured dataset ready for analysis.
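A minimal sketch of the export step with `csv.DictWriter` follows. The README does not say how the `tags` list in each dictionary becomes the single `tag_name` column, so writing one row per (quote, tag) pair is an assumption here, as is the helper name `save_quotes`.

```python
import csv

FIELDNAMES = ["author", "quote", "tag_name"]


def save_quotes(quotes, path="quotes.csv"):
    """Write scraped quotes to CSV, one row per (quote, tag) pair."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        for item in quotes:
            # Assumption: the "tags" list is flattened into one row per tag;
            # a quote without tags still gets one row with an empty tag_name.
            for tag in item["tags"] or [""]:
                writer.writerow(
                    {
                        "author": item["author"],
                        "quote": item["quote"],
                        "tag_name": tag,
                    }
                )
```

Opening the file with `newline=""` is what the `csv` module documentation recommends, since it prevents blank lines between rows on Windows.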

🧩 PART 2 – SQL Queries on Quotes Data

🔎 Query 1 – Count Quotes per Author

SELECT author, COUNT(*) AS quote_count
FROM quotes
GROUP BY author
ORDER BY quote_count DESC;

👉 Shows authors ranked by number of quotes.

🏷️ Query 2 – Top 5 Most Common Tags

SELECT tag_name, COUNT(tag_name) AS tag_count
FROM quotes
GROUP BY tag_name
ORDER BY tag_count DESC
LIMIT 5;

👉 Retrieves top 5 tags with the highest frequency.

✍️ Query 3 – Authors with More Than 5 Quotes

SELECT author, COUNT(author) AS quote_count
FROM quotes
GROUP BY author
HAVING COUNT(author) > 5;

👉 Filters only authors with >5 quotes.

📏 Query 4 – Find the Longest Quote

SELECT author, quote_text
FROM quotes
ORDER BY LENGTH(quote_text) DESC
LIMIT 1;

👉 Returns the longest quote and its author.
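To try the four queries without setting up a database server, they can be run against an in-memory SQLite table from Python, as sketched below. The README does not name the database engine, so SQLite is an assumption. The table layout (`author`, `quote_text`, `tag_name`) is inferred from the queries themselves, and the sample rows are made up. The threshold in Query 3 is lowered from 5 to 1 only so that the tiny sample returns a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (author TEXT, quote_text TEXT, tag_name TEXT)")
# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO quotes VALUES (?, ?, ?)",
    [
        ("Author A", "Short.", "life"),
        ("Author A", "A somewhat longer quote.", "life"),
        ("Author B", "Mid-length quote.", "humor"),
    ],
)

# Query 1: quotes per author, most prolific first.
per_author = conn.execute(
    "SELECT author, COUNT(*) AS quote_count FROM quotes "
    "GROUP BY author ORDER BY quote_count DESC"
).fetchall()

# Query 2: top 5 most common tags.
top_tags = conn.execute(
    "SELECT tag_name, COUNT(tag_name) AS tag_count FROM quotes "
    "GROUP BY tag_name ORDER BY tag_count DESC LIMIT 5"
).fetchall()

# Query 3: authors above a threshold (1 here instead of 5, see the note above).
prolific = conn.execute(
    "SELECT author, COUNT(author) AS quote_count FROM quotes "
    "GROUP BY author HAVING COUNT(author) > ?",
    (1,),
).fetchall()

# Query 4: the longest quote and its author.
longest = conn.execute(
    "SELECT author, quote_text FROM quotes "
    "ORDER BY LENGTH(quote_text) DESC LIMIT 1"
).fetchone()

conn.close()
```

Query 3 filters with HAVING rather than WHERE because the condition applies to the aggregated count of each group, which does not exist until after GROUP BY has run.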

🧩 PART 3 – Exploratory Data Analysis (EDA)

📂 Steps

🐼 import pandas as pd → Import pandas.

📥 pd.read_csv("quotes.csv") → Load dataset.

📝 df.info() → Summary (rows, columns, data types, non-null counts).

👀 df.head() → Preview first 5 rows.

🔢 df['author'].nunique() → Count of unique authors.

📊 df.describe(include='all') → Descriptive statistics for all columns.
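Put together, the steps above can be run as in the sketch below. To keep it runnable without the scraped file, it reads a small made-up CSV from an in-memory buffer. The rows are hypothetical, and with the real data the call would be `pd.read_csv("quotes.csv")`. The `isna().sum()` line is an addition that gives an explicit per-column count of missing values.

```python
import io

import pandas as pd

# Hypothetical stand-in for quotes.csv so the sketch runs without the file.
sample_csv = io.StringIO(
    "author,quote,tag_name\n"
    "Author A,First quote,life\n"
    "Author A,Second quote,humor\n"
    "Author B,Third quote,life\n"
)

df = pd.read_csv(sample_csv)  # with the real file: pd.read_csv("quotes.csv")

df.info()  # prints row count, column names, dtypes, and non-null counts
preview = df.head()  # first 5 rows (only 3 exist in this sample)
unique_authors = df["author"].nunique()  # number of distinct authors
summary = df.describe(include="all")  # count, unique, top, freq for text columns
missing = df.isna().sum()  # missing values per column
```

For text-only columns like these, `describe(include="all")` reports `count`, `unique`, `top`, and `freq` rather than means and quartiles.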

✅ Insights

🔍 Identifies missing values & datatypes.

✨ Shows sample data rows.

👩‍💻 Finds number of distinct authors.

📈 Provides statistical overview of dataset.

RELEVANT TAGS
  • Exploratory-data-analysis

  • eda

  • data-science

  • data-visualization

  • data-analysis

  • pandas

  • python

  • statistics

  • jupyter-notebook
