Python Project URL Checker

April 05, 2024

Python Project URL Checker

1. Importing Required Modules:

import requests

from urllib.parse import urlparse

from bs4 import BeautifulSoup

- `requests`: This module allows you to send HTTP requests easily.

- `urllib.parse.urlparse`: This function from the `urllib.parse` module is used to parse URLs into their components.

- `BeautifulSoup`: This is a library for pulling data out of HTML and XML files. It's used for web scraping.

2. Defining `get_domain` Function:

def get_domain(url):

parsed_url = urlparse(url)

return parsed_url.netloc

This function takes a URL as input, parses it using `urlparse`, and returns the domain name extracted from the URL.

3. Defining `check_ssl` Function

def check_ssl(url):

try:

response = requests.get(url, timeout=5)

return response.url.startswith("https://")

except requests.exceptions.SSLError:

return False

This function checks if the given URL is served over HTTPS by attempting to make a request to the URL. If the URL starts with "https://", it returns `True`; otherwise, it returns `False`. It handles the case where SSL/TLS errors occur.

4. Defining `check_http_headers` Function:

def check_http_headers(url):

headers = requests.head(url).headers

security_headers = [

"Strict-Transport-Security",

"Content-Security-Policy",

"X-Content-Type-Options",

"X-Frame-Options",

"X-XSS-Protection",

]

missing_headers = [header for header in security_headers if header not in headers]

return missing_headers

This function sends a HEAD request to the URL to fetch only the headers without downloading the entire content. Then it checks for the presence of specific security-related headers in the response. If any of the headers are missing, it returns a list of the missing headers.

5. Defining `check_links` Function:

def check_links(url):

response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')

links = [a['href'] for a in soup.find_all('a', href=True)]

return links

This function fetches the HTML content of the URL, parses it using BeautifulSoup, and extracts all the links (`<a>` tags with the `href` attribute) present on the page. It returns a list of these links.

6. Main Execution:

if __name__ == "__main__":

url = input("Enter the URL to perform security checks: ")

print(f"\nPerforming security checks for {url}...\n")

domain = get_domain(url)

print(f"Domain: {domain}")

ssl_result = check_ssl(url)

print(f"SSL/TLS: {'Secure' if ssl_result else 'Not Secure'}")

missing_headers = check_http_headers(url)

if missing_headers:

print(f"Missing Security Headers: {', '.join(missing_headers)}")

else:

print("All Security Headers present")

links = check_links(url)

print(f"\nFound links on the page: {', '.join(links)}")

This part of the code prompts the user to input a URL, then it performs various security checks using the defined functions, and finally prints the results. It displays the domain, SSL/TLS status, missing security headers (if any), and the links found on the page.

import requests
from urllib.parse import urlparse
from bs4 import BeautifulSoup

def get_domain(url):
    parsed_url = urlparse(url)
    return parsed_url.netloc

def check_ssl(url):
    try:
        response = requests.get(url, timeout=5)
        return response.url.startswith("https://")
    except requests.exceptions.SSLError:
        return False

def check_http_headers(url):
    headers = requests.head(url).headers

    security_headers = [
        "Strict-Transport-Security",
        "Content-Security-Policy",
        "X-Content-Type-Options",
        "X-Frame-Options",
        "X-XSS-Protection",
    ]

    missing_headers = [header for header in security_headers if header not in headers]

    return missing_headers

def check_links(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    links = [a['href'] for a in soup.find_all('a', href=True)]
    return links

if __name__ == "__main__":
    url = input("Enter the URL to perform security checks: ")

    print(f"\nPerforming security checks for {url}...\n")

    domain = get_domain(url)
    print(f"Domain: {domain}")

    ssl_result = check_ssl(url)
    print(f"SSL/TLS: {'Secure' if ssl_result else 'Not Secure'}")

    missing_headers = check_http_headers(url)
    if missing_headers:
        print(f"Missing Security Headers: {', '.join(missing_headers)}")
    else:
        print("All Security Headers present")

    links = check_links(url)
    print(f"\nFound links on the page: {', '.join(links)}")

Search This Blog

Genuinely My Thoughts

Featured

Solving Economic Crisis Without Work-From-Home: A Systems Approach to Resource Prioritization

Python Project URL Checker

Comments

Post a Comment

Popular Posts

What If India Loses this mindset of Reusing Things?

Polar Bear is Suffering to Find Land Here is Why?

Smarter move through technology revolution

Carbon and its Compounds

The Role of UX Design in Evolving Technology