Welcome to Module 2 of the tutorial series: “Build an AI-powered documentation assistant with Flask & DeepSeek”. In this module, you’ll learn how to extract code from GitHub repositories, parse Python files, and prepare them for automated documentation generation.
Prerequisites
Before starting Module 2, ensure you’ve completed the following steps from Module 1:
- Set up the development environment and installed dependencies.
- Configured API keys (DeepSeek and GitHub) in the .env file.
- Ran the Flask app and tested the /generate-docstring endpoint.
- Reviewed the folder structure and key files.
Lesson 3: Fetching code from GitHub
Objective
In this lesson, you’ll use the GitHub API to retrieve repositories, extract Python files, and handle API rate limits and authentication.
Step 1: Install the requests Library
Install the requests library to interact with the GitHub API:
pip install requests
Step 2: Update github_api.py
Add logic to fetch repository contents, filter Python files, and handle rate limits.
Update app/utils/github_api.py to contain the following code:
import os
import requests
import time
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

GITHUB_ACCESS_TOKEN = os.getenv("GITHUB_ACCESS_TOKEN")


def fetch_repo_contents(owner, repo):
    """
    Fetch the contents of a GitHub repository.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/contents"
    headers = {"Authorization": f"token {GITHUB_ACCESS_TOKEN}"}
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        return response.json()
    else:
        raise Exception(f"Failed to fetch repository contents: {response.status_code}")


def filter_python_files(contents):
    """
    Filter out Python files from the repository contents.
    """
    return [file for file in contents if file["name"].endswith(".py")]


def download_file_contents(download_url):
    """
    Download the raw content of a file from GitHub.
    """
    headers = {"Authorization": f"token {GITHUB_ACCESS_TOKEN}"}
    response = requests.get(download_url, headers=headers)
    if response.status_code == 200:
        return response.text
    else:
        raise Exception(f"Failed to download file: {response.status_code}")


def make_github_request(url):
    """
    Make a GitHub API request with rate limit handling.
    """
    headers = {"Authorization": f"token {GITHUB_ACCESS_TOKEN}"}
    while True:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 403 and "rate limit" in response.text:
            reset_time = int(response.headers["X-RateLimit-Reset"])
            sleep_time = max(reset_time - time.time(), 0) + 1  # Add 1 second buffer
            print(f"Rate limit exceeded. Sleeping for {sleep_time} seconds.")
            time.sleep(sleep_time)
        else:
            raise Exception(f"Failed to make request: {response.status_code}")
This script interacts with the GitHub API to fetch, filter, and download Python files from a given repository while handling rate limits.
- Load Environment Variables: The script imports the modules it needs (os, requests, time, and load_dotenv from dotenv). It then calls load_dotenv() to load environment variables from a .env file and reads the GitHub access token with os.getenv("GITHUB_ACCESS_TOKEN").
- Fetch Repository Contents: The fetch_repo_contents(owner, repo) function builds the GitHub API URL for the repository’s contents and sends the request with an authorization header carrying the access token. Because the URL has no path after /contents, the response lists only the entries in the repository’s root directory. If GitHub returns a successful response (200), the function parses and returns the JSON response. Otherwise, it raises an exception with the status code.
- Filter Python Files: The filter_python_files(contents) function iterates through the repository contents and keeps only the entries whose names end in .py, returning them as a list.
- Download File Contents: The download_file_contents(download_url) function makes an authenticated request to download a file’s raw content from GitHub. If the request succeeds (200), it returns the file’s text content. Otherwise, it raises an exception.
- Handle API Rate Limits: The make_github_request(url) function makes a request to the GitHub API while handling rate limits. If the request succeeds (200), it returns the JSON response. If the request fails with 403 and the response body mentions "rate limit", it computes the wait time from the X-RateLimit-Reset header (a Unix timestamp), adds a one-second buffer, sleeps, and retries. If the request fails for any other reason, it raises an exception. Note that fetch_repo_contents and download_file_contents call requests.get directly, so only requests routed through make_github_request get this retry behavior.
Together, these functions authenticate each request with the access token, select the relevant files, and provide a helper that waits out GitHub’s rate limit instead of failing.
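The wait-time calculation in make_github_request can be pulled into a small function that is easy to test without calling GitHub. The sketch below is one possible shape, not part of the tutorial’s code: the name seconds_to_wait is hypothetical, and it assumes the header behavior GitHub documents for its REST API (a Retry-After header on secondary rate limits, and x-ratelimit-remaining set to 0 together with x-ratelimit-reset on the primary limit).

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(status_code: int, headers: Mapping[str, str],
                    now: Optional[float] = None) -> Optional[float]:
    """Return how long to sleep before retrying, or None if this is not a rate limit."""
    if status_code not in (403, 429):
        return None

    # Normalise header names: a plain dict is case-sensitive, unlike the
    # headers object that requests returns.
    lowered = {key.lower(): value for key, value in headers.items()}
    now = time.time() if now is None else now

    # Secondary rate limits carry Retry-After (number of seconds to wait).
    retry_after = lowered.get("retry-after")
    if retry_after is not None and retry_after.isdigit():
        return float(retry_after)

    # Primary rate limit: no quota left, and the reset time is a Unix timestamp.
    if lowered.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in lowered:
        reset_at = int(lowered["x-ratelimit-reset"])
        return max(reset_at - now, 0.0) + 1.0  # same 1-second buffer as the tutorial code

    # A 403 for another reason (for example, missing permissions): do not retry.
    return None
```

A caller could sleep for the returned value and retry, and raise an exception when the result is None. Checking headers rather than searching the response body for "rate limit" also avoids a KeyError when X-RateLimit-Reset is absent.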
Step 3: Update routes.py
Add a new route to fetch and display repository contents.
Update app/routes.py to contain the following code:
from flask import Blueprint, jsonify, request
from .utils.docstring_generator import generate_docstring
from app.utils.github_api import fetch_repo_contents, filter_python_files, download_file_contents

main_bp = Blueprint('main', __name__)


@main_bp.route('/')
def home():
    return "Welcome to the AI-Powered Documentation Assistant!"


@main_bp.route('/generate-docstring', methods=['POST'])
def generate_docstring_route():
    code = request.json.get('code')
    if not code:
        return jsonify({"error": "No code provided"}), 400
    try:
        docstring = generate_docstring(code)
        return jsonify({"docstring": docstring})
    except Exception as e:
        return jsonify({"error": str(e)}), 500


@main_bp.route("/fetch-repo", methods=["POST"])
def fetch_repo():
    """
    Fetch and display Python files from a GitHub repository.
    """
    data = request.json
    owner = data.get("owner")
    repo = data.get("repo")
    try:
        contents = fetch_repo_contents(owner, repo)
        python_files = filter_python_files(contents)
        return jsonify({"python_files": python_files})
    except Exception as e:
        return jsonify({"error": str(e)}), 500
The fetch_repo() function retrieves Python files from a specified GitHub repository and returns them as a JSON response.
- Receive Request Data: The function reads the JSON request body into data and extracts the repository owner and name using data.get("owner") and data.get("repo"). Neither value is validated, so a missing field is passed to the GitHub request as None and surfaces as a 500 error.
- Fetch Repository Contents: It calls fetch_repo_contents(owner, repo) to retrieve the contents of the specified GitHub repository.
- Filter Python Files: It processes the fetched repository contents using filter_python_files(contents), which selects only files ending in .py.
- Return Response: If successful, the function returns a JSON response containing the list of Python files. If an error occurs, it catches the exception and returns an error message with a 500 status code.
This route lets the Flask app query the GitHub API on demand and list the Python files in a repository’s root directory.
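Since the route does not check its input, a missing owner or repo only shows up later as a failed GitHub request. One way to reject bad input earlier is a small validation step before calling fetch_repo_contents. The sketch below is an illustration under assumptions: validate_repo_request is a hypothetical helper, not something the tutorial defines, and the error messages are made up.

```python
from typing import Any, Optional, Tuple


def validate_repo_request(data: Any) -> Tuple[Optional[dict], Optional[str]]:
    """Return (params, None) when the body is usable, else (None, error message)."""
    if not isinstance(data, dict):
        return None, "Request body must be a JSON object"

    params = {}
    for field in ("owner", "repo"):
        value = data.get(field)
        # Reject missing keys, non-string values, and blank strings alike.
        if not isinstance(value, str) or not value.strip():
            return None, f"Missing or empty field: {field}"
        params[field] = value.strip()
    return params, None
```

Inside fetch_repo(), the body could be read with request.get_json(silent=True), which returns None instead of raising when the JSON is invalid, and the route could return the error message with a 400 status code when validation fails.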
Lesson 4: Parsing Code for Documentation
Objective
In this lesson, you’ll use Python’s Abstract Syntax Tree (AST) to analyze code, extract metadata, and handle common parsing errors.
Step 1: Update code_parser.py
Add logic to parse Python code and extract functions, classes, and metadata.
Update app/utils/code_parser.py to contain the following code:
import ast


def parse_code(code):
    tree = ast.parse(code)
    functions = [node for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)]
    classes = [node for node in ast.walk(tree) if isinstance(node, ast.ClassDef)]
    return {"functions": functions, "classes": classes}


def extract_functions_and_classes(code):
    """
    Extract functions and classes from Python code using AST.
    """
    tree = ast.parse(code)
    functions = [node for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)]
    classes = [node for node in ast.walk(tree) if isinstance(node, ast.ClassDef)]
    return functions, classes


def extract_function_signature(func_node):
    """
    Extract function signature (name, args, returns).
    """
    args = [arg.arg for arg in func_node.args.args]
    returns = ast.unparse(func_node.returns) if func_node.returns else None
    return {
        "name": func_node.name,
        "args": args,
        "returns": returns
    }


def extract_class_metadata(class_node):
    """
    Extract class metadata (name, methods, docstring).
    """
    methods = [node.name for node in ast.walk(class_node) if isinstance(node, ast.FunctionDef)]
    docstring = ast.get_docstring(class_node)
    return {
        "name": class_node.name,
        "methods": methods,
        "docstring": docstring
    }
This script analyzes Python code using the Abstract Syntax Tree (AST) module to extract functions, classes, and their metadata.
- Parse Code and Identify Functions and Classes
  - The parse_code(code) function parses the given Python code into an AST.
  - It walks the tree to collect function definitions (ast.FunctionDef) and class definitions (ast.ClassDef). Because ast.walk visits every node, the function list also includes methods and nested functions. Functions declared with async def are separate ast.AsyncFunctionDef nodes and are not collected.
  - It returns a dictionary containing lists of functions and classes.
- Extract Functions and Classes
  - The extract_functions_and_classes(code) function also parses the given Python code into an AST.
  - It extracts function and class definitions in the same way and returns them as two lists.
- Extract Function Signature
  - The extract_function_signature(func_node) function retrieves the function name, its arguments, and return type.
  - It extracts argument names from func_node.args.args, which holds the regular positional parameters. Positional-only parameters, keyword-only parameters, *args, and **kwargs are stored in other fields and are not included.
  - If the function has a return type annotation, it converts the annotation back to source text using ast.unparse(func_node.returns), which requires Python 3.9 or later.
  - It returns a dictionary containing the function name, argument list, and return type.
- Extract Class Metadata
  - The extract_class_metadata(class_node) function retrieves the class name, its methods, and its docstring.
  - It collects method names by walking through the class node and identifying FunctionDef nodes, so functions nested inside methods are listed too.
  - It extracts the class docstring using ast.get_docstring(class_node).
  - It returns a dictionary containing the class name, a list of methods, and the docstring.
This script extracts the structural information about functions and classes that the later documentation steps build on.
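To see how the other parameter kinds and async functions could be covered, here is a minimal sketch using only the standard ast module. The names describe_signature and direct_methods are hypothetical alternatives to the tutorial’s helpers, and the Cart class is an invented sample. It assumes Python 3.9 or later for ast.unparse.

```python
import ast


def describe_signature(func_node):
    """Collect every kind of parameter from a (possibly async) function node."""
    a = func_node.args
    # Positional-only parameters come first, then regular ones.
    params = [arg.arg for arg in a.posonlyargs + a.args]
    if a.vararg:
        params.append("*" + a.vararg.arg)
    params += [arg.arg for arg in a.kwonlyargs]
    if a.kwarg:
        params.append("**" + a.kwarg.arg)
    return {
        "name": func_node.name,
        "args": params,
        "returns": ast.unparse(func_node.returns) if func_node.returns else None,
        "is_async": isinstance(func_node, ast.AsyncFunctionDef),
    }


def direct_methods(class_node):
    """Names of functions defined directly in the class body (nested helpers excluded)."""
    return [
        node.name
        for node in class_node.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


# Invented sample source, used only to exercise the helpers.
source = '''
class Cart:
    def add(self, item, /, qty=1, *extras, notify=False, **meta) -> None:
        def helper():
            pass

    async def save(self):
        pass
'''

cart = ast.parse(source).body[0]
signatures = [describe_signature(node) for node in cart.body]
methods = direct_methods(cart)
print(methods)                # ['add', 'save']
print(signatures[0]["args"])  # ['self', 'item', 'qty', '*extras', 'notify', '**meta']
```

Iterating over class_node.body instead of calling ast.walk keeps helper out of the method list, at the cost of not descending into nested classes.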
Step 2: Update routes.py
Add a new route to parse and display metadata from a Python file.
Update app/routes.py to contain the following code:
from flask import Blueprint, jsonify, request
from .utils.docstring_generator import generate_docstring
from app.utils.github_api import fetch_repo_contents, filter_python_files, download_file_contents
from app.utils.code_parser import extract_functions_and_classes, extract_function_signature, extract_class_metadata

main_bp = Blueprint('main', __name__)


@main_bp.route('/')
def home():
    return "Welcome to the AI-Powered Documentation Assistant!"


@main_bp.route('/generate-docstring', methods=['POST'])
def generate_docstring_route():
    code = request.json.get('code')
    if not code:
        return jsonify({"error": "No code provided"}), 400
    try:
        docstring = generate_docstring(code)
        return jsonify({"docstring": docstring})
    except Exception as e:
        return jsonify({"error": str(e)}), 500


@main_bp.route("/fetch-repo", methods=["POST"])
def fetch_repo():
    """
    Fetch and display Python files from a GitHub repository.
    """
    data = request.json
    owner = data.get("owner")
    repo = data.get("repo")
    try:
        contents = fetch_repo_contents(owner, repo)
        python_files = filter_python_files(contents)
        return jsonify({"python_files": python_files})
    except Exception as e:
        return jsonify({"error": str(e)}), 500


@main_bp.route("/parse-file", methods=["POST"])
def parse_file():
    """
    Parse a Python file and extract metadata.
    """
    data = request.json
    download_url = data.get("download_url")
    try:
        code = download_file_contents(download_url)
        functions, classes = extract_functions_and_classes(code)
        function_metadata = [extract_function_signature(func) for func in functions]
        class_metadata = [extract_class_metadata(cls) for cls in classes]
        return jsonify({
            "functions": function_metadata,
            "classes": class_metadata
        })
    except Exception as e:
        return jsonify({"error": str(e)}), 500
The parse_file() function retrieves a Python file from a given URL, analyzes its contents, and extracts metadata about its functions and classes.
- Receive Request Data: The function reads the JSON request body and extracts the download_url parameter.
- Download the File: It calls download_file_contents(download_url) to fetch the raw content of the Python file.
- Extract Functions and Classes: It processes the file content using extract_functions_and_classes(code), which returns lists of function and class definitions.
- Generate Function Metadata: It iterates over the extracted functions and calls extract_function_signature(func) to retrieve each function’s name, arguments, and return type.
- Generate Class Metadata: It iterates over the extracted classes and calls extract_class_metadata(cls) to retrieve each class’s name, methods, and docstring.
- Return the Metadata: The function returns a JSON response containing the extracted function and class metadata. If an error occurs, including a SyntaxError raised by ast.parse for a file that is not valid Python 3, it catches the exception and returns an error message with a 500 status code.
This route automates the analysis of a Python file and returns the metadata needed for documentation or code analysis.
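The route above treats every failure the same way. Files that do not parse are common in real repositories (for example, Python 2 code with print statements), so it can help to report syntax errors separately from download failures. The following is a minimal sketch of that idea. The name parse_or_report and the shape of the error dictionary are assumptions for illustration, not part of the tutorial’s code.

```python
import ast


def parse_or_report(code):
    """Return (tree, None) on success or (None, error details) on a syntax error."""
    try:
        return ast.parse(code), None
    except SyntaxError as exc:
        # SyntaxError carries the parser's message and the offending line number.
        return None, {
            "error": "Could not parse file",
            "message": exc.msg,
            "line": exc.lineno,
        }


# A deliberately broken snippet and a valid one.
tree, problem = parse_or_report("def broken(:\n    pass\n")
ok_tree, no_problem = parse_or_report("def fine():\n    pass\n")
print(problem["line"])  # 1
```

A route could return the error dictionary with a 4xx status code, so the caller can tell that the file itself is the problem, not the server.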
Step 3: Functional Testing
Test the /fetch-repo Endpoint
- Start the Flask app:
python run.py
- Call the /fetch-repo endpoint:
curl -X POST http://127.0.0.1:5000/fetch-repo \
  -H "Content-Type: application/json" \
  -d '{"owner": "henrymbuguak", "repo": "Shopping-Cart-Using-Django-2.0-and-Python-3.6"}'
Expected output:
{
  "python_files": [
    {
      "_links": {
        "git": "https://api.github.com/repos/henrymbuguak/Shopping-Cart-Using-Django-2.0-and-Python-3.6/git/blobs/a4aada7faa6c7ff51f6d1ca34947525097b62c1d",
        "html": "https://github.com/henrymbuguak/Shopping-Cart-Using-Django-2.0-and-Python-3.6/blob/master/manage.py",
        "self": "https://api.github.com/repos/henrymbuguak/Shopping-Cart-Using-Django-2.0-and-Python-3.6/contents/manage.py?ref=master"
      },
      "download_url": "https://raw.githubusercontent.com/henrymbuguak/Shopping-Cart-Using-Django-2.0-and-Python-3.6/master/manage.py",
      "git_url": "https://api.github.com/repos/henrymbuguak/Shopping-Cart-Using-Django-2.0-and-Python-3.6/git/blobs/a4aada7faa6c7ff51f6d1ca34947525097b62c1d",
      "html_url": "https://github.com/henrymbuguak/Shopping-Cart-Using-Django-2.0-and-Python-3.6/blob/master/manage.py",
      "name": "manage.py",
      "path": "manage.py",
      "sha": "a4aada7faa6c7ff51f6d1ca34947525097b62c1d",
      "size": 538,
      "type": "file",
      "url": "https://api.github.com/repos/henrymbuguak/Shopping-Cart-Using-Django-2.0-and-Python-3.6/contents/manage.py?ref=master"
    }
  ]
}
This output lists the Python files found in the repository’s root directory. It contains metadata about a single file, manage.py.
- File Identified: The "python_files" key holds a list of detected Python files. In this case, the list contains one file: "manage.py". Files in subdirectories, such as the cart/views.py file used in the next test, do not appear because fetch_repo_contents requests only the root listing.
- File Metadata
  - "name": The file’s name is "manage.py", which is typically a Django project management script.
  - "path": The file is located at the repository’s root directory.
  - "size": The file is 538 bytes in size.
  - "sha": The SHA hash (a4aada7faa6c7ff51f6d1ca34947525097b62c1d) uniquely identifies this file’s content in Git.
- URLs for Accessing the File
  - "download_url": The direct URL to download the raw file content.
  - "html_url": The GitHub web interface link to view the file.
  - "git_url": The GitHub API link to retrieve the file’s Git object.
  - "url": The API URL to fetch file details. The same link appears as "self" under "_links".
  - "_links": A dictionary containing the reference links git, html, and self.
This output confirms that the script successfully fetched Python files from the repository and returned their details.
- Call the /parse-file endpoint:
curl -X POST http://127.0.0.1:5000/parse-file \
  -H "Content-Type: application/json" \
  -d '{"download_url": "https://raw.githubusercontent.com/muvatech/Shopping-Cart-Using-Django-2.0-and-Python-3.6/refs/heads/master/cart/views.py"}'
Expected output:
{
  "classes": [],
  "functions": [
    {
      "args": ["request", "product_id"],
      "name": "cart_add",
      "returns": null
    },
    {
      "args": ["request", "product_id"],
      "name": "cart_remove",
      "returns": null
    },
    {
      "args": ["request"],
      "name": "cart_detail",
      "returns": null
    }
  ]
}
This output shows the metadata extracted from the Python file: three functions and no classes.
- No Classes Found: The "classes" key contains an empty list ([]), indicating that the analyzed Python file does not define any classes.
- Extracted Functions: The "functions" key contains a list of dictionaries, each describing a function in the file.
- Function Details: Each function entry includes:
  - "name": The function’s name (e.g., "cart_add", "cart_remove", "cart_detail").
  - "args": A list of function parameters (e.g., "request", "product_id").
  - "returns": The function’s return type, which is null because none of the functions has a return annotation.
The parsed file contains three functions for managing a shopping cart. Since it is cart/views.py in a Django project and each function takes a request argument, they are most likely view functions.
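The two tests also show a gap: /fetch-repo returned only manage.py, while the parsed file lives at cart/views.py. Reaching files in subdirectories means following each entry whose "type" is "dir". The sketch below shows one way to do that. collect_python_files is a hypothetical helper, and list_dir stands for any function that returns the contents listing for a path (for example, a wrapper that requests the /contents/{path} URL). FAKE_REPO is an invented stand-in so the example runs without network access.

```python
from typing import Callable, Dict, List


def collect_python_files(list_dir: Callable[[str], List[Dict]], path: str = "") -> List[Dict]:
    """Depth-first walk that gathers every .py file entry at or below path."""
    found = []
    for entry in list_dir(path):
        if entry["type"] == "dir":
            # Recurse into the subdirectory using its repository-relative path.
            found.extend(collect_python_files(list_dir, entry["path"]))
        elif entry["type"] == "file" and entry["name"].endswith(".py"):
            found.append(entry)
    return found


# Invented stand-in for the GitHub API: maps a directory path to its listing.
FAKE_REPO = {
    "": [
        {"type": "file", "name": "manage.py", "path": "manage.py"},
        {"type": "file", "name": "README.md", "path": "README.md"},
        {"type": "dir", "name": "cart", "path": "cart"},
    ],
    "cart": [
        {"type": "file", "name": "views.py", "path": "cart/views.py"},
    ],
}

paths = [entry["path"] for entry in collect_python_files(FAKE_REPO.__getitem__)]
print(paths)  # ['manage.py', 'cart/views.py']
```

With the real API, this approach costs one request per directory, so routing those calls through make_github_request would matter for larger repositories.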
What You’ve Achieved
- You fetched and parsed a real-world repository.
- You built API endpoints to extract code and metadata.
- You tested the functionality to ensure it works as expected.
Full code for module 2
You can find the complete code for this tutorial in the GitHub repository.
Next Steps
- Proceed to Module 3: Learn to generate and improve docstrings using DeepSeek.
- Experiment: Fetch and parse more repositories to prepare for docstring generation.
- Join the Community: Share your progress and get feedback from other learners!