How to Handle XML Data from a URL in Python

Source: 微信开发网 | Author: 灯下变量 | Title: Programmer

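Handling XML data served from a URL is a common task in Python development. The workflow usually has three core steps: send an HTTP request to fetch the XML content, parse the XML structure, and extract the target data. The standard library and a common third-party library cover all three.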

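Prerequisites

Two libraries do the work: requests, a third-party library for sending HTTP requests, and xml.etree.ElementTree, the XML parser that ships with Python's standard library. If requests is not installed yet, install it with pip: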

pip install requests

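Sending the Request and Fetching the XML

First, send an HTTP request to the URL and read the XML content it returns. The get function of requests does this in one call; the code also has to handle the case where the request fails.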

import requests

def get_xml_from_url(url):
    try:
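        # Send a GET request with a 10-second timeout
        response = requests.get(url, timeout=10)
        # Check whether the request succeeded; status code 200 means success
        if response.status_code == 200:
            # Return the response body as text, i.e. the XML data
            return response.text
        else:
            print(f"Request failed, status code: {response.status_code}")
            return None
    except requests.exceptions.RequestException as e:
        print(f"Request raised an exception: {e}")
        return None

# Example URL: a public sample XML address is used here; replace it with the target URL in real use
xml_url = "http://ipipp.com/sample.xml"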
xml_content = get_xml_from_url(xml_url)
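
The introduction notes that the standard library can handle this workflow as well. As a rough sketch under that assumption, the fetch step might look like the following with urllib.request instead of requests. The function name fetch_xml_bytes and the Accept header value are illustrative choices that do not appear in the original example, and the function returns raw bytes rather than decoded text.

```python
import urllib.request


def fetch_xml_bytes(url, timeout=10):
    """Return the raw response body as bytes, or None if the fetch fails."""
    try:
        # Hypothetical header value: a hint that XML is the preferred format
        request = urllib.request.Request(url, headers={"Accept": "application/xml"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except (OSError, ValueError) as e:
        # OSError covers URLError, HTTPError (4xx/5xx responses) and socket
        # timeouts; ValueError is raised for a malformed URL
        print(f"Request failed: {e}")
        return None
```

Unlike requests.get, urlopen raises an exception for 4xx and 5xx responses, so this sketch needs no explicit status-code check.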

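Parsing the XML

Once you have the XML content as a string, use xml.etree.ElementTree to parse it into an element tree you can work with. Watch the encoding: requests decodes response.text using the charset it detects from the HTTP headers, which is usually right. When it is not, you can set response.encoding yourself, or pass the raw bytes in response.content to the parser, which then follows the encoding declared in the XML itself.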

import xml.etree.ElementTree as ET

def parse_xml(xml_content):
    if not xml_content:
        return None
    try:
        # Parse the XML string into an element tree
        root = ET.fromstring(xml_content)
        return root
    except ET.ParseError as e:
        print(f"XML parsing failed: {e}")
        return None

# Parse the XML content fetched earlier
xml_root = parse_xml(xml_content)
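
To illustrate the encoding point, here is a minimal sketch of parsing raw bytes so that the parser reads the encoding from the XML declaration instead of relying on a decoded string. The payload and the name parse_xml_bytes are made up for this illustration; with requests, the bytes would typically come from response.content.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: a Latin-1 encoded document that declares its encoding
raw_bytes = '<?xml version="1.0" encoding="ISO-8859-1"?><note>café</note>'.encode("iso-8859-1")


def parse_xml_bytes(raw):
    """Parse raw XML bytes, letting the parser honor the declared encoding."""
    if not raw:
        return None
    try:
        return ET.fromstring(raw)
    except ET.ParseError as e:
        print(f"XML parsing failed: {e}")
        return None


note = parse_xml_bytes(raw_bytes)
print(note.text)
```

Decoding these same bytes as UTF-8 would fail, which is why handing the undecoded bytes to the parser can be the safer route when the HTTP headers carry no reliable charset.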

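Extracting Data from the XML

With the root element in hand, you can walk the tree and pull out the data you need using the element's methods. Common operations include finding child elements, reading an element's text, and reading its attributes.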

def extract_data_from_xml(root):
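    # Compare with None: an Element that has no children is falsy
    if root is None:
        return
    # Example: assume the XML looks like <bookstore><book><title>Title</title><price>Price</price></book></bookstore>
    # Find all book elements
    books = root.findall("book")
    for book in books:
        # Get the text of the title child element, with a fallback if it is missing
        title = book.findtext("title", default="Unknown title")
        # Get the text of the price child element, with a fallback if it is missing
        price = book.findtext("price", default="Unknown price")
        # Get the category attribute of the book element
        category = book.get("category", "Uncategorized")
        print(f"Category: {category}, title: {title}, price: {price}")

# Extract and print the data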
extract_data_from_xml(xml_root)
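
The extraction step can be tried without a network connection. The sketch below runs the same lookups against an inline sample and returns a list of dictionaries instead of printing. The sample document and the name books_to_records are invented for this illustration and only follow the bookstore structure assumed in the comment above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that follows the assumed <bookstore><book>... structure
SAMPLE_XML = """
<bookstore>
    <book category="web">
        <title>Learning XML</title>
        <price>39.95</price>
    </book>
    <book>
        <title>Untitled Draft</title>
    </book>
</bookstore>
""".strip()


def books_to_records(root):
    """Collect one dict per book element, with fallbacks for missing parts."""
    records = []
    for book in root.findall("book"):
        records.append({
            "category": book.get("category", "Uncategorized"),
            "title": book.findtext("title", default="Unknown title"),
            "price": book.findtext("price", default="Unknown price"),
        })
    return records


records = books_to_records(ET.fromstring(SAMPLE_XML))
for record in records:
    print(record)
```

Returning records rather than printing them keeps the extraction logic reusable; the second book shows the fallback values taking effect when an attribute or child element is absent.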

Complete Example

Putting the steps above together gives the complete flow for handling XML data from a URL:

import requests
import xml.etree.ElementTree as ET

def get_xml_from_url(url):
    try:
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            return response.text
        else:
            print(f"Request failed, status code: {response.status_code}")
            return None
    except requests.exceptions.RequestException as e:
        print(f"Request raised an exception: {e}")
        return None

def parse_xml(xml_content):
    if not xml_content:
        return None
    try:
        root = ET.fromstring(xml_content)
        return root
    except ET.ParseError as e:
        print(f"XML parsing failed: {e}")
        return None

def extract_data_from_xml(root):
    # Compare with None: an Element that has no children is falsy
    if root is None:
        return
    books = root.findall("book")
    for book in books:
        title = book.findtext("title", default="Unknown title")
        price = book.findtext("price", default="Unknown price")
        category = book.get("category", "Uncategorized")
        print(f"Category: {category}, title: {title}, price: {price}")

if __name__ == "__main__":
    # Replace with the actual URL of the XML data
    target_url = "http://ipipp.com/sample.xml"
    xml_content = get_xml_from_url(target_url)
    xml_root = parse_xml(xml_content)
    extract_data_from_xml(xml_root)

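Things to Watch For

Set a reasonable timeout on every request so that a slow server cannot block the program indefinitely.
Handle malformed XML during parsing so that a bad document does not crash the program.
If the XML is large, consider iterative parsing to reduce memory use.
Check the access requirements of the target URL; some URLs return content only when certain request headers are present.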
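
The iterative-parsing note can be sketched with ET.iterparse, which reports elements as the parser finishes them rather than after the whole document has been read. The in-memory sample and the name iter_titles are assumptions made for this illustration; the source is any file-like object.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: three book elements in an in-memory byte stream
SAMPLE = b"<bookstore>" + b"".join(
    b"<book><title>Book %d</title></book>" % i for i in range(3)
) + b"</bookstore>"


def iter_titles(source):
    """Yield each book title as soon as its element has been fully parsed."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "book":
            yield elem.findtext("title", default="Unknown title")
            # Drop the element's children and text once it has been processed
            elem.clear()


titles = list(iter_titles(io.BytesIO(SAMPLE)))
print(titles)
```

With requests, one option would be to request with stream=True and pass the response.raw file object as the source, though compressed responses may need extra handling in that case.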

Adding Request Headers

Some URLs return the correct XML only when specific request headers are present. Pass a headers argument to requests.get to handle this:

import requests

def get_xml_from_url_with_headers(url):
    # Present a browser-like User-Agent; some servers reject the default one
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }
    try:
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            return response.text
        else:
            print(f"Request failed, status code: {response.status_code}")
            return None
    except requests.exceptions.RequestException as e:
        print(f"Request raised an exception: {e}")
        return None

Tags: Python, XML, URL, requests, xml_etree_ElementTree
Last modified: 2026-06-14 11:21:24