How can I use a regular expression in Python to extract the content between <p> tags in XML?
Asked by: a user
Asked on: 2024-09-26 01:41
3 answers
Helpful user
Time: 2024-10-02 20:23
# Code
import re
html_text = '''
<p>When ES cells differentiate, they migrate out from colonies on gelatin-coated dishes, similar to the ES cells on the
<xref ref-type="bibr" rid="pone.0000015-Rogers1">[17]</xref> and <italic>nanog</italic> ,
,<xref ref-type="bibr" rid="pone.0000015-Chambers1">[19]</xref> well-known markers for undifferentiated ES cells. </p>
<p>(A) R1 cells were cultured for 5 days in the presence of
<xref ref-type="bibr" rid="pone.0000015-Rogers1">[1]</xref> and <italic>nanog</italic>
<xref ref-type="bibr" rid="pone.0000015-Mitsui1">[2]</xref>, <xref ref-type="bibr" rid="pone.0000015-Chambers1">[3]</xref> various doses of LIF (0–1,000 units/ml). </p>
'''
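# Non-greedy match of each <p>...</p> element, tags included
pattern = r'(<p>.*?</p>)'
# Remove newlines first, because '.' does not match '\n' by default
html_text = re.sub('\n', '', html_text)
text = re.findall(pattern, html_text)
print(text)
# Output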
['<p>When ES cells differentiate, they migrate out from colonies on gelatin-coated dishes, similar to the ES cells on the <xref ref-type="bibr" rid="pone.0000015-Rogers1">[17]</xref> and <italic>nanog</italic> ,,<xref ref-type="bibr" rid="pone.0000015-Chambers1">[19]</xref> well-known markers for undifferentiated ES cells. </p>',
'<p>(A) R1 cells were cultured for 5 days in the presence of <xref ref-type="bibr" rid="pone.0000015-Rogers1">[1]</xref> and <italic>nanog</italic> <xref ref-type="bibr" rid="pone.0000015-Mitsui1">[2]</xref>, <xref ref-type="bibr" rid="pone.0000015-Chambers1">[3]</xref> various doses of LIF (0–1,000 units/ml). </p>']
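Note that the pattern above returns each match together with its surrounding <p> tags. If only the text between the tags is wanted, as the question asks, one possible variant moves the capture group inside the tags and passes re.DOTALL so that the newlines need not be removed first. This is only a sketch: the `sample` string and the `extract_p_contents` helper are illustrative names that do not appear in the answer, and the pattern assumes <p> tags without attributes and without nesting.

```python
import re

# Hypothetical sample, shortened from the text in the answer above
sample = '''
<p>When ES cells differentiate, they migrate out
<xref ref-type="bibr" rid="pone.0000015-Rogers1">[17]</xref> and <italic>nanog</italic>.</p>
<p>(A) R1 cells were cultured for 5 days.</p>
'''

def extract_p_contents(xml_text):
    """Return the text between each <p> and </p>, without the tags themselves."""
    # The group sits inside the tags, so findall returns only the inner content.
    # re.DOTALL lets '.' match newlines, so multi-line paragraphs are found as-is.
    return re.findall(r'<p>(.*?)</p>', xml_text, flags=re.DOTALL)

contents = extract_p_contents(sample)
for item in contents:
    print(item)
```

Inline markup such as <xref> and <italic> stays in the returned strings, and the original line breaks are kept rather than stripped.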
Helpful user
Time: 2024-10-02 20:16
I'd suggest parsing the XML directly with Python's BeautifulSoup; there is no need for regex matching at all!
Helpful user
Time: 2024-10-02 20:16
Wouldn't it be more convenient to read the XML directly with one of Python's libraries?
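For reference, a minimal sketch of the parser-based approach suggested in the last two replies, using the standard-library xml.etree.ElementTree here rather than BeautifulSoup. It assumes the input is well-formed XML. The `fragment` sample and the <body> wrapper are illustrative additions: a fragment holding several <p> elements has no single root, so one is supplied before parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; wrapped in a <body> root because XML needs one root element
fragment = '''
<p>Markers include <italic>nanog</italic> <xref ref-type="bibr" rid="pone.0000015-Chambers1">[19]</xref>.</p>
<p>(A) R1 cells were cultured for 5 days.</p>
'''
root = ET.fromstring('<body>' + fragment + '</body>')

paragraphs = []
for p in root.iter('p'):
    # itertext() walks the element and its children, yielding text with tags dropped
    paragraphs.append(''.join(p.itertext()))

print(paragraphs)
```

Unlike the regex version, this drops the inline <italic> and <xref> tags and keeps only their text, which may or may not be what is wanted.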