How to use a regular expression in Python to extract the content between <p> tags in XML

Posted by a user on 2024-09-26 01:41

3 answers

Helpful user, 2024-10-02 20:23

# Code
import re

html_text = '''
<p>When ES cells differentiate, they migrate out from colonies on gelatin-coated dishes, similar to the ES cells on the 
<xref ref-type="bibr" rid="pone.0000015-Rogers1">[17]</xref> and <italic>nanog</italic> ,
,<xref ref-type="bibr" rid="pone.0000015-Chambers1">[19]</xref> well-known markers for undifferentiated ES cells. </p>
<p>(A) R1 cells were cultured for 5 days in the presence of 
<xref ref-type="bibr" rid="pone.0000015-Rogers1">[1]</xref> and <italic>nanog</italic> 
<xref ref-type="bibr" rid="pone.0000015-Mitsui1">[2]</xref>, <xref ref-type="bibr" rid="pone.0000015-Chambers1">[3]</xref> various doses of LIF (0–1,000 units/ml). </p>
'''

# Capture everything between <p> and </p>, non-greedily
pattern = r'<p>(.*?)</p>'
# Remove the newlines first, because "." does not match "\n" by default
html_text = re.sub('\n', '', html_text)
text = re.findall(pattern, html_text)
print(text)
# Output
['When ES cells differentiate, they migrate out from colonies on gelatin-coated dishes, similar to the ES cells on the <xref ref-type="bibr" rid="pone.0000015-Rogers1">[17]</xref> and <italic>nanog</italic> ,,<xref ref-type="bibr" rid="pone.0000015-Chambers1">[19]</xref> well-known markers for undifferentiated ES cells. ',
'(A) R1 cells were cultured for 5 days in the presence of <xref ref-type="bibr" rid="pone.0000015-Rogers1">[1]</xref> and <italic>nanog</italic> <xref ref-type="bibr" rid="pone.0000015-Mitsui1">[2]</xref>, <xref ref-type="bibr" rid="pone.0000015-Chambers1">[3]</xref> various doses of LIF (0–1,000 units/ml). ']
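
A variant of the same idea, as a rough sketch: instead of deleting the newlines first, the `re.DOTALL` flag lets `.` match across line breaks, and `<p\b[^>]*>` also accepts an opening tag that carries attributes. The sample string and the variable names are made up for illustration.

```python
import re

# Hypothetical sample: two <p> elements, one of them with an attribute
# and a line break inside its content.
xml_text = '''<sec>
<p>First paragraph with <italic>nanog</italic>.</p>
<p id="p2">Second paragraph,
split over two lines.</p>
</sec>'''

# <p\b[^>]*>  opening tag, with or without attributes (does not match <pre>)
# (.*?)       non-greedy capture of the content
# </p>        closing tag
# re.DOTALL lets "." match newline characters too.
pattern = re.compile(r'<p\b[^>]*>(.*?)</p>', re.DOTALL)

paragraphs = pattern.findall(xml_text)
for paragraph in paragraphs:
    print(repr(paragraph))
```

This still breaks on nested `<p>` elements, or on a `</p>` that appears inside a comment or CDATA section, because a regex cannot keep track of that kind of structure.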

Helpful user, 2024-10-02 20:16

I'd suggest parsing the XML directly with Python's BeautifulSoup; there is no need for regex matching at all!
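
A minimal sketch of what that could look like, assuming the third-party `beautifulsoup4` package is installed. The sample XML (including the `rid="ref1"` value) and the variable names are invented for illustration; `html.parser` is used only because it ships with Python.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical sample with nested inline tags inside <p>
xml_text = '''<sec>
<p>Cells express <italic>nanog</italic> <xref ref-type="bibr" rid="ref1">[1]</xref>.</p>
<p>R1 cells were cultured for 5 days.</p>
</sec>'''

# "html.parser" needs no extra dependency; "xml" can be passed instead
# when lxml is installed.
soup = BeautifulSoup(xml_text, "html.parser")

p_tags = soup.find_all("p")

# get_text() drops the nested tags and keeps only their text
plain = [p.get_text() for p in p_tags]
# decode_contents() returns the inner markup as a string, tags included
inner = [p.decode_contents() for p in p_tags]

print(plain)
print(inner)
```

`get_text()` fits when only the readable text matters, while `decode_contents()` is closer to what the regex answer returns, since it keeps the `<xref>` and `<italic>` markup.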

Helpful user, 2024-10-02 20:16

Wouldn't it be more convenient to read the XML directly with one of Python's libraries?
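
One possible reading of this suggestion, sketched with the standard-library `xml.etree.ElementTree` module. The answer does not say which library it has in mind, so this choice is an assumption, and the sample XML is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; ElementTree needs well-formed XML with one root element
xml_text = '''<sec>
<p>Cells express <italic>nanog</italic> <xref ref-type="bibr" rid="ref1">[1]</xref>.</p>
<p>R1 cells were cultured for 5 days.</p>
</sec>'''

root = ET.fromstring(xml_text)

# iter("p") walks every <p> element at any depth;
# itertext() yields the text of the element and of all its descendants.
texts = ["".join(p.itertext()) for p in root.iter("p")]

for t in texts:
    print(t)
```

A bare fragment without a single enclosing element, like the string in the first answer, would have to be wrapped in a root element before `ET.fromstring` accepts it.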
