C# - Regular expression for replacing extra characters between markers


Suppose I have the sample text below:

;&nbsp; </span>&lt;year&gt;<o:p></o:p> </span>&lt;</span><span style=3d'font-size:9.0pt;mso-bidi-font-family:arial'>manufacturer&gt;</span><span                  style=3d'mso-bidi-font-family:arial'> </span>&lt;model&gt;<o:p> </span>&lt;<span class=3dspelle>serial_number</span>&gt;<o:p> </span>&lt;<span class=3dspelle>accessories_value</span>&gt;<o:p></o:p></span> </span>&lt;<span class=3dspelle>accessories_list</span>&gt; p;&nbsp; </span>&lt;<span class=3dspelle>worldwide_yn</span>&gt; </span>&lt;</b><span class=3dspelle><span style=3d'mso-no-proof:yes'>pet_name</span></span><span style=3d'mso-   no-proof:yes'>&gt;</span><o:p></o:p></p> 

I am looking to find and replace every occurrence of the following pattern:

&lt; any_html_tags markers_text any_html_tags &gt;  

where:

html_tags: optional; they may be opening or closing tags, may occur zero or more times, and may be any HTML tag.

markers_text: can be in one of two formats, either xxxxx (any number of characters) or xxxx_xxxxxx (the text can be of any length).

For example, I want to be able to find the following texts in the sample file:

1) &lt;year&gt;
2) &lt;</span><span style=3d'font-size:9.0pt;mso-bidi-font-family:arial'>manufacturer&gt;
3) &lt;model&gt;
4) &lt;<span class=3dspelle>serial_number</span>&gt;
5) &lt;<span class=3dspelle>accessories_value</span>&gt;
6) &lt;<span class=3dspelle>accessories_list</span>&gt;
7) &lt;<span class=3dspelle>worldwide_yn</span>&gt;
8) &lt;</b><span class=3dspelle><span style=3d'mso-no-proof:yes'>pet_name</span></span><span style=3d'mso-no-proof:yes'>&gt;

and replace them with the corresponding items:

1) &lt;year&gt;
2) </span><span style=3d'font-size:9.0pt;mso-bidi-font-family:arial'>&lt;manufacturer&gt;
3) &lt;model&gt;
4) <span class=3dspelle></span>&lt;serial_number&gt;
5) <span class=3dspelle></span>&lt;accessories_value&gt;
6) <span class=3dspelle></span>&lt;accessories_list&gt;
7) <span class=3dspelle></span>&lt;worldwide_yn&gt;
8) </b><span class=3dspelle><span style=3d'mso-no-proof:yes'></span></span><span style=3d'mso-no-proof:yes'>&lt;pet_name&gt;

So between &lt; and &gt;, I want every tag except the marker text to be removed and placed before the &lt;. I am doing this with the C# Regex methods.

Can anyone please suggest a proper regular expression to achieve this?

The final result for the sample should look like:

;&nbsp; </span>&lt;year&gt;<o:p></o:p> </span></span><span style=3d'font-size:9.0pt;mso-bidi-font-family:arial'>&lt;manufacturer&gt;</span><span     style=3d'mso-bidi-font-family:arial'>  </span>&lt;model&gt;<o:p>  </span><span class=3dspelle></span>&lt;serial_number&gt;<o:p>  </span><span class=3dspelle></span>&lt;accessories_value&gt;<o:p></o:p></span>   </span><span class=3dspelle></span>&lt;accessories_list&gt;  p;&nbsp; </span><span class=3dspelle></span>&lt;worldwide_yn&gt; </b><span class=3dspelle><span style=3d'mso-no-proof:yes'></span></span><span style=3d'mso-no-  proof:yes'>&lt;pet_name&gt; 

This is the search/replace you are looking for:

Pattern:

&lt;((?:</?span[^>]*>)*)(\w+)((?:</?span[^>]*>)*)&gt; 

Replacement:

$1&lt;$2&gt;$3 
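As a minimal sketch of how the pattern and replacement above might be applied from C#, assuming the whole document is already in a string: the names MarkerCleaner and Clean are made up for this illustration. Note that with this replacement, tags that sat after the marker text (group 3) end up after &gt; rather than before &lt;.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkerCleaner
{
    // Group 1: span tags before the marker text, group 2: the marker text,
    // group 3: span tags after the marker text.
    private const string Pattern =
        @"&lt;((?:</?span[^>]*>)*)(\w+)((?:</?span[^>]*>)*)&gt;";

    // Hypothetical helper: rewrites every marker found in the input.
    public static string Clean(string html)
    {
        return Regex.Replace(html, Pattern, "$1&lt;$2&gt;$3");
    }

    public static void Main()
    {
        string input = "&lt;<span class=3dspelle>serial_number</span>&gt;";
        Console.WriteLine(Clean(input));
        // prints: <span class=3dspelle>&lt;serial_number&gt;</span>
    }
}
```

Because \w matches letters, digits and the underscore, both marker formats (xxxxx and xxxx_xxxxxx) are covered by the same \w+ group.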

Online demo (see the "context" tab)
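If the output has to match the expected results listed in the question exactly (every tag moved in front of &lt;, including non-span tags such as </b> in item 8), a variation along these lines might work. This is only a sketch and not part of the original answer: the generic tag sub-pattern <[^<>]+>, the replacement $1$3&lt;$2&gt;, and the names MarkerTagHoister and Hoist are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkerTagHoister
{
    // Assumption: any run of characters between < and > counts as a tag,
    // so </b>, <span ...> and <o:p> are all accepted, not only span.
    private const string Tag = @"<[^<>]+>";

    private static readonly Regex Marker = new Regex(
        @"&lt;((?:" + Tag + @")*)(\w+)((?:" + Tag + @")*)&gt;");

    // Hypothetical helper: emits the leading tags, then the trailing tags,
    // then the bare marker, so no tag is left between &lt; and &gt;.
    public static string Hoist(string html)
    {
        return Marker.Replace(html, "$1$3&lt;$2&gt;");
    }

    public static void Main()
    {
        string input = "&lt;<span class=3dspelle>serial_number</span>&gt;";
        Console.WriteLine(Hoist(input));
        // prints: <span class=3dspelle></span>&lt;serial_number&gt;
    }
}
```

Neither pattern allows whitespace between the tags inside a marker; if the real files contain that, the tag group would need an optional \s* as well.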

