The following is the procedure to extract the attributes from HTML tags.
Create a regular expression that finds the tag pattern in the string, so that it can be extracted or replaced.
- An HTML tag has the form <tag attribute="value">; let's divide it into patterns.
- Every tag starts with '<'.
- Next comes the tag name. Tag names defined in HTML are strings of the letters a to z; since HTML tag names are case-insensitive, we also allow A to Z, and for custom tags we allow the digits 0 to 9 as well. So the regular expression will be something like: ([a-zA-Z0-9]*).
- After the tag name we have either the attributes, or a space followed by '/' at the end of the tag (e.g. <br />), so the next regex will be something like ([^>]*), which means any characters other than '>'.
- Last comes the end of the tag, '>'.
- So the final regular expression is as follows: (<(\/?[a-zA-Z0-9]*)([^>]*?)>). The optional \/? before the tag name lets the same expression match closing tags such as </a>, and the lazy *? makes the attribute group stop at the first '>'.
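The steps above can be sketched in Python as follows. This is only an illustrative sketch under some assumptions: the helper name extract_tags and the second pattern ATTR_RE are not from the original procedure, and ATTR_RE assumes attributes are written as name="value" or name='value'. Unquoted values, and values that contain '>', are not handled.

```python
import re

# The pattern built in the steps above: group 1 is the whole tag, group 2 the
# tag name (with a leading '/' for closing tags), group 3 everything between
# the tag name and the closing '>'.
TAG_RE = re.compile(r"(<(\/?[a-zA-Z0-9]*)([^>]*?)>)")

# Assumption (not part of the original steps): attributes look like
# name="value" or name='value'.
ATTR_RE = re.compile(r"""([a-zA-Z_:][-\w:.]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def extract_tags(html):
    """Return a list of (tag_name, attributes_dict), one entry per tag found."""
    result = []
    for _whole, name, raw_attrs in TAG_RE.findall(html):
        attrs = {}
        for m in ATTR_RE.finditer(raw_attrs):
            # Only one of the two value groups matches; the other is None.
            value = m.group(2) if m.group(2) is not None else m.group(3)
            attrs[m.group(1)] = value
        result.append((name, attrs))
    return result


sample = '<a href="index.html" class=\'nav\'>Home</a><br />'
tags = extract_tags(sample)
print(tags)
```

For the sample string, the opening <a> tag yields its two attributes, while the closing </a> and the <br /> tag yield empty attribute dictionaries. A regular expression like this suits simple, well-formed snippets; for arbitrary HTML a real parser is usually the safer choice.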
You can test regular expressions for different languages here.