fredand (Researcher)
5 November 2004 - 15:47. There are 4 comments and 1 solution.

How to use RegExp in Java?

Hello
I am trying to create a class that uses regexp in Java.
The purpose of the class is to get the substrings that match a pattern.

For example, if I have this string:
"Link to <A HREF=\"overview-summary.html\">Non-frame version.</A> </NOFRAMES> Link to <A HREF=\"index.html\">Non-frame version.</A> </NOFRAMES>"

I would like to get all the <A HR...</A> substrings:
<A HREF=\"overview-summary.html\">Non-frame version.</A>
<A HREF=\"index.html\">Non-frame version.</A>
In this case I would like to get 2 substrings, since the string contains 2 links.

But I cannot manage to create the correct pattern to search for.

I have tried with: "<A.*</A>"
But then I ended up with one substring, starting at the first <A HREF and ending at the last </A>.

So if anyone could help me out with this, it would be great.
Best regards
Fredrik

import java.io.*;
import java.util.*;
import java.util.regex.*;

public class TextManager
{
    public static int countAppearences(String in, String p)
    {
        Pattern pattern = Pattern.compile(p);
        Matcher matcher = pattern.matcher(in);

        int i = 0;
        while(matcher.find())
        {
            i++;
        }

        return i;
    }

    public static String[] getSubStringsForPattern(String in, String p)
    {
        String[] subStrings = new String[countAppearences(in, p)];

        int startIndex = 0;
        for(int i = 0; i < subStrings.length; i++)
        {
            Object[] objects = getSubStringForPattern(in, p, startIndex);
            subStrings[i] = (String)objects[0];
            startIndex = ((Integer)objects[1]).intValue();
        }

        return subStrings;
    }

    public static Object[] getSubStringForPattern(String in, String p, int fromIndex)
    {
        Object[] objects = new Object[2];
        Pattern pattern = Pattern.compile(p);
        Matcher matcher = pattern.matcher(in);
        matcher.find(fromIndex);
        int startIndex = matcher.start();
        String subString = matcher.group();

        //System.out.println("Start: " + startIndex + "  End: " + (startIndex + subString.length()));

        objects[0] = subString;
        objects[1] = Integer.valueOf(startIndex + subString.length());
        return objects;
    }

    public static void main(String[] args)
    {
        String searchString = "Link to <A HREF=\"overview-summary.html\">Non-frame version.</A> </NOFRAMES> Link to <A HREF=\"index.html\">Non-frame version.</A> </NOFRAMES>";

        int j = TextManager.countAppearences(searchString, "<A.*</A>");
        System.out.println("Count: " + j);

        String[] subStrings = TextManager.getSubStringsForPattern(searchString, "<A.*</A>");

        for(int i = 0; i < subStrings.length; i++)
        {
            System.out.println(subStrings[i]);
        }

    }
}
arne_v (Expert)
5 November 2004 - 15:59 #1
I have used the following regex for link parsing:

"(?:<a href=\")([^\"]*)(?:\">)(.*?)(?:</a>)"
arne_v (Expert)
5 November 2004 - 16:03 #2
import java.util.regex.*;

public class LinkParse {
    private static Pattern p = Pattern.compile("(?:<A HREF=\")([^\"]*)(?:\">)(.*?)(?:</A>)");

    public static void findLinks(String s) {
        Matcher m = p.matcher(s);
        while (m.find()) {
            System.out.println(m.group(1) + " " + m.group(2));
        }
    }

    public static void main(String[] args) throws Exception {
        String searchString = "Link to <A HREF=\"overview-summary.html\">Non-frame version.</A> </NOFRAMES> Link to <A HREF=\"index.html\">Non-frame version.</A> </NOFRAMES>";
        findLinks(searchString);
    }
}
arne_v (Expert)
5 November 2004 - 16:04 #3
Output:

overview-summary.html Non-frame version.
index.html Non-frame version.
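
A note on why this gives two separate matches while "<A.*</A>" gave one long one: `.*` is greedy and runs to the last `</A>` in the input, whereas `.*?` is reluctant and stops at the first `</A>` it can. The sketch below only illustrates that difference on the string from the question; the class name `GreedyVsReluctant` and the helper `countMatches` are made up for this example and are not part of the code posted above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Hypothetical helper: counts how many times the regex matches the input.
    public static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "Link to <A HREF=\"overview-summary.html\">Non-frame version.</A> </NOFRAMES> "
                 + "Link to <A HREF=\"index.html\">Non-frame version.</A> </NOFRAMES>";

        // Greedy: one match spanning from the first <A to the last </A>.
        System.out.println(countMatches("<A.*</A>", s));   // prints 1
        // Reluctant: each match ends at the nearest </A>.
        System.out.println(countMatches("<A.*?</A>", s));  // prints 2
    }
}
```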
fredand (Researcher)
5 November 2004 - 16:59 #4
Works perfectly!

Post it as an answer so I can reward you, mate!

/Fredrik
arne_v (Expert)
5 November 2004 - 18:05 #5
Here it comes.
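
For anyone who wants the complete <A ...>...</A> substrings, as the original question asked, rather than the href and the link text separately, a sketch along these lines might do. It assumes well-formed, non-nested links, which is fine for the sample string but not for HTML in general. The class name `LinkExtract`, the method `extractLinks` and the pattern itself are illustrative assumptions, not something posted in this thread. Java regex matching is case-sensitive by default, so the sketch passes `Pattern.CASE_INSENSITIVE` to accept both <A HREF=...> and <a href=...>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtract {
    // [^>]* cannot run past the end of the opening tag, and the reluctant .*?
    // stops at the nearest closing tag, so each match covers a single link.
    private static final Pattern LINK =
            Pattern.compile("<a\\s[^>]*>.*?</a>", Pattern.CASE_INSENSITIVE);

    // Returns every complete link element found in s, in order of appearance.
    public static List<String> extractLinks(String s) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(s);
        while (m.find()) {
            links.add(m.group()); // group() with no argument is the whole match
        }
        return links;
    }

    public static void main(String[] args) {
        String searchString = "Link to <A HREF=\"overview-summary.html\">Non-frame version.</A> </NOFRAMES> "
                            + "Link to <A HREF=\"index.html\">Non-frame version.</A> </NOFRAMES>";
        for (String link : extractLinks(searchString)) {
            System.out.println(link);
        }
    }
}
```

A single pass with `find()` also replaces the count-then-extract approach in the question, since the list grows as matches are found.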