#1 2012-02-20 22:27:44

Akovia
Member
Registered: 2012-01-30
Posts: 13

Plugin Help

I'm missing something fundamental and am hoping someone could help explain this a little.

I am trying to parse out a list of values from the following code. There could be many entries like this.

Code:

<td class="description">
    <a href="animetitle,181,aswrgy,ghost_in_the_sh.html"> Ghost in the Shell</a>
        <i>
            (1995) 
            <b>Anime</b>
        </i>
        <br/>
    <a href="animetitle,1068,sgszbd,ghost_in_the_sh.html"> Ghost in the Shell Stand Alone Complex 1st GIG</a>
        <i>
            (2002) 
            <b>Anime</b>
        </i>
        <br/>
...

I would like to capture the first two values together, e.g. "Ghost in the Shell (1995)". Using similar code I found in the same script, I can capture the first value, but I can't figure out how to capture the year along with it.

Fumbling around, I found I could add another $self->{insideRelatedxxx} = 1; statement inside the start routine, with a corresponding entry in the text routine that was identical in all but name, and each subsequent one would capture the next value.

i.e.
'related' => ',Ghost in the Shell,Ghost in the Shell Stand Alone Complex 1st GIG
'relatedyear' => 'Official Site,(1995),(2002)
'relatedtype' => 'http://www.innocence-movie.jp/,Anime,Anime

As you can see I'm somehow capturing the Official Site link as well, even though it doesn't match my regex.

Code:

sub start
elsif ($tagname eq 'a')
{
    if ($attr->{href} =~ /animetitle.*\.html$/)
    {
        $self->{insideRelated} = 1;
        $self->{insideRelatedYear} = 1;
        $self->{insideRelatedType} = 1;
    }
}

sub text
elsif ($self->{insideRelated})
{
    $origtext =~ s/^\s//;
    $self->{curInfo}->{related} .= $origtext.',';
    $self->{insideRelated} = 0;
}
elsif ($self->{insideRelatedYear})
.....

I am really missing something about the logic here. I thought I was setting three separate flags in the start routine, and that the same text would be passed to each branch of the text routine, where it could be parsed differently. (That's not what I really want to do, but it made me realize that I don't understand what's going on.) Instead, each additional flag seems to capture the text of the next tag.
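The behaviour described above can be reproduced outside GCstar with a small standalone sketch. It assumes the text routine is a single if/elsif chain whose branches are identical in all but name, as the excerpt suggests. The @events list and the bare $self hash are hypothetical stand-ins for what the real parser would pass to the plugin.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the plugin object; only the flag logic is modelled.
my $self = { curInfo => { related => '', relatedyear => '', relatedtype => '' } };

# Called once per opening tag.
sub start {
    my ($tagname, $attr) = @_;
    if ($tagname eq 'a' && ($attr->{href} // '') =~ /animetitle.*\.html$/) {
        # All three flags are raised by the same tag.
        $self->{insideRelated}     = 1;
        $self->{insideRelatedYear} = 1;
        $self->{insideRelatedType} = 1;
    }
}

# Called once per piece of text, whichever tag that text sits in.
sub text {
    my ($origtext) = @_;
    $origtext =~ s/^\s//;
    # An if/elsif chain runs only its first true branch, so every
    # text event uses up exactly one of the raised flags.
    if ($self->{insideRelated}) {
        $self->{curInfo}{related} .= $origtext . ',';
        $self->{insideRelated} = 0;
    }
    elsif ($self->{insideRelatedYear}) {
        $self->{curInfo}{relatedyear} .= $origtext . ',';
        $self->{insideRelatedYear} = 0;
    }
    elsif ($self->{insideRelatedType}) {
        $self->{curInfo}{relatedtype} .= $origtext . ',';
        $self->{insideRelatedType} = 0;
    }
}

# Hypothetical event stream for the first entry of the sample HTML.
my @events = (
    [ 'start', 'a', { href => 'animetitle,181,aswrgy,ghost_in_the_sh.html' } ],
    [ 'text',  ' Ghost in the Shell' ],
    [ 'start', 'i', {} ],
    [ 'text',  '(1995)' ],
    [ 'start', 'b', {} ],
    [ 'text',  'Anime' ],
);

for my $event (@events) {
    my ($type, @args) = @$event;
    if ($type eq 'start') { start(@args) } else { text(@args) }
}

print "$_ => $self->{curInfo}{$_}\n" for qw(related relatedyear relatedtype);
```

The three flags are not three views of the same text. They act as a queue: the first text after the link goes to related, the next to relatedyear, the one after that to relatedtype. A flag also stays raised until some text arrives, whatever tag that text belongs to. That may be how "Official Site" ended up in relatedyear, if an earlier matching link had no text of its own, but this is a guess since the surrounding HTML is not shown.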

So I'm asking what would be the proper way to capture the first 2 values together in a comma separated list, and if someone wants to take the time to explain the logic behind it, I'd be most grateful.

Thanks


#2 2012-02-21 01:15:18

Akovia
Member
Registered: 2012-01-30
Posts: 13

Re: Plugin Help

Well, I managed to hack my way to success. :P

Code:

sub start
elsif (($tagname eq 'a') && ($attr->{href} =~ m/animetitle,[0-9]*,[a-z]*,[a-z0-9_]*\.html/))
{
    $self->{insideRelated} = 1;
}

sub end
if ($tagname eq 'b')
{
    $self->{insideRelated} = 0;
}

sub text
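elsif ($self->{insideRelated})
{
    # Append every piece of text seen between the link and the closing </b>
    $self->{curInfo}->{related} .= $origtext;
    # On each line, keep everything up to the last "(YYYY)" and replace what follows with a comma
    $self->{curInfo}->{related} =~ s/(.*)(\(\d{4}\))(.*)/$1$2,/g;
    # Put a space between a word character and an opening parenthesis
    $self->{curInfo}->{related} =~ s/(\w)(\()/$1 $2/g;
    # Remove one leading whitespace character from the start of the string
    $self->{curInfo}->{related} =~ s/^\s//g;
    # Remove a single whitespace character after each comma
    $self->{curInfo}->{related} =~ s/\,\s/,/g;
    # Debug output: dump the whole plugin object
    print Data::Dumper->Dump([$self], [qw(self)]);
}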

'related' => 'Ghost in the Shell (1995),Ghost in the Shell Stand Alone Complex 1st GIG (2002),Ghost in the Shell Stand Alone Complex 2nd GIG (2003),Kokaku Kidotai S.A.C. Solid State Society (2006),Ghost in the Shell / Kokaku Kidotai 2.0 (2008),',

I'm sure there's a Perl coder somewhere rolling in his grave right now.

I understand it a little bit better, but it was still more trial and error than anything. I'd still appreciate anyone in the know that would be gracious/patient enough to guide me a little bit.
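One alternative to cleaning up the accumulated string with regexes is to track a small state per entry and build each "Title (Year)" item in one go. The sketch below is standalone and only an illustration. The state, title and related fields on $self, the tightened href pattern and the @events list are all assumptions, not GCstar's actual API.

```perl
use strict;
use warnings;

# Hypothetical plugin state: what we are waiting for, and the items found so far.
my $self = { state => '', title => '', related => [] };

sub start {
    my ($tagname, $attr) = @_;
    # A matching link starts a new entry; the next text is its title.
    if ($tagname eq 'a' && ($attr->{href} // '') =~ /animetitle,\d+,\w+,\w+\.html$/) {
        $self->{state} = 'title';
    }
}

sub text {
    my ($origtext) = @_;
    # Trim surrounding whitespace and skip whitespace-only text.
    (my $text = $origtext) =~ s/^\s+|\s+$//g;
    return if $text eq '';

    if ($self->{state} eq 'title') {
        $self->{title} = $text;
        $self->{state} = 'year';
    }
    elsif ($self->{state} eq 'year' && $text =~ /^(\(\d{4}\))/) {
        # Title and year are both known now, so store them as one item.
        push @{ $self->{related} }, "$self->{title} $1";
        $self->{state} = '';
    }
}

# Hypothetical event stream modelled on the sample HTML.
my @events = (
    [ 'start', 'a', { href => 'animetitle,181,aswrgy,ghost_in_the_sh.html' } ],
    [ 'text',  ' Ghost in the Shell' ],
    [ 'start', 'i', {} ],
    [ 'text',  "\n            (1995) \n            " ],
    [ 'start', 'b', {} ],
    [ 'text',  'Anime' ],
    [ 'start', 'br', {} ],
    [ 'start', 'a', { href => 'animetitle,1068,sgszbd,ghost_in_the_sh.html' } ],
    [ 'text',  ' Ghost in the Shell Stand Alone Complex 1st GIG' ],
    [ 'start', 'i', {} ],
    [ 'text',  "\n            (2002) \n            " ],
    [ 'start', 'b', {} ],
    [ 'text',  'Anime' ],
);

for my $event (@events) {
    my ($type, @args) = @$event;
    if ($type eq 'start') { start(@args) } else { text(@args) }
}

# Join once at the end, so there is no trailing comma to clean up.
my $related = join ',', @{ $self->{related} };
print "$related\n";
```

With the sample events this prints "Ghost in the Shell (1995),Ghost in the Shell Stand Alone Complex 1st GIG (2002)". Because the state is cleared as soon as the year is stored, text from unrelated links is ignored and no end routine is needed.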

Now I can go work on my first problem.
http://forums.gcstar.org/viewtopic.php?pid=8494#p8494
