#1 2010-06-18 07:50:52

tenbaht
Member
Registered: 2010-06-14
Posts: 22

current state of the GCfilm plugins

I am still in the process of evaluating and fixing the GCfilm plugins. The table below shows the current state of my findings. The plugins are grouped by language. For every language except Russian, at least one usable plugin is now available. The columns show whether the corresponding property is fetched.

Key to the symbols used in the columns:

'+': information available and correctly fetched.
'o': information not available on the website.
'(o)': too much information available, the plugin cannot automatically choose one.
'-': available, but not fetched. Needs work.

act (Actors):
+  actor names ok, role names not listed on the website
+- actor names ok, but role names are not fetched. Needs work.
++ actor and role names ok

pics (Pictures):
four characters, one each for small front, small back, big front, big back.

res (Result): overall state of the plugin from 0 (useless) to 10 (perfect)
0:    useless. Even the search does not work. Needs work.
3:    fetches a few properties, but not really useful. Needs work.
5:    the most important properties and pictures are ok. Needs work.
7:    useful, but room for further improvement. Maybe good enough, may need some work.
9:    virtually perfect. Only minor details are missing (age rating, not all possible columns in the search window etc.). Good for release.
10:    perfect. All available information is fetched and correctly presented. Good for release.

Code:

Plugin           lan tit org dat dir len cou age gen act syn rat pics  res checked  notes
---------------  --- --- --- --- --- --- --- --- --- --- --- --- ----  --- -------- -----
GCCsfd           CS   +   +   +   +   +   +   o   +   +   +   +  +ooo   10 17.06.10
GCAmazonDE       DE
GCOFDb           DE   +   +   +   +   o   +   o   +   +   +   +  +ooo   10 17.06.10
GCAllmovie       EN   +   o   +   +   +   +   +   +   o   +   +  +o+o   10+ 18.06.10
GCAmazon         EN
GCAniDB          EN   -   -   +   -   ?   o   o   -   -   -   -  -ooo    0 18.06.10
GCAnimeNfoA      EN   +   +   +   +   +   o   o   +   o   +   +  +ooo   10 18.06.10
GCDVDSpot        EN
GCFilmAffinityEN EN   +   +   +   +   +   +   o   +   +   +   +  +o+o   10 18.06.10 only result page 1 is listed
GCImdb           EN   +  (o)  +   +   +   +  (o)  +   ++  +   +  +o+o   10 17.06.10 always use the original title; for the correct age rating a country has to be configured.
GCMediadis       EN   +   +   +   +   +   +   o   +   +   +   +  +ooo   10 18.06.10
GCThemoviedb     EN   +   +   +   +   +   +   -   +   ++  +   +  ++++    9 17.06.10
GCAlpacineES     ES   +   +   +   +   +   +   o   +   +   +   +  +o-o    9 17.06.10
GCCulturalia     ES   +   +   +   +   +   +   -   +   +   +   o  +o+o    9 17.06.10
GCFilmAffinityES ES   +   +   +   +   +   +   o   +   +   +   +  +o+o   10 17.06.10 only result page 1 is listed
GCMetropoliES    ES                                                      0 17.06.10 search does not work.
GCAlapage        FR
GCAllocine       FR   +   +  (+)  +  (+)  +   -   +   ++  +   +  +o+o    9 17.06.10 Date and length in a strange format, should be converted.
GCAmazonFR       FR
GCAnimeka        FR
GCCinemaClock    FR                                                      0 search does not work.
GCCinemotions    FR
GCDVDFr          FR   +   +   +   +   +   +   +   +   +   +   o  +o+o   10 17.06.10
GCDVDPost        FR
GCMonsieurCinema FR
GCMovieClubFR    FR
GCMoviecovers    FR
GCOdeonHU        HU   +   +   +   +   +   +   o   +   +   +   +  +ooo    7 17.06.10 date is missing on the search page. Even date+original would be possible.
GCPortHU         HU   +   -   +   +   +   -   o   -   +   +   o  -ooo    3 17.06.10 Only result page 1 is listed. If a search is an exact match no result list is returned but the movie page - gcfilm cannot handle this.
GCFilmUP        *IT   +   +   +   +   +   +   o   +   +   +   o  +o-o    9
GCIbs            IT                                                      0 17.06.10 Website uses post requests now. Not working.
GCMovieClubNL    NL                                                      0
GCMovieMeter     NL   +   +   +   +   +   +   o   +   +   +   +  +o+o   10 17.06.10
GCAnimator       RU                                                      0
GCKinopoisk      RU   -   -   -   -   -   -   -   -   -   -   -  ----    0 search ok, movie page not.
GCNasheKino      RU                                                      0
GCFilmWeb        PL                                                      0
GCOnet           PL   +   +   +   +   +   +   +   +   +-  +   +  +o+o    9 17.06.10
GCStopklatka     PL                                                      0 17.06.10 never finds anything.
GCDicshop        SV   +   +   +   +   +   +   +   +   +   +   +  +o++   10 17.06.10
GCBeyazPerde     TR   +   +   +   +   +   +   o   +   ++  +   +  +ooo   10 17.06.10

It is still a work in progress. Is there a place where I can post a more detailed changelog of the work done? Not the global one, but a per-plugin changelog?

Offline

 

#2 2010-06-19 00:46:45

zombiepig
Moderator
Registered: 2007-08-30
Posts: 331

Re: current state of the GCfilm plugins

Thanks for this - I've made a wiki page at http://wiki.gcstar.org/en/plugin_status and copied this table into it. I added two columns to the table as well, referring to whether a plugin works ok with a drag-and-dropped url, and whether it's compatible with the new 'refresh' button.

For GCPortHU, you mention that if only one result is obtained the website directly returns the movie info. A few other plugins do that as well, e.g. IMDB. You'll see this part around line 174:

Code:

            if (($self->{inside}->{h1}) && ($origtext !~ m/IMDb\s*Title\s*Search/i))
            {
                $self->{parsingEnded} = 1;
                $self->{itemIdx} = 0;
                $self->{itemsList}[0]->{url} = $self->{loadedUrl};
            }

This basically tells gcstar that the page isn't a results page, and to treat it as the movie info page instead. For IMDB, this is achieved by checking the page heading to see if it doesn't include "Title Search".
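The same idea could be carried over to a plugin such as GCPortHU. The following is only a minimal sketch under assumptions: `looks_like_item_page` and `handle_page` are hypothetical helpers, not GCstar functions, and the "Search results" marker and the example.org url are made up. Only the three assignments to `parsingEnded`, `itemIdx` and `itemsList` are taken from the snippet above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of GCstar: decide whether a fetched page is a
# result list or already the item page. The marker pattern is plugin-specific.
sub looks_like_item_page
{
    my ($html, $resultsMarker) = @_;

    # A result list is recognised by its marker; anything else is treated
    # as the item page itself.
    return ($html =~ $resultsMarker) ? 0 : 1;
}

# Applies the same state changes as the IMDB snippet once an item page
# has been detected.
sub handle_page
{
    my ($self, $html, $resultsMarker) = @_;

    if (looks_like_item_page($html, $resultsMarker))
    {
        $self->{parsingEnded} = 1;
        $self->{itemIdx} = 0;
        $self->{itemsList}[0]->{url} = $self->{loadedUrl};
    }
    return $self;
}

# Usage with a stand-in plugin object (a plain hash, for illustration only)
my $plugin = { loadedUrl => 'http://example.org/movie/42', itemsList => [] };
handle_page($plugin, '<h1>Some Movie (2010)</h1>', qr/Search\s+results/i);
print "item url: $plugin->{itemsList}[0]->{url}\n";
```

In a real plugin the check would sit inside the parser callbacks, as in the IMDB code, and the marker would have to match whatever the site prints on its result list.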

I'm not sure if you've run across it yet, but http://wiki.gcstar.org/en/websites_plugins has some other useful info that might come in handy for you.

Thanks heaps for all this work you're doing, it's really appreciated and long overdue.

Offline

 

#3 2010-06-19 02:38:16

tenbaht
Member
Registered: 2010-06-14
Posts: 22

Re: current state of the GCfilm plugins

Thank you for this hint! I noticed this code in some plugins and was not sure about its purpose. So I should add it to the others as well. That means one more column in the big table. I will update it as I continue the work.

Which behaviour do you expect for the refresh button? That only empty fields are filled and existing values are not touched? This is missing in most of them: Next big issue.

And what does dnd support mean for the code? Which input and which expected result?

Answers to these questions would be worth adding to the plugin information page as it covers only the basics so far.

Offline

 

#4 2010-06-19 04:33:50

zombiepig
Moderator
Registered: 2007-08-30
Posts: 331

Re: current state of the GCfilm plugins

Ok - for drag and drop support to work correctly there are basically two parts that need to be fulfilled within the getItemUrl function.

Here's an example, from the boardgamegeek plugin:

Code:

    sub getItemUrl
    {
        my ($self, $url) = @_;
        
        if (!$url)
        {
            # If we're not passed a url, return a hint so that gcstar knows what type
            # of addresses this plugin handles
            $url = "http://www.boardgamegeek.com";
        }
        elsif (index($url,"xmlapi") < 0)
        {
            # Url isn't for the bgg api, so we need to find the game id
            # and return a url corresponding to the api page for this game       
            $url =~ /\/([0-9]+)[\/]*/;
            my $id = $1;
            $url = "http://www.boardgamegeek.com/xmlapi/boardgame/".$id;
        }
        return $url;
    }

So there are two parts to it. First, if getItemUrl is called without a url, the plugin needs to return a sample url showing the domain the plugin handles (in this case www.boardgamegeek.com). This lets gcstar determine which plugin is able to handle a url that is dragged and dropped.

The second part only applies sometimes, mostly to plugins that use an api, where the page with the details to parse differs from the page the user will drag and drop. In the example above, the bgg plugin checks whether the url is for the xmlapi. If it isn't, it extracts the game id from the url and returns the url of the actual page to parse.

If those two conditions are met, drag and drop should work fine. For scraped pages there is mostly nothing else required. E.g. for imdb, the function is only:

Code:

    sub getItemUrl
    {
        my ($self, $url) = @_;
        
        return $url if $url =~ /^http:/;
        return "http://www.imdb.com".$url;
    }

This is the main criterion for the "update" button to work (I'm going to change that string to "refresh", I think). It uses the same routines as the drag and drop code to change the stored item url to the url that the plugin needs to parse. So if drag-and-drop works, then refresh should work as well. That said, at the moment refresh is not working for any comic plugins, which is something I need to debug.

I'm after opinions on how refreshing information should function. Currently, it just grabs the information from the web and writes over anything the user has in the fetched fields. I'm toying with the idea of making it only change blank fields though. That way, if a user has manually changed information it won't get overwritten by hitting refresh, and if they wanted to grab the latest info from the site, they'd first clear out whatever fields they actually want to update. What do you think?

Offline

 

#5 2010-06-19 05:02:35

tenbaht
Member
Registered: 2010-06-14
Posts: 22

Re: current state of the GCfilm plugins

On updating/refreshing:

I think it should preserve manually entered/modified data. So overwriting the press rating is ok, all the rest is not. At least not without notice.

But what kind of notice? Maybe a window to choose which data to overwrite? Maybe useful, maybe annoying.

Or keep a 'modified' flag for every field? Cleared on automatically filled or manually deleted fields, set for every other manual change. You might think of the 'modified' flag as a kind of automatic write protection flag.
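To make the flag idea concrete, here is a rough sketch. It assumes a hypothetical data layout in which every field is stored as a small hash with `value` and `modified`; GCstar does not store items this way, and `set_by_user` and `set_by_plugin` are invented names.

```perl
use strict;
use warnings;

# Hypothetical layout: $item->{$field} = { value => ..., modified => 0|1 }

# Called when the user edits a field by hand.
sub set_by_user
{
    my ($item, $field, $value) = @_;
    $value = '' unless defined $value;

    # Manually deleting a value clears the flag, any other manual change sets it.
    my $isEmpty = ($value eq '');
    $item->{$field} = { value => $value, modified => $isEmpty ? 0 : 1 };
}

# Called when a plugin delivers a value. Returns 1 if the value was stored.
sub set_by_plugin
{
    my ($item, $field, $value) = @_;

    # Fields flagged as modified act as write protected.
    return 0 if $item->{$field} && $item->{$field}{modified};

    $item->{$field} = { value => $value, modified => 0 };
    return 1;
}

# Usage
my %item;
set_by_plugin(\%item, 'title', 'Metropolis');
set_by_user(\%item, 'title', 'Metropolis (restored)');
set_by_plugin(\%item, 'title', 'Metropolis');    # ignored, field is protected
print "$item{title}{value}\n";
```

The exception for the press rating is left out here; it would be one extra check in `set_by_plugin`.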

But what to do with existing database files after a program update? On reading an old database file for the first time, simply mark every non-empty field as write protected (except for the press rating) and handle the flag the proper way for new entries? That way a data refresh will behave differently for old entries than for new entries, which might be confusing for the user. Who will remember after a few years which entries were made before a particular software update and which after?

I like the modified flag, but I am not sure if all users will understand the concept. The simple rule "no overwriting except for press rating" will be easier to understand for the non-technical.

KISS - keep it simple, stupid.

Offline

 

#6 2010-06-19 05:16:14

zombiepig
Moderator
Registered: 2007-08-30
Posts: 331

Re: current state of the GCfilm plugins

Strangely enough, having a modified flag was my original idea. I decided against it after more reflection though. If it was a hidden flag, it could get really confusing for the user as to why some fields are being updated and some not. And I couldn't think of an elegant, simple way of exposing this flag to the user without overloading the UI. I think only overwriting blank fields is probably the best compromise, but I agree with your thoughts about press rating, so we'll make that field the one exception. It shouldn't be too hard for me to change it to this behaviour.
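The rule "only fill blank fields, except for the press rating" could look roughly like the sketch below. This is not the actual GCstar refresh code: `merge_refreshed` is a made-up function, items are simplified to flat hashes, and the field name `ratingpress` is an assumption.

```perl
use strict;
use warnings;

# Fields that a refresh may always overwrite. 'ratingpress' is an assumed name.
my %alwaysOverwrite = (ratingpress => 1);

# Returns a new hash ref: stored values win unless they are blank
# or the field is listed in %alwaysOverwrite.
sub merge_refreshed
{
    my ($stored, $fetched) = @_;
    my %result = %$stored;

    foreach my $field (keys %$fetched)
    {
        my $old = $stored->{$field};
        my $isBlank = (!defined $old || $old eq '');
        $result{$field} = $fetched->{$field}
            if $isBlank || $alwaysOverwrite{$field};
    }
    return \%result;
}

# Usage
my $merged = merge_refreshed(
    { title => 'My own title', synopsis => '',             ratingpress => 5 },
    { title => 'Site title',   synopsis => 'Fetched text', ratingpress => 8 },
);
print join(', ', map { "$_=$merged->{$_}" } sort keys %$merged), "\n";
```

Here the manually entered title survives, the empty synopsis is filled in and the press rating is replaced.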

Oh, one other thing I keep forgetting to tell you - don't worry about making any changes to any amazon plugins at the moment. I'm in the process of reworking them.

Offline

 

#7 2010-06-19 09:47:48

Fringale
Member
Registered: 2010-05-24
Posts: 15

Re: current state of the GCfilm plugins

zombiepig wrote:

Oh, one other thing I keep forgetting to tell you - don't worry about making any changes to any amazon plugins at the moment. I'm in the process of reworking them.

On that subject, do you intend to use the Amazon API, and are music plugins in the scope? So that I know if I should make further improvement to this. I pondered using the Amazon API as a more stable solution than HTML scraping, but since none of the available Amazon plugins worked that way and I'm not (yet) familiar with perl / GTK (as personal API keys would have to be stored in the options)... I chose not to.

If Amazon API is indeed the way to go, I'd gladly help with the music plugins.

Offline

 

#8 2010-06-19 10:42:47

zombiepig
Moderator
Registered: 2007-08-30
Posts: 331

Re: current state of the GCfilm plugins

Yeah, that's exactly what I'm trying to do. I'm sorry I didn't reply to your thread sooner, I was hoping to finish changing over the base amazon plugin code and then let you know how to update the music plugin to match. I'm just working out the details with Tian at the moment as to how best to approach this, and then it shouldn't take much longer to finish coding.

But once again, sorry for not replying sooner - your contributions are definitely appreciated.

Offline

 

#9 2010-06-19 11:17:58

zombiepig
Moderator
Registered: 2007-08-30
Posts: 331

Re: current state of the GCfilm plugins

Hey tenbaht - I noticed from the plugin list that you were having trouble with a site which requires cookies and fails on the first request. I hit this once before and got around it for that site with a slightly hacky call in getSearchUrl:

Code:

    sub getSearchUrl
    {
        my ($self, $word) = @_;
 
        # Grab the home page first, or the pages fetched are blank (who knows why... must be something funky with the website)
        my $response = $ua->get('http://www.comicbookdb.com/');

        return "http://www.comicbookdb.com/search.php?form_search=$word&form_searchtype=Title";
    }

Maybe something like that might help?

Offline

 

#10 2010-06-19 11:52:38

tenbaht
Member
Registered: 2010-06-14
Posts: 22

Re: current state of the GCfilm plugins

Ok. Ugly, but it would solve the problem. A cleaner way would be a multi pass search like this:

  - the normal search is done, returning the wrong page but setting the needed cookie.
  - parsing starts and realises that it is the wrong page. It sets a flag when parsing finishes.
  - main tries to evaluate the search results, but the flag shows that another pass is needed, so it starts over again.

The flag could be something like $self->{need_second_pass}=1 or a negative value for $self->{parsingEnded}.

Or, the most flexible approach, return a string value for $self->{parsingEnded} containing the new search URL.
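As a rough sketch of that last variant: the loop below is not GCstar's real main loop, only an illustration with hypothetical names (`search_with_passes`, `retryUrl`). Fetching and parsing are passed in as code refs and replaced by stubs, so nothing touches the network. The parser either returns the items or a url for another pass.

```perl
use strict;
use warnings;

# Hypothetical driver loop.
# $fetch : code ref, takes a url and returns the page text
# $parse : code ref, takes the page text and returns a hash ref with either
#          { items => [...] } or { retryUrl => '...' } to request another pass
sub search_with_passes
{
    my ($url, $fetch, $parse, $maxPasses) = @_;
    $maxPasses ||= 3;    # guard against endless retries

    for my $pass (1 .. $maxPasses)
    {
        my $result = $parse->($fetch->($url));
        return $result->{items} unless defined $result->{retryUrl};
        $url = $result->{retryUrl};
    }
    return [];    # gave up, no usable result page
}

# Stub site: the first request only "sets the cookie" and returns a blank page.
my $hasCookie = 0;
my $fetch = sub { return $hasCookie++ ? 'Movie A|Movie B' : '' };
my $parse = sub {
    my ($page) = @_;
    return { retryUrl => 'http://example.org/search?q=test' } if $page eq '';
    return { items => [ split /\|/, $page ] };
};

my $items = search_with_passes('http://example.org/search?q=test', $fetch, $parse);
print scalar(@$items), " results\n";
```

The pass limit is an addition of this sketch. Without it, a site that never delivers a proper page would keep the search looping.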

What do you think?

Offline

 


