#1 2010-10-03 05:31:29

tknoeller
Member
Registered: 2007-02-27
Posts: 17

IMDB parser failing

Hi Tian,

It looks like the IMDb parser is starting to fail on a couple of movies that I have tried recently.  I am still using v1.4.3, but I installed a fresh v1.6.1 to see whether an upgrade would help.  It did not.

The two movie examples I have are these.  (Don't judge. :))

    Bride Wars - http://www.imdb.com/title/tt0901476/
    Four Rooms - http://www.imdb.com/title/tt0113101/

Both movies get a title that includes the year and the IMDb name, both get the year filled in, and neither gets anything else.

    Four Rooms (1995) - IMDb
    Bride Wars (2009) - IMDb

If the fix is easy, could you backport it to 1.4.3?  My CentOS 5 box does not like the 1.6.1 noarch.rpm install, and a source install fails with a GTK2 tooltips error when clicking the "fetch" button while adding a new movie.  I am seriously thinking about moving to Ubuntu soon, which has a supported package.  But until I do, a backport to 1.4.3 would be really appreciated.

Please let me know if I can do any debugging to help figure out what the problem is.

Thanks!


#2 2010-10-07 23:25:19

vong
Member
Registered: 2008-09-24
Posts: 11

Re: IMDB parser failing

IMDb has recently updated their layout, so I think the parsing engine just needs to be updated.

The same thing is happening with the image fetcher (it no longer works).

Code:

<a href="/media/rm2863176704/tt0147800"><img src="http://ia.media-imdb.com/images/M/MV5BMTU0OTA1MzQ1OF5BMl5BanBnXkFtZTcwMzQzNzkxMQ@@._V1._SY314_CR4,0,214,314_.jpg"
     height="314" width="214" 
     alt="10 Things I Hate About You Poster"
     title="10 Things I Hate About You Poster" /></a>

Also, when a page is looked up and found, can it detect whether there was a redirect and update the link accordingly?

e.g. this search:

Back to the Future part III
search link: http://www.imdb.com/find?q=Back+to+the+ … t=on;mx=20
redirects to: http://www.imdb.com/title/tt0099088/

stored as: webPage="http://www.imdb.com/find?q=Back+to+the+Future+Part+III;tt=on;mx=20##IMDb"

As more movies with similar names are released, the search may no longer redirect automatically.
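
A minimal sketch of how a fetcher could record the post-redirect URL, assuming the page is fetched with Perl's LWP::UserAgent (which follows GET redirects by default). `final_url` is a hypothetical helper, not part of GCstar, and the response is built by hand here so the example runs offline.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Hypothetical helper: return the URL that actually produced the response.
# With LWP, $response->request is the last request made, i.e. the one
# issued after any redirects were followed.
sub final_url {
    my ($response) = @_;
    return $response->request->uri->as_string;
}

# Offline stand-in for the response a redirected search would end with.
my $response = HTTP::Response->new(200, 'OK');
$response->request(
    HTTP::Request->new(GET => 'http://www.imdb.com/title/tt0099088/')
);

print final_url($response), "\n";
```

With a live request, the response returned by `$ua->get($search_url)` could be passed to `final_url` in the same way, and the result stored instead of the search link.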

Last edited by vong (2010-10-07 23:29:08)


#3 2010-10-11 19:11:03

groms
New member
Registered: 2010-10-11
Posts: 5

Re: IMDB parser failing

Hi,

I rewrote the IMDb fetcher to work with the new design.
Additionally I fixed/added these features:

Multiple directors separated by comma
Multiple countries separated by comma
Correct URL in case of redirection
Fetches Original Title

Just copy GCImdb.pm into the lib/GCPlugins/GCfilms directory.


Attachments:
GCImdb.pm, Size: 12,306 bytes, Downloads: 926


#4 2010-10-12 14:28:53

vong
Member
Registered: 2008-09-24
Posts: 11

Re: IMDB parser failing

Thanks for this :)  It works like a charm.

[edit]

Found a glitch (not really a bug, more of a feature cleanup):

http://www.imdb.com/title/tt0473022/

This title has no image, but when you import it, the "no image" placeholder image is imported.

Here is an example of a movie with no image:

http://www.imdb.com/title/tt0177422/

Last edited by vong (2010-10-12 15:07:24)


#5 2010-10-12 17:29:54

groms
New member
Registered: 2010-10-11
Posts: 5

Re: IMDB parser failing

Thanks for the bug report.
I'm glad to hear it works for you :)

Here is the fixed version.


Attachments:
GCImdb.pm, Size: 12,411 bytes, Downloads: 791


#6 2010-10-12 18:22:35

vong
Member
Registered: 2008-09-24
Posts: 11

Re: IMDB parser failing

Glad to help (as a programmer myself, I know how valuable feedback and testing are... I just wish I had more free time to code).  This one works well :)

Thanks for programming it :)


#7 2010-10-12 20:13:21

groms
New member
Registered: 2010-10-11
Posts: 5

Re: IMDB parser failing

I have to admit I introduced a bug with descriptions in the previous version. Fixed it now.

Last edited by groms (2010-10-12 21:17:59)


Attachments:
GCImdb.pm, Size: 12,866 bytes, Downloads: 1,402


#8 2010-10-13 14:20:38

vong
Member
Registered: 2008-09-24
Posts: 11

Re: IMDB parser failing

What was the bug?  Just checking, because I used it a bunch and didn't notice it, but I will look through my collection in case something did happen (and then fix it).

Thanks again for the update.


#9 2010-10-13 15:07:19

Sam
New member
Registered: 2010-10-13
Posts: 2

Re: IMDB parser failing

Hi,

I have the same trouble with IMDb parsing in my PHP code. :)

But did you also have a problem with the language?
Today, for me, the title at http://www.imdb.com/title/tt0104257/ is in French.
OK, I'm French :)
But when parsing the page for US users while the server is in France, this is a problem. :/

Do you know how to select the title language?
I tried changing "Accept-Language" in the HTTP request, without success.
I hope they don't use the IP address to select the language...

But I may be off topic for GCstar, sorry :P

Samuel


#10 2010-10-13 20:36:05

groms
New member
Registered: 2010-10-11
Posts: 5

Re: IMDB parser failing

@vong: In some cases the description was not fetched at all.

@Sam: I didn't have any problems with the language, because the HTTP request is sent from the client's PC and not from a server as in your case.


#11 2010-10-14 14:16:07

Sam
New member
Registered: 2010-10-13
Posts: 2

Re: IMDB parser failing

Thanks, groms.

I found a solution using the user agent:
$user_agent = "Wget/1.12 (darwin10.2.0)";

And now the titles are in English ;)

Samuel


#12 2010-10-14 14:28:13

vong
Member
Registered: 2008-09-24
Posts: 11

Re: IMDB parser failing

@groms: Thanks, you are right.  It seems the plugin is taking the rating from IMDb and storing it in Rating, not Press Rating.

Simple change:
                $self->{curInfo}->{rating} = int($origtext + 0.5);

to
                $self->{curInfo}->{ratingpress} = int($origtext + 0.5);
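
For readers unfamiliar with the idiom, here is a small standalone sketch of what `int($origtext + 0.5)` does: it rounds a non-negative decimal rating to the nearest whole number before it is stored. `round_rating` and `%cur_info` are hypothetical names for illustration, not identifiers from the plugin.

```perl
use strict;
use warnings;

# Round a non-negative rating string such as "7.6" to the nearest integer.
# int() truncates toward zero, so adding 0.5 first gives round-half-up.
sub round_rating {
    my ($text) = @_;
    return int($text + 0.5);
}

# Hypothetical stand-in for the plugin's $self->{curInfo} hash.
my %cur_info;
$cur_info{ratingpress} = round_rating('7.6');

print "$cur_info{ratingpress}\n";    # prints 8
```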


#13 2010-10-14 17:04:23

groms
New member
Registered: 2010-10-11
Posts: 5

Re: IMDB parser failing

@vong: Why should the imdb rating be stored in the Press Rating? The previous imdb fetcher (the one that does not work with the new design) didn't do that either.


#14 2010-10-15 19:00:49

Sathors
New member
Registered: 2010-08-26
Posts: 4

Re: IMDB parser failing

Well, it doesn't work for me: I don't get any fields except the title and the year.
I copied your GCImdb.pm into /usr/share/lib/ but nothing changed.

Thanks for your help and good night.


#15 2010-10-16 18:51:51

tknoeller
Member
Registered: 2007-02-27
Posts: 17

Re: IMDB parser failing

@groms:  The version of the IMDb parser in post #7 works great.  Thank you for the fix.

@Tian:  Any chance of getting groms' version of the IMDb parser into the mainline code release, and perhaps added to the patch downloads for v1.4.3?

@Sathors: Double-check the directory you added it to.  For my release, it is under /usr/lib/, not /usr/share/lib/.  And make sure you set the same permissions as the previous file after moving the new file into place.

$ locate GCImdb.pm
/usr/lib/gcstar/GCPlugins/GCfilms/GCImdb.pm
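
If `locate` is not available, the same check can be sketched in Perl with core modules only. This is an illustration under assumptions: `find_plugin` and `install_plugin` are hypothetical helpers, the target file is assumed to exist already, and the directories in the commented usage are just the ones mentioned in this thread.

```perl
use strict;
use warnings;
use File::Find;
use File::Copy qw(copy);

# Return every file named GCImdb.pm below the given directories.
# Directories that do not exist are skipped.
sub find_plugin {
    my @roots = grep { -d } @_;
    my @hits;
    return @hits unless @roots;
    find(sub { push @hits, $File::Find::name if $_ eq 'GCImdb.pm' }, @roots);
    return @hits;
}

# Overwrite $target with $new_file and re-apply the old permission bits.
sub install_plugin {
    my ($new_file, $target) = @_;
    my $mode = (stat $target)[2] & 07777;
    copy($new_file, $target) or die "copy failed: $!";
    chmod $mode, $target or die "chmod failed: $!";
    return $target;
}

# Example usage (adjust the paths for your install):
# my ($target) = find_plugin('/usr/lib/gcstar', '/usr/share/gcstar/lib');
# install_plugin('GCImdb.pm', $target) if $target;
```

It is worth keeping a backup copy of the original GCImdb.pm before overwriting it.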


#16 2010-10-16 19:29:20

Sathors
New member
Registered: 2010-08-26
Posts: 4

Re: IMDB parser failing

Thank you, I was in the wrong directory ^^.


#17 2010-10-20 20:17:07

Tian
Administrator
From: France
Registered: 2006-12-08
Posts: 1647

Re: IMDB parser failing

Hello,

Thanks a lot for the fix, groms. It's in SVN and in the update repository for 1.6.1.

The rating was previously stored in {rating} because the {ratingpress} field didn't exist. Now that this field exists, I also think it is better to use it, so I changed the plugin that way.


#18 2011-02-09 11:02:45

macks
Member
Registered: 2010-04-15
Posts: 16

Re: IMDB parser failing

Thanks for the fix!
Works perfectly... except that sometimes I get the German title, not the original one,
e.g. for the movies "Lille Soldat" or "The Assassination of Jesse James by the Coward Robert Ford".

Thanks again!


#19 2011-02-24 04:24:12

sicklemoon
New member
Registered: 2011-02-24
Posts: 1

Re: IMDB parser failing

Thanks, groms, for updating the IMDb fetcher.  It's working for me, but I'd like to make some comments.  Would it be possible to retrieve the full cast list?  I tried fetching information for "The Next Three Days" and it did not include 'Liam Neeson' because his name is not in the initial cast overview list on the IMDb page.  Including the full cast list would make the Search function accurate.  Also, would it be possible for the IMDb fetcher to fill in the Language detail?  One more thing I noticed is that Genre retrieval is limited to 3 even though 4 different Genres are listed on the IMDb page.  Thanks again.

Last edited by sicklemoon (2011-02-24 04:26:03)


#20 2011-06-07 00:02:09

robertmf
Member
From: Telford, PA USA
Registered: 2011-06-06
Posts: 11

Re: IMDB parser failing

groms wrote:

I have to admit I introduced a bug with descriptions in the previous version. Fixed it now.

Using Ubuntu Linux 10.04 'lucid'; GCstar 1.5.0 from the Debian repository.

Overwriting the existing GCImdb.pm with this version works on Ubuntu.

Note that Ubuntu Linux uses
/usr/share/gcstar/lib/GCPlugins/GCfilms/ for the .pm files.

Last edited by robertmf (2011-06-10 13:43:04)


#21 2011-06-10 13:42:10

robertmf
Member
From: Telford, PA USA
Registered: 2011-06-06
Posts: 11

Re: IMDB parser failing

tknoeller wrote:

Hi Tien,

Looks like the IMDB is starting to fail to parse on a couple of movies that I have tried recently.  I am still using v1.4.3, but installed a fresh v1.6.1 to see if an upgrade would help.  It did not.

I'm using GCstar 1.5.0 on Ubuntu Linux 10.04 'lucid'.

With the updated GCImdb.pm I'm now getting the imdb.com movie info fields, with the exception of the SUMMARY field.  Apparently imdb.com has renamed this field to STORYLINE, which may be the problem?


#22 2011-10-07 15:47:57

caguiar
New member
Registered: 2011-10-07
Posts: 1

Re: IMDB parser failing

I have GCstar 1.6.2, and I have noticed that for at least the last few days the rating has not been updated.
It used to update the "Press Rating" field, but not any more.


#23 2011-10-08 10:39:01

macks
Member
Registered: 2010-04-15
Posts: 16

Re: IMDB parser failing

Hey!
The plugin is still fetching the "local" title for me...

Any hints why?


#24 2011-10-10 11:33:41

fandrew
New member
Registered: 2011-10-08
Posts: 2

Re: IMDB parser failing

IMDb tries to guess your locale, even from your IP address. At first I thought the only way to intervene was by logging in (AFAIK not possible with the current GCstar interface). However, there is an easy way: they respect the "Accept-Language" HTTP request header!

Modify "GCstar\lib\gcstar\GCPlugins\GCfilms\GCImdb.pm" by adding:

Code:

        $self->{ua}->default_header('Accept-Language' => 'en-US');

in "sub new" after "bless ($self, $class);" (line 352 in mine).
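
To show the placement in context, here is a self-contained sketch of a constructor with the header set right after `bless`. It assumes `$self->{ua}` is an LWP::UserAgent object; `Demo::Fetcher` is a hypothetical stand-in, not the real GCImdb.pm constructor.

```perl
use strict;
use warnings;
use LWP::UserAgent;

{
    # Hypothetical minimal plugin class, for illustration only.
    package Demo::Fetcher;

    sub new {
        my ($class) = @_;
        my $self = { ua => LWP::UserAgent->new };
        bless($self, $class);

        # Ask the site for English pages on every request this agent sends.
        $self->{ua}->default_header('Accept-Language' => 'en-US');

        return $self;
    }
}

my $fetcher = Demo::Fetcher->new;
print $fetcher->{ua}->default_header('Accept-Language'), "\n";
```

Because `default_header` applies to all later requests from that agent, the line only needs to be added once.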

Unfortunately custom options for plugins are not supported. Maybe we can have a "site language" field in "Preferences / Internet / Data import"? That would make sense.

Last edited by fandrew (2011-10-10 11:37:32)



