Max_Headroom
23rd June 2009, 02:31 AM
Hi. I'm sending BIIIIG thanx to CyberMatt for IHG and you all for supporting it ;)
At the moment I'm using it to download HQ-celeb-pics.
Unfortunately I ran into a host, "pic-upload. de", that serves each image under a temporary name and only gives the real filename in the HTML.
I tried to write a script that grabs the short temporary name and downloads the image, but saves it under the real filename found in the title attribute of the link around the image.
This is the source of the line in the HTML-file:
<a href=" http://www3.pic-upload. de/18.06.09/7ui9oz.jpg" rel="lightbox" title="Franziska-van-Almsick-Premiere-zum-Film-Cars-07.09.2006.jpg"><img src="http://www3.pic-upload. de/18.06.09/7ui9oz.jpg" width="750" height="1212.93800539" id="thepic" alt="Klicken, um in Original Größe zu sehen." border="1" style="border-color: #d4d6d7;border-width:1px;" /></a>
Original URL: http://www.pic-upload. de/view-2357935/Franziska-van-Almsick-Premiere-zum-Film-Cars-07.09.2006.jpg.html
The important part is the href attribute, because it contains the filename on the server. The real filename is in the title attribute.
Scanning through the hosts included with IHG, I found a retVal.fileName variable that seems to be used for this.
Is it possible to download the file using the href attribute and save it under the filename given in the title attribute? ;)
I must admit... I'm a big "n00b" in the regex-world :P
This is my script that causes a lot of headaches:
// URL Pattern: ^http:\/\/www\.pic-upload\. de\/view.+\/.+\.html
//
// Using ONLY "ID: thepic" works. But it saves the image using the "href"-tag.
// TODO: Grab the title attribute and use its value as the filename.
function(pageData, pageUrl) {
    var retVal = new Object();

    // Scan for picture-URL
    var sPattern = pageData.match(/\"(http:\/\/www[0-9]\.pic-upload\. de)\"/);

    // If not found
    if (!sPattern) {
        retVal.imgUrl = null;
        retVal.status = "ABORT";
    }
    else {
        retVal.imgUrl = sPattern[1];
        retVal.status = "OK";
    }

    // This is the ID-value for the site - used to locate the image
    var theId = "thepic";

    // Scan for filename and use it.
    // Else use a random filename.
    var imgs = pageData.match(/rel="lightbox" title="(.+?)".+?>/);
    try {
        retVal.fileName = imgs[1] + ".jpg";
    }
    catch(e) {
        retVal.fileName = Math.random(). toString(). substring(2) + ".jpg";
    }

    // Return array to ImageHost Grabber
    return retVal;
}
Sorry, I had to include some spaces after the dots to skip the posting-rules :P
When I only enter the ID "thepic" in the script field, it loads the image, along with all other images (e.g. signatures) hosted there, but saves it under the server's filename. When I use my script, it tries to load the image using the real filename, which of course fails :(
using the ID "thepic": http://server. org/img001_shortname.jpg
using the script: http://server. org/full_filename_in_the_title_tag_but_NOT_on_server.jpg
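For comparison, here is a minimal sketch of how the two values could be kept apart: the server URL from href goes into retVal.imgUrl, and the title goes into retVal.fileName. It only assumes what is visible in this post (the retVal.imgUrl, retVal.fileName and retVal.status fields and the HTML line quoted above). The function name picUploadHost is made up so the sketch can be run on its own; in the hosts file it would presumably be an anonymous function like the one above. The host name is written without the workaround spaces, and whether the page markup is always shaped like the quoted line is an assumption.

```javascript
// Hypothetical name; IHG host entries appear to use an anonymous function.
function picUploadHost(pageData, pageUrl) {
    var retVal = { imgUrl: null, fileName: null, status: "ABORT" };

    // One match over the lightbox link:
    //   group 1 = URL of the file on the server (href)
    //   group 2 = real filename (title)
    // \s* after the opening quote tolerates the leading space seen in the sample.
    var link = pageData.match(/<a\s+href="\s*(http:\/\/www\d*\.pic-upload\.de\/[^"\s]+)"\s+rel="lightbox"\s+title="([^"]+)"/);

    if (!link) {
        return retVal; // nothing found: leave status as "ABORT"
    }

    // Download from the server URL ...
    retVal.imgUrl = link[1];

    // ... but save under the title. The sample title already ends in ".jpg",
    // so the extension is appended only when no image extension is present.
    var title = link[2];
    retVal.fileName = /\.(jpe?g|png|gif)$/i.test(title) ? title : title + ".jpg";

    retVal.status = "OK";
    return retVal;
}

// Usage with a shortened copy of the HTML line from this post
var sample = '<a href=" http://www3.pic-upload.de/18.06.09/7ui9oz.jpg" rel="lightbox" ' +
    'title="Franziska-van-Almsick-Premiere-zum-Film-Cars-07.09.2006.jpg">' +
    '<img src="http://www3.pic-upload.de/18.06.09/7ui9oz.jpg" id="thepic" /></a>';
var result = picUploadHost(sample, "http://www.pic-upload.de/view-2357935/x.html");
console.log(result.imgUrl);
console.log(result.fileName);
```

The difference from the script above is that the URL pattern captures the whole path up to the closing quote instead of stopping at the host name, and the filename is never used to build the download URL.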
I bet I made some stuuuupid mistakes in the code, but maybe it serves as inspiration for a (hopefully) bugfree addition to the hosts-file :d
I need to get into JavaScript to fully understand what I did :whack0:
Previously I used WGet and some AutoIt scripts to automate the process of thread-sucking. But since I got IHG I switched over to this fantastic tool. Unfortunately my JavaScript skills aren't nearly as good as my AU3 or PureBasic skills ;) So I hope you can help me understand a bit more about scripting IHG, so maybe I can contribute some unknown hosts to the list.
Sincerely,
Max