regex - Trying to match a partial name of a file in the text of the file + additional text


Hi, I'm trying to match a partial file name in the text of the file, plus some additional text.

Basically, I've got files named like this:

pieceiwanttomatch_don't_care_about_this.txt

and I'm trying to match the first 7 letters of the file name plus a string in the file, but I'm not having any luck.

Here's what I have so far:

    use strict;
    use warnings;
    use File::Path qw(make_path remove_tree);

    my $calls_dir = "ask/parsed/html/";
    opendir(my $search_dir, $calls_dir) or die "$!\n";
    my @files = grep /\.txt$/i, readdir $search_dir;
    closedir $search_dir;

    #print "got ", scalar @files, " files\n";

    #my %seen = ();
    foreach my $file (@files) {
        my %seen         = ();
        my $current_file = $calls_dir . $file;
        open my $FILE, '<', $current_file or die "$file: $!\n";

        while (<$FILE>) {

            #if (/phone/i) {
            chomp;

            #if (/phone\s*(.*)\r?$/i) {
            #if (/^phone\s*:\s*(.*)\r?$/i) {
            #if (/contact\s*(.*)\r?$/i) {
            #if (/^*(.*)team\s*(.*)\r?$/i) {

            print substr(${file}, 0, 7);

            if (/^(?=.* 'substr(${file}, 0, 7)')(?=.*management)/s) {

                $seen{$1} = 1;

                #print $file."\t"."$_\n";
                #open $fh, '>', "ask/parsed/html2/"."${file}.parsed_for_contact_us.txt" or die $!;

                make_path('ask/parsed/html2/');
                open my $fh, '>', "ask/parsed/html2/" . "${file}.parsed_for_management.txt" or die $!;
                #open $fh, '>', "$_"."result".".txt" or die $!;

                #$fh->print("$file\t$_\n");
                $fh->print("$_\n");
                print "$_\n";

                #print "\t";
                print "\n";
                print "\t";

                #print "$_\n";
                #print "\t";
                #print "\n";

                foreach my $addr (sort keys %seen) {

                }
            }
        }

        close $FILE;
    }

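Here's an example for people to look at:

I think this is an example of what I'm trying to do: I have a file named nintendo_ask_parse.html. I'm trying to use the string 'nintendo' from the file name, together with another string, 'game', to find a line in the file and print it to a file.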

Added 11-12-2014: here's more data, as requested by the few who have kindly been helping me so far. I'm running a first script that I wrote to pull URLs into files. Here's the script:

    use strict;
    use warnings;
    use LWP::Simple;

    my $link1 = "http://www.ask.com/web?q=";
    my $link2 = "+video+game&qsrc=0&o=0&l=dir&qo=homepagesearchbox";
    #my $link3 = "http://www.";
    #my $link4 = "http://www.manta.com/search?search_source=nav&pt=&search_location=burlingame+ca&search=";

    open (my $fh2, "untitled.txt") or die "could not open file";

    while (my $row = <$fh2>) {
        chomp $row;
        print "$row\n";
        my $xml1 = $link1 . $row . $link2;
        #my $xmla = $link3 . $row . ".com";
        #my $xmlx = $link4 . $row;
        mkdir 'ask', 0755;
        my $filename1 = "ask/" . ($row) . "_" . "ask" . ".html";
        open my $fh1, ">", $filename1 or die("could not open file. $!");

        print $row;
        my $xml2 = get $xml1;    # fetch the page with LWP::Simple
        print $xml1;
        print "\n";
        print $fh1 $xml2;
    }

=============================================================================

After the script runs, there are HTML files based on the number of entries in the untitled.txt file, one per entry.

I have 4 example files, named activision_ask.html, apple_ask.html, atari_ask.html and nintendo_ask.html, from running the script above. Here are the contents of one file, activision_ask.html:

     answers      q&a community      advanced search             images      news      first video game invented      video game design      wii      video game designer career      video game companies      spider-man 3 video game      video game walkthroughs      video game statistics      call of duty 4      more answers      amazon.com results activision        source      activision publishing, inc. american video game publisher. founded on october 1,      1979 , world's first independent developer , distributor of video games gaming   consoles. first products cartridges atari 2600 video console system published july 1980 market , august 1981 international market (uk). activision 1 of largest video game publishers in world , top publisher 2... read more » go to: ask encyclopedia · images · videos browse article: history · studios · notable games published · upcoming games · references · source: wikipedia related questions:      •      video game publisher of loom?      •      developing games activision , have done in past? hear  handheld versions of game different console versions. care enlighten us?      •      game created "activision" "atari 2600". 4 players play @ 1 time. 1 it?      view more q&a »       www.giantbomb.com/activision/3010-78/       oct 9, 2014 ... activision largest third-party publisher in world. became first third- party developer video game consoles, , responsible ...        explore more answers       source: www.kgbanswers.com       · privacy · terms · careers · ask blog · q&a · mobile · · feedback © 2014 ask.com      **truncated 

=============================================================================

There's a second script that pulls all of the links out of the HTML file above and puts them in a file. Here's the script:

=============================================================================

    use lib '/users/lialin/perl5/lib/perl5';
    use strict;
    use warnings;
    use feature 'say';
    use File::Slurp 'slurp';    # makes it easy to read files.
    use Mojo;
    use Mojo::UserAgent;
    use URI;
    use File::Path qw(make_path remove_tree);

    #my $html_file = shift @ARGV; # take file from command line

    my $calls_dir = "ask/";
    opendir(my $search_dir, $calls_dir) or die "$!\n";
    my @html_files = grep /\.html$/i, readdir $search_dir;
    closedir $search_dir;
    #print "got ", scalar @files, " files\n";

    #my %seen = ();
    foreach my $html_files (@html_files) {
        my %seen         = ();
        my $current_file = $calls_dir . $html_files;
        open my $FILE, '<', $current_file or die "$html_files: $!\n";

        my $dom = Mojo::DOM->new(scalar slurp $calls_dir . $html_files);
        print $calls_dir . $html_files;

        #for $csshref ($dom->find('a[href]')->attr('href')->each) {
        #for $link ($dom->find('a[href]')->attr('href')->each) {
        #  print $1;
        #say $1 #if $link->attr('href') =~ m{^https?://(.+?)/index\.php}s;
        make_path('ask/parsed/html/');
        open my $fh, '>', "ask/parsed/html/${html_files}.result.txt" or die $!;
        for my $csshref ($dom->find('a[href]')->attr('href')->each) {
            my $cssurl = URI->new($csshref)->abs($calls_dir . $html_files);

            #open $fh, '>', "ask/${html_files}.result.txt" or die $!;
            $fh->print("$html_files\n");
            $fh->print("$cssurl\n");
            #$fh->print("\t"."$_\n");
            #print "$cssurl\n";
            #print $file."\t"."$_\n";
        }
    }

====================================================

The resulting files look like this (using the activision example again):

=============================================================================

    activision_ask.html      http://www.ask.com/answers/browse?     qsrc=167&q=activision+video+game&qo=channelnavigation&o=0&l=dir      activision_ask.html      http://www.ask.com/answers/browse?qsrc=167&q=activision+video+game&o=0&l=dir#opensignin      activision_ask.html      http://www.ask.com/answers/profile?qsrc=3099      activision_ask.html      http://www.ask.com/answers/profile?qsrc=3099      activision_ask.html      javascript:void(0);      activision_ask.html      http://www.ask.com/advancedsearch?     qsrc=167&q=activision+video+game&qo=channelnavigation&o=0&l=dir      activision_ask.html      http://www.ask.com/?o=0&l=dir&qsrc=14137      activision_ask.html      http://www.ask.com/pictures?q=activision+video+game&qsrc=167&qo=channelnavigation&o=0&l=dir      activision_ask.html      http://www.ask.com/news?q=activision+video+game&qsrc=167&qo=channelnavigation&o=0&l=dir      activision_ask.html      http://www.ask.com/youtube?q=activision+video+game&qsrc=167&qo=channelnavigation&o=0&l=dir      activision_ask.html      http://www.ask.com/shopping?q=activision+video+game&qsrc=167&qo=channelnavigation&o=0&l=dir      activision_ask.html      javascript:void(0);      activision_ask.html      http://www.ask.com/maps?q=activision+video+game&qsrc=167&qo=channelnavigation&o=0&l=dir      activision_ask.html      javascript:void(0);      activision_ask.html      http://www.ask.com/web?q=video+game+cheats&qsrc=466&o=0&l=dir&qo=relatedsearchnarrow      activision_ask.html      http://www.ask.com/web?q=video+game+tester&qsrc=466&o=0&l=dir&qo=relatedsearchnarrow      activision_ask.html      http://www.ask.com/web?q=create+your+own+video+games&qsrc=466&o=0&l=dir&qo=relatedsearchnarrow      activision_ask.html      http://www.ask.com/web?q=first+video+game+invented&qsrc=466&o=0&l=dir&qo=relatedsearchnarrow      activision_ask.html      http://www.ask.com/web?q=video+game+design&qsrc=466&o=0&l=dir&qo=relatedsearchnarrow      activision_ask.html      
http://www.ask.com/web?q=wii&qsrc=466&o=0&l=dir&qo=relatedsearchexpand      activision_ask.html      http://www.ask.com/web?q=video+game+designer+career&qsrc=466&o=0&l=dir&qo=relatedsearchnarrow      activision_ask.html      http://www.ask.com/web?q=video+game+companies&qsrc=466&o=0&l=dir&qo=relatedsearchnarrow      activision_ask.html      http://www.ask.com/web?q=spider-man+3+video+game&qsrc=466&o=0&l=dir&qo=relatedsearchnarrow      activision_ask.html      http://www.ask.com/web?q=video+game+walkthroughs&qsrc=466&o=0&l=dir&qo=relatedsearchnarrow      activision_ask.html      http://www.ask.com/web?q=video+game+statistics&qsrc=466&o=0&l=dir&qo=relatedsearchnarrow      activision_ask.html      http://www.ask.com/web?q=call+of+duty+4&qsrc=466&o=0&l=dir&qo=relatedsearchexpand      activision_ask.html      http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3daps&field-     keywords=activision&x=0&y=0&tag=askcom05-20      activision_ask.html      http://www.amazon.com/activision-anthology-playstation-  2/dp/b00006z7hq%3fpsc%3d1%26subscriptionid%3d06kmpshedsxxqmqvt482%26tag%3daskcom05-20%26linkcode%3dxm2%26camp%3d2025%26creative%3d165953%26creativeasin%3db00006z7hq activision_ask.html http://www.amazon.com/activision-anthology-playstation-2/dp/b00006z7hq%3fpsc%3d1%26subscriptionid%3d06kmpshedsxxqmqvt482%26tag%3daskcom05-20%26linkcode%3dxm2%26camp%3d2025%26creative%3d165953%26creativeasin%3db00006z7hq      activision_ask.html      http://www.amazon.com/destiny-xbox-360/dp/b002i096q4%3fpsc%3d1%26subscriptionid%3d06kmpshedsxxqmqvt482%26tag%3daskcom05-20%26linkcode%3dxm2%26camp%3d2025%26creative%3d165953%26creativeasin%3db002i096q4      activision_ask.html      http://www.amazon.com/destiny-xbox-360/dp/b002i096q4%3fpsc%3d1%26subscriptionid%3d06kmpshedsxxqmqvt482%26tag%3daskcom05-20%26linkcode%3dxm2%26camp%3d2025%26creative%3d165953%26creativeasin%3db002i096q4      activision_ask.html      
http://www.amazon.com/skylanders-trap-team-not-machine-specific/dp/b00nca6zt0%3fpsc%3d1%26subscriptionid%3d06kmpshedsxxqmqvt482%26tag%3daskcom05-20%26linkcode%3dxm2%26camp%3d2025%26creative%3d165953%26creativeasin%3db00nca6zt0      activision_ask.html      http://www.amazon.com/skylanders-trap-team-not-machine-specific/dp/b00nca6zt0%3fpsc%3d1%26subscriptionid%3d06kmpshedsxxqmqvt482%26tag%3daskcom05-20%26linkcode%3dxm2%26camp%3d2025%26creative%3d165953%26creativeasin%3db00nca6zt0      activision_ask.html      http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3daps&field-keywords=activision&x=0&y=0&tag=askcom05-20      activision_ask.html      http://www.ask.com/wiki/activision      activision_ask.html      http://www.ask.com/wiki/activision      activision_ask.html      http://en.wikipedia.org/wiki/file:activision.svg      activision_ask.html      http://www.ask.com/allabout?q=video%20game%20publisher&qsrc=470      activision_ask.html      http://www.ask.com/allabout?q=video%20game%20console&qsrc=470      activision_ask.html      http://www.ask.com/allabout?q=atari%202600&qsrc=470      activision_ask.html      http://www.ask.com/wiki/activision      activision_ask.html      http://www.ask.com/wiki/activision#upcoming_games      activision_ask.html      http://www.ask.com/wiki/activision#references      activision_ask.html      http://en.wikipedia.org/wiki/activision      activision_ask.html      http://www.ask.com/web?q=who+was+the+video+game+publisher+of+loom%3f&qsrc=469&o=0&l=dir&qo=relatedquestions      activision_ask.html      http://www.ask.com/web?q=activision+video+game&qsrc=3060&o=0&l=dir      activision_ask.html      http://www.activision.com/      activision_ask.html      http://www.activision.com/games      activision_ask.html      http://clk.about.com?zi=13/1to&ity=boostorg&o=0&ldid=4451&eng=boost&zu=http://vgstrategies.about.com/od/gameboycheatscodes/a/activision-anthology.htm      http://www.gametrailers.com/company/pou3yf/activision      
activision_ask.html      http://www.cnbc.com/id/102026893      activision_ask.html      http://www.giantbomb.com/activision/3010-78/      activision_ask.html      http://www.ask.com/web?q=history+of+video+game+systems&qsrc=467&o=0&l=dir&qo=relatedsearchnarrow      activision_ask.html      http://www.ask.com/mobile?&o=0&l=dir&qsrc=0      activision_ask.html      http://help.ask.com      activision_ask.html      http://feedback.ask.com 

=============================================================================

I'm working on a final script that will use part of the name of the file, plus a string, to read a line or multiple lines from the file that contain matching or close-matching text.

In the above example I would be interested in 'http://www.activision.com/games', or any URL with the word 'activision' from the file name and the word 'game' in it.

My file names vary in size, and the word 'game' may come before or after the file name.

I hope this explanation and code help others understand what I'm trying to accomplish.

The problem I have right now is the regex command for searching for the strings. I'm working on making it less strict, and I can't get the matching to work properly.

As mentioned before, I'm pretty well versed in HTML and Java, but I know Perl is the right language for this. I'm not an expert in it (as you can see if you look at the code above), but I'm trying to learn and complete this task.

I'm not clear on what you want to do, but given your example file name

pieceiwanttomatch_don't_care_about_this.txt

suppose you want to find files whose first 7 characters are pieceiw and that end with .txt. Then you would write

if ( /^pieceiw.*\.txt$/ ) { ... }

I hope this helps


Update

Okay, I think you want to search all the .txt files in a directory for lines that contain both the first N characters of the file name and some other specified string.

If you don't know which will appear first (the file name prefix or the other string), then you are along the right lines with a double look-ahead. One refinement is to enclose the strings in \Q...\E, which escapes non-word characters and prevents regex metacharacters from messing up the pattern.
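The double look-ahead described above can be sketched roughly as follows. This is only an illustration, not the code from the question or the answer: the helper name line_has_both is hypothetical, and the sample lines are taken from the activision output shown earlier. Note that the prefix is computed into a variable first, because a function call such as substr(...) written inside a pattern is treated as literal text rather than being evaluated.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $line contains both strings, in either order.
# Each look-ahead starts at the beginning of the line, so order does not matter.
sub line_has_both {
    my ($line, $prefix, $wanted) = @_;
    return $line =~ / ^ (?= .* \Q$prefix\E ) (?= .* \Q$wanted\E ) /xs ? 1 : 0;
}

# First 7 characters of an example file name from the question
my $prefix = substr 'activision_ask.html.result.txt', 0, 7;

for my $line (
    'http://www.activision.com/games',
    'http://www.ask.com/web?q=activision+video+game&qsrc=3060&o=0&l=dir',
    'http://help.ask.com',
) {
    my $verdict = line_has_both($line, $prefix, 'game') ? 'match' : 'skip';
    printf "%-5s %s\n", $verdict, $line;
}
```

Because both strings go through \Q...\E, a prefix containing characters such as . or + is matched literally.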

Note the following

  • I've used autodie, as explained in my answer to your previous question. If you're running a version of Perl earlier than v5.10 and can't upgrade, then you won't be able to use it and will have to check the status of each file operation separately

  • It's important to use absolute paths for the directories; otherwise the user has to make sure they have the correct current working directory before running the program

  • I've put the parameters of the program (the two directories and the additional string to be searched for) in definitions at the top of the program

  • I've used glob instead of opendir / readdir / grep because it's tidier, and the file names it returns include the full path

    use strict;
    use warnings;
    use 5.010;
    use autodie;

    use File::Path qw/ make_path remove_tree /;
    use File::Basename qw/ fileparse /;

    my $calls_dir  = '/path/to/ask/parsed/html';
    my $parsed_dir = '/path/to/ask/parsed/html2';
    my $wanted     = 'game';

    my @files = glob "$calls_dir/*.txt";

    printf "got %d files\n", scalar @files;

    for my $file (@files) {

      open my $in_fh, '<', $file;

      # glob returns full paths, so take the prefix from the base name
      my ($basename) = fileparse($file);
      my $prefix = substr $basename, 0, 8;
      print $prefix, "\n";

      make_path($parsed_dir);
      open my $out_fh, '>', "$parsed_dir/${basename}_parsed_for_management.txt";

      while (<$in_fh>) {
        # matches lines where the prefix appears before the wanted string
        print $out_fh $_ if / \Q$prefix\E .* \Q$wanted\E /x;
      }

      close $out_fh;
    }
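Since glob returns names that include the directory, the prefix has to be taken from the base name of each file rather than from the full path. A minimal sketch of that step on its own, assuming a hypothetical helper called name_prefix and an example path built from the placeholder directory used above:

```perl
use strict;
use warnings;
use File::Basename qw/ fileparse /;

# Hypothetical helper: first $n characters of the file's base name.
sub name_prefix {
    my ($path, $n) = @_;
    my ($basename) = fileparse($path);    # first element is the file name without directories
    return substr $basename, 0, $n;
}

# The directory part is ignored; only the file name contributes to the prefix
print name_prefix('/path/to/ask/parsed/html/nintendo_ask.html.result.txt', 8), "\n";
```

The number of characters is passed in because the file names vary in length, so a fixed value may be too short for some names and too long for others.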

Update

This works fine

    my ($wanted, $prefix) = qw/ game nintendo /;

    for ( 'game.nintendo.com/phoenix.zhtml?c=121127&p=irol-gom' ) {
      print "ok\n" if / \Q$wanted\E .* \Q$prefix\E /x;
    }

Output

ok 
