Thursday, March 31, 2011

How do I turn a table into a matrix?

Suppose I have a table in a text file like this:

  • A B 1
  • A C 2
  • A D 1
  • B A 3
  • C D 2
  • A E 1
  • E D 2
  • C B 2
  • . . .
  • . . .
  • . . .

I also have a list of symbols in another text file. I want to transform this table into a Perl data structure like:

  • _ A D E . . .
  • A 0 1 1 . . .
  • D 1 0 2 . . .
  • E 1 2 0 . . .
  • . . . . . . .

But I only need the selected symbols. For example, A, D and E are selected in the symbol file, but B and C are not.

From stackoverflow
  • Use an array for the first one and a two-dimensional hash for the second one. The first one should look roughly like:

    $list[0] # row 1 - the value is "A B 1"
    

    And the hash like:

    $hash{A}{A} # the intersection of A and A - the value is 0
    

    Figuring out how to implement a problem is about 75% of the mental battle for me. I'm not going to go into specifics about how to print the hash or the array, because that's easy and I'm also not entirely clear on how you want it printed or how much you want printed. But converting the array to the hash should look a bit like this:

    foreach (@list) {
      my ($letter1, $letter2, $value) = split(/ /);
      $hash{$letter1}{$letter2} = $value;
    }
    

    At least, I think that's what you're looking for. If you really want you could use a regular expression, but that's probably overkill for just extracting 3 values out of a string.

    EDIT: Of course, you could forgo the @list and just assemble the hash straight from the file. But that's your job to figure out, not mine.
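
    The "assemble the hash straight from the file" idea can be sketched roughly as follows. This is a minimal sketch, not the answerer's code: the helper name build_matrix, the in-memory filehandle used for the demo, the 0 default for missing pairs, and the choice to fill both $matrix{A}{D} and $matrix{D}{A} (the matrix in the question looks symmetric) are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: read "X Y value" rows from a filehandle and keep only
# rows whose two symbols are both in the selected list.
sub build_matrix {
    my ($fh, @selected) = @_;
    my %keep = map { $_ => 1 } @selected;

    # Start every selected pair at 0 so missing rows still print a value.
    my %matrix;
    for my $row (@selected) {
        $matrix{$row}{$_} = 0 for @selected;
    }

    while (my $line = <$fh>) {
        my ($x, $y, $value) = split ' ', $line;
        next unless defined $value;                  # skip blank or short lines
        next unless $keep{$x} && $keep{$y};          # skip unselected symbols
        $matrix{$x}{$y} = $matrix{$y}{$x} = $value;  # assumed symmetric
    }
    return %matrix;
}

# Demo on the sample rows from the question, read from an in-memory filehandle.
my $table = "A B 1\nA C 2\nA D 1\nB A 3\nC D 2\nA E 1\nE D 2\nC B 2\n";
open my $fh, '<', \$table or die "Cannot open in-memory table: $!";
my @selected = qw(A D E);
my %matrix   = build_matrix($fh, @selected);
close $fh;

# Print the header row, then one row per selected symbol.
print join(' ', '_', @selected), "\n";
for my $row (@selected) {
    print join(' ', $row, map { $matrix{$row}{$_} } @selected), "\n";
}
```

    With a real file, the in-memory open would be replaced by an ordinary open on the table's path, and @selected would be read from the symbol file.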

    daotoad : If the hash keys are simple enough, you can use a single hash if you concatenate the keys. So $hash{AD} holds 1. Used appropriately, this approach can simplify your code. Used inappropriately, it gets you weird bugs and ugly code. This technique should only be used for very simple hash keys.
    : I tried to use a hash, but there is a problem when I print out the values. My code is posted below (Answer 4). Would you please take a look? Thanks! -Debbie
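
    daotoad's single-hash idea might look something like the sketch below. The helper name pair_key and the ':' separator are assumptions added for illustration. With plain concatenation, the pairs "AB","C" and "A","BC" would both produce the key ABC, which is the sort of weird bug the comment warns about once symbols are longer than one character.

```perl
use strict;
use warnings;

# Hypothetical helper: build one flat key from two symbols.
# The separator keeps ("AB", "C") distinct from ("A", "BC").
sub pair_key {
    my ($x, $y) = @_;
    return join ':', $x, $y;
}

my %dist;
for my $row ("A D 1", "A E 1", "E D 2") {
    my ($x, $y, $value) = split ' ', $row;
    $dist{ pair_key($x, $y) } = $value;
}

# Look a pair up through the same helper that stored it.
my $a_to_d = $dist{ pair_key('A', 'D') };
print "A-D: $a_to_d\n";
```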
  • Another way to do this would be to make a two-dimensional array -

    my @fArray = ();
    ## Set the 0,0th element to "_"
    push @{$fArray[0]}, '_';
    
    ## Assuming that the first line is a character class of symbols to skip, e.g. [BC]
    chomp(my $skipExpr = <>);
    
    while(<>) {
        my ($xVar, $yVar, $val) = split;
    
        ## Skip this line if expression matches
        next if (/$skipExpr/);
    
        ## Check if these elements have already been added in your array
        checkExists($xVar);
        checkExists($yVar);
    
        ## Find their position
        my ($xPos, $yPos);
        for my $i (1..$#fArray) {
            $xPos = $i if ($fArray[0][$i] eq $xVar);
            $yPos = $i if ($fArray[0][$i] eq $yVar);
        }
    
        ## Set the value 
        $fArray[$xPos][$yPos] = $fArray[$yPos][$xPos] = $val;
    }
    
    ## Print array
    for my $i (0..$#fArray) {
        for my $j (0..$#{$fArray[$i]}) {
         print "$fArray[$i][$j]", " ";
        }
        print "\n";
    }
    
    sub checkExists {
        ## Checks if the corresponding array element exists,
        ## else creates and initialises it.
        my $nElem = shift;
        ## Count how many header cells already hold this element
        ## (0 means it has not been added yet).
        my $found = grep { $_ eq $nElem } @{$fArray[0]};
    
        if( $found == 0 ) {
         ## Create its corresponding column
         push @{$fArray[0]}, $nElem;
    
         ## and row entry.
         push @fArray, [$nElem];
    
         ## Get its array index
         my $newIndex = $#fArray;
    
         ## Initialise its corresponding column and rows with '_'
         ## this is done to enable easy output when printing the array
         for my $i (1..$#fArray) {
          $fArray[$newIndex][$i] = $fArray[$i][$newIndex] = '_';
         }
    
         ## Set the intersection cell value to 0
         $fArray[$newIndex][$newIndex] = 0;
        }
    }
    

    I am not too proud regarding the way I have handled references but bear with a beginner here (please leave your suggestions/changes in comments). The above mentioned hash method by Chris sounds a lot easier (not to mention a lot less typing).

    : If I save all required characters into an array @all_node, can I change "next if (/$skipExpr/);" into "next unless (/@all_node/);"?
    muteW : The array @all_node will be interpolated with the help of the list separator variable - $" which is set to a space ' ' by default. So it could work, but you'd need to set it to '|' before using the array in the regular expression, so that the elements become alternatives (setting it to '' would produce the single literal string ADE). local $" = '|'; next unless(/@all_node/);
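
    An alternative to relying on $" is to build the pattern explicitly with join. A minimal sketch, assuming the intent is to keep a row only when both of its symbols are in @all_node; the variable names $alt, $row_re, @rows and @kept are illustrative only.

```perl
use strict;
use warnings;

my @all_node = qw(A D E);

# Build an alternation such as A|D|E; quotemeta guards symbols that
# contain regex metacharacters.
my $alt = join '|', map { quotemeta } @all_node;

# A row is kept only if both of its first two fields are selected symbols.
my $row_re = qr/^\s*(?:$alt)\s+(?:$alt)\s+\S+/;

my @rows = ("A B 1", "A D 1", "C D 2", "E D 2");
my @kept = grep { $_ =~ $row_re } @rows;

print "$_\n" for @kept;
```

    Inside the while(<>) loop this would become something like "next unless $_ =~ $row_re;". For large symbol lists, a hash lookup on the two split fields is likely simpler than a regular expression.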
  • CPAN has a lot of potentially useful stuff. I use Data::Table for many purposes. Data::Pivot also looks promising, but I have never used it.

  • My source code is listed below.

    But the output file just looks like

    • _ A D E....
    • A
    • D
    • E

    I think there is some problem with returning the matrix value. Can anyone help me?

    Thanks!

    =============================

    require "nodes.dump"; # it's a required symbol list. ex: A, D, E

    my $ppi_file = "PPI_files.txt"; # it's the table. ex: A B 1

    my @all_node = sort keys %nodes;

    open FILE,">distance_matrix.txt";

    print FILE "\t";
    foreach ( @all_node ){
        print FILE "$_\t";
    } # print first line of distance matrix
    print FILE "\n";

    foreach my $a ( @all_node ){
        print FILE "$a\t";
        foreach my $b ( @all_node ){
            my $value = &search_distance_value($a,$b);
            print FILE "$value\t";
        }
        print FILE "\n";
    }

    close FILE;

    sub search_distance_value
    {
        my $a = $_[0];
        my $b = $_[1];
        my %ppi;

        open TABLE, "<$ppi_file"; #table: A B 1, A D 2,...

        while ( $line = <TABLE> ){
            chomp $line;
            my ( $node1, $node2, $dist ) = split /\s+/, $line;
            $ppi{$node1}{$node2} = $dist;
        }
        if ( ( $node1 = $a ) && ( $node2 = $b ) ){
            return $dist;
        }
    }

    Chris Lutz : In &search_distance_value, $node1 and $node2 are declared in the scope of the while() block, but you're reusing them after the while() block is over. Try moving the if() statement inside the while() loop for starters.
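
    Going one step further than the comment, one possible restructuring is to load the table into the hash once and then look each cell up in it, rather than reopening the file for every pair. This is only a sketch under assumptions: the table is inlined as a string instead of read from PPI_files.txt, load_ppi and distance are hypothetical names, pairs are looked up in either direction, and missing pairs come back as 0. It also avoids naming variables $a and $b, which Perl reserves for sort.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "X Y dist" lines into a hash of hashes.
sub load_ppi {
    my ($fh) = @_;
    my %ppi;
    while (my $line = <$fh>) {
        chomp $line;
        my ($node1, $node2, $dist) = split /\s+/, $line;
        next unless defined $dist;   # skip blank or short lines
        $ppi{$node1}{$node2} = $dist;
    }
    return \%ppi;
}

# Hypothetical helper: look a pair up in either direction, defaulting to 0.
# The exists checks avoid creating empty entries as a side effect.
sub distance {
    my ($ppi, $from, $to) = @_;
    return $ppi->{$from}{$to} if exists $ppi->{$from} && exists $ppi->{$from}{$to};
    return $ppi->{$to}{$from} if exists $ppi->{$to}   && exists $ppi->{$to}{$from};
    return 0;
}

# Demo table, read through an in-memory filehandle.
my $data = "A B 1\nA D 1\nA E 1\nE D 2\n";
open my $in, '<', \$data or die "Cannot open in-memory table: $!";
my $ppi = load_ppi($in);
close $in;

# Print a tab-separated header row, then one row per node.
my @all_node = qw(A D E);
print join("\t", '', @all_node), "\n";
for my $from (@all_node) {
    print join("\t", $from, map { distance($ppi, $from, $_) } @all_node), "\n";
}
```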
