Computers are unable to align an entire human genome

It's amazing how few people know this, but computers cannot align an entire human genome against another animal genome of similar size (say a lizard, which is a very distant relative of humans, or a mouse, which is a much closer relative than any lizard).

The human genome is about 3 gigabytes as text; the mouse genome is about 2.7 gigabytes (and has 21 distinct chromosomes, whereas the human genome has 24).

A computer cannot align two files that huge.

Alignment means this: say a human has the sequence AAATTGGAA and a mouse has AAATCGGAA. A computer tries to align them and finds that the fifth base changed from T to C (TT became TC), but otherwise the piece is identical.

Then it is written out in this format:
AAATTGGAA
AAATCGGAA
and the computer keeps building it until it reaches the full 3 gigabytes.
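
The column-by-column comparison in the example above can be sketched in C++. This is only a minimal sketch: it assumes the two pieces are already lined up and have equal length (no gaps), and `mismatch_positions` is a made-up name for illustration, not part of any real alignment program.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return the 0-based columns where two already-aligned sequences differ.
// Assumption: both sequences have the same length and contain no gaps.
std::vector<std::size_t> mismatch_positions(const std::string& a,
                                            const std::string& b) {
    assert(a.size() == b.size());
    std::vector<std::size_t> diffs;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) {
            diffs.push_back(i);
        }
    }
    return diffs;
}
```

Calling `mismatch_positions("AAATTGGAA", "AAATCGGAA")` returns a single position, index 4, which is the fifth base. A real aligner also has to decide where to insert gaps, which is the expensive part.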

The problem is that no computer is able to do it. They run out of memory, or if they don't, a task that was started in 2016 is still running with no end in sight.

However, the mitochondrial genome of animals is much smaller. In fact, almost the entire mitochondrial sequence written as text fits on one full screen at 1920x1080 resolution with font size 10 (assuming you hit Enter to start a new line whenever the text is about to reach Notepad's window width).
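
A rough back-of-the-envelope check of the memory claim, as a sketch. It assumes a naive dynamic-programming alignment that keeps one table cell per pair of bases (the post does not say which algorithm was used), one byte per cell, and a made-up speed of one billion cell updates per second. The names `dp_cells` and `years_needed` are hypothetical.

```cpp
#include <cstdint>

// Number of cells in a full dynamic-programming table for lengths n and m.
constexpr std::uint64_t dp_cells(std::uint64_t n, std::uint64_t m) {
    return n * m;
}

// Years needed to fill the table at a given (assumed) rate of cells per second.
constexpr double years_needed(std::uint64_t cells, double cells_per_second) {
    return static_cast<double>(cells) / cells_per_second / (365.25 * 24 * 3600);
}

// Approximate lengths from the post: 3 GB and 2.7 GB of one-letter-per-base text.
constexpr std::uint64_t kHumanBases = 3000000000ULL;
constexpr std::uint64_t kMouseBases = 2700000000ULL;

// 3e9 * 2.7e9 = 8.1e18 cells, which still fits in a 64-bit unsigned integer.
constexpr std::uint64_t kCells = dp_cells(kHumanBases, kMouseBases);

// At one byte per cell that is about 8.1 exabytes (1 EB = 1e18 bytes).
constexpr double kExabytes = static_cast<double>(kCells) / 1e18;
```

Under these assumptions `years_needed(kCells, 1e9)` comes out at a bit over 256 years, so the full table is out of reach both in memory and in time. The numbers say nothing about algorithms that avoid building the full table.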

  1. 2 months ago
    Anonymous

    Human and mouse mitochondrial genomes can be compared very easily, even with computers from the year 2000 (it takes many minutes on a 2000s computer, but on a modern computer it is merely seconds).
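
To show why the mitochondrial case is easy, here is a minimal sketch of a global alignment score (Needleman-Wunsch) that keeps only two rows of the table in memory. Animal mitochondrial genomes are roughly 16,000 to 17,000 bases, so the work is a few hundred million cell updates. The scoring values (match +1, mismatch -1, gap -2) and the name `global_alignment_score` are illustrative assumptions, not taken from any program mentioned in the thread, and the function returns only the score, not the aligned text.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Global alignment score using two rows of memory instead of the full table.
// Scoring parameters are illustrative defaults, not a recommendation.
int global_alignment_score(const std::string& a, const std::string& b,
                           int match = 1, int mismatch = -1, int gap = -2) {
    std::vector<int> prev(b.size() + 1), curr(b.size() + 1);
    // First row: aligning an empty prefix of a against j bases of b costs j gaps.
    for (std::size_t j = 0; j <= b.size(); ++j) {
        prev[j] = static_cast<int>(j) * gap;
    }
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = static_cast<int>(i) * gap;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            int up = prev[j] + gap;        // gap in b
            int left = curr[j - 1] + gap;  // gap in a
            curr[j] = std::max({diag, up, left});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

For the two nine-letter pieces from the opening post, `global_alignment_score("AAATTGGAA", "AAATCGGAA")` gives 7: eight matches and one mismatch. Memory grows with the length of one sequence only, but the running time still grows with the product of both lengths.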

  2. 2 months ago
    Anonymous

    Consequence of Man being created by God

    • 2 months ago
      Anonymous

      Here is just a small part of the source of the program used today:

      float Pevo_full[] =
      {
        //  -     H     E     C
        1.00, 0.00, 0.00, 0.00,
        0.00, 0.94, 0.00, 0.04,
        0.00, 0.00, 0.92, 0.04,
        0.00, 0.06, 0.08, 0.92
      };

      // psipred accuracy for confidence values 0-9
      const float p_acc[] = {0.00,0.47,0.53,0.56,0.58,0.62,0.69,0.74,0.82,0.88,0.96};

      /**
       * @brief Fill the global matrix P from a BLOSUM matrix stored as a lower triangle
       */
      void
      SetBlosumMatrix(const float BlosumXX[])
      {
        int a,b,n=0;
        if (v>=3) printf("Using the BLOSUM%2i matrix\n",par.matrix);
        // Read the lower triangle (b <= a) row by row
        for (a=0; a<20; ++a)
          for (pb[a]=0.0f, b=0; b<=a; ++b,++n)
            P[a][b] = BlosumXX[n];
        // Mirror it into the upper triangle so that P is symmetric
        for (a=0; a<19; a++)
          for (b=a+1; b<20; ++b)
            P[a][b] = P[b][a];
        // Index 20 is the extra symbol beyond the 20 amino acids
        for (a=0; a<20; ++a) P[a][20]=P[20][a]=1.0f;
        return;
      }

      /////////////////////////////////////////////////////////////////////////////////////
      /**
       * @brief Set (global variable) substitution matrix with derived matrices and background frequencies
       */
      void
      SetSubstitutionMatrix()
      {
        int a,b;
        switch (par.matrix)
          {
          default:
          case 0: //Gonnet matrix
            if (v>=3) cout<<"Using the Gonnet matrix ";
            for (a=0; a<20; ++a)
              for (pb[a]=0.0f, b=0; b<20; ++b)
                P[a][b] = 0.000001f*Gonnet[a*20+b];
            for (a=0; a<20; ++a) P[a][20]=P[20][a]=1.0f;
            break;

          case 30: //BLOSUM30
            SetBlosumMatrix(Blosum30);
            break;
          case 40: //BLOSUM40
            SetBlosumMatrix(Blosum40);
            break;
          case 50: //BLOSUM50
            SetBlosumMatrix(Blosum50);
            break;
          case 65: //BLOSUM65
            SetBlosumMatrix(Blosum65);
            break;
          case 80: //BLOSUM80
            SetBlosumMatrix(Blosum80);
            break;
          }
        // (the pasted excerpt ends here, before the end of the function)

      • 2 months ago
        Anonymous

        >Running an O(n^2) algorithm on a 3Gb dataset
        Gee why isn't it finishing.

        • 2 months ago
          Anonymous

          There are no better programs.

          I wish someone would write one in x86 assembly, though.

          That C++ program works fine when the dataset is much smaller than 3 GB.

  3. 2 months ago
    Anonymous

    The problem is this: if we only take little parts of the genome from here and there and compare a human piece to a mouse piece, chances are we don't get two pieces that were even doing the same thing in human and mouse. It's like comparing apples to oranges.

    What we actually need is the computing power to compare the entirety of 3 gigabytes to the entirety of 2.7 gigabytes.

    The animal tree of life you see on the internet is based on mitochondrial data only, because it's the only data computers can handle.

    • 2 months ago
      Anonymous

      >compare the entirety of 3 gigabytes to the entirety of 2.7 gigabytes

      That's too many flops to multiply bucko. It'll never happen, kiddo.

  4. 2 months ago
    Anonymous

    What if you use more than 3 + 2.7 GB of RAM?

    • 2 months ago
      Anonymous

      It doesn't work; 128 GB of RAM was not enough.

      Maybe if someone is able to write a program in x86 assembly that does these calculations, we might get somewhere with it.

  5. 2 months ago
    Anonymous

    >mouses
    mice
    >human has 24
    23

    • 2 months ago
      Anonymous

      Humans have 24 distinct chromosomes according to the Human Genome Project database: 22 of these are autosomes (non-sex chromosomes), plus X and Y.

  6. 2 months ago
    Anonymous

    Can't you use some kind of convolution or transform to make it smaller? The match has to be substantial anyway to avoid random matches, so why not shrink it in some way?

    • 2 months ago
      Anonymous

      It can be shrunk considerably in length: translate the entire 3 gigabytes of A, T, C and G into protein code.

      Then it becomes a 1 gigabyte file, because every 3 letters of nucleotides (ATCG) produce one letter of protein.

      But now we have a new problem: proteins have 20 different letters, so instead of 4 nucleotides we have to deal with 20 different letters.
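
The three-letters-to-one conversion can be sketched as follows. This is a hedged sketch using the standard genetic code packed into a 64-character string in T, C, A, G order; `base_index` and `translate` are hypothetical helper names, and the `frame` parameter (start offset 0, 1 or 2) is an assumption added for illustration. The reverse strand is not handled.

```cpp
#include <cstddef>
#include <string>

// Map a base to its index in T, C, A, G order; -1 for anything else (e.g. N).
int base_index(char c) {
    switch (c) {
        case 'T': return 0;
        case 'C': return 1;
        case 'A': return 2;
        case 'G': return 3;
        default:  return -1;
    }
}

// Translate DNA into one-letter protein code, starting at offset `frame`.
// '*' marks a stop codon, 'X' a codon that contains an unrecognised base.
// Trailing bases that do not fill a whole codon are ignored.
std::string translate(const std::string& dna, std::size_t frame = 0) {
    // Standard genetic code, indexed by 16*first + 4*second + third base.
    static const char kCode[] =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
    std::string protein;
    for (std::size_t i = frame; i + 3 <= dna.size(); i += 3) {
        int b1 = base_index(dna[i]);
        int b2 = base_index(dna[i + 1]);
        int b3 = base_index(dna[i + 2]);
        if (b1 < 0 || b2 < 0 || b3 < 0) {
            protein += 'X';
        } else {
            protein += kCode[16 * b1 + 4 * b2 + b3];
        }
    }
    return protein;
}
```

With the pieces from the opening post, `translate("AAATTGGAA")` gives `KLE` and `translate("AAATCGGAA")` gives `KSE`, so the single changed base shows up as one changed protein letter. Most of a genome does not code for protein, so translating everything this way is a simplification.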

      • 2 months ago
        Anonymous

        How would you know where to begin? You would have three possible offsets.

        What I had in mind was that you could convert, say, AAAT into 3xA 0xC 0xG 1xT (or use bigger windows): the same number of bits but fewer symbols. Then perhaps shrink it even more, and search for similar patterns.
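
The counting idea from this post could look roughly like the sketch below. It assumes non-overlapping windows of a fixed size; the names `Composition` and `window_composition` are made up for illustration.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Counts of A, C, G and T (in that order) inside one window.
using Composition = std::array<unsigned, 4>;

// Count bases in consecutive non-overlapping windows of `window` letters.
// A trailing partial window is dropped; other characters (e.g. N) are ignored.
std::vector<Composition> window_composition(const std::string& dna,
                                            std::size_t window) {
    std::vector<Composition> out;
    if (window == 0) {
        return out;
    }
    for (std::size_t start = 0; start + window <= dna.size(); start += window) {
        Composition counts{};  // all four counters start at zero
        for (std::size_t i = start; i < start + window; ++i) {
            switch (dna[i]) {
                case 'A': ++counts[0]; break;
                case 'C': ++counts[1]; break;
                case 'G': ++counts[2]; break;
                case 'T': ++counts[3]; break;
                default: break;
            }
        }
        out.push_back(counts);
    }
    return out;
}
```

For example, `window_composition("AAATTGGAA", 4)` yields two windows: `AAAT` becomes 3xA 0xC 0xG 1xT and `TGGA` becomes 1xA 0xC 2xG 1xT. The counts throw away the order of the letters, so `AAAT` and `TAAA` look identical, which means a hit found this way would still need checking against the real sequence.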

        • 2 months ago
          Anonymous

          This could work, but for some reason nobody has made such a program so far.

    • 2 months ago
      Anonymous

      Also, if you are contemplating something like "let's compare the human X chromosome to the mouse X chromosome only and forget about the rest for now", we run into another problem. Maybe we don't know with 100% certainty where the X chromosome begins and where it ends. There could also be hidden pieces of code elsewhere in the genome that belong to X, which the cell's DNA machinery gets right but humans haven't figured out how to access.

  7. 2 months ago
    Anonymous

    More efficient algorithms are needed
    instead of just wasting computing power.

    • 2 months ago
      Anonymous

      I agree
