Gene coding. Coding and implementation of biological information in the cell. Genetic code. DNA and protein code system. Use of knowledge in medicine and genetics

Nikitin A.V.

Challenges in Understanding the DNA Coding System


Yes, I must admit that I was wrong. Biologists do care about how DNA information is coded, and care a great deal. There is also a technocratic approach to this problem. It may not be exactly what I had hoped for, but... there is an interest in finding the truth. And that is the main point.

Petr Petrovich Garyaev sent me his latest monograph to study and think over, for which I thank him especially.

But along with new information, new questions arose. I will try to talk about some of them in this article.

Write down two, carry one...

We have already noted that triplets are not followed strictly during protein translation. P.P. Garyaev is exploring the same question. Here is the apparent contradiction:

“The accuracy of encoding amino acid sequences of proteins in this model strangely coexists with the double degeneracy of the proposed “code”: an excess of transfer RNAs (tRNAs) compared to the number of amino acids, and an ambiguous codon-anticodon correspondence, in which only two (and not three) nucleotides of an mRNA triplet need to pair precisely with the corresponding pair of nucleotides in the tRNA anticodon, while at the third nucleotide nature allows incorrect pairing, the so-called “wobble” (from the English word “wobble”, to sway), according to F. Crick’s hypothesis. This means that some anticodons can “recognize” more than one codon depending on which base is at the 1st position of the anticodon, corresponding to the 3rd nucleotide position of the codon, given their antiparallel complementary interaction. “Recognition” of this kind is “wrong” if we follow the paradigm of the genetic code, since non-canonical base pairs “Adenine-Guanine”, “Uracil-Cytosine” and others with energetically unfavorable hydrogen bonds arise. The “code,” especially the mitochondrial one, becomes so degenerate, and the logically following arbitrariness of the inclusion of amino acids in the peptide chain is so great, that the very concept of genetic coding seems to disappear.”


The question is posed:

“The accuracy of protein synthesis is evolutionarily conservative and high, but can it be achieved by this kind of “secret writing”, when the “sign” (codon) and the “designated” (amino acid) are not always isomorphic, not unambiguous? If we adhere to the old dogma of the genetic code, it is logical to think that two different amino acids, encrypted by two identical nucleotides of the mRNA codons (the third is not important), will be included in the peptide chain with equal probability, i.e. at random. And there are six such paired ambiguities even in the non-mitochondrial code, not counting two more at stop codons (also called “nonsense” or meaningless codons). So, is there an “indulgence of permission” for frequent and random amino acid substitutions during protein synthesis? However, it is known that such random substitutions in most cases have the most negative consequences for the body (sickle cell anemia, thalassemia, etc.). There is an obvious contradiction: accuracy (unambiguity) of the “sign-signified” (codon-amino acid) relationship is needed, but the code invented by people does not provide it.”

Explanation of the essence of the contradictions and the proposed solution:

“It can be seen that pairs of different amino acids are encrypted by identical significant doublets of codon nucleotides (the “wobbling” nucleotides, of little significance according to Crick and not read at all according to Lagerkvist, are moved to the subscript). In linguistic terms, this phenomenon is called homonymy, when the same words have different meanings (for example, the Russian words “bow”, “braid” or the English “box”, “ring”, etc.). On the other hand, redundant different codons designating the same amino acids have long been considered synonymous.”

“...For greater clarity, we present the table of the genetic code given by Lagerkvist and rearranged by him into codon families, focusing on the first two working nucleotides:

Table 1 shows that one and the same amino acid can be encoded by a whole family of four codons. For example, the four codons of the CU family encode leucine. The four of the GU family encode valine, UC – serine, CC – proline, AC – threonine, GC – alanine, CG – arginine, GG – glycine. This is a fact of degeneracy lying on the surface and noticed immediately, i.e. the information redundancy of the code. If we borrow the concepts and terms of linguistics for the protein code, which has long been universally and easily accepted, then the degeneracy of the code can be understood as synonymy. This was also unanimously adopted. In other words, the same object, for example, an amino acid, has several codes - codons. Synonymy does not pose any danger to the accuracy of protein biosynthesis. On the contrary, such redundancy is good because it increases the reliability of the translational ribosomal “machine”.”

I added a little color variation to the table to make it clear what we're talking about. Synonymous fours are highlighted in yellow. There are 8 such fours in total. Homonymous fours had to be divided into three categories, according to the degree of diversity. Further:

“... However, Table 1 also shows another, fundamental, genolinguistic phenomenon, seemingly unnoticed or ignored. This phenomenon is revealed in the fact that in some codon families, four codons, or more precisely, their significant identical pairs of nucleotides, encrypt not one, but two different amino acids, as well as stop codons. Thus, the doublet UU family encodes phenylalanine and leucine, AU – isoleucine and methionine, UA – tyrosine, Och and Amb stop codons, CA – histidine and glutamine, AA – asparagine and lysine, GA – aspartic and glutamic acids, UG – cysteine, tryptophan and Umb stop codon, AG – serine and arginine. Continuing linguistic analogies, let's call this phenomenon HOMONYMY of the first two coding nucleotides in some codon families.

Unlike synonymy, homonymy is potentially dangerous, as Lagerkvist noted, although he did not introduce the term-concept of “homonymy” as applied to the protein code. This situation, it seems, should really lead to ambiguity in the coding of amino acids and stop signals: the same codon doublet, within some families identified by Lagerkvist, encodes two different amino acids or different stop signals.

It is fundamentally important to understand: if code synonymy is a blessing (excess information), then homonymy is a potential evil (uncertainty, ambiguity of information). But this is an imaginary evil, since the protein synthesizing apparatus easily bypasses this difficulty, which will be discussed below. If you automatically follow the table (model) of the genetic code, then evil becomes not imaginary, but real. And then it is obvious that the homonymous code vector leads to errors in protein synthesis, since the ribosomal protein synthesizing apparatus, each time encountering one or another homonymous doublet and guided by the “two out of three” reading rule, must select one and only one amino acid from two different ones, but encoded by ambiguously identical homonym doublets.

Consequently, the 3'-nucleotides in codons and the 5'-nucleotides in anticodons paired with them do not have a gene-sign character and play the role of “steric crutches” filling the “empty spaces” in codon-anticodon pairs. In short, the 5'-nucleotides in anticodons are random, “wobble” - from the English “wobble” (swing, oscillation, wobble). This is the essence of the Wobble hypothesis.”

The essence is stated quite clearly. No translation required. The problem is clear.
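The division into synonymous and homonymous families described in the quotes above can be checked against the standard genetic code. The following is only a minimal sketch: the names `CODON_TABLE` and `classify_families` are mine, amino acids are written in one-letter codes, and `*` marks a stop codon.

```python
from itertools import product

# Standard genetic code, codons ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_families():
    """Group the 64 codons by their first two bases (hypothetical helper).

    A family is 'synonymous' if all four codons mean the same thing,
    and 'homonymous' if the same doublet carries several meanings.
    """
    synonymous, homonymous = {}, {}
    for first, second in product(BASES, repeat=2):
        doublet = first + second
        meanings = sorted({CODON_TABLE[doublet + third] for third in BASES})
        target = synonymous if len(meanings) == 1 else homonymous
        target[doublet] = meanings
    return synonymous, homonymous


synonymous, homonymous = classify_families()
print("synonymous families:", synonymous)
print("homonymous families:", homonymous)
```

Under the standard table the sketch gives 8 synonymous and 8 homonymous families, which agrees with the count of yellow "fours" above.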

Stop codons and start codons (highlighted in bold in the table) also do not always work unambiguously. Their action depends on something... on the context, as biologists believe.

“Let us continue our analysis of the seminal work of Crick and Nirenberg, which postulates the concept of the genetic code.

P.142 -143: “... so far all experimental data have been in good agreement with the general assumption that information is read in triplets of bases, starting from one end of the gene. However, we would get the same results if the information was read in groups of four or even more bases” or “...groups containing a multiple of three bases.” This position is almost forgotten or not understood, but it is here that doubt is visible whether the code is necessarily triplet. And no less important, it predicts a future understanding of DNA and RNA texts as semantic fractal formations akin to natural languages, as demonstrated in our research.”

With 4 different bases in the DNA code system, reading groups can only be 3 or 4 bases long. Read in pairs, 4 bases give only 16 possible combinations, which is not enough. But whether the reading group has 3 or 4 bases cannot be established mathematically, because all possible combinations will be used one way or another: either 64 for a triplet, or 256 for a tetraplet.

By increasing the code reading area by “groups containing a multiple of three bases,” the number of possible code combinations grows without limit. But what does this give us? If you focus on the coding of amino acids, then... nothing. And this is in no way compatible with the doublet approach of biologists.
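The counts in this paragraph are easy to reproduce. A minimal sketch, where `code_words` is a hypothetical helper and the figure of 21 signals (20 amino acids plus one stop) is the one used later in this article:

```python
def code_words(bases: int, group_length: int) -> int:
    """Number of distinct reading groups of a given length."""
    return bases ** group_length


NEEDED = 21  # 20 amino acids plus one stop signal, as counted in this article

for length in (2, 3, 4):
    total = code_words(4, length)
    verdict = "enough" if total >= NEEDED else "not enough"
    print(f"group of {length} bases: {total} combinations ({verdict})")
```

Pairs fall short, while triplets and tetraplets both suffice, so the count alone cannot decide between them.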

But, most importantly, in this quote for the first time, although implicitly, a “reading zone” of information that does not correspond to the triplet appeared. A triplet is one thing, but a reading zone is another. And one may not coincide with the other. Very important note.

In fact, the wobble hypothesis proposes that only the first two bases be considered the codon reading zone. That is, in this case it is proposed to recognize that the reading area is smaller than the encoding area.

Now let's consider the reverse approach:

“Some mRNAs contain signals to change the reading frame. Some mRNAs contain stop codons in the translated region, but these codons are successfully bypassed by changing the reading frame before or directly on them. The frame can shift by -1, +1 and +2. There are special signals in mRNA that change the reading frame. Thus, a translation frameshift of -1 on retroviral RNA occurs at a specific heptanucleotide sequence in front of the hairpin structure in the mRNA (Fig. 5c). For a +1 frameshift on the bacterial termination factor RF-2 mRNA, the nucleotide sequence at the shift site (UGA codon), the subsequent codon, and the preceding sequence complementary to the 3'-terminal sequence of the ribosomal RNA (analogous to the Shine-Dalgarno sequence) are important (Fig. 5d).”

The quote has already been given earlier, but now let’s look at its content more carefully. What is meant by the term reading frame? The concept comes from the early days of computer technology, when the area for reading information from a punched tape or punched card was limited by an opaque frame. This reduced the risk of errors when the information was read by a light beam falling on a photodetector through the holes punched in the right places of the card or tape. That reading principle is long gone, but the term remains. Since the concept of a reading frame is clear to all biologists, it apparently means the reading zone of only one base from a triplet. And by “reading frame shift” we must understand that at +1 the base following the last element of the triplet is read, and at -1 the base before the first element of the same triplet. Which pair of bases remains as the basis of the triplet being read? This is not specified...

But it seems that not everyone understands the reading frame, as in this case. If the concept of a reading frame is understood as a frame delimiting 3 bases, then with a shift of +2, 1 element remains from the readable triplet, and two from the neighboring one.

So what reading frame are we talking about? Well, yes, okay, let it remain unclear for now...

But in any case, then these bases, already read by the frame, will be read again when the frame returns to its place and the ribosome moves on to reading the next triplet... but what about the non-overlapping code?

In this case, biologists' mechanistic approach to estimating changes in triplet readout positions does not take into account the actual size of what they are talking about. The terminology is clearly misleading. How they themselves figure it out later is unclear. Obviously, no “frame” is moving anywhere...

The selection of the required positions in the reading area moves. And if we add the maximum reading frame shifts listed above with the length of the readable codon, we get: 2+3+2 = 7. Thus, the total width of the ribosome reading zone is already 7 bases. The ribosome selects a triplet from 7 possible bases. How? This is another question...
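The sum above can be written out as a small sketch. It assumes, as the sum 2+3+2 implies, that the largest shift of 2 is allowed in both directions; the function names and the sample window are hypothetical.

```python
CODON_LENGTH = 3


def reading_zone_width(max_shift_left: int, max_shift_right: int) -> int:
    """Width of the zone that covers a codon plus all permitted shifts."""
    return max_shift_left + CODON_LENGTH + max_shift_right


def candidate_triplets(window: str) -> list[str]:
    """All contiguous triplets that can be selected inside the zone."""
    return [window[i:i + CODON_LENGTH] for i in range(len(window) - CODON_LENGTH + 1)]


# Assumption: shift of up to 2 bases on either side, as in the sum 2 + 3 + 2.
print(reading_zone_width(2, 2))

# Hypothetical 7-base window; the ribosome would select one of these triplets.
print(candidate_triplets("AUGGCUA"))
```

A 7-base zone contains five contiguous triplets, so some rule is still needed to select the one that is actually translated.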

But something else is more important to us. Now we can really estimate that the zone for reading information from RNA can be larger than a triplet and consist of 7 or more bases, while only three bases are fixed as the necessary reading positions. What are the other positions? Perhaps this is the very “context” that changes the options for reading the triplet. The homonymous one, in the terminology of P.P. Garyaev.

Of course, this is only one of many special cases of understanding the multifaceted concept of context. But... at least it allows us to understand something without resorting to higher philosophical generalizations. At a very real level of mechanistic understanding.

About the alphabet of cellular texts.

The question is, of course, interesting...

The understanding of DNA bases as letters of some cellular alphabet has been adopted by biologists for a long time. Hence the emergence of the concept of semantic context in the assessment of triplet coding, and the search for a meaningful approach of the cell to this coding, and the gradual transition to the Higher Mind, which wrote this Book of Life...

But when it comes to naming the letters of this alphabet exactly, disagreements arise all the time. What are the letters? Bases (A, T, C, G), codons composed of them, or amino acids in the composition of the protein obtained during translation?

There are 4 bases, 20 amino acids, 64 codons, what should we take as a basis?

Everyone talks about the need for linguistic evaluation of sequences of DNA, RNA and protein molecules, regardless of their understanding of the letters of the cellular alphabet. Biologists are required to approach DNA information as a semantic text with an understanding of the context applicable for literary evaluation. Thus, it is assumed that the language under study has all the attributes of a developed literary language and an appropriate approach is needed to assess its multi-semantic information content.

Wonderful. And yet, where are the letters? How was this literary text written that requires such close attention from linguists? So far, within the framework of the same mechanistic approach...

Bases or nucleotides? It seems not. The majority of biologists agree with this. Four bases are not enough to create a literary text, especially when the sequence runs continuously along the whole DNA.

With the codon, as a letter of this alphabet, difficulties arise immediately. Where is it, this codon, on DNA and RNA, how to find it? Only a ribosome can do this, and then only through direct contact. And what kind of compound letters are these, from triplets? Difficult to understand. Nevertheless, this understanding of codons as letters of the cellular alphabet has many supporters.

Take amino acids as the letters of the alphabet? Yes, the majority agrees with this. But then protein, not DNA, becomes the Book of Life. In a protein there is a semantic context, but in DNA, it turns out, there may be none? Or there is one, but a different one, unlike that of the protein...

And therefore, there is a requirement to evaluate both DNA and protein from the standpoint of semantic context, but there is no clarification of what and how to evaluate.

In this situation, P.P. Garyaev proposed, including linguistically, to evaluate not DNA and protein, but their holographic three-dimensional “portraits.” A very strong position, I must admit. And very productive...

But as for the alphabet of the cell, under the familiar mechanistic approach it remains completely incomprehensible. Does it exist, or does it not exist at all, and is this concept only an allegory?

Biologists do not give clarifications. But they stubbornly continue to apply this concept. Everyone has their own understanding...

About the original coding system.

We are talking about the original system, the one that perhaps existed at the stage when cells divided into prokaryotes and eukaryotes. Now it is hidden under numerous overlays and deviations in both. Millions of years of evolution have left their mark.

But still…

DNA was not always a repository of information; previously, RNA could play this role. It completely replaced protein at some stage. Numerous studies show this. And DNA and RNA bases were not always 4, but we are not talking about that now...

But at some stage of development, an information encoding system appeared, which then fully satisfied all the requirements of the information and logical structure for controlling cell processes.

The same classic that everyone points to and immediately begins to refute...

Information array – DNA, RNA. A sequence consisting of a combination of 4 nucleotides: A,T(U),C,G.

The information reading step is 1 nucleotide.

The method of reading information is sequential.

The volume of a single reading is a triplet.
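The four properties listed above describe a sliding window of three bases moved one base at a time. A minimal sketch of such a reader, with a hypothetical function name and a made-up RNA fragment:

```python
def read_triplets(sequence: str, step: int = 1, size: int = 3) -> list[str]:
    """Read the sequence sequentially, `size` bases at a time, moving by `step`."""
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, step)]


rna = "AUGGCUUAA"  # hypothetical fragment, for illustration only

# Reading step of 1 nucleotide, as in the original system described above.
print(read_triplets(rna, step=1))

# Non-overlapping triplet reading, for comparison.
print(read_triplets(rna, step=3))
```

With a step of 1 the nine bases give seven overlapping triplets. With a step of 3 they give three independent ones.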

No logical system can count. But it is able to count to one, and that already goes a long way. It can also distinguish different units in two neighboring pairs. And if the axis of symmetry is real, then the system is quite capable of determining the logical states of neighboring positions relative to such an axis. But at that stage it was apparently very difficult to enlarge the reading area any further without counting.

And therefore, at that stage, a triplet is the maximum possible form of the system's information unit: one position on the axis of symmetry, one position to the right and one position to the left.

Three different accounting units...even for step-by-step reading...that's a lot.

The DNA and RNA information coding system uses 4 possible logical states, triplet reading. The complexity for the cell is extreme.

How to prove that a code is triplet? I have already shown this more than once. Let’s write it again: Bases – 4, amino acids – 20, codons or triplets – 64.

The math is simple: 64/3 ≈ 21 (rounded down to a whole number)

This number of non-overlapping triplets can be obtained with a fixation step through one base. These are 20 triplets for amino acids and one STOP codon.

On the other hand, 4^3 = 64. This is the same 21 x 3 = 63, that is, 60 triplet combinations and 3 stop codons, plus a start codon that closes the variational set. This is just mathematics, but... it shows that initially three bases in a row were actually read, a codon at a step of 1 base. This determined the number of amino acids used: 20. Thus, it is still a triplet.
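The arithmetic of the last few paragraphs, written out as a small check. The variable names are mine, and the split into 60 + 3 + 1 is the one given in the text above.

```python
BASES = 4
CODON_LENGTH = 3

triplets = BASES ** CODON_LENGTH                        # all possible triplets
whole_groups, remainder = divmod(triplets, CODON_LENGTH)

amino_acids = 20
stop_signals = 1

print("possible triplets:", triplets)
print("64 / 3 as whole groups:", whole_groups, "remainder:", remainder)
print("amino acids + stop:", amino_acids + stop_signals)
print("60 + 3 stops + 1 start:", 60 + 3 + 1)
```

The integer division gives 21 with a remainder of 1, which is where the single start codon fits in the count above.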

In this case, the degeneracy of the amino acid code in the triplet is clear. It arose from code overlap.

We misunderstand the emergence of codon degeneracy. This is not an expansion of the system’s capabilities in encoding information, but “mistakes of its past.” This is an echo of the original coding system...

Information on the topic:

“P.153: “... one amino acid is encrypted by several codons. Such a code is called degenerate... this kind of degeneracy does not indicate any uncertainty in the construction of the protein molecule... it only means that a certain amino acid can be directed to the appropriate place in the chain of the protein molecule using a few code words.”

Of course, to encode any amino acid in DNA bases, one code triplet is sufficient. Moreover, with non-overlapping coding. Repeat one codon as many times as you like, and get as many molecules of the desired amino acid in the protein. It’s easy, simple, understandable, and energy costs are minimal.

The degeneracy of a triplet code is a necessary measure, directly related to the original method of reading the code. It just happened in the course of evolution.

The mechanism for the appearance of code degeneracy looks like this:

When reading triplets in a step of 1 base, only one sign of the triplet changes at each step, and two signs of the triplet remain constant. Only their positions shift synchronously. With two steps, the information of only one sign of the triplet remains unchanged, but it passes sequentially through all display positions.

Why do we need this?

With 3 coding characters, 2 characters are repeated at each step. And only one changes. In the next step, the second sign will also change. And one sign will remain unchanged along the path traveled. A complete change of signs will occur only after the third step. Only now the new triplet combination will not have the influence of previous combinations.
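The dependence between neighboring triplets can be counted directly. A minimal sketch, where `shared_positions` is a hypothetical helper that compares the positions covered by two reading windows:

```python
def shared_positions(step: int, size: int = 3) -> int:
    """How many sequence positions two windows share when they are `step` apart."""
    first = set(range(0, size))
    second = set(range(step, step + size))
    return len(first & second)


for step in (1, 2, 3):
    print(f"shift of {step}: {shared_positions(step)} sign(s) carried over")
```

One step keeps two signs, two steps keep one, and only the third step gives a triplet with nothing inherited from the first.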

With a triplet step, each new triplet in formation does not depend on the previous one, but... such a step for such a reading system was then impossible.

And the formed DNA triplets turned out to be dependent on each other during reading.

Such a smooth flow of one triplet into another leads to a limitation in the ability to quickly use all permutations in the triplet. For the possible use of all 64 triplet variants, 64 * 3 = 192 single steps of reading DNA triplets are required. And vice versa, out of 64 steps of reading possible combinations, with sequential step-by-step reading of all codons, from the first to the 64th, there will be 42 repeats, and no more than 1/3 = 21 combinations will be unique. And another 1/3….
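One possible reading of this arithmetic is sketched below. It assumes that the "unique" combinations are the windows aligned with the first triplet and that the "repeats" are the windows in the two shifted positions; it also uses 63 = 21 x 3 windows, the figure from the calculation earlier in the text. The function name is hypothetical.

```python
def count_by_frame(window_count: int, size: int = 3) -> list[int]:
    """Count step-by-step reading windows by their offset relative to the first one."""
    counts = [0] * size
    for start in range(window_count):
        counts[start % size] += 1
    return counts


in_frame, shifted_1, shifted_2 = count_by_frame(63)
print("independent triplets:", in_frame)
print("dependent (overlapping) triplets:", shifted_1 + shifted_2)

# Bases needed to lay out all 64 triplets one after another without overlap.
print("bases for 64 independent triplets:", 64 * 3)
```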

This is the answer why there are only 20 amino acids. It could be more, but the system for encoding and reading information does not allow it.

So the cell began to use additional codes from the existing 42 repetitions. It could not do otherwise, because gaps in translation are unacceptable. If there is a code, any code, the ribosome must perform the translation operation. Transitional variants from one independent triplet code to another quickly came to stand for the same 20 amino acids, distributed according to the frequency of their use. One amino acid has 6 codes, while for another one code is enough. We register this as code degeneracy.

It is clear that with the use of dependent codons, the set of transfer RNAs (tRNAs) should also expand. And so it happened. In a full-scale system, the number of codons on the mRNA must match the number of anticodons on the tRNA. So, a large number of tRNAs only indicates that the system was originally formed in this way.

As we can see, the original coding system, as it was when 4 nucleotides appeared in DNA, is clearly visible. Next came the layers of later evolutionary processes. And today we have... what we have.

Initial basic amino acid codes.

On the other hand, if you follow this path, then out of the 64 possible, you can choose some 21 combinations and apply them as the main ones. But which ones?

How could a cell choose? The simplest answer is based on the maximum symmetry of the triplet.

Let's apply the principle of symmetry in searching for the necessary combinations and check how correctly we understand the way of natural coding of amino acids in DNA. To do this, let's collect all the variants of symmetric codes in Table 2. An excellent result: of the 16 possible symmetric codes, 15 go to different amino acids.
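The count of symmetric codes can be reproduced from the standard genetic code. This is only a sketch under one assumption: "symmetric" is taken to mean a triplet whose first and third bases coincide (the form XYX, including XXX). The names are mine, amino acids are in one-letter codes, and `*` marks a stop codon.

```python
from itertools import product

# Standard genetic code, codons ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: a symmetric code reads the same from both ends.
symmetric_codons = sorted(c for c in CODON_TABLE if c[0] == c[2])
covered = {CODON_TABLE[c] for c in symmetric_codons}

all_amino_acids = set(AMINO) - {"*"}
missing = sorted(all_amino_acids - covered)

print("symmetric codons:", len(symmetric_codons))
print("amino acids covered:", len(covered))
print("amino acids without a symmetric code:", missing)
```

Under this definition there are 16 symmetric codons and 15 amino acids behind them (arginine appears twice, as AGA and CGC). No stop codon is symmetric, and five amino acids are left without a symmetric code.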

But, there are still 5 amino acids left and STOP.

Apparently Nature walked the same path... and stumbled in the same place. All symmetrical options have been used, there is no room to expand the system, and there are not enough codes. What next option did she use to continue searching for codes?

Now repetitions and one additional element...

Found: CAA, AAC, UGG, and here is the main Stop codon, UAA.

There are two more codons left to find...

GAC and AUG. The latter became the Start codon...

And the total number of main combinations used in DNA and RNA became 21. Table 2 reflects the search path for the main code designations.
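The six additional codons named above can be checked in the same way. A minimal sketch, again assuming the XYX definition of a symmetric code and using hypothetical variable names:

```python
from itertools import product

# Standard genetic code, codons ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# The additional codons listed in the text.
extra_codons = ["CAA", "AAC", "UGG", "UAA", "GAC", "AUG"]
extra_meanings = {codon: CODON_TABLE[codon] for codon in extra_codons}

# Meanings already covered by symmetric codons (first base equals third base).
symmetric_meanings = {CODON_TABLE[c] for c in CODON_TABLE if c[0] == c[2]}

main_signals = symmetric_meanings | set(extra_meanings.values())

print(extra_meanings)
print("distinct signals in the main set:", len(main_signals))
```

The six codons supply glutamine, asparagine, tryptophan, a stop, aspartic acid and methionine. Together with the 15 amino acids that have symmetric codes this gives 21 distinct signals.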

But here, too, the evolutionary logic of development shows an interesting example. Only complete symmetries are used to the end and immediately. The remaining options were not used immediately and not completely. For example, for the amino acid Gly, the main codon GGG was used, and then GGU was added from an unused reserve...

The created coding reserves worked until the last minute. Today, all reserves have long been used up and the time has come to combine functions where possible. For example, for the Start codon. The search began for new ways to expand the capabilities of triplet coding of amino acids in RNA. This is probably how the selection of the main codes went. By symmetry and simplest permutations...

Table 2

The logic of action is clear. We may have made a mistake in the sequence of actions, but this is not so important for now. Of course, these are just my variations on the theme; professionals probably know better whether this was how it really was or not, but still... it turned out interesting.

Ends don't meet...

Strange, ... symmetric codes can only be used with triplet reading, without overlap. This point forces us to take another look at the above mathematics of obtaining 20 amino acids for use in triplet coding. Obviously, one does not correspond to the other.

Mathematics shows the objective reality of the element-by-element movement of a ribosome along RNA. But such a widespread use of symmetries in amino acid coding also cannot be accidental, and points to triplets of independent reading.

It is possible that element-by-element reading of RNA information existed before triplet coding and for some time together with the appearance of triplets. It determined the amount of amino acids used.

But at some stage there was a leap in development. The coding system has been completely revised. Triplet independent reading forced us to re-encode the amino acids used based on symmetry. But evolution does not know how to discard old options...

There are already additional codes; we had to redistribute them among amino acids depending on the frequency of their use.

And a paradoxical picture emerged. The readout seems to be non-overlapping, and one codon is enough to encode an amino acid, but all 64 variants were used. The potential redundancy of coding is covered by the degeneracy of the codes. There is a calculated reserve, but in fact there is not. We have already seen how this happened.

Most likely, the rapid development of cellular ribosomes was a factor in the revision of the system. Ultimately, they determine the entire coding system and its application in the cellular organism.

It can be assumed that the information reading zone of the ribosome has long exceeded three positions and has gone far beyond these limits. It became possible to select and remember the information of the desired codon within a large information reading area. This let the ribosome keep its element-by-element step while also realizing triplet reading in an independent mode. Somewhere along the way the ribosome acquired RAM.

The information reading zone for the ribosome, even in prokaryotes, as we see, has reached 7 nucleotides. And this is not the limit. If we take as a basis that ribosomes have two centers for translation or information reading, then their total area for information reading by one ribosome has already reached 14 nucleotides. Some sections of the codes are taken as triplets, and the rest constitutes the context...

And now…

And now everything is completely confused. According to biologists, the counting occurs in triplets, although no one explains how this happens. The immediate context is not taken into account. Comparing the RNA code sequence and the protein obtained from it is a very difficult task, and it is apparently impossible to clearly understand how the system has changed and what is taken into account during translation.

Moreover, biologists focus not on systematization, but on finding deviations from the system, thereby increasing the already vast variety of facts, and creating a puzzling unsolved problem for themselves. The confusion is complemented by the complete confusion of various deviations in the mechanisms for reading triplets of prokaryotes and eukaryotes into one big crossword puzzle... where they themselves seem to have become confused.

Why? They have different tasks. They work with biological objects, as is customary in their science. Therefore, their conclusions on the issues of RNA coding were expressed in the “wobble” hypothesis, and not in a system of principles of information reading and coding theory. They can be understood, but a way out must be found...

The technocratic approach to the problem of understanding DNA coding, proposed by biologists themselves, has not yet exhausted its capabilities. In fact, it hasn’t really been used yet. Only the terminology was used, but not the approach.

Perhaps the time has come to use machine analysis of DNA sequences, taking into account the expanded information reading area in relation to the coding triplet. Then the mechanism of action of the coding context closest to the triplet being read will become clear, and possibly also the programming elements of the protein translation process that are memorized by the ribosome. Such analysis is especially important for studying the untranslated regions of RNA and DNA, since it is already clear that these are software elements of the coding system. All processes depend on them, including protein translation. The name “junk” clearly doesn’t suit them at all...

And there cannot be “junk” in the arrays of strategically important information stored in DNA. No information system can afford this.
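As a starting point for the machine analysis suggested above, here is a minimal sketch that collects, for every in-frame codon, the wider window around it. The window of 7 bases (2 + 3 + 2) follows the estimate made earlier in this article. The function names and the sample sequence are hypothetical, and real analysis would of course run on real sequence data.

```python
from collections import defaultdict


def codons_with_context(mrna: str, flank: int = 2):
    """Yield (codon, window) pairs for in-frame codons that have full flanks."""
    for start in range(0, len(mrna) - 2, 3):
        low, high = start - flank, start + 3 + flank
        if low < 0 or high > len(mrna):
            continue  # skip codons too close to either end of the sequence
        yield mrna[start:start + 3], mrna[low:high]


def contexts_by_codon(mrna: str, flank: int = 2) -> dict[str, set[str]]:
    """Group the observed windows by the codon at their center."""
    grouped = defaultdict(set)
    for codon, window in codons_with_context(mrna, flank):
        grouped[codon].add(window)
    return dict(grouped)


sample = "AUGGCUGCUUAA"  # hypothetical coding fragment
print(contexts_by_codon(sample))
```

In the sample, the same codon GCU appears in two different 7-base contexts. Comparing such contexts with the amino acid actually found in the protein is the kind of question this analysis could address.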

The current level of development of computer technology makes it possible to solve these problems. Build an information management system in the cellular structure, clarify communication channels, establish key control elements and a signal system. Then at least the approximate level of technical complexity of this control system will be clear. So far, the only thing that is clear is that the ribosome plays a key role in it, but how technically complex is this universal cellular automaton? How does the technical complexity of the rest of the cell's executive mechanisms look against its background?

I haven't found any answers yet...

Literature:

  1. Garyaev P.P. Tertyshny G.G. Leonova E.A. Mologin A.V. Wave biocomputer functions of DNA. http://nature.web.ru/db/msg.html?mid=1157645&s
  2. Nikitin A.V., Reading and processing DNA information // “Academy of Trinitarianism”, M., El No. 77-6567, pub.16147, 08.11.2010

Nikitin A.V., Problems of understanding the DNA coding system // “Academy of Trinitarianism”, M., El No. 77-6567, pub.16181, 27.11.2010


Each living organism has a special set of proteins. Certain nucleotide compounds and their sequence in the DNA molecule form the genetic code. It conveys information about the structure of the protein. A certain concept has been accepted in genetics. According to it, one gene corresponded to one enzyme (polypeptide). It should be said that research on nucleic acids and proteins has been carried out over a fairly long period. Later in the article we will take a closer look at the genetic code and its properties. A brief chronology of the research will also be provided.

Terminology

The genetic code is a way of encoding the amino acid sequence of proteins by means of a nucleotide sequence. This method of encoding information is characteristic of all living organisms. Proteins are natural organic substances of high molecular weight. These compounds are present in all living organisms. They consist of 20 types of amino acids, which are called canonical. Amino acids are arranged in a chain and connected in a strictly established sequence, which determines the structure of the protein and its biological properties. A protein may also contain several amino acid chains.

DNA and RNA

Deoxyribonucleic acid (DNA) is a macromolecule responsible for the storage, transmission and implementation of hereditary information. DNA uses four nitrogenous bases: adenine (A), guanine (G), cytosine (C) and thymine (T). RNA consists of the same nucleotides, except for the one containing thymine; in its place there is a nucleotide containing uracil (U). RNA and DNA molecules are nucleotide chains. Thanks to this structure, sequences of letters are formed: the “genetic alphabet”.

Implementation of information

The synthesis of the protein encoded by a gene begins with the synthesis of mRNA on a DNA template (transcription). The genetic code is then translated into an amino acid sequence, that is, the polypeptide chain is synthesized on the mRNA (translation). Three nucleotides are enough to encode all the amino acids and the signal for the end of the protein sequence. Such a group of three is called a triplet.
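The statement that three nucleotides are enough can be checked by simple counting. The sketch below is only an illustration: the function name is hypothetical, and the figure of 21 required meanings (20 canonical amino acids plus one end-of-protein signal) is an assumption taken from the text.

```python
# Smallest codon length whose number of combinations covers all meanings.
ALPHABET_SIZE = 4          # four nucleotides in mRNA: A, G, C, U
MEANINGS_NEEDED = 20 + 1   # 20 canonical amino acids + one stop signal (assumed)


def min_codon_length(alphabet_size: int, meanings: int) -> int:
    """Return the shortest word length n with alphabet_size**n >= meanings."""
    n = 1
    while alphabet_size ** n < meanings:
        n += 1
    return n


codon_length = min_codon_length(ALPHABET_SIZE, MEANINGS_NEEDED)
print(codon_length, ALPHABET_SIZE ** codon_length)
```

Words of two nucleotides give only 16 combinations, which is too few, while words of three give 64, which leaves room to spare.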

History of the study

Proteins and nucleic acids have been studied for a long time, and the first ideas about the nature of the genetic code appeared in the middle of the 20th century. In 1953 it was found that some proteins consist of sequences of amino acids. True, at that time their exact number could not yet be determined, and there was much dispute about it. In the same year, Watson and Crick published two papers. The first described the secondary structure of DNA; the second discussed how it could be copied by template synthesis. It also emphasized that a specific sequence of bases is a code that carries hereditary information. The Soviet-American physicist George Gamow took up the coding hypothesis and found a way to test it. In 1954 he published a paper proposing a correspondence between amino acid side chains and diamond-shaped “holes” as the coding mechanism; this scheme came to be called the diamond (rhombic) code. In explaining his work, Gamow allowed that the genetic code could be a triplet code. The physicist's work was among the first to be considered close to the truth.

Classification

Over the years, various models of the genetic code were proposed. They fall into two types: overlapping and non-overlapping. Overlapping codes allow one nucleotide to belong to several codons; they include the triangular, sequential and major-minor codes. Non-overlapping codes come in two kinds: the combination code and the comma-free code (“code without commas”). In the combination code an amino acid is encoded by a triplet of nucleotides, and only the composition of the triplet matters. In the comma-free code, certain triplets correspond to amino acids and the others do not. The idea was that when meaningful triplets are written one after another, the triplets that appear in any other reading frame must be meaningless. Scientists believed that a set of triplets satisfying these requirements could be chosen, and that it would contain exactly 20 triplets.
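Both non-overlapping models can be illustrated with a few lines of code. This is a minimal sketch, not a reconstruction of the historical papers; the function name `is_comma_free` and the test words are hypothetical. The first part counts the code words of a combination code, where only the composition of a triplet matters. The second part checks the comma-free property for a given set of triplets.

```python
from itertools import combinations_with_replacement, product

BASES = "ACGU"

# Combination code: the order of bases inside a triplet is ignored,
# so every unordered triple of bases is one code word.
combination_words = list(combinations_with_replacement(BASES, 3))
print(len(combination_words))


def is_comma_free(words: set) -> bool:
    """True if no code word appears in a shifted frame of two joined words."""
    for first, second in product(words, repeat=2):
        joined = first + second
        # Frames shifted by one and by two nucleotides.
        if joined[1:4] in words or joined[2:5] in words:
            return False
    return True


print(is_comma_free({"ACG"}))         # a single word that cannot be misread
print(is_comma_free({"AAA"}))         # AAAAAA contains AAA in shifted frames
print(is_comma_free({"ACG", "CGA"}))  # ACGACG contains CGA in a shifted frame
```

The count of unordered triples drawn from four bases comes out at 20, the same as the number of canonical amino acids, which is part of why such schemes looked attractive at the time.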

Although Gamow and his co-authors questioned this model, it was considered the most plausible for the next five years. At the beginning of the second half of the 20th century, new data revealed shortcomings in the comma-free code. It was found that codons can direct protein synthesis in vitro. By about 1965 the meaning of all 64 triplets had been established. As a result, some codons turned out to be redundant: an amino acid can be encoded by several triplets.

Distinctive features

The properties of the genetic code include:

Variations

The first deviation from the standard genetic code was discovered in 1979 during a study of human mitochondrial genes. More variants of this kind have since been identified, including many alternative mitochondrial codes. One example is the stop codon UGA, which is read as tryptophan in mycoplasmas. In archaea and bacteria, GUG and UUG are often used as start codons. Sometimes a gene begins with a start codon different from the one normally used by the species. In addition, in some proteins the ribosome inserts the nonstandard amino acids selenocysteine and pyrrolysine when it reads a stop codon; this depends on sequences found in the mRNA. Selenocysteine is currently considered the 21st, and pyrrolysine the 22nd, amino acid found in proteins.
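A variant code can be pictured as the standard codon table with a few entries overridden. The sketch below builds the standard table from its conventional 64-letter string ('*' marks a stop codon) and then reassigns UGA to tryptophan, as described above for mycoplasmas. The names `STANDARD`, `MYCOPLASMA_LIKE` and `translate`, and the test message, are illustrative assumptions.

```python
BASES = "TCAG"
# Standard code in the usual TCAG ordering; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD = {
    (a + b + c).replace("T", "U"): AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Variant described in the text: UGA is read as tryptophan (W), not as stop.
MYCOPLASMA_LIKE = {**STANDARD, "UGA": "W"}


def translate(mrna: str, table: dict) -> str:
    """Translate codon by codon from position 0 until a stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        residue = table[mrna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


message = "AUGUGAGGUUAA"  # AUG UGA GGU UAA
print(translate(message, STANDARD))         # translation halts at UGA
print(translate(message, MYCOPLASMA_LIKE))  # UGA is read through as W
```

The same message yields a one-residue product under the standard table and a three-residue product under the variant, which shows how a single reassigned codon changes the protein.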

General features of the genetic code

However, all these exceptions are rare. In living organisms the genetic code has a number of common characteristics: a codon consists of three nucleotides (of which the first two are the defining ones), and codons are translated into an amino acid sequence by tRNA and ribosomes.

Russian scientists have found that DNA hides encoded information, the presence of which makes us consider a person a biological computer, which consists of complex programs.

Experts from the Institute of Quantum Genetics are trying to decipher the mysterious text in DNA molecules. Their discoveries increasingly convince them that in the beginning was the Word, and that we are a product of the vacuum Superbrain. Petr Petrovich Garyaev, President of the ICG, spoke about this.

More recently, scientists made an unexpected discovery: the DNA molecule consists not only of genes responsible for the synthesis of certain proteins and genes responsible for the shape of the face, the ears, eye color and so on, but mostly of encoded texts.
Moreover, these texts occupy 95-99 percent of the total chromosome content! (Note: Western scientists consider this part unnecessary; as they say, it is junk.) And only 1-5 percent is occupied by the much-talked-about genes that synthesize proteins.

The main part of the information contained in chromosomes remains unknown to us. According to our scientists, DNA is the same text as the text of a book. But it has the ability to be read not only letter by letter and line by line, but also from any letter, because there is no break between words. By reading this text with each subsequent letter, more and more new texts are obtained. You can read it in the opposite direction if the row is flat. And if a chain of text is unfolded in three-dimensional space, like in a cube, then the text is readable in all directions.

The text is non-stationary; it is constantly moving and changing, because our chromosomes breathe and sway, generating a huge number of texts. Work with linguists and mathematicians from Moscow State University showed that the structure of human speech, of book text and of the DNA sequence are mathematically close, that is, these really are texts in languages still unknown to us. Cells talk to each other, just like you and me: the genetic apparatus has an infinite number of languages.

A person is a self-reading text structure, cells talk to each other in the same way as people talk to each other - concludes Pyotr Petrovich Garyaev. Our chromosomes implement the program for building an organism from an egg through biological fields - photon and acoustic. Inside the egg, an electromagnetic image of the future organism is created, its socio-program is recorded, if you like - Fate.


This is another unexplored feature of the genetic apparatus, which is realized, in particular, with the help of one variety of the biofield: laser fields, capable of emitting not only light but also sound. Thus, the genetic apparatus manifests its potential through holographic memory.
Depending on the light with which the holograms are illuminated - and there are many of them, because many holograms can be recorded on one hologram - one or another image is obtained. Moreover, it can only be read in the same color in which it is written.
And our chromosomes emit a wide spectrum, from ultraviolet to infrared, and therefore can read each other’s multiple holograms. As a result, a light and acoustic image of the future new organism appears, and in progression - all subsequent generations.

The program that is written on DNA could not arise as a result of Darwinian evolution: to record such a huge amount of information requires time that is many times longer than the existence of the Universe.

It’s like trying to build a Moscow State University building by throwing bricks. Genetic information can be transmitted over a distance; a DNA molecule can exist in the form of a field. A simple example of the transfer of genetic material is the penetration of viruses into our body, such as the Ebola virus.

This principle of “immaculate conception” can be used to create a device that introduces such information into the human body and influences it from the inside.
“We have developed a laser based on DNA molecules,” says Pyotr Petrovich. “This thing is potentially formidable, like a scalpel: it can be used to heal, or it can kill. Without exaggeration I will say that this is a basis for creating psychotropic weapons. The principle of operation is this.

Lasers are based on simple atomic structures, and DNA molecules are based on texts. You enter a certain text into a section of the chromosome, and these DNA molecules are transformed into a laser state, that is, you influence them so that the DNA molecules begin to glow and make a sound - talk!
And at this moment, light and sound can penetrate another person and introduce someone else’s genetic program into him. And the person changes, he acquires different characteristics, begins to think and act differently.”

*****

The genetic code appears to have been invented outside the solar system several billion years ago.

This statement supports the idea of panspermia, the hypothesis that life was brought to Earth from outer space. This is, of course, a new and bold approach to the conquest of galaxies, if we imagine that this was a deliberate step by alien superbeings who know how to operate with genetic material.

Researchers suggest that at some stage our DNA was encoded with an alien signal from an ancient extraterrestrial civilization. Scientists believe that the mathematical code found in human DNA cannot be explained by evolution alone.

The galactic signature of humanity.

Surprisingly, it turns out that once the code has been installed, it will remain unchanged over cosmic time scales. As the researchers explain, our DNA is the most durable “material” and that is why the code is an extremely reliable and intelligent “signature” for those aliens who will read it, says Icarus magazine.

Experts say: “The recorded code can remain unchanged over cosmic time scales; in fact, this is the most reliable design. Therefore, it provides an exceptionally robust storage solution for smart signatures. The genome, having been appropriately rewritten into a new code with a signature, will remain frozen in the cell and its offspring, which can then be carried through space and time.”

Researchers believe that human DNA is organized in such a precise way that it reveals “a set of arithmetic and ideographic structures of symbolic language.” Scientists' work leads them to believe that we were literally "created outside of Earth" several billion years ago.

The universal language of the Universe - living cosmic codes

These ideas and beliefs are not accepted in the scientific community. However, these studies proved what some researchers have been saying for decades, that evolution could not have happened on its own, and that there is something extraterrestrial about our entire species.

However, these studies and statements do not reveal the main secret, which remains a mystery: if extraterrestrial beings truly created humanity and life on planet Earth, then “who” or “what” created these extraterrestrial beings?


So, we are the MESSAGE?
Humanity has been assigned the role of SMS with a view to the future...


Source - http://oleg-bubnov.livejournal.com/233208.html

An intelligent signal is written in the genetic code

Scientists have discovered in the genetic code a whole series of purely mathematical and ideographic language constructs that cannot be attributed to chance. This can only be interpreted as an intelligent signal.

In 2013, the results of a study were published, the authors of which tried to apply the technique of searching for a signal from an extraterrestrial intelligent source (SETI project) not to the vast expanses of the Universe... but to the genetic code of terrestrial organisms.

“...We show that Earth's code exhibits a highly precise ordering that satisfies the criteria of an information signal. Simple code structures reveal a coherent whole of arithmetic and ideographic constructions of the same symbolic language. Precise and systematic, these hidden constructs appear to be the products of precise logic and non-trivial calculations, rather than the result of stochastic processes (the null hypothesis that this is the result of chance, together with the putative evolutionary mechanisms, is rejected with significance < 10^-13). The constructs are so distinct that the code mapping is uniquely deduced from its algebraic representation. The signal displays readily recognizable hallmarks of artificiality, among them the symbol of zero, a privileged decimal syntax and semantic symmetries. In addition, extracting the signal involves logically straightforward yet abstract operations, which makes these constructs fundamentally irreducible to a natural origin. ...”

Thus, the genetic code is not only a code used to record the information needed for the construction and functioning of living organisms, but also a kind of “signature”, the probability of whose random origin is less than 10^-13. This points, practically without alternative, to an intelligent source behind the creation of the genetic code.

The traits of an organism are associated with particular proteins. Proteins are made up of amino acids. Hereditary information about proteins is stored in DNA, which consists of nucleotides. The question arose: how is hereditary information about proteins encoded in DNA? In 1961 the genetic code was established: the principle of recording information about the sequence of amino acids in a protein through the sequence of nucleotides in DNA. Properties of the genetic code:

1) Triplet nature: each amino acid is encoded by a combination of 3 nucleotides (a triplet, or codon). There are 64 possible triplets; 3 of them carry no meaning and are stop codons: UAA, UAG, UGA.
2) Degeneracy: one amino acid can be encoded by several triplets (codons).
3) Non-overlapping within one gene: one nucleotide cannot belong to 2 triplets at once.
4) No commas: there are no free nucleotides between triplets.
5) Universality: the code is the same in different organisms.
6) Stability: the code does not change over generations.

When characterizing the genetic code, the concept of collinearity is used, that is, the complete correspondence between the sequence of amino acids in the protein and the sequence of nucleotides in DNA.

Implementation of hereditary information in the cell. In eukaryotic cells, almost all the DNA is in the nucleus. Protein synthesis occurs in the cytoplasm, which means there must be an intermediary that transfers the genetic information from the nucleus to the cytoplasm. This messenger is the mRNA molecule. The implementation of hereditary information consists of 2 processes: 1) transcription, the synthesis of RNA molecules on DNA as on a template; 2) translation, the conversion of the nucleotide sequence into an amino acid sequence.
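The numbers quoted in the list of properties (64 triplets, 3 stop codons, several codons per amino acid) can be verified against the standard codon table. The sketch below is an illustration only: the variable names are hypothetical, and the table is built from the conventional 64-letter string for the standard code, with '*' marking a stop codon.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard code in the usual UCAG ordering; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the last position fastest, matching the string layout.
CODE = {"".join(triplet): aa
        for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

stop_codons = sorted(codon for codon, aa in CODE.items() if aa == "*")
degeneracy = Counter(aa for aa in CODE.values() if aa != "*")

print(len(CODE))         # number of possible triplets
print(stop_codons)       # triplets that carry no amino acid
print(len(degeneracy))   # number of encoded amino acids
print(degeneracy["M"], degeneracy["W"], degeneracy["L"])
```

Methionine (M) and tryptophan (W) each have a single codon, while leucine (L) has six, which is what degeneracy means in practice.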

29) Realization of biological information in the cell. Transcription. Post-transcriptional processes. Splicing. Transfer of biological information to protein (translation).

In a eukaryotic cell, almost all DNA is located in the nucleus, while protein synthesis occurs in the cytoplasm on ribosomes. This means that there is an intermediary that transfers hereditary information from the nucleus to the cytoplasm: the mRNA molecule. Thus, the mechanisms for implementing hereditary information in a cell consist of the processes of transcription and translation, which take place while genes are actively working.

Transcription is the synthesis of an RNA molecule on DNA as on a template, a complex enzymatic process that consumes the energy of ATP. One of the DNA strands serves as the template for the synthesis of RNA molecules. Synthesis proceeds from free nucleotides and is based on the principle of complementarity. The main enzyme of synthesis is RNA polymerase. Prokaryotic cells have one type of this enzyme. Eukaryotic cells have 3 types:
1) RNA polymerase I, responsible for the synthesis of rRNA;
2) RNA polymerase II, for the synthesis of mRNA;
3) RNA polymerase III, responsible for all small RNA molecules, in particular tRNA and some low-molecular-weight rRNA species.

The transcription process consists of 3 stages: initiation (beginning), elongation (lengthening) and termination (end).

At the first stage, RNA polymerase recognizes a specific nucleotide sequence in front of the gene. This sequence is called the promoter. Having recognized the promoter, RNA polymerase binds to it. The DNA double helix unwinds, and the region corresponding to the gene becomes accessible. One of the DNA strands becomes the template for the synthesis of RNA molecules.

During the second stage, RNA polymerase moves along the DNA section, synthesizing the RNA molecule in the 5'-3' direction.

At the third stage, RNA synthesis continues until RNA polymerase reaches a specific nucleotide sequence at the end of the gene. This sequence is called the transcription termination signal, or stop signal. Here transcription ends.

Conclusion: transcription produces the primary transcripts of mRNA, rRNA and tRNA.
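The principle of complementarity used during elongation can be sketched as a simple base-by-base mapping. This is a simplified illustration that ignores promoter recognition and the termination signal; the function name and the example template are hypothetical.

```python
# RNA base that pairs with each base of the DNA template strand.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build the RNA complementary to a DNA template strand.

    The template is written in the 3'->5' direction, so reading it from
    left to right yields the RNA in the 5'->3' direction of synthesis.
    """
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


print(transcribe("TACCACGAATTA"))
```

Note that adenine in the template pairs with uracil in the RNA, since RNA contains no thymine.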



Primary mRNA transcript. In a prokaryotic cell, a mature mRNA molecule is synthesized right away and serves as the template for protein synthesis. In a eukaryotic cell, an immature mRNA molecule is synthesized first, called pro-mRNA (pre-mRNA). Once transcription is complete, it matures in the nucleus; this is called processing. It includes 3 stages:
1) Capping. A chemically modified nucleotide, 7-methylguanosine, is attached to the 5' end of the pro-mRNA molecule, forming the cap structure. The cap later facilitates the binding of mRNA to the ribosome.
2) Polyadenylation. 100-200 adenyl nucleotides are added to the 3' end of the pro-mRNA. The resulting polyadenylated region stabilizes the mRNA molecule and promotes its export from the nucleus to the cytoplasm.

3) Splicing. The pro-mRNA contains exons and introns. Splicing is the removal of introns from the pro-mRNA molecule and the joining of exons by ligases. As a result of processing, a mature mRNA is formed whose length is about 1/10 of the primary transcript. This RNA then passes into the cytoplasm; only 3-5% leaves the nucleus, and the rest is destroyed in it.

Translation. Components necessary for protein biosynthesis: amino acids, tRNA, mRNA, ribosomes, ATP, enzymes. Translation consists of 3 stages: initiation, elongation, termination.

Initiation: a complex of mRNA and the ribosome is formed, helped by the cap region of the mRNA. The first, initiator tRNA joins this complex. With its anticodon the tRNA recognizes the initiation codon AUG (methionine) in the mRNA. In a eukaryotic cell, the first amino acid of the polypeptide chain is methionine. At the end of protein biosynthesis this amino acid can be removed from the polypeptide chain.

Elongation: the ribosome has a functional center with 2 sites: a) the A site (aminoacyl-tRNA binding site; a tRNA carrying an amino acid, i.e. aminoacyl-tRNA, arrives here), and b) the P site (peptidyl-tRNA binding site; it holds the tRNA attached to the peptide, i.e. peptidyl-tRNA). The work of these sites drives the elongation of the protein chain. Suppose a certain peptide chain has already been synthesized and the peptidyl-tRNA complex sits in the P site of the ribosome. A tRNA with an amino acid arrives at the A site. If the tRNA anticodon is complementary to the mRNA codon, this tRNA with its amino acid remains in the A site. Ribosomal enzymes break the bond between the tRNA and the peptide in the P site, and the free tRNA leaves the P site. Other ribosomal enzymes, transferases, form a peptide bond between the peptide and the amino acid in the A site. This is how the peptide chain lengthens by one amino acid. The ribosome takes one step of 3 nucleotides along the mRNA molecule, the tRNA-peptide complex moves from the A site to the P site, and the A site is free and ready to accept a new tRNA with an amino acid.

Termination: elongation of the polypeptide chain continues until one of the stop codons enters the A site of the ribosome. No amino acid corresponds to the stop codons, so protein biosynthesis ends here: the polypeptide chain and the mRNA molecule are released, and the ribosome dissociates into subunits. If a cell needs a large amount of a given protein, a complex of several ribosomes on one mRNA is formed, called a polysome.
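The three stages of translation can be followed step by step in a short simulation. This is a deliberately simplified sketch: it starts at the first AUG it finds, uses a partial codon table with just enough entries for the demo message, and does not model tRNA, the P site or the enzymes. All names and the message itself are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}
# Partial codon table, just enough for the demo message below.
CODON_TO_AA = {"AUG": "Met", "GUG": "Val", "CUU": "Leu", "AAU": "Asn"}


def translate(mrna: str) -> list:
    """Initiate at the first AUG, elongate codon by codon, end at a stop codon."""
    start = mrna.find("AUG")                # initiation: locate the start codon
    if start == -1:
        return []
    peptide = []
    a_site = start                          # position of the codon in the A site
    while a_site + 3 <= len(mrna):
        codon = mrna[a_site:a_site + 3]
        if codon in STOP_CODONS:            # termination: no amino acid matches
            break
        peptide.append(CODON_TO_AA[codon])  # elongation: chain grows by one
        a_site += 3                         # ribosome steps three nucleotides
    return peptide


print(translate("GGAUGGUGCUUAAUUAAGG"))
```

Each pass through the loop corresponds to one elongation cycle, and the chain is released as soon as a stop codon enters the A site.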

In any cell and organism, all anatomical, morphological and functional features are determined by the structure of the proteins they contain. The hereditary property of an organism is its ability to synthesize particular proteins. The biological characteristics of a protein depend on the order of the amino acids in its polypeptide chain.
Each cell has its own sequence of nucleotides in the polynucleotide chain of DNA. This is the genetic code of DNA. It records the information for the synthesis of particular proteins. This article describes what the genetic code is, its properties, and genetic information.

A little history

The idea that a genetic code might exist was formulated by G. Gamow and A. Down in the mid-twentieth century. They argued that the nucleotide sequence responsible for one amino acid contains at least three units. Later the exact number, three nucleotides, was proved; this unit of the genetic code was called a triplet or codon. There are sixty-four codons in total, because a nucleic acid molecule (DNA or RNA) is built from four different nucleotide residues.

What is genetic code

The method of encoding the amino acid sequence of proteins by a sequence of nucleotides is common to all living cells and organisms. This is what the genetic code is.
There are four nucleotides in DNA:

  • adenine - A;
  • guanine - G;
  • cytosine - C;
  • thymine - T.

They are denoted by capital Latin or (in Russian-language literature) Russian letters.
RNA also contains four nucleotides, but one of them is different from DNA:

  • adenine - A;
  • guanine - G;
  • cytosine - C;
  • uracil - U.

All nucleotides are arranged in chains; DNA forms a double helix, while RNA is single-stranded.
Proteins are built from amino acids, which, arranged in a particular sequence, determine the protein's biological properties.

Properties of the genetic code

Triplet nature. The unit of the genetic code consists of three letters: it is a triplet. This means that each of the twenty existing amino acids is encoded by three specific nucleotides, called a codon or triplet. Sixty-four combinations can be formed from four nucleotides, which is more than enough to encode twenty amino acids.
Degeneracy. Each amino acid corresponds to more than one codon, with the exception of methionine and tryptophan.
Unambiguity. Each codon codes for only one amino acid. For example, in a healthy person's gene for the beta chain of hemoglobin, the triplets GAG and GAA both encode glutamic acid, whereas in everyone who has sickle cell disease one nucleotide of this triplet is changed.
Collinearity. The sequence of amino acids always corresponds to the sequence of nucleotides that the gene contains.
The genetic code is continuous and compact, which means that it has no punctuation marks. That is, reading starts at a certain codon and proceeds without interruption. For example, AUGGUGCUUAAUGUG is read as AUG, GUG, CUU, AAU, GUG, and not as AUG, UGG and so on, or in any other way.
Universality. The code is the same for practically all terrestrial organisms, from humans to fish, fungi and bacteria.
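The difference between continuous, non-overlapping reading and an overlapping reading can be shown on the example sequence, written here as AUGGUGCUUAAUGUG. The two function names are hypothetical and serve only to illustrate the property.

```python
def read_codons(mrna: str, start: int = 0) -> list:
    """Split mRNA into consecutive, non-overlapping triplets from `start`."""
    return [mrna[i:i + 3] for i in range(start, len(mrna) - 2, 3)]


def overlapping_triplets(mrna: str) -> list:
    """Every triplet at every offset: the reading the code does NOT use."""
    return [mrna[i:i + 3] for i in range(len(mrna) - 2)]


message = "AUGGUGCUUAAUGUG"
print(read_codons(message))
print(overlapping_triplets(message)[:3])
```

With the non-overlapping reading, each nucleotide belongs to exactly one codon, so the position of the first codon fixes how the whole message is divided.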

Table

The table presented does not include all known amino acids. Hydroxyproline, hydroxylysine, phosphoserine, iodinated tyrosine derivatives, cystine and some others are absent, because they are derivatives of amino acids encoded by mRNA and are formed by modification of proteins after translation.
According to the properties of the genetic code, one codon encodes one amino acid. The exceptions are the codons for valine and methionine, which perform an additional function. When such a codon stands at the beginning of the mRNA, it binds a tRNA carrying formylmethionine. When synthesis is complete, the formyl group is cleaved off, leaving a methionine residue. Thus, these codons act as initiators of polypeptide chain synthesis. If they are not at the beginning, they are no different from the others.

Genetic information

This concept means a program of properties that is passed down from ancestors. It is embedded in heredity as a genetic code.
The genetic code is realized during protein synthesis with the help of:

  • messenger RNA (mRNA);
  • ribosomal RNA (rRNA).

Information is transmitted through a direct link (DNA-RNA-protein) and through feedback (environment-protein-DNA).
Organisms can receive it, store it, transmit it and use it most effectively.
Passed on by inheritance, information determines the development of a particular organism. But through interaction with the environment the organism's response is altered, and this drives evolution and development. In this way, new information is introduced into the organism.


The working out of the laws of molecular biology and the discovery of the genetic code showed the need to combine genetics with Darwin's theory, and on this basis the synthetic theory of evolution, non-classical biology, emerged.
Darwin's heredity, variation and natural selection are supplemented by genetically determined selection. Evolution is realized at the genetic level through random mutations and the inheritance of the most valuable traits, those best adapted to the environment.

Decoding the human code

In the 1990s the Human Genome Project was launched, and as a result, genome fragments containing 99.99% of human genes were identified in the 2000s. Fragments that are not involved in protein synthesis and do not code for anything remain unexplored; their role is unknown for now.

Chromosome 1, the last to be decoded (in 2006), is the longest in the genome. More than three hundred and fifty diseases, including cancer, arise from disorders and mutations in it.

The role of such studies cannot be overestimated. Once it was discovered what the genetic code is, it became clear by what patterns development proceeds, and how the morphological structure, psyche, predisposition to certain diseases, metabolism and defects of individuals are formed.


