I have a function that finds and prints the longest common chain between two DNA chains. However, I want to add some checks so that my program ignores characters that are not bases ('A', 'T', 'C', 'G'). For example, CCAATTFFA and CCAATTKA have the common chain CCAATTA. Here is my code:

void CommonSubStr(char *X, char *Y, long int m, long int n) {
    long int maxCommonChain = 0;  
    long int end = 0; 
    for (long int i = 0; i < m; i++) {
        for (long int j = 0; j < n; j++) {
            long int currentLength = 0;  
            long int x = i, y = j; 
            while (x < m && y < n && X[x] == Y[y]) {
                currentLength++;
                x++;
                y++;
            }

            if (currentLength > maxCommonChain) {
                maxCommonChain = currentLength;
                end = i + maxCommonChain - 1; 
            }
        }
    }

    if (maxCommonChain == 0) { 
        printf("No common substring found.\n");
        return;
    }

    long int start = end - maxCommonChain + 1; 
    for (long int i = start; i <= end; i++) { 
        if (X[i] == 'A' || X[i] == 'C' || X[i] == 'G' || X[i] == 'T') {
            printf("%c", X[i]);
        }
    }
    printf("\n");
}

Can anyone help me with the checks I should add? I've tried a lot of checks in the while loop but none of them work.

  • Show us what you tried that didn't work. We'll help you fix problems with your code; we won't usually just write the code for you. Commented Jan 12, 2024 at 18:08
  • Would it not be simplest to preprocess the input to eliminate the non-base characters and do the longest common sub-sequence on the "known to be clean" data? A primary reason for not doing that might be "I need to identify the subsequences in the original data". But remember that in general, you will have two subsequences of different lengths: there is the length of the valid subsequence, the length of the LHS mixed subsequence, and the length of the RHS mixed subsequence. Obviously, the start positions of the two subsequences could be different. Commented Jan 12, 2024 at 18:09
  • @ms complaints Strings in C are sequences of characters terminated by the zero character '\0'. Why does your function declaration have the third and fourth parameters? And why does the function not return anything? Commented Jan 12, 2024 at 18:14
  • You should probably have a function that finds the next valid base in each substring. It would need the current position in the string and the length of the string (as well as the string itself) as arguments, and it would need to return, in some shape or form, the new current position and possibly the base itself. Consider using a structure to pass to this function — though there are other possible designs. Anyway, abstract the 'next base' code into a function that's called twice, once for each string. Commented Jan 12, 2024 at 18:19
  • You should probably use const char *X, const char *Y in the argument list; you probably don't want your code modifying those strings. Commented Jan 12, 2024 at 18:26
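
The "next base" helper suggested in the comments could look roughly like the sketch below. The name next_base and its index-based interface are assumptions made for illustration, not something taken from the question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the index of the first valid base
 * ('A', 'C', 'G' or 'T') at or after position pos in s[0..len),
 * or len if there is none. */
static size_t next_base(const char *s, size_t len, size_t pos)
{
    /* memchr over exactly 4 bytes, so a '\0' in s never counts as a base. */
    while (pos < len && memchr("ACGT", s[pos], 4) == NULL) {
        ++pos;
    }
    return pos;
}
```

Calling it once for each string before every comparison keeps the skipping logic in one place.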

1 Answer

If you are dealing with strings, then the third and fourth parameters of the function should be removed: a C string is terminated by '\0', so its length does not need to be passed separately.

Since the passed strings are not changed within the function, the corresponding parameters should be declared with the const qualifier.

The function should do only one thing: determine the common prefix. So it should return some result. The result can be, for example, pointers past the last equal characters of the common prefix in both strings. Since you just want to output the common prefix, the function could instead return a string that contains it.
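
As a rough illustration of the pointer-returning variant, here is a minimal sketch. The struct PrefixEnd and the function name common_prefix_end are hypothetical names introduced only for this example.

```c
#include <string.h>

/* Hypothetical result type: where the comparison stopped in each string. */
struct PrefixEnd {
    const char *end1;  /* one past the last matched base in s1 */
    const char *end2;  /* one past the last matched base in s2 */
};

/* Compares s1 and s2 base by base, skipping characters not in `bases`. */
struct PrefixEnd common_prefix_end(const char *s1, const char *s2,
                                   const char *bases)
{
    struct PrefixEnd r = { s1, s2 };
    const char *p1 = s1, *p2 = s2;

    while ((p1 = strpbrk(p1, bases)) != NULL &&
           (p2 = strpbrk(p2, bases)) != NULL &&
           *p1 == *p2) {
        /* Record the position just past the pair that matched. */
        r.end1 = ++p1;
        r.end2 = ++p2;
    }
    return r;
}
```

If nothing matches, both pointers are returned unchanged, so the caller can detect an empty prefix by comparing end1 with s1.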

Within your function, the three nested loops do not make sense. In general your code is unclear and, like any unclear code, it has logical errors.

I can suggest the following function declaration and its definition as shown in the demonstration program below.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

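/* Returns a newly allocated string holding the common prefix of s1 and s2,
   comparing only the characters that occur in s3. The caller frees it. */
char * CommonSubStr( const char *s1, const char *s2, const char *s3 )
{
    size_t n = 0;

    /* First pass: count the matching bases, skipping all other characters. */
    for (const char *p1 = s1, *p2 = s2;
        ( p1 = strpbrk( p1, s3 ) ) != NULL && 
        ( p2 = strpbrk( p2, s3 ) ) != NULL && 
        *p1 == *p2;
        ++p1, ++p2)
    {
        ++n;
    }

    /* calloc zero-fills the n + 1 bytes, so the result is already terminated. */
    char *common_prefix = calloc( n + 1, sizeof( char ) );

    if (common_prefix != NULL)
    {
        char *current = common_prefix;

        /* Second pass: copy the n matched bases from s1. */
        for (const char *p1 = s1; n--; )
        {
            p1 = strpbrk( p1, s3 );
            *current++ = *p1++;
        }
    }

    return common_prefix;
}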

int main( void )
{
    char *common_prefix = CommonSubStr( "CCAATTFFA", "CCAATTKA", "ATCG" );

    if (common_prefix != NULL) printf( "\"%s\"\n", common_prefix );

    free( common_prefix );
}

The program output is

"CCAATTA"

If the common prefix is empty, the function returns an empty string. It returns NULL only if the memory allocation fails.

With the function shown above in hand, it is easy to write one that just writes the common prefix of two strings to any stream.

Here you are.

#include <stdio.h>
#include <string.h>

FILE * CommonSubStr( const char *s1, const char *s2, const char *s3, FILE *fp )
{
    while(  ( s1 = strpbrk( s1, s3 ) ) != NULL &&
            ( s2 = strpbrk( s2, s3 ) ) != NULL &&
            *s1 == *s2 )
    {
        fputc( *s1, fp );

        ++s1;
        ++s2;
    }

    return fp;

}

int main( void )
{
    fputc( '\n', CommonSubStr( "CCAATTFFA", "CCAATTKA", "ATCG", stdout ) );
}

The program output is

CCAATTA

As you can see, the function implementation is very simple. There is only one loop, thanks to the standard C string function strpbrk.
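
To show in isolation how strpbrk skips characters outside the given set, here is a small sketch. The helper count_bases is a hypothetical name used only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the characters of s that belong to `bases`. */
size_t count_bases(const char *s, const char *bases)
{
    size_t n = 0;

    /* strpbrk returns a pointer to the first character at or after p
     * that occurs in `bases`, or NULL when no such character is left. */
    for (const char *p = s; (p = strpbrk(p, bases)) != NULL; ++p) {
        ++n;
    }
    return n;
}
```

For "CCAATTFFA" with the set "ATCG" this counts 7 characters, because the two 'F' characters are skipped.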

If you want to output a message when a common substring is not found, just add an additional variable, for example:

FILE * CommonSubStr( const char *s1, const char *s2, const char *s3, FILE *fp )
{
    size_t n = 0;

    while(  ( s1 = strpbrk( s1, s3 ) ) != NULL &&
            ( s2 = strpbrk( s2, s3 ) ) != NULL &&
            *s1 == *s2 )
    {
        ++n;

        fputc( *s1, fp );

        ++s1;
        ++s2;
    }

    if ( n == 0 ) fprintf( fp, "%s", "No common substring found." );

    return fp;

}
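
The functions above compare the two strings from their beginnings. If the longest common chain may start anywhere, as in the question, one possible approach is the preprocessing idea from the comments: drop the non-base characters first, then run the same brute-force search as the question on the cleaned copies. The sketch below assumes that reading of the problem; keep_bases and longest_common_chain are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of s that keeps
 * only the characters found in `bases`; NULL if allocation fails. */
static char *keep_bases(const char *s, const char *bases)
{
    char *out = malloc(strlen(s) + 1);
    if (out == NULL) return NULL;

    char *dst = out;
    for (const char *p = s; (p = strpbrk(p, bases)) != NULL; ++p) {
        *dst++ = *p;
    }
    *dst = '\0';
    return out;
}

/* Returns a newly allocated string holding the longest run of bases
 * common to s1 and s2 once non-base characters are dropped.
 * The caller frees the result; NULL means an allocation failed. */
char *longest_common_chain(const char *s1, const char *s2, const char *bases)
{
    char *a = keep_bases(s1, bases);
    char *b = keep_bases(s2, bases);
    char *result = NULL;

    if (a != NULL && b != NULL) {
        size_t best = 0, start = 0;
        size_t m = strlen(a), n = strlen(b);

        /* Brute force, as in the question: try every pair of starts. */
        for (size_t i = 0; i < m; i++) {
            for (size_t j = 0; j < n; j++) {
                size_t len = 0;
                while (i + len < m && j + len < n &&
                       a[i + len] == b[j + len]) {
                    len++;
                }
                if (len > best) {
                    best = len;
                    start = i;
                }
            }
        }

        result = malloc(best + 1);
        if (result != NULL) {
            memcpy(result, a + start, best);
            result[best] = '\0';
        }
    }

    free(a);
    free(b);
    return result;
}
```

For the strings from the question, longest_common_chain( "CCAATTFFA", "CCAATTKA", "ATCG" ) yields "CCAATTA". An empty string means that no common chain was found. Positions in the original strings are lost with this approach, so it fits only when the chain itself is what matters.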